Assessing the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in managing lumbar disc herniation

Abstract
Purpose: This study evaluated and compared the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in diagnosing and treating lumbar disc herniation (LDH) with radiculopathy.
Methods: Twenty-one questions (across five categories) from the NASS Clinical Guidelines were input into ChatGPT 4o and ChatGPT 4o mini. Five orthopedic surgeons rated the responses on a 5-point Likert scale for accuracy and completeness and on a 7-point scale for reliability. Flesch Reading Ease scores were calculated to assess readability. Additionally, ChatGPT 4o analyzed lumbar images from 53 patients, and its diagnostic agreement with orthopedic surgeons was quantified with Kappa values.
Results: Both models demonstrated strong clinical support capabilities, with no significant differences in accuracy or reliability. However, ChatGPT 4o provided more comprehensive and consistent responses. The Flesch Reading Ease scores for both models indicated that the generated content was “very difficult to read,” potentially limiting patient accessibility. In evaluating lumbar disc herniation images, ChatGPT 4o achieved an overall accuracy of 0.81, with LDH recognition precision, recall, and F1 scores exceeding 0.80. The AUC was 0.80, and the Kappa value was 0.61, indicating moderate agreement between the model’s predictions and the surgeons’ diagnoses, though with room for improvement.
Conclusion: While both models are effective, ChatGPT 4o offers more comprehensive clinical responses, making it more suitable for high-integrity medical tasks. However, the low readability of the AI-generated content and its occasional use of misleading terms, such as “tumor,” indicate a need for further improvement to reduce patient anxiety.
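To make the reported evaluation statistics concrete, below is a minimal, illustrative sketch (not the study's actual code, which this record does not include) of how the image-reading agreement metrics (accuracy, precision, recall, F1, AUC, Cohen's Kappa) and a Flesch Reading Ease score could be computed in Python with scikit-learn. The label arrays are hypothetical placeholders, not data from the 53 patients.

    # Illustrative sketch only; labels below are placeholders, not study data.
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score, cohen_kappa_score)

    surgeon_labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]  # hypothetical reference readings (1 = LDH)
    model_labels   = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical ChatGPT 4o readings of the same images

    print("accuracy :", accuracy_score(surgeon_labels, model_labels))
    print("precision:", precision_score(surgeon_labels, model_labels))
    print("recall   :", recall_score(surgeon_labels, model_labels))
    print("F1       :", f1_score(surgeon_labels, model_labels))
    # With hard 0/1 predictions, this AUC reflects a single operating point.
    print("AUC      :", roc_auc_score(surgeon_labels, model_labels))
    # Cohen's kappa: agreement between the two sets of readings, corrected for chance.
    print("kappa    :", cohen_kappa_score(surgeon_labels, model_labels))

    # Flesch Reading Ease: FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    # Lower scores mean harder text; very low scores correspond to "very difficult to read".
    def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
        return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

    # Hypothetical counts for one generated answer, for illustration only.
    print("FRE      :", flesch_reading_ease(words=180, sentences=6, syllables=320))

The counts passed to flesch_reading_ease would in practice be extracted from each model response (e.g., with a readability library); the values shown here are assumptions chosen only to demonstrate the formula.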

Bibliographic Details
Main Authors: Suning Wang, Ying Wang, Linlin Jiang, Yong Chang, Shiji Zhang, Kun Zhao, Lu Chen, Chunzheng Gao
Format: Article
Language: English
Published: BMC 2025-01-01
Series: European Journal of Medical Research
ISSN: 2047-783X
Subjects: ChatGPT; Lumbar disc herniation; Clinical guidelines; Artificial intelligence; Spine
Online Access: https://doi.org/10.1186/s40001-025-02296-x
Author affiliations:
Suning Wang, Kun Zhao, Lu Chen, Chunzheng Gao: Department of Orthopedics, The Second Hospital of Shandong University, Qilu Hospital of Shandong University, Shandong University
Ying Wang: Shandong University
Linlin Jiang, Yong Chang, Shiji Zhang: Department of Orthopedics, Qilu Hospital of Shandong University, The Second Hospital of Shandong University