Assessing the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in managing lumbar disc herniation

Abstract
Purpose: This study evaluated and compared the clinical support capabilities of ChatGPT 4o and ChatGPT 4o mini in diagnosing and treating lumbar disc herniation (LDH) with radiculopathy.
Methods: Twenty-one questions (across five categories) from the NASS Clinical Guidelines were input into ChatGPT 4o and ChatGPT 4o mini. Five orthopedic surgeons rated the responses on a 5-point Likert scale for accuracy and completeness and on a 7-point scale for reliability. Flesch Reading Ease scores were calculated to assess readability. Additionally, ChatGPT 4o analyzed lumbar images from 53 patients, and its diagnostic agreement with orthopedic surgeons was quantified with Kappa values.
Results: Both models demonstrated strong clinical support capabilities, with no significant differences in accuracy or reliability. However, ChatGPT 4o provided more comprehensive and consistent responses. The Flesch Reading Ease scores for both models indicated that the generated content was “very difficult to read,” potentially limiting patient accessibility. In evaluating lumbar disc herniation images, ChatGPT 4o achieved an overall accuracy of 0.81, with LDH recognition precision, recall, and F1 scores exceeding 0.80. The AUC was 0.80, and the Kappa value was 0.61, indicating moderate agreement between the model’s predictions and the surgeons’ diagnoses, though with room for improvement.
Conclusion: While both models are effective, ChatGPT 4o offers more comprehensive clinical responses, making it more suitable for high-integrity medical tasks. However, the low readability of the AI-generated content and its occasional use of misleading terms, such as “tumor,” indicate a need for further improvement to reduce patient anxiety.
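To make the reported evaluation statistics concrete, below is a minimal, illustrative sketch (not the study's actual code, which this record does not include) of how the image-reading agreement metrics (accuracy, precision, recall, F1, AUC, Cohen's Kappa) and a Flesch Reading Ease score could be computed in Python with scikit-learn. The label arrays are hypothetical placeholders, not data from the 53 patients.

    # Illustrative sketch only; labels below are placeholders, not study data.
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score, cohen_kappa_score)

    surgeon_labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]  # hypothetical reference readings (1 = LDH)
    model_labels   = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical ChatGPT 4o readings of the same images

    print("accuracy :", accuracy_score(surgeon_labels, model_labels))
    print("precision:", precision_score(surgeon_labels, model_labels))
    print("recall   :", recall_score(surgeon_labels, model_labels))
    print("F1       :", f1_score(surgeon_labels, model_labels))
    # With hard 0/1 predictions, this AUC reflects a single operating point.
    print("AUC      :", roc_auc_score(surgeon_labels, model_labels))
    # Cohen's kappa: agreement between the two sets of readings, corrected for chance.
    print("kappa    :", cohen_kappa_score(surgeon_labels, model_labels))

    # Flesch Reading Ease: FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    # Lower scores mean harder text; very low scores correspond to "very difficult to read".
    def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
        return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

    # Hypothetical counts for one generated answer, for illustration only.
    print("FRE      :", flesch_reading_ease(words=180, sentences=6, syllables=320))

The counts passed to flesch_reading_ease would in practice be extracted from each model response (e.g., with a readability library); the values shown here are assumptions chosen only to demonstrate the formula.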

Bibliographic Details
Main Authors: Suning Wang, Ying Wang, Linlin Jiang, Yong Chang, Shiji Zhang, Kun Zhao, Lu Chen, Chunzheng Gao
Format: Article
Language: English
Published: BMC 2025-01-01
Series: European Journal of Medical Research
ISSN: 2047-783X
Subjects: ChatGPT; Lumbar disc herniation; Clinical guidelines; Artificial intelligence; Spine
Online Access: https://doi.org/10.1186/s40001-025-02296-x
Author affiliations:
Suning Wang, Kun Zhao, Lu Chen, Chunzheng Gao: Department of Orthopedics, The Second Hospital of Shandong University, Qilu Hospital of Shandong University, Shandong University
Ying Wang: Shandong University
Linlin Jiang, Yong Chang, Shiji Zhang: Department of Orthopedics, Qilu Hospital of Shandong University, The Second Hospital of Shandong University