Ability of ChatGPT to Replace Doctors in Patient Education: Cross-Sectional Comparative Analysis of Inflammatory Bowel Disease


Bibliographic Details
Main Authors: Zelin Yan, Jingwen Liu, Yihong Fan, Shiyuan Lu, Dingting Xu, Yun Yang, Honggang Wang, Jie Mao, Hou-Chiang Tseng, Tao-Hsing Chang, Yan Chen
Format: Article
Language: English
Published: JMIR Publications, 2025-03-01
Series: Journal of Medical Internet Research
ISSN: 1438-8871
DOI: 10.2196/62857
Online Access: https://www.jmir.org/2025/1/e62857
Abstract

Background: Although large language models (LLMs) such as ChatGPT show promise for providing specialized information, their quality requires further evaluation. This is especially true considering that these models are trained on internet text, and the quality of health-related information available online varies widely.

Objective: The aim of this study was to evaluate the performance of ChatGPT in patient education for individuals with chronic diseases, comparing it with that of industry experts to elucidate its strengths and limitations.

Methods: This evaluation was conducted in September 2023 by analyzing the responses of ChatGPT and specialist doctors to questions posed by patients with inflammatory bowel disease (IBD). We compared their performance on subjective accuracy, empathy, completeness, and overall quality, as well as readability to support objective analysis.

Results: In a series of 1578 binary-choice assessments, ChatGPT was preferred in 48.4% (95% CI 45.9%-50.9%) of instances. There were 12 instances in which ChatGPT's responses were unanimously preferred by all evaluators, compared with 17 instances for specialist doctors. In terms of overall quality, there was no significant difference between the responses of ChatGPT (3.98, 95% CI 3.93-4.02) and those of specialist doctors (3.95, 95% CI 3.90-4.00; t(524)=0.95, P=.34), both being rated "good." Although differences in accuracy (t(521)=0.48, P=.63) and empathy (t(511)=2.19, P=.03) lacked statistical significance, completeness of textual output (t(509)=9.27, P<.001) was a distinct advantage of the LLM (ChatGPT). In the sections of the questionnaire that patients and doctors answered together (Q223-Q242), ChatGPT performed worse (t(36)=2.91, P=.006). Regarding readability, no statistical difference was found between the responses of specialist doctors (median: 7th grade; Q1: 4th grade; Q3: 8th grade) and those of ChatGPT (median: 7th grade; Q1: 7th grade; Q3: 8th grade) according to the Mann-Whitney U test (P=.09). The overall quality of ChatGPT's output was strongly correlated with the other subdimensions (empathy: r=0.842; accuracy: r=0.839; completeness: r=0.795), and the subdimensions of accuracy and completeness were also highly correlated (r=0.762).

Conclusions: ChatGPT demonstrated more stable performance across dimensions. Its health information output is more structurally sound, addressing the variability in information from individual specialist doctors. ChatGPT's performance highlights its potential as an auxiliary tool for health information, despite limitations such as artificial intelligence hallucinations. It is recommended that patients be involved in the creation and evaluation of health information to enhance its quality and relevance.
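As a plausibility check on the headline preference estimate (48.4% of 1578 binary-choice assessments, 95% CI 45.9%-50.9%), a normal-approximation confidence interval for a proportion reproduces the reported bounds. This is a sketch assuming the authors used a standard Wald-type interval; the helper name `wald_ci` is illustrative, not from the paper:

```python
from math import sqrt

def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% CI for a sample proportion p with n trials."""
    se = sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se

# Preference for ChatGPT: 48.4% of 1578 assessments
lo, hi = wald_ci(0.484, 1578)
print(f"95% CI: {lo:.3f} - {hi:.3f}")  # → 95% CI: 0.459 - 0.509
```

The computed interval (0.459-0.509) matches the 45.9%-50.9% reported in the Results, consistent with a simple normal approximation rather than an exact binomial interval.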