A comparison of the responses between ChatGPT and doctors in the field of cholelithiasis based on clinical practice guidelines: a cross-sectional study

Bibliographic Details
Main Authors: Tianyang Mao, Xin Zhao, Kangyi Jiang, Qingyun Xie, Manyu Yang, Ruoxuan Wang, Fengwei Gao
Format: Article
Language: English
Published: SAGE Publishing 2025-04-01
Series: Digital Health
Online Access: https://doi.org/10.1177/20552076251331804
collection DOAJ
description Background With the development of the information age, an increasing number of patients are seeking information about diseases on the Internet. In the medical field, several studies have confirmed that ChatGPT has great potential for medical education, generating imaging reports, and even supporting clinical diagnosis and treatment decisions, but its ability to answer questions related to gallstones has not yet been reported in the literature. Objective The aim of this study was to evaluate the consistency and accuracy of ChatGPT-generated answers to clinical questions on cholelithiasis, compared with answers provided by clinical experts. Methods This study designed a question-answering task based on clinical practice guidelines for cholelithiasis, with answers presented in the form of keywords. The questions were categorized into general questions and professional questions. To evaluate the performance of ChatGPT and the clinical experts, the study employed a modified matching scoring system, a keyword proportion evaluation system, and the DISCERN tool. Results ChatGPT often provided more keywords in its responses, but its accuracy was significantly lower than that of doctors (P < .001). In the evaluation of 33 general questions, ChatGPT and doctors demonstrated similar performance in both the modified matching scoring system and the keyword proportion evaluation (P = .856 and P = .829, respectively). However, in the evaluation of 32 professional questions, doctors consistently outperformed ChatGPT (P = .004 and P = .016). Additionally, while the DISCERN tool showed differences between general and professional questions (P = .001), both types of questions were rated highly overall. Conclusions Currently, ChatGPT performs similarly to clinical experts in answering general questions about cholelithiasis, but it cannot replace clinical experts in professional clinical decision-making.
As ChatGPT's performance improves through deep learning, it is expected to become more useful and effective in the field of cholelithiasis. Nevertheless, in more specialized areas, careful attention and continuous evaluation will be necessary to ensure its accuracy, reliability, and safety in the medical field.
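The Methods mention a "keyword proportion evaluation system" in which answers are scored against guideline-derived keywords. The record does not specify the study's actual scoring rules, so the following is only a hypothetical sketch of one plausible way such a proportion could be computed; the function name `keyword_proportion` and the sample keywords are illustrative assumptions, not taken from the study.

```python
def keyword_proportion(response: str, reference_keywords: list[str]) -> float:
    """Fraction of reference keywords that appear (case-insensitively) in a response.

    Hypothetical illustration only: the study's actual matching rules
    (synonyms, partial credit, weighting) are not described in this record.
    """
    if not reference_keywords:
        return 0.0
    text = response.lower()
    matched = [kw for kw in reference_keywords if kw.lower() in text]
    return len(matched) / len(reference_keywords)


# Illustrative example with made-up guideline keywords:
reference = ["laparoscopic cholecystectomy", "symptomatic", "gallbladder"]
answer = ("Laparoscopic cholecystectomy is the standard treatment "
          "for symptomatic gallstones.")
score = keyword_proportion(answer, reference)  # 2 of 3 keywords matched -> 2/3
```

A real evaluation would likely need keyword normalization and expert adjudication of near-matches; exact substring matching is only the simplest possible baseline.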
id doaj-art-cd8002f893ed4814b798d71dcb9e9b73
issn 2055-2076
Affiliations:
Tianyang Mao: Department of Clinical Medicine, Sichuan North Medical College, Nanchong, China
Xin Zhao: Department of Hepato-Pancreato-Biliary Surgery, , Leshan, China
Kangyi Jiang: Department of Hepato-Pancreato-Biliary Surgery, , Leshan, China
Qingyun Xie: Liver Transplantation Center, State Key Laboratory of Biotherapy and Cancer Center, , Sichuan University and Collaborative Innovation Center of Biotherapy, Chengdu, China
Manyu Yang: Department of Clinical Medicine, Sichuan North Medical College, Nanchong, China
Ruoxuan Wang: Department of Clinical Medicine, Sichuan North Medical College, Nanchong, China
Fengwei Gao: Liver Transplantation Center, State Key Laboratory of Biotherapy and Cancer Center, , Sichuan University and Collaborative Innovation Center of Biotherapy, Chengdu, China