Evaluating GPT-4o in infectious disease diagnostics and management: A comparative study with residents and specialists on accuracy, completeness, and clinical support potential
Background: Artificial intelligence (AI), particularly GPT models such as GPT-4o (omnimodal), is increasingly being integrated into healthcare to provide diagnostic and treatment recommendations. However, the accuracy and clinical applicability of such AI systems remain unclear.

Objective: This study aimed to evaluate the accuracy and completeness of GPT-4o, compared with resident physicians and senior infectious disease specialists, in diagnosing and managing bacterial, fungal, and viral infections.

Methods: A comparative study was conducted involving GPT-4o, three resident physicians, and three senior infectious disease experts. Participants answered 75 questions, comprising true/false, open-ended, and clinical case-based scenarios, developed according to international guidelines and clinical practice. Accuracy and completeness were assessed via blinded expert review using Likert scales. Statistical analysis included Chi-square, Fisher's exact, and Kruskal–Wallis tests.

Results: On true/false questions, GPT-4o showed accuracy (87.5%) comparable to specialists (90.3%) and exceeded residents (77.8%). Specialists outperformed GPT-4o in accuracy on open-ended (P = .008) and clinical case-based questions (P = .02). However, GPT-4o demonstrated significantly greater completeness than residents on open-ended (P < .0001) and clinical case-based questions (P = .01), providing more comprehensive explanations.

Conclusions: GPT-4o shows promise as a tool for providing comprehensive responses in infectious disease management, although specialists still outperform it in accuracy. Continuous human oversight is recommended to mitigate potential inaccuracies in clinical decision-making. These findings suggest that while GPT-4o may be a valuable supplementary tool for medical advice, it should not replace expert consultation in complex clinical decision-making.
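The group comparisons described in the Methods can be sketched with SciPy. The Likert ratings below are invented for illustration, and the 2×2 accuracy table uses hypothetical counts consistent with the reported true/false percentages (21/24 ≈ 87.5%, 28/31 ≈ 90.3%), not the study's raw data:

```python
# Illustrative sketch of the paper's statistical comparisons using made-up
# data; none of these numbers are the study's actual ratings or counts.
from scipy.stats import kruskal, fisher_exact

# Hypothetical completeness ratings (1-5 Likert), one list per group.
gpt4o       = [5, 4, 5, 5, 4, 5, 4, 5]
residents   = [3, 2, 4, 3, 3, 2, 4, 3]
specialists = [4, 5, 4, 4, 5, 4, 5, 4]

# Kruskal-Wallis: do the three groups differ in completeness ratings?
h_stat, p_kw = kruskal(gpt4o, residents, specialists)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.4f}")

# Fisher's exact test on a 2x2 accuracy table (correct vs incorrect),
# e.g. GPT-4o vs specialists on true/false items.
table = [[21, 3],   # GPT-4o: 21 correct, 3 incorrect (87.5%)
         [28, 3]]   # specialists: 28 correct, 3 incorrect (90.3%)
odds_ratio, p_fisher = fisher_exact(table)
print(f"Fisher's exact p = {p_fisher:.3f}")
```

Kruskal–Wallis is the natural choice here because Likert ratings are ordinal, so a nonparametric rank-based test avoids assuming the ratings are interval-scaled or normally distributed.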
| Main Authors: | Lili Zhan, Xiumin Dang, Zhenghua Xie, Chaoying Zeng, Weixing Wu, Xiaoyu Zhang, Li Zhang, Xinjian Cai |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SAGE Publishing, 2025-07-01 |
| Series: | Digital Health |
| Online Access: | https://doi.org/10.1177/20552076251355797 |
| ISSN: | 2055-2076 |
| Author affiliations: | Department of Clinical Laboratory Medicine, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China (all authors except Chaoying Zeng); Department of Clinical Laboratory, Haikou, China (Chaoying Zeng) |