Evaluating GPT-4o in infectious disease diagnostics and management: A comparative study with residents and specialists on accuracy, completeness, and clinical support potential
Background: Artificial intelligence (AI), particularly GPT models such as GPT-4o (omnimodal), is increasingly being integrated into healthcare to provide diagnostic and treatment recommendations. However, the accuracy and clinical applicability of such AI systems remain unclear.

Objective: This study aimed to evaluate the accuracy and completeness of GPT-4o, compared with resident physicians and senior infectious disease specialists, in diagnosing and managing bacterial, fungal, and viral infections.

Methods: A comparative study was conducted involving GPT-4o, three resident physicians, and three senior infectious disease experts. Participants answered 75 questions, comprising true/false, open-ended, and clinical case-based scenarios, developed according to international guidelines and clinical practice. Accuracy and completeness were assessed via blinded expert review using Likert scales. Statistical analysis included Chi-square, Fisher's exact, and Kruskal–Wallis tests.

Results: On true/false questions, GPT-4o showed accuracy (87.5%) comparable to specialists (90.3%) and exceeded residents (77.8%). Specialists outperformed GPT-4o in accuracy on open-ended (P = .008) and clinical case-based questions (P = .02). However, GPT-4o demonstrated significantly greater completeness than residents on open-ended (P < .0001) and clinical case-based questions (P = .01), providing more comprehensive explanations.

Conclusions: GPT-4o shows promise as a tool for providing comprehensive responses in infectious disease management, although specialists still outperform it in accuracy. Continuous human oversight is recommended to mitigate potential inaccuracies in clinical decision-making. These findings suggest that while GPT-4o may be a valuable supplementary tool for medical advice, it should not replace expert consultation in complex clinical decision-making.
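The group comparisons described in the Methods can be sketched with SciPy. The Likert ratings below are invented for illustration, and the 2×2 accuracy table uses hypothetical counts consistent with the reported true/false percentages (21/24 ≈ 87.5%, 28/31 ≈ 90.3%), not the study's raw data:

```python
# Illustrative sketch of the paper's statistical comparisons using made-up
# data; none of these numbers are the study's actual ratings or counts.
from scipy.stats import kruskal, fisher_exact

# Hypothetical completeness ratings (1-5 Likert), one list per group.
gpt4o       = [5, 4, 5, 5, 4, 5, 4, 5]
residents   = [3, 2, 4, 3, 3, 2, 4, 3]
specialists = [4, 5, 4, 4, 5, 4, 5, 4]

# Kruskal-Wallis: do the three groups differ in completeness ratings?
h_stat, p_kw = kruskal(gpt4o, residents, specialists)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.4f}")

# Fisher's exact test on a 2x2 accuracy table (correct vs incorrect),
# e.g. GPT-4o vs specialists on true/false items.
table = [[21, 3],   # GPT-4o: 21 correct, 3 incorrect (87.5%)
         [28, 3]]   # specialists: 28 correct, 3 incorrect (90.3%)
odds_ratio, p_fisher = fisher_exact(table)
print(f"Fisher's exact p = {p_fisher:.3f}")
```

Kruskal–Wallis is the natural choice here because Likert ratings are ordinal, so a nonparametric rank-based test avoids assuming the ratings are interval-scaled or normally distributed.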
| Main Authors: | Lili Zhan, Xiumin Dang, Zhenghua Xie, Chaoying Zeng, Weixing Wu, Xiaoyu Zhang, Li Zhang, Xinjian Cai |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SAGE Publishing, 2025-07-01 |
| Series: | Digital Health |
| Online Access: | https://doi.org/10.1177/20552076251355797 |
| ISSN: | 2055-2076 |
| Author affiliations: | Department of Clinical Laboratory Medicine, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China (all authors except Chaoying Zeng); Department of Clinical Laboratory, Haikou, China (Chaoying Zeng) |