Evaluating GPT-4o in infectious disease diagnostics and management: A comparative study with residents and specialists on accuracy, completeness, and clinical support potential


Bibliographic Details
Main Authors: Lili Zhan, Xiumin Dang, Zhenghua Xie, Chaoying Zeng, Weixing Wu, Xiaoyu Zhang, Li Zhang, Xinjian Cai
Format: Article
Language: English
Published: SAGE Publishing, 2025-07-01
Series: Digital Health
ISSN: 2055-2076
Online Access: https://doi.org/10.1177/20552076251355797
Abstract:

Background: Artificial intelligence (AI), particularly GPT models such as GPT-4o (omnimodal), is increasingly being integrated into healthcare to provide diagnostic and treatment recommendations. However, the accuracy and clinical applicability of such AI systems remain unclear.

Objective: This study aimed to evaluate the accuracy and completeness of GPT-4o, compared with resident physicians and senior infectious disease specialists, in diagnosing and managing bacterial, fungal, and viral infections.

Methods: A comparative study was conducted involving GPT-4o, three resident physicians, and three senior infectious disease experts. Participants answered 75 questions, comprising true/false, open-ended, and clinical case-based scenarios, developed according to international guidelines and clinical practice. Accuracy and completeness were assessed via blinded expert review using Likert scales. Statistical analysis included Chi-square, Fisher's exact, and Kruskal–Wallis tests.

Results: On true/false questions, GPT-4o showed accuracy (87.5%) comparable to specialists (90.3%) and exceeded residents (77.8%). Specialists outperformed GPT-4o in accuracy on open-ended (P = .008) and clinical case-based questions (P = .02). However, GPT-4o demonstrated significantly greater completeness than residents on open-ended (P < .0001) and clinical case-based questions (P = .01), providing more comprehensive explanations.

Conclusions: GPT-4o shows promise as a tool for providing comprehensive responses in infectious disease management, although specialists still outperform it in accuracy. Continuous human oversight is recommended to mitigate potential inaccuracies in clinical decision-making. These findings suggest that while GPT-4o may be considered a valuable supplementary tool for medical advice, it should not replace expert consultation in complex clinical decision-making.
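As an illustration of the kind of analysis the Methods describe (not the authors' actual code or data), the sketch below shows how the named tests (Chi-square, Fisher's exact, Kruskal–Wallis) could be run with scipy.stats. All counts, Likert scores, and group denominators are hypothetical placeholders; only the true/false accuracy percentages are taken from the abstract.

# Minimal, hypothetical sketch of the comparisons described in the Methods.
# All counts and Likert scores below are illustrative placeholders, not the
# study's data; only the 87.5% / 90.3% / 77.8% true/false accuracies come
# from the abstract, and the per-group denominators are assumed.
from scipy import stats

# True/false accuracy: correct vs. incorrect counts per group (assumed
# denominator of 72 judgements per group, scaled to the reported accuracies).
contingency = [
    [63, 9],   # GPT-4o:      ~87.5% correct
    [65, 7],   # specialists: ~90.3% correct
    [56, 16],  # residents:   ~77.8% correct
]
chi2, p_chi2, dof, _ = stats.chi2_contingency(contingency)
print(f"Chi-square: chi2={chi2:.2f}, dof={dof}, P={p_chi2:.3f}")

# Fisher's exact test on a 2x2 subtable (e.g. GPT-4o vs. residents), the
# usual fallback when expected cell counts are small.
odds_ratio, p_fisher = stats.fisher_exact([contingency[0], contingency[2]])
print(f"Fisher's exact (GPT-4o vs. residents): OR={odds_ratio:.2f}, P={p_fisher:.3f}")

# Completeness: compare Likert ratings (1-5) across groups with Kruskal-Wallis.
gpt4o_scores      = [5, 5, 4, 5, 4, 5, 4, 5]
specialist_scores = [4, 5, 4, 4, 5, 4, 4, 5]
resident_scores   = [3, 4, 3, 3, 4, 3, 4, 3]
h_stat, p_kw = stats.kruskal(gpt4o_scores, specialist_scores, resident_scores)
print(f"Kruskal-Wallis: H={h_stat:.2f}, P={p_kw:.3f}")

Because scipy's fisher_exact accepts only 2x2 tables, the sketch applies it to a single pairwise comparison; the omnibus three-group comparisons use the Chi-square and Kruskal–Wallis tests.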
Author Affiliations: Lili Zhan, Xiumin Dang, Zhenghua Xie, Weixing Wu, Xiaoyu Zhang, Li Zhang, and Xinjian Cai: Department of Clinical Laboratory Medicine, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China. Chaoying Zeng: Department of Clinical Laboratory, Haikou, China.