Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature

Background: Advances in artificial intelligence (AI) and natural language processing (NLP) have produced sophisticated tools such as GPT-4.0, prompting clinicians to explore their utility as health care management support tools. Our study aimed to assess the capability of GPT-4 in suggesting definitive di...

Bibliographic Details
Main Authors:	B.L. Fabre, M.A.F. Magalhaes Filho, P.N. Aguiar, Jr, F.M. da Costa, B. Gutierres, W.N. William, Jr, A. Del Giglio
Format:	Article
Language:	English
Published:	Elsevier 2024-06-01
Series:	ESMO Real World Data and Digital Oncology
Subjects:	artificial intelligence in health care clinical decision support systems
Online Access:	http://www.sciencedirect.com/science/article/pii/S2949820124000201

_version_	1850139229686857728
author	B.L. Fabre M.A.F. Magalhaes Filho P.N. Aguiar, Jr F.M. da Costa B. Gutierres W.N. William, Jr A. Del Giglio
author_facet	B.L. Fabre M.A.F. Magalhaes Filho P.N. Aguiar, Jr F.M. da Costa B. Gutierres W.N. William, Jr A. Del Giglio
author_sort	B.L. Fabre
collection	DOAJ
description	Background: Advances in artificial intelligence (AI) and natural language processing (NLP) have produced sophisticated tools such as GPT-4.0, prompting clinicians to explore their utility as health care management support tools. Our study aimed to assess the capability of GPT-4 in suggesting definitive diagnoses and appropriate work-ups to minimize unnecessary procedures. Materials and methods: We conducted a retrospective comparative analysis, extracting clinical data from 10 cases published in the New England Journal of Medicine after 2022 and entering these data into GPT-4 to generate diagnostic and work-up recommendations. Primary endpoint: the ability to correctly identify the final diagnosis. Secondary endpoints: the ability to list the definitive diagnosis as the first of the five most likely differential diagnoses and to determine an adequate work-up. Results: The AI could not identify the definitive diagnosis in 2 out of 10 cases (20% inaccuracy). In five (63%) of the eight cases it identified correctly, the AI listed the definitive diagnosis at the top of the differential diagnosis list. In terms of diagnostic tests and exams, the AI suggested unnecessary procedures in two cases, representing 40% of the cases where it failed to correctly identify the final diagnosis. Moreover, the AI could not suggest adequate treatment for seven cases (70%). Among them, the AI suggested inappropriate management for two cases, and the remaining five received incomplete or non-specific advice, such as chemotherapy without a specified regimen. Conclusions: Our study demonstrated GPT-4's potential as an academic support tool, although it failed to identify the final diagnosis in 20% of the cases and requested unnecessary additional diagnostic tests for 40% of the patients. Future research should focus on evaluating the performance of GPT-4 using a more extensive and diverse sample, incorporating prospective assessments, and investigating its ability to improve diagnostic and therapeutic accuracy.
format	Article
id	doaj-art-9df553df3bf348bdb86dc3fb405a960a
institution	OA Journals
issn	2949-8201
language	English
publishDate	2024-06-01
publisher	Elsevier
record_format	Article
series	ESMO Real World Data and Digital Oncology
spelling	doaj-art-9df553df3bf348bdb86dc3fb405a960a; 2025-08-20T02:30:23Z; eng; Elsevier; ESMO Real World Data and Digital Oncology; 2949-8201; 2024-06-01; 4; 100042; 10.1016/j.esmorw.2024.100042; Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature; B.L. Fabre (Hospital Beneficência Portuguesa de São Paulo, São Paulo); M.A.F. Magalhaes Filho (Hospital Beneficência Portuguesa de São Paulo, São Paulo; Correspondence to: Dr Marcos Aurélio Fonseca Magalhães Filho, R. Martiniano de Carvalho, 965-Bela Vista, São Paulo-SP, 01323-001, Brazil. Tel: +55-11999198828); P.N. Aguiar, Jr (Oncoclínicas de São Paulo, São Paulo); F.M. da Costa (Hospital Beneficência Portuguesa de São Paulo, São Paulo); B. Gutierres (Oncoclínicas de São Paulo, São Paulo); W.N. William, Jr (Oncoclínicas de São Paulo, São Paulo); A. Del Giglio (Faculdade de Medicina do ABC, São Paulo, Brazil); http://www.sciencedirect.com/science/article/pii/S2949820124000201; artificial intelligence in health care; clinical decision support systems
title	Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature
title_full	Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature
title_fullStr	Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature
title_full_unstemmed	Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature
title_short	Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature
title_sort	evaluating gpt 4 as an academic support tool for clinicians a comparative analysis of case records from the literature
topic	artificial intelligence in health care clinical decision support systems
url	http://www.sciencedirect.com/science/article/pii/S2949820124000201

Similar Items