Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature

Bibliographic Details
Main Authors: B.L. Fabre, M.A.F. Magalhaes Filho, P.N. Aguiar, Jr, F.M. da Costa, B. Gutierres, W.N. William, Jr, A. Del Giglio
Format: Article
Language: English
Published: Elsevier 2024-06-01
Series: ESMO Real World Data and Digital Oncology
Subjects: artificial intelligence in health care; clinical decision support systems
Online Access: http://www.sciencedirect.com/science/article/pii/S2949820124000201
author B.L. Fabre
M.A.F. Magalhaes Filho
P.N. Aguiar, Jr
F.M. da Costa
B. Gutierres
W.N. William, Jr
A. Del Giglio
collection DOAJ
description Background: Artificial intelligence (AI) and natural language processing (NLP) advancements have led to sophisticated tools like GPT-4.0, allowing clinicians to explore its utility as a health care management support tool. Our study aimed to assess the capability of GPT-4 in suggesting definitive diagnoses and appropriate work-ups to minimize unnecessary procedures.
Materials and methods: We conducted a retrospective comparative analysis, extracting clinical data from 10 cases published in the New England Journal of Medicine after 2022 and inputting this data into GPT-4 to generate diagnostic and work-up recommendations. Primary endpoint: the ability to correctly identify the final diagnosis. Secondary endpoints: its ability to list the definitive diagnosis as the first of the five most likely differential diagnoses and determine an adequate work-up.
Results: The AI could not identify the definitive diagnosis in 2 out of 10 cases (20% inaccuracy). Among the eight cases correctly identified by the AI, five (63%) listed the definitive diagnosis at the top of the differential diagnosis list. In terms of diagnostic tests and exams, the AI suggested unnecessary procedures in two cases, representing 40% of the cases where it failed to correctly identify the final diagnosis. Moreover, the AI could not suggest adequate treatment for seven cases (70%). Among them, the AI suggested inappropriate management for two cases, and the remaining five received incomplete or non-specific advice, such as chemotherapy, without specifying the best regimen.
Conclusions: Our study demonstrated GPT-4’s potential as an academic support tool, although it cannot correctly identify the final diagnosis in 20% of the cases and the AI requested unnecessary additional diagnostic tests for 40% of the patients. Future research should focus on evaluating the performance of GPT-4 using a more extensive and diverse sample, incorporating prospective assessments, and investigating its ability to improve diagnostic and therapeutic accuracy.
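The proportions reported in the Results can be rechecked from the case counts given in the abstract. The following is a minimal sketch, not part of the published record: the helper name `pct_half_up` is hypothetical, and round-half-up is an assumption made so that 5 of 8 (62.5%) matches the reported 63%. The 40% figure for unnecessary procedures is left out because the abstract does not state its denominator unambiguously.

```python
from fractions import Fraction


def pct_half_up(numerator: int, denominator: int) -> int:
    """Whole-number percentage of numerator/denominator, rounding halves up.

    Hypothetical helper for illustration. Exact fractions avoid float
    rounding surprises (Python's built-in round() would turn 62.5 into 62).
    """
    share = Fraction(numerator, denominator) * 100
    # Adding 1/2 and truncating is round-half-up for non-negative values.
    return int(share + Fraction(1, 2))


# Counts as stated in the abstract.
missed_diagnosis = pct_half_up(2, 10)      # final diagnosis not identified
top_of_differential = pct_half_up(5, 8)    # correct diagnosis listed first
inadequate_treatment = pct_half_up(7, 10)  # no adequate treatment suggested

print(missed_diagnosis, top_of_differential, inadequate_treatment)
```

Exact rational arithmetic is used here only to keep the half-way case (62.5%) deterministic; any rounding convention that sends halves upward would give the same three values.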
format Article
id doaj-art-9df553df3bf348bdb86dc3fb405a960a
institution OA Journals
issn 2949-8201
language English
publishDate 2024-06-01
publisher Elsevier
record_format Article
series ESMO Real World Data and Digital Oncology
spelling doaj-art-9df553df3bf348bdb86dc3fb405a960a
2025-08-20T02:30:23Z
eng
Elsevier
ESMO Real World Data and Digital Oncology
2949-8201
2024-06-01
4
100042
10.1016/j.esmorw.2024.100042
B.L. Fabre (0): Hospital Beneficência Portuguesa de São Paulo, São Paulo
M.A.F. Magalhaes Filho (1): Hospital Beneficência Portuguesa de São Paulo, São Paulo; Correspondence to: Dr Marcos Aurélio Fonseca Magalhães Filho, R. Martiniano de Carvalho, 965-Bela Vista, São Paulo-SP, 01323-001, Brazil. Tel: +55-11999198828
P.N. Aguiar, Jr (2): Oncoclínicas de São Paulo, São Paulo
F.M. da Costa (3): Hospital Beneficência Portuguesa de São Paulo, São Paulo
B. Gutierres (4): Oncoclínicas de São Paulo, São Paulo
W.N. William, Jr (5): Oncoclínicas de São Paulo, São Paulo
A. Del Giglio (6): Faculdade de Medicina do ABC, São Paulo, Brazil
title Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature
topic artificial intelligence in health care
clinical decision support systems
url http://www.sciencedirect.com/science/article/pii/S2949820124000201