Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature
Background: Artificial intelligence (AI) and natural language processing (NLP) advancements have led to sophisticated tools like GPT-4.0, allowing clinicians to explore its utility as a health care management support tool. Our study aimed to assess the capability of GPT-4 in suggesting definitive di...
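| Main Authors: | B.L. Fabre; M.A.F. Magalhaes Filho; P.N. Aguiar, Jr; F.M. da Costa; B. Gutierres; W.N. William, Jr; A. Del Giglio |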
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2024-06-01 |
| Series: | ESMO Real World Data and Digital Oncology |
| Subjects: | artificial intelligence in health care; clinical decision support systems |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2949820124000201 |
| _version_ | 1850139229686857728 |
|---|---|
| author | B.L. Fabre; M.A.F. Magalhaes Filho; P.N. Aguiar, Jr; F.M. da Costa; B. Gutierres; W.N. William, Jr; A. Del Giglio |
| author_sort | B.L. Fabre |
| collection | DOAJ |
| description | Background: Artificial intelligence (AI) and natural language processing (NLP) advancements have led to sophisticated tools like GPT-4.0, allowing clinicians to explore its utility as a health care management support tool. Our study aimed to assess the capability of GPT-4 in suggesting definitive diagnoses and appropriate work-ups to minimize unnecessary procedures. Materials and methods: We conducted a retrospective comparative analysis, extracting clinical data from 10 cases published in the New England Journal of Medicine after 2022 and inputting this data into GPT-4 to generate diagnostic and work-up recommendations. Primary endpoint: the ability to correctly identify the final diagnosis. Secondary endpoints: its ability to list the definitive diagnosis as the first of the five most likely differential diagnoses and determine an adequate work-up. Results: The AI could not identify the definitive diagnosis in 2 out of 10 cases (20% inaccuracy). Among the eight cases correctly identified by the AI, five (63%) listed the definitive diagnosis at the top of the differential diagnosis list. In terms of diagnostic tests and exams, the AI suggested unnecessary procedures in two cases, representing 40% of the cases where it failed to correctly identify the final diagnosis. Moreover, the AI could not suggest adequate treatment for seven cases (70%). Among them, the AI suggested inappropriate management for two cases, and the remaining five received incomplete or non-specific advice, such as chemotherapy, without specifying the best regimen. Conclusions: Our study demonstrated GPT-4’s potential as an academic support tool, although it cannot correctly identify the final diagnosis in 20% of the cases and the AI requested unnecessary additional diagnostic tests for 40% of the patients. Future research should focus on evaluating the performance of GPT-4 using a more extensive and diverse sample, incorporating prospective assessments, and investigating its ability to improve diagnostic and therapeutic accuracy. |
| format | Article |
| id | doaj-art-9df553df3bf348bdb86dc3fb405a960a |
| institution | OA Journals |
| issn | 2949-8201 |
| language | English |
| publishDate | 2024-06-01 |
| publisher | Elsevier |
| record_format | Article |
| series | ESMO Real World Data and Digital Oncology |
| spelling | Record: doaj-art-9df553df3bf348bdb86dc3fb405a960a (2025-08-20T02:30:23Z); Language: eng; Publisher: Elsevier; Series: ESMO Real World Data and Digital Oncology; ISSN: 2949-8201; Date: 2024-06-01; Volume: 4; Article: 100042; DOI: 10.1016/j.esmorw.2024.100042; Title: Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature; Authors and affiliations: B.L. Fabre (Hospital Beneficência Portuguesa de São Paulo, São Paulo); M.A.F. Magalhaes Filho (Hospital Beneficência Portuguesa de São Paulo, São Paulo; Correspondence to: Dr Marcos Aurélio Fonseca Magalhães Filho, R. Martiniano de Carvalho, 965-Bela Vista, São Paulo-SP, 01323-001, Brazil. Tel: +55-11999198828); P.N. Aguiar, Jr (Oncoclínicas de São Paulo, São Paulo); F.M. da Costa (Hospital Beneficência Portuguesa de São Paulo, São Paulo); B. Gutierres (Oncoclínicas de São Paulo, São Paulo); W.N. William, Jr (Oncoclínicas de São Paulo, São Paulo); A. Del Giglio (Faculdade de Medicina do ABC, São Paulo, Brazil); URL: http://www.sciencedirect.com/science/article/pii/S2949820124000201; Topics: artificial intelligence in health care; clinical decision support systems |
| title | Evaluating GPT-4 as an academic support tool for clinicians: a comparative analysis of case records from the literature |
| topic | artificial intelligence in health care; clinical decision support systems |
| url | http://www.sciencedirect.com/science/article/pii/S2949820124000201 |