A pilot evaluation of the diagnostic accuracy of ChatGPT-3.5 for multiple sclerosis from case reports

The limitations of artificial intelligence (AI) large language models in diagnosing diseases remain underexplored from a patient-safety perspective, and potential challenges, such as diagnostic errors and legal issues, need to be addressed. To demonstrate the limitations of AI, we used ChatGPT...

Bibliographic Details
Main Authors: Joseph Anika, Joseph Kevin, Joseph Angelyn
Format: Article
Language: English
Published: De Gruyter 2024-12-01
Series: Translational Neuroscience
Subjects: artificial intelligence; multiple sclerosis; case reports; legal
Online Access: https://doi.org/10.1515/tnsci-2022-0361
author Joseph Anika
Joseph Kevin
Joseph Angelyn
collection DOAJ
description The limitations of artificial intelligence (AI) large language models in diagnosing diseases remain underexplored from a patient-safety perspective, and potential challenges, such as diagnostic errors and legal issues, need to be addressed. To demonstrate the limitations of AI, we used ChatGPT-3.5, developed by OpenAI, as a tool for medical diagnosis using text-based case reports of multiple sclerosis (MS), which was selected as a prototypic disease. We analyzed 98 peer-reviewed case reports, selected based on free full-text availability and publication within the past decade (2014–2024), excluding any mention of an MS diagnosis to avoid bias. ChatGPT-3.5 was used to interpret clinical presentations and laboratory data from these reports. The model correctly diagnosed MS in 77 cases, an accuracy of 78.6%. The remaining 21 cases were misdiagnosed, highlighting the model's limitations. Factors contributing to the errors include variability in data presentation and the inherent complexity of MS diagnosis, which requires imaging in addition to clinical presentation and laboratory data. While these findings suggest that AI can support healthcare providers in disease diagnosis and decision-making, inadequate training with large datasets may lead to significant inaccuracies. Integrating AI into clinical practice necessitates rigorous validation and robust regulatory frameworks to ensure responsible use.
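The reported accuracy follows directly from the counts in the description (77 correct out of 98 reports). The short Python sketch below reproduces that arithmetic. The Wilson score interval is an addition for illustration only: the record does not report any interval, and the function names and the 95% level are assumptions made here.

```python
import math

# Counts taken from the description above.
CORRECT = 77   # case reports in which MS was correctly diagnosed
TOTAL = 98     # case reports analyzed


def accuracy(correct: int, total: int) -> float:
    """Fraction of cases diagnosed correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Not part of the source record; z = 1.959964 corresponds to an
    assumed two-sided 95% confidence level.
    """
    p = accuracy(correct, total)
    z2 = z * z
    denom = 1.0 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


acc = accuracy(CORRECT, TOTAL)
low, high = wilson_interval(CORRECT, TOTAL)
print(f"accuracy: {acc:.1%}")
print(f"misdiagnosed: {TOTAL - CORRECT}")
print(f"approximate 95% Wilson interval: {low:.1%} to {high:.1%}")
```

With only 98 reports the interval is fairly wide, which is consistent with the record's framing of the study as a pilot evaluation.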
format Article
id doaj-art-1410c3a6d8374971b28764fb579a3513
institution DOAJ
issn 2081-6936
spelling doaj-art-1410c3a6d8374971b28764fb579a3513, indexed 2025-08-20T02:44:03Z, eng, De Gruyter, Translational Neuroscience, ISSN 2081-6936, 2024-12-01, vol. 15, no. 1, pp. 44-56, https://doi.org/10.1515/tnsci-2022-0361
affiliations Joseph Anika: Health Sciences Program, University of Ottawa, 75 Laurier Ave E, Ottawa, ON K1N 6N5, Canada
Joseph Kevin: Biomedical Science Program, University of Ottawa, 75 Laurier Ave E, Ottawa, ON K1N 6N5, Canada
Joseph Angelyn: Merivale High School, 1755 Merivale Rd, Nepean, ON K2G 1E2, Canada
title A pilot evaluation of the diagnostic accuracy of ChatGPT-3.5 for multiple sclerosis from case reports
topic artificial intelligence
multiple sclerosis
case reports
legal
url https://doi.org/10.1515/tnsci-2022-0361