Challenging cases of hyponatremia incorrectly interpreted by ChatGPT


Bibliographic Details
Main Authors: Kenrick Berend, Ashley Duits, Reinold O. B. Gans
Format: Article
Language: English
Published: BMC 2025-05-01
Series: BMC Medical Education
Subjects:
Online Access: https://doi.org/10.1186/s12909-025-07235-2
author Kenrick Berend
Ashley Duits
Reinold O. B. Gans
author_facet Kenrick Berend
Ashley Duits
Reinold O. B. Gans
author_sort Kenrick Berend
collection DOAJ
description Abstract Background In clinical medicine, the assessment of hyponatremia is frequently required but is also a known source of major diagnostic errors, substantial mismanagement, and iatrogenic morbidity. Because artificial intelligence techniques are efficient at analyzing complex problems, their use may help overcome current assessment limitations. There is no literature on the use of Chat Generative Pre-trained Transformer (ChatGPT-3.5) for evaluating difficult hyponatremia cases. Because of their interesting pathophysiology, hyponatremia cases are often used in medical education to train students in patient evaluation, and students increasingly use artificial intelligence as a diagnostic tool. To evaluate this possibility, four challenging, previously published hyponatremia cases were presented to the free ChatGPT-3.5 for diagnosis and treatment suggestions. Methods We used four challenging hyponatremia cases that had been evaluated by 46 physicians in Canada, the Netherlands, South Africa, Taiwan, and the USA, and published previously. These four cases were presented twice to the free ChatGPT, version 3.5, in December 2023 and again in September 2024, with the request to recommend a diagnosis and therapy. ChatGPT's responses were compared with those of the clinicians. Results Cases 1 and 3 each have a single cause of hyponatremia; cases 2 and 4 each have two contributing features. Neither ChatGPT in 2023 nor the 46 clinicians in the original publication recognized the most crucial cause of hyponatremia, with major therapeutic consequences, in any of the four cases. In 2024, ChatGPT properly diagnosed and suggested adequate management in one case. Concurrent Addison's disease in case 1 was correctly recognized by ChatGPT in both 2023 and 2024, whereas 81% of the clinicians missed this diagnosis.
No proper therapeutic recommendations were given by ChatGPT in 2023 in any of the four cases, but in one case adequate advice was given by ChatGPT in 2024. The 46 clinicians recommended inadequate therapy in 65%, 57%, 2%, and 76% of cases 1 to 4, respectively. Conclusion Our study does not currently support the use of the free version of ChatGPT-3.5 in difficult hyponatremia cases, although a small improvement was observed after ten months with the same ChatGPT-3.5 version. Patients, health professionals, medical educators, and students should be aware of the shortcomings of diagnosis and therapy suggestions made by ChatGPT.
format Article
id doaj-art-72014e3f922945d9b8cde701f4fbd74a
institution Kabale University
issn 1472-6920
language English
publishDate 2025-05-01
publisher BMC
record_format Article
series BMC Medical Education
spelling doaj-art-72014e3f922945d9b8cde701f4fbd74a 2025-08-20T03:48:18Z eng BMC BMC Medical Education 1472-6920 2025-05-01 25118 10.1186/s12909-025-07235-2
Challenging cases of hyponatremia incorrectly interpreted by ChatGPT
Kenrick Berend (Department of Medicine, Curaçao Medical Center)
Ashley Duits (Institute for Medical Education, University of Groningen, University Medical Center Groningen)
Reinold O. B. Gans (Department of Medicine, Curaçao Medical Center)
https://doi.org/10.1186/s12909-025-07235-2
Ecstasy; Hyponatremia; Low osmol intake; Mineralocorticoid deficiency; SIADH
spellingShingle Kenrick Berend
Ashley Duits
Reinold O. B. Gans
Challenging cases of hyponatremia incorrectly interpreted by ChatGPT
BMC Medical Education
Ecstasy
Hyponatremia
Low osmol intake
Mineralocorticoid deficiency
SIADH
title Challenging cases of hyponatremia incorrectly interpreted by ChatGPT
title_full Challenging cases of hyponatremia incorrectly interpreted by ChatGPT
title_fullStr Challenging cases of hyponatremia incorrectly interpreted by ChatGPT
title_full_unstemmed Challenging cases of hyponatremia incorrectly interpreted by ChatGPT
title_short Challenging cases of hyponatremia incorrectly interpreted by ChatGPT
title_sort challenging cases of hyponatremia incorrectly interpreted by chatgpt
topic Ecstasy
Hyponatremia
Low osmol intake
Mineralocorticoid deficiency
SIADH
url https://doi.org/10.1186/s12909-025-07235-2
work_keys_str_mv AT kenrickberend challengingcasesofhyponatremiaincorrectlyinterpretedbychatgpt
AT ashleyduits challengingcasesofhyponatremiaincorrectlyinterpretedbychatgpt
AT reinoldobgans challengingcasesofhyponatremiaincorrectlyinterpretedbychatgpt