Simulation-Based Evaluation of Large Language Models for Comorbidity Detection in Sleep Medicine – a Pilot Study on ChatGPT o1 Preview

Bibliographic Details
Main Authors: Seifen C, Bahr-Hamm K, Gouveris H, Pordzik J, Blaikie A, Matthias C, Kuhn S, Buhr CR
Format: Article
Language: English
Published: Dove Medical Press 2025-04-01
Series: Nature and Science of Sleep
Subjects: ChatGPT, large language model, obstructive sleep apnea, comorbidities, health risk
Online Access: https://www.dovepress.com/simulation-based-evaluation-of-large-language-models-for-comorbidity-d-peer-reviewed-fulltext-article-NSS
Description
Summary: Christopher Seifen,1,* Katharina Bahr-Hamm,1,* Haralampos Gouveris,1 Johannes Pordzik,1 Andrew Blaikie,2 Christoph Matthias,1 Sebastian Kuhn,3 Christoph Raphael Buhr1,2
1Sleep Medicine Center & Department of Otolaryngology, Head and Neck Surgery, University Medical Center Mainz, Mainz, Germany; 2School of Medicine, University of St Andrews, St Andrews, UK; 3Institute for Digital Medicine, Philipps University Marburg, University Hospital Giessen and Marburg, Marburg, Germany
*These authors contributed equally to this work
Correspondence: Christopher Seifen, Sleep Medicine Center & Department of Otolaryngology, Head and Neck Surgery, University Medical Center Mainz, Mainz, 55131, Germany, Email kim.seifen@unimedizin-mainz.de
Purpose: Timely identification of comorbidities is critical in sleep medicine, where large language models (LLMs) like ChatGPT are currently emerging as transformative tools. Here, we investigate whether the novel LLM ChatGPT o1 preview can identify individual health risks or potentially existing comorbidities from the medical data of fictitious sleep medicine patients.
Methods: We conducted a simulation-based study using 30 fictitious patients, designed to represent realistic variations in demographic and clinical parameters commonly seen in sleep medicine. Each profile included personal data (eg, body mass index, smoking status, drinking habits), blood pressure, and routine blood test results, along with a predefined sleep medicine diagnosis. Each patient profile was evaluated independently by the LLM and a sleep medicine specialist (SMS) to identify potential comorbidities or individual health risks. Their recommendations were compared for concordance across lifestyle changes and further medical measures.
Results: The LLM achieved high concordance with the SMS for lifestyle modification recommendations, including 100% concordance on smoking cessation (κ = 1; p < 0.001), 97% on alcohol reduction (κ = 0.92; p < 0.001) and on endocrinological examination (κ = 0.92; p < 0.001), and 93% on weight loss (κ = 0.86; p < 0.001). However, it tended to over-recommend further medical measures compared to the SMS, most notably with 57% concordance for cardiological examination (κ = 0.08; p = 0.28) and 33% for gastrointestinal examination (κ = 0.1; p = 0.22).
Conclusion: Despite the obvious limitation of using fictitious data, the findings suggest that LLMs like ChatGPT have the potential to complement clinical workflows in sleep medicine by identifying individual health risks and comorbidities. As LLMs continue to evolve, their integration into healthcare could redefine the approach to patient evaluation and risk stratification. Future research should contextualize these findings within broader clinical applications, ideally testing locally run LLMs that meet data protection requirements.
Keywords: ChatGPT, large language model, obstructive sleep apnea, comorbidities, health risk
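The abstract reports agreement between the LLM and the SMS as percent concordance together with Cohen's kappa (κ). The sketch below shows how such statistics can be computed for a single binary recommendation (e.g. smoking cessation: yes/no) across patients; the rating vectors are hypothetical placeholders for illustration only, not the study's actual data or the authors' analysis code.

```python
# Minimal sketch: percent agreement and Cohen's kappa for two raters
# (LLM vs. sleep medicine specialist) on one binary recommendation.
from collections import Counter

def percent_agreement(a, b):
    # Share of patients for whom both raters give the same recommendation.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    # Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e).
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if both raters labelled independently at their own base rates.
    p_e = sum((counts_a[k] / n) * (counts_b[k] / n) for k in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings for 30 patients (1 = recommendation given, 0 = not given).
llm = [1] * 12 + [0] * 18
sms = [1] * 11 + [0] * 19  # one disagreement
print(f"agreement = {percent_agreement(llm, sms):.0%}, kappa = {cohens_kappa(llm, sms):.2f}")
```

With these placeholder vectors the script prints roughly 97% agreement and κ ≈ 0.93, illustrating how a single disagreement out of 30 cases translates into the concordance figures quoted in the Results.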
ISSN: 1179-1608