Simulation-Based Evaluation of Large Language Models for Comorbidity Detection in Sleep Medicine – a Pilot Study on ChatGPT o1 Preview

Christopher Seifen,1,* Katharina Bahr-Hamm,1,* Haralampos Gouveris,1 Johannes Pordzik,1 Andrew Blaikie,2 Christoph Matthias,1 Sebastian Kuhn,3 Christoph Raphael Buhr1,2 1Sleep Medicine Center & Department of Otolaryngology, Head and Neck Surgery, University Medical Center Mainz,...

Bibliographic Details
Main Authors: Seifen C, Bahr-Hamm K, Gouveris H, Pordzik J, Blaikie A, Matthias C, Kuhn S, Buhr CR
Format: Article
Language: English
Published: Dove Medical Press 2025-04-01
Series: Nature and Science of Sleep
Subjects:
Online Access: https://www.dovepress.com/simulation-based-evaluation-of-large-language-models-for-comorbidity-d-peer-reviewed-fulltext-article-NSS
collection DOAJ
description Christopher Seifen,1,* Katharina Bahr-Hamm,1,* Haralampos Gouveris,1 Johannes Pordzik,1 Andrew Blaikie,2 Christoph Matthias,1 Sebastian Kuhn,3 Christoph Raphael Buhr1,2

1Sleep Medicine Center & Department of Otolaryngology, Head and Neck Surgery, University Medical Center Mainz, Mainz, Germany; 2School of Medicine, University of St Andrews, St Andrews, UK; 3Institute for Digital Medicine, Philipps University Marburg, University Hospital Giessen and Marburg, Marburg, Germany

*These authors contributed equally to this work

Correspondence: Christopher Seifen, Sleep Medicine Center & Department of Otolaryngology, Head and Neck Surgery, University Medical Center Mainz, Mainz, 55131, Germany, Email kim.seifen@unimedizin-mainz.de

Purpose: Timely identification of comorbidities is critical in sleep medicine, where large language models (LLMs) like ChatGPT are currently emerging as transformative tools. Here, we investigate whether the novel LLM ChatGPT o1 preview can identify individual health risks or potentially existing comorbidities from the medical data of fictitious sleep medicine patients.

Methods: We conducted a simulation-based study using 30 fictitious patients, designed to represent realistic variations in demographic and clinical parameters commonly seen in sleep medicine. Each profile included personal data (eg, body mass index, smoking status, drinking habits), blood pressure, and routine blood test results, along with a predefined sleep medicine diagnosis. Each patient profile was evaluated independently by the LLM and a sleep medicine specialist (SMS) for identification of potential comorbidities or individual health risks. Their recommendations were compared for concordance across lifestyle changes and further medical measures.

Results: The LLM achieved high concordance with the SMS for lifestyle modification recommendations, including 100% concordance on smoking cessation (κ = 1; p < 0.001), 97% on alcohol reduction (κ = 0.92; p < 0.001) and endocrinological examination (κ = 0.92; p < 0.001) or 93% on weight loss (κ = 0.86; p < 0.001). However, it exhibited a tendency to over-recommend further medical measures (particularly 57% concordance for cardiological examination (κ = 0.08; p = 0.28) and 33% for gastrointestinal examination (κ = 0.1; p = 0.22)) compared to the SMS.

Conclusion: Despite the obvious limitation of using fictitious data, the findings suggest that LLMs like ChatGPT have the potential to complement clinical workflows in sleep medicine by identifying individual health risks and comorbidities. As LLMs continue to evolve, their integration into healthcare could redefine the approach to patient evaluation and risk stratification. Future research should contextualize the findings within broader clinical applications ideally testing locally run LLMs meeting data protection requirements.

Keywords: ChatGPT, large language model, obstructive sleep apnea, comorbidities, health risk
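The Results above report both percent concordance and Cohen's κ for each recommendation. The record does not say how the authors computed these values, so the following is only a minimal sketch of the standard two-rater, binary-outcome formulas. The function names and the example vectors are hypothetical illustrations, not study data.

```python
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of cases in which both raters gave the same 0/1 recommendation."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating vectors must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters and a binary (0/1) outcome."""
    n = len(a)
    p_o = percent_agreement(a, b)            # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n        # each rater's rate of "recommend"
    # Agreement expected by chance: both say 1, or both say 0.
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_e == 1.0:
        # Both raters gave the same constant answer; kappa is formally
        # undefined here, and this sketch chooses to report 1.0.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example (not from the study): one rater recommends an
# examination for every profile, the other for only 6 of 10.
rater_all = [1] * 10
rater_some = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(percent_agreement(rater_all, rater_some))
print(cohens_kappa(rater_all, rater_some))
```

In this made-up case the raters agree on 60% of profiles, yet κ is 0 because that agreement is no better than chance. The same mechanism allows a moderate concordance figure to sit alongside a κ close to zero when one rater recommends a measure far more often than the other.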
format Article
id doaj-art-8649a5ad6f6a4c1e9bb9d136fbd0a4fb
institution DOAJ
issn 1179-1608
language English
publishDate 2025-04-01
publisher Dove Medical Press
record_format Article
series Nature and Science of Sleep
spelling doaj-art-8649a5ad6f6a4c1e9bb9d136fbd0a4fb 2025-08-20T03:19:07Z eng Dove Medical Press Nature and Science of Sleep 1179-1608 2025-04-01 Volume 17 677 688 102500
title Simulation-Based Evaluation of Large Language Models for Comorbidity Detection in Sleep Medicine – a Pilot Study on ChatGPT o1 Preview
topic chatgpt
large language model
obstructive sleep apnea
comorbidities
health risk
url https://www.dovepress.com/simulation-based-evaluation-of-large-language-models-for-comorbidity-d-peer-reviewed-fulltext-article-NSS