Simulation-Based Evaluation of Large Language Models for Comorbidity Detection in Sleep Medicine – a Pilot Study on ChatGPT o1 Preview

Christopher Seifen,1,* Katharina Bahr-Hamm,1,* Haralampos Gouveris,1 Johannes Pordzik,1 Andrew Blaikie,2 Christoph Matthias,1 Sebastian Kuhn,3 Christoph Raphael Buhr1,2 1 Sleep Medicine Center & Department of Otolaryngology, Head and Neck Surgery, University Medical Center Mainz,...

Bibliographic Details
Main Authors:	Seifen C, Bahr-Hamm K, Gouveris H, Pordzik J, Blaikie A, Matthias C, Kuhn S, Buhr CR
Format:	Article
Language:	English
Published:	Dove Medical Press 2025-04-01
Series:	Nature and Science of Sleep
Subjects:	ChatGPT; large language model; obstructive sleep apnea; comorbidities; health risk
Online Access:	https://www.dovepress.com/simulation-based-evaluation-of-large-language-models-for-comorbidity-d-peer-reviewed-fulltext-article-NSS

collection	DOAJ
description	Christopher Seifen,1,* Katharina Bahr-Hamm,1,* Haralampos Gouveris,1 Johannes Pordzik,1 Andrew Blaikie,2 Christoph Matthias,1 Sebastian Kuhn,3 Christoph Raphael Buhr1,2

1 Sleep Medicine Center & Department of Otolaryngology, Head and Neck Surgery, University Medical Center Mainz, Mainz, Germany; 2 School of Medicine, University of St Andrews, St Andrews, UK; 3 Institute for Digital Medicine, Philipps University Marburg, University Hospital Giessen and Marburg, Marburg, Germany

*These authors contributed equally to this work

Correspondence: Christopher Seifen, Sleep Medicine Center & Department of Otolaryngology, Head and Neck Surgery, University Medical Center Mainz, Mainz, 55131, Germany, Email kim.seifen@unimedizin-mainz.de

Purpose: Timely identification of comorbidities is critical in sleep medicine, where large language models (LLMs) such as ChatGPT are emerging as transformative tools. Here, we investigate whether the novel LLM ChatGPT o1 preview can identify individual health risks or potentially existing comorbidities from the medical data of fictitious sleep medicine patients.

Methods: We conducted a simulation-based study using 30 fictitious patients designed to represent realistic variations in the demographic and clinical parameters commonly seen in sleep medicine. Each profile included personal data (eg, body mass index, smoking status, drinking habits), blood pressure, and routine blood test results, along with a predefined sleep medicine diagnosis. The LLM and a sleep medicine specialist (SMS) each evaluated every patient profile independently to identify potential comorbidities or individual health risks, and their recommendations were compared for concordance across lifestyle changes and further medical measures.

Results: The LLM achieved high concordance with the SMS for lifestyle modification recommendations: 100% on smoking cessation (κ = 1; p < 0.001), 97% on alcohol reduction (κ = 0.92; p < 0.001) and 93% on weight loss (κ = 0.86; p < 0.001). Concordance was also 97% for endocrinological examination (κ = 0.92; p < 0.001). However, compared with the SMS, the LLM tended to over-recommend further medical measures, most notably cardiological examination (57% concordance; κ = 0.08; p = 0.28) and gastrointestinal examination (33% concordance; κ = 0.1; p = 0.22).

Conclusion: Despite the obvious limitation of using fictitious data, the findings suggest that LLMs such as ChatGPT have the potential to complement clinical workflows in sleep medicine by identifying individual health risks and comorbidities. As LLMs continue to evolve, their integration into healthcare could redefine the approach to patient evaluation and risk stratification. Future research should place these findings in the context of broader clinical applications, ideally by testing locally run LLMs that meet data protection requirements.

Keywords: ChatGPT, large language model, obstructive sleep apnea, comorbidities, health risk
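The abstract reports each result both as a percentage concordance and as κ, but it does not state which kappa statistic was used. Assuming Cohen's kappa for two raters (LLM and SMS) making one binary recommend / do-not-recommend decision per patient, the sketch below shows how the two numbers relate and why a moderate concordance can coexist with κ near zero. The function names and the ten-patient ratings are hypothetical and invented for illustration; they do not reproduce the study's data or its significance tests.

```python
from typing import Sequence


def percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Fraction of patients for whom both raters made the same yes/no call."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters and one binary recommendation (assumed statistic)."""
    p_observed = percent_agreement(a, b)
    n = len(a)
    # Marginal rate at which each rater says "recommend"
    a_yes = sum(a) / n
    b_yes = sum(b) / n
    # Agreement expected by chance if both raters decided independently
    p_chance = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if p_chance == 1:
        # Both raters gave the same answer for every patient: kappa is undefined
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings for 10 invented patients (True = examination recommended).
# These are NOT the study's data.
specialist = [True] * 5 + [False] * 5
llm = [True] * 9 + [False]

print(round(percent_agreement(specialist, llm), 2))  # 0.6
print(round(cohens_kappa(specialist, llm), 2))       # 0.2
```

In this invented example the specialist recommends the examination for 5 of 10 patients and the LLM for 9 of 10. Raw agreement is 0.6, but the chance agreement implied by those marginal rates is already 0.5, so κ is only 0.2. This is qualitatively the pattern the abstract describes for over-recommended examinations: a rater that says "yes" almost always can match the other rater fairly often while contributing little agreement beyond chance.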
id	doaj-art-8649a5ad6f6a4c1e9bb9d136fbd0a4fb
issn	1179-1608
volume	17
pages	677–688

Similar Items

Simulation-Based Evaluation of Large Language Models for Comorbidity Detection in Sleep Medicine – a Pilot Study on ChatGPT o1 Preview