From Narratives to Diagnosis: A Machine Learning Framework for Classifying Sleep Disorders in Aging Populations: The <i>sleepCare</i> Platform

Bibliographic Details
Main Author: Christos A. Frantzidis
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Brain Sciences
Online Access: https://www.mdpi.com/2076-3425/15/7/667
Description
Summary: <b>Background/Objectives</b>: Sleep disorders are prevalent among aging populations and are often linked to cognitive decline, chronic conditions, and reduced quality of life. Traditional diagnostic methods, such as polysomnography, are resource-intensive and limited in accessibility. Meanwhile, individuals frequently describe their sleep experiences through unstructured narratives in clinical notes, online forums, and telehealth platforms. This study proposes a machine learning pipeline (<i><b>sleepCare</b></i>) that classifies sleep-related narratives into clinically meaningful categories, including stress-related, neurodegenerative, and breathing-related disorders. The proposed framework employs natural language processing (NLP) and machine learning techniques to support remote applications and real-time patient monitoring, offering a scalable solution for the early identification of sleep disturbances. <b>Methods</b>: The <i><b>sleepCare</b></i> platform consists of a three-tiered classification pipeline for analyzing narrative sleep reports. First, a baseline model used a Multinomial Naïve Bayes classifier with n-gram features from a Bag-of-Words representation. Next, a Support Vector Machine (SVM) was trained on GloVe-based word embeddings to capture semantic context. Finally, a transformer-based model (BERT) was fine-tuned to extract contextual embeddings, using the [CLS] token as input for SVM classification. Each model was evaluated using stratified train-test splits and 10-fold cross-validation. Hyperparameters were tuned with GridSearchCV. The dataset contained 475 labeled sleep narratives, classified into five etiological categories relevant for clinical interpretation. <b>Results</b>: The transformer-based model utilizing BERT embeddings and an optimized Support Vector Machine classifier achieved an overall accuracy of <b>81%</b> on the test set. Class-wise F1-scores ranged from <b>0.72 to 0.91</b>, with the highest performance observed in classifying <b>normal or improved sleep</b> (F1 = 0.91). The <b>macro average F1-score</b> was <b>0.78</b>, indicating balanced performance across all categories. GridSearchCV identified the optimal SVM parameters (C = 4, kernel = ‘rbf’, gamma = 0.01, degree = 2, class_weight = ‘balanced’). The confusion matrix revealed robust classification with limited misclassifications, particularly between overlapping symptom categories such as stress-related and neurodegenerative sleep disturbances. <b>Conclusions</b>: Unlike generic large language model applications, our approach emphasizes the <b>personalized identification of sleep symptomatology</b> through targeted classification of the narrative input. By integrating structured learning with contextual embeddings, the framework offers a <b>clinically meaningful</b>, scalable solution for early detection and differentiation of sleep disorders in diverse, real-world, and remote settings.
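The baseline tier described in the Methods (Multinomial Naïve Bayes over n-gram Bag-of-Words features, a stratified train-test split, and GridSearchCV tuning) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the toy narratives, the three label names, the parameter grid, and the 3-fold setting are all assumptions chosen so the example runs on a dozen sentences, whereas the study used 475 narratives, five categories, and 10-fold cross-validation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy narratives and labels (assumption, not the study's data).
narratives = [
    "I lie awake worrying about work deadlines every night",
    "My mind races with stress and I cannot fall asleep",
    "Anxiety about bills keeps me up until dawn",
    "I feel tense and stressed so sleep never comes",
    "My partner says I stop breathing and snore loudly",
    "I wake up gasping for air several times a night",
    "Loud snoring and choking wake me up",
    "I wake with a dry mouth after gasping and snoring",
    "I slept through the night and woke up refreshed",
    "My sleep has improved and I feel rested in the morning",
    "I fall asleep quickly and wake up feeling refreshed",
    "Lately I sleep well and feel rested all day",
]
labels = ["stress"] * 4 + ["breathing"] * 4 + ["normal"] * 4

# Stratified split keeps the class proportions equal in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    narratives, labels, test_size=0.25, stratify=labels, random_state=0
)

# Bag-of-Words n-gram counts feeding a Multinomial Naive Bayes classifier.
pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("nb", MultinomialNB()),
])

# Illustrative grid (assumption): unigrams vs. unigrams+bigrams, two smoothing values.
param_grid = {
    "bow__ngram_range": [(1, 1), (1, 2)],
    "nb__alpha": [0.1, 1.0],
}

# 3 folds only because the toy set is tiny; the study reports 10-fold CV.
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="f1_macro",
)
search.fit(X_train, y_train)

predictions = search.predict(X_test)
macro_f1 = f1_score(y_test, predictions, average="macro")
print("best parameters:", search.best_params_)
print("macro F1 on held-out toy data:", macro_f1)
```

The same search scaffold could in principle wrap `sklearn.svm.SVC` over precomputed embedding vectors for the later tiers; that step is not shown here because it depends on embedding models and data that the abstract does not specify.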
ISSN:2076-3425