Stacked Ensemble Learning for Classification of Parkinson’s Disease Using Telemonitoring Vocal Features


Bibliographic Details
Main Authors:	Bolaji A. Omodunbi, David B. Olawade, Omosigho F. Awe, Afeez A. Soladoye, Nicholas Aderinto, Saak V. Ovsepian, Stergios Boussios
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Diagnostics
Subjects:	Parkinson’s disease; stacked ensemble learning; machine learning; feature selection; predictive analytics
Online Access:	https://www.mdpi.com/2075-4418/15/12/1467

collection	DOAJ
description	Background: Parkinson’s disease (PD) is a progressive neurodegenerative condition that impairs motor and non-motor functions. Early and accurate diagnosis is critical for effective management and care. Leveraging machine learning (ML) techniques, this study aimed to develop a robust prediction system for PD using a stacked ensemble learning approach, addressing challenges such as imbalanced datasets and feature optimization.
Methods: An open-access PD dataset comprising 22 vocal attributes and 195 instances from 31 subjects was utilized. To prevent data leakage, subjects were divided into training (22 subjects) and testing (9 subjects) groups, ensuring no subject appeared in both sets. Preprocessing included data cleaning and normalization via min–max scaling. The synthetic minority oversampling technique (SMOTE) was applied exclusively to the training set to address class imbalance. Feature selection techniques (forward search, gain ratio, and Kruskal–Wallis test) were employed using subject-wise cross-validation to identify significant attributes. The developed system combined support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and decision tree (DT) as base classifiers, with logistic regression (LR) as the meta-classifier in a stacked ensemble learning framework. Performance was evaluated using both recording-wise and subject-wise metrics to ensure clinical relevance.
Results: The stacked ensemble learning model achieved realistic performance with a recording-wise accuracy of 84.7% and subject-wise accuracy of 77.8% on completely unseen subjects, outperforming individual classifiers including KNN (81.4%), RF (79.7%), and SVM (76.3%). Cross-validation within the training set showed 89.2% accuracy, with the performance difference highlighting the importance of proper validation methodology. Feature selection results showed that using the top 10 features ranked by gain ratio provided optimal balance between performance and clinical interpretability. The system’s methodological robustness was validated through rigorous subject-wise evaluation, demonstrating the critical impact of validation methodology on reported performance.
Conclusions: By implementing subject-wise validation and preventing data leakage, this study demonstrates that proper validation yields substantially different (and more realistic) results compared to flawed recording-wise approaches. The findings underscore the critical importance of validation methodology in healthcare ML applications and provide a template for methodologically sound PD classification research. Future research should focus on validating the model with larger, multi-center datasets and implementing standardized validation protocols to enhance clinical applicability.
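The Methods describe a subject-wise split (22 training and 9 test subjects) followed by min–max normalization. The following is a minimal standard-library sketch of that ordering. It assumes the test subjects are drawn at random, since the abstract does not say how the 9 were chosen. It also assumes the scaler is fitted on training subjects only. `subject_wise_split`, `fit_min_max`, `apply_min_max` and the toy data are hypothetical. For a 31-subject cohort the call would look like `subject_wise_split(subject_ids, n_test_subjects=9)`.

```python
import random


def subject_wise_split(subject_ids, n_test_subjects, seed=0):
    """Return (train_idx, test_idx) so every subject's recordings land in one set only."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    test_subjects = set(rng.sample(subjects, n_test_subjects))
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def fit_min_max(rows):
    """Learn per-feature minima and maxima from training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Rescale with training statistics; constant features map to 0.0.
    Test rows may fall slightly outside [0, 1], which is expected."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


# Toy data: 6 recordings from 3 hypothetical subjects, 2 features each.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
X = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0], [5.0, 18.0], [6.0, 20.0]]

train_idx, test_idx = subject_wise_split(subjects, n_test_subjects=1)
mins, maxs = fit_min_max([X[i] for i in train_idx])  # statistics from training subjects only
X_train = apply_min_max([X[i] for i in train_idx], mins, maxs)
X_test = apply_min_max([X[i] for i in test_idx], mins, maxs)
```

The split is made on subject IDs rather than row indices, so no subject's recordings straddle the two sets. The sketch does not stratify by diagnosis. A real split of a small cohort would likely need that.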
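The abstract states that SMOTE was applied exclusively to the training set. The sketch below shows the core SMOTE step in plain Python: it interpolates between a minority-class row and one of its k nearest minority-class neighbours. `smote`, its `k` and `seed` parameters, and the toy rows are hypothetical. The study may have used a library implementation with different defaults.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic rows, each on the segment between a random
    minority-class row and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    # Precompute the k nearest minority neighbours (Euclidean) of every minority row.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, other), j) for j, other in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base, neighbour = minority[i], minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical min-max-scaled training rows; class 0 is the minority here.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95], [0.7, 0.75]]
y_train = [0, 0, 1, 1, 1, 1]

minority = [x for x, label in zip(X_train, y_train) if label == 0]
deficit = y_train.count(1) - y_train.count(0)
new_rows = smote(minority, deficit, k=1)
X_balanced = X_train + new_rows
y_balanced = y_train + [0] * len(new_rows)
```

Only training rows are passed in. This way no synthetic point is built from, or evaluated against, a held-out subject.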
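Gain ratio, which the Results name as the ranking behind the top 10 features, divides a feature's information gain by its split information. Split information is the entropy of the feature's own value distribution. Vocal features are continuous, so they must be discretized first. The equal-width binning and `n_bins=4` below are assumptions, not the authors' settings. `gain_ratio` and `rank_features` are hypothetical names. Keeping the top 10 would be `rank_features(columns, labels, top_k=10)`, computed on training subjects only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def equal_width_bins(values, n_bins):
    """Assumed discretization: map each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def gain_ratio(values, labels, n_bins=4):
    """Information gain of the binned feature divided by its split information."""
    bins = equal_width_bins(values, n_bins)
    n = len(labels)
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(bins)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(columns, labels, top_k):
    """Return the names of the top_k features by gain ratio."""
    scores = {name: gain_ratio(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy columns: one feature separates the classes, the other does not.
columns = {"informative": [0.1, 0.2, 0.9, 1.0], "noise": [0.1, 0.9, 0.2, 1.0]}
labels = [0, 0, 1, 1]
top = rank_features(columns, labels, top_k=1)
```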
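The abstract specifies SVM, RF, KNN and DT base classifiers with a logistic-regression meta-classifier. One way to express that architecture is scikit-learn's `StackingClassifier`. Its `cv` argument accepts precomputed (train, test) index splits. Passing subject-grouped folds keeps the meta-learner's training inputs free of same-subject leakage. The library choice, the hyperparameters, the five folds, the helper `build_stacked_model`, and the synthetic data are all assumptions for illustration. Scaling and SMOTE are omitted here for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stacked_model(X_train, y_train, groups_train, n_splits=5, seed=0):
    """Fit SVM, RF, KNN and DT base learners and a logistic-regression meta-learner.

    The meta-learner is trained on out-of-fold base-learner outputs; the
    precomputed GroupKFold splits keep each subject's recordings in a single
    fold, so those outputs never come from a model that saw the same subject.
    """
    folds = list(GroupKFold(n_splits=n_splits).split(X_train, y_train, groups_train))
    base_learners = [
        ("svm", SVC(random_state=seed)),
        ("rf", RandomForestClassifier(random_state=seed)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(random_state=seed)),
    ]
    model = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=folds,
    )
    return model.fit(X_train, y_train)


# Synthetic stand-in data: 120 recordings from 20 hypothetical subjects (6 each).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
groups = [i // 6 for i in range(120)]
train = [i for i, g in enumerate(groups) if g < 15]  # 15 training subjects
test = [i for i, g in enumerate(groups) if g >= 15]  # 5 held-out subjects

model = build_stacked_model(X[train], y[train], [groups[i] for i in train])
predictions = model.predict(X[test])
accuracy = float((predictions == y[test]).mean())
```

With the default `cv=5`, the out-of-fold predictions would come from ordinary K-fold splits. These can place recordings of one subject on both sides of a fold. Grouped folds mirror the subject-wise discipline the abstract describes.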
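Subject-wise accuracy on 9 test subjects can only move in steps of 1/9, about 11.1 percentage points. The reported 77.8% therefore corresponds to 7 of 9 subjects classified correctly. The abstract does not say how recording-level predictions become one decision per subject. The sketch below assumes a majority vote with ties broken toward the positive class (1 = PD). The function names and toy labels are hypothetical.

```python
from collections import defaultdict


def recording_accuracy(y_true, y_pred):
    """Fraction of individual recordings classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def subject_accuracy(subject_ids, y_true, y_pred):
    """Fraction of subjects classified correctly after a per-subject majority vote.

    Assumes binary labels (1 = PD, 0 = healthy), one diagnosis per subject,
    and breaks ties toward 1 (an arbitrary, hypothetical choice).
    """
    votes = defaultdict(list)
    truth = {}
    for s, t, p in zip(subject_ids, y_true, y_pred):
        votes[s].append(p)
        truth[s] = t
    correct = 0
    for s, preds in votes.items():
        majority = 1 if 2 * sum(preds) >= len(preds) else 0
        correct += int(majority == truth[s])
    return correct / len(votes)


# Toy example: 3 hypothetical subjects with 3 recordings each.
subjects = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
y_true = [1, 1, 1, 0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 1, 1, 1]

rec_acc = recording_accuracy(y_true, y_pred)  # 6 of 9 recordings correct
subj_acc = subject_accuracy(subjects, y_true, y_pred)  # subjects a and c correct, b wrong
```

The toy case shows why the two metrics diverge. Subject b has one correct recording, but the subject as a whole is misclassified.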
id	doaj-art-c7a57658ad1f4901a7dcff2586c9d0e8
institution	Kabale University
issn	2075-4418
doi	10.3390/diagnostics15121467
citation	Diagnostics 2025, 15(12), 1467
affiliation	Bolaji A. Omodunbi: Department of Computer Engineering, Federal University Oye-Ekiti, Oye-Ekiti 371104, Nigeria
	David B. Olawade: Department of Allied and Public Health, School of Health, Sport and Bioscience, University of East London, London E16 2RD, UK
	Omosigho F. Awe: Department of Computer Engineering, Federal University of Technology Akure, Gaga 340110, Nigeria
	Afeez A. Soladoye: Department of Computer Engineering, Federal University Oye-Ekiti, Oye-Ekiti 371104, Nigeria
	Nicholas Aderinto: Department of Medicine and Surgery, Ladoke Akintola University of Technology, Ogbomoso 210214, Nigeria
	Saak V. Ovsepian: Faculty of Engineering and Science, University of Greenwich London, Chatham ME4 4TB, UK
	Stergios Boussios: Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, UK

Similar Items