Efficient Feature Selection and Hyperparameter Tuning for Improved Speech Signal-Based Parkinson’s Disease Diagnosis via Machine Learning Techniques

Parkinson’s disease (PD) is a neurodegenerative disorder that progressively worsens with age, particularly affecting the elderly. Symptoms of PD include visual hallucinations, depression, autonomic dysfunction, and motor difficulties. Conventional diagnostic methods often rely on subjective interpr...

Bibliographic Details
Main Authors: Deepak Painuli, Suyash Bhardwaj, Utku Kose
Format: Article
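Language: English
Published: Tehran University of Medical Sciences 2025-01-01
Series: Health Technology Assessment in Action
Subjects: Medical Diagnosis; Parkinson's disease; Machine Learning; Data preprocessing; Feature selection; GridSearchCV
Online Access: https://htainaction.tums.ac.ir/index.php/hta/article/view/275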
author Deepak Painuli
Suyash Bhardwaj
Utku Kose
collection DOAJ
description Parkinson’s disease (PD) is a neurodegenerative disorder that progressively worsens with age and particularly affects the elderly. Symptoms of PD include visual hallucinations, depression, autonomic dysfunction, and motor difficulties. Conventional diagnostic methods often rely on subjective interpretations of movement, which can be subtle and difficult to assess accurately, potentially leading to misdiagnosis. Recent studies, however, indicate that over 90% of individuals with PD exhibit vocal abnormalities at the onset of the disease. Machine learning (ML) techniques have shown promise in addressing these diagnostic challenges due to their higher efficiency and reduced error rates in analyzing complex, high-dimensional datasets, particularly those derived from speech signals. This study investigates 12 ML models to develop a robust model capable of reliably identifying PD cases: logistic regression (LR), support vector machine (SVM, with linear and RBF kernels), K-nearest neighbor (KNN), Naïve Bayes (NB), decision tree (DT), random forest (RF), extra trees (ET), gradient boosting (GbBoost), extreme gradient boosting (XgBoost), AdaBoost, and multi-layer perceptron (MLP). The analysis used a PD voice dataset comprising 756 acoustic samples from 252 participants, of whom 188 had PD and 64 were healthy controls. The dataset included 130 male and 122 female subjects, with age ranges of 33–87 years and 41–82 years, respectively. To enhance model performance, the GridSearchCV method was employed for hyperparameter tuning, alongside recursive feature elimination (RFE) and minimum redundancy maximum relevance (mRMR) feature selection techniques. Among the 12 ML models evaluated, the RF model with the RFE-generated feature subset (RFE-50) emerged as the top performer. It achieved an accuracy of 96.46%, a recall of 0.96, a precision of 0.97, an F1-score of 0.96, and an AUC score of 0.998, marking the highest performance metrics reported for this dataset in recent studies.
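The combination of RFE, a random forest, and GridSearchCV named in the description can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the synthetic data, the number of selected features (10 rather than the 50 of RFE-50), the parameter grid, and the 3-fold cross-validation are all assumptions chosen to keep the example small and quick to run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Assumption: synthetic stand-in data, not the PD voice dataset from the article.
X, y = make_classification(n_samples=200, n_features=30, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# RFE ranks features with a small random forest and keeps 10 of them
# (hypothetical value); a second random forest is the final classifier.
pipeline = Pipeline([
    ("rfe", RFE(RandomForestClassifier(n_estimators=25, random_state=0),
                n_features_to_select=10, step=5)),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid; the article's actual search space is not given in this record.
param_grid = {"rf__n_estimators": [25, 50], "rf__max_depth": [None, 5]}

# Feature selection sits inside the pipeline, so it is refit on each
# training fold and never sees the corresponding validation fold.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_prob),
}
print(search.best_params_)
print(metrics)
```

Placing RFE inside the pipeline is a design choice of this sketch; whether the study selected features before or within cross-validation is not stated in this record.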
format Article
id doaj-art-317218c5e1414f7cac13741c49d00f7e
institution Kabale University
issn 2645-3835
language English
publishDate 2025-01-01
publisher Tehran University of Medical Sciences
record_format Article
series Health Technology Assessment in Action
spelling doaj-art-317218c5e1414f7cac13741c49d00f7e | 2025-02-09T08:59:45Z | eng | Tehran University of Medical Sciences | Health Technology Assessment in Action | 2645-3835 | 2025-01-01 | 9 | 1 | Efficient Feature Selection and Hyperparameter Tuning for Improved Speech Signal-Based Parkinson’s Disease Diagnosis via Machine Learning Techniques | Deepak Painuli (COER University, Roorkee, India); Suyash Bhardwaj (Gurukula Kangri Vishwavidyalaya, Haridwar, India); Utku Kose (Suleyman Demirel University, Isparta, Turkey) | https://htainaction.tums.ac.ir/index.php/hta/article/view/275 | Medical Diagnosis; Parkinson's disease; Machine Learning; Data preprocessing; Feature selection; GridSearchCV
title Efficient Feature Selection and Hyperparameter Tuning for Improved Speech Signal-Based Parkinson’s Disease Diagnosis via Machine Learning Techniques
topic Medical Diagnosis
Parkinson's disease
Machine Learning
Data preprocessing
Feature selection
GridSearchCV
url https://htainaction.tums.ac.ir/index.php/hta/article/view/275