Efficient Feature Selection and Hyperparameter Tuning for Improved Speech Signal-Based Parkinson’s Disease Diagnosis via Machine Learning Techniques

Bibliographic Details
Main Authors: Deepak Painuli, Suyash Bhardwaj, Utku Kose
Format: Article
Language: English
Published: Tehran University of Medical Sciences 2025-01-01
Series: Health Technology Assessment in Action
Online Access: https://htainaction.tums.ac.ir/index.php/hta/article/view/275
Description
Summary: Parkinson’s disease (PD) is a neurodegenerative disorder that progressively worsens with age, particularly affecting the elderly. Symptoms of PD include visual hallucinations, depression, autonomic dysfunction, and motor difficulties. Conventional diagnostic methods often rely on subjective interpretations of movement, which can be subtle and challenging to assess accurately, potentially leading to misdiagnoses. Recent studies, however, indicate that over 90% of individuals with PD exhibit vocal abnormalities at the onset of the disease. Machine learning (ML) techniques have shown promise in addressing these diagnostic challenges due to their higher efficiency and reduced error rates in analyzing complex, high-dimensional datasets, particularly those derived from speech signals. This study investigates 12 ML models to develop a robust classifier capable of reliably identifying PD cases: logistic regression (LR), support vector machine (SVM) with linear and RBF kernels, K-nearest neighbor (KNN), naive Bayes (NB), decision tree (DT), random forest (RF), extra trees (ET), gradient boosting (GbBoost), extreme gradient boosting (XgBoost), AdaBoost, and multi-layer perceptron (MLP). The analysis used a PD voice dataset comprising 756 acoustic samples from 252 participants, including 188 individuals with PD and 64 healthy controls. The dataset included 130 male and 122 female subjects, aged 33 to 87 years and 41 to 82 years, respectively. To enhance model performance, the GridSearchCV method was employed for hyperparameter tuning, alongside two feature selection techniques: recursive feature elimination (RFE) and minimum redundancy maximum relevance (mRMR). Among the 12 ML models evaluated, the RF model with the RFE-generated feature subset (RFE-50) emerged as the top performer. It achieved an accuracy of 96.46%, a recall of 0.96, a precision of 0.97, an F1-score of 0.96, and an AUC score of 0.998, which the authors describe as the highest performance metrics reported for this dataset in recent studies.
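The combination of RFE and GridSearchCV described in the summary can be illustrated with a minimal scikit-learn sketch. This is not the authors' pipeline: the synthetic data from `make_classification`, the helper name `build_search`, the parameter grid, the fold count, and the choice of 10 retained features (rather than the 50 of RFE-50) are all assumptions chosen to keep the example small and fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline


def build_search(n_features_to_select=10, random_state=0):
    """Hypothetical helper: RFE feature selection followed by a tuned random forest."""
    # RFE repeatedly fits the estimator and drops the 5 least important features
    # per round until n_features_to_select remain.
    selector = RFE(
        estimator=RandomForestClassifier(n_estimators=25, random_state=random_state),
        n_features_to_select=n_features_to_select,
        step=5,
    )
    pipe = Pipeline([
        ("rfe", selector),
        ("rf", RandomForestClassifier(random_state=random_state)),
    ])
    # Illustrative grid only; the grid used in the study is not given in the summary.
    param_grid = {
        "rf__n_estimators": [25, 50],
        "rf__max_depth": [None, 5],
    }
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=random_state)
    # Selection runs inside each fold, so held-out folds do not influence
    # which features are kept.
    return GridSearchCV(pipe, param_grid, scoring="accuracy", cv=cv)


# Synthetic stand-in for the voice features (assumption: 30 features and an
# imbalanced binary label, roughly 75% positive).
X, y = make_classification(
    n_samples=240, n_features=30, n_informative=8,
    weights=[0.25, 0.75], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

search = build_search()
search.fit(X_train, y_train)

# Boolean mask of the features kept by RFE in the refitted best pipeline.
selected = search.best_estimator_.named_steps["rfe"].support_
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("features kept:", int(selected.sum()))
print("held-out accuracy:", round(test_accuracy, 3))
```

Placing RFE inside the `Pipeline` is a design choice of this sketch rather than something the summary confirms. Because the dataset holds 756 samples from 252 participants, a grouped split that keeps each participant's recordings in a single fold may also be worth considering; the summary does not say how the data were partitioned.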
ISSN: 2645-3835