A hybrid approach for binary and multi-class classification of voice disorders using a pre-trained model and ensemble classifiers
Abstract: Recent advances in artificial intelligence-based audio and speech processing have increasingly focused on the binary and multi-class classification of voice disorders. Despite progress, achieving high accuracy in multi-class classification remains challenging. This paper proposes a novel hy...
| Main Authors: | Mehtab Ur Rahman, Cem Direkoglu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-05-01 |
| Series: | BMC Medical Informatics and Decision Making |
| Subjects: | Voice disorders; Multi-class classification; Ensemble classifier; VGGish |
| Online Access: | https://doi.org/10.1186/s12911-025-02978-w |
| _version_ | 1849314734462992384 |
|---|---|
| author | Mehtab Ur Rahman; Cem Direkoglu |
| author_facet | Mehtab Ur Rahman; Cem Direkoglu |
| author_sort | Mehtab Ur Rahman |
| collection | DOAJ |
| description | Recent advances in artificial intelligence-based audio and speech processing have increasingly focused on the binary and multi-class classification of voice disorders. Despite progress, achieving high accuracy in multi-class classification remains challenging. This paper proposes a novel hybrid approach using a two-stage framework to enhance voice disorder classification performance and achieve state-of-the-art accuracies in multi-class classification. The hybrid approach combines deep learning features with several classifiers. In the first stage, high-level feature embeddings are extracted from spectrograms of the voice data using a pre-trained VGGish model. In the second stage, these embeddings are used as input to four different classifiers: Support Vector Machine (SVM), Logistic Regression (LR), Multi-Layer Perceptron (MLP), and an Ensemble Classifier (EC). Experiments are conducted on a subset of the Saarbruecken Voice Database (SVD) for male, female, and combined speakers. For binary classification, VGGish-SVM achieved the highest accuracy for male speakers (82.45% for healthy vs. disordered; 75.45% for hyperfunctional dysphonia vs. vocal fold paresis), while VGGish-EC performed best for female speakers (71.54% for healthy vs. disordered; 68.42% for hyperfunctional dysphonia vs. vocal fold paresis). In multi-class classification, VGGish-SVM outperformed the other models, achieving mean accuracies of 77.81% for male speakers, 63.11% for female speakers, and 70.53% for combined genders. We conducted a comparative analysis against related works, including Mel-frequency cepstral coefficients (MFCC), MFCC-glottal features, and features extracted using the wav2vec and HuBERT models with an SVM classifier. Results demonstrate that our hybrid approach consistently outperforms these models, especially in multi-class classification tasks. The results show the feasibility of a hybrid framework for voice disorder classification, offering a foundation for refining automated tools that could support clinical assessments with further validation. |
| format | Article |
| id | doaj-art-e51a30b85c28413ea96794fd268f3b58 |
| institution | Kabale University |
| issn | 1472-6947 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Informatics and Decision Making |
| spelling | doaj-art-e51a30b85c28413ea96794fd268f3b58; 2025-08-20T03:52:23Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2025-05-01; vol. 25, no. 1, pp. 1-14; 10.1186/s12911-025-02978-w; A hybrid approach for binary and multi-class classification of voice disorders using a pre-trained model and ensemble classifiers; Mehtab Ur Rahman (Department of Language and Communication, Radboud University); Cem Direkoglu (Electrical and Electronics Engineering Department, Middle East Technical University); https://doi.org/10.1186/s12911-025-02978-w; Voice disorders; Multi-class classification; Ensemble classifier; VGGish |
| title | A hybrid approach for binary and multi-class classification of voice disorders using a pre-trained model and ensemble classifiers |
| topic | Voice disorders; Multi-class classification; Ensemble classifier; VGGish |
| url | https://doi.org/10.1186/s12911-025-02978-w |
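
The abstract's second stage feeds the same embeddings to an SVM, an LR, an MLP, and an Ensemble Classifier. This record does not say how the ensemble combines its members, so the sketch below assumes plain majority voting over per-model label predictions. The function names `majority_vote` and `accuracy`, the tie-breaking rule, and the toy labels are all hypothetical illustrations, not the authors' implementation or data.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample.

    Assumed rule: the most frequent label wins; a tie goes to the
    tied label predicted by the earliest model in the list.
    """
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for sample_preds in zip(*predictions_per_model):
        counts = Counter(sample_preds)
        top = max(counts.values())
        # First label, in model order, that reaches the top count.
        combined.append(next(lab for lab in sample_preds if counts[lab] == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for illustration only; these are not results from the paper.
y_true = ["healthy", "disordered", "disordered", "healthy"]
svm_pred = ["healthy", "disordered", "healthy", "healthy"]
lr_pred = ["healthy", "healthy", "disordered", "healthy"]
mlp_pred = ["disordered", "disordered", "disordered", "healthy"]

ensemble_pred = majority_vote([svm_pred, lr_pred, mlp_pred])
print(ensemble_pred, accuracy(y_true, ensemble_pred))
```

In this toy case each individual model makes one error on a different sample, so the vote corrects all three. Real classifiers trained on the same embeddings tend to make correlated errors, which is consistent with the abstract reporting that the ensemble is not always the best performer.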