A novel Swin transformer based framework for speech recognition for dysarthria

Abstract Dysarthria frequently occurs in individuals with neurological disorders such as stroke, Parkinson’s disease, and cerebral palsy. Timely detection and management of dysarthria in these patients is essential for managing the progression of their condition. Se...

Bibliographic Details
Main Authors: Rabbia Mahum, Ismaila Ganiyu, Lotfi Hidri, Ahmed M. El-Sherbeeny, Haseeb Hassan
Format: Article
Language: English
Published: Nature Portfolio 2025-06-01
Series: Scientific Reports
Subjects: Global features; Local features; Dysarthria; AI in healthcare
Online Access: https://doi.org/10.1038/s41598-025-02042-7
collection DOAJ
description Abstract Dysarthria frequently occurs in individuals with neurological disorders such as stroke, Parkinson’s disease, and cerebral palsy. Timely detection and management of dysarthria in these patients is essential for managing the progression of their condition. Several previous studies have focused on detecting dysarthric speech using machine learning-based methods. However, their false positive rate is high because of the variability of speech and environmental factors such as background noise. Therefore, in this work, we employ a model based on the Swin transformer (ST), named DSR-Swinoid. First, the speech is converted into mel-spectrograms to capture as many patterns of the voice signal as possible. Although the ST was originally designed to extract both local and global visual features, it still prioritizes global features. In mel-spectrograms, however, the specific gaps caused by slurred speech must be taken into account. Therefore, our objective is to improve the ST’s capacity for learning local features by introducing four modules: a network for local feature capturing (NLF), convolutional patch concatenation, multi-path (MP), and a multi-view block (MVB). The NLF module enhances the existing Swin transformer’s ability to capture local features. MP integrates features from different Swin stages to emphasize local information. The MVB-ST block improves on classical Swin blocks by integrating diverse receptive fields for a more comprehensive extraction of local features. Experimental results show that DSR-Swinoid attains a best accuracy of 98.66%, exceeding the results of existing methods.
id doaj-art-59f3d559db094511b1d1ab2e3a52c1a3
institution Kabale University
issn 2045-2322
spelling doaj-art-59f3d559db094511b1d1ab2e3a52c1a3, record updated 2025-08-20T03:47:14Z
Scientific Reports (Nature Portfolio), ISSN 2045-2322, 2025-06-01, vol. 15, no. 1, pp. 1-15
Rabbia Mahum: Department of Computer Science, University of Engineering and Technology Taxila
Ismaila Ganiyu: Industrial Engineering Department, College of Engineering, King Saud University
Lotfi Hidri: Industrial Engineering Department, College of Engineering, King Saud University
Ahmed M. El-Sherbeeny: Industrial Engineering Department, College of Engineering, King Saud University
Haseeb Hassan: School of Medicine, The Chinese University of Hong Kong
topic Global features
Local features
Dysarthria
AI in healthcare