Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System
Dysarthria, a motor speech disorder, impairs the muscles involved in speech production, leading to challenges in articulation, pronunciation, and overall communication. This results in slow, slurred speech that is difficult to understand. Augmentative and Alternative Communication (AAC) aids integrated with speech recognition technology offer a promising solution for individuals with dysarthria. Because Automatic Speech Recognition (ASR) systems trained on typical speech struggle with dysarthric speech patterns and limited training data, the article proposes a hybrid Transformer-CTC model: self-attention layers model dependencies between speech features, and a Connectionist Temporal Classification (CTC) layer maps those features to character sequences without requiring frame-level alignment. Trained on the UA speech corpus (13 hours of speech from 15 speakers with varying dysarthria levels), the system achieves a Word Recognition Accuracy (WRA) of 89%.
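As a concrete illustration of the approach summarized above, the following is a minimal sketch of a Transformer-encoder acoustic model with a CTC output head, written in PyTorch. It is not the authors' implementation: the input features (log-Mel filterbanks), layer sizes, character vocabulary, and the toy tensors in the training step are assumptions made only to show how self-attention encoding and alignment-free CTC training fit together.

```python
# Minimal, illustrative Transformer-CTC acoustic model in PyTorch (not the paper's code).
# Feature type, dimensions, and vocabulary size below are assumptions for demonstration.
import torch
import torch.nn as nn


class TransformerCTC(nn.Module):
    def __init__(self, n_feats=80, d_model=256, n_heads=4, n_layers=6, n_chars=30):
        super().__init__()
        # Project acoustic features (e.g., log-Mel filterbank frames) to the model dimension.
        self.input_proj = nn.Linear(n_feats, d_model)
        # Learned positional embeddings (a simplification of the usual sinusoidal encoding).
        self.pos_emb = nn.Parameter(torch.randn(1, 2000, d_model) * 0.01)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            dropout=0.1, batch_first=True)
        # Self-attention layers model dependencies between frames across the utterance.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # CTC output layer: one logit per character plus the blank symbol (index 0).
        self.out = nn.Linear(d_model, n_chars + 1)

    def forward(self, feats):                        # feats: (batch, time, n_feats)
        x = self.input_proj(feats) + self.pos_emb[:, :feats.size(1)]
        x = self.encoder(x)
        return self.out(x).log_softmax(dim=-1)       # (batch, time, n_chars + 1)


def greedy_ctc_decode(log_probs, blank=0):
    """Collapse repeated symbols and drop blanks from the frame-wise argmax path."""
    decoded = []
    for seq in log_probs.argmax(dim=-1):             # (batch, time) -> per-utterance paths
        prev, out = blank, []
        for idx in seq.tolist():
            if idx != prev and idx != blank:
                out.append(idx)
            prev = idx
        decoded.append(out)
    return decoded


# Training step: CTC loss needs only the unaligned character targets, not
# frame-level alignments between the speech features and the transcription.
model = TransformerCTC()
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
feats = torch.randn(2, 120, 80)                      # two toy utterances, 120 frames each
targets = torch.randint(1, 31, (2, 12))              # toy character indices (1..30)
log_probs = model(feats)                             # (batch, time, classes)
loss = ctc_loss(log_probs.transpose(0, 1),           # CTCLoss expects (time, batch, classes)
                targets,
                input_lengths=torch.full((2,), 120),
                target_lengths=torch.full((2,), 12))
loss.backward()
hypotheses = greedy_ctc_decode(log_probs.detach())   # character-index hypotheses
```

At inference time, the frame-wise log-probabilities would be decoded (greedily here, or with a beam search) into character strings and scored against reference transcripts to obtain word-level accuracy.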
| Main Authors: | R. Vinotha, D. Hepsiba, L. D. Vijay Anand, P. Malin Bruntha, Linh Dinh, Hien Dang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Dysarthria; speech recognition; Transformer; deep learning; assistive technology; CTC |
| Online Access: | https://ieeexplore.ieee.org/document/10993356/ |
| _version_ | 1849728005186781184 |
|---|---|
| author | R. Vinotha; D. Hepsiba; L. D. Vijay Anand; P. Malin Bruntha; Linh Dinh; Hien Dang |
| author_facet | R. Vinotha; D. Hepsiba; L. D. Vijay Anand; P. Malin Bruntha; Linh Dinh; Hien Dang |
| author_sort | R. Vinotha |
| collection | DOAJ |
| description | Dysarthria, a motor speech disorder, impairs the muscles involved in speech production, leading to challenges in articulation, pronunciation, and overall communication. This results in slow, slurred speech that is difficult to understand. Augmentative and Alternative Communication (AAC) aids integrated with speech recognition technology offer a promising solution for individuals with dysarthria. However, Automatic Speech Recognition (ASR) systems trained on typical speech data often struggle to recognize dysarthric speech due to its unique speech patterns and limited training data. To address these challenges, a hybrid Transformer-CTC model has been proposed for improving ASR performance on dysarthric speech. The Transformer architecture employs a self-attention mechanism that models complex dependencies between speech features, enabling it to identify and emphasize important patterns even when training data is limited. This ability is particularly crucial for dysarthric speech, where speech signals often exhibit high variability. On the other hand, Connectionist Temporal Classification (CTC) acts as an effective transcription layer. It aligns speech features with character sequences without requiring precise input-output alignment, making it well-suited for handling the inconsistencies and distortions present in dysarthric speech. The integration of these components creates a powerful architecture capable of learning nuanced speech patterns and delivering accurate transcriptions for dysarthric speech. The model was trained using the UA speech corpus, containing 13 hours of speech from 15 speakers with varying dysarthria levels. The proposed hybrid system achieves an impressive Word Recognition Accuracy (WRA) of 89%, demonstrating its effectiveness in accurately transcribing dysarthric speech. This innovative approach significantly advances the development of ASR technologies tailored to diverse and variable speech patterns, ultimately enhancing communication for individuals with speech disorders. |
| format | Article |
| id | doaj-art-d37921bf3e2a48baa07ca083db53a215 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-d37921bf3e2a48baa07ca083db53a215; 2025-08-20T03:09:42Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; Vol. 13, pp. 82479-82491; 10.1109/ACCESS.2025.3568342; 10993356; Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System; R. Vinotha (0) https://orcid.org/0000-0003-1779-4962; D. Hepsiba (1); L. D. Vijay Anand (2); P. Malin Bruntha (3) https://orcid.org/0000-0001-8749-993X; Linh Dinh (4); Hien Dang (5) https://orcid.org/0000-0002-7112-9966; Department of Robotics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India; Department of Biomedical Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India; Department of Robotics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India; Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India; Department of Information Systems, Suffolk University, Boston, MA, USA; Faculty of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam; Dysarthria, a motor speech disorder, impairs the muscles involved in speech production, leading to challenges in articulation, pronunciation, and overall communication. This results in slow, slurred speech that is difficult to understand. Augmentative and Alternative Communication (AAC) aids integrated with speech recognition technology offer a promising solution for individuals with dysarthria. However, Automatic Speech Recognition (ASR) systems trained on typical speech data often struggle to recognize dysarthric speech due to its unique speech patterns and limited training data. To address these challenges, a hybrid Transformer-CTC model has been proposed for improving ASR performance on dysarthric speech. The Transformer architecture employs a self-attention mechanism that models complex dependencies between speech features, enabling it to identify and emphasize important patterns even when training data is limited. This ability is particularly crucial for dysarthric speech, where speech signals often exhibit high variability. On the other hand, Connectionist Temporal Classification (CTC) acts as an effective transcription layer. It aligns speech features with character sequences without requiring precise input-output alignment, making it well-suited for handling the inconsistencies and distortions present in dysarthric speech. The integration of these components creates a powerful architecture capable of learning nuanced speech patterns and delivering accurate transcriptions for dysarthric speech. The model was trained using the UA speech corpus, containing 13 hours of speech from 15 speakers with varying dysarthria levels. The proposed hybrid system achieves an impressive Word Recognition Accuracy (WRA) of 89%, demonstrating its effectiveness in accurately transcribing dysarthric speech. This innovative approach significantly advances the development of ASR technologies tailored to diverse and variable speech patterns, ultimately enhancing communication for individuals with speech disorders.; https://ieeexplore.ieee.org/document/10993356/; Dysarthria; speech recognition; Transformer; deep learning; assistive technology; CTC |
| spellingShingle | R. Vinotha; D. Hepsiba; L. D. Vijay Anand; P. Malin Bruntha; Linh Dinh; Hien Dang; Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System; IEEE Access; Dysarthria; speech recognition; Transformer; deep learning; assistive technology; CTC |
| title | Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System |
| title_full | Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System |
| title_fullStr | Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System |
| title_full_unstemmed | Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System |
| title_short | Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System |
| title_sort | empowering dysarthric communication hybrid transformer ctc based speech recognition system |
| topic | Dysarthria; speech recognition; Transformer; deep learning; assistive technology; CTC |
| url | https://ieeexplore.ieee.org/document/10993356/ |
| work_keys_str_mv | AT rvinotha empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem AT dhepsiba empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem AT ldvijayanand empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem AT pmalinbruntha empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem AT linhdinh empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem AT hiendang empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem |