Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System

Dysarthria, a motor speech disorder, impairs the muscles involved in speech production, leading to challenges in articulation, pronunciation, and overall communication. This results in slow, slurred speech that is difficult to understand. Augmentative and Alternative Communication (AAC) aids integrated with speech recognition technology offer a promising solution for individuals with dysarthria. However, Automatic Speech Recognition (ASR) systems trained on typical speech data often struggle to recognize dysarthric speech due to its unique speech patterns and limited training data. To address these challenges, a hybrid Transformer-CTC model has been proposed for improving ASR performance on dysarthric speech. The Transformer architecture employs a self-attention mechanism that models complex dependencies between speech features, enabling it to identify and emphasize important patterns even when training data is limited. This ability is particularly crucial for dysarthric speech, where speech signals often exhibit high variability. Connectionist Temporal Classification (CTC) acts as an effective transcription layer: it aligns speech features with character sequences without requiring precise input-output alignment, making it well suited to handling the inconsistencies and distortions present in dysarthric speech. The integration of these components creates an architecture capable of learning nuanced speech patterns and delivering accurate transcriptions for dysarthric speech. The model was trained on the UA-Speech corpus, containing 13 hours of speech from 15 speakers with varying dysarthria levels. The proposed hybrid system achieves a Word Recognition Accuracy (WRA) of 89%, demonstrating its effectiveness in accurately transcribing dysarthric speech. This approach advances the development of ASR technologies tailored to diverse and variable speech patterns, ultimately enhancing communication for individuals with speech disorders.


Bibliographic Details
Main Authors: R. Vinotha, D. Hepsiba, L. D. Vijay Anand, P. Malin Bruntha, Linh Dinh, Hien Dang
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10993356/
_version_ 1849728005186781184
author R. Vinotha
D. Hepsiba
L. D. Vijay Anand
P. Malin Bruntha
Linh Dinh
Hien Dang
author_facet R. Vinotha
D. Hepsiba
L. D. Vijay Anand
P. Malin Bruntha
Linh Dinh
Hien Dang
author_sort R. Vinotha
collection DOAJ
description Dysarthria, a motor speech disorder, impairs the muscles involved in speech production, leading to challenges in articulation, pronunciation, and overall communication. This results in slow, slurred speech that is difficult to understand. Augmentative and Alternative Communication (AAC) aids integrated with speech recognition technology offer a promising solution for individuals with dysarthria. However, Automatic Speech Recognition (ASR) systems trained on typical speech data often struggle to recognize dysarthric speech due to its unique speech patterns and limited training data. To address these challenges, a hybrid Transformer-CTC model has been proposed for improving ASR performance on dysarthric speech. The Transformer architecture employs a self-attention mechanism that models complex dependencies between speech features, enabling it to identify and emphasize important patterns even when training data is limited. This ability is particularly crucial for dysarthric speech, where speech signals often exhibit high variability. Connectionist Temporal Classification (CTC) acts as an effective transcription layer: it aligns speech features with character sequences without requiring precise input-output alignment, making it well suited to handling the inconsistencies and distortions present in dysarthric speech. The integration of these components creates an architecture capable of learning nuanced speech patterns and delivering accurate transcriptions for dysarthric speech. The model was trained on the UA-Speech corpus, containing 13 hours of speech from 15 speakers with varying dysarthria levels. The proposed hybrid system achieves a Word Recognition Accuracy (WRA) of 89%, demonstrating its effectiveness in accurately transcribing dysarthric speech. This approach advances the development of ASR technologies tailored to diverse and variable speech patterns, ultimately enhancing communication for individuals with speech disorders.
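The abstract's claim that CTC "aligns speech features with character sequences without requiring precise input-output alignment" rests on CTC's collapse rule: a frame-level best path with repeated symbols and blank tokens is reduced to a label sequence by merging repeats and dropping blanks. A minimal sketch of that greedy decoding step, using a hypothetical alphabet with "-" as the blank symbol (not code from the paper itself):

```python
def ctc_collapse(frames, blank="-"):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks.

    `frames` is the per-frame best-scoring symbol sequence; the blank
    lets the model emit 'no new character' and separate true repeats.
    """
    out = []
    prev = None
    for sym in frames:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank token.
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

# A frame-level path such as "hh-ee-l-ll-oo" collapses to "hello":
# repeats within a run are merged, and the blank between the two "l"
# runs is what preserves the double letter.
print(ctc_collapse(list("hh-ee-l-ll-oo")))
```

Because many frame-level paths collapse to the same transcript, training can sum over all of them, which is why no frame-exact alignment of the dysarthric audio is needed.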
format Article
id doaj-art-d37921bf3e2a48baa07ca083db53a215
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-d37921bf3e2a48baa07ca083db53a2152025-08-20T03:09:42ZengIEEEIEEE Access2169-35362025-01-0113824798249110.1109/ACCESS.2025.356834210993356Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition SystemR. Vinotha0https://orcid.org/0000-0003-1779-4962D. Hepsiba1L. D. Vijay Anand2P. Malin Bruntha3https://orcid.org/0000-0001-8749-993XLinh Dinh4Hien Dang5https://orcid.org/0000-0002-7112-9966Department of Robotics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, IndiaDepartment of Biomedical Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, IndiaDepartment of Robotics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, IndiaDepartment of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, IndiaDepartment of Information Systems, Suffolk University, Boston, MA, USAFaculty of Computer Science and Engineering, Thuyloi University, Hanoi, VietnamDysarthria, a motor speech disorder, impairs the muscles involved in speech production, leading to challenges in articulation, pronunciation, and overall communication. This results in slow, slurred speech that is difficult to understand. Augmentative and Alternative Communication (AAC) aids integrated with speech recognition technology offer a promising solution for individuals with dysarthria. However, Automatic Speech Recognition (ASR) systems trained on typical speech data often struggle to recognize dysarthric speech due to its unique speech patterns and limited training data. To address these challenges, a hybrid Transformer-CTC model has been proposed for improving ASR performance on dysarthric speech. The Transformer architecture employs a self-attention mechanism that models complex dependencies between speech features, enabling it to identify and emphasize important patterns even when training data is limited. 
This ability is particularly crucial for dysarthric speech, where speech signals often exhibit high variability. On the other hand, Connectionist Temporal Classification (CTC) acts as an effective transcription layer. It aligns speech features with character sequences without requiring precise input-output alignment, making it well-suited for handling the inconsistencies and distortions present in dysarthric speech. The integration of these components creates a powerful architecture capable of learning nuanced speech patterns and delivering accurate transcriptions for dysarthric speech. The model was trained using the UA speech corpus, containing 13 hours of speech from 15 speakers with varying dysarthria levels. The proposed hybrid system achieves an impressive Word Recognition Accuracy (WRA) of 89%, demonstrating its effectiveness in accurately transcribing dysarthric speech. This innovative approach significantly advances the development of ASR technologies tailored to diverse and variable speech patterns, ultimately enhancing communication for individuals with speech disorders.https://ieeexplore.ieee.org/document/10993356/Dysarthriaspeech recognitionTransformerdeep learningassistive technologyCTC
spellingShingle R. Vinotha
D. Hepsiba
L. D. Vijay Anand
P. Malin Bruntha
Linh Dinh
Hien Dang
Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System
IEEE Access
Dysarthria
speech recognition
Transformer
deep learning
assistive technology
CTC
title Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System
title_full Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System
title_fullStr Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System
title_full_unstemmed Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System
title_short Empowering Dysarthric Communication: Hybrid Transformer-CTC-Based Speech Recognition System
title_sort empowering dysarthric communication hybrid transformer ctc based speech recognition system
topic Dysarthria
speech recognition
Transformer
deep learning
assistive technology
CTC
url https://ieeexplore.ieee.org/document/10993356/
work_keys_str_mv AT rvinotha empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem
AT dhepsiba empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem
AT ldvijayanand empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem
AT pmalinbruntha empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem
AT linhdinh empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem
AT hiendang empoweringdysarthriccommunicationhybridtransformerctcbasedspeechrecognitionsystem