Recent advancements in automatic disordered speech recognition: A survey paper

Bibliographic Details
Main Authors: Nada Gohider, Otman A. Basir
Format: Article
Language: English
Published: Elsevier 2024-12-01
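Series: Natural Language Processing Journal
Subjects: Automatic Speech Recognition; Disordered Speech; Dysarthric Speech; Motor speech Disorders; Assistive Technology
Online Access: http://www.sciencedirect.com/science/article/pii/S294971912400058X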
author Nada Gohider
Otman A. Basir
collection DOAJ
description Automatic Speech Recognition (ASR) technology has recently witnessed a paradigm shift with respect to performance accuracy. Nevertheless, impaired speech remains a significant challenge, as evidenced by the inadequate accuracy of existing ASR solutions, a shortfall documented in various research reports. While this shortfall has motivated new directions in Automatic Disordered Speech Recognition (ADSR), the gap between ASR and ADSR accuracy remains significant. In this paper, we report a consolidated account of the research conducted to date to address this gap, highlighting the root causes of the performance discrepancy and discussing prominent research directions in this area. The paper raises some fundamental issues and challenges that ADSR research faces today. First, we discuss the adequacy of impaired speech representation in existing datasets in terms of the diversity of speech impairments, speech continuity, speech style, vocabulary, age group, and the environments of the data collection process. We argue that disordered speech is poorly represented in the existing datasets, so several fundamental components needed for training ADSR models are likely absent. Most open-access databases of impaired speech focus on adult dysarthric speakers, ignoring a wide spectrum of speech disorders and age groups. Second, the paper reviews prominent research directions adopted by the ADSR research community in its effort to advance speech recognition technology for impaired speakers. We categorize this research effort into directions such as personalized models, model adaptation, data augmentation, and multi-modal learning. Although these research directions have advanced the performance of ADSR models, we believe there is still potential for further advancement, since current efforts, in essence, make the false assumption that there is only a limited distribution shift between the source and target data. Finally, we stress the need to investigate performance measures other than Word Error Rate (WER): measures that can reliably encode the contribution of erroneous output tokens to the final uttered message.
format Article
id doaj-art-04a8cc13a09446e2b4db3998a8abddf4
institution OA Journals
issn 2949-7191
language English
publishDate 2024-12-01
publisher Elsevier
record_format Article
series Natural Language Processing Journal
spelling doaj-art-04a8cc13a09446e2b4db3998a8abddf4; 2025-08-20T01:56:34Z; eng; Elsevier; Natural Language Processing Journal; 2949-7191; 2024-12-01; 9; 100110; 10.1016/j.nlp.2024.100110; Recent advancements in automatic disordered speech recognition: A survey paper; Nada Gohider (corresponding author; Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada); Otman A. Basir (Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada); http://www.sciencedirect.com/science/article/pii/S294971912400058X; Automatic Speech Recognition; Disordered Speech; Dysarthric Speech; Motor speech Disorders; Assistive Technology
title Recent advancements in automatic disordered speech recognition: A survey paper
topic Automatic Speech Recognition
Disordered Speech
Dysarthric Speech
Motor speech Disorders
Assistive Technology
url http://www.sciencedirect.com/science/article/pii/S294971912400058X