Utilizing CNN architectures for non-invasive diagnosis of speech disorders – further experiments and insights

This research investigated the application of deep neural networks for diagnosing diseases that affect the voice and speech mechanisms through the non-invasive analysis of vowel sound recordings. Using the Saarbruecken Voice Database, the voice recordings were converted to spectrograms to train the models, specifically focusing on the vowels /a/, /u/, and /i/.
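The recording-to-spectrogram step mentioned in the summary can be illustrated with a minimal sketch. This record does not state the authors' preprocessing settings, so everything below is an assumption made for illustration: the 16 kHz sample rate, the 512-sample Hann window, the 128-sample hop, the hypothetical helper name `log_spectrogram`, and the synthetic harmonic tone that stands in for a sustained vowel recording.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=128, eps=1e-10):
    """Return a log-magnitude spectrogram in dB, shaped (freq_bins, frames).

    frame_len, hop and eps are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Cut the signal into overlapping frames and apply the window to each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT per frame gives the magnitude spectrum over time.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return (20.0 * np.log10(magnitude + eps)).T

# Synthetic stand-in for a sustained vowel: a 120 Hz tone with four overtones.
sample_rate = 16000                       # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate  # one second of samples
f0 = 120.0                                # assumed fundamental frequency in Hz
tone = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

spec = log_spectrogram(tone)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated as a single-channel image, which is the form a convolutional network would consume; resizing, normalization, and color mapping are left out because the record gives no details on them.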

Bibliographic Details
Main Authors:	Filip Ratajczak, Mikołaj Najda, Kamil Szyc
Format:	Article
Language:	English
Published:	Polish Academy of Sciences 2025-07-01
Series:	International Journal of Electronics and Telecommunications
Subjects:	voice disorder diagnosis; vowel sound analysis; convolutional neural networks (CNNs); explainable artificial intelligence (XAI)
Online Access:	https://journals.pan.pl/Content/135742/14_5010_Ratajczak_L_sk.pdf

author	Filip Ratajczak, Mikołaj Najda, Kamil Szyc
collection	DOAJ
description	This research investigated the application of deep neural networks for diagnosing diseases that affect the voice and speech mechanisms through the non-invasive analysis of vowel sound recordings. Using the Saarbruecken Voice Database, the voice recordings were converted to spectrograms to train the models, specifically focusing on the vowels /a/, /u/, and /i/. The study used Explainable Artificial Intelligence (XAI) methodologies to identify essential features within these spectrograms for pathology identification, with the aim of providing medical professionals with enhanced insight into how diseases manifest in sound production. The F1 Score performance evaluation showed that the DenseNet model scored 0.70 ± 0.03 with a top of 0.74. The findings indicated that neither vowel selection nor data augmentation strategies significantly improved model performance. Additionally, the research highlighted that signal splitting was ineffective in enhancing the models’ ability to extract features. This study builds on our previous research [1], offering a more comprehensive understanding of the topic.
format	Article
id	doaj-art-ddaaf5825cd64186b0ee97bd9b3cdeea
institution	Kabale University
issn	2081-8491 2300-1933
language	English
publishDate	2025-07-01
publisher	Polish Academy of Sciences
record_format	Article
series	International Journal of Electronics and Telecommunications
spelling	doaj-art-ddaaf5825cd64186b0ee97bd9b3cdeea; 2025-08-20T03:25:12Z; eng; Polish Academy of Sciences; International Journal of Electronics and Telecommunications; ISSN 2081-8491, 2300-1933; 2025-07-01; vol. 71, No 3; https://doi.org/10.24425/ijet.2025.153621; Utilizing CNN architectures for non-invasive diagnosis of speech disorders – further experiments and insights; Filip Ratajczak (Faculty of Information and Communication Technology, Wrocław University of Science and Technology, Wrocław, Poland); Mikołaj Najda (Institute of Data Science, Maastricht University, The Netherlands); Kamil Szyc (Faculty of Information and Communication Technology, Wrocław University of Science and Technology, Wrocław, Poland); https://journals.pan.pl/Content/135742/14_5010_Ratajczak_L_sk.pdf; voice disorder diagnosis; vowel sound analysis; convolutional neural networks (CNNs); explainable artificial intelligence (XAI)
title	Utilizing CNN architectures for non-invasive diagnosis of speech disorders – further experiments and insights
topic	voice disorder diagnosis; vowel sound analysis; convolutional neural networks (CNNs); explainable artificial intelligence (XAI)
url	https://journals.pan.pl/Content/135742/14_5010_Ratajczak_L_sk.pdf

