A study on phonemes recognition method for Mandarin pronunciation based on improved Zipformer-RNN-T(Pruned) modeling.

Bibliographic Details
Main Authors:	Zhaohui Du, Xiaofeng Zhao, Lin Li, Baohua Yu, Lijiang Miao
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0324048

collection	DOAJ
description	In recent years, computer-assisted language learning systems empowered by artificial intelligence have become an active research topic. Mainstream pronunciation assessment models rely on speech recognition: they convert speech into phoneme sequences and then identify mispronounced phonemes through sequence comparison. To optimize the phoneme recognition task in pronunciation evaluation, this paper proposes a Chinese pronunciation phoneme recognition model based on an improved Zipformer-RNN-T(Pruned) architecture, aiming to improve recognition accuracy and reduce parameter count. First, the AISHELL1-PHONEME and ST-CMDS-PHONEME datasets for Mandarin phoneme recognition are constructed through data preprocessing. Then, three layers of the Zipformer Block architecture are introduced into the Zipformer encoder to enhance model performance. In the stateless Pred Network, the GELU activation function is adopted to prevent neuron deactivation. Furthermore, a hybrid Pruned RNN-T/CTC loss fusion strategy is proposed to further improve recognition performance. Experimental results show that the method achieves a Word Error Rate (WER) of 1.92% (Dev) and 2.12% (Test) on the AISHELL1-PHONEME dataset, and 4.28% (Dev) and 4.51% (Test) on the ST-CMDS-PHONEME dataset. Moreover, the model requires only 61.1M parameters, striking a balance between performance and efficiency.
issn	1932-6203
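
The description above mentions that mispronounced phonemes are determined by comparing the recognized phoneme sequence with a reference sequence. The record does not say how the authors perform that comparison; the following is only a minimal sketch that assumes a standard Levenshtein (edit-distance) alignment. The function names align_phonemes and error_rate and the toned pinyin-style phoneme symbols in the example are hypothetical and do not come from the paper.

```python
from typing import List, Optional, Tuple

Op = Tuple[str, Optional[str], Optional[str]]


def align_phonemes(ref: List[str], hyp: List[str]) -> List[Op]:
    """Align two phoneme sequences with Levenshtein edit distance.

    Returns (op, ref_phoneme, hyp_phoneme) tuples, where op is one of
    "ok", "sub", "del" (phoneme missing from hyp) or "ins" (extra in hyp).
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right cell to recover the operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                ops.append(("ok" if cost == 0 else "sub", ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def error_rate(ref: List[str], hyp: List[str]) -> float:
    """Number of non-matching operations divided by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    errors = sum(1 for op, _, _ in align_phonemes(ref, hyp) if op != "ok")
    return errors / len(ref)


if __name__ == "__main__":
    # Hypothetical example: the learner produces the wrong tone on "i".
    reference = ["n", "i3", "h", "ao3"]
    recognized = ["n", "i2", "h", "ao3"]
    for op in align_phonemes(reference, recognized):
        print(op)
    print(error_rate(reference, recognized))
```

In this sketch, every operation other than "ok" marks a candidate mispronunciation, and the same count divided by the reference length gives an error rate of the kind reported in the description.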
