CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application
This study presents a deep learning-based speech signal-processing mobile application known as CITISEN. CITISEN performs three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC), which allow it to serve as a platform for utilizing and evaluating SE models and for flexibly extending those models to new noise environments and users.
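As described above, BNC removes the original background with an SE model and then mixes the enhanced speech with a new background noise. The following is a minimal Python sketch of that remixing step, not CITISEN's actual implementation: the `enhance` callable is a hypothetical stand-in for a pretrained SE model, the file paths are placeholders, the mixing SNR is a free parameter, and the `soundfile` package is assumed for audio I/O.

```python
# Minimal sketch of the background noise conversion (BNC) idea: denoise a
# recording with a speech enhancement (SE) model, then remix the enhanced
# speech with a new background noise at a chosen SNR. The `enhance` callable
# is a hypothetical stand-in for a pretrained SE model; file paths are
# placeholders. Assumes mono 1-D waveforms.
import numpy as np
import soundfile as sf


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    # Loop (or trim) the noise so it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


def convert_background(noisy_path, new_noise_path, out_path, enhance, snr_db=5.0):
    """Replace the background of `noisy_path` with the noise in `new_noise_path`."""
    noisy, sr = sf.read(noisy_path)
    new_noise, _ = sf.read(new_noise_path)
    enhanced = enhance(noisy, sr)  # assumed SE model: noisy speech -> enhanced speech
    converted = mix_at_snr(enhanced, new_noise, snr_db)
    sf.write(out_path, converted, sr)
```

Mixing at an explicit SNR keeps the converted signal comparable to the original noisy recording, which is what lets BNC double as a data-augmentation step when clean speech is unavailable.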
| Main Authors: | Yu-Wen Chen, Kuo-Hsuan Hung, You-Jin Li, Alexander Chao-Fu Kang, Ya-Hsin Lai, Kai-Chun Liu, Szu-Wei Fu, Syu-Siang Wang, Yu Tsao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2022-01-01 |
| Series: | IEEE Access |
| Subjects: | Speech enhancement; model adaptation; background noise conversion; deep learning; mobile application |
| Online Access: | https://ieeexplore.ieee.org/document/9718270/ |
| _version_ | 1849223135104073728 |
|---|---|
| author | Yu-Wen Chen; Kuo-Hsuan Hung; You-Jin Li; Alexander Chao-Fu Kang; Ya-Hsin Lai; Kai-Chun Liu; Szu-Wei Fu; Syu-Siang Wang; Yu Tsao |
| author_sort | Yu-Wen Chen |
| collection | DOAJ |
| description | This study presents a deep learning-based speech signal-processing mobile application known as CITISEN. CITISEN performs three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC), which allow it to serve as a platform for utilizing and evaluating SE models and for flexibly extending those models to new noise environments and users. For SE, CITISEN downloads pretrained SE models from the cloud server and uses them to reduce noise components in prerecordings or instant recordings provided by users. When it encounters noisy speech signals from unknown speakers or with unknown noise types, the MA function allows CITISEN to improve SE performance: a few audio files of the unseen speakers or noise types are recorded, uploaded to the cloud server, and used to adapt the pretrained SE model. Finally, for BNC, CITISEN removes the original background noise using an SE model and then mixes the processed speech signal with a new background noise. The novel BNC function can be used to evaluate SE performance under specific conditions, to cover people’s tracks, and for entertainment. The experimental results confirmed the effectiveness of the SE, MA, and BNC functions. Compared with the noisy speech signals, the speech signals enhanced by SE achieved improvements of about 6% and 33%, respectively, in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ). With MA, STOI and PESQ were further improved by approximately 6% and 11%, respectively. Note that the SE model and MA method are not limited to the ones described in this study and can be replaced with any SE model and MA method. Finally, the BNC experiments indicated that speech signals with the original and converted backgrounds have similar scene-identification accuracy and similar embeddings in an acoustic scene classification model. Therefore, the proposed BNC can effectively convert the background noise of a speech signal and serve as a data-augmentation method when clean speech signals are unavailable. A minimal sketch of computing these relative metric gains follows this record. |
| format | Article |
| id | doaj-art-b63b9b86615e46eb8293e39aa85c25fb |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2022-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-b63b9b86615e46eb8293e39aa85c25fb (indexed 2025-08-25T23:00:25Z). English. IEEE, IEEE Access, ISSN 2169-3536, 2022-01-01, vol. 10, pp. 46082-46099. DOI: 10.1109/ACCESS.2022.3153469; IEEE document 9718270. CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application. Authors (all with the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan): Yu-Wen Chen (https://orcid.org/0000-0002-7473-0570), Kuo-Hsuan Hung, You-Jin Li, Alexander Chao-Fu Kang (https://orcid.org/0000-0001-7625-4910), Ya-Hsin Lai (https://orcid.org/0000-0002-6286-8441), Kai-Chun Liu (https://orcid.org/0000-0001-7867-4716), Szu-Wei Fu, Syu-Siang Wang (https://orcid.org/0000-0002-2652-5521), Yu Tsao (https://orcid.org/0000-0001-6956-0418). Abstract as in the description field above. Online access: https://ieeexplore.ieee.org/document/9718270/. Keywords: Speech enhancement; model adaptation; background noise conversion; deep learning; mobile application |
| title | CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application |
| topic | Speech enhancement; model adaptation; background noise conversion; deep learning; mobile application |
| url | https://ieeexplore.ieee.org/document/9718270/ |
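The description above reports relative STOI and PESQ improvements for the enhanced speech. The sketch below shows one plausible way such relative gains could be computed; it assumes the third-party `pystoi` and `pesq` Python packages, 16 kHz mono waveforms, and a percentage convention of relative gain over the noisy baseline, none of which is specified by the record itself.

```python
# Sketch of computing relative STOI/PESQ gains of enhanced speech over the
# noisy baseline. Assumes the `pystoi` and `pesq` packages; `clean`, `noisy`,
# and `enhanced` are time-aligned 16 kHz mono NumPy arrays.
import numpy as np
from pystoi import stoi
from pesq import pesq


def relative_gain(before: float, after: float) -> float:
    """Relative improvement, in percent."""
    return 100.0 * (after - before) / abs(before)


def evaluate(clean: np.ndarray, noisy: np.ndarray, enhanced: np.ndarray, fs: int = 16000) -> dict:
    stoi_noisy = stoi(clean, noisy, fs)            # intelligibility of the noisy input
    stoi_enhanced = stoi(clean, enhanced, fs)      # intelligibility after enhancement
    pesq_noisy = pesq(fs, clean, noisy, "wb")      # wideband PESQ of the noisy input
    pesq_enhanced = pesq(fs, clean, enhanced, "wb")
    return {
        "STOI gain (%)": relative_gain(stoi_noisy, stoi_enhanced),
        "PESQ gain (%)": relative_gain(pesq_noisy, pesq_enhanced),
    }
```

Averaging such per-utterance gains over a test set would yield figures comparable to the roughly 6% STOI and 33% PESQ improvements mentioned in the description, under the stated assumption about the percentage convention.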