High-Quality Text-to-Speech Implementation via Active Shallow Diffusion Mechanism
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG 2025-01-01 |
| Series: | Sensors |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1424-8220/25/3/833 |
| Summary: | Denoising diffusion probabilistic models (DDPMs) have proven useful in text-to-speech (TTS) tasks; however, traditional diffusion models struggle with real-time processing because inference requires hundreds of iterative sampling steps. In this work, the Cascaded MixGAN-TTS (CMG-TTS), a two-stage, efficient diffusion-based acoustic model for TTS with fast inference, is proposed to address this problem. An active shallow diffusion mechanism is adopted to divide the CMG-TTS training process into two stages. Specifically, a basic acoustic model is trained in the first stage to provide valuable prior knowledge for the second stage; for this underlying acoustic modeling, a linguistic encoder based on a mixture combination mechanism is introduced to work with pitch and energy predictors. In the second stage, a post-net is used to optimize the mel-spectrogram reconstruction performance. CMG-TTS is evaluated on datasets such as AISHELL3 and LJSpeech, and the experiments show that it achieves satisfactory results in both subjective and objective evaluation metrics with only one denoising step. Compared to other diffusion-based TTS models, CMG-TTS obtains a leading score in real-time factor (RTF), and the ablation studies show that both stages of CMG-TTS are effective. |
|---|---|
| ISSN: | 1424-8220 |