Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech
Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation. Therefore, we propose a novel method to improve...
| Main Authors: | Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2022-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/9775804/ |
Similar Items
- ECE-TTS: A Zero-Shot Emotion Text-to-Speech Model with Simplified and Precise Control
  by: Shixiong Liang, et al.
  Published: (2025-05-01)
- End-to-End Multi-Speaker FastSpeech2 With Hierarchical Decoder
  by: Majid Adibian, et al.
  Published: (2025-01-01)
- Automatic development of speech-in-noise hearing tests using machine learning
  by: Sigrid Polspoel, et al.
  Published: (2025-04-01)
- Three Years of VoiceMOS Challenges: Lessons Learned by the UWB-NTIS-TTS Team
  by: Marie Kunesova, et al.
  Published: (2025-01-01)
- Using casual speech phonology in synthetic speech
  by: Linda SHOCKEY
  Published: (2014-04-01)