Phoneme-Level Duration Controllable Neural Text-to-Speech With Phoneme Embedding Skip Connection and Modified Gaussian Duration Modeling

This paper proposes a simple but practically important and effective approach to improve phoneme duration expansion and contraction control in neural text-to-speech (TTS) systems for modifying the speaking rate of synthesized speech. The use of simple uniform expansion and contraction is sometimes u...

Bibliographic Details
Main Authors: Tadashi Ogura, Takuma Okamoto, Yamato Ohtani, Erica Cooper, Tomoki Toda, Hisashi Kawai
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11062880/