High-Quality Text-to-Speech Implementation via Active Shallow Diffusion Mechanism

Denoising diffusion probabilistic models (DDPMs) have proven useful in text-to-speech (TTS) tasks; however, traditional diffusion models have struggled to achieve real-time processing because they require hundreds of sampling steps during iterative inference. In this work, a two-st...

Bibliographic Details
Main Authors:	Junlin Deng, Ruihan Hou, Yan Deng, Yongqiu Long, Ning Wu
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Sensors
Subjects:	text-to-speech; speech synthesis; diffusion probabilistic model; MixGAN; mel-spectrogram
Online Access:	https://www.mdpi.com/1424-8220/25/3/833

Similar Items

Polish Speech and Text Emotion Recognition in a Multimodal Emotion Analysis System
by: Kamil Skowroński, et al.
Published: (2024-11-01)

Speech Emotion Recognition on MELD and RAVDESS Datasets Using CNN
by: Gheed T. Waleed, et al.
Published: (2025-06-01)

Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs
by: Shaoting Zeng, et al.
Published: (2025-01-01)

MixDiff-TTS: Mixture Alignment and Diffusion Model for Text-to-Speech
by: Yongqiu Long, et al.
Published: (2025-04-01)

ECE-TTS: A Zero-Shot Emotion Text-to-Speech Model with Simplified and Precise Control
by: Shixiong Liang, et al.
Published: (2025-05-01)

Multi-Feature Fusion-Based Speech Disorder Classification Using MobileNetV3-EfficientNetB7, Linformer-Performer, and SHAP-Aware XGBoost
by: Abdul Rahaman Wahab Sait, et al.
Published: (2025-01-01)

DDA-MSLD: A Multi-Feature Speech Lie Detection Algorithm Based on a Dual-Stream Deep Architecture
by: Pengfei Guo, et al.
Published: (2025-05-01)

Advancing automatic speech recognition for low-resource ghanaian languages: Audio datasets for Akan, Ewe, Dagbani, Dagaare, and Ikposo (Science Data Bank)
by: Isaac Wiafe, et al.
Published: (2025-08-01)

Document-Level Neural TTS Using Curriculum Learning and Attention Masking
by: Sung-Woong Hwang, et al.
Published: (2021-01-01)

Using casual speech phonology in synthetic speech
by: Linda SHOCKEY
Published: (2014-04-01)

Speech Delay Assistive Device for Speech-to-Text Transcription Based on Machine Learning
by: Maria Kristina C. Rodriguez, et al.
Published: (2025-05-01)

Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment
by: Apiwat Ditthapron, et al.
Published: (2021-01-01)

Speaker Diarization: A Review of Objectives and Methods
by: Douglas O’Shaughnessy
Published: (2025-02-01)

Hearing vocals to recognize schizophrenia: speech discriminant analysis with fusion of emotions and features based on deep learning
by: Jie Huang, et al.
Published: (2025-05-01)

Development of a Deep Learning-Based Text-To-Speech System for the Malang Walikan Language Using the Pre-Trained SpeechT5 and Hifi-GAN Models
by: Aina Avrilia Imani, et al.
Published: (2025-07-01)

A curated crowdsourced dataset of Luganda and Swahili speech for text-to-speech synthesis (Mendeley Data)
by: Andrew Katumba, et al.
Published: (2025-10-01)

A playback speech detection algorithm based on log inverse Mel-frequency spectral coefficient
by: Lang LIN, et al.
Published: (2018-05-01)

Analysis of the influence of selected audio pre-processing stages on accuracy of speaker language recognition
by: Олеся Барковська, et al.
Published: (2023-12-01)

Analysis of the influence of selected audio pre-processing stages on accuracy of speaker language recognition
by: Olesia Barkovska, et al.
Published: (2023-12-01)

Quality assessment of synthetic speech
by: Stefan Brachmański, et al.
Published: (2025-07-01)

Analysis and Research on Spectrogram-Based Emotional Speech Signal Augmentation Algorithm
by: Huawei Tao, et al.
Published: (2025-06-01)

Phonetic minimization of the text corpus in Belarusian for the speech synthesis system training
by: S. I. Lysy
Published: (2019-03-01)

Arabic Speech Classification Method Based on Padding and Deep Learning Neural Network
by: Asroni Asroni, et al.
Published: (2021-06-01)

Detection Of Sentence Modality On French Automatic Speech-to-text Transcriptions
by: Luisa Orosanu, et al.
Published: (2016-05-01)

A Unified Approach to Voice Classification: Leveraging Spectrograms, Mel Spectrograms, and Statistical Features
by: Muhammad Talha, et al.
Published: (2025-01-01)

Detection of Abnormal Symptoms Using Acoustic-Spectrogram-Based Deep Learning
by: Seong-Yoon Kim, et al.
Published: (2025-04-01)

Speech Synthesis of Arabic Vocabularies
by: Enaam Saeed
Published: (2011-07-01)

Helium Speech Recognition Method Based on Spectrogram with Deep Learning
by: Yonghong Chen, et al.
Published: (2025-05-01)

CochleaSpecNet: An Attention-Based Dual Branch Hybrid CNN-GRU Network for Speech Emotion Recognition Using Cochleagram and Spectrogram
by: Atkia Anika Namey, et al.
Published: (2024-01-01)

Indonesian Voice Cloning Text-to-Speech System With Vall-E-Based Model and Speech Enhancement
by: Hizkia Raditya Pratama Roosadi, et al.
Published: (2024-01-01)

Implementation of Eye-To-Text Morse Code Device to Help Speech Impairments People
by: Pujianti Wahyuningsih, et al.
Published: (2025-03-01)

Transcription of Informatics Final Project Seminar Recordings via Speech-to-Text
by: Trisna Gelar, et al.
Published: (2024-12-01)

End-to-End Multi-Speaker FastSpeech2 With Hierarchical Decoder
by: Majid Adibian, et al.
Published: (2025-01-01)

Automatic recognition and representation of text in the form of audio stream
by: L. V. Serebryanaya, et al.
Published: (2021-10-01)

Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech
by: Yeunju Choi, et al.
Published: (2022-01-01)

Safeguarding the Integrity of Online Social Networks (OSN): Leveraging the Efficacy of Conv-LSTM-Based Siamese Network to Predict Hate Speech in Low Resource Hindi-English Code-Mixed Text
by: Shankar Biradar, et al.
Published: (2025-01-01)

Physiological Aspects for Sonograms Building and Spectrum Restore Distorted Speech Vocalizations
by: Mikhail V. Alyushin, et al.
Published: (2025-05-01)

Scalable multimodal approach for face generation and super-resolution using a conditional diffusion model
by: Ahmed Abotaleb, et al.
Published: (2024-11-01)

Impaired Prosodic Processing but Not Hearing Function Is Associated with an Age-Related Reduction in AI Speech Recognition
by: Björn Herrmann, et al.
Published: (2025-02-01)

Segmentation of speech on phonetic elements for systems of speech information protection
by: Y. N. Seitkulov, et al.
Published: (2019-07-01)