Emotion transfer in audio using mel-cepstral representation and CycleGANs

Bibliographic Details
Main Authors: Guijin Han, Junzhe Zhao, Yiming Zhou
Format: Article
Language: English
Published: Springer 2025-06-01
Series: Journal of King Saud University: Computer and Information Sciences
Online Access: https://doi.org/10.1007/s44443-025-00082-7
Description
Summary: Audio synthesis currently faces two major challenges: removing non-emotional influences more effectively during emotional feature extraction, and improving emotional expression when reference audio is scarce. To address them, this paper proposes an audio deep feature decoupling and emotion-adaptive fusion model that combines Mel Frequency Cepstral Coefficients (MFCCs) with Cycle-consistent Generative Adversarial Networks (CycleGANs). The model includes a Deep Feature Decoupled Encoder Group (DFDEG), built from Gated Linear Units (GLU), self-attention, and average pooling, and a feature fusion method called Emotion Adaptive Instance Normalization (Emo-AdaIN), which is based on AdaIN. Integrating DFDEG, Emo-AdaIN, and CycleGANs yields an unsupervised, bidirectional, multi-emotion transfer method that operates on MFCCs. The method performs well at emotion decoupling and transfer on unseen datasets: across different speakers, the Lowest Emotional Similarity (LES) of the transfer results is 94.56% and the Average Confidence Level (ACL) is 0.51. These results indicate generalization across speakers and robustness across emotion granularities.
ISSN: 1319-1578, 2213-1248
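
The summary states that Emo-AdaIN is based on AdaIN (Adaptive Instance Normalization). As a rough illustration of that underlying operation only, the sketch below applies plain AdaIN to two MFCC-shaped arrays: each coefficient channel of the content features is normalized over time, then rescaled to the per-channel mean and standard deviation of the reference features. This is not the paper's Emo-AdaIN, whose details are not given in this record. The function name `adain`, the (coefficients, frames) layout, the choice of 13 coefficients, and the random stand-in data are all assumptions made for the example.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Plain AdaIN on feature maps shaped (channels, frames).

    Hypothetical helper for illustration; NOT the paper's Emo-AdaIN.
    """
    # Per-channel statistics, computed over the time axis.
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True)
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True)

    # Remove the content statistics, then impose the style statistics.
    normalized = (content - c_mean) / (c_std + eps)
    return s_std * normalized + s_mean

# Random stand-ins for MFCC matrices (assumed: 13 coefficients per frame).
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(13, 200))  # source utterance features
style = rng.normal(3.0, 2.0, size=(13, 80))     # emotional reference features

out = adain(content, style)
```

The output keeps the frame count of the content input, while its per-channel mean and standard deviation follow the reference, so the two inputs do not need to have the same length.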