Advancements in End-to-End Audio Style Transformation: A Differentiable Approach for Voice Conversion and Musical Style Transfer

Bibliographic Details
Main Authors: Shashwat Aggarwal, Shashwat Uttam, Sameer Garg, Shubham Garg, Kopal Jain, Swati Aggarwal
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: AI
Online Access: https://www.mdpi.com/2673-2688/6/1/16
Description
Summary: Introduction: This study introduces a fully differentiable, end-to-end audio transformation network that operates directly on acoustic features to overcome the limitations of prior approaches. Methods: The proposed method employs an encoder–decoder architecture with a global conditioning mechanism. It eliminates the need for parallel utterances, intermediate phonetic representations, and speaker-independent ASR systems. The system is evaluated on voice conversion and musical style transfer tasks using subjective and objective metrics. Results: Experimental results demonstrate the model’s efficacy, with competitive performance in both seen and unseen target scenarios. The proposed framework outperforms seven existing systems for audio transformation and aligns closely with state-of-the-art methods. Conclusion: This approach simplifies feature engineering, ensures vocabulary independence, and broadens the applicability of audio transformations across diverse domains, such as personalized voice assistants and musical experimentation.
ISSN: 2673-2688