Automatic Timbre Transformation Using Enhanced Diffusion Model
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11004054/ |
| Summary: | We present a novel timbre transfer model that uses an enhanced diffusion architecture to convert music from various instruments into Erhu timbre. The Erhu, a traditional Chinese instrument, is difficult to simulate due to its rich vibrato and smooth note transitions. Existing Musical Instrument Digital Interface systems struggle to capture its nuanced dynamics. Our model integrates a Pitch Encoder, a Loudness Encoder, and a Diffusion Decoder. The encoders extract pitch features and dynamic loudness variations, guiding the decoder in generating realistic Erhu timbre. By extracting general musical features, the system generalizes to unseen input types without retraining. Evaluations based on pitch accuracy, cosine similarity, and Fréchet Audio Distance show that our model achieves 96% pitch accuracy and high fidelity in Erhu timbre reproduction. This study demonstrates the potential of diffusion-based timbre transfer models in music generation and provides new directions for future work on both music generation and timbre transfer. |
| ISSN: | 2169-3536 |