Advancing deep learning for expressive music composition and performance modeling
Abstract: The pursuit of expressive and human-like music generation remains a significant challenge in artificial intelligence (AI). While deep learning has advanced AI music composition and transcription, current models often struggle with long-term structural coherence and emotional nuance. This study presents a comparative analysis of three leading deep learning architectures for AI-generated music composition and transcription using the MAESTRO dataset: Long Short-Term Memory (LSTM) networks, Transformer models, and Generative Adversarial Networks (GANs). The key innovation is a dual evaluation framework that combines objective metrics (perplexity, harmonic consistency, and rhythmic entropy) with subjective human evaluation via a Mean Opinion Score (MOS) study involving 50 listeners. The Transformer model achieved the best overall performance (perplexity: 2.87; harmonic consistency: 79.4%; MOS: 4.3), indicating a superior ability to produce musically rich and expressive outputs. However, human compositions remained highest in perceptual quality (MOS: 4.8). These findings provide a benchmarking foundation for future AI music systems and highlight the need for emotion-aware modeling, real-time human-AI collaboration, and reinforcement learning to bridge the gap between machine-generated and human-performed music.
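The record does not give the paper's exact metric definitions, so the following is only a minimal sketch of how the named objective and subjective measures are commonly computed, assuming perplexity is derived from per-token negative log-likelihood, rhythmic entropy from the inter-onset-interval distribution, and MOS as the average of listener ratings on a 1-5 scale; the function names are illustrative, not taken from the paper.

```python
import numpy as np

def perplexity(token_nll):
    """Perplexity from per-token negative log-likelihoods (in nats)."""
    return float(np.exp(np.mean(token_nll)))

def rhythmic_entropy(onset_times, bins=16):
    """Shannon entropy (bits) of the inter-onset-interval distribution."""
    iois = np.diff(np.sort(np.asarray(onset_times, dtype=float)))
    hist, _ = np.histogram(iois, bins=bins)
    p = hist / hist.sum()          # normalize to a probability distribution
    p = p[p > 0]                   # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def mean_opinion_score(ratings):
    """MOS: mean of listener ratings on a 1-5 scale."""
    return float(np.mean(ratings))

# Toy usage with made-up values
print(perplexity([1.0, 1.1, 0.9]))                   # about e^1.0, i.e. ~2.72
print(rhythmic_entropy([0.0, 0.5, 1.0, 1.75, 2.0]))  # entropy over 4 intervals
print(mean_opinion_score([4, 5, 4, 4, 5]))           # 4.4
```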
| Main Author: | Man Zhang (School of Mechanical Engineering, Yellow River Conservancy Technical University) |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| ISSN: | 2045-2322 |
| Subjects: | Deep learning; AI music generation; Music transcription; Transformer models; Generative adversarial networks (GANs); Long short-term memory (LSTM) |
| Online Access: | https://doi.org/10.1038/s41598-025-13064-6 |