Advancing deep learning for expressive music composition and performance modeling

Abstract: The pursuit of expressive, human-like music generation remains a significant challenge in artificial intelligence (AI). While deep learning has advanced AI music composition and transcription, current models often struggle with long-term structural coherence and emotional nuance. This study presents a comparative analysis of three leading deep learning architectures for AI-generated music composition and transcription on the MAESTRO dataset: Long Short-Term Memory (LSTM) networks, Transformer models, and Generative Adversarial Networks (GANs). Our key innovation is a dual evaluation framework that combines objective metrics (perplexity, harmonic consistency, and rhythmic entropy) with subjective human evaluation via a Mean Opinion Score (MOS) study involving 50 listeners. The Transformer model achieved the best overall performance (perplexity: 2.87; harmonic consistency: 79.4%; MOS: 4.3), indicating a superior ability to produce musically rich and expressive output. Human compositions, however, remained the highest in perceptual quality (MOS: 4.8). Our findings provide a benchmarking foundation for future AI music systems and underscore the need for emotion-aware modeling, real-time human-AI collaboration, and reinforcement learning to close the gap between machine-generated and human-performed music.
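The record does not give the exact metric definitions, so the sketch below is one plausible reading of the abstract's dual evaluation framework: perplexity as the exponentiated mean negative log-likelihood of generated tokens, harmonic consistency as the in-key note ratio, rhythmic entropy as Shannon entropy over quantized inter-onset intervals, and MOS as a simple average of listener ratings. These formulas are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the paper's dual evaluation framework.
# All metric definitions here are assumptions inferred from the abstract.
import math
from collections import Counter

def perplexity(token_nlls):
    """exp(mean negative log-likelihood) over a model's generated tokens."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def harmonic_consistency(pitches, key_pitch_classes):
    """Fraction of MIDI notes whose pitch class lies in the estimated key."""
    in_key = sum(1 for p in pitches if p % 12 in key_pitch_classes)
    return in_key / len(pitches)

def rhythmic_entropy(onset_times, quantum=0.125):
    """Shannon entropy (bits) of inter-onset intervals, quantized to a grid."""
    iois = [round((b - a) / quantum) for a, b in zip(onset_times, onset_times[1:])]
    counts = Counter(iois)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def mean_opinion_score(ratings):
    """MOS: average of 1-5 listener ratings, as in the 50-listener study."""
    return sum(ratings) / len(ratings)

# Example: a short C-major excerpt rated by a handful of listeners.
c_major = {0, 2, 4, 5, 7, 9, 11}
print(round(harmonic_consistency([60, 62, 64, 66, 67], c_major), 2))  # 0.8
print(round(mean_opinion_score([4, 5, 4, 4, 5]), 1))                  # 4.4
```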

Bibliographic Details
Main Author: Man Zhang (School of Mechanical Engineering, Yellow River Conservancy Technical University)
Format: Article
Language: English
Published: Nature Portfolio, 2025-07-01
Series: Scientific Reports
ISSN: 2045-2322
Subjects: Deep learning; AI music generation; Music transcription; Transformer models; Generative adversarial networks (GANs); Long short-term memory (LSTM)
Online Access: https://doi.org/10.1038/s41598-025-13064-6