TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation

Music generation by AI algorithms like Transformer is currently a research hotspot. Existing methods often suffer from issues related to coherence and high computational costs. To address these problems, we propose a novel Transformer-based model that incorporates a gate recurrent unit with root mean square norm restriction (TARREAN). This model improves the temporal coherence of music by utilizing the gate recurrent unit (GRU), which enhances the model’s ability to capture the dependencies between sequential elements. Additionally, we apply masked multi-head attention to prevent the model from accessing future information during training, preserving the causal structure of music sequences. To reduce computational overhead, we introduce root mean square layer normalization (RMS Norm), which smooths gradients and simplifies the calculations, thereby improving training efficiency. The music sequences are encoded using a compound word method, converting them into discrete symbol-event combinations for input into the TARREAN model. The proposed method effectively mitigates discontinuity issues in generated music and enhances generation quality. We evaluated the model using the Essen Associative Code and Folk Song Database, which contains 20,000 folk melodies from Germany, Poland, and China. The results show that our model produces music that is more aligned with human preferences, as indicated by subjective evaluation scores. The TARREAN model achieved a satisfaction score of 4.34, significantly higher than the 3.79 score of the Transformer-XL + REMI model. Objective evaluation also demonstrated a 15% improvement in temporal coherence compared to traditional methods. Both objective and subjective experimental results demonstrate that TARREAN can significantly improve generation coherence and reduce computational costs.

Bibliographic Details
Main Authors: Yumei Zhang, Yulin Zhou, Xiaojiao Lv, Jinshan Li, Heng Lu, Yuping Su, Honghong Yang
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Sensors
Subjects: automatic music generation, deep learning, transformer, gate recurrent unit, root mean square layer normalization
Online Access:https://www.mdpi.com/1424-8220/25/2/386
author Yumei Zhang
Yulin Zhou
Xiaojiao Lv
Jinshan Li
Heng Lu
Yuping Su
Honghong Yang
collection DOAJ
description Music generation by AI algorithms like Transformer is currently a research hotspot. Existing methods often suffer from issues related to coherence and high computational costs. To address these problems, we propose a novel Transformer-based model that incorporates a gate recurrent unit with root mean square norm restriction (TARREAN). This model improves the temporal coherence of music by utilizing the gate recurrent unit (GRU), which enhances the model’s ability to capture the dependencies between sequential elements. Additionally, we apply masked multi-head attention to prevent the model from accessing future information during training, preserving the causal structure of music sequences. To reduce computational overhead, we introduce root mean square layer normalization (RMS Norm), which smooths gradients and simplifies the calculations, thereby improving training efficiency. The music sequences are encoded using a compound word method, converting them into discrete symbol-event combinations for input into the TARREAN model. The proposed method effectively mitigates discontinuity issues in generated music and enhances generation quality. We evaluated the model using the Essen Associative Code and Folk Song Database, which contains 20,000 folk melodies from Germany, Poland, and China. The results show that our model produces music that is more aligned with human preferences, as indicated by subjective evaluation scores. The TARREAN model achieved a satisfaction score of 4.34, significantly higher than the 3.79 score of the Transformer-XL + REMI model. Objective evaluation also demonstrated a 15% improvement in temporal coherence compared to traditional methods. Both objective and subjective experimental results demonstrate that TARREAN can significantly improve generation coherence and reduce computational costs.
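The description above states that masked multi-head attention is used to hide future tokens during training. The record does not include the paper's implementation; the following is a minimal sketch of the standard causal (lower-triangular) mask that such layers typically apply, not the TARREAN-specific attention module:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    # True where attention is allowed: position i may only attend to
    # positions j <= i, preserving the causal order of the music sequence.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def apply_causal_mask(scores: np.ndarray) -> np.ndarray:
    # Disallowed (future) entries are set to -inf so that a subsequent
    # softmax assigns them zero attention weight.
    mask = causal_mask(scores.shape[-1])
    return np.where(mask, scores, -np.inf)
```

In a multi-head setting the same mask is broadcast across all heads, so each head is independently prevented from seeing future events.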
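The description credits the reduced computational overhead to RMS layer normalization, which drops LayerNorm's mean-subtraction and bias. As a generic NumPy sketch of the commonly used RMSNorm formulation (`weight` is the learned gain; the paper's exact layer details are not in this record):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Rescale by the root mean square of the last axis; unlike LayerNorm,
    # no mean is subtracted and no bias term is learned, saving one pass
    # over the activations per layer.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([3.0, 4.0])   # rms = sqrt((9 + 16) / 2) = sqrt(12.5)
y = rms_norm(x, np.ones(2))
```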
format Article
id doaj-art-79bbd62abe314c2797c6ad7dfcf3a87a
institution Kabale University
issn 1424-8220
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Sensors
spelling Sensors, vol. 25, no. 2, art. no. 386, 2025-01-01. DOI: 10.3390/s25020386. Record updated 2025-01-24T13:48:44Z.
Author affiliations: Yumei Zhang, Yulin Zhou, Xiaojiao Lv, Jinshan Li, Heng Lu, Yuping Su, Honghong Yang — School of Computer Science, Shaanxi Normal University, Xi’an 710062, China.
title TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation
topic automatic music generation
deep learning
transformer
gate recurrent unit
root mean square layer normalization
url https://www.mdpi.com/1424-8220/25/2/386