TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation
Music generation by AI algorithms like Transformer is currently a research hotspot. Existing methods often suffer from issues related to coherence and high computational costs. To address these problems, we propose a novel Transformer-based model that incorporates a gate recurrent unit with root mean square norm restriction (TARREAN). This model improves the temporal coherence of music by utilizing the gate recurrent unit (GRU), which enhances the model’s ability to capture the dependencies between sequential elements. Additionally, we apply masked multi-head attention to prevent the model from accessing future information during training, preserving the causal structure of music sequences. To reduce computational overhead, we introduce root mean square layer normalization (RMS Norm), which smooths gradients and simplifies the calculations, thereby improving training efficiency. The music sequences are encoded using a compound word method, converting them into discrete symbol-event combinations for input into the TARREAN model. The proposed method effectively mitigates discontinuity issues in generated music and enhances generation quality. We evaluated the model using the Essen Associative Code and Folk Song Database, which contains 20,000 folk melodies from Germany, Poland, and China. The results show that our model produces music that is more aligned with human preferences, as indicated by subjective evaluation scores. The TARREAN model achieved a satisfaction score of 4.34, significantly higher than the 3.79 score of the Transformer-XL + REMI model. Objective evaluation also demonstrated a 15% improvement in temporal coherence compared to traditional methods. Both objective and subjective experimental results demonstrate that TARREAN can significantly improve generation coherence and reduce computational costs.
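The abstract names three mechanisms: a GRU to strengthen dependencies across the token sequence, masked multi-head attention to keep generation causal, and RMS layer normalization to make the normalization step cheaper. As a minimal sketch of how these pieces can be combined in one decoder layer (not the authors' released implementation; the class names, layer widths, residual ordering, and the use of PyTorch's built-in attention and GRU modules are assumptions made only for illustration), consider:

```python
# Hedged sketch: one causal decoder block with RMS layer normalization,
# masked multi-head self-attention, and a GRU over the time axis.
# Sizes and ordering are assumptions, not the paper's specification.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root mean square layer normalization: rescale features by their RMS
    with a learned gain; no mean subtraction and no bias term."""

    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.gain = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.gain


class GatedCausalBlock(nn.Module):
    """RMSNorm -> causal multi-head attention -> RMSNorm -> GRU,
    each sub-layer wrapped in a residual connection."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.norm1 = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = RMSNorm(dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Boolean causal mask: True above the diagonal blocks attention
        # to future positions.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        gru_out, _ = self.gru(self.norm2(x))
        return x + gru_out


if __name__ == "__main__":
    # Toy batch: 2 sequences of 16 token embeddings of width 256.
    tokens = torch.randn(2, 16, 256)
    print(GatedCausalBlock()(tokens).shape)  # torch.Size([2, 16, 256])
```

In this sketch the upper-triangular mask keeps position t from attending to anything after t, the GRU pass adds a recurrent summary of the tokens generated so far, and RMSNorm rescales activations by their root mean square with a learned gain, skipping the mean-centering and bias of standard LayerNorm.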
Main Authors: | Yumei Zhang, Yulin Zhou, Xiaojiao Lv, Jinshan Li, Heng Lu, Yuping Su, Honghong Yang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Sensors |
Subjects: | automatic music generation; deep learning; transformer; gate recurrent unit; root mean square layer normalization |
Online Access: | https://www.mdpi.com/1424-8220/25/2/386 |
_version_ | 1832587503021850624 |
---|---|
author | Yumei Zhang; Yulin Zhou; Xiaojiao Lv; Jinshan Li; Heng Lu; Yuping Su; Honghong Yang |
author_facet | Yumei Zhang; Yulin Zhou; Xiaojiao Lv; Jinshan Li; Heng Lu; Yuping Su; Honghong Yang |
author_sort | Yumei Zhang |
collection | DOAJ |
description | Music generation by AI algorithms like Transformer is currently a research hotspot. Existing methods often suffer from issues related to coherence and high computational costs. To address these problems, we propose a novel Transformer-based model that incorporates a gate recurrent unit with root mean square norm restriction (TARREAN). This model improves the temporal coherence of music by utilizing the gate recurrent unit (GRU), which enhances the model’s ability to capture the dependencies between sequential elements. Additionally, we apply masked multi-head attention to prevent the model from accessing future information during training, preserving the causal structure of music sequences. To reduce computational overhead, we introduce root mean square layer normalization (RMS Norm), which smooths gradients and simplifies the calculations, thereby improving training efficiency. The music sequences are encoded using a compound word method, converting them into discrete symbol-event combinations for input into the TARREAN model. The proposed method effectively mitigates discontinuity issues in generated music and enhances generation quality. We evaluated the model using the Essen Associative Code and Folk Song Database, which contains 20,000 folk melodies from Germany, Poland, and China. The results show that our model produces music that is more aligned with human preferences, as indicated by subjective evaluation scores. The TARREAN model achieved a satisfaction score of 4.34, significantly higher than the 3.79 score of the Transformer-XL + REMI model. Objective evaluation also demonstrated a 15% improvement in temporal coherence compared to traditional methods. Both objective and subjective experimental results demonstrate that TARREAN can significantly improve generation coherence and reduce computational costs. |
format | Article |
id | doaj-art-79bbd62abe314c2797c6ad7dfcf3a87a |
institution | Kabale University |
issn | 1424-8220 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj-art-79bbd62abe314c2797c6ad7dfcf3a87a; indexed 2025-01-24T13:48:44Z; eng; MDPI AG; Sensors; 1424-8220; 2025-01-01; vol. 25, iss. 2, art. 386; doi:10.3390/s25020386; TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation; Yumei Zhang, Yulin Zhou, Xiaojiao Lv, Jinshan Li, Heng Lu, Yuping Su, Honghong Yang (all: School of Computer Science, Shaanxi Normal University, Xi’an 710062, China); Music generation by AI algorithms like Transformer is currently a research hotspot. Existing methods often suffer from issues related to coherence and high computational costs. To address these problems, we propose a novel Transformer-based model that incorporates a gate recurrent unit with root mean square norm restriction (TARREAN). This model improves the temporal coherence of music by utilizing the gate recurrent unit (GRU), which enhances the model’s ability to capture the dependencies between sequential elements. Additionally, we apply masked multi-head attention to prevent the model from accessing future information during training, preserving the causal structure of music sequences. To reduce computational overhead, we introduce root mean square layer normalization (RMS Norm), which smooths gradients and simplifies the calculations, thereby improving training efficiency. The music sequences are encoded using a compound word method, converting them into discrete symbol-event combinations for input into the TARREAN model. The proposed method effectively mitigates discontinuity issues in generated music and enhances generation quality. We evaluated the model using the Essen Associative Code and Folk Song Database, which contains 20,000 folk melodies from Germany, Poland, and China. The results show that our model produces music that is more aligned with human preferences, as indicated by subjective evaluation scores. The TARREAN model achieved a satisfaction score of 4.34, significantly higher than the 3.79 score of the Transformer-XL + REMI model. Objective evaluation also demonstrated a 15% improvement in temporal coherence compared to traditional methods. Both objective and subjective experimental results demonstrate that TARREAN can significantly improve generation coherence and reduce computational costs.; https://www.mdpi.com/1424-8220/25/2/386; automatic music generation; deep learning; transformer; gate recurrent unit; root mean square layer normalization |
spellingShingle | Yumei Zhang; Yulin Zhou; Xiaojiao Lv; Jinshan Li; Heng Lu; Yuping Su; Honghong Yang; TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation; Sensors; automatic music generation; deep learning; transformer; gate recurrent unit; root mean square layer normalization |
title | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_full | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_fullStr | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_full_unstemmed | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_short | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_sort | tarrean a novel transformer with a gate recurrent unit for stylized music generation |
topic | automatic music generation; deep learning; transformer; gate recurrent unit; root mean square layer normalization |
url | https://www.mdpi.com/1424-8220/25/2/386 |
work_keys_str_mv | AT yumeizhang tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT yulinzhou tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT xiaojiaolv tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT jinshanli tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT henglu tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT yupingsu tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT honghongyang tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration |
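The record's description also states that melodies are encoded with a compound word method, i.e., the attributes of one musical event are grouped into a single symbol-event combination rather than emitted as a flat stream of separate tokens. The snippet below is only an assumed illustration of that grouping idea; the NoteEvent fields and the (beat, pitch, duration) packing are hypothetical and are not the paper's actual vocabulary or token families.

```python
# Hedged sketch of compound-word style grouping: one note -> one composite token.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NoteEvent:
    beat: int      # position within the bar, in sixteenth-note steps (assumed unit)
    pitch: int     # MIDI pitch number
    duration: int  # length in sixteenth-note steps (assumed unit)


def to_compound_words(notes: List[NoteEvent]) -> List[Tuple[int, int, int]]:
    """Pack each note's attributes into one (beat, pitch, duration) super-token."""
    return [(n.beat, n.pitch, n.duration) for n in notes]


if __name__ == "__main__":
    melody = [NoteEvent(0, 60, 4), NoteEvent(4, 62, 4), NoteEvent(8, 64, 8)]
    print(to_compound_words(melody))
    # [(0, 60, 4), (4, 62, 4), (8, 64, 8)]
```

A full compound-word setup would additionally carry a type or family marker per super-token and give each field its own embedding table; the point here is only that one musical event becomes one composite input position for the model.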