TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation
Music generation by AI algorithms like Transformer is currently a research hotspot. Existing methods often suffer from issues related to coherence and high computational costs. To address these problems, we propose a novel Transformer-based model that incorporates a gate recurrent unit with root mean square norm restriction (TARREAN). This model improves the temporal coherence of music by utilizing the gate recurrent unit (GRU), which enhances the model’s ability to capture the dependencies between sequential elements. Additionally, we apply masked multi-head attention to prevent the model from accessing future information during training, preserving the causal structure of music sequences. To reduce computational overhead, we introduce root mean square layer normalization (RMS Norm), which smooths gradients and simplifies the calculations, thereby improving training efficiency. The music sequences are encoded using a compound word method, converting them into discrete symbol-event combinations for input into the TARREAN model. The proposed method effectively mitigates discontinuity issues in generated music and enhances generation quality. We evaluated the model using the Essen Associative Code and Folk Song Database, which contains 20,000 folk melodies from Germany, Poland, and China. The results show that our model produces music that is more aligned with human preferences, as indicated by subjective evaluation scores. The TARREAN model achieved a satisfaction score of 4.34, significantly higher than the 3.79 score of the Transformer-XL + REMI model. Objective evaluation also demonstrated a 15% improvement in temporal coherence compared to traditional methods. Both objective and subjective experimental results demonstrate that TARREAN can significantly improve generation coherence and reduce computational costs.
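The abstract names three mechanisms: a GRU to strengthen dependencies across the token sequence, masked multi-head attention to keep generation causal, and RMS layer normalization to make the normalization step cheaper. As a minimal sketch of how these pieces can be combined in one decoder layer (not the authors' released implementation; the class names, layer widths, residual ordering, and the use of PyTorch's built-in attention and GRU modules are assumptions made only for illustration), consider:

```python
# Hedged sketch: one causal decoder block with RMS layer normalization,
# masked multi-head self-attention, and a GRU over the time axis.
# Sizes and ordering are assumptions, not the paper's specification.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root mean square layer normalization: rescale features by their RMS
    with a learned gain; no mean subtraction and no bias term."""

    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.gain = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.gain


class GatedCausalBlock(nn.Module):
    """RMSNorm -> causal multi-head attention -> RMSNorm -> GRU,
    each sub-layer wrapped in a residual connection."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.norm1 = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = RMSNorm(dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Boolean causal mask: True above the diagonal blocks attention
        # to future positions.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        gru_out, _ = self.gru(self.norm2(x))
        return x + gru_out


if __name__ == "__main__":
    # Toy batch: 2 sequences of 16 token embeddings of width 256.
    tokens = torch.randn(2, 16, 256)
    print(GatedCausalBlock()(tokens).shape)  # torch.Size([2, 16, 256])
```

In this sketch the upper-triangular mask keeps position t from attending to anything after t, the GRU pass adds a recurrent summary of the tokens generated so far, and RMSNorm rescales activations by their root mean square with a learned gain, skipping the mean-centering and bias of standard LayerNorm.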
Main Authors: | Yumei Zhang, Yulin Zhou, Xiaojiao Lv, Jinshan Li, Heng Lu, Yuping Su, Honghong Yang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Sensors |
Subjects: | automatic music generation; deep learning; transformer; gate recurrent unit; root mean square layer normalization |
Online Access: | https://www.mdpi.com/1424-8220/25/2/386 |
_version_ | 1832587503021850624 |
---|---|
author | Yumei Zhang; Yulin Zhou; Xiaojiao Lv; Jinshan Li; Heng Lu; Yuping Su; Honghong Yang |
author_facet | Yumei Zhang; Yulin Zhou; Xiaojiao Lv; Jinshan Li; Heng Lu; Yuping Su; Honghong Yang |
author_sort | Yumei Zhang |
collection | DOAJ |
description | Music generation by AI algorithms like Transformer is currently a research hotspot. Existing methods often suffer from issues related to coherence and high computational costs. To address these problems, we propose a novel Transformer-based model that incorporates a gate recurrent unit with root mean square norm restriction (TARREAN). This model improves the temporal coherence of music by utilizing the gate recurrent unit (GRU), which enhances the model’s ability to capture the dependencies between sequential elements. Additionally, we apply masked multi-head attention to prevent the model from accessing future information during training, preserving the causal structure of music sequences. To reduce computational overhead, we introduce root mean square layer normalization (RMS Norm), which smooths gradients and simplifies the calculations, thereby improving training efficiency. The music sequences are encoded using a compound word method, converting them into discrete symbol-event combinations for input into the TARREAN model. The proposed method effectively mitigates discontinuity issues in generated music and enhances generation quality. We evaluated the model using the Essen Associative Code and Folk Song Database, which contains 20,000 folk melodies from Germany, Poland, and China. The results show that our model produces music that is more aligned with human preferences, as indicated by subjective evaluation scores. The TARREAN model achieved a satisfaction score of 4.34, significantly higher than the 3.79 score of the Transformer-XL + REMI model. Objective evaluation also demonstrated a 15% improvement in temporal coherence compared to traditional methods. Both objective and subjective experimental results demonstrate that TARREAN can significantly improve generation coherence and reduce computational costs. |
format | Article |
id | doaj-art-79bbd62abe314c2797c6ad7dfcf3a87a |
institution | Kabale University |
issn | 1424-8220 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj-art-79bbd62abe314c2797c6ad7dfcf3a87a; indexed 2025-01-24T13:48:44Z; eng; MDPI AG; Sensors; 1424-8220; 2025-01-01; vol. 25, iss. 2, art. 386; doi:10.3390/s25020386; TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation; Yumei Zhang, Yulin Zhou, Xiaojiao Lv, Jinshan Li, Heng Lu, Yuping Su, Honghong Yang (all: School of Computer Science, Shaanxi Normal University, Xi’an 710062, China); Music generation by AI algorithms like Transformer is currently a research hotspot. Existing methods often suffer from issues related to coherence and high computational costs. To address these problems, we propose a novel Transformer-based model that incorporates a gate recurrent unit with root mean square norm restriction (TARREAN). This model improves the temporal coherence of music by utilizing the gate recurrent unit (GRU), which enhances the model’s ability to capture the dependencies between sequential elements. Additionally, we apply masked multi-head attention to prevent the model from accessing future information during training, preserving the causal structure of music sequences. To reduce computational overhead, we introduce root mean square layer normalization (RMS Norm), which smooths gradients and simplifies the calculations, thereby improving training efficiency. The music sequences are encoded using a compound word method, converting them into discrete symbol-event combinations for input into the TARREAN model. The proposed method effectively mitigates discontinuity issues in generated music and enhances generation quality. We evaluated the model using the Essen Associative Code and Folk Song Database, which contains 20,000 folk melodies from Germany, Poland, and China. The results show that our model produces music that is more aligned with human preferences, as indicated by subjective evaluation scores. The TARREAN model achieved a satisfaction score of 4.34, significantly higher than the 3.79 score of the Transformer-XL + REMI model. Objective evaluation also demonstrated a 15% improvement in temporal coherence compared to traditional methods. Both objective and subjective experimental results demonstrate that TARREAN can significantly improve generation coherence and reduce computational costs.; https://www.mdpi.com/1424-8220/25/2/386; automatic music generation; deep learning; transformer; gate recurrent unit; root mean square layer normalization |
spellingShingle | Yumei Zhang; Yulin Zhou; Xiaojiao Lv; Jinshan Li; Heng Lu; Yuping Su; Honghong Yang; TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation; Sensors; automatic music generation; deep learning; transformer; gate recurrent unit; root mean square layer normalization |
title | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_full | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_fullStr | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_full_unstemmed | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_short | TARREAN: A Novel Transformer with a Gate Recurrent Unit for Stylized Music Generation |
title_sort | tarrean a novel transformer with a gate recurrent unit for stylized music generation |
topic | automatic music generation; deep learning; transformer; gate recurrent unit; root mean square layer normalization |
url | https://www.mdpi.com/1424-8220/25/2/386 |
work_keys_str_mv | AT yumeizhang tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT yulinzhou tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT xiaojiaolv tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT jinshanli tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT henglu tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT yupingsu tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration AT honghongyang tarreananoveltransformerwithagaterecurrentunitforstylizedmusicgeneration |
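The record's description also states that melodies are encoded with a compound word method, i.e., the attributes of one musical event are grouped into a single symbol-event combination rather than emitted as a flat stream of separate tokens. The snippet below is only an assumed illustration of that grouping idea; the NoteEvent fields and the (beat, pitch, duration) packing are hypothetical and are not the paper's actual vocabulary or token families.

```python
# Hedged sketch of compound-word style grouping: one note -> one composite token.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NoteEvent:
    beat: int      # position within the bar, in sixteenth-note steps (assumed unit)
    pitch: int     # MIDI pitch number
    duration: int  # length in sixteenth-note steps (assumed unit)


def to_compound_words(notes: List[NoteEvent]) -> List[Tuple[int, int, int]]:
    """Pack each note's attributes into one (beat, pitch, duration) super-token."""
    return [(n.beat, n.pitch, n.duration) for n in notes]


if __name__ == "__main__":
    melody = [NoteEvent(0, 60, 4), NoteEvent(4, 62, 4), NoteEvent(8, 64, 8)]
    print(to_compound_words(melody))
    # [(0, 60, 4), (4, 62, 4), (8, 64, 8)]
```

A full compound-word setup would additionally carry a type or family marker per super-token and give each field its own embedding table; the point here is only that one musical event becomes one composite input position for the model.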