Challenges and Perspectives in Interpretable Music Auto-Tagging Using Perceptual Features
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10944805/ |
| Summary: | In the era of music streaming platforms and recommendation systems, music auto-tagging has gained considerable traction, motivating researchers to develop methods that focus on improving performance metrics on baseline datasets. Most recent approaches rely on deep neural networks, which, despite their impressive performance, are opaque: it is difficult to explain their output for a given input. While the problem of interpretability has been highlighted in other domains, such as medicine, it has not been a priority for music-related tasks. In this work, we explored the usefulness of interpretability for music auto-tagging. We developed a pipeline that combines three types of information extraction procedures, 1) symbolic knowledge, 2) auxiliary deep neural networks, and 3) signal processing, to extract perceptual features from audio files; these features were then used to train an explainable machine learning model to predict tags. We experimented on three datasets: the MTG-Jamendo dataset, the GTZAN dataset, and the MagnaTagATune dataset. Our method outperforms baseline models in all tasks and in some cases is competitive with the state of the art. We conducted a human survey to evaluate user trust in our methodology and in a state-of-the-art model, concluding that while the state-of-the-art model offers better performance, there are use cases where the slight deterioration in accuracy is outweighed by the increased trust and value provided by interpretability. |
|---|---|
| ISSN: | 2169-3536 |
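
The summary describes training an explainable model on perceptual features to predict tags. As a rough illustration of why such a model is easy to inspect, the sketch below fits a single-tag logistic regression and reports each feature's additive contribution to the decision. This record does not say which explainable model or which features the authors used, so logistic regression is only a stand-in here, and the feature names, the tag, and the toy data are all hypothetical.

```python
import math

# Hypothetical perceptual feature names; the record does not list the real ones.
FEATURES = ["tempo", "danceability", "vocal_presence"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a single-tag logistic regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example under the current weights.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def explain(w, b, x):
    """Return the tag probability and each feature's contribution to the logit."""
    contributions = {name: wj * xj for name, wj, xj in zip(FEATURES, w, x)}
    prob = sigmoid(sum(contributions.values()) + b)
    return prob, contributions


# Made-up toy data: features scaled to [0, 1]; label 1 means a hypothetical
# "dance" tag applies to the track.
X = [[0.9, 0.8, 0.2], [0.8, 0.9, 0.5], [0.2, 0.1, 0.7], [0.3, 0.2, 0.1]]
y = [1, 1, 0, 0]

w, b = train_logistic(X, y)
prob, contributions = explain(w, b, [0.85, 0.9, 0.3])

print(f"P(tag) = {prob:.2f}")
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {name}: {value:+.2f}")
```

Because the logit is a plain sum of per-feature terms, a listener or curator can see which perceptual features pushed a prediction up or down, which is the kind of transparency a deep network does not offer directly.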