Challenges and Perspectives in Interpretable Music Auto-Tagging Using Perceptual Features

Bibliographic Details
Main Authors: Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10944805/
Description
Summary: In the era of music streaming platforms and recommendation systems, music auto-tagging has gained considerable traction, motivating researchers to develop methods that focus on improving performance metrics on baseline datasets. The majority of recent approaches rely on deep neural networks, which, despite their impressive performance, are opaque: it is difficult to explain their output for a given input. While the problem of interpretability has been highlighted in other domains, such as medicine, it has not been a priority for music-related tasks. In this work, we explored the usefulness of interpretability for music auto-tagging. We developed a pipeline incorporating three types of information extraction procedures, 1) symbolic knowledge, 2) auxiliary deep neural networks, and 3) signal processing, to extract perceptual features of audio files, which were then used to train an explainable machine learning model to predict tags. We experimented on three datasets: the MTG-Jamendo dataset, the GTZAN dataset, and the MagnaTagATune dataset. Our method outperforms baseline models in all tasks and in some cases is competitive with the state of the art. We conducted a human survey to evaluate user trust in our methodology and in a state-of-the-art model, concluding that while the state-of-the-art model offers better performance, there are use cases where the slight deterioration in accuracy is outweighed by the increased trust and value provided by interpretability.
ISSN: 2169-3536