LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging

Bibliographic Details
Main Authors: Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos
Format: Article
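Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of Signal Processing
Subjects: Few-shot learning; prototypical networks; multi-label classification; audio tagging; world music datasets
Online Access: https://ieeexplore.ieee.org/document/10839319/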
author Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
collection DOAJ
description We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification, where a model must generalize to new classes based on only a few available examples. Extending Prototypical Networks, LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items, rather than one prototype per label. Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music, and is evaluated against existing approaches in the literature. The results demonstrate a significant performance improvement in almost all domains and training setups when using LC-Protonets for multi-label classification. In addition to training a few-shot learning model from scratch, we explore the use of a pre-trained model, obtained via supervised learning, to embed items in the feature space. Fine-tuning improves the generalization ability of all methods, yet LC-Protonets achieve high-level performance even without fine-tuning, in contrast to the comparative approaches. We finally analyze the scalability of the proposed method, providing detailed quantitative metrics from our experiments. The implementation and experimental setup are made publicly available, offering a benchmark for future research.
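The label-combination idea in the description can be illustrated with a minimal sketch. Only the "one prototype per label combination, derived from the power set of labels" part comes from the abstract. The rest is an assumption made for illustration: embeddings are plain NumPy vectors, a prototype is the mean of the support items whose label set contains the combination, and a query receives the labels of the nearest prototype under Euclidean distance, as in standard Prototypical Networks. The function names are hypothetical and are not taken from the authors' released code.

```python
from itertools import combinations

import numpy as np


def label_combinations(labels):
    """Non-empty subsets of one item's label set (its power set minus the empty set)."""
    labels = sorted(labels)
    return [
        frozenset(c)
        for r in range(1, len(labels) + 1)
        for c in combinations(labels, r)
    ]


def build_prototypes(embeddings, label_sets):
    """Map each label combination to the mean embedding of the items that carry it."""
    combos = set()
    for labels in label_sets:
        combos.update(label_combinations(labels))

    prototypes = {}
    for combo in combos:
        # Assumption: an item supports a combination when its labels include all of it.
        members = [
            emb for emb, labels in zip(embeddings, label_sets) if combo <= set(labels)
        ]
        prototypes[combo] = np.mean(members, axis=0)
    return prototypes


def predict(query, prototypes):
    """Return the label combination whose prototype is nearest to the query."""
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))


# Toy support set: four 2-D embeddings with made-up tags "a", "b", "c".
support_embeddings = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [6.0, 6.0]])
support_labels = [{"a"}, {"a", "b"}, {"b", "c"}, {"c"}]

prototypes = build_prototypes(support_embeddings, support_labels)
predicted = predict(np.array([3.9, 4.1]), prototypes)
print(sorted(predicted))
```

In this toy setup the three single labels give rise to five prototypes, because only combinations that actually occur inside some support item's label set are kept. A query is tagged with every label in the winning combination at once, rather than thresholding each label separately.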
format Article
id doaj-art-0880e47d41d742e9ba97a68cf07ddff0
institution Kabale University
issn 2644-1322
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of Signal Processing
spelling doaj-art-0880e47d41d742e9ba97a68cf07ddff0
2025-02-11T00:01:46Z
eng
IEEE
IEEE Open Journal of Signal Processing
2644-1322
2025-01-01
Volume 6, pages 138-146
DOI: 10.1109/OJSP.2025.3529315
IEEE Xplore document: 10839319
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
Charilaos Papaioannou (https://orcid.org/0009-0008-8558-3255), School of ECE, National Technical University of Athens, Zografou, Greece
Emmanouil Benetos (https://orcid.org/0000-0002-6820-6764), Centre for Digital Music, Queen Mary University of London, London, U.K.
Alexandros Potamianos (https://orcid.org/0009-0007-1532-5288), School of ECE, National Technical University of Athens, Zografou, Greece
https://ieeexplore.ieee.org/document/10839319/
Few-shot learning; prototypical networks; multi-label classification; audio tagging; world music datasets
title LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
topic Few-shot learning
prototypical networks
multi-label classification
audio tagging
world music datasets
url https://ieeexplore.ieee.org/document/10839319/