MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion

Music question–answering (MQA) is a machine learning task where a computational system analyzes and answers questions about music‑related data. Traditional methods prioritize audio, overlooking visual and embodied aspects crucial to music performance understanding. We introduce MusiQAl, a multimodal...

Bibliographic Details
Main Authors: Anna-Maria Christodoulou, Kyrre Glette, Olivier Lartillot, Alexander Refsum Jensenius
Format: Article
Language: English
Published: Ubiquity Press 2025-07-01
Series: Transactions of the International Society for Music Information Retrieval
Subjects: multimodal music processing; MIR; dataset; audio–video; question–answering
Online Access: https://account.transactions.ismir.net/index.php/up-j-tismir/article/view/222
author Anna-Maria Christodoulou
Kyrre Glette
Olivier Lartillot
Alexander Refsum Jensenius
collection DOAJ
description Music question–answering (MQA) is a machine learning task where a computational system analyzes and answers questions about music‑related data. Traditional methods prioritize audio, overlooking visual and embodied aspects crucial to music performance understanding. We introduce MusiQAl, a multimodal dataset of 310 music performance videos and 11,793 human‑annotated question–answer pairs, spanning diverse musical traditions and styles. Grounded in musicology and music psychology, MusiQAl emphasizes multimodal reasoning, causal inference, and cross‑cultural understanding of performer–music interaction. We benchmark AVST and LAVISH architectures on MusiQAl, revealing strengths and limitations, underscoring the importance of integrating multimodal learning and domain expertise to advance MQA and music information retrieval.
id doaj-art-300fedf2550d4e8e9a30af8d1c67d1b6
institution Kabale University
issn 2514-3298
spelling doaj-art-300fedf2550d4e8e9a30af8d1c67d1b6 2025-08-21T12:49:42Z
volume 8, issue 1, pages 265–282
doi 10.5334/tismir.222
affiliation Anna-Maria Christodoulou: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion; Department of Musicology, University of Oslo
Kyrre Glette: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion; Department of Informatics, University of Oslo
Olivier Lartillot: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion; Department of Musicology, University of Oslo
Alexander Refsum Jensenius: RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion; Department of Musicology, University of Oslo
topic multimodal music processing
mir
dataset
audio–video
question–answering