MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion
Music question–answering (MQA) is a machine learning task where a computational system analyzes and answers questions about music‑related data. Traditional methods prioritize audio, overlooking visual and embodied aspects crucial to music performance understanding. We introduce MusiQAl, a multimodal...
| Main Authors: | Anna-Maria Christodoulou, Kyrre Glette, Olivier Lartillot, Alexander Refsum Jensenius |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Ubiquity Press, 2025-07-01 |
| Series: | Transactions of the International Society for Music Information Retrieval |
| Subjects: | multimodal music processing; mir; dataset; audio–video; question–answering |
| Online Access: | https://account.transactions.ismir.net/index.php/up-j-tismir/article/view/222 |
| _version_ | 1849229999432794112 |
|---|---|
| author | Anna-Maria Christodoulou; Kyrre Glette; Olivier Lartillot; Alexander Refsum Jensenius |
| author_facet | Anna-Maria Christodoulou; Kyrre Glette; Olivier Lartillot; Alexander Refsum Jensenius |
| author_sort | Anna-Maria Christodoulou |
| collection | DOAJ |
| description | Music question–answering (MQA) is a machine learning task where a computational system analyzes and answers questions about music‑related data. Traditional methods prioritize audio, overlooking visual and embodied aspects crucial to music performance understanding. We introduce MusiQAl, a multimodal dataset of 310 music performance videos and 11,793 human‑annotated question–answer pairs, spanning diverse musical traditions and styles. Grounded in musicology and music psychology, MusiQAl emphasizes multimodal reasoning, causal inference, and cross‑cultural understanding of performer–music interaction. We benchmark AVST and LAVISH architectures on MusiQAl, revealing strengths and limitations, underscoring the importance of integrating multimodal learning and domain expertise to advance MQA and music information retrieval. |
| format | Article |
| id | doaj-art-300fedf2550d4e8e9a30af8d1c67d1b6 |
| institution | Kabale University |
| issn | 2514-3298 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Ubiquity Press |
| record_format | Article |
| series | Transactions of the International Society for Music Information Retrieval |
| spelling | doaj-art-300fedf2550d4e8e9a30af8d1c67d1b6; 2025-08-21T12:49:42Z; eng; Ubiquity Press; Transactions of the International Society for Music Information Retrieval; 2514-3298; 2025-07-01; vol. 8, no. 1, pp. 265–282; 10.5334/tismir.222; MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion; Anna-Maria Christodoulou (RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion; Department of Musicology, University of Oslo); Kyrre Glette (RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion; Department of Informatics, University of Oslo); Olivier Lartillot (RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion; Department of Musicology, University of Oslo); Alexander Refsum Jensenius (RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion; Department of Musicology, University of Oslo); Music question–answering (MQA) is a machine learning task where a computational system analyzes and answers questions about music‑related data. Traditional methods prioritize audio, overlooking visual and embodied aspects crucial to music performance understanding. We introduce MusiQAl, a multimodal dataset of 310 music performance videos and 11,793 human‑annotated question–answer pairs, spanning diverse musical traditions and styles. Grounded in musicology and music psychology, MusiQAl emphasizes multimodal reasoning, causal inference, and cross‑cultural understanding of performer–music interaction. We benchmark AVST and LAVISH architectures on MusiQAl, revealing strengths and limitations, underscoring the importance of integrating multimodal learning and domain expertise to advance MQA and music information retrieval.; https://account.transactions.ismir.net/index.php/up-j-tismir/article/view/222; multimodal music processing; mir; dataset; audio–video; question–answering |
| spellingShingle | Anna-Maria Christodoulou Kyrre Glette Olivier Lartillot Alexander Refsum Jensenius MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion Transactions of the International Society for Music Information Retrieval multimodal music processing mir dataset audio–video question–answering |
| title | MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion |
| title_full | MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion |
| title_fullStr | MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion |
| title_full_unstemmed | MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion |
| title_short | MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion |
| title_sort | musiqal a dataset for music question answering through audio video fusion |
| topic | multimodal music processing; mir; dataset; audio–video; question–answering |
| url | https://account.transactions.ismir.net/index.php/up-j-tismir/article/view/222 |
| work_keys_str_mv | AT annamariachristodoulou musiqaladatasetformusicquestionansweringthroughaudiovideofusion AT kyrreglette musiqaladatasetformusicquestionansweringthroughaudiovideofusion AT olivierlartillot musiqaladatasetformusicquestionansweringthroughaudiovideofusion AT alexanderrefsumjensenius musiqaladatasetformusicquestionansweringthroughaudiovideofusion |