Towards an ‘Everything Corpus’: A Framework and Guidelines for the Curation of More Comprehensive Multimodal Music Data

Music information retrieval (MIR) is increasingly concerned with properly managing the complexity of musical data and the curation of high-quality multimodal datasets for use in a variety of computational tasks. This article presents (1) a conceptual framework for how practitioners interested in MIR...

Bibliographic Details
Main Authors: Mark Gotham, Brian Bemman, Igor Vatolkin
Format: Article
Language: English
Published: Ubiquity Press 2025-05-01
Series: Transactions of the International Society for Music Information Retrieval
Subjects: multimodal, music, information retrieval, dataset, evaluation, review
Online Access: https://account.transactions.ismir.net/index.php/up-j-tismir/article/view/228
author Mark Gotham
Brian Bemman
Igor Vatolkin
collection DOAJ
description Music information retrieval (MIR) is increasingly concerned with properly managing the complexity of musical data and the curation of high-quality multimodal datasets for use in a variety of computational tasks. This article presents (1) a conceptual framework for how practitioners interested in MIR, from musicians to scientists, can understand the multitude of modalities that constitute musical data and (2) a set of proposed guidelines for MIR researchers to consider when setting out to curate comprehensive, well-targeted, durable, and ethically sourced multimodal datasets. For (1), we identify 12 different themes of musical data divided into three sequential phases, further subdivided into five narrow focus areas: (i) ‘before’ the music (leading to), (ii) the ‘actual’ music (itself and around it), and (iii) ‘after’ the music (uses of and responses to). For (2), we identify 17 specific quantitative, qualitative, and ethical criteria, informed by this conceptual framework and practices observed in existing multimodal datasets, for the eventual construction of an ‘Everything Corpus’ for MIR research.
format Article
id doaj-art-8854e97f7c584b1fa02eaa8f6caab593
institution Kabale University
issn 2514-3298
language English
publishDate 2025-05-01
publisher Ubiquity Press
record_format Article
series Transactions of the International Society for Music Information Retrieval
spelling doaj-art-8854e97f7c584b1fa02eaa8f6caab593
2025-08-20T03:25:27Z
eng
Ubiquity Press
Transactions of the International Society for Music Information Retrieval
2514-3298
2025-05-01
8
1
70–92
10.5334/tismir.228
228
Towards an ‘Everything Corpus’: A Framework and Guidelines for the Curation of More Comprehensive Multimodal Music Data
Mark Gotham [0] https://orcid.org/0000-0003-0722-3074
Brian Bemman [1] https://orcid.org/0000-0001-7189-7896
Igor Vatolkin [2] https://orcid.org/0000-0002-9454-9402
[0] King’s College London, London
[1] Durham University, Durham
[2] RWTH Aachen University, Aachen
multimodal
music
information retrieval
dataset
evaluation
review
title Towards an ‘Everything Corpus’: A Framework and Guidelines for the Curation of More Comprehensive Multimodal Music Data
topic multimodal
music
information retrieval
dataset
evaluation
review