Towards an ‘Everything Corpus’: A Framework and Guidelines for the Curation of More Comprehensive Multimodal Music Data

Bibliographic Details
Main Authors: Mark Gotham, Brian Bemman, Igor Vatolkin
Format: Article
Language: English
Published: Ubiquity Press, 2025-05-01
Series: Transactions of the International Society for Music Information Retrieval
Online Access: https://account.transactions.ismir.net/index.php/up-j-tismir/article/view/228
Description
Summary: Music information retrieval (MIR) is increasingly concerned with properly managing the complexity of musical data and curating high-quality multimodal datasets for use in a variety of computational tasks. This article presents (1) a conceptual framework for how practitioners interested in MIR, from musicians to scientists, can understand the multitude of modalities that constitute musical data and (2) a set of proposed guidelines for MIR researchers to consider when setting out to curate comprehensive, well-targeted, durable, and ethically sourced multimodal datasets. For (1), we identify 12 different themes of musical data divided into three sequential phases, further subdivided into five narrow focus areas: (i) ‘before’ the music (leading to), (ii) the ‘actual’ music (itself and around it), and (iii) ‘after’ the music (uses of and responses to). For (2), we identify 17 specific quantitative, qualitative, and ethical criteria, informed by this conceptual framework and by practices observed in existing multimodal datasets, for the eventual construction of an ‘Everything Corpus’ for MIR research.
ISSN: 2514-3298