Music Similarity Detection Through Comparative Imagery Data

Bibliographic Details
Main Authors: Asli Saner, Min Chen
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Applied Sciences
Subjects: music; plagiarism; similarity detection; machine learning; CNN; ensemble model
Online Access: https://www.mdpi.com/2076-3417/15/14/7706
author Asli Saner
Min Chen
collection DOAJ
description In music, plagiarism has been an important but troubled issue, which becomes ever more critical with the widespread usage of generative AI tools. Meanwhile, the development of techniques for music similarity detection has been hampered by the scarcity of legally verified data on plagiarism. In this paper, we present a technical solution for training music similarity detection models through the use of comparative imagery data. With the aid of feature-based analysis and data visualization, we conducted experiments to analyze how different music features may contribute to the judgment of plagiarism. While the feature-based analysis guided us to focus on a subset of features, whose similarity is typically associated with music plagiarism, data visualization inspired us to train machine learning models using such comparative imagery instead of using audio signals directly. We trained feature-based sub-models (convolutional neural networks) using imagery data and an ensemble model with Bayesian interpretation for combining the predictions of the sub-models. We tested the trained model with legally verified data as well as AI-generated music, confirming that the models produced with our approach can detect similarity patterns which are typically associated with music plagiarism. Furthermore, using imagery data as the input and output of an ML model has been proven to facilitate explainable AI.
id doaj-art-411edefec95f42f587f4e6537d096248
institution DOAJ
issn 2076-3417
spelling Asli Saner (Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK); Min Chen (Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK). Music Similarity Detection Through Comparative Imagery Data. Applied Sciences, vol. 15, no. 14, article 7706. MDPI AG, 2025-07-01. ISSN 2076-3417. DOI: 10.3390/app15147706. Keywords: music; plagiarism; similarity detection; machine learning; CNN; ensemble model.
title Music Similarity Detection Through Comparative Imagery Data
topic music
plagiarism
similarity detection
machine learning
CNN
ensemble model