An evaluation methodology for machine learning-based tandem mass spectra similarity prediction

Abstract Background Untargeted tandem mass spectrometry serves as a scalable solution for the organization of small molecules. One of the most prevalent techniques for analyzing the acquired tandem mass spectrometry data (MS/MS) - called molecular networking - organizes and visualizes putatively str...

Bibliographic Details
Main Authors:	Michael Strobel, Alberto Gil-de-la-Fuente, Mohammad Reza Zare Shahneh, Yasin El Abiead, Roman Bushuiev, Anton Bushuiev, Tomáš Pluskal, Mingxun Wang
Format:	Article
Language:	English
Published:	BMC 2025-07-01
Series:	BMC Bioinformatics
Subjects:	Mass spectrometry, Metabolomics, Spectral similarity measure, Machine learning, Benchmark
Online Access:	https://doi.org/10.1186/s12859-025-06194-1

author	Michael Strobel, Alberto Gil-de-la-Fuente, Mohammad Reza Zare Shahneh, Yasin El Abiead, Roman Bushuiev, Anton Bushuiev, Tomáš Pluskal, Mingxun Wang
collection	DOAJ
description	Abstract. Background: Untargeted tandem mass spectrometry serves as a scalable solution for the organization of small molecules. One of the most prevalent techniques for analyzing the acquired tandem mass spectrometry (MS/MS) data, called molecular networking, organizes and visualizes putatively structurally related compounds. However, a key bottleneck of this approach is the comparison of MS/MS spectra used to identify nearby structural neighbors. Machine learning (ML) approaches have emerged as a promising technique to predict structural similarity from MS/MS that may surpass the current state-of-the-art algorithmic methods. However, comparing these ML methods remains a challenge because standardization for benchmarking, evaluating, and comparing MS/MS similarity methods is lacking, and there are no methods that address data leakage between training and test data in order to analyze model generalizability.
Results: In this work, we present a new evaluation methodology using a train/test split that allows machine learning models to be evaluated at varying degrees of structural similarity between training and test sets. We also introduce a training and evaluation framework that measures prediction accuracy on domain-inspired annotation and retrieval metrics designed to mirror real-world applications. We further show how two alternative training methods that leverage MS-specific insights (e.g., similar instrumentation, collision energy, adduct) affect method performance, and we demonstrate the orthogonality of the proposed metrics. We especially highlight the role that collision energy plays in prediction errors. Finally, we release a continually updated version of our dataset online along with our data cleaning and splitting pipelines for community use.
Conclusion: It is our hope that this benchmark will serve as the basis of development for future machine learning approaches in MS/MS similarity and facilitate comparison between models. We anticipate that the introduced set of evaluation metrics allows for a better reflection of practical performance.
format	Article
id	doaj-art-aa840295040448f3b42d4595e15fda2e
institution	Kabale University
issn	1471-2105
language	English
publishDate	2025-07-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj-art-aa840295040448f3b42d4595e15fda2e; 2025-08-20T04:02:42Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-07-01; 26; 1; 1; 17; 10.1186/s12859-025-06194-1; An evaluation methodology for machine learning-based tandem mass spectra similarity prediction; Michael Strobel (Department of Computer Science and Engineering, University of California Riverside); Alberto Gil-de-la-Fuente (Information Technologies Department, Escuela Politécnica Superior, Universidad San Pablo-CEU, CEU Universities); Mohammad Reza Zare Shahneh (Department of Computer Science and Engineering, University of California Riverside); Yasin El Abiead (Skaggs School of Pharmacy and Pharmaceutical Science, University of California San Diego); Roman Bushuiev (Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences); Anton Bushuiev (Czech Institute of Informatics, Robotics and Cybernetics); Tomáš Pluskal (Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences); Mingxun Wang (Department of Computer Science and Engineering, University of California Riverside); https://doi.org/10.1186/s12859-025-06194-1; Mass spectrometry; Metabolomics; Spectral similarity measure; Machine learning; Benchmark
title	An evaluation methodology for machine learning-based tandem mass spectra similarity prediction
topic	Mass spectrometry, Metabolomics, Spectral similarity measure, Machine learning, Benchmark
url	https://doi.org/10.1186/s12859-025-06194-1


Similar Items