MS2Lipid: A Lipid Subclass Prediction Program Using Machine Learning and Curated Tandem Mass Spectral Data

Bibliographic Details
Main Authors: Nami Sakamoto, Takaki Oka, Yuki Matsuzawa, Kozo Nishida, Jayashankar Jayaprakash, Aya Hori, Makoto Arita, Hiroshi Tsugawa
Format: Article
Language: English
Published: MDPI AG 2024-11-01
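Series: Metabolites
Subjects: untargeted lipidomics; tandem mass spectrum; machine learning; lipid class prediction; microbiota-dependent lipids; human fecal samples
Online Access: https://www.mdpi.com/2218-1989/14/11/602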
author Nami Sakamoto
Takaki Oka
Yuki Matsuzawa
Kozo Nishida
Jayashankar Jayaprakash
Aya Hori
Makoto Arita
Hiroshi Tsugawa
author_sort Nami Sakamoto
collection DOAJ
description Background: Untargeted lipidomics using collision-induced dissociation-based tandem mass spectrometry (CID-MS/MS) is essential for biological and clinical applications. However, annotation confidence still relies on manual curation by analytical chemists, despite the development of various software tools for automatic spectral processing based on rule-based fragment annotations. Methods: In this study, we present a novel machine learning model, MS2Lipid, for the prediction of known lipid subclasses from MS/MS queries, providing an orthogonal approach to existing lipidomics software programs in determining the lipid subclass of ion features. We designed a new descriptor, MCH (mode of carbon and hydrogen), to increase the specificity of lipid subclass prediction in nominal mass resolution MS data. Results: The model, trained with 6760 and 6862 manually curated MS/MS spectra for the positive and negative ion modes, respectively, classified queries into one or several of 97 lipid subclasses, achieving an accuracy of 97.4% in the test set. The program was further validated using various datasets from different instruments and curators, with the average accuracy exceeding 87.2%. Using an integrated approach with molecular spectral networking, we demonstrated the utility of MS2Lipid by annotating microbiota-derived esterified bile acids, whose abundance was significantly increased in fecal samples of obese patients in a human cohort study. This suggests that the machine learning model provides an independent criterion for lipid subclass classification, enhancing the annotation of lipid metabolites within known lipid classes. Conclusions: MS2Lipid is a highly accurate machine learning model that enhances lipid subclass annotation from MS/MS data and provides an independent criterion.
format Article
id doaj-art-baa38db382774dd0b034dd8651154549
institution OA Journals
issn 2218-1989
language English
publishDate 2024-11-01
publisher MDPI AG
record_format Article
series Metabolites
spelling doaj-art-baa38db382774dd0b034dd8651154549 2025-08-20T02:04:54Z eng
MDPI AG, Metabolites, ISSN 2218-1989, 2024-11-01, vol. 14, no. 11, art. 602
doi 10.3390/metabo14110602
MS2Lipid: A Lipid Subclass Prediction Program Using Machine Learning and Curated Tandem Mass Spectral Data
Nami Sakamoto, Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
Takaki Oka, Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
Yuki Matsuzawa, Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
Kozo Nishida, Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
Jayashankar Jayaprakash, Graduate School of Global Food Resources, Hokkaido University, Kita-9, Nishi-9, Kita-ku, Sapporo 060-0809, Japan
Aya Hori, Laboratory for Metabolomics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Kanagawa, Japan
Makoto Arita, Laboratory for Metabolomics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Kanagawa, Japan
Hiroshi Tsugawa, Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan
title MS2Lipid: A Lipid Subclass Prediction Program Using Machine Learning and Curated Tandem Mass Spectral Data
topic untargeted lipidomics
tandem mass spectrum
machine learning
lipid class prediction
microbiota-dependent lipids
human fecal samples
url https://www.mdpi.com/2218-1989/14/11/602