MS2Lipid: A Lipid Subclass Prediction Program Using Machine Learning and Curated Tandem Mass Spectral Data
<b>Background</b>: Untargeted lipidomics using collision-induced dissociation-based tandem mass spectrometry (CID-MS/MS) is essential for biological and clinical applications. However, annotation confidence still relies on manual curation by analytical chemists, despite the development o...
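| Main Authors: | Nami Sakamoto, Takaki Oka, Yuki Matsuzawa, Kozo Nishida, Jayashankar Jayaprakash, Aya Hori, Makoto Arita, Hiroshi Tsugawa |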
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Metabolites |
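| Subjects: | untargeted lipidomics; tandem mass spectrum; machine learning; lipid class prediction; microbiota-dependent lipids; human fecal samples |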
| Online Access: | https://www.mdpi.com/2218-1989/14/11/602 |
| _version_ | 1850227135363416064 |
|---|---|
| author | Nami Sakamoto, Takaki Oka, Yuki Matsuzawa, Kozo Nishida, Jayashankar Jayaprakash, Aya Hori, Makoto Arita, Hiroshi Tsugawa |
| author_sort | Nami Sakamoto |
| collection | DOAJ |
| description | <b>Background</b>: Untargeted lipidomics using collision-induced dissociation-based tandem mass spectrometry (CID-MS/MS) is essential for biological and clinical applications. However, annotation confidence still relies on manual curation by analytical chemists, despite the development of various software tools for automatic spectral processing based on rule-based fragment annotations. <b>Methods</b>: In this study, we present a novel machine learning model, MS2Lipid, for the prediction of known lipid subclasses from MS/MS queries, providing an orthogonal approach to existing lipidomics software programs in determining the lipid subclass of ion features. We designed a new descriptor, MCH (mode of carbon and hydrogen), to increase the specificity of lipid subclass prediction in nominal mass resolution MS data. <b>Results</b>: The model, trained with 6760 and 6862 manually curated MS/MS spectra for the positive and negative ion modes, respectively, classified queries into one or several of 97 lipid subclasses, achieving an accuracy of 97.4% in the test set. The program was further validated using various datasets from different instruments and curators, with the average accuracy exceeding 87.2%. Using an integrated approach with molecular spectral networking, we demonstrated the utility of MS2Lipid by annotating microbiota-derived esterified bile acids, whose abundance was significantly increased in fecal samples of obese patients in a human cohort study. This suggests that the machine learning model provides an independent criterion for lipid subclass classification, enhancing the annotation of lipid metabolites within known lipid classes. <b>Conclusions</b>: MS2Lipid is a highly accurate machine learning model that enhances lipid subclass annotation from MS/MS data and provides an independent criterion. |
| format | Article |
| id | doaj-art-baa38db382774dd0b034dd8651154549 |
| institution | OA Journals |
| issn | 2218-1989 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Metabolites |
| spelling | doaj-art-baa38db382774dd0b034dd8651154549; 2025-08-20T02:04:54Z; eng; MDPI AG; Metabolites; ISSN 2218-1989; 2024-11-01; vol. 14, no. 11, article 602; DOI 10.3390/metabo14110602; MS2Lipid: A Lipid Subclass Prediction Program Using Machine Learning and Curated Tandem Mass Spectral Data; Authors: Nami Sakamoto (0), Takaki Oka (1), Yuki Matsuzawa (2), Kozo Nishida (3), Jayashankar Jayaprakash (4), Aya Hori (5), Makoto Arita (6), Hiroshi Tsugawa (7); Affiliations: (0)-(3), (7) Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo 184-8588, Japan; (4) Graduate School of Global Food Resources, Hokkaido University, Kita-9, Nishi-9, Kita-ku, Sapporo 060-0809, Japan; (5)-(6) Laboratory for Metabolomics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Kanagawa, Japan; https://www.mdpi.com/2218-1989/14/11/602; Keywords: untargeted lipidomics, tandem mass spectrum, machine learning, lipid class prediction, microbiota-dependent lipids, human fecal samples |
| title | MS2Lipid: A Lipid Subclass Prediction Program Using Machine Learning and Curated Tandem Mass Spectral Data |
| title_sort | ms2lipid a lipid subclass prediction program using machine learning and curated tandem mass spectral data |
| topic | untargeted lipidomics; tandem mass spectrum; machine learning; lipid class prediction; microbiota-dependent lipids; human fecal samples |
| url | https://www.mdpi.com/2218-1989/14/11/602 |