Discovering organic reactions with a machine-learning-powered deciphering of tera-scale mass spectrometry data

Abstract The accumulation of large datasets by the scientific community has surpassed the capacity of traditional processing methods, underscoring the critical need for innovative and efficient algorithms capable of navigating through extensive existing experimental data. Addressing this challenge,...

Bibliographic Details
Main Authors: Konstantin S. Kozlov, Daniil A. Boiko, Julia V. Burykina, Valentina V. Ilyushenkova, Alexander Y. Kostyukovich, Ekaterina D. Patil, Valentine P. Ananikov
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-56905-8
collection DOAJ
description Abstract The accumulation of large datasets by the scientific community has surpassed the capacity of traditional processing methods, underscoring the critical need for innovative and efficient algorithms capable of navigating through extensive existing experimental data. Addressing this challenge, our study introduces a machine learning (ML)-powered search engine specifically tailored for analyzing tera-scale high-resolution mass spectrometry (HRMS) data. This engine harnesses a novel isotope-distribution-centric search algorithm augmented by two synergistic ML models, assisting with the discovery of hitherto unknown chemical reactions. This methodology enables the rigorous investigation of existing data, thus providing efficient support for chemical hypotheses while reducing the need for conducting additional experiments. Moreover, we extend this approach with baseline methods for automated reaction hypothesis generation. In its practical validation, our approach successfully identified several reactions, unveiling previously undescribed transformations. Among these, the heterocycle-vinyl coupling process within the Mizoroki-Heck reaction stands out, highlighting the capability of the engine to elucidate complex chemical phenomena.
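As a rough illustration of what an isotope-distribution-centric search could look like, the sketch below computes a theoretical isotope pattern for a candidate formula and scores it against observed peaks. It is not the authors' algorithm: the function names (isotope_pattern, match_score), the 5 ppm tolerance, the cosine-similarity score, the merging of peaks by nominal mass, and the small isotope table are all assumptions made for this example. Charge state, adducts, and the electron mass are ignored, so neutral masses stand in for m/z.

```python
import math
from collections import defaultdict

# Approximate isotope masses (Da) and natural abundances; illustrative subset only.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.003355, 0.0107)],
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
    "N": [(14.003074, 0.99636), (15.000109, 0.00364)],
    "O": [(15.994915, 0.99757), (16.999132, 0.00038), (17.999160, 0.00205)],
    "Br": [(78.918338, 0.5069), (80.916291, 0.4931)],
}


def isotope_pattern(formula, min_prob=1e-6):
    """Return sorted (mass, probability) peaks for a formula such as {"C": 6, "H": 5}.

    Peaks are built by adding one atom at a time and merging contributions
    that share a nominal (integer-rounded) mass, which is adequate for small
    molecules but is a simplification of true fine isotopic structure.
    """
    peaks = {0: (0.0, 1.0)}  # nominal mass -> (mean exact mass, probability)
    for element, count in formula.items():
        for _ in range(count):
            merged = defaultdict(lambda: [0.0, 0.0])  # [sum(mass * prob), sum(prob)]
            for mass, prob in peaks.values():
                for iso_mass, abundance in ISOTOPES[element]:
                    m, p = mass + iso_mass, prob * abundance
                    acc = merged[round(m)]
                    acc[0] += m * p
                    acc[1] += p
            # Keep the probability-weighted mean mass and drop negligible peaks.
            peaks = {k: (mp / p, p) for k, (mp, p) in merged.items() if p >= min_prob}
    return sorted(peaks.values())


def match_score(theoretical, observed, tol_ppm=5.0):
    """Cosine similarity between theoretical abundances and matched intensities.

    `observed` is a list of (m/z, intensity) pairs. For every theoretical peak
    the most intense observed peak within the ppm tolerance is used, or 0.0
    if nothing falls inside the window.
    """
    matched = []
    for mass, _ in theoretical:
        tol = mass * tol_ppm * 1e-6
        hits = [inten for mz, inten in observed if abs(mz - mass) <= tol]
        matched.append(max(hits, default=0.0))
    dot = sum(p * i for (_, p), i in zip(theoretical, matched))
    norm_t = math.sqrt(sum(p * p for _, p in theoretical))
    norm_o = math.sqrt(sum(i * i for i in matched))
    return dot / (norm_t * norm_o) if norm_t and norm_o else 0.0


if __name__ == "__main__":
    # Bromobenzene: the two bromine isotopes give a near 1:1 doublet about 2 Da apart.
    pattern = isotope_pattern({"C": 6, "H": 5, "Br": 1})
    for mass, prob in pattern[:4]:
        print(f"{mass:.4f}  {prob:.4f}")
```

Scoring the whole pattern rather than a single peak is what makes a search of this kind selective: a lone mass match is easy to hit by chance, whereas a matching intensity profile across several isotope peaks is much harder to reproduce accidentally.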
id doaj-art-ac78c308794d4c5a8d91b7a02a11bc79
institution DOAJ
issn 2041-1723
affiliation Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences (all authors)
volume 16
issue 1
pages 1-12