Developing deep learning-based large-scale organic reaction classification model via sigma-profiles

Bibliographic Details
Main Authors: Wenlong Wang, Chenyang Xu, Jian Du, Lei Zhang
Format: Article
Language: English
Published: KeAi Communications Co. Ltd. 2025-06-01
Series: Green Chemical Engineering
Subjects: Reaction classification; Reaction fingerprint; Sigma-profiles; Deep learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2666952824000396
_version_ 1850036406619996160
author Wenlong Wang
Chenyang Xu
Jian Du
Lei Zhang
collection DOAJ
description Advanced technologies such as deep learning have accelerated the discovery of novel chemical reactions, especially in organic synthesis. With hundreds of thousands of reactions available for reference, one effective way to leverage them is to classify reactions into clusters based on their specific characteristics, which makes target-guided navigation of the vast chemical space possible. Although previous attempts to apply deep learning to reaction classification have made substantial progress, developing a model that combines good interpretability with high accuracy for large-scale reaction classification remains an open question. In this work, a deep learning-based model for large-scale reaction classification is first constructed using a pre-trained BERT model and an autoencoder. The model is then trained on the open-source USPTO_TPL dataset, which contains recorded reactions of up to 1000 different types. The multi-class classification accuracy of the model on the test set is 99.382%, showing its potential for practical use. In addition, a reaction similarity map is presented that relates the reactions in the USPTO_TPL dataset based on statistical features derived from their sigma-profiles. Finally, representative reactions from the test set are provided to illustrate the model's effectiveness on the reaction classification task.
format Article
id doaj-art-dcacaa6efed5414d90f7da240ffbdf7a
institution DOAJ
issn 2666-9528
language English
publishDate 2025-06-01
publisher KeAi Communications Co. Ltd.
record_format Article
series Green Chemical Engineering
spelling doaj-art-dcacaa6efed5414d90f7da240ffbdf7a
2025-08-20T02:57:08Z
eng
KeAi Communications Co. Ltd.
Green Chemical Engineering
2666-9528
2025-06-01
Volume 6, Issue 2, pages 181-192
10.1016/j.gce.2024.06.003
Developing deep learning-based large-scale organic reaction classification model via sigma-profiles
Wenlong Wang (0): State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian, 116024, China; Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore, 117585, Singapore
Chenyang Xu (1): State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian, 116024, China
Jian Du (2): State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian, 116024, China
Lei Zhang (3): State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian, 116024, China; Corresponding author.
http://www.sciencedirect.com/science/article/pii/S2666952824000396
Reaction classification
Reaction fingerprint
Sigma-profiles
Deep learning
title Developing deep learning-based large-scale organic reaction classification model via sigma-profiles
topic Reaction classification
Reaction fingerprint
Sigma-profiles
Deep learning
url http://www.sciencedirect.com/science/article/pii/S2666952824000396
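
The description mentions a reaction similarity map built from sigma-profile-based statistical features. The record does not say which features or which similarity measure the authors use, so the following is only a minimal sketch under stated assumptions: a sigma-profile is treated as a weighted histogram over charge-density bins, the features are its first four standardized moments, a reaction is represented by the difference between summed product and reactant features, and similarity is the cosine of two feature vectors. All function names here are hypothetical.

```python
import numpy as np


def sigma_profile_features(sigma, p):
    """Mean, standard deviation, skewness and kurtosis of a sigma-profile.

    sigma: bin centres of the screening charge density.
    p:     non-negative profile values for each bin.
    Assumption: these four moments stand in for the unspecified
    "statistical features" named in the abstract.
    """
    sigma = np.asarray(sigma, dtype=float)
    p = np.asarray(p, dtype=float)
    w = p / p.sum()                       # normalise to a probability weight
    mean = np.sum(w * sigma)
    std = np.sqrt(np.sum(w * (sigma - mean) ** 2))
    # Note: a profile concentrated in a single bin has std == 0 and
    # would need special handling; this sketch does not cover that case.
    skew = np.sum(w * (sigma - mean) ** 3) / std ** 3
    kurt = np.sum(w * (sigma - mean) ** 4) / std ** 4
    return np.array([mean, std, skew, kurt])


def reaction_features(sigma, reactant_profiles, product_profiles):
    """Assumed reaction descriptor: product features minus reactant features."""
    products = sum(sigma_profile_features(sigma, p) for p in product_profiles)
    reactants = sum(sigma_profile_features(sigma, p) for p in reactant_profiles)
    return products - reactants


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Pairwise values of `cosine_similarity` over many reactions could then feed a clustering or 2-D embedding step to draw a map; that step is left out here because the record gives no detail on it.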