Developing deep learning-based large-scale organic reaction classification model via sigma-profiles
Advanced technologies like deep learning have accelerated the discovery of novel chemical reactions, especially in the field of organic synthesis. With hundreds of thousands of reactions available for reference, one way to effectively leverage them is by classifying chemical reactions into different...
| Main Authors: | Wenlong Wang, Chenyang Xu, Jian Du, Lei Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications Co. Ltd., 2025-06-01 |
| Series: | Green Chemical Engineering |
| Subjects: | Reaction classification; Reaction fingerprint; Sigma-profiles; Deep learning |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2666952824000396 |
| _version_ | 1850036406619996160 |
|---|---|
| author | Wenlong Wang Chenyang Xu Jian Du Lei Zhang |
| author_facet | Wenlong Wang Chenyang Xu Jian Du Lei Zhang |
| author_sort | Wenlong Wang |
| collection | DOAJ |
| description | Advanced technologies like deep learning have accelerated the discovery of novel chemical reactions, especially in the field of organic synthesis. With hundreds of thousands of reactions available for reference, one way to effectively leverage them is by classifying chemical reactions into different clusters based on their specific characteristics, which makes target-guided navigation in the vast chemical space possible. Although previous attempts to apply deep learning to reaction classification tasks have made substantial progress, developing a model with good interpretability as well as high accuracy for large-scale reaction classification tasks remains an open question. In this work, a deep learning-based model for a large-scale reaction classification task is first constructed using a pre-trained BERT model and an autoencoder. Then, the model is trained on the open-source dataset USPTO_TPL, which contains recorded reactions of up to 1000 different types. The multi-classification accuracy of the model on the testing dataset is 99.382%, showing its great potential for practical use. In addition, a reaction similarity map is presented to correlate the reactions in the USPTO_TPL dataset based on their sigma-profile-based statistical features. Finally, representative reactions from the testing dataset are provided to illustrate the model's effectiveness on the reaction classification task. |
| format | Article |
| id | doaj-art-dcacaa6efed5414d90f7da240ffbdf7a |
| institution | DOAJ |
| issn | 2666-9528 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | KeAi Communications Co. Ltd. |
| record_format | Article |
| series | Green Chemical Engineering |
| spelling | doaj-art-dcacaa6efed5414d90f7da240ffbdf7a 2025-08-20T02:57:08Z eng KeAi Communications Co. Ltd. Green Chemical Engineering 2666-9528 2025-06-01 6 2 181 192 10.1016/j.gce.2024.06.003 Developing deep learning-based large-scale organic reaction classification model via sigma-profiles Wenlong Wang 0 Chenyang Xu 1 Jian Du 2 Lei Zhang 3 State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian, 116024, China; Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore, 117585, Singapore State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian, 116024, China State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian, 116024, China State Key Laboratory of Fine Chemical, Frontiers Science Center for Smart Materials Oriented Chemical Engineering, Institute of Chemical Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian, 116024, China; Corresponding author. Advanced technologies like deep learning have accelerated the discovery of novel chemical reactions, especially in the field of organic synthesis. With hundreds of thousands of reactions available for reference, one way to effectively leverage them is by classifying chemical reactions into different clusters based on their specific characteristics, which makes target-guided navigation in the vast chemical space possible. Although previous attempts to apply deep learning to reaction classification tasks have made substantial progress, developing a model with good interpretability as well as high accuracy for large-scale reaction classification tasks remains an open question. In this work, a deep learning-based model for a large-scale reaction classification task is first constructed using a pre-trained BERT model and an autoencoder. Then, the model is trained on the open-source dataset USPTO_TPL, which contains recorded reactions of up to 1000 different types. The multi-classification accuracy of the model on the testing dataset is 99.382%, showing its great potential for practical use. In addition, a reaction similarity map is presented to correlate the reactions in the USPTO_TPL dataset based on their sigma-profile-based statistical features. Finally, representative reactions from the testing dataset are provided to illustrate the model's effectiveness on the reaction classification task. http://www.sciencedirect.com/science/article/pii/S2666952824000396 Reaction classification Reaction fingerprint Sigma-profiles Deep learning |
| spellingShingle | Wenlong Wang Chenyang Xu Jian Du Lei Zhang Developing deep learning-based large-scale organic reaction classification model via sigma-profiles Green Chemical Engineering Reaction classification Reaction fingerprint Sigma-profiles Deep learning |
| title | Developing deep learning-based large-scale organic reaction classification model via sigma-profiles |
| title_full | Developing deep learning-based large-scale organic reaction classification model via sigma-profiles |
| title_fullStr | Developing deep learning-based large-scale organic reaction classification model via sigma-profiles |
| title_full_unstemmed | Developing deep learning-based large-scale organic reaction classification model via sigma-profiles |
| title_short | Developing deep learning-based large-scale organic reaction classification model via sigma-profiles |
| title_sort | developing deep learning based large scale organic reaction classification model via sigma profiles |
| topic | Reaction classification Reaction fingerprint Sigma-profiles Deep learning |
| url | http://www.sciencedirect.com/science/article/pii/S2666952824000396 |
| work_keys_str_mv | AT wenlongwang developingdeeplearningbasedlargescaleorganicreactionclassificationmodelviasigmaprofiles AT chenyangxu developingdeeplearningbasedlargescaleorganicreactionclassificationmodelviasigmaprofiles AT jiandu developingdeeplearningbasedlargescaleorganicreactionclassificationmodelviasigmaprofiles AT leizhang developingdeeplearningbasedlargescaleorganicreactionclassificationmodelviasigmaprofiles |