CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions
Abstract Predicting EC numbers for chemical reactions enables efficient enzymatic annotations for computer-aided synthesis planning. However, conventional machine learning approaches encounter challenges due to data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based...
Main Authors: | Zishuo Zeng, Jin Guo, Jiao Jin, Xiaozhou Luo |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2025-01-01 |
Series: | Journal of Cheminformatics |
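Subjects: | Reaction EC number; Contrastive learning; Reaction embeddings; Metabolic model; Computer-aided synthesis planning |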
Online Access: | https://doi.org/10.1186/s13321-024-00944-8 |
_version_ | 1841544364620251136 |
---|---|
author | Zishuo Zeng; Jin Guo; Jiao Jin; Xiaozhou Luo |
author_facet | Zishuo Zeng; Jin Guo; Jiao Jin; Xiaozhou Luo |
author_sort | Zishuo Zeng |
collection | DOAJ |
description | Abstract: Predicting EC numbers for chemical reactions enables efficient enzymatic annotation for computer-aided synthesis planning. However, conventional machine learning approaches encounter challenges due to data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction’s EC), a novel framework leveraging contrastive learning, pre-trained language model-based reaction embeddings, and data augmentation to address these limitations. CLAIRE achieved weighted average F1 scores of 0.861 and 0.911 on the testing set (n = 18,816) and an independent dataset (n = 1040) derived from a yeast metabolic model, respectively. CLAIRE outperformed the state-of-the-art model by 3.65-fold and 1.18-fold on these two datasets, respectively. Its high accuracy positions CLAIRE as a promising tool for retrosynthesis planning, drug fate prediction, and synthetic biology applications. CLAIRE is freely available on GitHub ( https://github.com/zishuozeng/CLAIRE ). Scientific contribution: This work employs contrastive learning to predict the EC numbers of enzymatic reactions, overcoming the challenges of data scarcity and class imbalance. The new model achieves state-of-the-art performance and may facilitate computer-aided synthesis planning. |
format | Article |
id | doaj-art-72acf5f8d83d49678b1e525df58b7a56 |
institution | Kabale University |
issn | 1758-2946 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | Journal of Cheminformatics |
spelling | doaj-art-72acf5f8d83d49678b1e525df58b7a56 2025-01-12T12:37:27Z eng BMC Journal of Cheminformatics 1758-2946 2025-01-01 17119 10.1186/s13321-024-00944-8 CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions Zishuo Zeng (Synceres Biosciences Co. Ltd.); Jin Guo (Synceres Biosciences Co. Ltd.); Jiao Jin (Synceres Biosciences Co. Ltd.); Xiaozhou Luo (Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Key Laboratory of Quantitative Synthetic Biology, Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) https://doi.org/10.1186/s13321-024-00944-8 Reaction EC number; Contrastive learning; Reaction embeddings; Metabolic model; Computer-aided synthesis planning |
spellingShingle | Zishuo Zeng; Jin Guo; Jiao Jin; Xiaozhou Luo; CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions; Journal of Cheminformatics; Reaction EC number; Contrastive learning; Reaction embeddings; Metabolic model; Computer-aided synthesis planning |
title | CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions |
title_full | CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions |
title_fullStr | CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions |
title_full_unstemmed | CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions |
title_short | CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions |
title_sort | claire a contrastive learning based predictor for ec number of chemical reactions |
topic | Reaction EC number; Contrastive learning; Reaction embeddings; Metabolic model; Computer-aided synthesis planning |
url | https://doi.org/10.1186/s13321-024-00944-8 |
work_keys_str_mv | AT zishuozeng claireacontrastivelearningbasedpredictorforecnumberofchemicalreactions AT jinguo claireacontrastivelearningbasedpredictorforecnumberofchemicalreactions AT jiaojin claireacontrastivelearningbasedpredictorforecnumberofchemicalreactions AT xiaozhouluo claireacontrastivelearningbasedpredictorforecnumberofchemicalreactions |