Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds
Generating new drug-like molecules is an essential aspect of drug discovery, and deep learning models significantly accelerate this process. Language models have demonstrated great potential in generating novel and realistic SMILES representations of molecules. Molecular scaffolds, which serve as the key structural foundation, can facilitate language models in discovering chemically feasible and biologically relevant molecules.
Saved in:
| Main Authors: | Huibin Wang, Zehui Wang, Minghua Shi, Zixian Cheng, Ying Qian |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Molecules |
| Subjects: | unconditional molecule generation; scaffold-based molecule generation; language model; online knowledge distillation |
| Online Access: | https://www.mdpi.com/1420-3049/30/6/1262 |
| _version_ | 1849342852114415616 |
|---|---|
| author | Huibin Wang Zehui Wang Minghua Shi Zixian Cheng Ying Qian |
| author_facet | Huibin Wang Zehui Wang Minghua Shi Zixian Cheng Ying Qian |
| author_sort | Huibin Wang |
| collection | DOAJ |
| description | Generating new drug-like molecules is an essential aspect of drug discovery, and deep learning models significantly accelerate this process. Language models have demonstrated great potential in generating novel and realistic SMILES representations of molecules. Molecular scaffolds, which serve as the key structural foundation, can facilitate language models in discovering chemically feasible and biologically relevant molecules. However, directly using scaffolds as prior inputs can introduce bias, thereby limiting the exploration of novel molecules. To combine the above advantages and address the limitation, we incorporate molecular scaffold information into language models via an <b>O</b>nline knowledge distillation framework for the unconditional <b>M</b>olecule <b>G</b>eneration task (<b>OMG</b>), which consists of a GPT model that generates SMILES strings of molecules from scratch and a Transformer model that generates SMILES strings of molecules from scaffolds. The knowledge of scaffolds and complete molecular structures is deeply integrated through the mutual learning of the two models. Experimental results on two well-known molecule generation benchmarks show that the OMG framework enhances both the validity and novelty of the GPT-based unconditional molecule generation model. Furthermore, comprehensive property-specific evaluation results indicate that the generated molecules achieve a favorable balance across multiple chemical properties and biological activity, demonstrating the potential of our method in discovering viable drug candidates. |
| format | Article |
| id | doaj-art-996359d3cfdc47018b6655ff26669879 |
| institution | Kabale University |
| issn | 1420-3049 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Molecules |
| spelling | doaj-art-996359d3cfdc47018b6655ff26669879 2025-08-20T03:43:14Z eng MDPI AG Molecules 1420-3049 2025-03-01, vol. 30, no. 6, art. 1262, doi:10.3390/molecules30061262. Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds. Huibin Wang, Zehui Wang, Minghua Shi, Zixian Cheng, Ying Qian (all: Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, Shanghai 200062, China). https://www.mdpi.com/1420-3049/30/6/1262. Keywords: unconditional molecule generation; scaffold-based molecule generation; language model; online knowledge distillation |
| spellingShingle | Huibin Wang Zehui Wang Minghua Shi Zixian Cheng Ying Qian Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds Molecules unconditional molecule generation scaffold-based molecule generation language model online knowledge distillation |
| title | Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds |
| title_full | Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds |
| title_fullStr | Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds |
| title_full_unstemmed | Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds |
| title_short | Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds |
| title_sort | enhancing unconditional molecule generation via online knowledge distillation of scaffolds |
| topic | unconditional molecule generation; scaffold-based molecule generation; language model; online knowledge distillation |
| url | https://www.mdpi.com/1420-3049/30/6/1262 |
| work_keys_str_mv | AT huibinwang enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds AT zehuiwang enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds AT minghuashi enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds AT zixiancheng enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds AT yingqian enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds |
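The abstract describes mutual learning between two models: each is trained on the hard SMILES tokens while also being pulled toward its peer's softened output distribution. Below is a minimal numpy sketch of such a per-model online-distillation loss. All names (`mutual_learning_losses`, `alpha`, `tau`) and the specific loss weighting are illustrative assumptions — the record does not state the paper's actual formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # Mean KL(p || q) over token positions.
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def mutual_learning_losses(logits_a, logits_b, targets, alpha=0.5, tau=2.0):
    """Hypothetical per-model loss for online (mutual) distillation:
    hard-label cross-entropy plus a temperature-softened KL term that
    pulls each model toward its peer's token distribution."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    sa, sb = softmax(logits_a / tau), softmax(logits_b / tau)  # softened
    n = np.arange(targets.shape[0])
    ce_a = -np.mean(np.log(pa[n, targets] + 1e-12))
    ce_b = -np.mean(np.log(pb[n, targets] + 1e-12))
    loss_a = ce_a + alpha * (tau ** 2) * kl_div(sb, sa)  # peer B teaches A
    loss_b = ce_b + alpha * (tau ** 2) * kl_div(sa, sb)  # peer A teaches B
    return loss_a, loss_b
```

In a training loop, `logits_a` would come from the unconditional GPT and `logits_b` from the scaffold-conditioned Transformer over a shared SMILES vocabulary; both losses are backpropagated each step, so the two models teach each other online rather than distilling from a frozen teacher.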