Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds


Bibliographic Details
Main Authors: Huibin Wang, Zehui Wang, Minghua Shi, Zixian Cheng, Ying Qian
Format: Article
Language:English
Published: MDPI AG 2025-03-01
Series:Molecules
Subjects:
Online Access:https://www.mdpi.com/1420-3049/30/6/1262
_version_ 1849342852114415616
author Huibin Wang
Zehui Wang
Minghua Shi
Zixian Cheng
Ying Qian
author_facet Huibin Wang
Zehui Wang
Minghua Shi
Zixian Cheng
Ying Qian
author_sort Huibin Wang
collection DOAJ
description Generating new drug-like molecules is an essential aspect of drug discovery, and deep learning models significantly accelerate this process. Language models have demonstrated great potential in generating novel and realistic SMILES representations of molecules. Molecular scaffolds, which serve as the key structural foundation, can facilitate language models in discovering chemically feasible and biologically relevant molecules. However, directly using scaffolds as prior inputs can introduce bias, thereby limiting the exploration of novel molecules. To combine the above advantages and address this limitation, we incorporate molecular scaffold information into language models via an <b>O</b>nline knowledge distillation framework for the unconditional <b>M</b>olecule <b>G</b>eneration task (<b>OMG</b>), which consists of a GPT model that generates SMILES strings of molecules from scratch and a Transformer model that generates SMILES strings of molecules from scaffolds. The knowledge of scaffolds and complete molecular structures is deeply integrated through the mutual learning of the two models. Experimental results on two well-known molecule generation benchmarks show that the OMG framework enhances both the validity and novelty of the GPT-based unconditional molecule generation model. Furthermore, comprehensive property-specific evaluation results indicate that the generated molecules achieve a favorable balance across multiple chemical properties and biological activity, demonstrating the potential of our method in discovering viable drug candidates.
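The abstract's core mechanism, online (mutual) knowledge distillation, can be illustrated with a minimal pure-Python sketch: each model's next-token distribution over the SMILES vocabulary is pulled toward its peer's via a KL-divergence term added to its own training loss. All function names and the toy logits below are illustrative assumptions, not code from the paper.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_distillation_losses(logits_gpt, logits_scaffold):
    """In online KD, each model is trained on its own task loss plus a KL
    term that pulls its next-token distribution toward its peer's."""
    p = softmax(logits_gpt)       # unconditional GPT's next-token distribution
    q = softmax(logits_scaffold)  # scaffold-conditioned Transformer's distribution
    loss_gpt_kd = kl_divergence(q, p)       # peer signal for the GPT
    loss_scaffold_kd = kl_divergence(p, q)  # peer signal for the Transformer
    return loss_gpt_kd, loss_scaffold_kd

# Toy next-token logits over a 4-symbol SMILES vocabulary, e.g. ["C", "c", "O", ")"]
l1, l2 = mutual_distillation_losses([2.0, 1.0, 0.1, -1.0], [1.5, 1.2, 0.3, -0.5])
print(l1 >= 0 and l2 >= 0)  # KL divergence is non-negative
```

In practice these KL terms would be computed per token position and added to each model's cross-entropy loss, so the two networks regularize each other during joint training rather than using a fixed, pre-trained teacher.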
format Article
id doaj-art-996359d3cfdc47018b6655ff26669879
institution Kabale University
issn 1420-3049
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj-art-996359d3cfdc47018b6655ff266698792025-08-20T03:43:14ZengMDPI AGMolecules1420-30492025-03-01306126210.3390/molecules30061262Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of ScaffoldsHuibin Wang0Zehui Wang1Minghua Shi2Zixian Cheng3Ying Qian4Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, Shanghai 200062, ChinaShanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, Shanghai 200062, ChinaShanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, Shanghai 200062, ChinaShanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, Shanghai 200062, ChinaShanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, Shanghai 200062, ChinaGenerating new drug-like molecules is an essential aspect of drug discovery, and deep learning models significantly accelerate this process. Language models have demonstrated great potential in generating novel and realistic SMILES representations of molecules. Molecular scaffolds, which serve as the key structural foundation, can facilitate language models in discovering chemically feasible and biologically relevant molecules. However, directly using scaffolds as prior inputs can introduce bias, thereby limiting the exploration of novel molecules. 
To combine the above advantages and address this limitation, we incorporate molecular scaffold information into language models via an <b>O</b>nline knowledge distillation framework for the unconditional <b>M</b>olecule <b>G</b>eneration task (<b>OMG</b>), which consists of a GPT model that generates SMILES strings of molecules from scratch and a Transformer model that generates SMILES strings of molecules from scaffolds. The knowledge of scaffolds and complete molecular structures is deeply integrated through the mutual learning of the two models. Experimental results on two well-known molecule generation benchmarks show that the OMG framework enhances both the validity and novelty of the GPT-based unconditional molecule generation model. Furthermore, comprehensive property-specific evaluation results indicate that the generated molecules achieve a favorable balance across multiple chemical properties and biological activity, demonstrating the potential of our method in discovering viable drug candidates.https://www.mdpi.com/1420-3049/30/6/1262unconditional molecule generationscaffold-based molecule generationlanguage modelonline knowledge distillation
spellingShingle Huibin Wang
Zehui Wang
Minghua Shi
Zixian Cheng
Ying Qian
Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds
Molecules
unconditional molecule generation
scaffold-based molecule generation
language model
online knowledge distillation
title Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds
title_full Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds
title_fullStr Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds
title_full_unstemmed Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds
title_short Enhancing Unconditional Molecule Generation via Online Knowledge Distillation of Scaffolds
title_sort enhancing unconditional molecule generation via online knowledge distillation of scaffolds
topic unconditional molecule generation
scaffold-based molecule generation
language model
online knowledge distillation
url https://www.mdpi.com/1420-3049/30/6/1262
work_keys_str_mv AT huibinwang enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds
AT zehuiwang enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds
AT minghuashi enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds
AT zixiancheng enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds
AT yingqian enhancingunconditionalmoleculegenerationviaonlineknowledgedistillationofscaffolds