Screening of multi deep learning-based de novo molecular generation models and their application for specific target molecular generation

Bibliographic Details
Main Authors: Yishu Wang, Mengyao Guo, Xiaomin Chen, Dongmei Ai
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-86840-z
Collection: DOAJ
Abstract: Traditional virtual screening methods must explore vast chemical spaces and depend on existing chemical libraries. Deep learning techniques for de novo molecular generation, also known as inverse molecular design, have been applied increasingly widely and have brought revolutionary changes to de novo molecular generation research. In particular, the transformer, a natural language processing (NLP) architecture, has raised the state-of-the-art performance of existing AI technologies. In this study, we modified a top-performing molecular generation model based on the generative pretraining transformer (GPT) architecture in three directions. We also propose an integrated end-to-end neural network learning framework built on a complete encoder-decoder transformer model, the Text-to-Text Transfer Transformer (T5). The framework learns an embedding space for conditional molecular properties and uses it to encode and guide the vector representations of SMILES sequences; the final decoder block produces a softmax output trained with a maximum-likelihood objective. Finally, we evaluated these NLP-based generation models alongside a newer architecture based on a selective state space model (Mamba), selected the best approach, and combined it with a transfer learning strategy for de novo drug discovery targeting L858R/T790M/C797S-mutant EGFR in non-small cell lung cancer (NSCLC).
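The abstract describes a decoder whose final block emits a softmax over SMILES tokens, trained with a maximum-likelihood objective and guided by an embedding of conditional molecular properties. The toy sketch below illustrates only those two ideas and is not the authors' T5 implementation. Everything in it is a hypothetical assumption: the four-token `VOCAB`, the helper `conditioned_logits` (which adds the property embedding through its own weight matrix instead of using encoder cross-attention), and the hand-set weights. `sequence_nll` computes the teacher-forced negative log-likelihood, whose minimization is what "maximum likelihood objective" refers to.

```python
import math

# Hypothetical toy vocabulary of SMILES tokens (not the paper's tokenizer).
VOCAB = ["C", "O", "N", "<eos>"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conditioned_logits(hidden, prop_emb, w_hidden, w_prop):
    """Toy output layer: vocabulary logits from a decoder state plus a property embedding.

    The property embedding is added through its own weight matrix here, a
    simplification; an encoder-decoder such as T5 would pass the condition
    through the encoder and into the decoder via cross-attention instead.
    """
    return [
        sum(w * h for w, h in zip(w_hidden[v], hidden))
        + sum(w * p for w, p in zip(w_prop[v], prop_emb))
        for v in range(len(w_hidden))
    ]


def sequence_nll(step_logits, target_ids):
    """Teacher-forced negative log-likelihood of a token sequence.

    step_logits[t] holds the vocabulary logits at step t and target_ids[t] is the
    index of the true token; minimizing this sum maximizes the sequence likelihood.
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need exactly one logit vector per target token")
    return -sum(math.log(softmax(logits)[t]) for logits, t in zip(step_logits, target_ids))


if __name__ == "__main__":
    # Uniform logits give each of the 4 tokens probability 1/4 at every step.
    target = [VOCAB.index(tok) for tok in ["C", "O", "<eos>"]]
    uniform = [[0.0] * len(VOCAB) for _ in target]
    print(round(sequence_nll(uniform, target), 4))  # 3 * ln 4 -> 4.1589

    # Hand-set weights: a nonzero property embedding boosts the logit of "O".
    w_hidden = [[0.5, 0.0] for _ in VOCAB]
    w_prop = [[0.0], [2.0], [0.0], [0.0]]
    plain = conditioned_logits([1.0, 0.0], [0.0], w_hidden, w_prop)
    guided = conditioned_logits([1.0, 0.0], [1.0], w_hidden, w_prop)
    print(round(softmax(plain)[1], 4), round(softmax(guided)[1], 4))  # 0.25 0.7112
```

With the property embedding switched on, the probability assigned to "O" rises from 0.25 to about 0.71. This mirrors, in miniature, how a learned condition can steer which SMILES tokens the decoder favors. The additive conditioning was chosen only to keep the sketch dependency-free.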
ISSN: 2045-2322
Affiliation: School of Mathematics and Physics, University of Science and Technology Beijing (all four authors)
Subjects: Generative pretraining transformer (GPT), T5, NSCLC, Mamba, Transfer learning, RoPE