Screening of multi deep learning-based de novo molecular generation models and their application for specific target molecular generation
Abstract Traditional virtual screening methods must explore vast chemical spaces and depend on existing chemical libraries. With the development of deep learning techniques for the de novo generation of molecules, also known as inverse molecular design, the increasingly wide...
Main Authors: | Yishu Wang, Mengyao Guo, Xiaomin Chen, Dongmei Ai |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-02-01 |
Series: | Scientific Reports |
Subjects: | Generative pretraining transformer (GPT); T5; NSCLC; Mamba; Transfer learning; RoPE |
Online Access: | https://doi.org/10.1038/s41598-025-86840-z |
_version_ | 1823862455978164224 |
---|---|
author | Yishu Wang; Mengyao Guo; Xiaomin Chen; Dongmei Ai |
author_facet | Yishu Wang; Mengyao Guo; Xiaomin Chen; Dongmei Ai |
author_sort | Yishu Wang |
collection | DOAJ |
description | Abstract Traditional virtual screening methods must explore vast chemical spaces and depend on existing chemical libraries. With the development of deep learning techniques for the de novo generation of molecules, also known as inverse molecular design, the increasingly widespread application of various types of deep learning algorithms has led to revolutionary changes in de novo molecular generation research. In particular, the emergence of a novel natural language processing (NLP) architecture called the transformer has improved the state-of-the-art performance of existing AI technologies. In this study, we modified one top-performing molecular generation model based on the generative pretraining transformer (GPT) architecture in three directions. Moreover, we propose an integrated end-to-end neural network learning framework based on a complete encoder-decoder transformer model, the Text-to-Text Transfer Transformer (T5), which learns an embedding representation of conditional molecular properties to encode and guide the vector representation of SMILES sequences; the final decoder block produces a softmax output (maximum likelihood objective). We also evaluated the performance of these NLP-based generation models and another new model architecture based on a selective state space, and selected the best approach, combined with a transfer learning strategy, for de novo drug discovery targeting L858R/T790M/C797S-mutant EGFR in non-small cell lung cancer. |
format | Article |
id | doaj-art-12fda47545d94287b3e5cae4f3b55414 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-12fda47545d94287b3e5cae4f3b55414; 2025-02-09T12:29:01Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-02-01; 15; 1; 1; 15; 10.1038/s41598-025-86840-z; Screening of multi deep learning-based de novo molecular generation models and their application for specific target molecular generation; Yishu Wang (0); Mengyao Guo (1); Xiaomin Chen (2); Dongmei Ai (3); School of Mathematics and Physics, University of Science and Technology Beijing; School of Mathematics and Physics, University of Science and Technology Beijing; School of Mathematics and Physics, University of Science and Technology Beijing; School of Mathematics and Physics, University of Science and Technology Beijing; Abstract Traditional virtual screening methods must explore vast chemical spaces and depend on existing chemical libraries. With the development of deep learning techniques for the de novo generation of molecules, also known as inverse molecular design, the increasingly widespread application of various types of deep learning algorithms has led to revolutionary changes in de novo molecular generation research. In particular, the emergence of a novel natural language processing (NLP) architecture called the transformer has improved the state-of-the-art performance of existing AI technologies. In this study, we modified one top-performing molecular generation model based on the generative pretraining transformer (GPT) architecture in three directions. Moreover, we propose an integrated end-to-end neural network learning framework based on a complete encoder-decoder transformer model, the Text-to-Text Transfer Transformer (T5), which learns an embedding representation of conditional molecular properties to encode and guide the vector representation of SMILES sequences; the final decoder block produces a softmax output (maximum likelihood objective). We also evaluated the performance of these NLP-based generation models and another new model architecture based on a selective state space, and selected the best approach, combined with a transfer learning strategy, for de novo drug discovery targeting L858R/T790M/C797S-mutant EGFR in non-small cell lung cancer.; https://doi.org/10.1038/s41598-025-86840-z; Generative pretraining transformer (GPT); T5; NSCLC; Mamba; Transfer learning; RoPE |
spellingShingle | Yishu Wang; Mengyao Guo; Xiaomin Chen; Dongmei Ai; Screening of multi deep learning-based de novo molecular generation models and their application for specific target molecular generation; Scientific Reports; Generative pretraining transformer (GPT); T5; NSCLC; Mamba; Transfer learning; RoPE |
title | Screening of multi deep learning-based de novo molecular generation models and their application for specific target molecular generation |
title_sort | screening of multi deep learning based de novo molecular generation models and their application for specific target molecular generation |
topic | Generative pretraining transformer (GPT); T5; NSCLC; Mamba; Transfer learning; RoPE |
url | https://doi.org/10.1038/s41598-025-86840-z |
work_keys_str_mv | AT yishuwang screeningofmultideeplearningbaseddenovomoleculargenerationmodelsandtheirapplicationforspecifictargetmoleculargeneration AT mengyaoguo screeningofmultideeplearningbaseddenovomoleculargenerationmodelsandtheirapplicationforspecifictargetmoleculargeneration AT xiaominchen screeningofmultideeplearningbaseddenovomoleculargenerationmodelsandtheirapplicationforspecifictargetmoleculargeneration AT dongmeiai screeningofmultideeplearningbaseddenovomoleculargenerationmodelsandtheirapplicationforspecifictargetmoleculargeneration |