Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features

Chemical language models (CLMs) are increasingly used for generative design of candidate compounds for medicinal chemistry. However, their predictions are difficult to rationalize. Currently, detailed computational explanations of CLM-based compound generation are unavailable. Therefore, we have att...

Bibliographic Details
Main Authors: Sanjana Srinivasan, Alec Lamens, Jürgen Bajorath
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: European Journal of Medicinal Chemistry Reports
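Subjects: Dual-target compounds; Polypharmacology; Generative compound design; Transformer; Chemical language model; Characteristic substructures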
Online Access: http://www.sciencedirect.com/science/article/pii/S2772417425000470
author Sanjana Srinivasan
Alec Lamens
Jürgen Bajorath
collection DOAJ
description Chemical language models (CLMs) are increasingly used for generative design of candidate compounds for medicinal chemistry. However, their predictions are difficult to rationalize. Currently, detailed computational explanations of CLM-based compound generation are unavailable. Therefore, we have attempted to better understand from a medicinal chemistry perspective how CLMs learn and arrive at compound predictions. To this end, we have subjected dual-target candidate compounds for polypharmacology generated with transformer CLMs to a series of analysis steps exploring the structural features that are learned, and compared them to known compounds with dual-target activity. Using machine learning combined with distinct chemical structure-oriented approaches from explainable artificial intelligence, we show that CLMs learn substructures characteristic of known dual-target compounds as a basis for generating new candidates with various chemical modifications.
format Article
id doaj-art-52df7bf02ae04ac89c518364656c5905
institution DOAJ
issn 2772-4174
language English
publishDate 2025-12-01
publisher Elsevier
record_format Article
series European Journal of Medicinal Chemistry Reports
spelling doaj-art-52df7bf02ae04ac89c518364656c5905 | 2025-08-20T03:14:36Z | eng | Elsevier | European Journal of Medicinal Chemistry Reports | 2772-4174 | 2025-12-01 | 15 | 100291 | 10.1016/j.ejmcr.2025.100291
Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features
Sanjana Srinivasan (0) | Alec Lamens (1) | Jürgen Bajorath (2)
(0) Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany
(1) Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany
(2) Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany; Corresponding author. Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Germany.
Chemical language models (CLMs) are increasingly used for generative design of candidate compounds for medicinal chemistry. However, their predictions are difficult to rationalize. Currently, detailed computational explanations of CLM-based compound generation are unavailable. Therefore, we have attempted to better understand from a medicinal chemistry perspective how CLMs learn and arrive at compound predictions. To this end, we have subjected dual-target candidate compounds for polypharmacology generated with transformer CLMs to a series of analysis steps exploring the structural features that are learned, and compared them to known compounds with dual-target activity. Using machine learning combined with distinct chemical structure-oriented approaches from explainable artificial intelligence, we show that CLMs learn substructures characteristic of known dual-target compounds as a basis for generating new candidates with various chemical modifications.
http://www.sciencedirect.com/science/article/pii/S2772417425000470 | Dual-target compounds | Polypharmacology | Generative compound design | Transformer | Chemical language model | Characteristic substructures
title Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features
topic Dual-target compounds
Polypharmacology
Generative compound design
Transformer
Chemical language model
Characteristic substructures
url http://www.sciencedirect.com/science/article/pii/S2772417425000470