Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features

Bibliographic Details
Main Authors: Sanjana Srinivasan, Alec Lamens, Jürgen Bajorath
Format: Article
Language: English
Published: Elsevier 2025-12-01
Series: European Journal of Medicinal Chemistry Reports
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2772417425000470
Description
Summary: Chemical language models (CLMs) are increasingly used for the generative design of candidate compounds in medicinal chemistry. However, their predictions are difficult to rationalize, and detailed computational explanations of CLM-based compound generation are currently unavailable. We have therefore attempted to better understand, from a medicinal chemistry perspective, how CLMs learn and arrive at compound predictions. To this end, we subjected dual-target candidate compounds for polypharmacology generated with transformer CLMs to a series of analysis steps exploring the structural features that are learned and comparing them to known compounds with dual-target activity. Using machine learning combined with distinct chemical structure-oriented approaches from explainable artificial intelligence, we show that CLMs learn substructures characteristic of known dual-target compounds as a basis for generating new candidates with various chemical modifications.
ISSN: 2772-4174
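
Illustrative note: the summary describes comparing CLM-generated dual-target candidates with known dual-target compounds at the substructure level. The minimal Python/RDKit sketch below is not the authors' workflow; it only illustrates, under assumed placeholder SMILES, one common way such a structural comparison can be set up: extracting the maximum common substructure shared by a generated candidate and a known dual-target reference compound and quantifying overall similarity with a Morgan fingerprint Tanimoto coefficient.

# Illustrative sketch only; SMILES strings are placeholders, not compounds from the study.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, rdFMCS

# Hypothetical CLM-generated candidate and known dual-target reference compound
generated_smiles = "c1ccc2c(c1)ncc(CN3CCOCC3)n2"   # placeholder structure
reference_smiles = "c1ccc2c(c1)ncc(CNC3CCCCC3)n2"  # placeholder structure

gen_mol = Chem.MolFromSmiles(generated_smiles)
ref_mol = Chem.MolFromSmiles(reference_smiles)

# Maximum common substructure: the scaffold shared by both molecules
mcs = rdFMCS.FindMCS([gen_mol, ref_mol])
print("Shared substructure (SMARTS):", mcs.smartsString)
print("Shared heavy atoms:", mcs.numAtoms)

# Global similarity via Morgan (ECFP4-like) fingerprints and the Tanimoto coefficient
gen_fp = AllChem.GetMorganFingerprintAsBitVect(gen_mol, 2, nBits=2048)
ref_fp = AllChem.GetMorganFingerprintAsBitVect(ref_mol, 2, nBits=2048)
print("Tanimoto similarity:", DataStructs.TanimotoSimilarity(gen_fp, ref_fp))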