Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features
Chemical language models (CLMs) are increasingly used for generative design of candidate compounds for medicinal chemistry. However, their predictions are difficult to rationalize. Currently, detailed computational explanations of CLM-based compound generation are unavailable. Therefore, we have att...
| Main Authors: | Sanjana Srinivasan, Alec Lamens, Jürgen Bajorath |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-12-01 |
| Series: | European Journal of Medicinal Chemistry Reports |
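| Subjects: | Dual-target compounds; Polypharmacology; Generative compound design; Transformer; Chemical language model; Characteristic substructures |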
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2772417425000470 |
| _version_ | 1849711487600295936 |
|---|---|
| author | Sanjana Srinivasan; Alec Lamens; Jürgen Bajorath |
| author_facet | Sanjana Srinivasan; Alec Lamens; Jürgen Bajorath |
| author_sort | Sanjana Srinivasan |
| collection | DOAJ |
| description | Chemical language models (CLMs) are increasingly used for generative design of candidate compounds for medicinal chemistry. However, their predictions are difficult to rationalize. Currently, detailed computational explanations of CLM-based compound generation are unavailable. Therefore, we have attempted to better understand, from a medicinal chemistry perspective, how CLMs learn and arrive at compound predictions. To this end, we have subjected dual-target candidate compounds for polypharmacology generated with transformer CLMs to a series of analysis steps exploring the structural features that are learned, and compared them to known compounds with dual-target activity. Using machine learning combined with distinct chemical structure-oriented approaches from explainable artificial intelligence, we show that CLMs learn substructures characteristic of known dual-target compounds as a basis for generating new candidates with various chemical modifications. |
| format | Article |
| id | doaj-art-52df7bf02ae04ac89c518364656c5905 |
| institution | DOAJ |
| issn | 2772-4174 |
| language | English |
| publishDate | 2025-12-01 |
| publisher | Elsevier |
| record_format | Article |
| series | European Journal of Medicinal Chemistry Reports |
| spelling | doaj-art-52df7bf02ae04ac89c518364656c5905 · 2025-08-20T03:14:36Z · eng · Elsevier · European Journal of Medicinal Chemistry Reports · 2772-4174 · 2025-12-01 · 15 · 100291 · 10.1016/j.ejmcr.2025.100291 · Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features · Sanjana Srinivasan [0], Alec Lamens [1], Jürgen Bajorath [2] · [0] Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany · [1] Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany · [2] Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, D-53115, Germany; Corresponding author. Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Germany. · Chemical language models (CLMs) are increasingly used for generative design of candidate compounds for medicinal chemistry. However, their predictions are difficult to rationalize. Currently, detailed computational explanations of CLM-based compound generation are unavailable. Therefore, we have attempted to better understand, from a medicinal chemistry perspective, how CLMs learn and arrive at compound predictions. To this end, we have subjected dual-target candidate compounds for polypharmacology generated with transformer CLMs to a series of analysis steps exploring the structural features that are learned, and compared them to known compounds with dual-target activity. Using machine learning combined with distinct chemical structure-oriented approaches from explainable artificial intelligence, we show that CLMs learn substructures characteristic of known dual-target compounds as a basis for generating new candidates with various chemical modifications. · http://www.sciencedirect.com/science/article/pii/S2772417425000470 · Dual-target compounds; Polypharmacology; Generative compound design; Transformer; Chemical language model; Characteristic substructures |
| spellingShingle | Sanjana Srinivasan; Alec Lamens; Jürgen Bajorath · Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features · European Journal of Medicinal Chemistry Reports · Dual-target compounds; Polypharmacology; Generative compound design; Transformer; Chemical language model; Characteristic substructures |
| title | Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features |
| title_full | Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features |
| title_fullStr | Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features |
| title_full_unstemmed | Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features |
| title_short | Dual-target candidate compounds from a transformer chemical language model contain characteristic structural features |
| title_sort | dual target candidate compounds from a transformer chemical language model contain characteristic structural features |
| topic | Dual-target compounds; Polypharmacology; Generative compound design; Transformer; Chemical language model; Characteristic substructures |
| url | http://www.sciencedirect.com/science/article/pii/S2772417425000470 |
| work_keys_str_mv | AT sanjanasrinivasan dualtargetcandidatecompoundsfromatransformerchemicallanguagemodelcontaincharacteristicstructuralfeatures AT aleclamens dualtargetcandidatecompoundsfromatransformerchemicallanguagemodelcontaincharacteristicstructuralfeatures AT jurgenbajorath dualtargetcandidatecompoundsfromatransformerchemicallanguagemodelcontaincharacteristicstructuralfeatures |