Visualising lead optimisation series using reduced graphs

Abstract The typical way in which lead optimisation (LO) series are represented in the medicinal chemistry literature is as Markush structures and associated R-group tables. The Markush structure shows a central core or molecular scaffold that is common to the series with R groups that indicate the...

Bibliographic Details
Main Authors: Jessica Stacey, Baptiste Canault, Stephen D. Pickett, Valerie J. Gillet
Format: Article
Language: English
Published: BMC 2025-04-01
Series: Journal of Cheminformatics
Subjects: Reduced graphs, Visualisation, Lead optimisation, SAR
Online Access: https://doi.org/10.1186/s13321-025-01002-7
collection DOAJ
description Abstract The typical way in which lead optimisation (LO) series are represented in the medicinal chemistry literature is as Markush structures and associated R-group tables. The Markush structure shows a central core or molecular scaffold that is common to the series, with R groups that indicate the points of variability that have been explored in the series. The associated R-group table shows the substituent combinations that exist in individual molecules in the series, together with properties of those compounds. This format provides an intuitive way of visualising any structure–activity relationship (SAR) that is present. Automated approaches that attempt to reproduce this well-understood format, such as the SAR map, are based on maximum common substructure approaches and do not take account of small changes that may be made to the core structure itself or of the situation where more than one core exists in the data. Here we describe an automated approach to represent LO series that is based on reduced graph descriptions of molecules. A publicly available LO dataset from a drug discovery programme at GSK is analysed to show how the method can group together compounds from the same series even when there are small substructural differences within the core of the series, while also being able to identify different related compound series. The resulting visualisation is useful in identifying areas where series are underexplored and for mapping design ideas onto the current dataset. The code to generate the visualisations is released into the public domain to promote further research in this area.
Scientific contribution: We describe a software tool for analysing lead optimisation series using reduced graph representations of molecules. The representation allows compounds that have similar but not identical chemical scaffolds to be grouped together and is, therefore, an advance on methods that are based on the more traditional Markush structure and SAR tables. The software is a useful addition to the med chem toolbox as it can provide a holistic view of lead optimisation data by representing what might otherwise be seen as separate series as a single series of compounds.
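The grouping idea in the abstract can be illustrated with a toy sketch. This is not the authors' released code: the fragment names, the `NODE_TYPE` mapping and the functions `reduce_core` and `group_by_reduced_core` are hypothetical, and real reduced graphs are derived from full molecular structures rather than from hand-written fragment lists. The sketch only shows why two cores that differ by a small substructural change can fall into the same group once each fragment is replaced by a coarser node type.

```python
from collections import defaultdict

# Hypothetical mapping from named fragments to coarse reduced-graph node types.
# "ArA" = aromatic ring with an H-bond acceptor, "Ar" = plain aromatic ring,
# "Al" = aliphatic ring, "L" = linker. These labels are illustrative only.
NODE_TYPE = {
    "benzene": "Ar",
    "pyridine": "ArA",
    "pyrimidine": "ArA",
    "piperidine": "Al",
    "amide": "L",
}


def reduce_core(fragments):
    """Replace each fragment in a core by its node type."""
    return tuple(NODE_TYPE[f] for f in fragments)


def group_by_reduced_core(compounds):
    """Group compound names by the reduced form of their core."""
    groups = defaultdict(list)
    for name, fragments in compounds.items():
        groups[reduce_core(fragments)].append(name)
    return dict(groups)


# Toy data: cpd1 and cpd2 differ only by pyridine vs pyrimidine in the core.
compounds = {
    "cpd1": ["pyridine", "amide", "benzene"],
    "cpd2": ["pyrimidine", "amide", "benzene"],
    "cpd3": ["piperidine", "amide", "benzene"],
}

groups = group_by_reduced_core(compounds)
for key, members in groups.items():
    print(key, members)
```

In this toy data, a maximum common substructure view would treat the pyridine and pyrimidine cores as different scaffolds, whereas the reduced key places `cpd1` and `cpd2` in one group and keeps `cpd3` as a separate, related group.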
id doaj-art-1699dcd08aba4e61b7d9286dafec3843
institution OA Journals
issn 1758-2946
spelling doaj-art-1699dcd08aba4e61b7d9286dafec3843; 2025-08-20T02:19:58Z; eng; BMC; Journal of Cheminformatics; 1758-2946; 2025-04-01; 17; 1; 1; 27; 10.1186/s13321-025-01002-7
Visualising lead optimisation series using reduced graphs
Jessica Stacey (Information School, University of Sheffield)
Baptiste Canault (GlaxoSmithKline)
Stephen D. Pickett (GlaxoSmithKline)
Valerie J. Gillet (Information School, University of Sheffield)
https://doi.org/10.1186/s13321-025-01002-7
Reduced graphs; Visualisation; Lead optimisation; SAR
topic Reduced graphs
Visualisation
Lead optimisation
SAR