A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds

Abstract Providing a better understanding of what makes a compound a successful drug candidate is crucial for reducing the high attrition rates in drug discovery. Analyses of the differences between active compounds, clinical candidates and drugs require high-quality datasets. However, most datasets...

Bibliographic Details
Main Authors: A. Lina Heinzke, Barbara Zdrazil, Paul D. Leeson, Robert J. Young, Axel Pahl, Herbert Waldmann, Andrew R. Leach
Format: Article
Language: English
Published: Nature Portfolio 2024-10-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-024-03582-9
author A. Lina Heinzke
Barbara Zdrazil
Paul D. Leeson
Robert J. Young
Axel Pahl
Herbert Waldmann
Andrew R. Leach
collection DOAJ
description Abstract Providing a better understanding of what makes a compound a successful drug candidate is crucial for reducing the high attrition rates in drug discovery. Analyses of the differences between active compounds, clinical candidates and drugs require high-quality datasets. However, most datasets of drug discovery programs are not openly available. This work introduces a dataset of compound-target pairs extracted from the open-source bioactivity database ChEMBL (release 32). Compound-target pairs in the dataset either have at least one measured activity or are part of the manually curated set of known interactions in ChEMBL. Known interactions between drugs or clinical candidates and targets are specifically annotated to facilitate analyses of differences between drugs, clinical candidates, and other active compounds. In total, the dataset comprises 614,594 compound-target pairs, 5,109 (3,932) of which are known interactions between drugs (clinical candidates) and targets. The extraction is performed in an automated manner and fully reproducible. We are providing not only the datasets but also the code to rerun the analyses with other ChEMBL releases.
format Article
id doaj-art-6b7b2abb2a09429392f3cc739a7e6c8c
institution OA Journals
issn 2052-4463
language English
publishDate 2024-10-01
publisher Nature Portfolio
record_format Article
series Scientific Data
spelling doaj-art-6b7b2abb2a09429392f3cc739a7e6c8c
2025-08-20T02:11:23Z
eng
Nature Portfolio
Scientific Data
2052-4463
2024-10-01
11119
10.1038/s41597-024-03582-9
A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds
A. Lina Heinzke (European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus)
Barbara Zdrazil (European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus)
Paul D. Leeson (Paul Leeson Consulting Ltd)
Robert J. Young (Blue Burgundy Ltd)
Axel Pahl (Compound Management and Screening Center, Max-Planck-Institute of Molecular Physiology)
Herbert Waldmann (Department of Chemical Biology, Max-Planck-Institute of Molecular Physiology)
Andrew R. Leach (European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus)
Abstract Providing a better understanding of what makes a compound a successful drug candidate is crucial for reducing the high attrition rates in drug discovery. Analyses of the differences between active compounds, clinical candidates and drugs require high-quality datasets. However, most datasets of drug discovery programs are not openly available. This work introduces a dataset of compound-target pairs extracted from the open-source bioactivity database ChEMBL (release 32). Compound-target pairs in the dataset either have at least one measured activity or are part of the manually curated set of known interactions in ChEMBL. Known interactions between drugs or clinical candidates and targets are specifically annotated to facilitate analyses of differences between drugs, clinical candidates, and other active compounds. In total, the dataset comprises 614,594 compound-target pairs, 5,109 (3,932) of which are known interactions between drugs (clinical candidates) and targets. The extraction is performed in an automated manner and fully reproducible. We are providing not only the datasets but also the code to rerun the analyses with other ChEMBL releases.
https://doi.org/10.1038/s41597-024-03582-9
title A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds
url https://doi.org/10.1038/s41597-024-03582-9