A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds
Abstract Providing a better understanding of what makes a compound a successful drug candidate is crucial for reducing the high attrition rates in drug discovery. Analyses of the differences between active compounds, clinical candidates and drugs require high-quality datasets. However, most datasets...
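| Main Authors: | A. Lina Heinzke, Barbara Zdrazil, Paul D. Leeson, Robert J. Young, Axel Pahl, Herbert Waldmann, Andrew R. Leach |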
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-10-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-024-03582-9 |
| _version_ | 1850203977863397376 |
|---|---|
| author | A. Lina Heinzke; Barbara Zdrazil; Paul D. Leeson; Robert J. Young; Axel Pahl; Herbert Waldmann; Andrew R. Leach |
| author_sort | A. Lina Heinzke |
| collection | DOAJ |
| description | Abstract Providing a better understanding of what makes a compound a successful drug candidate is crucial for reducing the high attrition rates in drug discovery. Analyses of the differences between active compounds, clinical candidates and drugs require high-quality datasets. However, most datasets of drug discovery programs are not openly available. This work introduces a dataset of compound-target pairs extracted from the open-source bioactivity database ChEMBL (release 32). Compound-target pairs in the dataset either have at least one measured activity or are part of the manually curated set of known interactions in ChEMBL. Known interactions between drugs or clinical candidates and targets are specifically annotated to facilitate analyses of differences between drugs, clinical candidates, and other active compounds. In total, the dataset comprises 614,594 compound-target pairs, 5,109 (3,932) of which are known interactions between drugs (clinical candidates) and targets. The extraction is performed in an automated manner and fully reproducible. We are providing not only the datasets but also the code to rerun the analyses with other ChEMBL releases. |
| format | Article |
| id | doaj-art-6b7b2abb2a09429392f3cc739a7e6c8c |
| institution | OA Journals |
| issn | 2052-4463 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Data |
| spelling | doaj-art-6b7b2abb2a09429392f3cc739a7e6c8c; 2025-08-20T02:11:23Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2024-10-01; 11119; 10.1038/s41597-024-03582-9; A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds; A. Lina Heinzke (European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus); Barbara Zdrazil (European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus); Paul D. Leeson (Paul Leeson Consulting Ltd); Robert J. Young (Blue Burgundy Ltd); Axel Pahl (Compound Management and Screening Center, Max-Planck-Institute of Molecular Physiology); Herbert Waldmann (Department of Chemical Biology, Max-Planck-Institute of Molecular Physiology); Andrew R. Leach (European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus) |
| title | A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds |
| url | https://doi.org/10.1038/s41597-024-03582-9 |