E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays
Abstract Assay interference caused by small organic compounds continues to pose formidable challenges to early drug discovery. Various computational methods have been developed to identify compounds likely to cause assay interference. However, due to the scarcity of data available for model developm...
| Main Authors: | Vincenzo Palmacci, Yasmine Nahal, Matthias Welsch, Ola Engkvist, Samuel Kaski, Johannes Kirchmair |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC 2025-04-01 |
| Series: | Journal of Cheminformatics |
| Online Access: | https://doi.org/10.1186/s13321-025-01014-3 |
| _version_ | 1850206535948435456 |
|---|---|
| author | Vincenzo Palmacci Yasmine Nahal Matthias Welsch Ola Engkvist Samuel Kaski Johannes Kirchmair |
| collection | DOAJ |
| description | Abstract: Assay interference caused by small organic compounds continues to pose formidable challenges to early drug discovery. Various computational methods have been developed to identify compounds likely to cause assay interference. However, due to the scarcity of data available for model development, the predictive accuracy and applicability of these approaches are limited. In this work, we present E-GuARD, a novel framework seeking to address data scarcity and imbalance by integrating self-distillation, active learning, and expert-guided molecular generation. E-GuARD iteratively enriches the training data with interference-relevant molecules, resulting in quantitative structure-interference relationship (QSIR) models with superior performance. We demonstrate the utility of E-GuARD with the examples of four high-quality data sets on thiol reactivity, redox reactivity, nanoluciferase inhibition, and firefly luciferase inhibition. Our models reached MCC values of up to 0.47 for these data sets, with two-fold or higher improvements in enrichment factors compared to models trained without E-GuARD data augmentation. These results highlight the potential of E-GuARD as a scalable solution to mitigating assay interference in early drug discovery. Scientific contribution: We present E-GuARD, an innovative framework that combines iterative self-distillation with guided molecular augmentation to enhance the predictive performance of QSAR models. By allowing models to learn from newly generated, informative compounds through iterations, E-GuARD facilitates the understanding of underrepresented structural patterns and improves performance on unseen data. When applied across different interference mechanisms, E-GuARD consistently outperformed standard approaches. E-GuARD establishes the foundation for further research into dynamic data enrichment and more robust molecular modeling. |
| format | Article |
| id | doaj-art-473afb0f10d6446a908bf4ecd63e845c |
| institution | OA Journals |
| issn | 1758-2946 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | BMC |
| record_format | Article |
| series | Journal of Cheminformatics |
| spelling | doaj-art-473afb0f10d6446a908bf4ecd63e845c; 2025-08-20T02:10:49Z; eng; BMC; Journal of Cheminformatics; ISSN 1758-2946; 2025-04-01; volume 17, issue 1, pages 1-15; 10.1186/s13321-025-01014-3; E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays; Vincenzo Palmacci (Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna); Yasmine Nahal (Department of Computer Science, Aalto University); Matthias Welsch (Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna); Ola Engkvist (Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca); Samuel Kaski (Department of Computer Science, Aalto University); Johannes Kirchmair (Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna); https://doi.org/10.1186/s13321-025-01014-3 |
| title | E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays |
| url | https://doi.org/10.1186/s13321-025-01014-3 |
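
The abstract reports model quality as the Matthews correlation coefficient (MCC) and as enrichment factors. For readers unfamiliar with these metrics, the following is a minimal sketch of their standard textbook definitions; it is not the authors' evaluation code. The function names and the `fraction` cutoff are illustrative assumptions, since this record does not state which cutoff the paper used.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def enrichment_factor(scores, labels, fraction=0.1):
    """Hit rate in the top-scored fraction divided by the overall hit rate.

    `labels` are 1 for interfering compounds and 0 otherwise.
    `fraction` is a hypothetical cutoff chosen for illustration.
    """
    n = len(labels)
    n_top = max(1, int(round(n * fraction)))
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    total_hits = sum(labels)
    if total_hits == 0:
        return 0.0
    return (hits_top / n_top) / (total_hits / n)
```

An enrichment factor of 2 means that the top-ranked fraction contains interfering compounds at twice the rate found in the full set, which is the sense in which the abstract describes two-fold or higher improvements.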