E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays

Bibliographic Details
Main Authors: Vincenzo Palmacci, Yasmine Nahal, Matthias Welsch, Ola Engkvist, Samuel Kaski, Johannes Kirchmair
Format: Article
Language: English
Published: BMC 2025-04-01
Series: Journal of Cheminformatics
Online Access: https://doi.org/10.1186/s13321-025-01014-3
author Vincenzo Palmacci
Yasmine Nahal
Matthias Welsch
Ola Engkvist
Samuel Kaski
Johannes Kirchmair
collection DOAJ
description Abstract: Assay interference caused by small organic compounds continues to pose formidable challenges to early drug discovery. Various computational methods have been developed to identify compounds likely to cause assay interference. However, due to the scarcity of data available for model development, the predictive accuracy and applicability of these approaches are limited. In this work, we present E-GuARD, a novel framework seeking to address data scarcity and imbalance by integrating self-distillation, active learning, and expert-guided molecular generation. E-GuARD iteratively enriches the training data with interference-relevant molecules, resulting in quantitative structure-interference relationship (QSIR) models with superior performance. We demonstrate the utility of E-GuARD with the examples of four high-quality data sets on thiol reactivity, redox reactivity, nanoluciferase inhibition, and firefly luciferase inhibition. Our models reached MCC values of up to 0.47 for these data sets, with two-fold or higher improvements in enrichment factors compared to models trained without E-GuARD data augmentation. These results highlight the potential of E-GuARD as a scalable solution to mitigating assay interference in early drug discovery.

Scientific contribution: We present E-GuARD, an innovative framework that combines iterative self-distillation with guided molecular augmentation to enhance the predictive performance of QSAR models. By allowing models to learn from newly generated, informative compounds through iterations, E-GuARD facilitates the understanding of underrepresented structural patterns and improves performance on unseen data. When applied across different interference mechanisms, E-GuARD consistently outperformed standard approaches. E-GuARD establishes the foundation for further research into dynamic data enrichment and more robust molecular modeling.
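The abstract reports results as Matthews correlation coefficient (MCC) values and enrichment factors. For readers unfamiliar with these metrics, the following is a minimal Python sketch of how they are commonly computed for a binary interference classifier. The function names `mcc` and `enrichment_factor`, the toy labels and scores, the 0.5 decision threshold, and the top-20% cutoff are all illustrative assumptions; the record does not state the exact enrichment-factor definition or cutoff used in the article, and this is not the authors' code.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = interfering)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def enrichment_factor(y_true, scores, fraction=0.1):
    """Hit rate in the top-scored fraction divided by the overall hit rate.

    One common definition; assumed here, not taken from the article.
    """
    n = len(y_true)
    total_hits = sum(y_true)
    if n == 0 or total_hits == 0:
        return 0.0
    n_top = max(1, math.ceil(fraction * n))
    # Rank compounds from highest to lowest predicted interference score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (total_hits / n)


# Toy data (hypothetical): 10 compounds, 2 of which are true interferers.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.40, 0.30, 0.20, 0.15, 0.10, 0.08, 0.05, 0.01]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

mcc_value = mcc(y_true, y_pred)
ef_value = enrichment_factor(y_true, scores, fraction=0.2)
print(f"MCC = {mcc_value:.3f}, EF(top 20%) = {ef_value:.2f}")
```

Under this definition, an enrichment factor of 2 means that interfering compounds are twice as frequent among the top-ranked predictions as in the data set overall, which is one way to read the "two-fold or higher improvements" mentioned in the abstract.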
format Article
id doaj-art-473afb0f10d6446a908bf4ecd63e845c
institution OA Journals
issn 1758-2946
language English
publishDate 2025-04-01
publisher BMC
record_format Article
series Journal of Cheminformatics
spelling doaj-art-473afb0f10d6446a908bf4ecd63e845c
2025-08-20T02:10:49Z
eng
BMC
Journal of Cheminformatics, ISSN 1758-2946
2025-04-01, volume 17, issue 1, pages 1-15
10.1186/s13321-025-01014-3
E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays
Vincenzo Palmacci (Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna)
Yasmine Nahal (Department of Computer Science, Aalto University)
Matthias Welsch (Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna)
Ola Engkvist (Molecular AI, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca)
Samuel Kaski (Department of Computer Science, Aalto University)
Johannes Kirchmair (Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna)
https://doi.org/10.1186/s13321-025-01014-3
title E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays
url https://doi.org/10.1186/s13321-025-01014-3