AI-driven prediction of drug activity against Toxoplasma gondii: Data augmentation and deep neural networks for limited datasets

Toxoplasmosis, caused by Toxoplasma gondii (T. gondii), is a serious global health concern, particularly in immunocompromised individuals. Inhibiting the enzyme TgDHFR is a promising strategy for developing treatments. This Artificial Intelligence (AI)-driven Quantitative Structure-Activity Relation...

Full description

Saved in:
Bibliographic Details
Main Authors: Natalia V. Karimova, Ravithree D. Senanayake
Format: Article
Language:English
Published: Elsevier 2025-06-01
Series:Artificial Intelligence Chemistry
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2949747725000016
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Toxoplasmosis, caused by Toxoplasma gondii (T. gondii), is a serious global health concern, particularly in immunocompromised individuals. Inhibiting the enzyme TgDHFR is a promising strategy for developing treatments. This Artificial Intelligence (AI)-driven Quantitative Structure-Activity Relationship (QSAR) study applies deep neural networks (DNNs) to predict pIC50 values for potential inhibitors, using 2D and 3D molecular descriptors and fingerprints. To address training data limitations, we introduced a novel methodology combining targeted descriptor selection, Gaussian noise-based data augmentation, and an ensemble of DNNs. This approach significantly enhanced model performance, increasing the R² from 0.75 with the original dataset to 0.85. The model was further validated using two FDA-approved drugs for T. gondii treatment—pyrimethamine and trimethoprim—yielding relative errors of 3.35 % and 2.15 % in pIC50 predictions compared to experimental values. Finally, the model was applied to screen FDA-approved drugs after filtering out molecules that did not align with the characteristics of the training dataset. The predicted pIC50 values were further used to calculate ligand efficiency (LE), binding efficiency index (BEI), lipophilic ligand efficiency (LLE), and surface efficiency index (SEI), identifying the most promising TgDHFR inhibitors for further investigation. By leveraging AI and data augmentation approach, this study provides a powerful tool for pIC50 predictions of TgDHFR inhibitors, which can be adapted to other systems.
ISSN:2949-7477