Accurate prediction of drug-protein interactions by maintaining the original topological relationships among embeddings

Bibliographic Details
Main Authors: Yanfei Li, Xiran Chen, Shuqin Wang, Jinmao Wei
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Biology
Online Access: https://doi.org/10.1186/s12915-025-02338-0
Description
Summary:
Background: Learning-based methods have recently shown strong potential for predicting drug-protein interactions (DPIs). However, existing approaches often fail to predict accurately on real-world imbalanced datasets while also remaining generalizable and scalable, which limits their practical applicability.
Results: This study proposes GLDPI, a highly generalizable model that improves prediction accuracy in imbalanced scenarios by preserving, in the embedding space, the topological relationships among the initial molecular representations. Specifically, GLDPI uses dedicated encoders to transform the one-dimensional sequence information of drugs and proteins into embeddings, and it computes the likelihood of a DPI efficiently from the cosine similarity between a drug embedding and a protein embedding. In addition, we introduce a prior loss function based on the guilt-by-association principle so that the topology of the embedding space aligns with the structure of the initial drug-protein network. This design lets GLDPI capture network relationships and key features of molecular interactions, which substantially improves predictive performance.
Conclusions: Experiments show that GLDPI outperforms state-of-the-art methods on multiple highly imbalanced benchmark datasets, improving the AUPR metric by more than 100%. GLDPI also generalizes well in cold-start experiments, performing strongly when predicting novel drug-protein interactions. Finally, the model scales well: it infers approximately $1.2 \times 10^{10}$ drug-protein pairs in less than 10 h.
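The summary names two mechanisms: scoring a drug-protein pair by the cosine similarity of their embeddings, and a guilt-by-association prior that keeps the embedding topology consistent with the known interaction network. The sketch below illustrates both ideas in plain Python under stated assumptions. It is not GLDPI's implementation. The function names are hypothetical. Mapping cosine similarity from [-1, 1] to [0, 1] is an assumed convention. The prior loss is an illustrative stand-in: it uses the Jaccard overlap of two drugs' known target sets as the "network" similarity that their embedding similarity should match. The record does not give GLDPI's actual loss.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def interaction_score(drug_emb: Sequence[float], prot_emb: Sequence[float]) -> float:
    """Hypothetical DPI score: cosine similarity rescaled from [-1, 1] to [0, 1] (assumption)."""
    return (cosine(drug_emb, prot_emb) + 1.0) / 2.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two target sets; used here as an assumed network-level drug similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def guilt_by_association_loss(
    drug_embs: Dict[str, List[float]],
    drug_targets: Dict[str, Set[str]],
) -> float:
    """Hypothetical prior loss: drugs that share targets should be close in embedding space.

    Mean squared error, over all unordered drug pairs, between the pair's
    embedding cosine similarity and the Jaccard overlap of their known targets.
    """
    names = sorted(drug_embs)
    total, count = 0.0, 0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            network_sim = jaccard(drug_targets[a], drug_targets[b])
            embed_sim = cosine(drug_embs[a], drug_embs[b])
            total += (embed_sim - network_sim) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    targets = {"d1": {"p1", "p2"}, "d2": {"p1", "p2"}, "d3": {"p3"}}
    # Embeddings consistent with the network: d1 and d2 coincide, d3 is orthogonal.
    consistent = {"d1": [1.0, 0.0], "d2": [1.0, 0.0], "d3": [0.0, 1.0]}
    # Inconsistent: d3 shares no targets with d1/d2 but is embedded on top of them.
    inconsistent = {"d1": [1.0, 0.0], "d2": [1.0, 0.0], "d3": [1.0, 0.0]}
    print(interaction_score([1.0, 0.0], [0.0, 1.0]))
    print(guilt_by_association_loss(consistent, targets))
    print(guilt_by_association_loss(inconsistent, targets))
```

With these toy inputs, orthogonal embeddings score 0.5. The network-consistent embeddings give a prior loss of 0.0, and the inconsistent ones give 2/3, because both d1-d3 and d2-d3 contribute an error of 1. One reason a cosine-similarity design can scale is that each drug and protein is encoded once, so every pair costs only a dot product of normalized vectors. For scale, the reported figure of about $1.2 \times 10^{10}$ pairs in under 10 h (36,000 s) corresponds to an average of at least roughly $3.3 \times 10^{5}$ pairs per second.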
ISSN: 1741-7007