Accurate prediction of drug-protein interactions by maintaining the original topological relationships among embeddings
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-08-01 |
| Series: | BMC Biology |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s12915-025-02338-0 |
| Summary: | Background: Learning-based methods have recently demonstrated strong potential in predicting drug-protein interactions (DPIs). However, existing approaches often fail to achieve accurate predictions on real-world imbalanced datasets while maintaining high generalizability and scalability, limiting their practical applicability. Results: This study proposes a highly generalized model, GLDPI, aimed at improving prediction accuracy in imbalanced scenarios by preserving the topological relationships among initial molecular representations in the embedding space. Specifically, GLDPI employs dedicated encoders to transform one-dimensional sequence information of drugs and proteins into embedding representations and efficiently calculates the likelihood of DPIs using cosine similarity. Additionally, we introduce a prior loss function based on the guilt-by-association principle to ensure that the topology of the embedding space aligns with the structure of the initial drug-protein network. This design enables GLDPI to effectively capture network relationships and key features of molecular interactions, thereby significantly enhancing predictive performance. Conclusions: Experimental results highlight GLDPI’s superior performance on multiple highly imbalanced benchmark datasets, achieving over a 100% improvement in the AUPR metric compared to state-of-the-art methods. Additionally, GLDPI demonstrates exceptional generalization capabilities in cold-start experiments, excelling in predicting novel drug-protein interactions. Furthermore, the model exhibits remarkable scalability, efficiently inferring approximately $1.2 \times 10^{10}$ drug-protein pairs in less than 10 h. |
|---|---|
| ISSN: | 1741-7007 |
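
The summary states that GLDPI scores drug-protein pairs by cosine similarity between embeddings and infers about 1.2 × 10^10 pairs in under 10 h. The sketch below illustrates only those two points, and it is not the authors' implementation: the encoders and the prior loss are not reproduced, the embeddings are made-up toy vectors, and the function names `cosine_similarity`, `score_pairs`, and `min_pairs_per_second` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: a zero vector carries no signal, so score it 0.
        return 0.0
    return dot / (norm_u * norm_v)


def score_pairs(
    drug_embeddings: Sequence[Sequence[float]],
    protein_embeddings: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Score every drug against every protein; row i holds the scores for drug i."""
    return [
        [cosine_similarity(d, p) for p in protein_embeddings]
        for d in drug_embeddings
    ]


def min_pairs_per_second(n_pairs: float, hours: float) -> float:
    """Average throughput needed to score n_pairs within the given number of hours."""
    return n_pairs / (hours * 3600.0)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented purely for illustration.
    drugs = [[1.0, 0.0], [0.0, 2.0]]
    proteins = [[3.0, 0.0], [1.0, 1.0]]
    for row in score_pairs(drugs, proteins):
        print([round(s, 3) for s in row])
    # Throughput implied by the figures quoted in the summary.
    print(round(min_pairs_per_second(1.2e10, 10.0)))
```

Because each score depends only on one drug vector and one protein vector, embeddings can be computed once per molecule and reused across all pairs. Taking the summary's figures at face value, the reported run corresponds to an average of at least roughly 333,000 pairs per second.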