GNNSeq: A Sequence-Based Graph Neural Network for Predicting Protein–Ligand Binding Affinity
<b>Background/Objectives:</b> Accurately predicting protein–ligand binding affinity is essential in drug discovery for identifying effective compounds. While existing sequence-based machine learning models for binding affinity prediction have shown potential, they lack accuracy and robus...
| Main Authors: | Somanath Dandibhotla, Madhav Samudrala, Arjun Kaneriya, Sivanesan Dakshanamurthy |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG 2025-02-01 |
| Series: | Pharmaceuticals |
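| Subjects: | protein–ligand binding affinity; machine learning; graph neural network; sequence-based protein–ligand affinity prediction |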
| Online Access: | https://www.mdpi.com/1424-8247/18/3/329 |
| _version_ | 1849341388133498880 |
|---|---|
| author | Somanath Dandibhotla Madhav Samudrala Arjun Kaneriya Sivanesan Dakshanamurthy |
| author_sort | Somanath Dandibhotla |
| collection | DOAJ |
| description | <b>Background/Objectives:</b> Accurately predicting protein–ligand binding affinity is essential in drug discovery for identifying effective compounds. While existing sequence-based machine learning models for binding affinity prediction have shown potential, they lack accuracy and robustness in pattern recognition, which limits their generalizability across diverse and novel binding complexes. To overcome these limitations, we developed GNNSeq, a novel hybrid machine learning model that integrates a Graph Neural Network (GNN) with Random Forest (RF) and XGBoost. <b>Methods:</b> GNNSeq predicts ligand binding affinity by extracting molecular characteristics and sequence patterns from protein and ligand sequences. The fully optimized GNNSeq model was trained and tested on subsets of the PDBbind dataset. The novelty of GNNSeq lies in its exclusive reliance on sequence features, a hybrid GNN framework, and an optimized kernel-based context-switching design. By relying exclusively on sequence features, GNNSeq eliminates the need for pre-docked complexes or high-quality structural data, allowing for accurate binding affinity predictions even when interaction-based or structural information is unavailable. The integration of GNN, XGBoost, and RF improves GNNSeq performance by enabling hierarchical sequence learning, handling complex feature interactions, reducing variance, and forming a robust ensemble that improves predictions and mitigates overfitting. GNNSeq's unique kernel-based context-switching scheme optimizes model efficiency and runtime, dynamically adjusts feature weighting between sequence and basic structural information, and improves predictive accuracy and model generalization. <b>Results:</b> In benchmarking, GNNSeq performed comparably to several existing sequence-based models and achieved a Pearson correlation coefficient (PCC) of 0.784 on the PDBbind v.2020 refined set and 0.84 on the PDBbind v.2016 core set. During external validation with the DUDE-Z v.2023.06.20 dataset, GNNSeq attained an average area under the curve (AUC) of 0.74, demonstrating its ability to distinguish active ligands from decoys across diverse ligand–receptor pairs. To further evaluate its performance, we combined GNNSeq with two additional specialized models that integrate structural and protein–ligand interaction features. When tested on a curated set of well-characterized drug–target complexes, the hybrid models achieved an average PCC of 0.89, with the top-performing model reaching a PCC of 0.97. GNNSeq was designed with a strong emphasis on computational efficiency, training on 5000+ complexes in 1 h and 32 min, with real-time affinity predictions for test complexes. <b>Conclusions:</b> GNNSeq provides an efficient and scalable approach for binding affinity prediction, offering improved accuracy and generalizability while enabling large-scale virtual screening and cost-effective hit identification. GNNSeq is publicly available in a server-based graphical user interface (GUI) format. |
| format | Article |
| id | doaj-art-d29ac2a1468a4719a8cd45c2ef3c8532 |
| institution | Kabale University |
| issn | 1424-8247 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Pharmaceuticals |
| spelling | doaj-art-d29ac2a1468a4719a8cd45c2ef3c8532; 2025-08-20T03:43:37Z; eng; MDPI AG; Pharmaceuticals; 1424-8247; 2025-02-01; 18; 3; 329; 10.3390/ph18030329; GNNSeq: A Sequence-Based Graph Neural Network for Predicting Protein–Ligand Binding Affinity; Somanath Dandibhotla (0), Madhav Samudrala (1), Arjun Kaneriya (2), Sivanesan Dakshanamurthy (3); (0) Department of Computer Science, College of Engineering and Computing, George Mason University, Fairfax, VA 22030, USA; (1) Department of Statistics, College of Arts and Sciences, The University of Virginia, Charlottesville, VA 22903, USA; (2) Department of Computer Science, School of Computing, Data Sciences & Physics, College of William and Mary, Williamsburg, VA 23185, USA; (3) Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20007, USA; https://www.mdpi.com/1424-8247/18/3/329; protein–ligand binding affinity; machine learning; graph neural network; sequence-based protein–ligand affinity prediction |
| title | GNNSeq: A Sequence-Based Graph Neural Network for Predicting Protein–Ligand Binding Affinity |
| title_sort | gnnseq a sequence based graph neural network for predicting protein ligand binding affinity |
| topic | protein–ligand binding affinity machine learning graph neural network sequence-based protein–ligand affinity prediction |
| url | https://www.mdpi.com/1424-8247/18/3/329 |
| work_keys_str_mv | AT somanathdandibhotla gnnseqasequencebasedgraphneuralnetworkforpredictingproteinligandbindingaffinity AT madhavsamudrala gnnseqasequencebasedgraphneuralnetworkforpredictingproteinligandbindingaffinity AT arjunkaneriya gnnseqasequencebasedgraphneuralnetworkforpredictingproteinligandbindingaffinity AT sivanesandakshanamurthy gnnseqasequencebasedgraphneuralnetworkforpredictingproteinligandbindingaffinity |
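
The results in the description field are reported as a Pearson correlation coefficient (PCC) for affinity regression and as an area under the ROC curve (AUC) for separating active ligands from decoys. As a minimal sketch of what those two metrics measure, and not the authors' evaluation code, the Python below computes both from scratch. The function names `pearson_r` and `roc_auc` and all numbers in the usage lines are hypothetical illustrations, not values from the paper.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured affinities."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / len(predicted)
    mean_m = sum(measured) / len(measured)
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


def roc_auc(scores, labels):
    """AUC as the fraction of (active, decoy) pairs ranked correctly.

    labels: 1 for an active ligand, 0 for a decoy. Ties count as half.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Hypothetical toy data (not from the paper).
toy_pcc = pearson_r([5.1, 6.0, 7.2, 8.3], [5.0, 6.4, 7.0, 8.5])
toy_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A PCC near 1 means predicted affinities rise and fall with the measured ones. An AUC of 0.5 corresponds to random ranking of actives against decoys, and 1.0 to perfect separation. The pairwise AUC loop is quadratic in the number of ligands, which is fine for illustration but too slow for large screening sets.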