GNNSeq: A Sequence-Based Graph Neural Network for Predicting Protein–Ligand Binding Affinity

Bibliographic Details
Main Authors: Somanath Dandibhotla, Madhav Samudrala, Arjun Kaneriya, Sivanesan Dakshanamurthy
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Pharmaceuticals
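Subjects: protein–ligand binding affinity; machine learning; graph neural network; sequence-based protein–ligand affinity prediction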
Online Access: https://www.mdpi.com/1424-8247/18/3/329
collection DOAJ
description Background/Objectives: Accurately predicting protein–ligand binding affinity is essential in drug discovery for identifying effective compounds. While existing sequence-based machine learning models for binding affinity prediction have shown potential, they lack accuracy and robustness in pattern recognition, which limits their generalizability across diverse and novel binding complexes. To overcome these limitations, we developed GNNSeq, a novel hybrid machine learning model that integrates a Graph Neural Network (GNN) with Random Forest (RF) and XGBoost.
Methods: GNNSeq predicts ligand binding affinity by extracting molecular characteristics and sequence patterns from protein and ligand sequences. The fully optimized GNNSeq model was trained and tested on subsets of the PDBbind dataset. The novelty of GNNSeq lies in its exclusive reliance on sequence features, a hybrid GNN framework, and an optimized kernel-based context-switching design. By relying exclusively on sequence features, GNNSeq eliminates the need for pre-docked complexes or high-quality structural data, allowing for accurate binding affinity predictions even when interaction-based or structural information is unavailable. Integrating the GNN with XGBoost and RF improves GNNSeq's performance through hierarchical sequence learning, handling of complex feature interactions, and variance reduction, forming a robust ensemble that improves predictions and mitigates overfitting. GNNSeq's kernel-based context-switching scheme optimizes model efficiency and runtime, dynamically adjusts feature weighting between sequence and basic structural information, and improves predictive accuracy and model generalization.
Results: In benchmarking, GNNSeq performed comparably to several existing sequence-based models and achieved a Pearson correlation coefficient (PCC) of 0.784 on the PDBbind v.2020 refined set and 0.84 on the PDBbind v.2016 core set. During external validation with the DUDE-Z v.2023.06.20 dataset, GNNSeq attained an average area under the curve (AUC) of 0.74, demonstrating its ability to distinguish active ligands from decoys across diverse ligand–receptor pairs. To further evaluate its performance, we combined GNNSeq with two additional specialized models that integrate structural and protein–ligand interaction features. When tested on a curated set of well-characterized drug–target complexes, the hybrid models achieved an average PCC of 0.89, with the top-performing model reaching a PCC of 0.97. GNNSeq was designed with a strong emphasis on computational efficiency, training on 5000+ complexes in 1 h and 32 min, with real-time affinity predictions for test complexes.
Conclusions: GNNSeq provides an efficient and scalable approach for binding affinity prediction, offering improved accuracy and generalizability while enabling large-scale virtual screening and cost-effective hit identification. GNNSeq is publicly available in a server-based graphical user interface (GUI) format.
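The two metrics quoted in the results, the Pearson correlation coefficient (PCC) for affinity regression and the area under the ROC curve (AUC) for separating actives from decoys, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `pearson_cc` and `roc_auc` and all input values are hypothetical, chosen only to show how each number is defined.

```python
import math


def pearson_cc(y_true, y_pred):
    """Pearson correlation between measured and predicted affinities."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_p)


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly.

    labels: 1 for an active ligand, 0 for a decoy.
    scores: predicted score, higher meaning stronger predicted binding.
    Tied pairs count as half a correct ranking.
    """
    actives = [s for lab, s in zip(labels, scores) if lab == 1]
    decoys = [s for lab, s in zip(labels, scores) if lab == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


if __name__ == "__main__":
    # Made-up toy values, not taken from PDBbind or DUDE-Z.
    measured = [4.2, 6.1, 7.8, 5.5]
    predicted = [4.6, 5.9, 7.1, 5.8]
    print("PCC:", round(pearson_cc(measured, predicted), 3))
    print("AUC:", roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form of the AUC is quadratic in the number of ligands, which is fine for illustration; a rank-based computation would be the usual choice for screening-scale data.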
id doaj-art-d29ac2a1468a4719a8cd45c2ef3c8532
institution Kabale University
issn 1424-8247
spelling doaj-art-d29ac2a1468a4719a8cd45c2ef3c8532 (timestamp 2025-08-20T03:43:37Z)
Pharmaceuticals (MDPI AG), ISSN 1424-8247, vol. 18, no. 3, article 329, 2025-02-01
DOI: 10.3390/ph18030329
Somanath Dandibhotla: Department of Computer Science, College of Engineering and Computing, George Mason University, Fairfax, VA 22030, USA
Madhav Samudrala: Department of Statistics, College of Arts and Sciences, The University of Virginia, Charlottesville, VA 22903, USA
Arjun Kaneriya: Department of Computer Science, School of Computing, Data Sciences & Physics, College of William and Mary, Williamsburg, VA 23185, USA
Sivanesan Dakshanamurthy: Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20007, USA
topic protein–ligand binding affinity
machine learning
graph neural network
sequence-based protein–ligand affinity prediction