GNNSeq: A Sequence-Based Graph Neural Network for Predicting Protein–Ligand Binding Affinity

Background/Objectives: Accurately predicting protein–ligand binding affinity is essential in drug discovery for identifying effective compounds. While existing sequence-based machine learning models for binding affinity prediction have shown potential, they lack accuracy and robus...

Bibliographic Details
Main Authors:	Somanath Dandibhotla, Madhav Samudrala, Arjun Kaneriya, Sivanesan Dakshanamurthy
Format:	Article
Language:	English
Published:	MDPI AG 2025-02-01
Series:	Pharmaceuticals
Subjects:	protein–ligand binding affinity machine learning graph neural network sequence-based protein–ligand affinity prediction
Online Access:	https://www.mdpi.com/1424-8247/18/3/329

author	Somanath Dandibhotla Madhav Samudrala Arjun Kaneriya Sivanesan Dakshanamurthy
collection	DOAJ
description	Background/Objectives: Accurately predicting protein–ligand binding affinity is essential in drug discovery for identifying effective compounds. While existing sequence-based machine learning models for binding affinity prediction have shown potential, they lack accuracy and robustness in pattern recognition, which limits their generalizability across diverse and novel binding complexes. To overcome these limitations, we developed GNNSeq, a novel hybrid machine learning model that integrates a Graph Neural Network (GNN) with Random Forest (RF) and XGBoost. Methods: GNNSeq predicts ligand binding affinity by extracting molecular characteristics and sequence patterns from protein and ligand sequences. The fully optimized GNNSeq model was trained and tested on subsets of the PDBbind dataset. The novelty of GNNSeq lies in its exclusive reliance on sequence features, a hybrid GNN framework, and an optimized kernel-based context-switching design. By relying exclusively on sequence features, GNNSeq eliminates the need for pre-docked complexes or high-quality structural data, allowing for accurate binding affinity predictions even when interaction-based or structural information is unavailable. The integration of GNN, XGBoost, and RF improves GNNSeq's performance by enabling hierarchical sequence learning, handling complex feature interactions, reducing variance, and forming a robust ensemble that improves predictions and mitigates overfitting. GNNSeq's kernel-based context-switching scheme optimizes model efficiency and runtime, dynamically adjusts feature weighting between sequence and basic structural information, and improves predictive accuracy and model generalization. Results: In benchmarking, GNNSeq performed comparably to several existing sequence-based models and achieved a Pearson correlation coefficient (PCC) of 0.784 on the PDBbind v.2020 refined set and 0.84 on the PDBbind v.2016 core set. During external validation with the DUDE-Z v.2023.06.20 dataset, GNNSeq attained an average area under the curve (AUC) of 0.74, demonstrating its ability to distinguish active ligands from decoys across diverse ligand–receptor pairs. To further evaluate its performance, we combined GNNSeq with two additional specialized models that integrate structural and protein–ligand interaction features. When tested on a curated set of well-characterized drug–target complexes, the hybrid models achieved an average PCC of 0.89, with the top-performing model reaching a PCC of 0.97. GNNSeq was designed with a strong emphasis on computational efficiency: it trains on more than 5000 complexes in 1 h 32 min and produces real-time affinity predictions for test complexes. Conclusions: GNNSeq provides an efficient and scalable approach for binding affinity prediction, offering improved accuracy and generalizability while enabling large-scale virtual screening and cost-effective hit identification. GNNSeq is publicly available in a server-based graphical user interface (GUI) format.
format	Article
id	doaj-art-d29ac2a1468a4719a8cd45c2ef3c8532
institution	Kabale University
issn	1424-8247
language	English
publishDate	2025-02-01
publisher	MDPI AG
record_format	Article
series	Pharmaceuticals
spelling	doaj-art-d29ac2a1468a4719a8cd45c2ef3c8532; 2025-08-20T03:43:37Z; eng; MDPI AG; Pharmaceuticals; ISSN 1424-8247; 2025-02-01; vol. 18, no. 3, art. 329; DOI 10.3390/ph18030329; GNNSeq: A Sequence-Based Graph Neural Network for Predicting Protein–Ligand Binding Affinity; Somanath Dandibhotla (Department of Computer Science, College of Engineering and Computing, George Mason University, Fairfax, VA 22030, USA); Madhav Samudrala (Department of Statistics, College of Arts and Sciences, The University of Virginia, Charlottesville, VA 22903, USA); Arjun Kaneriya (Department of Computer Science, School of Computing, Data Sciences & Physics, College of William and Mary, Williamsburg, VA 23185, USA); Sivanesan Dakshanamurthy (Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20007, USA); https://www.mdpi.com/1424-8247/18/3/329
title	GNNSeq: A Sequence-Based Graph Neural Network for Predicting Protein–Ligand Binding Affinity
topic	protein–ligand binding affinity machine learning graph neural network sequence-based protein–ligand affinity prediction
url	https://www.mdpi.com/1424-8247/18/3/329
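
The results in the description field are reported with two standard metrics: the Pearson correlation coefficient (PCC) between measured and predicted affinities, and the area under the ROC curve (AUC) for separating active ligands from decoys. The following is a minimal sketch of how such metrics can be computed, using only the Python standard library. The function names and all sample values are illustrative assumptions; they are not taken from the article, its data, or the GNNSeq code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def roc_auc(scores, labels):
    """AUC as the fraction of (active, decoy) pairs in which the active
    receives the higher score; tied pairs count as half."""
    actives = [s for s, lab in zip(scores, labels) if lab == 1]
    decoys = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in decoys
    )
    return wins / (len(actives) * len(decoys))


# Made-up affinities (e.g. pK values), purely for illustration.
measured = [4.2, 5.1, 6.3, 7.8, 9.0]
predicted = [4.5, 5.0, 6.9, 7.1, 8.6]
print("PCC:", pearson(measured, predicted))

# Made-up screening scores; label 1 = active ligand, 0 = decoy.
scores = [0.9, 0.4, 0.35, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
print("AUC:", roc_auc(scores, labels))
```

The pairwise AUC definition used here is equivalent to the area under the ROC curve but runs in quadratic time, so for large screening sets a rank-based or library implementation would be the more practical choice.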

Similar Items