iProtDNA-SMOTE: Enhancing protein-DNA binding sites prediction through imbalanced graph neural networks.

Bibliographic Details
Main Authors: Ruiyan Huang, Wangren Qiu, Xuan Xiao, Weizhong Lin
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0320817
ISSN: 1932-6203
Citation: PLoS ONE 20(5): e0320817

Protein-DNA interactions play a crucial role in cellular biology and are essential for maintaining life processes and regulating cellular functions. We propose iProtDNA-SMOTE, a method that combines imbalanced graph neural networks with pre-trained protein language models to predict DNA-binding residues. In protein-DNA binding site prediction, binding residues are far outnumbered by non-binding residues; the approach addresses this class imbalance by learning from the imbalanced graph data, which improves the model's generalization and specificity. We trained the model on two datasets, TR646 and TR573, and evaluated it in a series of experiments. The model achieved AUC values of 0.850, 0.896, and 0.858 on the independent test datasets TE46, TE129, and TE181, respectively. These results indicate that iProtDNA-SMOTE outperforms existing methods in accuracy and generalization for predicting DNA-binding sites and that it predicts protein-DNA binding sites reliably and precisely, with fewer prediction errors. For the convenience of the scientific community, the benchmark datasets and code are publicly available at https://github.com/primrosehry/iProtDNA-SMOTE.
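
The abstract does not spell out how the class imbalance is handled, but the method's name suggests SMOTE (Synthetic Minority Over-sampling Technique), which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. A minimal sketch, assuming per-residue feature vectors (for example, protein language model embeddings) and binary labels with 1 marking DNA-binding residues. `smote_oversample` and its `k`, `ratio`, and `seed` parameters are hypothetical names for illustration, not the authors' code. Graph variants such as GraphSMOTE interpolate in the GNN's embedding space and also generate edges for the synthetic nodes; this sketch omits the edge step.

```python
import numpy as np


def smote_oversample(X, y, k=5, ratio=1.0, seed=0):
    """SMOTE-style oversampling of the minority (binding) class.

    X: (n, d) per-residue feature matrix (assumed, e.g. language-model embeddings).
    y: (n,) binary labels, 1 = DNA-binding (minority), 0 = non-binding.
    ratio: hypothetical target size of the minority class relative to the majority.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_min, n_maj = len(X_min), int((y == 0).sum())
    n_new = max(int(ratio * n_maj) - n_min, 0)
    if n_new == 0 or n_min < 2:
        return X, y  # nothing to add, or too few minority samples to interpolate
    k = min(k, n_min - 1)
    # Euclidean distances between minority samples only.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude each sample from its own neighbours
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbours
    base = rng.integers(0, n_min, size=n_new)  # random minority anchor per new sample
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    X_syn = X_min[base] + lam * (X_min[partner] - X_min[base])
    return np.vstack([X, X_syn]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])


# Toy data: 100 residues, 10 of them binding.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = np.zeros(100, dtype=int)
y[:10] = 1
X_bal, y_bal = smote_oversample(X, y)
print(int(y_bal.sum()), int((y_bal == 0).sum()))  # 90 90
```

Interpolating only between minority neighbours keeps each synthetic residue on a segment between two real binding residues, so the classifier sees a denser minority region rather than exact duplicates.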
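
The reported AUC values summarise how well per-residue scores rank binding residues above non-binding ones: AUC equals the probability that a randomly chosen binding residue receives a higher score than a randomly chosen non-binding residue, with ties counted as half. A minimal sketch of that computation via the Mann-Whitney rank statistic; `roc_auc` is a hypothetical helper, not the evaluation code used in the paper (in practice, `sklearn.metrics.roc_auc_score` computes the same quantity).

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = int((y_true == 1).sum())
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both binding and non-binding residues")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    rank_sum_pos = ranks[y_true == 1].sum()
    # U statistic for positives, normalised by the number of (pos, neg) pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC depends only on the ranking of scores, it is insensitive to the decision threshold, which makes it a common headline metric for imbalanced residue-level tasks.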