CacPred: a cascaded convolutional neural network for TF-DNA binding prediction

Abstract Background Transcription factors (TFs) regulate gene expression by binding to DNA sequences. Aligned TF binding sites (TFBSs) of the same TF are seen as cis-regulatory motifs, and substantial computational effort has been invested in finding them. In recent years, convolutional neural networks (CNNs) h...

Bibliographic Details
Main Authors: Shuangquan Zhang, Anjun Ma, Xuping Xie, Zhichao Lian, Yan Wang
Format: Article
Language: English
Published: BMC 2025-03-01
Series: BMC Genomics
Subjects: Transcription factor; ChIP-seq; Deep learning; TF-DNA binding prediction
Online Access: https://doi.org/10.1186/s12864-025-11399-y
_version_ 1849390475883053056
author Shuangquan Zhang
Anjun Ma
Xuping Xie
Zhichao Lian
Yan Wang
author_sort Shuangquan Zhang
collection DOAJ
description Abstract Background Transcription factors (TFs) regulate gene expression by binding to DNA sequences. Aligned TF binding sites (TFBSs) of the same TF are seen as cis-regulatory motifs, and substantial computational effort has been invested in finding them. In recent years, convolutional neural networks (CNNs) have succeeded in TF-DNA binding prediction, but the accuracy of existing deep learning (DL) methods needs to be improved, and the role of convolution in TF-DNA binding prediction should be further explored. Results We develop a cascaded convolutional neural network model named CacPred to predict TF-DNA binding on 790 chromatin immunoprecipitation-sequencing (ChIP-seq) datasets and seven ChIP-nexus (chromatin immunoprecipitation experiments with nucleotide resolution through exonuclease, unique barcode, and single ligation) datasets. We compare CacPred to six existing DL models across nine standard evaluation metrics. Our results indicate that CacPred outperforms all comparison models for TF-DNA binding prediction: the average accuracy (ACC), Matthews correlation coefficient (MCC), and area of eight metrics radar (AEMR) improve by 3.3%, 9.2%, and 6.4% on the 790 ChIP-seq datasets. On the seven ChIP-nexus datasets, CacPred improves the average ACC, MCC, and AEMR by 5.5%, 16.8%, and 12.9%. To explain the proposed method, motifs are used to show the features CacPred learned; the results show that CacPred can find some significant motifs in the input sequences. Conclusions This paper indicates that CacPred performs better than existing models on ChIP-seq data. Seven ChIP-nexus datasets are also analyzed, and the results agree with those on ChIP-seq data: our proposed method performs best. CacPred is equipped only with convolution, indicating that the pooling used in existing models loses some sequence information. Some significant motifs are found, showing that CacPred can learn features from input sequences. In this study, we demonstrate that CacPred is an effective and feasible model for predicting TF-DNA binding. CacPred is freely available at https://github.com/zhangsq06/CacPred .
format Article
id doaj-art-ae467f6ec38d428fb65ffe32d330f3b2
institution Kabale University
issn 1471-2164
language English
publishDate 2025-03-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj-art-ae467f6ec38d428fb65ffe32d330f3b2
2025-08-20T03:41:39Z
eng
BMC
BMC Genomics
1471-2164
2025-03-01
26
S2
1
11
10.1186/s12864-025-11399-y
CacPred: a cascaded convolutional neural network for TF-DNA binding prediction
Shuangquan Zhang, School of Cyber Science and Engineering, Nanjing University of Science and Technology
Anjun Ma, Department of Biomedical Informatics, College of Medicine, The Ohio State University
Xuping Xie, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University
Zhichao Lian, School of Cyber Science and Engineering, Nanjing University of Science and Technology
Yan Wang, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University
title CacPred: a cascaded convolutional neural network for TF-DNA binding prediction
topic Transcription factor
ChIP-seq
Deep learning
TF-DNA binding prediction
url https://doi.org/10.1186/s12864-025-11399-y