An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss

Promoter prediction plays a key role in understanding gene regulation and in developing gene therapies for complex diseases such as hearing loss (HL). While traditional Bidirectional Encoder Representations from Transformers (BERT) models excel at capturing contextual information, they often h...

Bibliographic Details
Main Authors: Jing Sun, Yangfan Huang, Jiale Fu, Li Teng, Xiao Liu, Xiaohua Luo
Format: Article
Language: English
Published: PeerJ Inc. 2025-08-01
Series: PeerJ Computer Science
Subjects: Promoter prediction; Enhanced BERT; CNN; BiLSTM; Hearing loss
Online Access: https://peerj.com/articles/cs-3104.pdf
collection DOAJ
description Promoter prediction plays a key role in understanding gene regulation and in developing gene therapies for complex diseases such as hearing loss (HL). While traditional Bidirectional Encoder Representations from Transformers (BERT) models excel at capturing contextual information, they are often limited in simultaneously extracting the local sequence features and long-range dependencies inherent in genomic data. To address this challenge, we propose DNABERT-CBL (DNABERT-2_CNN_BiLSTM), an enhanced BERT-based architecture that fuses a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) layer. The CNN module captures local regulatory features, while the BiLSTM module models long-distance dependencies, enabling efficient integration of the global and local features of promoter sequences. The models are optimized using three strategies (individual learning, cross-disease training, and global training), and the performance of each module is verified by constructing comparison models with different combinations. The experimental results show that DNABERT-CBL outperforms the baseline DNABERT-2_BASE model in hearing loss promoter prediction, with a 20% reduction in loss, a 3.3% improvement in the area under the receiver operating characteristic curve (AUC), and a 5.8% improvement in accuracy at a sequence length of 600 base pairs. In addition, DNABERT-CBL consistently outperforms other state-of-the-art BERT-based genome models on several evaluation metrics, highlighting its superior generalization ability. Overall, DNABERT-CBL provides an effective framework for accurate promoter prediction, offers valuable insights into gene regulatory mechanisms, and supports the development of gene therapies for hearing loss and related diseases.
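The abstract reports its gains in terms of AUC and accuracy. As a minimal sketch of how these two metrics are commonly computed for a binary promoter / non-promoter classifier, the pure-Python example below may help. The function names `roc_auc` and `accuracy`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; they are not taken from the paper, and the authors' evaluation code may differ.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one; a tie between a
    positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data: 1 = promoter, 0 = non-promoter (not from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

print(f"AUC      = {roc_auc(labels, scores):.3f}")   # 8 of 9 pairs ranked correctly
print(f"accuracy = {accuracy(labels, scores):.3f}")  # 4 of 6 predictions correct
```

AUC depends only on how the scores rank positives against negatives, whereas accuracy also depends on the chosen threshold, which is why the two metrics can move by different amounts for the same model.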
format Article
id doaj-art-3e89173832fc45c6b2e1267410bf9038
institution Kabale University
issn 2376-5992
language English
publishDate 2025-08-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
spelling doaj-art-3e89173832fc45c6b2e1267410bf9038
2025-08-20T03:37:02Z
eng
PeerJ Inc.
PeerJ Computer Science
2376-5992
2025-08-01
11
e3104
10.7717/peerj-cs.3104
An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss
Jing Sun, Yangfan Huang, Jiale Fu, Li Teng, Xiao Liu: School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
Xiaohua Luo: Department of Otolaryngology Surgery, Chongqing University FuLing Hospital, Chongqing, China
https://peerj.com/articles/cs-3104.pdf
Promoter prediction; Enhanced BERT; CNN; BiLSTM; Hearing loss
title An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss
topic Promoter prediction
Enhanced BERT
CNN
BiLSTM
Hearing loss
url https://peerj.com/articles/cs-3104.pdf