An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss
Promoter prediction has a key role in helping to understand gene regulation and in developing gene therapies for complex diseases such as hearing loss (HL). While traditional Bidirectional Encoder Representations from Transformers (BERT) models excel in capturing contextual information, they often h...
| Main Authors: | Jing Sun, Yangfan Huang, Jiale Fu, Li Teng, Xiao Liu, Xiaohua Luo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | PeerJ Inc., 2025-08-01 |
| Series: | PeerJ Computer Science |
| Subjects: | Promoter prediction, Enhanced BERT, CNN, BiLSTM, Hearing loss |
| Online Access: | https://peerj.com/articles/cs-3104.pdf |
| _version_ | 1849404324297310208 |
|---|---|
| author | Jing Sun, Yangfan Huang, Jiale Fu, Li Teng, Xiao Liu, Xiaohua Luo |
| author_sort | Jing Sun |
| collection | DOAJ |
| description | Promoter prediction plays a key role in understanding gene regulation and in developing gene therapies for complex diseases such as hearing loss (HL). Although traditional Bidirectional Encoder Representations from Transformers (BERT) models excel at capturing contextual information, they are often limited in simultaneously extracting the local sequence features and long-range dependencies inherent in genomic data. To address this challenge, we propose DNABERT-CBL (DNABERT-2_CNN_BiLSTM), an enhanced BERT-based architecture that fuses a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) layer. The CNN module captures local regulatory features, while the BiLSTM module models long-distance dependencies, enabling efficient integration of the global and local features of promoter sequences. The models are optimized using three strategies (individual learning, cross-disease training, and global training), and the performance of each module is verified by constructing comparison models with different combinations. The experimental results show that DNABERT-CBL outperforms the baseline DNABERT-2_BASE model in hearing loss promoter prediction, with a 20% reduction in loss, a 3.3% improvement in the area under the receiver operating characteristic curve (AUC), and a 5.8% improvement in accuracy at a sequence length of 600 base pairs. In addition, DNABERT-CBL consistently outperforms other state-of-the-art BERT-based genome models on several evaluation metrics, highlighting its superior generalization ability. Overall, DNABERT-CBL provides an effective framework for accurate promoter prediction, offers valuable insights into gene regulatory mechanisms, and supports the development of gene therapies for hearing loss and related diseases. |
| format | Article |
| id | doaj-art-3e89173832fc45c6b2e1267410bf9038 |
| institution | Kabale University |
| issn | 2376-5992 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | PeerJ Inc. |
| record_format | Article |
| series | PeerJ Computer Science |
| spelling | doaj-art-3e89173832fc45c6b2e1267410bf9038; 2025-08-20T03:37:02Z; eng; PeerJ Inc.; PeerJ Computer Science; ISSN 2376-5992; 2025-08-01; vol. 11; e3104; DOI 10.7717/peerj-cs.3104; An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss; Jing Sun, Yangfan Huang, Jiale Fu, Li Teng, Xiao Liu (School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China); Xiaohua Luo (Department of Otolaryngology Surgery, Chongqing University FuLing Hospital, Chongqing, China); https://peerj.com/articles/cs-3104.pdf; Promoter prediction; Enhanced BERT; CNN; BiLSTM; Hearing loss |
| title | An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss |
| topic | Promoter prediction; Enhanced BERT; CNN; BiLSTM; Hearing loss |
| url | https://peerj.com/articles/cs-3104.pdf |
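
The description in this record reports results as AUC and accuracy. As a reading aid, the following is a minimal sketch of how those two metrics are commonly computed for a binary promoter/non-promoter classifier. It is not taken from the article: the function names `roc_auc` and `accuracy`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the authors' own evaluation code may differ.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label.

    The 0.5 threshold is an assumption for this sketch.
    """
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via its pairwise-ranking interpretation.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one, with
    ties counted as half a win. This O(P * N) loop favors clarity over speed.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = promoter, 0 = non-promoter.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]

if __name__ == "__main__":
    print(f"AUC      = {roc_auc(labels, scores):.4f}")
    print(f"accuracy = {accuracy(labels, scores):.4f}")
```

Because AUC depends only on how the scores rank positives against negatives, it does not change with the decision threshold, whereas accuracy does. That is one reason the two figures are usually reported side by side.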