Predict the degree of secondary structures of the encoding sequences in DNA storage by deep learning model
| Main Authors: | Wanmin Lin, Ling Chu, Xiangyu Yao, Zhihua Chen, Peng Xu, Wenbin Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Subjects: | DNA storage; Secondary structure; Synthesis; Sequencing; Deep learning model |
| Online Access: | https://doi.org/10.1038/s41598-025-05717-3 |
| author | Wanmin Lin, Ling Chu, Xiangyu Yao, Zhihua Chen, Peng Xu, Wenbin Liu |
|---|---|
| collection | DOAJ |
| description | DNA storage has been widely considered a promising alternative for storing exponentially growing data. However, the complex secondary structures that DNA sequences inherently form severely compromise synthesis, PCR amplification, and sequencing, interfering with reliable information recovery. In large-scale storage applications, effectively circumventing these negative effects is a critical problem. Because secondary structures are formed by contiguous bases with reverse-complementary relations and are accompanied by the release of free energy, we construct a BiLSTM-Transformer model with k-mer embedding to predict the free energy of sequences and then screen out the sequences with high values. K-mer embedding captures the characteristics of contiguous base pairing through overlapping short subsequences, which facilitates free-energy prediction. Our simulation results demonstrate that the BiLSTM-Transformer model with k-mer embedding achieves better prediction performance than other deep learning models. Application to a real dataset demonstrates that the proposed model can screen out the top high-risk sequences, which are prone to more read errors and lower retrieved copy numbers in real DNA storage. The proposed screening method for top high-risk sequences can serve as a proactive step to prevent severe secondary structures, providing a solution for reliable information retrieval. |
| format | Article |
| id | doaj-art-3749f9be815a4707bae17a9f68b03947 |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| affiliation | Institute of Computing Science and Technology, Guangzhou University (all authors) |
| title | Predict the degree of secondary structures of the encoding sequences in DNA storage by deep learning model |
| topic | DNA storage; Secondary structure; Synthesis; Sequencing; Deep learning model |
| url | https://doi.org/10.1038/s41598-025-05717-3 |
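
The description states that k-mer embedding works on overlapping short subsequences and that high-risk sequences are screened out by their predicted free energy. The following is a minimal Python sketch of those two surrounding steps only, not the authors' code; the BiLSTM-Transformer predictor itself is not shown. It assumes k = 3, stride 1, an A/C/G/T-only alphabet, id 0 reserved for padding, and predictions expressed as released free energy where a larger value means a more stable secondary structure. The function names, the screening fraction, and the toy energy values are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def build_kmer_vocab(k: int) -> dict[str, int]:
    """Map every possible k-mer over A/C/G/T to an integer id (0 is kept for padding)."""
    return {"".join(p): i + 1 for i, p in enumerate(product(BASES, repeat=k))}


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode(seq: str, vocab: dict[str, int], k: int = 3) -> list[int]:
    """Turn a sequence into the k-mer id list an embedding layer would consume.

    Raises KeyError if the sequence contains a base outside A/C/G/T.
    """
    return [vocab[token] for token in kmer_tokens(seq, k)]


def screen_top_risk(energies: dict[str, float], fraction: float = 0.05) -> list[str]:
    """Return the ids of the top `fraction` of sequences by predicted released free energy.

    Assumes larger values mean stronger secondary structure (hypothetical convention).
    """
    n = max(1, round(len(energies) * fraction))
    ranked = sorted(energies, key=energies.get, reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    vocab = build_kmer_vocab(3)
    print(len(vocab))                    # 64 possible 3-mers
    print(kmer_tokens("ACGTAC"))         # ['ACG', 'CGT', 'GTA', 'TAC']
    print(encode("ACGTAC", vocab))       # [7, 28, 45, 50]
    # Toy predictions, made up for illustration only.
    toy = {"s1": 12.3, "s2": 3.1, "s3": 20.5, "s4": 7.0}
    print(screen_top_risk(toy, fraction=0.5))  # ['s3', 's1']
```

Overlapping tokens mean each base appears in up to k neighbouring k-mers, which is one way local base-pairing context can reach the embedding layer; the stride and k are tunable choices in this sketch rather than values taken from the article.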