Design and structure of overlapping regions in PCA via deep learning

Polymerase cycling assembly (PCA) is the predominant method for synthesizing kilobase-length DNA fragments. The design of overlapping regions is the core factor affecting the success rate of synthesis. However, there still exist DNA sequences that are challenging to design and constru...

Bibliographic Details
Main Authors: Yan Zheng, Xi-Chen Cui, Fei Guo, Ming-Liang Dou, Ze-Xiong Xie, Ying-Jin Yuan
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2025-06-01
Series: Synthetic and Systems Biotechnology
Subjects: Synthetic biology, PCA, Deep learning, Molecular dynamics
Online Access: http://www.sciencedirect.com/science/article/pii/S2405805X24001595
author Yan Zheng
Xi-Chen Cui
Fei Guo
Ming-Liang Dou
Ze-Xiong Xie
Ying-Jin Yuan
collection DOAJ
description Polymerase cycling assembly (PCA) is the predominant method for synthesizing kilobase-length DNA fragments. The design of overlapping regions is the core factor affecting the success rate of synthesis. However, some DNA sequences remain challenging to design and construct in genome synthesis. Here we propose a deep learning model, trained on extensive synthesis data, that discerns latent sequence representations in overlapping regions with an area under the precision-recall curve (AUPR) of 0.805. Using the model, we developed the SmartCut algorithm to design oligonucleotides and improve the success rate of PCA experiments. The algorithm was successfully applied to sequences with diverse synthesis constraints, 80.4% of which were synthesized in a single round. We further discovered structural differences between overlapping and non-overlapping regions, represented by major groove width, stagger, slide, and centroid distance, which elucidate the model's reasonableness from the perspective of physical chemistry. This comprehensive approach facilitates streamlined and efficient investigations in genome synthesis.
format Article
id doaj-art-dd310c70e8bd45079add7baab5274c43
institution Kabale University
issn 2405-805X
language English
publishDate 2025-06-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Synthetic and Systems Biotechnology
spelling doaj-art-dd310c70e8bd45079add7baab5274c43 2025-01-28T04:14:44Z eng
KeAi Communications Co., Ltd. Synthetic and Systems Biotechnology 2405-805X 2025-06-01, vol. 10, no. 2, pp. 442-451
Design and structure of overlapping regions in PCA via deep learning
Yan Zheng: Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, PR China; School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, PR China
Xi-Chen Cui: Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, PR China; School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, PR China
Fei Guo: Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, PR China; School of Computer Science and Engineering, Central South University, Changsha, 410083, PR China
Ming-Liang Dou: Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, PR China
Ze-Xiong Xie (corresponding author): Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, PR China; School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, PR China
Ying-Jin Yuan (corresponding author): Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, PR China; School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, PR China
title Design and structure of overlapping regions in PCA via deep learning
topic Synthetic biology
PCA
Deep learning
Molecular dynamics