Anchorage accurately assembles anchor-flanked synthetic long reads

Abstract Modern sequencing technologies allow for the addition of short-sequence tags, known as anchors, to both ends of a captured molecule. Anchors are useful in assembling the full-length sequence of a captured molecule as they can be used to accurately determine the endpoints. One representative...

Bibliographic Details
Main Authors: Xiaofei Carl Zang, Xiang Li, Kyle Metcalfe, Tuval Ben-Yehezkel, Ryan Kelley, Mingfu Shao
Format: Article
Language: English
Published: BMC 2025-07-01
Series: Algorithms for Molecular Biology
Subjects: Genome assembly; De Bruijn graph; Synthetic long reads; Anchor-guided assembly; LoopSeq
Online Access: https://doi.org/10.1186/s13015-025-00288-4
author Xiaofei Carl Zang
Xiang Li
Kyle Metcalfe
Tuval Ben-Yehezkel
Ryan Kelley
Mingfu Shao
collection DOAJ
description Abstract Modern sequencing technologies allow short sequence tags, known as anchors, to be added to both ends of a captured molecule. Anchors help in assembling the full-length sequence of a captured molecule because they accurately determine its endpoints. One representative anchor-enabled technology is LoopSeq Solo, a synthetic long read (SLR) sequencing protocol. LoopSeq Solo also achieves ultra-high sequencing depth and high purity of short reads covering the entire captured molecule. Despite the availability of many assembly methods, constructing full-length sequences from such anchor-enabled, ultra-high-coverage sequencing data remains challenging because of the complexity of the underlying assembly graphs and the lack of algorithms that specifically leverage anchors. We present Anchorage, a novel assembler that performs anchor-guided assembly for ultra-high-depth sequencing data. Anchorage starts with a k-mer-based approach for precise estimation of molecule lengths. It then formulates the assembly problem as finding an optimal path that connects the two nodes determined by the anchors in the underlying compact de Bruijn graph. Optimality is defined as maximizing the weight of the lowest-weight node on the path while matching the estimated sequence length. Anchorage uses a modified dynamic programming algorithm to find the optimal path efficiently. Through both simulations and real data, we show that Anchorage outperforms existing assembly methods, particularly in the presence of sequencing artifacts. Anchorage fills the gap in assembling anchor-enabled data. We anticipate its broad use as anchor-enabled sequencing technologies become prevalent. Anchorage is freely available at https://github.com/Shao-Group/anchorage; the scripts and documents that can reproduce all experiments in this manuscript are available at https://github.com/Shao-Group/anchorage-test.
format Article
id doaj-art-4035aafaa9c04ed49d01802b16b9d236
institution Kabale University
issn 1748-7188
language English
publishDate 2025-07-01
publisher BMC
record_format Article
series Algorithms for Molecular Biology
spelling doaj-art-4035aafaa9c04ed49d01802b16b9d236
2025-08-20T03:45:45Z
eng
BMC
Algorithms for Molecular Biology
1748-7188
2025-07-01
vol. 20, no. 1, pp. 1-16
10.1186/s13015-025-00288-4
Anchorage accurately assembles anchor-flanked synthetic long reads
Xiaofei Carl Zang (Huck Institutes of the Life Sciences, The Pennsylvania State University)
Xiang Li (Department of Computer Science and Engineering, The Pennsylvania State University)
Kyle Metcalfe (Element Biosciences)
Tuval Ben-Yehezkel (Element Biosciences)
Ryan Kelley (Element Biosciences)
Mingfu Shao (Huck Institutes of the Life Sciences, The Pennsylvania State University)
https://doi.org/10.1186/s13015-025-00288-4
Genome assembly
De Bruijn graph
Synthetic long reads
Anchor-guided assembly
LoopSeq
title Anchorage accurately assembles anchor-flanked synthetic long reads
topic Genome assembly
De Bruijn graph
Synthetic long reads
Anchor-guided assembly
LoopSeq
url https://doi.org/10.1186/s13015-025-00288-4
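
Illustrative note: the path formulation in the abstract (connect the two anchor nodes, maximize the weight of the lowest-weight node, match the estimated length) can be sketched as a dynamic program indexed by accumulated length. This is a minimal sketch under stated assumptions, not the Anchorage implementation: the function name best_bottleneck_path, the input layout, and the toy graph are hypothetical, each node is assumed to contribute a positive integer length, and the overlap between adjacent nodes of a de Bruijn graph is ignored for simplicity.

```python
from collections import defaultdict


def best_bottleneck_path(nodes, edges, source, target, target_len):
    """Find a source-to-target path whose node lengths sum to target_len
    and whose minimum node weight is as large as possible.

    nodes: {name: (weight, length)} with length a positive integer (assumption).
    edges: iterable of (u, v) directed edges.
    Returns (bottleneck_weight, path) or None if no such path exists.
    """
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    src_weight, src_len = nodes[source]
    if src_len > target_len:
        return None

    # best[l][v] = (bottleneck, back-pointer) for the best path that starts
    # at source, ends at v, and has total length exactly l.
    best = defaultdict(dict)
    best[src_len][source] = (src_weight, None)

    # Lengths strictly increase along a path, so visiting states in order of
    # length finalizes each one before it is extended (cycles are harmless).
    for l in range(src_len, target_len + 1):
        for u, (bottleneck, _) in list(best[l].items()):
            for v in succ[u]:
                v_weight, v_len = nodes[v]
                new_len = l + v_len
                if new_len > target_len:
                    continue
                cand = min(bottleneck, v_weight)
                if v not in best[new_len] or cand > best[new_len][v][0]:
                    best[new_len][v] = (cand, (l, u))

    if target not in best[target_len]:
        return None

    # Follow back-pointers from the target to recover the path.
    path = []
    key = (target_len, target)
    while key is not None:
        l, v = key
        path.append(v)
        key = best[l][v][1]
    path.reverse()
    return best[target_len][target][0], path


# Hypothetical toy graph: node -> (weight, length).
toy_nodes = {"s": (50, 3), "a": (40, 4), "b": (5, 4), "c": (45, 2), "t": (60, 3)}
toy_edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "c"), ("c", "t")]
result = best_bottleneck_path(toy_nodes, toy_edges, "s", "t", 10)
print(result)
```

In the toy graph, two paths have total length 10; the one through the low-weight node "b" is rejected in favor of the one through "a". The table has one entry per (length, node) pair, so the running time grows with the target length times the number of edges. A practical assembler would also need to tolerate an error margin around the estimated length, which this sketch does not attempt.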