Sama: a contig assembler with correctness guarantee

Bibliographic Details
Main Author: Leena Salmela
Format: Article
Language: English
Published: BMC 2025-06-01
Series: Algorithms for Molecular Biology
Online Access: https://doi.org/10.1186/s13015-025-00280-y
Description
Summary:
Background: In genome assembly, the task is to reconstruct a genome from sequencing reads. Current practical methods are based on heuristics that are hard to analyse, so an analysis of their correctness is not readily available.
Results: We present a model for estimating the probability of misassembly at each position of a de Bruijn graph based assembly. Unlike previous work, our model also accounts for missing data. We apply our model to produce contigs with a correctness guarantee, together with a correctness estimate for each position in the contigs.
Conclusions: Our experiments show that when k-mer coverage is high enough, our method produces contigs whose contiguity is similar to that of state-of-the-art assemblers based on heuristic correction of the de Bruijn graph. Our model may have further applications in downstream analysis of contigs or in any analysis working directly on the de Bruijn graph.
ISSN: 1748-7188
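The summary above does not spell out Sama's probability model, so the following is only a minimal sketch of the standard setting it builds on, not the author's method. It counts k-mers in reads, builds a node-centric de Bruijn graph, compacts maximal non-branching paths into contigs (unitigs), and uses a simple Poisson assumption to estimate how likely a genomic k-mer is to be missing entirely at a given mean coverage. All function names are hypothetical; reverse complements, sequencing errors and isolated cycles are deliberately ignored to keep the sketch short.

```python
import math
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count how many times each k-mer occurs in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_dbg(kmers):
    """Node-centric de Bruijn graph: an edge joins two k-mers overlapping by k-1 bases."""
    kmer_set = set(kmers)
    succ, pred = defaultdict(set), defaultdict(set)
    for km in kmer_set:
        for base in "ACGT":
            nxt = km[1:] + base
            if nxt in kmer_set:
                succ[km].add(nxt)
                pred[nxt].add(km)
    return succ, pred


def unitigs(kmers, succ, pred):
    """Spell out maximal non-branching paths (unitigs); isolated cycles are skipped."""

    def extends_predecessor(km):
        # True if km has a single predecessor whose only successor is km,
        # i.e. km lies in the middle or at the end of a non-branching path.
        if len(pred[km]) != 1:
            return False
        (p,) = pred[km]
        return p != km and len(succ[p]) == 1

    contigs, visited = [], set()
    for start in sorted(kmers):
        if start in visited or extends_predecessor(start):
            continue  # not the first k-mer of a unitig
        path, cur = [start], start
        visited.add(start)
        # Walk forward while the path stays unambiguous in both directions.
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if nxt in visited or not extends_predecessor(nxt):
                break
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        # Consecutive k-mers overlap by k-1, so each adds one new base.
        contigs.append(path[0] + "".join(km[-1] for km in path[1:]))
    return contigs


def prob_kmer_missing(mean_coverage):
    """P(a genomic k-mer is sampled zero times), assuming Poisson(mean_coverage) sampling."""
    return math.exp(-mean_coverage)


if __name__ == "__main__":
    counts = count_kmers(["ACGTA", "ACGTC"], k=4)
    succ, pred = build_dbg(counts)
    print(unitigs(counts, succ, pred))  # ['ACGT', 'CGTA', 'CGTC']
    print(prob_kmer_missing(5.0))
```

In the usage example the k-mer ACGT has two successors, so the graph branches and yields three short unitigs instead of one contig. Under the toy Poisson assumption, the chance that a genomic k-mer is missing drops quickly as coverage grows (roughly 0.67% at a mean coverage of 5). This is only a rough illustration of why missing data matters at low coverage; Sama's actual per-position misassembly model is defined in the paper.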