Flowtigs: Safety in flow decompositions for assembly graphs

Summary: A decomposition of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization (flowtigs) of walks that are safe in such general flow decompositions, i.e., that are subwalks of any poss...

Full description

Saved in:
Bibliographic Details
Main Authors: Francisco Sena, Eliel Ingervo, Shahbaz Khan, Andrey Prjibelski, Sebastian Schmidt, Alexandru Tomescu
Format: Article
Language:English
Published: Elsevier 2024-12-01
Series:iScience
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2589004224024337
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Summary: A decomposition of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization (flowtigs) of walks that are safe in such general flow decompositions, i.e., that are subwalks of any possible flow decomposition. We provide an O(mn)-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. On the practical side, we study flowtigs in the use-case of metagenomic assembly. By using the species abundances as flow values of the metagenomic assembly graph, we can model the possible assembly solutions as flow decompositions into weighted closed walks. On simulated data, compared to reporting unitigs or maximal safe walks based only on the graph structure, reporting flowtigs results in a notably more contiguous assembly. On real data, we frame flowtigs as a heuristic and provide an algorithm that is guided by this heuristic.
ISSN:2589-0042