Flowtigs: Safety in flow decompositions for assembly graphs

Summary: A decomposition of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization (flowtigs) of walks that are safe in such general flow decompositions, i.e., that are subwalks of any poss...

Full description

Saved in:
Bibliographic Details
Main Authors: Francisco Sena, Eliel Ingervo, Shahbaz Khan, Andrey Prjibelski, Sebastian Schmidt, Alexandru Tomescu
Format: Article
Language:English
Published: Elsevier 2024-12-01
Series:iScience
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2589004224024337
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1850121439327289344
author Francisco Sena
Eliel Ingervo
Shahbaz Khan
Andrey Prjibelski
Sebastian Schmidt
Alexandru Tomescu
author_facet Francisco Sena
Eliel Ingervo
Shahbaz Khan
Andrey Prjibelski
Sebastian Schmidt
Alexandru Tomescu
author_sort Francisco Sena
collection DOAJ
description Summary: A decomposition of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization (flowtigs) of walks that are safe in such general flow decompositions, i.e., that are subwalks of any possible flow decomposition. We provide an O(mn)-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. On the practical side, we study flowtigs in the use-case of metagenomic assembly. By using the species abundances as flow values of the metagenomic assembly graph, we can model the possible assembly solutions as flow decompositions into weighted closed walks. On simulated data, compared to reporting unitigs or maximal safe walks based only on the graph structure, reporting flowtigs results in a notably more contiguous assembly. On real data, we frame flowtigs as a heuristic and provide an algorithm that is guided by this heuristic.
format Article
id doaj-art-cfd96188a2914dc990baaee4d8e2fece
institution OA Journals
issn 2589-0042
language English
publishDate 2024-12-01
publisher Elsevier
record_format Article
series iScience
spelling doaj-art-cfd96188a2914dc990baaee4d8e2fece2025-08-20T02:35:05ZengElsevieriScience2589-00422024-12-01271211120810.1016/j.isci.2024.111208Flowtigs: Safety in flow decompositions for assembly graphsFrancisco Sena0Eliel Ingervo1Shahbaz Khan2Andrey Prjibelski3Sebastian Schmidt4Alexandru Tomescu5University of Helsinki, Helsinki, Finland; Corresponding authorUniversity of Helsinki, Helsinki, FinlandIndian Institute of Technology Roorkee, Roorkee, IndiaUniversity of Helsinki, Helsinki, FinlandUniversity of Helsinki, Helsinki, FinlandUniversity of Helsinki, Helsinki, Finland; Corresponding authorSummary: A decomposition of a network flow is a set of weighted walks whose superposition equals the flow. In this article, we give a simple and linear-time-verifiable complete characterization (flowtigs) of walks that are safe in such general flow decompositions, i.e., that are subwalks of any possible flow decomposition. We provide an O(mn)-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. On the practical side, we study flowtigs in the use-case of metagenomic assembly. By using the species abundances as flow values of the metagenomic assembly graph, we can model the possible assembly solutions as flow decompositions into weighted closed walks. On simulated data, compared to reporting unitigs or maximal safe walks based only on the graph structure, reporting flowtigs results in a notably more contiguous assembly. On real data, we frame flowtigs as a heuristic and provide an algorithm that is guided by this heuristic.http://www.sciencedirect.com/science/article/pii/S2589004224024337Microbial genomicsBiocomputational methodClassification of bioinformatical subjectGenomic analysis
spellingShingle Francisco Sena
Eliel Ingervo
Shahbaz Khan
Andrey Prjibelski
Sebastian Schmidt
Alexandru Tomescu
Flowtigs: Safety in flow decompositions for assembly graphs
iScience
Microbial genomics
Biocomputational method
Classification of bioinformatical subject
Genomic analysis
title Flowtigs: Safety in flow decompositions for assembly graphs
title_full Flowtigs: Safety in flow decompositions for assembly graphs
title_fullStr Flowtigs: Safety in flow decompositions for assembly graphs
title_full_unstemmed Flowtigs: Safety in flow decompositions for assembly graphs
title_short Flowtigs: Safety in flow decompositions for assembly graphs
title_sort flowtigs safety in flow decompositions for assembly graphs
topic Microbial genomics
Biocomputational method
Classification of bioinformatical subject
Genomic analysis
url http://www.sciencedirect.com/science/article/pii/S2589004224024337
work_keys_str_mv AT franciscosena flowtigssafetyinflowdecompositionsforassemblygraphs
AT elielingervo flowtigssafetyinflowdecompositionsforassemblygraphs
AT shahbazkhan flowtigssafetyinflowdecompositionsforassemblygraphs
AT andreyprjibelski flowtigssafetyinflowdecompositionsforassemblygraphs
AT sebastianschmidt flowtigssafetyinflowdecompositionsforassemblygraphs
AT alexandrutomescu flowtigssafetyinflowdecompositionsforassemblygraphs