HairSplitter: haplotype assembly from long, noisy reads

Motivation: Long-read assemblers face challenges in discerning closely related viral or bacterial strains, often collapsing similar strains into a single sequence. This limitation has been hampering metagenome analysis, as diverse strains may harbor crucial functional distinctions. Results: We intro...

Bibliographic Details
Main Authors: Faure, Roland; Lavenier, Dominique; Flot, Jean-François
Format: Article
Language: English
Published: Peer Community In 2024-10-01
Series: Peer Community Journal
Subjects: Metagenomes, Metaviromes, Haplotyping, Genome assembly, Strain separation
Online Access: https://peercommunityjournal.org/articles/10.24072/pcjournal.481/
author Faure, Roland
Lavenier, Dominique
Flot, Jean-François
collection DOAJ
description Motivation: Long-read assemblers face challenges in discerning closely related viral or bacterial strains, often collapsing similar strains into a single sequence. This limitation has been hampering metagenome analysis, as diverse strains may harbor crucial functional distinctions. Results: We introduce HairSplitter, a new software tool designed to retrieve strains from a partially or totally collapsed assembly and long reads. The method uses a custom variant-calling process designed to work with error-prone long reads and introduces a new read binning algorithm to recover an a priori unknown number of strains. On noisy long reads, HairSplitter recovers more strains while being faster than state-of-the-art tools, for both viruses and bacteria. Availability: HairSplitter is freely available on GitHub at https://github.com/RolandFaure/Hairsplitter (https://doi.org/10.5281/zenodo.13753481).
format Article
id doaj-art-b81594132c7f42f18654488bc853851e
institution Kabale University
issn 2804-3871
language English
publishDate 2024-10-01
publisher Peer Community In
record_format Article
series Peer Community Journal
spelling doaj-art-b81594132c7f42f18654488bc853851e
2025-02-07T10:17:17Z
eng
Peer Community In
Peer Community Journal
2804-3871
2024-10-01
4
10.24072/pcjournal.481
HairSplitter: haplotype assembly from long, noisy reads
Faure, Roland (https://orcid.org/0000-0003-2245-4284); Univ. Rennes, INRIA RBA, CNRS UMR 6074, Rennes, France; Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), Brussels, Belgium
Lavenier, Dominique; Univ. Rennes, INRIA RBA, CNRS UMR 6074, Rennes, France
Flot, Jean-François (https://orcid.org/0000-0003-4091-7916); Service Evolution Biologique et Ecologie, Université libre de Bruxelles (ULB), Brussels, Belgium; Interuniversity Institute of Bioinformatics in Brussels (IB)², Brussels, Belgium
https://peercommunityjournal.org/articles/10.24072/pcjournal.481/
Metagenomes, Metaviromes, Haplotyping, Genome assembly, Strain separation
title HairSplitter: haplotype assembly from long, noisy reads
topic Metagenomes, Metaviromes, Haplotyping, Genome assembly, Strain separation
url https://peercommunityjournal.org/articles/10.24072/pcjournal.481/