The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance

Summary: Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific seq...

Full description

Saved in:
Bibliographic Details
Main Authors: Victor Levallois, Francesco Andreace, Bertrand Le Gal, Yoann Dufresne, Pierre Peterlongo
Format: Article
Language:English
Published: Elsevier 2024-12-01
Series:iScience
Subjects:
Online Access:http://www.sciencedirect.com/science/article/pii/S2589004224026609
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1846113137106878464
author Victor Levallois
Francesco Andreace
Bertrand Le Gal
Yoann Dufresne
Pierre Peterlongo
author_facet Victor Levallois
Francesco Andreace
Bertrand Le Gal
Yoann Dufresne
Pierre Peterlongo
author_sort Victor Levallois
collection DOAJ
description Summary: Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific sequences and their abundance within datasets.This paper presents the Backpack Quotient Filter (BQF), a data structure for indexing k-mers (substrings of length k), which offers greater space efficiency than the Counting Quotient Filter (CQF). The BQF maintains essential features such as abundance information and dynamicity, with an extremely low false positive rate of less than 10−5%. Our method redefines abundance information handling and implements an independent strategy for space efficiency.The BQF uses four times less space than the CQF on complex datasets such as sea-water metagenomics sequences. Additionally, its space efficiency improves with larger datasets, addressing the need for scalable data solutions.
format Article
id doaj-art-487188aa09fb47c4a1746c76ec14fb88
institution Kabale University
issn 2589-0042
language English
publishDate 2024-12-01
publisher Elsevier
record_format Article
series iScience
spelling doaj-art-487188aa09fb47c4a1746c76ec14fb882024-12-22T05:29:21ZengElsevieriScience2589-00422024-12-012712111435The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundanceVictor Levallois0Francesco Andreace1Bertrand Le Gal2Yoann Dufresne3Pierre Peterlongo4University Rennes, Inria, CNRS, IRISA - UMR 6074, 35000 Rennes, France; Corresponding authorDepartment of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France; Sorbonne Université, Collège doctoral, 75005 Paris, FranceUniversity Rennes, Inria, CNRS, IRISA - Taran team, ENSSAT, Lannion, FranceDepartment of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France; Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, 75015 Paris, FranceUniversity Rennes, Inria, CNRS, IRISA - UMR 6074, 35000 Rennes, France; Corresponding authorSummary: Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific sequences and their abundance within datasets.This paper presents the Backpack Quotient Filter (BQF), a data structure for indexing k-mers (substrings of length k), which offers greater space efficiency than the Counting Quotient Filter (CQF). The BQF maintains essential features such as abundance information and dynamicity, with an extremely low false positive rate of less than 10−5%. Our method redefines abundance information handling and implements an independent strategy for space efficiency.The BQF uses four times less space than the CQF on complex datasets such as sea-water metagenomics sequences. Additionally, its space efficiency improves with larger datasets, addressing the need for scalable data solutions.http://www.sciencedirect.com/science/article/pii/S2589004224026609Biocomputational methodGenomic analysisHigh-performance computing in bioinformatics
spellingShingle Victor Levallois
Francesco Andreace
Bertrand Le Gal
Yoann Dufresne
Pierre Peterlongo
The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance
iScience
Biocomputational method
Genomic analysis
High-performance computing in bioinformatics
title The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance
title_full The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance
title_fullStr The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance
title_full_unstemmed The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance
title_short The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance
title_sort backpack quotient filter a dynamic and space efficient data structure for querying k mers with abundance
topic Biocomputational method
Genomic analysis
High-performance computing in bioinformatics
url http://www.sciencedirect.com/science/article/pii/S2589004224026609
work_keys_str_mv AT victorlevallois thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT francescoandreace thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT bertrandlegal thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT yoanndufresne thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT pierrepeterlongo thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT victorlevallois backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT francescoandreace backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT bertrandlegal backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT yoanndufresne backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance
AT pierrepeterlongo backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance