The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance
Summary: Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific seq...
        Saved in:
      
    
          | Main Authors: | , , , , | 
|---|---|
| Format: | Article | 
| Language: | English | 
| Published: | 
            Elsevier
    
        2024-12-01
     | 
| Series: | iScience | 
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2589004224026609 | 
| Tags: | 
       Add Tag    
     
      No Tags, Be the first to tag this record!
   
 | 
| _version_ | 1846113137106878464 | 
    
|---|---|
| author | Victor Levallois Francesco Andreace Bertrand Le Gal Yoann Dufresne Pierre Peterlongo  | 
    
| author_facet | Victor Levallois Francesco Andreace Bertrand Le Gal Yoann Dufresne Pierre Peterlongo  | 
    
| author_sort | Victor Levallois | 
    
| collection | DOAJ | 
    
| description | Summary: Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific sequences and their abundance within datasets.This paper presents the Backpack Quotient Filter (BQF), a data structure for indexing k-mers (substrings of length k), which offers greater space efficiency than the Counting Quotient Filter (CQF). The BQF maintains essential features such as abundance information and dynamicity, with an extremely low false positive rate of less than 10−5%. Our method redefines abundance information handling and implements an independent strategy for space efficiency.The BQF uses four times less space than the CQF on complex datasets such as sea-water metagenomics sequences. Additionally, its space efficiency improves with larger datasets, addressing the need for scalable data solutions. | 
    
| format | Article | 
    
| id | doaj-art-487188aa09fb47c4a1746c76ec14fb88 | 
    
| institution | Kabale University | 
    
| issn | 2589-0042 | 
    
| language | English | 
    
| publishDate | 2024-12-01 | 
    
| publisher | Elsevier | 
    
| record_format | Article | 
    
| series | iScience | 
    
| spelling | doaj-art-487188aa09fb47c4a1746c76ec14fb882024-12-22T05:29:21ZengElsevieriScience2589-00422024-12-012712111435The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundanceVictor Levallois0Francesco Andreace1Bertrand Le Gal2Yoann Dufresne3Pierre Peterlongo4University Rennes, Inria, CNRS, IRISA - UMR 6074, 35000 Rennes, France; Corresponding authorDepartment of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France; Sorbonne Université, Collège doctoral, 75005 Paris, FranceUniversity Rennes, Inria, CNRS, IRISA - Taran team, ENSSAT, Lannion, FranceDepartment of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France; Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, 75015 Paris, FranceUniversity Rennes, Inria, CNRS, IRISA - UMR 6074, 35000 Rennes, France; Corresponding authorSummary: Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific sequences and their abundance within datasets.This paper presents the Backpack Quotient Filter (BQF), a data structure for indexing k-mers (substrings of length k), which offers greater space efficiency than the Counting Quotient Filter (CQF). The BQF maintains essential features such as abundance information and dynamicity, with an extremely low false positive rate of less than 10−5%. Our method redefines abundance information handling and implements an independent strategy for space efficiency.The BQF uses four times less space than the CQF on complex datasets such as sea-water metagenomics sequences. Additionally, its space efficiency improves with larger datasets, addressing the need for scalable data solutions.http://www.sciencedirect.com/science/article/pii/S2589004224026609Biocomputational methodGenomic analysisHigh-performance computing in bioinformatics | 
    
| spellingShingle | Victor Levallois Francesco Andreace Bertrand Le Gal Yoann Dufresne Pierre Peterlongo The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance iScience Biocomputational method Genomic analysis High-performance computing in bioinformatics  | 
    
| title | The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance | 
    
| title_full | The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance | 
    
| title_fullStr | The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance | 
    
| title_full_unstemmed | The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance | 
    
| title_short | The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance | 
    
| title_sort | backpack quotient filter a dynamic and space efficient data structure for querying k mers with abundance | 
    
| topic | Biocomputational method Genomic analysis High-performance computing in bioinformatics  | 
    
| url | http://www.sciencedirect.com/science/article/pii/S2589004224026609 | 
    
| work_keys_str_mv | AT victorlevallois thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT francescoandreace thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT bertrandlegal thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT yoanndufresne thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT pierrepeterlongo thebackpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT victorlevallois backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT francescoandreace backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT bertrandlegal backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT yoanndufresne backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance AT pierrepeterlongo backpackquotientfilteradynamicandspaceefficientdatastructureforqueryingkmerswithabundance  |