b-move: faster lossless approximate pattern matching in a run-length compressed index

Abstract Background Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limita...

Full description

Saved in:
Bibliographic Details
Main Authors: Lore Depuydt, Luca Renders, Simon Van de Vyver, Lennart Veys, Travis Gagie, Jan Fostier
Format: Article
Language:English
Published: BMC 2025-08-01
Series:Algorithms for Molecular Biology
Subjects:
Online Access:https://doi.org/10.1186/s13015-025-00281-x
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1849238185060597760
author Lore Depuydt
Luca Renders
Simon Van de Vyver
Lennart Veys
Travis Gagie
Jan Fostier
author_facet Lore Depuydt
Luca Renders
Simon Van de Vyver
Lennart Veys
Travis Gagie
Jan Fostier
author_sort Lore Depuydt
collection DOAJ
description Abstract Background Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.’s r-index and Nishimoto and Tabei’s move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.’s br-index initiates complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns. Results We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient, lossless approximate pattern matching in run-length compressed space. It achieves bidirectional character extensions up to 7 times faster than the br-index, closing the performance gap with FM-index-based alternatives. For locating occurrences, b-move performs $$\phi$$ ϕ and $$\phi ^{-1}$$ ϕ - 1 operations up to 7 times faster than the br-index. At the same time, it maintains the favorable memory characteristics of the br-index, for example, all available complete E. coli genomes on NCBI’s RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop. Conclusions b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.
format Article
id doaj-art-e8fa08d0d60b4f4886a7f56b73954da2
institution Kabale University
issn 1748-7188
language English
publishDate 2025-08-01
publisher BMC
record_format Article
series Algorithms for Molecular Biology
spelling doaj-art-e8fa08d0d60b4f4886a7f56b73954da22025-08-20T04:01:43ZengBMCAlgorithms for Molecular Biology1748-71882025-08-0120111910.1186/s13015-025-00281-xb-move: faster lossless approximate pattern matching in a run-length compressed indexLore Depuydt0Luca Renders1Simon Van de Vyver2Lennart Veys3Travis Gagie4Jan Fostier5Ghent University-imecGhent University-imecGhent UniversityGhent UniversityDalhousie UniversityGhent University-imecAbstract Background Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.’s r-index and Nishimoto and Tabei’s move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.’s br-index initiates complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns. Results We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient, lossless approximate pattern matching in run-length compressed space. It achieves bidirectional character extensions up to 7 times faster than the br-index, closing the performance gap with FM-index-based alternatives. For locating occurrences, b-move performs $$\phi$$ ϕ and $$\phi ^{-1}$$ ϕ - 1 operations up to 7 times faster than the br-index. At the same time, it maintains the favorable memory characteristics of the br-index, for example, all available complete E. coli genomes on NCBI’s RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop. Conclusions b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.https://doi.org/10.1186/s13015-025-00281-xPan-genomicsFM-indexr-indexMove structureBidirectional searchApproximate pattern matching
spellingShingle Lore Depuydt
Luca Renders
Simon Van de Vyver
Lennart Veys
Travis Gagie
Jan Fostier
b-move: faster lossless approximate pattern matching in a run-length compressed index
Algorithms for Molecular Biology
Pan-genomics
FM-index
r-index
Move structure
Bidirectional search
Approximate pattern matching
title b-move: faster lossless approximate pattern matching in a run-length compressed index
title_full b-move: faster lossless approximate pattern matching in a run-length compressed index
title_fullStr b-move: faster lossless approximate pattern matching in a run-length compressed index
title_full_unstemmed b-move: faster lossless approximate pattern matching in a run-length compressed index
title_short b-move: faster lossless approximate pattern matching in a run-length compressed index
title_sort b move faster lossless approximate pattern matching in a run length compressed index
topic Pan-genomics
FM-index
r-index
Move structure
Bidirectional search
Approximate pattern matching
url https://doi.org/10.1186/s13015-025-00281-x
work_keys_str_mv AT loredepuydt bmovefasterlosslessapproximatepatternmatchinginarunlengthcompressedindex
AT lucarenders bmovefasterlosslessapproximatepatternmatchinginarunlengthcompressedindex
AT simonvandevyver bmovefasterlosslessapproximatepatternmatchinginarunlengthcompressedindex
AT lennartveys bmovefasterlosslessapproximatepatternmatchinginarunlengthcompressedindex
AT travisgagie bmovefasterlosslessapproximatepatternmatchinginarunlengthcompressedindex
AT janfostier bmovefasterlosslessapproximatepatternmatchinginarunlengthcompressedindex