Fast and accurate short-read alignment with hybrid hash-tree data structure

Bibliographic Details
Main Authors: Junichiro Makino, Toshikazu Ebisuzaki, Ryutaro Himeno, Yoshihide Hayashizaki
Format: Article
Language: English
Published: BioMed Central 2024-10-01
Series: Genomics & Informatics
Subjects:
Online Access: https://doi.org/10.1186/s44342-024-00012-5
collection DOAJ
description Abstract The rapidly increasing amount of short-read data generated by NGSs (new-generation sequencers) calls for the development of fast and accurate read alignment programs. Programs based on hash tables (BLAST) and on the Burrows-Wheeler transform (bwa-mem) are in use, and the latter is known to give superior performance. Here we present a new algorithm, a hybrid of a hash table and a suffix tree, designed to speed up the alignment of short reads against large reference sequences such as the human genome. The total turnaround time for processing one human genome sample (read depth of 30) is just 31 min with our system, compared with more than 25 h with bwa-mem/gatk. The aligner alone takes 28 min with our system but around 2 h with bwa-mem. Our new algorithm is 4.4 times faster than bwa-mem while achieving similar accuracy. Variant calling and other downstream analyses after alignment can be done with open-source tools such as SAMtools and the Genome Analysis Toolkit (gatk), as well as with our own fast variant caller, which is well parallelized and much faster than gatk.
issn 2234-0742
Citation: Genomics & Informatics, volume 22, issue 1, pages 1-10 (2024-10-01)
Affiliations: Junichiro Makino (Advanced Accelerating Systems Co. Ltd); Toshikazu Ebisuzaki (K.K. Dnaform); Ryutaro Himeno (Advanced Accelerating Systems Co. Ltd); Yoshihide Hayashizaki (K.K. Dnaform)
topic Human whole genome analysis
Short read
Alignment (mapping)
Variant calling
Hash
Tree