Fast and accurate short-read alignment with hybrid hash-tree data structure
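| Main Authors: | Junichiro Makino, Toshikazu Ebisuzaki, Ryutaro Himeno, Yoshihide Hayashizaki |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BioMed Central, 2024-10-01 |
| Series: | Genomics & Informatics |
| Subjects: | Human whole genome analysis; Short read; Alignment (mapping); Variant calling; Hash; Tree |
| Online Access: | https://doi.org/10.1186/s44342-024-00012-5 |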
| author | Junichiro Makino, Toshikazu Ebisuzaki, Ryutaro Himeno, Yoshihide Hayashizaki |
|---|---|
| collection | DOAJ |
| description | The rapidly increasing amount of short-read data generated by NGSs (new-generation sequencers) calls for the development of fast and accurate read-alignment programs. Programs based on hash tables (e.g., BLAST) and on the Burrows-Wheeler transform (e.g., bwa-mem) are widely used, and the latter is known to give superior performance. We here present a new algorithm, a hybrid of a hash table and a suffix tree, designed to speed up the alignment of short reads against large reference sequences such as the human genome. The total turnaround time for processing one human genome sample (read depth of 30) is just 31 min with our system, compared with more than 25 h with bwa-mem/gatk. The aligner alone takes 28 min with our system but around 2 h with bwa-mem. Our new algorithm is 4.4 times faster than bwa-mem while achieving similar accuracy. Variant calling and other downstream analyses after alignment can be done with open-source tools such as the SAMtools and Genome Analysis Toolkit (gatk) packages, as well as with our own fast variant caller, which is well parallelized and much faster than gatk. |
| issn | 2234-0742 |
| affiliations | Advanced Accelerating Systems Co. Ltd; K.K. Dnaform |
| volume | 22, issue 1, pp. 1–10 |
| topic | Human whole genome analysis; Short read; Alignment (mapping); Variant calling; Hash; Tree |
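The abstract does not describe the internals of the hybrid hash-tree structure. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below hashes each reference position by its first k bases and keeps each bucket's positions sorted by the suffix that follows, so an exact-match query is resolved by one hash lookup plus a binary search inside a small bucket. A sorted suffix list per bucket stands in for the suffix tree here; the class name `HashBucketedSuffixIndex`, the parameter `k`, and the toy reference are hypothetical, and the sketch handles exact matches only, whereas a real aligner must also tolerate mismatches and gaps.

```python
from collections import defaultdict
from typing import Dict, List


class HashBucketedSuffixIndex:
    """Toy hash + sorted-suffix index (hypothetical illustration only)."""

    def __init__(self, reference: str, k: int = 4) -> None:
        self.ref = reference
        self.k = k
        buckets: Dict[str, List[int]] = defaultdict(list)
        # Hash step: group every reference position by its first k bases.
        for pos in range(len(reference) - k + 1):
            buckets[reference[pos:pos + k]].append(pos)
        # Tree-like step (a sorted suffix list stands in for a suffix tree):
        # order each bucket by the full suffix starting at each position.
        self.buckets: Dict[str, List[int]] = {
            key: sorted(positions, key=lambda p: reference[p:])
            for key, positions in buckets.items()
        }

    def _bound(self, positions: List[int], query: str, strict: bool) -> int:
        """Binary search: first index whose length-m prefix is >= query
        (or > query when strict is True)."""
        lo, hi = 0, len(positions)
        m = len(query)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = self.ref[positions[mid]:positions[mid] + m]
            if prefix < query or (strict and prefix == query):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def find(self, query: str) -> List[int]:
        """Return all reference positions where query matches exactly."""
        if len(query) < self.k:
            raise ValueError("query must be at least k bases long")
        # One hash lookup narrows the search to a single bucket.
        positions = self.buckets.get(query[:self.k], [])
        lo = self._bound(positions, query, strict=False)
        hi = self._bound(positions, query, strict=True)
        return sorted(positions[lo:hi])


if __name__ == "__main__":
    index = HashBucketedSuffixIndex("ACGTACGTTACGA", k=3)
    print(index.find("ACGT"))  # [0, 4]
    print(index.find("TACG"))  # [3, 8]
```

With the toy reference `ACGTACGTTACGA` and `k=3`, `find("ACGT")` hashes `ACG` to a three-position bucket, and the binary search narrows it to positions 0 and 4. In this sketch the hash step jumps straight to a short candidate list, so the ordered (tree-like) search never has to start from the whole genome.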