Fast noisy long read alignment with multi-level parallelism
Abstract. Background: The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignm...
| Main Authors: | Zeyu Xia, Canqun Yang, Chenchen Peng, Yifei Guo, Yufei Guo, Tao Tang, Yingbo Cui |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-05-01 |
| Series: | BMC Bioinformatics |
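| Subjects: | Sequence alignment; SMRT; Parallel processing; Vector-level parallelization; MPI; Heterogeneous parallelization |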
| Online Access: | https://doi.org/10.1186/s12859-025-06129-w |
| _version_ | 1849314611790086144 |
|---|---|
| author | Zeyu Xia, Canqun Yang, Chenchen Peng, Yifei Guo, Yufei Guo, Tao Tang, Yingbo Cui |
| author_sort | Zeyu Xia |
| collection | DOAJ |
| description | Abstract. Background: The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignment tools inapplicable. Additionally, the performance ceiling of a single CPU restricts the effectiveness of alignment algorithms for SMRT sequencing. Results: To address these challenges, we introduce ParaHAT, a parallel alignment algorithm for noisy long reads. ParaHAT exploits vector-level, thread-level, process-level, and heterogeneous parallelism. We redesign the layout of the dynamic programming matrices to eliminate data dependencies in base-level alignment, enabling effective vectorization. We further increase computational speed through heterogeneous parallel technology and implement the algorithm for multi-node computing using MPI, overcoming the computational limits of a single node. Conclusions: Performance evaluations show that ParaHAT achieves a 10.03x speedup in base-level alignment, with a parallel acceleration ratio and weak scalability metric of 94.61 and 98.98% on 128 nodes, respectively. |
| format | Article |
| id | doaj-art-3bd0199fb8d24246aa1c0094a63059e1 |
| institution | Kabale University |
| issn | 1471-2105 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Bioinformatics |
| spelling | doaj-art-3bd0199fb8d24246aa1c0094a63059e1; 2025-08-20T03:52:24Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-05-01; 261131; 10.1186/s12859-025-06129-w; Fast noisy long read alignment with multi-level parallelism; Zeyu Xia, Canqun Yang, Chenchen Peng, Yifei Guo, Yufei Guo, Tao Tang, Yingbo Cui (all: College of Computer Science and Technology, National University of Defense Technology); https://doi.org/10.1186/s12859-025-06129-w; Sequence alignment; SMRT; Parallel processing; Vector-level parallelization; MPI; Heterogeneous parallelization |
| title | Fast noisy long read alignment with multi-level parallelism |
| topic | Sequence alignment; SMRT; Parallel processing; Vector-level parallelization; MPI; Heterogeneous parallelization |
| url | https://doi.org/10.1186/s12859-025-06129-w |
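
The abstract states that the dynamic programming matrices are laid out so that base-level alignment has no data dependency within a vectorized step, but it does not say which layout is used. One common way to get that property is anti-diagonal traversal: every cell on anti-diagonal i + j = d reads only cells on diagonals d - 1 and d - 2, so all cells of one diagonal can be computed independently. The sketch below illustrates this with unit-cost edit distance. It is not ParaHAT's implementation; the function names and the scoring scheme are assumptions chosen for illustration.

```python
def edit_distance_rowwise(a, b):
    """Reference version: row-major order, each cell depends on its left neighbour."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # match or substitution
        prev = cur
    return prev[len(b)]


def edit_distance_antidiagonal(a, b):
    """Same recurrence, traversed one anti-diagonal (i + j = d) at a time."""
    m, n = len(a), len(b)
    # The full matrix is kept for clarity; only three diagonals are needed at once.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for d in range(2, m + n + 1):
        rows = range(max(1, d - n), min(m, d - 1) + 1)
        # Every value below reads only diagonals d-1 and d-2, never another
        # cell of diagonal d, so the cells could be filled in any order
        # (for example, one per SIMD lane).
        new = [min(D[i - 1][d - i] + 1,
                   D[i][d - i - 1] + 1,
                   D[i - 1][d - i - 1] + (a[i - 1] != b[d - i - 1]))
               for i in rows]
        for i, value in zip(rows, new):
            D[i][d - i] = value
    return D[m][n]


if __name__ == "__main__":
    print(edit_distance_antidiagonal("kitten", "sitting"))
```

In the row-wise version the inner loop carries a dependency through `cur[j - 1]`, which blocks straightforward vectorization; in the anti-diagonal version the list comprehension has no such dependency, which is the property a SIMD implementation relies on.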