Fast noisy long read alignment with multi-level parallelism

Abstract. Background: The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignm...

Bibliographic Details
Main Authors: Zeyu Xia, Canqun Yang, Chenchen Peng, Yifei Guo, Yufei Guo, Tao Tang, Yingbo Cui
Format: Article
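Language: English
Published: BMC 2025-05-01
Series: BMC Bioinformatics
Subjects: Sequence alignment; SMRT; Parallel processing; Vector-level parallelization; MPI; Heterogeneous parallelization
Online Access: https://doi.org/10.1186/s12859-025-06129-w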
_version_ 1849314611790086144
author Zeyu Xia
Canqun Yang
Chenchen Peng
Yifei Guo
Yufei Guo
Tao Tang
Yingbo Cui
author_facet Zeyu Xia
Canqun Yang
Chenchen Peng
Yifei Guo
Yufei Guo
Tao Tang
Yingbo Cui
author_sort Zeyu Xia
collection DOAJ
description Abstract. Background: The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignment tools inapplicable. Additionally, the performance bottleneck of a single CPU restricts the effectiveness of alignment algorithms for SMRT sequencing. Results: To address these challenges, we introduce ParaHAT, a parallel alignment algorithm for noisy long reads. ParaHAT exploits vector-level, thread-level, process-level, and heterogeneous parallelism. We redesign the layout of the dynamic programming matrices to eliminate data dependencies in the base-level alignment, enabling effective vectorization. We further increase computational speed through heterogeneous parallel technology and implement the algorithm for multi-node computing using MPI, overcoming the computational limits of a single node. Conclusions: Performance evaluations show that ParaHAT achieved a 10.03x speedup in base-level alignment, with a parallel acceleration ratio of 94.61 and a weak scalability metric of 98.98% on 128 nodes.
format Article
id doaj-art-3bd0199fb8d24246aa1c0094a63059e1
institution Kabale University
issn 1471-2105
language English
publishDate 2025-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj-art-3bd0199fb8d24246aa1c0094a63059e1 2025-08-20T03:52:24Z eng BMC BMC Bioinformatics 1471-2105 2025-05-01 261131 10.1186/s12859-025-06129-w Fast noisy long read alignment with multi-level parallelism Zeyu Xia, Canqun Yang, Chenchen Peng, Yifei Guo, Yufei Guo, Tao Tang, Yingbo Cui (all authors: College of Computer Science and Technology, National University of Defense Technology) Abstract. Background: The advent of Single Molecule Real-Time (SMRT) sequencing has overcome many limitations of second-generation sequencing, such as limited read lengths and PCR amplification biases. However, longer reads increase data volume exponentially, and high error rates make many existing alignment tools inapplicable. Additionally, the performance bottleneck of a single CPU restricts the effectiveness of alignment algorithms for SMRT sequencing. Results: To address these challenges, we introduce ParaHAT, a parallel alignment algorithm for noisy long reads. ParaHAT exploits vector-level, thread-level, process-level, and heterogeneous parallelism. We redesign the layout of the dynamic programming matrices to eliminate data dependencies in the base-level alignment, enabling effective vectorization. We further increase computational speed through heterogeneous parallel technology and implement the algorithm for multi-node computing using MPI, overcoming the computational limits of a single node. Conclusions: Performance evaluations show that ParaHAT achieved a 10.03x speedup in base-level alignment, with a parallel acceleration ratio of 94.61 and a weak scalability metric of 98.98% on 128 nodes. https://doi.org/10.1186/s12859-025-06129-w Sequence alignment; SMRT; Parallel processing; Vector-level parallelization; MPI; Heterogeneous parallelization
spellingShingle Zeyu Xia
Canqun Yang
Chenchen Peng
Yifei Guo
Yufei Guo
Tao Tang
Yingbo Cui
Fast noisy long read alignment with multi-level parallelism
BMC Bioinformatics
Sequence alignment
SMRT
Parallel processing
Vector-level parallelization
MPI
Heterogeneous parallelization
title Fast noisy long read alignment with multi-level parallelism
title_full Fast noisy long read alignment with multi-level parallelism
title_fullStr Fast noisy long read alignment with multi-level parallelism
title_full_unstemmed Fast noisy long read alignment with multi-level parallelism
title_short Fast noisy long read alignment with multi-level parallelism
title_sort fast noisy long read alignment with multi level parallelism
topic Sequence alignment
SMRT
Parallel processing
Vector-level parallelization
MPI
Heterogeneous parallelization
url https://doi.org/10.1186/s12859-025-06129-w
work_keys_str_mv AT zeyuxia fastnoisylongreadalignmentwithmultilevelparallelism
AT canqunyang fastnoisylongreadalignmentwithmultilevelparallelism
AT chenchenpeng fastnoisylongreadalignmentwithmultilevelparallelism
AT yifeiguo fastnoisylongreadalignmentwithmultilevelparallelism
AT yufeiguo fastnoisylongreadalignmentwithmultilevelparallelism
AT taotang fastnoisylongreadalignmentwithmultilevelparallelism
AT yingbocui fastnoisylongreadalignmentwithmultilevelparallelism
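
Note on the abstract's vectorization claim: the record does not describe ParaHAT's actual matrix layout, so the following is only a minimal sketch of one well-known way to remove data dependencies from an alignment recurrence, namely storing and traversing the dynamic programming matrix by anti-diagonals. It uses plain edit distance rather than ParaHAT's scoring scheme, and the function names are hypothetical. Every cell on anti-diagonal d = i + j reads only diagonals d-1 and d-2, so the inner loop has no loop-carried dependency and could in principle be mapped onto SIMD lanes.

```python
def edit_distance_rowwise(a: str, b: str) -> int:
    """Reference: classic row-by-row edit distance.

    cur[j] depends on cur[j - 1], so the inner loop is inherently serial.
    """
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[len(b)]


def edit_distance_antidiagonal(a: str, b: str) -> int:
    """Same recurrence, laid out by anti-diagonal d = i + j (illustrative only).

    Each diagonal is an array indexed by row i. A cell (i, j) reads:
      prev1[i - 1] -> D[i-1][j]    (diagonal d-1)
      prev1[i]     -> D[i][j-1]    (diagonal d-1)
      prev2[i - 1] -> D[i-1][j-1]  (diagonal d-2)
    and never another cell of the current diagonal.
    """
    n, m = len(a), len(b)
    prev2 = [0] * (n + 1)
    prev1 = [0] * (n + 1)
    for d in range(n + m + 1):
        cur = [0] * (n + 1)
        lo, hi = max(0, d - m), min(n, d)
        # Iterations of this loop are independent of one another.
        for i in range(lo, hi + 1):
            j = d - i
            if i == 0:
                cur[i] = j          # first row: j insertions
            elif j == 0:
                cur[i] = i          # first column: i deletions
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                cur[i] = min(prev1[i - 1] + 1, prev1[i] + 1, prev2[i - 1] + cost)
        prev2, prev1 = prev1, cur
    return prev1[n]                  # last diagonal holds only D[n][m]


print(edit_distance_antidiagonal("kitten", "sitting"))  # 3
```

Both functions return the same distances; only the traversal order and storage layout differ, which is the property a vectorized implementation would rely on.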