Aryana-bs: context-aware alignment of bisulfite-sequencing reads

Bibliographic Details
Main Authors: Hassan Nikaein, Ali Sharifi-Zarchi, Afsoon Afzal, Saeedeh Ezzati, Farzane Rasti, Hamidreza Chitsaz, Govindarajan Kunde-Ramamoorthy
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Bioinformatics
Subjects: DNA methylation; bisulfite sequencing; alignment; CpG island
Online Access: https://doi.org/10.1186/s12859-025-06182-5
Collection: DOAJ
Abstract
Background: DNA methylation is essential to many biological processes, including imprinting, development, and inflammation, and is implicated in numerous disorders, such as cancer. Bisulfite sequencing (BS) is the gold standard for measuring DNA methylation at single-base resolution: bisulfite treatment converts unmethylated cytosines to uracils, which are read as thymines after amplification, while methylated cytosines remain intact. However, this C-to-T conversion poses a well-known challenge for conventional short-read aligners, which treat the converted bases as mismatches. Seed-based aligners in particular fail when C-to-T conversions are frequent within short distances, because dense conversions leave few exact-match seeds, which reduces alignment accuracy. Two alignment strategies are well established for this problem: three-letter alignment and wildcard alignment. Three-letter alignment suffers from information loss because it converts all cytosines to thymines, which erases the distinction between C and T. Wildcard alignment, on the other hand, is biased: it does not treat reads from methylated and unmethylated regions equally, which produces artifacts in methylation-level estimates and inaccurate DNA methylation quantification.
This work introduces ARYANA-BS, a BS aligner that departs from conventional DNA aligners by integrating BS-specific base alterations directly into its alignment engine. Leveraging known DNA methylation patterns across genomic contexts, ARYANA-BS builds five indexes from the reference genome, aligns each read to all of them, and selects the alignment with the minimum penalty. An optional Expectation-Maximization (EM) step further refines accuracy by incorporating methylation-probability information when choosing the optimal index for each read. Together, these steps aim to improve BS read alignment accuracy by accommodating context-dependent DNA methylation patterns.
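To make the contrast between the two classical strategies concrete, the following minimal Python sketch (not ARYANA-BS code) scores an ungapped, forward-strand read against an equal-length reference window under three assumed models: plain matching, three-letter matching in its common formulation (every C rewritten as T in both sequences), and wildcard matching (a reference C may be read as either C or T). The function names and the example sequences are hypothetical.

```python
def _check_lengths(read: str, ref: str) -> None:
    # Assumption: no gaps, and the read is already placed on this window.
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")


def plain_mismatches(read: str, ref: str) -> int:
    """Conventional scoring: every read/reference difference is a mismatch."""
    _check_lengths(read, ref)
    return sum(r != g for r, g in zip(read, ref))


def three_letter_mismatches(read: str, ref: str) -> int:
    """Three-letter scoring: collapse C into T in both sequences, then compare."""
    return plain_mismatches(read.replace("C", "T"), ref.replace("C", "T"))


def wildcard_mismatches(read: str, ref: str) -> int:
    """Wildcard scoring: a reference C matches either a read C or a read T."""
    _check_lengths(read, ref)
    return sum(
        not (r == g or (g == "C" and r == "T"))
        for r, g in zip(read, ref)
    )


if __name__ == "__main__":
    ref = "ACGTACGT"
    read = "ATGCATGT"  # C->T at positions 1 and 5; read C over reference T at 3
    print(plain_mismatches(read, ref))         # 3
    print(three_letter_mismatches(read, ref))  # 0
    print(wildcard_mismatches(read, ref))      # 1
```

The three-letter score of 0 hides the read C over a reference T at position 3, a difference that bisulfite conversion cannot produce; this is the kind of information loss described above. The wildcard score keeps that mismatch, but wildcard matching carries its own methylation-dependent bias.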
Results: Experiments on both simulated and real data show that ARYANA-BS achieves state-of-the-art accuracy while maintaining competitive speed and memory efficiency.
Conclusions: ARYANA-BS significantly improves alignment accuracy for bisulfite sequencing data by integrating methylation-specific base alterations and genomic context into alignment. It outperforms existing aligners, including BSMAP, bwa-meth, Bismark, BSBolt, and abismal, particularly in robustness to genomic biases and in aligning longer, higher-error reads, which makes it well suited to cancer research and cell-free DNA studies. Although the EM step currently yields only modest improvements, it provides a framework for future refinement, with potential gains in sensitive applications.
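The multi-index strategy described in the Background (align each read to several context-specific indexes and keep the minimum-penalty hit) can be illustrated with a toy Python sketch. The three context models below (all cytosines unmethylated, only CpG cytosines methylated, all cytosines methylated) are hypothetical stand-ins chosen for illustration; the abstract does not specify what the five ARYANA-BS indexes encode, and a real aligner works on genome-wide indexes with gapped alignment rather than on one ungapped window.

```python
from typing import Callable, Dict, Tuple

# A context model decides, for a reference C at position i, whether the read
# is expected to keep C (methylated) or show T (bisulfite-converted).
ContextModel = Callable[[str, int], bool]  # True -> expect C in the read


def is_cpg(ref: str, i: int) -> bool:
    """True when the C at position i is followed by a G (CpG context)."""
    return i + 1 < len(ref) and ref[i + 1] == "G"


# Hypothetical models for illustration only; NOT the five ARYANA-BS indexes.
MODELS: Dict[str, ContextModel] = {
    "all_unmethylated": lambda ref, i: False,
    "cpg_methylated": is_cpg,
    "all_methylated": lambda ref, i: True,
}


def convert_reference(ref: str, keeps_c: ContextModel) -> str:
    """Build the sequence a read is expected to show under one context model."""
    return "".join(
        "T" if base == "C" and not keeps_c(ref, i) else base
        for i, base in enumerate(ref)
    )


def penalty(read: str, expected: str) -> int:
    """Toy penalty: Hamming distance on an ungapped, equal-length window."""
    if len(read) != len(expected):
        raise ValueError("read and reference window must have equal length")
    return sum(a != b for a, b in zip(read, expected))


def best_model(
    read: str, ref: str, models: Dict[str, ContextModel] = MODELS
) -> Tuple[str, int]:
    """Score the read against every model and keep the minimum-penalty one."""
    scores = {name: penalty(read, convert_reference(ref, m))
              for name, m in models.items()}
    # min() returns the first minimal key, so ties go to the earliest model.
    best = min(scores, key=lambda name: scores[name])
    return best, scores[best]


if __name__ == "__main__":
    ref = "ACGTCAGCGA"   # Cs at 1 and 7 are CpG; the C at 4 is CpA
    read = "ACGTTAGCGA"  # CpG Cs kept, CpA C converted to T
    print(best_model(read, ref))  # ('cpg_methylated', 0)
```

Here the unmethylated model pays 2 (both CpG Cs look like errors) and the fully methylated model pays 1 (the converted CpA C), so the CpG-aware model wins. Keeping each context as a separate candidate avoids committing to one methylation assumption before alignment.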
Record ID: doaj-art-3290ac53e2f14317a44e68f52bd6b2b6
Holding institution: Kabale University
ISSN: 1471-2105
Volume: 26, Issue: 1
Author affiliations:
Hassan Nikaein: Department of Computer Engineering, Sharif University of Technology
Ali Sharifi-Zarchi: Department of Computer Engineering, Sharif University of Technology
Afsoon Afzal: School of Computer Science, Carnegie Mellon University
Saeedeh Ezzati: Department of Computer Engineering, Sharif University of Technology
Farzane Rasti: Department of Computer Engineering, Sharif University of Technology
Hamidreza Chitsaz: AutoX technologies
Govindarajan Kunde-Ramamoorthy: The Jackson Laboratory