Accurate cross-species 5mC detection for Oxford Nanopore sequencing in plants with DeepPlant

Bibliographic Details
Main Authors: He-Xu Chen, Zhen-Dong Liu, Xin Bai, Bo Wu, Rong Song, Hui-Cong Yao, Ying Chen, Wei Chi, Qian Hua, Liang Cheng, Chuan-Le Xiao
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-58576-x
Description
Summary: Nanopore sequencing enables comprehensive detection of 5-methylcytosine (5mC), particularly in repeat regions. However, CHH methylation detection in plants is limited by the scarcity of high-methylation positive samples, which reduces generalization across species. Dorado, the only tool for plant 5mC detection on the R10.4 platform, lacks extensive species testing. Here, we develop DeepPlant, a deep learning model incorporating both Bi-LSTM and Transformer architectures, which significantly improves CHH detection accuracy and performs well for CpG and CHG motifs. We address the scarcity of methylation-positive CHH training samples by using bisulfite sequencing (BS-seq) to screen for species with abundant high-methylation CHH sites, and we generate datasets that cover diverse 9-mer motifs for training and testing DeepPlant. Evaluated across nine species, DeepPlant achieves high whole-genome methylation frequency correlations (0.705-0.838) with BS-seq data on CHH, an improvement of 23.4-117.6% over Dorado. DeepPlant also demonstrates superior single-molecule accuracy and F1 score, offering strong generalization for plant epigenetics research.
ISSN:2041-1723
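
Note on the reported metrics: the summary compares tools by the correlation between per-site methylation frequencies from nanopore calls and from BS-seq, and by relative improvement over a baseline. The sketch below shows how such quantities are commonly computed. It is illustrative only: the record does not state which correlation coefficient was used (Pearson is assumed here), and the function names `methylation_frequency`, `pearson`, and `relative_improvement` are hypothetical, not part of DeepPlant or Dorado.

```python
from math import sqrt


def methylation_frequency(methylated_reads, total_reads):
    """Fraction of reads covering a site that were called methylated."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of frequencies.

    Assumption: Pearson is used for illustration; the source only says
    'correlations' without naming the coefficient.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def relative_improvement(new_value, baseline_value):
    """Percentage change of new_value relative to baseline_value."""
    if baseline_value == 0:
        raise ValueError("baseline_value must be non-zero")
    return (new_value - baseline_value) / baseline_value * 100.0
```

In use, one would build two lists of per-site frequencies over the same CHH sites (one from nanopore calls, one from BS-seq), pass them to `pearson`, and then compare the resulting coefficients for two tools with `relative_improvement`.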