Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning

Bibliographic Details
Main Authors: Mauricio Perez, Michiko Kimoto, Priscilla Rajakumar, Chayaporn Suphavilai, Rafael Peres da Silva, Hui Pen Tan, Nicholas Ting Xun Ong, Hannah Nicholas, Ichiro Hirao, Wei Leong Chew, Niranjan Nagarajan
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Nature Communications
Online Access:https://doi.org/10.1038/s41467-025-62347-z
Description
Summary: The discovery of non-canonical bases (NCBs) and the development of synthetic xeno-nucleic acids (XNAs) have spawned interest in many applications in viral genomics, synthetic biology and DNA storage. However, the inability to sequence NCBs at high throughput has been a significant limitation. We demonstrate that XNAs with NCBs can be robustly sequenced on a MinION system (>2.3 × 10⁶ reads per flow cell), yielding signals significantly distinct from those of controls (median fold-change >6×). To enable AI-model training, we synthesized and sequenced a complex pool of 1,024 NCB-containing oligonucleotides with varied 6-mer contexts and high purity (>90%). Bootstrapped models assisted in data preparation, and data augmentation with spliced reads provided high context diversity, enabling the learning of generalizable models that decipher NCB-containing sequences with high accuracy (>80%) and specificity (99%). These results highlight the versatility of nanopore sequencing for interrogating unusual nucleic acids, and its potential to transform the study of genetic materials beyond those that use canonical bases.
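The abstract does not say how reads were spliced, so the following is only a minimal, hypothetical sketch of the general idea that splicing reads can create NCB contexts absent from the original pool. It assumes reads are base strings in which the placeholder symbol "X" marks an NCB, and it works in sequence space only; real nanopore augmentation would also have to splice the corresponding raw current segments. The names `NCB`, `ncb_contexts`, `splice` and `augment` are illustrative and are not taken from the authors' code.

```python
import random

NCB = "X"  # hypothetical placeholder symbol for a non-canonical base


def ncb_contexts(seq: str, k: int = 6) -> set[str]:
    """Return every k-mer window of seq that contains the NCB symbol."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if NCB in seq[i:i + k]}


def splice(a: str, b: str, cut_a: int, cut_b: int) -> str:
    """Join the prefix of read a (before cut_a) to the suffix of read b (from cut_b)."""
    return a[:cut_a] + b[cut_b:]


def augment(reads: list[str], n: int, seed: int = 0) -> list[str]:
    """Create n spliced reads from random pairs of input reads (each read needs length >= 2)."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = []
    for _ in range(n):
        a, b = rng.sample(reads, 2)  # two distinct reads from the pool
        cut_a = rng.randrange(1, len(a))
        cut_b = rng.randrange(1, len(b))
        out.append(splice(a, b, cut_a, cut_b))
    return out
```

For example, `splice("AAAXAAA", "CCCXCCC", 5, 5)` gives `"AAAXACC"`, whose NCB-containing 6-mers (`"AAAXAC"`, `"AAXACC"`) occur in neither parent read, which is the kind of context diversity the abstract attributes to spliced reads.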
ISSN: 2041-1723