Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning

Bibliographic Details
Main Authors: Mauricio Perez, Michiko Kimoto, Priscilla Rajakumar, Chayaporn Suphavilai, Rafael Peres da Silva, Hui Pen Tan, Nicholas Ting Xun Ong, Hannah Nicholas, Ichiro Hirao, Wei Leong Chew, Niranjan Nagarajan
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-62347-z
author Mauricio Perez
Michiko Kimoto
Priscilla Rajakumar
Chayaporn Suphavilai
Rafael Peres da Silva
Hui Pen Tan
Nicholas Ting Xun Ong
Hannah Nicholas
Ichiro Hirao
Wei Leong Chew
Niranjan Nagarajan
author_sort Mauricio Perez
collection DOAJ
description Abstract The discovery of non-canonical bases (NCBs) and development of synthetic xeno-nucleic acids (XNAs) has spawned interest in many applications in viral genomics, synthetic biology and DNA storage. However, inability to do high-throughput sequencing of NCBs has been a significant limitation. We demonstrate that XNAs with NCBs can be robustly sequenced on a MinION system (>2.3×10^6 reads/flowcell) to obtain significantly distinct signals from controls (median fold-change >6×). To enable AI-model training, we synthesized and sequenced a complex pool of 1,024 NCB-containing oligonucleotides with varied 6-mer contexts and high purity (>90%). Bootstrapped models assisted in data preparation, and data augmentation with spliced reads provided high context diversity, enabling learning of generalizable models to decipher NCB-containing sequences with high accuracy (>80%) and specificity (99%). These results highlight the versatility of nanopore sequencing for interrogating unusual nucleic acids, and the potential to transform the study of genetic material beyond those that use canonical bases.
format Article
id doaj-art-fb69d5db04734fee81f944844752a7dc
institution DOAJ
issn 2041-1723
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-fb69d5db04734fee81f944844752a7dc
2025-08-20T03:05:10Z
eng
Nature Portfolio
Nature Communications
2041-1723
2025-07-01
161112
10.1038/s41467-025-62347-z
Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning
Mauricio Perez (0): Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome
Michiko Kimoto (1): Institute of Bioengineering and Bioimaging (IBB), Agency for Science, Technology and Research (A*STAR), Nanos
Priscilla Rajakumar (2): Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome
Chayaporn Suphavilai (3): Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome
Rafael Peres da Silva (4): Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome
Hui Pen Tan (5): Institute of Bioengineering and Bioimaging (IBB), Agency for Science, Technology and Research (A*STAR), Nanos
Nicholas Ting Xun Ong (6): Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome
Hannah Nicholas (7): Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome
Ichiro Hirao (8): Institute of Bioengineering and Bioimaging (IBB), Agency for Science, Technology and Research (A*STAR), Nanos
Wei Leong Chew (9): Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome
Niranjan Nagarajan (10): Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome
https://doi.org/10.1038/s41467-025-62347-z
title Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning
title_sort direct high throughput deconvolution of non canonical bases via nanopore sequencing and bootstrapped learning
url https://doi.org/10.1038/s41467-025-62347-z