Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning
Abstract: The discovery of non-canonical bases (NCBs) and development of synthetic xeno-nucleic acids (XNAs) has spawned interest in many applications in viral genomics, synthetic biology and DNA storage. However, inability to do high-throughput sequencing of NCBs has been a significant limitation. W...
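| Main Authors: | Mauricio Perez, Michiko Kimoto, Priscilla Rajakumar, Chayaporn Suphavilai, Rafael Peres da Silva, Hui Pen Tan, Nicholas Ting Xun Ong, Hannah Nicholas, Ichiro Hirao, Wei Leong Chew, Niranjan Nagarajan |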
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-025-62347-z |
| _version_ | 1849764246094610432 |
|---|---|
| author | Mauricio Perez, Michiko Kimoto, Priscilla Rajakumar, Chayaporn Suphavilai, Rafael Peres da Silva, Hui Pen Tan, Nicholas Ting Xun Ong, Hannah Nicholas, Ichiro Hirao, Wei Leong Chew, Niranjan Nagarajan |
| author_sort | Mauricio Perez |
| collection | DOAJ |
| description | The discovery of non-canonical bases (NCBs) and development of synthetic xeno-nucleic acids (XNAs) has spawned interest in many applications in viral genomics, synthetic biology and DNA storage. However, inability to do high-throughput sequencing of NCBs has been a significant limitation. We demonstrate that XNAs with NCBs can be robustly sequenced on a MinION system (>2.3×10^6 reads/flowcell) to obtain significantly distinct signals from controls (median fold-change >6×). To enable AI-model training, we synthesized and sequenced a complex pool of 1,024 NCB-containing oligonucleotides with varied 6-mer contexts and high purity (>90%). Bootstrapped models assisted in data preparation, and data augmentation with spliced reads provided high context diversity, enabling learning of generalizable models to decipher NCB-containing sequences with high accuracy (>80%) and specificity (99%). These results highlight the versatility of nanopore sequencing for interrogating unusual nucleic acids, and the potential to transform the study of genetic material beyond those that use canonical bases. |
| format | Article |
| id | doaj-art-fb69d5db04734fee81f944844752a7dc |
| institution | DOAJ |
| issn | 2041-1723 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Nature Communications |
| spelling | doaj-art-fb69d5db04734fee81f944844752a7dc; 2025-08-20T03:05:10Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2025-07-01; vol. 16, issue 1, pp. 1-12; 10.1038/s41467-025-62347-z; Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning; Mauricio Perez (GIS), Michiko Kimoto (IBB), Priscilla Rajakumar (GIS), Chayaporn Suphavilai (GIS), Rafael Peres da Silva (GIS), Hui Pen Tan (IBB), Nicholas Ting Xun Ong (GIS), Hannah Nicholas (GIS), Ichiro Hirao (IBB), Wei Leong Chew (GIS), Niranjan Nagarajan (GIS); GIS: Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome; IBB: Institute of Bioengineering and Bioimaging (IBB), Agency for Science, Technology and Research (A*STAR), Nanos; https://doi.org/10.1038/s41467-025-62347-z |
| title | Direct high-throughput deconvolution of non-canonical bases via nanopore sequencing and bootstrapped learning |
| url | https://doi.org/10.1038/s41467-025-62347-z |