Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design

Abstract: DNA is a promising medium for digital data storage due to its exceptional data density and longevity. Practical DNA-based storage systems require selective data retrieval to minimize decoding time and costs. In this work, we introduce CRISPR-Cas9 as a user-friendly tool for multiplexed, low...

Bibliographic Details
Main Authors: Carina Imburgia, Lee Organick, Karen Zhang, Nicolas Cardozo, Jeff McBride, Callista Bee, Delaney Wilde, Gwendolin Roote, Sophia Jorgensen, David Ward, Charlie Anderson, Karin Strauss, Luis Ceze, Jeff Nivala
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-61264-5
_version_ 1849343147326308352
author Carina Imburgia
Lee Organick
Karen Zhang
Nicolas Cardozo
Jeff McBride
Callista Bee
Delaney Wilde
Gwendolin Roote
Sophia Jorgensen
David Ward
Charlie Anderson
Karin Strauss
Luis Ceze
Jeff Nivala
author_sort Carina Imburgia
collection DOAJ
description Abstract: DNA is a promising medium for digital data storage due to its exceptional data density and longevity. Practical DNA-based storage systems require selective data retrieval to minimize decoding time and costs. In this work, we introduce CRISPR-Cas9 as a user-friendly tool for multiplexed, low-latency molecular data extraction. We first present a one-pot, multiplexed random access method in which specific data files are selectively cleaved using a CRISPR-Cas9 addressing system and then sequenced via nanopore technology. This approach was validated on a pool of 1.6 million DNA sequences, comprising 25 unique data files. We then developed a molecular similarity-search approach combining machine learning with Cas9-based retrieval. Using a deep neural network, we mapped a database of 1.74 million images into a reduced-dimensional embedding, encoding each embedding as a Cas9 target sequence. These target sequences act as molecular addresses, capturing clusters of semantically related images. By leveraging Cas9’s off-target cleavage activity, query sequences cleave both exact and closely related targets, enabling high-fidelity retrieval of molecular addresses corresponding to in silico image clusters similar to the query. These approaches move towards addressing key challenges in molecular data retrieval by offering simplified, rapid isothermal protocols and new DNA data access capabilities.
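The similarity-search idea in the description above can be illustrated with a toy sketch. This is not the authors' encoder or a model of real Cas9 biochemistry: the per-coordinate quantization in `embedding_to_address`, the use of Hamming distance as a stand-in for off-target cleavage tolerance, the `max_mismatches` threshold, and the example names are all hypothetical assumptions made only for illustration.

```python
BASES = "ACGT"


def embedding_to_address(embedding):
    """Map each coordinate (assumed to lie in [0, 1)) to one base.

    Hypothetical quantization: the record does not describe the
    actual embedding-to-sequence encoding.
    """
    return "".join(BASES[max(0, min(int(x * 4), 3))] for x in embedding)


def hamming(a, b):
    """Count mismatched positions between two equal-length addresses."""
    if len(a) != len(b):
        raise ValueError("addresses must have equal length")
    return sum(x != y for x, y in zip(a, b))


def retrieve(query, addresses, max_mismatches=2):
    """Return the names whose address is within max_mismatches of the query.

    Assumption: a small mismatch budget loosely mimics a query that
    cleaves both exact and closely related targets.
    """
    return sorted(
        name
        for name, addr in addresses.items()
        if hamming(query, addr) <= max_mismatches
    )


# Toy 20-dimensional embeddings, one coordinate per base of the address.
embeddings = {
    "cat_a": [0.1] * 10 + [0.6] * 10,
    "cat_b": [0.3] + [0.1] * 9 + [0.6] * 10,  # differs in one coordinate
    "truck": [0.9] * 20,
}
addresses = {name: embedding_to_address(e) for name, e in embeddings.items()}

query = addresses["cat_a"]
print(retrieve(query, addresses))  # ['cat_a', 'cat_b']
```

In this sketch, nearby embeddings yield addresses that differ in only a few positions, so a single query address retrieves its whole neighbourhood, while the distant "truck" entry is left untouched.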
format Article
id doaj-art-2bec9d4d3a01476da5bba0b77952c9fb
institution Kabale University
issn 2041-1723
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-2bec9d4d3a01476da5bba0b77952c9fb
2025-08-20T03:43:10Z
eng
Nature Portfolio
Nature Communications
2041-1723
2025-07-01
vol. 16, no. 1, pp. 1-11
10.1038/s41467-025-61264-5
Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design
Carina Imburgia (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Lee Organick (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Karen Zhang (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Nicolas Cardozo (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Jeff McBride (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Callista Bee (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Delaney Wilde (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Gwendolin Roote (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Sophia Jorgensen (University of Washington, Paul G. Allen School of Computer Science and Engineering)
David Ward (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Charlie Anderson (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Karin Strauss (Microsoft Research)
Luis Ceze (University of Washington, Paul G. Allen School of Computer Science and Engineering)
Jeff Nivala (University of Washington, Paul G. Allen School of Computer Science and Engineering)
title Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design
title_sort random access and semantic search in dna data storage enabled by cas9 and machine guided design
url https://doi.org/10.1038/s41467-025-61264-5