Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design
Abstract: DNA is a promising medium for digital data storage due to its exceptional data density and longevity. Practical DNA-based storage systems require selective data retrieval to minimize decoding time and costs. In this work, we introduce CRISPR-Cas9 as a user-friendly tool for multiplexed, low...
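| Main Authors: | Carina Imburgia, Lee Organick, Karen Zhang, Nicolas Cardozo, Jeff McBride, Callista Bee, Delaney Wilde, Gwendolin Roote, Sophia Jorgensen, David Ward, Charlie Anderson, Karin Strauss, Luis Ceze, Jeff Nivala |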
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-025-61264-5 |
| _version_ | 1849343147326308352 |
|---|---|
| author | Carina Imburgia; Lee Organick; Karen Zhang; Nicolas Cardozo; Jeff McBride; Callista Bee; Delaney Wilde; Gwendolin Roote; Sophia Jorgensen; David Ward; Charlie Anderson; Karin Strauss; Luis Ceze; Jeff Nivala |
| author_sort | Carina Imburgia |
| collection | DOAJ |
| description | Abstract: DNA is a promising medium for digital data storage due to its exceptional data density and longevity. Practical DNA-based storage systems require selective data retrieval to minimize decoding time and costs. In this work, we introduce CRISPR-Cas9 as a user-friendly tool for multiplexed, low-latency molecular data extraction. We first present a one-pot, multiplexed random access method in which specific data files are selectively cleaved using a CRISPR-Cas9 addressing system and then sequenced via nanopore technology. This approach was validated on a pool of 1.6 million DNA sequences, comprising 25 unique data files. We then developed a molecular similarity-search approach combining machine learning with Cas9-based retrieval. Using a deep neural network, we mapped a database of 1.74 million images into a reduced-dimensional embedding, encoding each embedding as a Cas9 target sequence. These target sequences act as molecular addresses, capturing clusters of semantically related images. By leveraging Cas9’s off-target cleavage activity, query sequences cleave both exact and closely related targets, enabling high-fidelity retrieval of molecular addresses corresponding to in silico image clusters similar to the query. These approaches move towards addressing key challenges in molecular data retrieval by offering simplified, rapid isothermal protocols and new DNA data access capabilities. |
| format | Article |
| id | doaj-art-2bec9d4d3a01476da5bba0b77952c9fb |
| institution | Kabale University |
| issn | 2041-1723 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Nature Communications |
| spelling | doaj-art-2bec9d4d3a01476da5bba0b77952c9fb; 2025-08-20T03:43:10Z; eng; Nature Portfolio; Nature Communications; ISSN 2041-1723; 2025-07-01; vol. 16, no. 1, pp. 1-11; doi:10.1038/s41467-025-61264-5; Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design; Carina Imburgia, Lee Organick, Karen Zhang, Nicolas Cardozo, Jeff McBride, Callista Bee, Delaney Wilde, Gwendolin Roote, Sophia Jorgensen, David Ward, Charlie Anderson, Luis Ceze, Jeff Nivala (University of Washington, Paul G. Allen School of Computer Science and Engineering); Karin Strauss (Microsoft Research); https://doi.org/10.1038/s41467-025-61264-5 |
| title | Random access and semantic search in DNA data storage enabled by Cas9 and machine-guided design |
| url | https://doi.org/10.1038/s41467-025-61264-5 |