SAVI Space—combinatorial encoding of the billion-size synthetically accessible virtual inventory
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-06-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-05384-z |
| Summary: | The Synthetically Accessible Virtual Inventory (SAVI) comprises a huge molecule collection. LHASA transform rules, originally intended for retro-synthetic analysis, were applied to Enamine Building Blocks in a forward synthetic manner. Adding new transforms, expressly developed for SAVI, resulted in SAVI-Lib-2020, a collection of more than a billion synthetically accessible compounds. Handling a billion molecules explicitly is computationally quite demanding for drug discovery applications. SAVI-Space-2024 was created to address this shortcoming. In this paper, we describe the design and implementation of SAVI-Space-2024. We emphasize its reaction-driven combinatorial data structure that encodes transformation rules as reaction SMARTS and applies them in a combinatorial manner. Based on Enamine Building Blocks, this approach yields 7.5 billion molecules while requiring only a fraction of the memory (1.4 GB compared to 210 GB). Furthermore, the improved search capabilities, including fast similarity and substructure searches and docking applications on standard hardware, represent a significant advance over the enumerated SAVI library. |
| ISSN: | 2052-4463 |
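The reaction-driven combinatorial data structure mentioned in the summary can be illustrated with a minimal Python sketch. It assumes a hypothetical layout that is not SAVI-Space-2024's actual format: each reaction keeps its reaction SMARTS as an opaque string plus one list of compatible building-block IDs per reactant slot. The size of the space is then the sum over reactions of the product of slot sizes, and any product can be addressed by an integer index and decoded back to its building blocks without enumerating the space. The class names (`Reaction`, `CombinatorialSpace`), reaction names, SMARTS strings and building-block IDs are all illustrative; actually applying the SMARTS to molecules would require a cheminformatics toolkit such as RDKit, which this sketch deliberately leaves out.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Reaction:
    """Hypothetical record: one transform plus its compatible building blocks.

    The reaction SMARTS is stored as an opaque string; this sketch never applies it.
    """
    name: str
    smarts: str
    slots: Tuple[Tuple[str, ...], ...]  # one tuple of building-block IDs per reactant

    def size(self) -> int:
        # Every combination of one building block per slot is one product.
        return prod(len(slot) for slot in self.slots)

    def decode(self, index: int) -> Tuple[str, ...]:
        # Mixed-radix decoding: integer index -> one building block per slot.
        if not 0 <= index < self.size():
            raise IndexError(index)
        picks: List[str] = []
        for slot in reversed(self.slots):
            index, pos = divmod(index, len(slot))
            picks.append(slot[pos])
        return tuple(reversed(picks))


class CombinatorialSpace:
    """Hypothetical container: a list of reactions, never enumerated explicitly."""

    def __init__(self, reactions: Sequence[Reaction]) -> None:
        self.reactions = list(reactions)

    def size(self) -> int:
        # Total products = sum over reactions of each reaction's combination count.
        return sum(r.size() for r in self.reactions)

    def stored_entries(self) -> int:
        # What is actually held in memory: one entry per building block per slot.
        return sum(len(slot) for r in self.reactions for slot in r.slots)

    def get(self, index: int) -> Tuple[str, Tuple[str, ...]]:
        # Find the reaction that owns this global index, then decode locally.
        if index < 0:
            raise IndexError(index)
        for r in self.reactions:
            n = r.size()
            if index < n:
                return r.name, r.decode(index)
            index -= n
        raise IndexError(index)


# Illustrative building-block lists (hypothetical IDs).
acids = ("acid_1", "acid_2", "acid_3")
amines = ("amine_1", "amine_2")
halides = ("halide_1", "halide_2", "halide_3", "halide_4")
boronics = ("boronic_1", "boronic_2", "boronic_3", "boronic_4", "boronic_5")

space = CombinatorialSpace([
    Reaction("amide_coupling", "[C:1](=[O:2])[OH].[N;!H0:3]>>[C:1](=[O:2])[N:3]", (acids, amines)),
    Reaction("suzuki_coupling", "[c:1][Br,I].[c:2]B(O)O>>[c:1][c:2]", (halides, boronics)),
])

print(space.size())            # 26 products (3*2 + 4*5)
print(space.stored_entries())  # 14 stored building-block entries
print(space.get(5))            # ('amide_coupling', ('acid_3', 'amine_2'))
```

In this toy space, 14 stored building-block entries describe 26 products; because the product count grows multiplicatively with slot sizes while storage grows only additively, the gap widens quickly at scale. For comparison, the summary's figure of 1.4 GB versus 210 GB corresponds to a 150-fold memory reduction.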