SAVI Space—combinatorial encoding of the billion-size synthetically accessible virtual inventory

Bibliographic Details
Main Authors: Malte Korn, Philip Judson, Raphael Klein, Christian Lemmen, Marc C. Nicklaus, Matthias Rarey
Format: Article
Language: English
Published: Nature Portfolio 2025-06-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-05384-z
Description
Summary: The Synthetically Accessible Virtual Inventory (SAVI) comprises a huge molecule collection. LHASA transform rules, originally intended for retro-synthetic analysis, were applied to Enamine Building Blocks in a forward synthetic manner. Adding new transforms, expressly developed for SAVI, resulted in SAVI-Lib-2020, a collection of more than a billion synthetically accessible compounds. Handling a billion molecules explicitly is computationally quite demanding for drug discovery applications. SAVI-Space-2024 was created to address this shortcoming. In this paper, we describe the design and implementation of SAVI-Space-2024. We emphasize its reaction-driven combinatorial data structure that encodes transformation rules as reaction SMARTS and applies them in a combinatorial manner. Based on Enamine Building Blocks, this approach yields 7.5 billion molecules while requiring only a fraction of the memory (1.4 GB compared to 210 GB). Furthermore, the improved search capabilities, including fast similarity and substructure searches and docking applications on standard hardware, represent a significant advance over the enumerated SAVI library.
ISSN: 2052-4463
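
Illustration: The summary's point about combinatorial encoding can be made concrete with a small sketch. The Python below is not the SAVI-Space-2024 implementation; it is a toy model, assuming a reaction is stored as one pool of building blocks per reactant position. The `Reaction` class, the reaction name, and the building-block identifiers are all hypothetical. It shows that the number of encoded products is the product of the pool sizes, while the number of stored items is only their sum, and it computes the size ratio from the two figures quoted in the summary.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Reaction:
    """Toy combinatorial encoding: one pool of building blocks per reactant.

    Hypothetical structure for illustration only; the real data structure
    described in the article pairs reaction SMARTS with building blocks.
    """
    name: str
    reactant_pools: Tuple[Tuple[str, ...], ...]

    def product_count(self) -> int:
        # Products are never stored; their number is the product of pool sizes.
        return prod(len(pool) for pool in self.reactant_pools)

    def stored_items(self) -> int:
        # Only the building blocks are stored, so storage grows with the sum.
        return sum(len(pool) for pool in self.reactant_pools)

    def enumerate_products(self) -> Iterator[Tuple[str, ...]]:
        # Explicit enumeration stays possible, one combination at a time.
        yield from product(*self.reactant_pools)


# Hypothetical two-component reaction with made-up building-block names.
toy = Reaction(
    name="toy_amide_coupling",
    reactant_pools=(
        ("acid_1", "acid_2", "acid_3"),
        ("amine_1", "amine_2", "amine_3", "amine_4"),
    ),
)

encoded = toy.product_count()   # 3 * 4 combinations
stored = toy.stored_items()     # 3 + 4 building blocks

# Size ratio of the two figures quoted in the summary (210 GB and 1.4 GB).
size_ratio = round(210 / 1.4)

if __name__ == "__main__":
    print(f"{toy.name}: {encoded} products encoded by {stored} stored items")
    print(f"reported size ratio: about {size_ratio}x")
```

With realistic pool sizes in the thousands, the gap between the product and the sum widens quickly, which is consistent with a multi-billion-molecule space fitting in a few gigabytes.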