LSM tree read-deletion operations optimization through the implementation of cuckoo filters

Abstract This study aims to develop an LSM tree that uses two different kinds of probabilistic data structures (PDS). These two data structures are the Bloom Filters and the state-of-the-art Cuckoo Filter released in 2014 by Fan. Cuckoo filters are a perfect choice for saving space and also for dele...

Full description

Saved in:
Bibliographic Details
Main Authors: Humberto Cesar Villalta Valverde, KwangSik Kim, Kisu Kim, Jinman Kwon, Jaechoon Lim, Yongjoo Jun
Format: Article
Language:English
Published: SpringerOpen 2025-02-01
Series:Journal of Big Data
Subjects:
Online Access:https://doi.org/10.1186/s40537-025-01097-7
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract This study aims to develop an LSM tree that uses two different kinds of probabilistic data structures (PDS). These two data structures are the Bloom Filters and the state-of-the-art Cuckoo Filter released in 2014 by Fan. Cuckoo filters are a perfect choice for saving space and also for deleting keys that are not needed in the algorithm so the filters are synchronized with the keys of all the SSTables. The implementation in this study of Cuckoo filters showed three main advantages (1) Keys that are deleted in the SSTables are immediately reflected in the Cuckoo Filters (2) When SSTables are merged, Cuckoo Filters are also merged by measuring its load factor for availability (3) The performance of Cuckoo Filters compared to Bloom filters after deleting and query shows better performance despite that cuckoo filters are also being updated. This approach is expected to reduce storage space and the amount of time for queries and deletions on the overall data structure. This new optimized LSM tree is expected to be implemented in HBase as future work to check its performance with a real-world database for more empirical results.
ISSN:2196-1115