Lossless Compression with Trie-Based Shared Dictionary for Omics Data in Edge–Cloud Frameworks

Bibliographic Details
Main Authors: Rani Adam, Daniel R. Catchpoole, Simeon J. Simoff, Zhonglin Qu, Paul J. Kennedy, Quang Vinh Nguyen
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Journal of Sensor and Actuator Networks
Online Access: https://www.mdpi.com/2224-2708/14/2/41
Description
Summary: The growing complexity and volume of genomic and omics data present critical challenges for storage, transfer, and analysis in edge–cloud platforms. Existing compression techniques often trade compression efficiency against speed, so new approaches are needed that remain scalable and cost-effective. This paper introduces a lossless compression method that integrates Trie-based shared dictionaries within an edge–cloud architecture, and it describes the design and evaluation of the method as a software-centric scientific research process. By enabling localized preprocessing at the edge, the approach reduces data redundancy before cloud transmission, improving both storage and network efficiency. A global shared dictionary is constructed using N-gram analysis to identify and prioritize sequences that repeat across multiple files. A lightweight index derived from this dictionary is then pushed to edge nodes, where Trie-based sequence replacement eliminates redundancy locally. The preprocessed data are subsequently transmitted to the cloud, where general-purpose compression algorithms such as Zstd, GZIP, Snappy, and LZ4 compress them further. Evaluation on real patient omics datasets from B-cell Acute Lymphoblastic Leukemia (B-ALL) and Chronic Lymphocytic Leukemia (CLL) demonstrates that edge preprocessing significantly improves compression ratios, reduces upload times, and enhances scalability in hybrid cloud frameworks.
ISSN: 2224-2708
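
Illustrative sketch: the pipeline outlined in the summary (N-gram dictionary, Trie-based replacement at the edge, general-purpose compression in the cloud) might look roughly like the Python below. This is not the authors' implementation. The function names, the fixed N-gram length of 8, the one-byte token scheme (byte values 128 to 255), and the assumption that input is 7-bit ASCII are all hypothetical choices made to keep the example small and lossless. GZIP stands in for the cloud stage because it ships with the standard library.

```python
import gzip
from collections import Counter

def build_dictionary(samples, n=8, max_entries=128):
    """Count fixed-length n-grams across all samples; keep the most frequent.

    Hypothetical scheme: at most 128 entries so each maps to one byte token.
    """
    counts = Counter()
    for data in samples:
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return [gram for gram, c in counts.most_common(max_entries) if c > 1]

def build_trie(dictionary):
    """Build a nested-dict trie; a "token" key marks the end of an entry."""
    root = {}
    for index, gram in enumerate(dictionary):
        node = root
        for byte in gram:
            node = node.setdefault(byte, {})
        node["token"] = 128 + index  # tokens live above the ASCII range
    return root

def encode(data, trie):
    """Edge step: greedily replace the longest dictionary match with its token."""
    if any(b >= 128 for b in data):
        raise ValueError("sketch assumes 7-bit ASCII input")
    out = bytearray()
    i = 0
    while i < len(data):
        node, j, match = trie, i, None
        while j < len(data) and data[j] in node:
            node = node[data[j]]
            j += 1
            if "token" in node:
                match = (node["token"], j)
        if match:
            out.append(match[0])
            i = match[1]
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def decode(encoded, dictionary):
    """Inverse of encode: expand each token back into its dictionary entry."""
    out = bytearray()
    for b in encoded:
        if b >= 128:
            out += dictionary[b - 128]
        else:
            out.append(b)
    return bytes(out)

def cloud_compress(encoded):
    """Cloud step: apply a general-purpose compressor to the preprocessed bytes."""
    return gzip.compress(encoded)
```

A round trip would call `build_dictionary` on a set of files, `build_trie` on the result, `encode` on each file at the edge, and `cloud_compress` on the encoded bytes; `gzip.decompress` followed by `decode` with the same dictionary recovers the original bytes.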