Lossless Compression with Trie-Based Shared Dictionary for Omics Data in Edge–Cloud Frameworks

The growing complexity and volume of genomic and omics data present critical challenges for storage, transfer, and analysis in edge–cloud platforms. Existing compression techniques often involve trade-offs between efficiency and speed, requiring innovative approaches that ensure scalability and cost...

Bibliographic Details
Main Authors: Rani Adam, Daniel R. Catchpoole, Simeon J. Simoff, Zhonglin Qu, Paul J. Kennedy, Quang Vinh Nguyen
Format: Article
Language: English
Published: MDPI AG 2025-04-01
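Series: Journal of Sensor and Actuator Networks
Subjects: genomic data compression; N-gram analysis; global shared dictionary; health data storage optimization; Zstd compression; health informatics scalability
Online Access: https://www.mdpi.com/2224-2708/14/2/41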
author Rani Adam
Daniel R. Catchpoole
Simeon J. Simoff
Zhonglin Qu
Paul J. Kennedy
Quang Vinh Nguyen
author_sort Rani Adam
collection DOAJ
description The growing complexity and volume of genomic and omics data present critical challenges for storage, transfer, and analysis in edge–cloud platforms. Existing compression techniques often involve trade-offs between efficiency and speed, requiring innovative approaches that ensure scalability and cost-effectiveness. This paper introduces a lossless compression method that integrates Trie-based shared dictionaries within an edge–cloud architecture. It presents a software-centric scientific research process of the design and evaluation of the proposed compression method. By enabling localized preprocessing at the edge, our approach reduces data redundancy before cloud transmission, thereby optimizing both storage and network efficiency. A global shared dictionary is constructed using N-gram analysis to identify and prioritize repeated sequences across multiple files. A lightweight index derived from this dictionary is then pushed to edge nodes, where Trie-based sequence replacement is applied to eliminate redundancy locally. The preprocessed data are subsequently transmitted to the cloud, where advanced compression algorithms, such as Zstd, GZIP, Snappy, and LZ4, further compress them. Evaluation on real patient omics datasets from B-cell Acute Lymphoblastic Leukemia (B-ALL) and Chronic Lymphocytic Leukemia (CLL) demonstrates that edge preprocessing significantly improves compression ratios, reduces upload times, and enhances scalability in hybrid cloud frameworks.
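The pipeline summarized in the description (N-gram counting to build a shared dictionary, Trie-based replacement at the edge, then a general-purpose compressor in the cloud) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which this record does not include. The function names, the n-gram length `n`, the dictionary size `top_k`, the use of Unicode Private Use Area characters as replacement codes, and the choice of GZIP from the standard library for the cloud stage are all assumptions made for illustration. The sketch also assumes the input text never contains Private Use Area characters, which is what keeps the replacement reversible.

```python
import gzip
from collections import Counter


def build_dictionary(files, n=4, top_k=2):
    """Count n-grams across all files and map the most frequent to 1-char codes.

    Hypothetical helper: n and top_k are illustrative defaults, not values
    taken from the paper.
    """
    counts = Counter()
    for text in files:
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    # Keep only n-grams that occur more than once across the corpus.
    frequent = [gram for gram, c in counts.most_common(top_k) if c > 1]
    # Assumption: codes come from the Private Use Area (U+E000 onward),
    # which must not appear in the input data.
    return {gram: chr(0xE000 + i) for i, gram in enumerate(frequent)}


def build_trie(dictionary):
    """Build a nested-dict Trie; the multi-character key "$code" marks a full entry."""
    root = {}
    for seq, code in dictionary.items():
        node = root
        for ch in seq:
            node = node.setdefault(ch, {})
        node["$code"] = code
    return root


def encode(text, trie):
    """Edge step: replace the longest dictionary match at each position."""
    out, i = [], 0
    while i < len(text):
        node, j, match = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "$code" in node:
                match = (node["$code"], j)  # remember the longest match so far
        if match:
            out.append(match[0])
            i = match[1]
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def decode(encoded, dictionary):
    """Invert encode() by expanding each code back to its sequence."""
    reverse = {code: seq for seq, code in dictionary.items()}
    return "".join(reverse.get(ch, ch) for ch in encoded)


def compress_for_cloud(encoded):
    """Cloud step: apply a general-purpose compressor (GZIP here) to the preprocessed data."""
    return gzip.compress(encoded.encode("utf-8"))


if __name__ == "__main__":
    files = ["ACGTACGTACGTTTGA", "GGACGTACGTACGA"]  # toy data, not real omics files
    dictionary = build_dictionary(files)
    trie = build_trie(dictionary)
    for text in files:
        encoded = encode(text, trie)
        assert decode(encoded, dictionary) == text  # round trip is lossless
        print(len(text), len(encoded), len(compress_for_cloud(encoded)))
```

In this sketch the dictionary is built once from all files and shared with every encoder, so the decoder needs only the same dictionary to restore the original text. On inputs this small the GZIP stage adds header overhead, so the printed sizes say nothing about the compression ratios reported in the paper.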
format Article
id doaj-art-897c5447381a43eda3ed0569aa080a08
institution OA Journals
issn 2224-2708
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Journal of Sensor and Actuator Networks
spelling doaj-art-897c5447381a43eda3ed0569aa080a08
2025-08-20T02:28:15Z
eng
MDPI AG
Journal of Sensor and Actuator Networks
2224-2708
2025-04-01
Volume 14, Issue 2, Article 41
DOI 10.3390/jsan14020041
Lossless Compression with Trie-Based Shared Dictionary for Omics Data in Edge–Cloud Frameworks
Rani Adam: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW 2170, Australia
Daniel R. Catchpoole: The Tumour Bank, Children’s Cancer Research Unit, Kids Research, The Children’s Hospital at Westmead, Westmead, NSW 2145, Australia
Simeon J. Simoff: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW 2170, Australia
Zhonglin Qu: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW 2170, Australia
Paul J. Kennedy: Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia
Quang Vinh Nguyen: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW 2170, Australia
title Lossless Compression with Trie-Based Shared Dictionary for Omics Data in Edge–Cloud Frameworks
topic genomic data compression
N-gram analysis
global shared dictionary
health data storage optimization
Zstd compression
health informatics scalability
url https://www.mdpi.com/2224-2708/14/2/41