Lossless Compression with Trie-Based Shared Dictionary for Omics Data in Edge–Cloud Frameworks
The growing complexity and volume of genomic and omics data present critical challenges for storage, transfer, and analysis in edge–cloud platforms. Existing compression techniques often involve trade-offs between efficiency and speed, requiring innovative approaches that ensure scalability and cost...
| Main Authors: | Rani Adam, Daniel R. Catchpoole, Simeon J. Simoff, Zhonglin Qu, Paul J. Kennedy, Quang Vinh Nguyen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Journal of Sensor and Actuator Networks |
| Online Access: | https://www.mdpi.com/2224-2708/14/2/41 |
| Field | Value |
|---|---|
| author | Rani Adam, Daniel R. Catchpoole, Simeon J. Simoff, Zhonglin Qu, Paul J. Kennedy, Quang Vinh Nguyen |
| collection | DOAJ |
| description | The growing complexity and volume of genomic and omics data present critical challenges for storage, transfer, and analysis in edge–cloud platforms. Existing compression techniques often trade compression efficiency against speed, which calls for approaches that remain scalable and cost-effective. This paper introduces a lossless compression method that integrates Trie-based shared dictionaries within an edge–cloud architecture, and presents a software-centric research process covering the design and evaluation of the method. By enabling localized preprocessing at the edge, the approach reduces data redundancy before cloud transmission, improving both storage and network efficiency. A global shared dictionary is constructed using N-gram analysis to identify and prioritize sequences that repeat across multiple files. A lightweight index derived from this dictionary is then pushed to edge nodes, where Trie-based sequence replacement removes redundancy locally. The preprocessed data are subsequently transmitted to the cloud, where advanced compression algorithms, such as Zstd, GZIP, Snappy, and LZ4, compress them further. Evaluation on real patient omics datasets from B-cell Acute Lymphoblastic Leukemia (B-ALL) and Chronic Lymphocytic Leukemia (CLL) demonstrates that edge preprocessing significantly improves compression ratios, reduces upload times, and enhances scalability in hybrid cloud frameworks. |
| id | doaj-art-897c5447381a43eda3ed0569aa080a08 |
| institution | OA Journals |
| issn | 2224-2708 |
| doi | 10.3390/jsan14020041 |
| volume | 14 |
| issue | 2 |
| article number | 41 |
| affiliations | Rani Adam: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW 2170, Australia; Daniel R. Catchpoole: The Tumour Bank, Children’s Cancer Research Unit, Kids Research, The Children’s Hospital at Westmead, Westmead, NSW 2145, Australia; Simeon J. Simoff: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW 2170, Australia; Zhonglin Qu: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW 2170, Australia; Paul J. Kennedy: Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia; Quang Vinh Nguyen: School of Computer, Data and Mathematical Sciences, Western Sydney University, Parramatta, NSW 2170, Australia |
| topic | genomic data compression; N-gram analysis; global shared dictionary; health data storage optimization; Zstd compression; health informatics scalability |
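
The pipeline summarized in the description (N-gram dictionary, Trie-based replacement at the edge, general-purpose compression in the cloud) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed n-gram lengths, the scoring rule, and the use of Unicode private-use characters as replacement tokens are all assumptions made for illustration, and the standard-library `gzip` module stands in for the cloud-stage compressors named in the record.

```python
import gzip
from collections import Counter

# Assumption: replacement tokens are private-use code points (U+E000..U+F8FF),
# which the input text is required not to contain. This keeps decoding unambiguous.
PUA_START, PUA_END = 0xE000, 0xF8FF


def build_dictionary(files, n_values=(4, 8), top_k=256):
    """Count character n-grams across all files and keep the highest-scoring ones."""
    assert top_k <= PUA_END - PUA_START + 1, "not enough private-use tokens"
    counts = Counter()
    for text in files:
        for n in n_values:
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    # Hypothetical score: occurrences times characters saved per replacement.
    ranked = sorted(
        (gram for gram, count in counts.items() if count > 1),
        key=lambda gram: (-counts[gram] * (len(gram) - 1), gram),
    )
    return {gram: chr(PUA_START + k) for k, gram in enumerate(ranked[:top_k])}


def build_trie(dictionary):
    """Nested-dict trie; the key None marks the end of an n-gram and holds its token."""
    root = {}
    for gram, token in dictionary.items():
        node = root
        for ch in gram:
            node = node.setdefault(ch, {})
        node[None] = token
    return root


def encode(text, trie):
    """Replace the longest dictionary match at each position with its token."""
    if any(PUA_START <= ord(ch) <= PUA_END for ch in text):
        raise ValueError("input already contains private-use characters")
    out, i = [], 0
    while i < len(text):
        node, j, best = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best = (node[None], j)  # remember the longest match so far
        if best is not None:
            out.append(best[0])
            i = best[1]
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def decode(encoded, dictionary):
    """Invert encode(): expand every token back into its n-gram."""
    reverse = {token: gram for gram, token in dictionary.items()}
    return "".join(reverse.get(ch, ch) for ch in encoded)


if __name__ == "__main__":
    # Toy sequences standing in for omics files.
    files = ["ACGTACGTTTGACGTACGT" * 20, "TTGACGTACGTACGTAAAC" * 20]
    dictionary = build_dictionary(files)   # built once, centrally
    trie = build_trie(dictionary)          # the index pushed to edge nodes
    for text in files:
        encoded = encode(text, trie)       # edge-side preprocessing
        assert decode(encoded, dictionary) == text  # round trip is lossless
        payload = gzip.compress(encoded.encode("utf-8"))  # cloud-side stage
        print(len(text), len(encoded), len(payload))
```

In this sketch, losslessness depends on the token alphabet being disjoint from the data alphabet, which is why `encode` rejects inputs that already contain private-use characters. A production scheme would need a binary-safe token format and a versioned dictionary shared between the edge nodes and the cloud.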