ScBlkCom: An Integrated Compression Algorithm for Single-Cell RNA Sequencing Data

Bibliographic Details
Main Authors: Zhang Yuanxin, Lu Yang, Sun Xiao, Fan Jue
Format: Article
Language: English
Published: EDP Sciences 2025-01-01
Series: BIO Web of Conferences
Online Access: https://www.bio-conferences.org/articles/bioconf/pdf/2025/25/bioconf_icbb2025_03016.pdf
author Zhang Yuanxin
Lu Yang
Sun Xiao
Fan Jue
collection DOAJ
description High-throughput sequencing advancements have shifted genomic project bottlenecks from data generation to computational storage and analysis. Single-cell RNA-seq (scRNA-seq) data exhibits unique structural features, including extensive labeled sequence identifiers, which conventional compression tools fail to optimize. This study proposes ScBlkCom, a specialized compression scheme for scRNA-seq data. The method partitions sequencing data into distinct blocks and applies tailored compression strategies: differential encoding for numerical attributes, Huffman coding for categorical labels, and context-adaptive encoding for sequence identifiers. Experiments demonstrate ScBlkCom achieves 84.29% higher compression gain compared to single-module approaches and outperforms generic tools (e.g., GZIP, BZIP2) by 6.44% in compression ratio, while maintaining stable processing speeds. This block-wise adaptive framework effectively addresses scRNA-seq data redundancy, offering enhanced storage efficiency for large-scale single-cell studies.
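The description names two standard techniques, differential encoding for numerical attributes and Huffman coding for categorical labels. The sketch below illustrates those two techniques in general terms only; it is not the ScBlkCom implementation, which this record does not describe in detail. The function names, the `positions` and `labels` fields, and the sample values are all hypothetical, and the context-adaptive encoding of sequence identifiers is left out.

```python
import heapq
from collections import Counter
from itertools import count


def delta_encode(values):
    """Differential encoding: keep the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def huffman_code(symbols):
    """Build a prefix-free code table {symbol: bit string} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode_labels(labels, table):
    """Concatenate the code of every label into one bit string."""
    return "".join(table[label] for label in labels)


def decode_labels(bits, table):
    """Walk the bit string and emit a label whenever a full code is matched."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical block: one numerical column and one categorical column.
positions = [100, 105, 105, 230]
labels = ["exon"] * 5 + ["intron"] * 2 + ["utr"]

deltas = delta_encode(positions)
table = huffman_code(labels)
bits = encode_labels(labels, table)
```

Both steps are lossless: `delta_decode(deltas)` returns the original positions and `decode_labels(bits, table)` returns the original labels. Sorted or slowly changing numbers produce small differences, and frequent labels receive short codes, which is where the size reduction would come from in a block-wise scheme of this kind.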
format Article
id doaj-art-568aef49223f4860b0be845a94a7d5ff
institution OA Journals
issn 2117-4458
language English
publishDate 2025-01-01
publisher EDP Sciences
record_format Article
series BIO Web of Conferences
spelling doaj-art-568aef49223f4860b0be845a94a7d5ff | 2025-08-20T01:53:36Z | eng | EDP Sciences | BIO Web of Conferences | 2117-4458 | 2025-01-01 | 174 | 03016 | 10.1051/bioconf/202517403016 | bioconf_icbb2025_03016 | ScBlkCom: An Integrated Compression Algorithm for Single-Cell RNA Sequencing Data
Zhang Yuanxin (0): School of Biological Science and Medical Engineering, Southeast University
Lu Yang (1): Xinge Yuan Biotechnology Co., Ltd.
Sun Xiao (2): School of Biological Science and Medical Engineering, Southeast University
Fan Jue (3): Xinge Yuan Biotechnology Co., Ltd.
title ScBlkCom: An Integrated Compression Algorithm for Single-Cell RNA Sequencing Data
url https://www.bio-conferences.org/articles/bioconf/pdf/2025/25/bioconf_icbb2025_03016.pdf