ScBlkCom: An Integrated Compression Algorithm for Single-Cell RNA Sequencing Data
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | EDP Sciences, 2025-01-01 |
| Series: | BIO Web of Conferences |
| Online Access: | https://www.bio-conferences.org/articles/bioconf/pdf/2025/25/bioconf_icbb2025_03016.pdf |
| Summary: | Advances in high-throughput sequencing have shifted the bottleneck in genomic projects from data generation to computational storage and analysis. Single-cell RNA-seq (scRNA-seq) data exhibit unique structural features, including extensive labeled sequence identifiers, which conventional compression tools fail to exploit. This study proposes ScBlkCom, a specialized compression scheme for scRNA-seq data. The method partitions sequencing data into distinct blocks and applies a tailored compression strategy to each: differential encoding for numerical attributes, Huffman coding for categorical labels, and context-adaptive encoding for sequence identifiers. Experiments show that ScBlkCom achieves an 84.29% higher compression gain than single-module approaches and outperforms generic tools (e.g., GZIP, BZIP2) by 6.44% in compression ratio, while maintaining stable processing speeds. This block-wise adaptive framework effectively addresses scRNA-seq data redundancy, offering enhanced storage efficiency for large-scale single-cell studies. |
|---|---|
| ISSN: | 2117-4458 |
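
The summary names two standard building blocks, differential encoding for numerical attributes and Huffman coding for categorical labels. The sketch below illustrates those two generic techniques only; it is not the ScBlkCom implementation, and it does not attempt the context-adaptive identifier encoding. The function names (`delta_encode`, `delta_decode`, `huffman_code`, `compress_block`) and the record fields `pos` and `label` are hypothetical, chosen for illustration.

```python
import heapq
from collections import Counter


def delta_encode(values):
    """Differential encoding: keep the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def huffman_code(symbols):
    """Build a prefix-free code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compress_block(records):
    """Split hypothetical records into per-field blocks and encode each separately."""
    positions = [r["pos"] for r in records]   # assumed numerical attribute
    labels = [r["label"] for r in records]    # assumed categorical label
    table = huffman_code(labels)
    return {
        "pos_deltas": delta_encode(positions),
        "label_table": table,
        "label_bits": "".join(table[label] for label in labels),
    }
```

Splitting fields into separate blocks lets each encoder see homogeneous data: sorted or slowly varying numbers turn into small deltas, and skewed label distributions give short Huffman codes to the most frequent labels. A real compressor would additionally pack the deltas and bitstrings into bytes and store the code table alongside them.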