A robust multi-scale clustering framework for single-cell RNA-seq data analysis
| Format: | Article |
|---|---|
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-03603-6 |
| ISSN: | 2045-2322 |

Summary: Recent advances in single-cell RNA sequencing (scRNA-seq) technology have opened new opportunities for in-depth exploration of gene expression patterns. However, the high dimensionality, sparsity, and noise inherent in scRNA-seq data pose significant challenges for existing clustering methods, especially in accurately identifying and classifying diverse cell types. To address these challenges, we introduce the single-cell Multi-Scale Clustering Framework (scMSCF), which combines multi-dimensional PCA for dimensionality reduction, K-means clustering, and a weighted ensemble meta-clustering approach, and refines the result with a self-attention-driven Transformer model. scMSCF first builds an initial clustering framework using a multi-layer dimensionality reduction strategy, establishing a robust consensus on the clustering structure. A voting mechanism within the meta-clustering step then selects high-confidence cells from the initial clustering results, providing precise training labels for the Transformer. This lets the model capture complex dependencies in the gene expression data and thereby improves clustering accuracy. In comprehensive tests on eight scRNA-seq datasets, scMSCF outperforms existing methods, achieving on average 10-15% higher ARI, NMI, and ACC scores. For example, on the PBMC5k dataset, scMSCF improves ARI from 0.72 to 0.86, demonstrating its ability to accurately identify diverse cell populations. The source code is publicly available on GitHub (https://github.com/DEREKJ24/scMSCF).
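The pipeline described in the summary (PCA at several dimensionalities, K-means on each, a consensus over the runs, voting to pick high-confidence cells, then a supervised refinement step) can be illustrated with a minimal NumPy sketch. This is not the scMSCF implementation; see the authors' repository for that. Several pieces are assumptions chosen for brevity: the component counts `dims=(5, 10, 20)`, an unweighted evidence-accumulation (co-association) consensus in place of the paper's weighted meta-clustering, a unanimous-vote threshold, and a nearest-centroid classifier standing in for the self-attention Transformer. The function names `pca_embed`, `kmeans`, and `consensus_pipeline` are hypothetical.

```python
import numpy as np


def pca_embed(X, n_components):
    """Project cells (rows of X) onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def kmeans(Z, k, rng, n_init=5, n_iter=100):
    """Lloyd's K-means with k-means++ seeding; returns labels of the lowest-inertia run."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: each new centre is sampled proportionally to squared distance.
        centers = [Z[rng.integers(len(Z))]]
        for _ in range(1, k):
            d2 = ((Z[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
            total = d2.sum()
            p = d2 / total if total > 0 else np.full(len(Z), 1.0 / len(Z))
            centers.append(Z[rng.choice(len(Z), p=p)])
        centers = np.array(centers, dtype=float)
        for _ in range(n_iter):
            labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            # Empty clusters keep their previous centre.
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            converged = np.allclose(new, centers)
            centers = new
            if converged:
                break
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        inertia = ((Z - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def consensus_pipeline(X, k, dims=(5, 10, 20), min_agreement=1.0, seed=0):
    """Hypothetical multi-scale PCA + K-means ensemble with voting and refinement."""
    rng = np.random.default_rng(seed)
    embeddings = [pca_embed(X, d) for d in dims]
    base = np.array([kmeans(Z, k, rng) for Z in embeddings])  # (n_runs, n_cells)
    # Unweighted co-association: fraction of runs placing cells i and j together.
    coassoc = (base[:, :, None] == base[:, None, :]).mean(axis=0)
    consensus = kmeans(coassoc, k, rng)
    # Voting: in each run, a cell agrees if it carries that run's majority
    # label within its consensus cluster.
    votes = np.zeros(len(X))
    for labels in base:
        for c in np.unique(consensus):
            members = consensus == c
            majority = np.bincount(labels[members]).argmax()
            votes[members] += labels[members] == majority
    confidence = votes / len(base)
    confident = confidence >= min_agreement
    # Stand-in for the Transformer: a nearest-centroid classifier fitted on the
    # high-confidence cells in the largest PCA space, then applied to every cell.
    Z = embeddings[-1]
    classes = [c for c in np.unique(consensus) if np.any(confident & (consensus == c))]
    if not classes:
        return consensus, confidence
    centroids = np.array([Z[confident & (consensus == c)].mean(axis=0) for c in classes])
    nearest = ((Z[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
    return np.array(classes)[nearest], confidence
```

Calling `consensus_pipeline(X, k)` on a cells × genes matrix returns one label per cell and a per-cell agreement score in [0, 1]. The sketch assumes `X` has already been normalized and log-transformed. Because the co-association matrix is n × n, it is only practical for a few thousand cells.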