scICE: enhancing clustering reliability and efficiency of scRNA-seq data with multi-cluster label consistency evaluation

Bibliographic Details
Main Authors: Hyun Kim, Issac Park, Jong-Eun Park, Jong Kyoung Kim, Minseok Seo, Jae Kyoung Kim
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-60702-8
Description
Summary: Clustering analysis is a fundamental step in scRNA-seq data analysis. However, its reliability is compromised by clustering inconsistency among trials due to stochastic processes in clustering algorithms. Despite efforts to obtain reliable and consensus clustering, existing methods cannot be applied to large scRNA-seq datasets due to high computational costs. Here, we develop the single-cell Inconsistency Clustering Estimator (scICE) to evaluate clustering consistency and provide consistent clustering results, achieving up to a 30-fold improvement in speed compared to conventional consensus clustering-based methods, such as multiK and chooseR. Application of scICE to 48 real and simulated scRNA-seq datasets, some with over 10,000 cells, successfully identifies all consistent clustering results, substantially narrowing the number of clusters to explore. By enabling the focus on a narrower set of more reliable candidate clusters, users can greatly reduce computational burden while generating more robust results.
ISSN: 2041-1723
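
Illustration: The summary refers to clustering inconsistency among trials. The sketch below is one simple way to picture that idea, and it is not the scICE method, whose metric is not described in this record. It scores agreement between repeated clustering runs with the plain pair-counting Rand index. The function names and the toy label vectors are hypothetical, and the pair enumeration is quadratic in the number of cells, so it suits only small examples.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of cell pairs on which two labelings agree.

    A pair "agrees" when both labelings put the two cells in the same
    cluster, or both put them in different clusters. Label names are
    irrelevant, so permuted labels still score 1.0.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must cover the same cells")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_pairwise_agreement(runs):
    """Average Rand index over all pairs of clustering runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    scores = [rand_index(a, b) for a, b in combinations(runs, 2)]
    return sum(scores) / len(scores)


# Hypothetical label vectors from three runs over six cells.
runs_stable = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, labels permuted
    [2, 2, 2, 5, 5, 5],  # same partition, different label names
]
runs_unstable = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [0, 1, 0, 1, 0, 1],
]

stable_score = mean_pairwise_agreement(runs_stable)
unstable_score = mean_pairwise_agreement(runs_unstable)
print(f"stable runs:   {stable_score:.3f}")
print(f"unstable runs: {unstable_score:.3f}")
```

In this toy setting, runs that reproduce the same partition score 1.0 regardless of label names, while runs that split the cells differently score lower. A cluster number whose repeated runs agree closely would be a more reliable candidate in the sense the summary describes.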