scICE: enhancing clustering reliability and efficiency of scRNA-seq data with multi-cluster label consistency evaluation


Bibliographic Details
Main Authors: Hyun Kim, Issac Park, Jong-Eun Park, Jong Kyoung Kim, Minseok Seo, Jae Kyoung Kim
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-60702-8
Collection: DOAJ
Abstract: Clustering analysis is a fundamental step in scRNA-seq data analysis. However, its reliability is compromised by clustering inconsistency among trials due to stochastic processes in clustering algorithms. Despite efforts to obtain reliable and consensus clustering, existing methods cannot be applied to large scRNA-seq datasets due to high computational costs. Here, we develop the single-cell Inconsistency Clustering Estimator (scICE) to evaluate clustering consistency and provide consistent clustering results, achieving up to a 30-fold improvement in speed compared to conventional consensus clustering-based methods, such as multiK and chooseR. Application of scICE to 48 real and simulated scRNA-seq datasets, some with over 10,000 cells, successfully identifies all consistent clustering results, substantially narrowing the number of clusters to explore. By enabling the focus on a narrower set of more reliable candidate clusters, users can greatly reduce computational burden while generating more robust results.
Record ID: doaj-art-9eca166306434d4d85ab330fc9bf18d7
ISSN: 2041-1723
Volume: 16, Issue: 1, Pages: 1-12
Author Affiliations:
Hyun Kim: Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science
Issac Park: Department of Mathematics, Pusan National University
Jong-Eun Park: Graduate School of Medical Science and Engineering, KAIST
Jong Kyoung Kim: Department of Life Sciences, Pohang University of Science and Technology (POSTECH)
Minseok Seo: Department of Computer and Information Science, Korea University
Jae Kyoung Kim: Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science