The impact of dropouts in scRNAseq dense neighborhood analysis

Single-cell RNA sequencing (scRNAseq) makes it possible to investigate transcriptomic profiles at the single-cell level. However, the data pose unique challenges compared with bulk transcriptomic data, one being high dropout rates, which yield highly sparse data. Many classical analysis and...

Bibliographic Details
Main Authors:	Alisa Pavel, Manja Gersholm Grønberg, Line H. Clemmensen
Format:	Article
Language:	English
Published:	Elsevier 2025-01-01
Series:	Computational and Structural Biotechnology Journal
Subjects:	scRNAseq, Dropouts, Clustering, Sparsity
Online Access:	http://www.sciencedirect.com/science/article/pii/S2001037025001023

collection	DOAJ
description	Single-cell RNA sequencing (scRNAseq) makes it possible to investigate transcriptomic profiles at the single-cell level. However, the data pose unique challenges compared with bulk transcriptomic data, one being high dropout rates, which yield highly sparse data. Many classical analysis and preprocessing pipelines are based on the assumption that poor data can be counteracted by quantity and that similar cells (samples) are close to each other in space. Clustering is commonly used to detect clusters (dense local cell neighborhoods) under the assumption that similar cells are close to each other in space (where "close" depends on the distance metric used). The most commonly used clustering methodologies for detecting dense local neighborhoods are based on graph clustering on a nearest-neighbor graph. However, high dropout rates may break this assumption and make it difficult to reliably detect such dense local neighborhoods. We assess cluster homogeneity and stability under increasing degrees of dropout in one of the most popular clustering pipelines (dimensionality reduction + graph-based clustering), as provided by the scRNAseq analysis packages Seurat and Scanpy. Our study shows that while the default pipeline performs well in terms of cluster homogeneity (i.e., cells in a cluster are of the same type), even with increasing dropout rates, the stability of clusters (i.e., cell pairs consistently being in the same cluster) decreases. This implies that sub-populations within cell types are increasingly difficult to identify under increasing dropout rates because observations are not consistently close. Our results challenge the current practice of using default clustering pipelines and the general assumption of identifiable local neighborhoods on high-dropout data. Hence, interpretation and downstream analysis require careful consideration when relying on local neighborhoods and clusters in scRNAseq data. In addition, these results call for extensive benchmarking to identify and provide methods whose local neighborhood relationships are robust on data containing low to high dropout rates.
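The two quantities contrasted in the description, cluster homogeneity and cluster stability, can be illustrated with a small sketch. This is not the authors' code or their exact metrics: the function names (`add_dropouts`, `sparsity`, `purity`, `pair_stability`), the dropout model (each nonzero count is zeroed independently), and the toy labels are all assumptions made for illustration. The clustering step itself (for example a Seurat or Scanpy pipeline) is left out, and cluster labels are supplied by hand.

```python
import random
from collections import Counter
from itertools import combinations


def add_dropouts(counts, rate, seed=0):
    """Assumed dropout model: zero each nonzero entry independently with probability `rate`."""
    rng = random.Random(seed)
    return [[0 if (v != 0 and rng.random() < rate) else v for v in row]
            for row in counts]


def sparsity(counts):
    """Fraction of zero entries in a cells-by-genes count matrix."""
    total = sum(len(row) for row in counts)
    zeros = sum(v == 0 for row in counts for v in row)
    return zeros / total


def purity(cluster_labels, cell_types):
    """Homogeneity proxy: share of cells whose type is the majority type of their cluster."""
    by_cluster = {}
    for cluster, cell_type in zip(cluster_labels, cell_types):
        by_cluster.setdefault(cluster, Counter())[cell_type] += 1
    majority = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority / len(cluster_labels)


def pair_stability(labels_a, labels_b):
    """Stability proxy: of the cell pairs co-clustered in run A, the share also co-clustered in run B."""
    together_a = together_both = 0
    for i, j in combinations(range(len(labels_a)), 2):
        if labels_a[i] == labels_a[j]:
            together_a += 1
            together_both += labels_b[i] == labels_b[j]
    return together_both / together_a if together_a else 1.0


# Hand-made toy labels (hypothetical): two cell types, two clustering runs.
cell_types = ["T", "T", "T", "B", "B", "B"]
run_a = [0, 0, 1, 2, 2, 2]  # e.g. clustering of the original data
run_b = [0, 1, 1, 2, 2, 2]  # e.g. clustering after added dropouts

print(purity(run_a, cell_types), purity(run_b, cell_types))  # both runs are pure
print(pair_stability(run_a, run_b))  # yet one co-clustered pair is split
```

In this toy case both runs have perfect purity, because every cluster contains a single cell type, while the pair stability falls below 1 because two T cells that shared a cluster in the first run are separated in the second. That is the pattern the description reports at larger scale: homogeneity can stay high while sub-structure within a cell type becomes unstable.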
id	doaj-art-b45fb1504fa74534872cf355dfec6b0f
institution	Kabale University
issn	2001-0370
doi	10.1016/j.csbj.2025.03.033
volume	27
pages	1278-1285
affiliation	Alisa Pavel: Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
affiliation	Manja Gersholm Grønberg: Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
affiliation	Line H. Clemmensen (corresponding author): Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark; Department of Mathematical Sciences, University of Copenhagen, 2100, Copenhagen, Denmark

Similar Items