The impact of dropouts in scRNAseq dense neighborhood analysis

Single-cell RNA sequencing (scRNAseq) makes it possible to investigate transcriptomic profiles at the level of individual cells. However, the data pose unique challenges compared with bulk transcriptomic data, one being high dropout rates, which yield highly sparse data. Many classical analysis and...

Bibliographic Details
Main Authors: Alisa Pavel, Manja Gersholm Grønberg, Line H. Clemmensen
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: scRNAseq, Dropouts, Clustering, Sparsity
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037025001023
author Alisa Pavel
Manja Gersholm Grønberg
Line H. Clemmensen
collection DOAJ
description Single-cell RNA sequencing (scRNAseq) makes it possible to investigate transcriptomic profiles at the level of individual cells. However, the data pose unique challenges compared with bulk transcriptomic data, one being high dropout rates, which yield highly sparse data. Many classical analysis and preprocessing pipelines are based on the assumption that poor data can be counteracted by quantity and that similar cells (samples) are close to each other in space. Clustering is commonly used to detect clusters (dense local cell neighborhoods) under the assumption that similar cells are close to each other in space (where closeness depends on the distance metric used). The most commonly used clustering methodologies for detecting dense local neighborhoods apply graph clustering to a nearest-neighbor graph. However, high dropout rates may break this assumption and make it difficult to reliably detect such dense local neighborhoods. We assess cluster homogeneity and stability under increasing degrees of dropout in one of the most popular clustering pipelines (dimensionality reduction + graph-based clustering), as provided by the scRNAseq analysis packages Seurat and Scanpy. Our study shows that while the default pipeline performs well in terms of cluster homogeneity (i.e., cells in a cluster are of the same type), even with increasing dropout rates, the stability of clusters (i.e., cell pairs consistently being in the same cluster) decreases. This implies that sub-populations within cell types become increasingly difficult to identify as dropout rates increase, because observations are not consistently close. Our results challenge the current practice of using default clustering pipelines and the general assumption of identifiable local neighborhoods in high-dropout data. Hence, these results suggest that interpretation and downstream analysis require careful consideration when they rely on local neighborhoods and clusters in scRNAseq data. In addition, these results call for extensive benchmarking to identify and provide methods whose local neighborhood relationships are robust on data with low to high dropout rates.
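The description distinguishes cluster homogeneity (cells in a cluster share a type) from cluster stability (cell pairs stay together across runs). The sketch below is one possible way to express those two ideas and a random dropout step in plain Python. The function names `apply_dropout`, `cluster_purity`, and `pair_stability` are hypothetical, the metrics are simple illustrative stand-ins and not necessarily the measures used in the article, and no Seurat or Scanpy calls are involved.

```python
import random
from collections import Counter
from itertools import combinations


def apply_dropout(counts, rate, rng):
    """Return a copy of `counts` (list of rows) where each entry is
    set to zero independently with probability `rate`."""
    return [[0 if rng.random() < rate else v for v in row] for row in counts]


def cluster_purity(cell_types, clusters):
    """Illustrative homogeneity: fraction of cells whose type equals
    the majority type of the cluster they were assigned to."""
    by_cluster = {}
    for cell_type, cluster in zip(cell_types, clusters):
        by_cluster.setdefault(cluster, Counter())[cell_type] += 1
    matched = sum(max(c.values()) for c in by_cluster.values())
    return matched / len(clusters)


def pair_stability(clusters_a, clusters_b):
    """Illustrative stability: of the cell pairs that share a cluster
    in run A, the fraction that also share a cluster in run B."""
    together_a = together_both = 0
    for i, j in combinations(range(len(clusters_a)), 2):
        if clusters_a[i] == clusters_a[j]:
            together_a += 1
            if clusters_b[i] == clusters_b[j]:
                together_both += 1
    return together_both / together_a if together_a else float("nan")


if __name__ == "__main__":
    # Toy labels (made up for illustration): six cells of two types.
    types = ["T", "T", "T", "B", "B", "B"]
    run_a = [0, 0, 0, 1, 1, 1]   # hypothetical clustering without dropout
    run_b = [0, 0, 1, 2, 2, 3]   # hypothetical clustering after dropout
    print(cluster_purity(types, run_b))
    print(pair_stability(run_a, run_b))

    rng = random.Random(0)
    sparse = apply_dropout([[5, 3, 8], [2, 7, 1]], 0.5, rng)
    print(sparse)
```

In the toy labels, every cluster of the second run contains a single cell type, so purity is 1.0, yet only 2 of the 6 pairs that were together in the first run remain together, so pair stability is 1/3. This mirrors the pattern the description reports: homogeneity can stay high while stability drops.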
format Article
id doaj-art-b45fb1504fa74534872cf355dfec6b0f
institution Kabale University
issn 2001-0370
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj-art-b45fb1504fa74534872cf355dfec6b0f
2025-08-20T03:44:27Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2025-01-01
27
1278
1285
10.1016/j.csbj.2025.03.033
The impact of dropouts in scRNAseq dense neighborhood analysis
Alisa Pavel (Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark)
Manja Gersholm Grønberg (Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark)
Line H. Clemmensen (Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark; Department of Mathematical Sciences, University of Copenhagen, 2100, Copenhagen, Denmark; Corresponding author at: Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800, Kongens Lyngby, Denmark)
http://www.sciencedirect.com/science/article/pii/S2001037025001023
scRNAseq
Dropouts
Clustering
Sparsity
title The impact of dropouts in scRNAseq dense neighborhood analysis
topic scRNAseq
Dropouts
Clustering
Sparsity
url http://www.sciencedirect.com/science/article/pii/S2001037025001023