scClassify: sample size estimation and multiscale classification of cells using single and multiple reference

Bibliographic Details
Main Authors: Yingxin Lin, Yue Cao, Hani Jieun Kim, Agus Salim, Terence P Speed, David M Lin, Pengyi Yang, Jean Yee Hwa Yang
Format: Article
Language: English
Published: Springer Nature 2020-06-01
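Series: Molecular Systems Biology
Subjects: cell type hierarchy; cell type identification; multiscale classification; sample size estimation; single‐cell
Online Access: https://doi.org/10.15252/msb.20199389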
collection DOAJ
description Abstract Automated cell type identification is a key computational challenge in single‐cell RNA‐sequencing (scRNA‐seq) data. To capitalise on the large collection of well‐annotated scRNA‐seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single‐cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state‐of‐the‐art methodology in automated cell type identification from scRNA‐seq data.
id doaj-art-71435eb9e5d94b0db1e8fa3e180304ab
institution Kabale University
issn 1744-4292
Volume 16, Issue 6
Author affiliations:
Yingxin Lin: School of Mathematics and Statistics, University of Sydney
Yue Cao: School of Mathematics and Statistics, University of Sydney
Hani Jieun Kim: School of Mathematics and Statistics, University of Sydney
Agus Salim: Department of Mathematics and Statistics, La Trobe University
Terence P Speed: Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research
David M Lin: Department of Biomedical Sciences, Cornell University
Pengyi Yang: School of Mathematics and Statistics, University of Sydney
Jean Yee Hwa Yang: School of Mathematics and Statistics, University of Sydney
topic cell type hierarchy
cell type identification
multiscale classification
sample size estimation
single‐cell