scClassify: sample size estimation and multiscale classification of cells using single and multiple reference
| Main Authors: | Yingxin Lin, Yue Cao, Hani Jieun Kim, Agus Salim, Terence P Speed, David M Lin, Pengyi Yang, Jean Yee Hwa Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer Nature, 2020-06-01 |
| Series: | Molecular Systems Biology |
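| Subjects: | cell type hierarchy; cell type identification; multiscale classification; sample size estimation; single‐cell |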
| Online Access: | https://doi.org/10.15252/msb.20199389 |
| author | Yingxin Lin, Yue Cao, Hani Jieun Kim, Agus Salim, Terence P Speed, David M Lin, Pengyi Yang, Jean Yee Hwa Yang |
|---|---|
| collection | DOAJ |
| description | Abstract Automated cell type identification is a key computational challenge in single‐cell RNA‐sequencing (scRNA‐seq) data. To capitalise on the large collection of well‐annotated scRNA‐seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single‐cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state‐of‐the‐art methodology in automated cell type identification from scRNA‐seq data. |
| id | doaj-art-71435eb9e5d94b0db1e8fa3e180304ab |
| institution | Kabale University |
| issn | 1744-4292 |
| volume | 16 |
| issue | 6 |
| affiliations | Yingxin Lin, Yue Cao, Hani Jieun Kim, Pengyi Yang, Jean Yee Hwa Yang: School of Mathematics and Statistics, University of Sydney; Agus Salim: Department of Mathematics and Statistics, La Trobe University; Terence P Speed: Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research; David M Lin: Department of Biomedical Sciences, Cornell University |
| title | scClassify: sample size estimation and multiscale classification of cells using single and multiple reference |
| topic | cell type hierarchy cell type identification multiscale classification sample size estimation single‐cell |
| url | https://doi.org/10.15252/msb.20199389 |
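
The abstract states that scClassify estimates the sample size needed for accurate classification, but this record does not describe the procedure. A common generic approach is to measure classification accuracy at several subsampled reference sizes and then extrapolate with a fitted learning curve. The sketch below assumes an inverse power law, acc(n) = a - b * n^(-c), where a is the accuracy plateau. It fits a, b and c using only the Python standard library and solves for the smallest n that reaches a target accuracy. This is an illustration under that assumption, not scClassify's implementation. The functions `fit_inverse_power_law` and `required_sample_size` are hypothetical names.

```python
import math
from typing import Optional, Sequence, Tuple


def fit_inverse_power_law(
    ns: Sequence[float],
    accs: Sequence[float],
    c_grid: Optional[Sequence[float]] = None,
) -> Tuple[float, float, float]:
    """Fit the assumed learning curve acc(n) = a - b * n**(-c).

    For a fixed decay rate c the model is linear in x = n**(-c), so a and b
    come from ordinary least squares in closed form; c is then chosen from a
    grid by minimising the residual sum of squares.
    """
    if len(ns) != len(accs) or len(ns) < 3:
        raise ValueError("need at least three (n, accuracy) pairs of equal length")
    if c_grid is None:
        c_grid = [i / 100 for i in range(5, 301)]  # candidate c in [0.05, 3.00]
    my = sum(accs) / len(accs)
    best = None  # (rss, a, b, c)
    for c in c_grid:
        xs = [n ** (-c) for n in ns]
        mx = sum(xs) / len(xs)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            continue  # degenerate grid point: all x identical
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, accs))
        b = -sxy / sxx       # the fitted slope on x is -b
        a = my + b * mx      # intercept = mean(y) - slope * mean(x)
        rss = sum((y - (a - b * x)) ** 2 for x, y in zip(xs, accs))
        if best is None or rss < best[0]:
            best = (rss, a, b, c)
    if best is None:
        raise ValueError("no usable decay rate in c_grid")
    _, a, b, c = best
    return a, b, c


def required_sample_size(a: float, b: float, c: float, target: float) -> int:
    """Smallest n with a - b * n**(-c) >= target under the fitted curve."""
    if c <= 0 or b <= 0:
        raise ValueError("fitted curve does not increase with n")
    if target >= a:
        raise ValueError("target is at or above the fitted plateau a")
    # a - b * n**(-c) >= target  <=>  n >= (b / (a - target)) ** (1 / c)
    n = (b / (a - target)) ** (1 / c)
    # Round away floating-point noise before taking the ceiling.
    return max(1, math.ceil(round(n, 6)))


if __name__ == "__main__":
    # Synthetic learning curve (not data from the paper): a=0.95, b=0.8, c=0.5.
    sizes = [25, 50, 100, 200, 400, 800]
    accuracy = [0.95 - 0.8 * n ** (-0.5) for n in sizes]
    a, b, c = fit_inverse_power_law(sizes, accuracy)
    print(required_sample_size(a, b, c, target=0.9))  # 256
```

In practice the accuracy values would come from repeatedly subsampling a reference dataset and scoring held-out cells, possibly at each level of a cell type hierarchy. Profiling over a grid of c keeps the fit dependency-free. A nonlinear least-squares routine could replace it if finer resolution on c is needed.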