Cell-type annotation with accurate unseen cell-type identification using multiple references.

Bibliographic Details
Main Authors: Yi-Xuan Xiong, Meng-Guo Wang, Luonan Chen, Xiao-Fei Zhang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2023-06-01
Series: PLoS Computational Biology
Online Access: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1011261&type=printable
author Yi-Xuan Xiong
Meng-Guo Wang
Luonan Chen
Xiao-Fei Zhang
collection DOAJ
description Recent advances in single-cell RNA sequencing (scRNA-seq) techniques have stimulated efforts to identify and characterize the cellular composition of complex tissues. With the advent of various sequencing techniques, automated cell-type annotation using a well-annotated scRNA-seq reference has become popular. However, this approach relies on the diversity of cell types in the reference, which may not capture all the cell types present in the query data of interest. Query data generally contain unseen cell types because most data atlases are generated for different purposes and with different techniques. Identifying previously unseen cell types is essential for improving annotation accuracy and making novel biological discoveries. To address this challenge, we propose mtANN (multiple-reference-based scRNA-seq data annotation), a new method that automatically annotates query data while accurately identifying unseen cell types with the aid of multiple references. Key innovations of mtANN include the integration of deep learning and ensemble learning to improve prediction accuracy, and the introduction of a new metric that considers three complementary aspects to distinguish unseen cell types from shared cell types. Additionally, we provide a data-driven method to adaptively select a threshold for identifying previously unseen cell types. We demonstrate the advantages of mtANN over state-of-the-art methods for unseen cell-type identification and cell-type annotation on two benchmark dataset collections, as well as its predictive power on a collection of COVID-19 datasets. The source code and tutorial are available at https://github.com/Zhangxf-ccnu/mtANN.
format Article
id doaj-art-eca29efb6bd246be810c85f9d22ef756
institution Kabale University
issn 1553-734X
1553-7358
language English
publishDate 2023-06-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj-art-eca29efb6bd246be810c85f9d22ef756 | 2025-08-20T03:57:59Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X | 1553-7358 | 2023-06-01 | 19 | 6 | e1011261 | 10.1371/journal.pcbi.1011261 | Cell-type annotation with accurate unseen cell-type identification using multiple references. | Yi-Xuan Xiong | Meng-Guo Wang | Luonan Chen | Xiao-Fei Zhang
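
Illustrative sketch (not part of the record): the abstract describes combining predictions from multiple references, scoring each query cell for being an unseen cell type, and choosing the cutoff from the data. The Python below is a minimal sketch of that general idea only, not the mtANN implementation. Everything in it is an assumption made for illustration: the function names `ensemble_annotate` and `otsu_threshold`, the common label set shared by all references, the three score components (entropy, disagreement between references, low confidence), and the use of an Otsu-style split as the data-driven threshold. The abstract does not say which three aspects or which thresholding rule mtANN actually uses; see the authors' repository for the real method.

```python
import numpy as np

def ensemble_annotate(probs):
    """Average per-reference predictions and score each cell's uncertainty.

    probs: array of shape (n_refs, n_cells, n_types); each row is a
    probability vector over a label set assumed common to all references.
    """
    probs = np.asarray(probs, dtype=float)
    mean_p = probs.mean(axis=0)                 # ensemble average per cell
    labels = mean_p.argmax(axis=1)              # ensemble label per cell
    n_types = probs.shape[2]

    # Assumed aspect 1: normalized entropy of the averaged prediction.
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1) / np.log(n_types)
    # Assumed aspect 2: fraction of references whose top label differs
    # from the ensemble label.
    votes = probs.argmax(axis=2)                # shape (n_refs, n_cells)
    disagreement = (votes != labels).mean(axis=0)
    # Assumed aspect 3: one minus the top averaged probability.
    low_confidence = 1.0 - mean_p.max(axis=1)

    score = (entropy + disagreement + low_confidence) / 3.0
    return labels, score

def otsu_threshold(scores):
    """Pick the cut that maximizes between-group variance (Otsu-style)."""
    s = np.sort(np.asarray(scores, dtype=float))
    best_t, best_var = s[0], -1.0
    for i in range(1, len(s)):
        low, high = s[:i], s[i:]
        w_low, w_high = len(low) / len(s), len(high) / len(s)
        var = w_low * w_high * (low.mean() - high.mean()) ** 2
        if var > best_var:
            best_var, best_t = var, (s[i - 1] + s[i]) / 2.0
    return best_t

# Toy input: 2 references, 4 query cells, 3 cell types (made-up numbers).
probs = np.array([
    [[0.9, 0.05, 0.05], [0.34, 0.33, 0.33], [0.05, 0.9, 0.05], [0.4, 0.3, 0.3]],
    [[0.8, 0.1, 0.1],   [0.3, 0.4, 0.3],    [0.1, 0.8, 0.1],   [0.2, 0.3, 0.5]],
])
labels, score = ensemble_annotate(probs)
threshold = otsu_threshold(score)
unseen = score > threshold                      # True marks a candidate unseen cell
print(labels, np.round(score, 3), round(float(threshold), 3), unseen)
```

In this toy input, the two cells on which the references agree confidently receive low scores, while the two cells with flat or conflicting predictions score high and fall above the data-derived cutoff. An equal-weight average of the three components is only one possible design; a real method might weight or calibrate them differently.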