G3DC: A Gene-Graph-Guided Selective Deep Clustering Method for Single Cell RNA-seq Data

Bibliographic Details
Main Authors: Shuqing He, Jicong Fan, Tianwei Yu
Format: Article
Language: English
Published: Tsinghua University Press 2024-09-01
Series: Big Data Mining and Analytics
Subjects: gene graphs, feature selection, deep learning
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020011
collection DOAJ
description Single-cell RNA sequencing (scRNA-seq) technology measures the expression of thousands of genes at the cellular level. Analyzing the single-cell transcriptome allows the identification of heterogeneous cell groups, cellular-level regulation, and the trajectory of cell development. An important aspect of scRNA-seq data analysis is the clustering of cells, which is hampered by issues such as high dimensionality, cell type imbalance, redundancy, and dropout. Given that cells of each type are functionally consistent, incorporating biological relations among genes may improve the clustering results. In light of this, we have developed a deep-embedded clustering method, G3DC. This method combines a graph regularization based on a pre-existing gene network and a feature selector based on ℓ2,1-norm regularization, along with a reconstruction loss, to generate a discriminative and informative embedding. Utilizing the gene interaction network bolsters clustering performance and aids in selecting functionally coherent genes, consequently enriching the clustering results. Extensive experiments have shown that G3DC offers high clustering accuracy, measured by agreement with true cell types, outperforming other leading single-cell clustering methods. In addition, G3DC selects biologically relevant genes that contribute to the clustering, providing insight into the biological functionality that differentiates cell groups.
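The description names three loss components: a reconstruction loss, a graph regularization built from a gene network, and an ℓ2,1-norm feature selector. The sketch below shows one common way such terms are written and combined. It is not the authors' implementation: the Laplacian form of the graph penalty, the choice to apply both penalties to a gene-by-hidden-unit weight matrix `W`, the mean-squared-error reconstruction term, and the names `l21_norm`, `graph_penalty`, `total_loss`, `lam_graph`, and `lam_select` are all assumptions made for illustration.

```python
import numpy as np


def l21_norm(W):
    """l2,1 norm: sum of the Euclidean norms of the rows of W.

    With one row per gene, a zero row means the gene is dropped,
    which is why this norm is used for feature selection.
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))


def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A


def graph_penalty(W, L):
    """tr(W^T L W): for symmetric A this equals the sum over edges (i, j)
    of A_ij * ||W_i - W_j||^2, so connected genes get similar weights."""
    return float(np.trace(W.T @ L @ W))


def total_loss(X, X_hat, W, A, lam_graph=1.0, lam_select=1.0):
    """Assumed combination: MSE reconstruction + weighted penalties."""
    recon = float(np.mean((X - X_hat) ** 2))
    graph = graph_penalty(W, graph_laplacian(A))
    return recon + lam_graph * graph + lam_select * l21_norm(W)


# Toy example: 3 genes on a path graph 0 - 1 - 2, 2 hidden units.
W = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
X = np.zeros((5, 3))      # 5 cells x 3 genes (placeholder data)
X_hat = np.zeros((5, 3))  # perfect reconstruction, so the MSE term is 0

loss = total_loss(X, X_hat, W, A, lam_graph=0.1, lam_select=0.5)
```

In this toy case the second gene has an all-zero weight row, so it contributes nothing to the ℓ2,1 term; in a trained model, rows driven to zero in this way would mark genes excluded from the embedding.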
id doaj-art-ec860bb65c1541548b9e2b667e537c06
institution Kabale University
issn 2096-0654
spelling doaj-art-ec860bb65c1541548b9e2b667e537c06 (record dated 2025-02-03T11:53:25Z; language: eng)
Citation: Big Data Mining and Analytics (Tsinghua University Press, ISSN 2096-0654), 2024-09-01, vol. 7, no. 3, pp. 809-827, doi: 10.26599/BDMA.2024.9020011
Author affiliations:
Shuqing He: Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
Jicong Fan: School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518172, China, and also with Shenzhen Research Institute of Big Data, Shenzhen 518172, China
Tianwei Yu: School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518172, China, and also with Warshel Institute for Computational Biology, Shenzhen 518172, China
Keywords: gene graphs; feature selection; deep learning