A Comparative Study of Machine Learning Techniques for Cell Annotation of scRNA-Seq Data

Bibliographic Details
Main Authors: Shahid Ahmad Wani, SMK Quadri, Mohammad Shuaib Mir, Yonis Gulzar
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Algorithms
Online Access: https://www.mdpi.com/1999-4893/18/4/232
Description
Summary: Accurate cell type annotation is a critical step in single-cell RNA sequencing (scRNA-seq) analysis, enabling deeper insights into cellular heterogeneity and biological processes. In this study, we conducted a comprehensive comparative evaluation of various machine learning techniques, including support vector machine (SVM), decision tree, random forest, logistic regression, gradient boosting, k-nearest neighbour, transformer, and naive Bayes, to determine their effectiveness for single-cell annotation. These methods were evaluated using four diverse datasets comprising hundreds of cell types across several tissues. Our results revealed that SVM consistently outperformed other techniques, emerging as the top performer in three out of the four datasets, followed closely by logistic regression. Most methods demonstrated robust capabilities in annotating major cell types and identifying rare cell populations, though naive Bayes was the least effective due to its inherent limitations in handling high-dimensional and interdependent data. This study provides valuable insights into the relative strengths and weaknesses of machine learning methods for single-cell annotation, offering guidance for selecting appropriate techniques in scRNA-seq analyses.
ISSN: 1999-4893