A Comparative Study of Machine Learning Techniques for Cell Annotation of scRNA-Seq Data

Bibliographic Details
Main Authors: Shahid Ahmad Wani, SMK Quadri, Mohammad Shuaib Mir, Yonis Gulzar
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Algorithms
Online Access: https://www.mdpi.com/1999-4893/18/4/232
collection DOAJ
description Accurate cell type annotation is a critical step in single-cell RNA sequencing (scRNA-seq) analysis, enabling deeper insights into cellular heterogeneity and biological processes. In this study, we conducted a comprehensive comparative evaluation of various machine learning techniques, including support vector machine (SVM), decision tree, random forest, logistic regression, gradient boosting, k-nearest neighbour, transformer, and naive Bayes, to determine their effectiveness for single-cell annotation. These methods were evaluated using four diverse datasets comprising hundreds of cell types across several tissues. Our results revealed that SVM consistently outperformed other techniques, emerging as the top performer in three out of the four datasets, followed closely by logistic regression. Most methods demonstrated robust capabilities in annotating major cell types and identifying rare cell populations, though naive Bayes was the least effective due to its inherent limitations in handling high-dimensional and interdependent data. This study provides valuable insights into the relative strengths and weaknesses of machine learning methods for single-cell annotation, offering guidance for selecting appropriate techniques in scRNA-seq analyses.
id doaj-art-5738ccb90d1b4bcfa4f313af03bf5859
institution OA Journals
issn 1999-4893
DOI: 10.3390/a18040232
Citation: Algorithms, vol. 18, no. 4, art. 232 (2025)
Affiliations: Shahid Ahmad Wani and SMK Quadri, Department of Computer Science, Jamia Millia Islamia, New Delhi 110025, India; Mohammad Shuaib Mir and Yonis Gulzar, Department of Management Information Systems, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia
topic cell annotation
single-cell
clustering
machine learning
scRNA-seq
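The study summarised in this record compares supervised classifiers for annotating cells from labelled scRNA-seq expression data. The sketch below shows one way to set up that kind of comparison. It is not the authors' pipeline and should not be read as one. It assumes scikit-learn and NumPy are available. In place of the paper's four datasets it uses simulated Poisson counts, giving each cell type a block of up-regulated marker genes. It then applies a simple log1p normalisation and scores each model by mean macro-F1 under stratified 5-fold cross-validation. The helper names `simulate_counts` and `compare_classifiers` are hypothetical, as is every hyperparameter shown. Only five of the eight methods named in the abstract are included.

```python
"""Hypothetical sketch: compare classifiers for cell-type annotation.

Assumptions (not from the paper): simulated Poisson counts with per-type
marker genes, log1p normalisation, illustrative hyperparameters, and
stratified 5-fold cross-validation scored by macro-F1.
"""
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def simulate_counts(n_types=4, cells_per_type=60, n_genes=300,
                    markers_per_type=15, seed=0):
    """Return (X, y): a cells x genes count matrix and integer cell-type labels."""
    rng = np.random.default_rng(seed)
    base = rng.gamma(shape=2.0, scale=0.5, size=n_genes)  # baseline mean per gene (mean 1)
    X, y = [], []
    for t in range(n_types):
        means = base.copy()
        markers = slice(t * markers_per_type, (t + 1) * markers_per_type)
        means[markers] *= 6.0  # this type's marker genes are up-regulated sixfold
        X.append(rng.poisson(means, size=(cells_per_type, n_genes)))
        y.extend([t] * cells_per_type)
    return np.vstack(X), np.array(y)


def compare_classifiers(X, y, seed=0):
    """Mean macro-F1 per classifier under stratified 5-fold cross-validation."""
    Xn = np.log1p(X)  # simple log normalisation of raw counts
    models = {
        "SVM (linear)": make_pipeline(StandardScaler(), LinearSVC(max_iter=10000)),
        "logistic regression": make_pipeline(StandardScaler(),
                                             LogisticRegression(max_iter=2000)),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
        "naive Bayes": GaussianNB(),  # assumes genes are independent given the type
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return {name: cross_val_score(model, Xn, y, cv=cv, scoring="f1_macro").mean()
            for name, model in models.items()}


if __name__ == "__main__":
    X, y = simulate_counts()
    scores = compare_classifiers(X, y)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} macro-F1 = {score:.3f}")
```

On simulated data this easy, most models score near the ceiling, so the printed ranking says nothing about the paper's findings. A realistic comparison would need real annotated datasets and attention to rare cell types. Macro-F1 is used here because it averages per-type F1 without weighting by population size, so a small cell population counts as much as a large one.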