A Comparative Study of Machine Learning Techniques for Cell Annotation of scRNA-Seq Data
Accurate cell type annotation is a critical step in single-cell RNA sequencing (scRNA-seq) analysis, enabling deeper insights into cellular heterogeneity and biological processes. In this study, we conducted a comprehensive comparative evaluation of various machine learning techniques, including sup...
| Main Authors: | Shahid Ahmad Wani, SMK Quadri, Mohammad Shuaib Mir, Yonis Gulzar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Algorithms |
| Subjects: | cell annotation; single-cell; clustering; machine learning; scRNA-seq |
| Online Access: | https://www.mdpi.com/1999-4893/18/4/232 |
| _version_ | 1850155946424139776 |
|---|---|
| author | Shahid Ahmad Wani; SMK Quadri; Mohammad Shuaib Mir; Yonis Gulzar |
| author_facet | Shahid Ahmad Wani; SMK Quadri; Mohammad Shuaib Mir; Yonis Gulzar |
| author_sort | Shahid Ahmad Wani |
| collection | DOAJ |
| description | Accurate cell type annotation is a critical step in single-cell RNA sequencing (scRNA-seq) analysis, enabling deeper insights into cellular heterogeneity and biological processes. In this study, we conducted a comprehensive comparative evaluation of various machine learning techniques, including support vector machine (SVM), decision tree, random forest, logistic regression, gradient boosting, k-nearest neighbour, transformer, and naive Bayes, to determine their effectiveness for single-cell annotation. These methods were evaluated using four diverse datasets comprising hundreds of cell types across several tissues. Our results revealed that SVM consistently outperformed other techniques, emerging as the top performer in three out of the four datasets, followed closely by logistic regression. Most methods demonstrated robust capabilities in annotating major cell types and identifying rare cell populations, though naive Bayes was the least effective due to its inherent limitations in handling high-dimensional and interdependent data. This study provides valuable insights into the relative strengths and weaknesses of machine learning methods for single-cell annotation, offering guidance for selecting appropriate techniques in scRNA-seq analyses. |
| format | Article |
| id | doaj-art-5738ccb90d1b4bcfa4f313af03bf5859 |
| institution | OA Journals |
| issn | 1999-4893 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Algorithms |
| spelling | doaj-art-5738ccb90d1b4bcfa4f313af03bf5859; 2025-08-20T02:24:43Z; eng; MDPI AG; Algorithms; 1999-4893; 2025-04-01; 18; 4; 232; 10.3390/a18040232; A Comparative Study of Machine Learning Techniques for Cell Annotation of scRNA-Seq Data; Shahid Ahmad Wani (0); SMK Quadri (1); Mohammad Shuaib Mir (2); Yonis Gulzar (3); (0) Department of Computer Science, Jamia Millia Islamia, New Delhi 110025, India; (1) Department of Computer Science, Jamia Millia Islamia, New Delhi 110025, India; (2) Department of Management Information Systems, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia; (3) Department of Management Information Systems, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia; Accurate cell type annotation is a critical step in single-cell RNA sequencing (scRNA-seq) analysis, enabling deeper insights into cellular heterogeneity and biological processes. In this study, we conducted a comprehensive comparative evaluation of various machine learning techniques, including support vector machine (SVM), decision tree, random forest, logistic regression, gradient boosting, k-nearest neighbour, transformer, and naive Bayes, to determine their effectiveness for single-cell annotation. These methods were evaluated using four diverse datasets comprising hundreds of cell types across several tissues. Our results revealed that SVM consistently outperformed other techniques, emerging as the top performer in three out of the four datasets, followed closely by logistic regression. Most methods demonstrated robust capabilities in annotating major cell types and identifying rare cell populations, though naive Bayes was the least effective due to its inherent limitations in handling high-dimensional and interdependent data. This study provides valuable insights into the relative strengths and weaknesses of machine learning methods for single-cell annotation, offering guidance for selecting appropriate techniques in scRNA-seq analyses.; https://www.mdpi.com/1999-4893/18/4/232; cell annotation; single-cell; clustering; machine learning; scRNA-seq |
| spellingShingle | Shahid Ahmad Wani; SMK Quadri; Mohammad Shuaib Mir; Yonis Gulzar; A Comparative Study of Machine Learning Techniques for Cell Annotation of scRNA-Seq Data; Algorithms; cell annotation; single-cell; clustering; machine learning; scRNA-seq |
| title | A Comparative Study of Machine Learning Techniques for Cell Annotation of scRNA-Seq Data |
| title_sort | comparative study of machine learning techniques for cell annotation of scrna seq data |
| topic | cell annotation; single-cell; clustering; machine learning; scRNA-seq |
| url | https://www.mdpi.com/1999-4893/18/4/232 |