Improving doublet cell removal efficiency through multiple algorithm runs
Doublets are a key confounding factor in the analysis of scRNA-seq data, as they can interfere with differential expression analysis and disrupt developmental trajectories. However, due to the randomness of the algorithms, most doublet removal methods still leave a certain proportion of doublets aft...
Main Authors: | Yong She, Chaoye Wang, Qi Zhao |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-01-01 |
Series: | Computational and Structural Biotechnology Journal |
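Subjects: | Single-cell RNA sequencing; Doublet removal; Multi-round doublet removal strategy; Synthetic dataset |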
Online Access: | http://www.sciencedirect.com/science/article/pii/S200103702500011X |
_version_ | 1832589979186888704 |
---|---|
author | Yong She; Chaoye Wang; Qi Zhao |
author_facet | Yong She; Chaoye Wang; Qi Zhao |
author_sort | Yong She |
collection | DOAJ |
description | Doublets are a key confounding factor in the analysis of scRNA-seq data, as they can interfere with differential expression analysis and disrupt developmental trajectories. However, due to the randomness of the algorithms, most doublet removal methods still leave a certain proportion of doublets after application. In this study, we proposed a multi-round doublet removal (MRDR) strategy that ran the algorithm for multiple cycles to reduce randomness while enhancing the effectiveness of doublet removal. We evaluated the MRDR strategy on 14 real-world datasets, 29 barcoded scRNA-seq datasets, and 106 synthetic datasets with four popular doublet detection tools: DoubletFinder, cxds, bcds, and hybrid. In the real-world datasets, DoubletFinder performed better under the MRDR strategy than with a single round of doublet removal: the recall rate improved by 50% for two rounds compared to one round, and the ROC of the other three algorithms improved by about 0.04. In the barcoded scRNA-seq datasets, using cxds for two rounds of doublet removal yielded the best results. In the simulated datasets, the multi-round strategy was more effective at removing doublets than a single round, with cxds showing the best results when applied twice, and the ROC of all four methods improved by at least 0.05 with two rounds compared to a single round. Finally, compared to running the algorithm once, the MRDR strategy was more beneficial for differential gene expression analysis and cell trajectory inference when using default analysis parameters. Overall, we found that the MRDR strategy was more effective in removing doublets and advantageous for downstream analyses. The strategy could be incorporated into the standard analysis pipeline for scRNA-seq experiments, and we recommend using cxds to remove doublets through two rounds of algorithm iteration. |
format | Article |
id | doaj-art-64d6af126e2e42c3b81ae78cd9f6e9cc |
institution | Kabale University |
issn | 2001-0370 |
language | English |
publishDate | 2025-01-01 |
publisher | Elsevier |
record_format | Article |
series | Computational and Structural Biotechnology Journal |
spelling | doaj-art-64d6af126e2e42c3b81ae78cd9f6e9cc; 2025-01-24T04:44:53Z; eng; Elsevier; Computational and Structural Biotechnology Journal; 2001-0370; 2025-01-01; 27; 451; 460; Improving doublet cell removal efficiency through multiple algorithm runs; Yong She [0]; Chaoye Wang [1]; Qi Zhao [2]; [0] State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China; [1] State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China; [2] Corresponding author.; State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China; http://www.sciencedirect.com/science/article/pii/S200103702500011X; Single-cell RNA sequencing; Doublet removal; Multi-round doublet removal strategy; Synthetic dataset |
title | Improving doublet cell removal efficiency through multiple algorithm runs |
topic | Single-cell RNA sequencing; Doublet removal; Multi-round doublet removal strategy; Synthetic dataset |
url | http://www.sciencedirect.com/science/article/pii/S200103702500011X |
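
The multi-round doublet removal (MRDR) strategy summarized in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say how each round selects cells for removal, so the sketch assumes that every round rescores the remaining cells and drops a fixed fraction (`expected_rate`) of the highest-scoring ones. The names `remove_doublets_once`, `multi_round_removal`, and `toy_total_count_score` are hypothetical, and the toy scorer is only a stand-in for a real detector such as DoubletFinder or cxds.

```python
from typing import Callable, List, Sequence, Tuple

Cell = Sequence[float]                        # toy expression profile of one cell
Scorer = Callable[[List[Cell]], List[float]]  # higher score = more doublet-like


def remove_doublets_once(
    cells: List[Cell], ids: List[str], score_fn: Scorer, expected_rate: float
) -> Tuple[List[Cell], List[str]]:
    """One round: score all cells, then drop the top `expected_rate` fraction.

    Assumption: removal by a fixed fraction of top-ranked scores; a real tool
    may use its own thresholding rule instead.
    """
    scores = score_fn(cells)
    n_remove = int(round(len(cells) * expected_rate))
    ranked = sorted(range(len(cells)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_remove])
    kept_cells = [c for i, c in enumerate(cells) if i not in flagged]
    kept_ids = [x for i, x in enumerate(ids) if i not in flagged]
    return kept_cells, kept_ids


def multi_round_removal(
    cells: List[Cell],
    ids: List[str],
    score_fn: Scorer,
    expected_rate: float = 0.1,
    rounds: int = 2,
) -> Tuple[List[Cell], List[str]]:
    """Repeat single-round removal, rescoring the surviving cells each round."""
    for _ in range(rounds):
        cells, ids = remove_doublets_once(cells, ids, score_fn, expected_rate)
    return cells, ids


def toy_total_count_score(cells: List[Cell]) -> List[float]:
    """Hypothetical stand-in scorer: total counts per cell (not a real detector)."""
    return [float(sum(c)) for c in cells]


if __name__ == "__main__":
    cells = [[float(i)] for i in range(1, 11)]
    ids = [f"c{i}" for i in range(1, 11)]
    _, kept = multi_round_removal(
        cells, ids, toy_total_count_score, expected_rate=0.2, rounds=2
    )
    print(kept)
```

Passing the scorer as a function keeps the loop independent of any particular detection tool; setting `rounds=1` reproduces single-round removal, which makes the two settings easy to compare on the same input.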