Improving doublet cell removal efficiency through multiple algorithm runs

Doublets are a key confounding factor in the analysis of scRNA-seq data, as they can interfere with differential expression analysis and disrupt developmental trajectories. However, due to the randomness of the algorithms, most doublet removal methods still leave a certain proportion of doublets aft...

Bibliographic Details
Main Authors: Yong She, Chaoye Wang, Qi Zhao
Format: Article
Language: English
Published: Elsevier 2025-01-01
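Series: Computational and Structural Biotechnology Journal
Subjects: Single-cell RNA sequencing; Doublet removal; Multi-round doublet removal strategy; Synthetic dataset
Online Access: http://www.sciencedirect.com/science/article/pii/S200103702500011X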
author Yong She
Chaoye Wang
Qi Zhao
collection DOAJ
description Doublets are a key confounding factor in the analysis of scRNA-seq data, as they can interfere with differential expression analysis and disrupt developmental trajectories. However, due to the randomness of the algorithms, most doublet removal methods still leave a certain proportion of doublets after application. In this study, we proposed a multi-round doublet removal (MRDR) strategy that ran the algorithm over multiple cycles to reduce randomness while enhancing the effectiveness of doublet removal. We evaluated the MRDR strategy on 14 real-world datasets, 29 barcoded scRNA-seq datasets, and 106 synthetic datasets with four popular doublet detection tools: DoubletFinder, cxds, bcds, and hybrid. In the real-world datasets, we found that DoubletFinder performed better under the MRDR strategy than with a single removal of doublets: its recall rate improved by 50% for two rounds of doublet removal compared to one round, and the other three algorithms improved their ROC by about 0.04. In the barcoded scRNA-seq datasets, we found that using cxds for two rounds of doublet removal yielded the best results. Subsequently, in the simulated datasets, we proved that the multi-round removal strategy was more effective in removing doublets than a single removal, with cxds showing the best results when applied twice; the ROC of all four methods improved by at least 0.05 with two rounds of removal compared to single removal. Finally, compared to running the algorithm once, we found that the MRDR strategy was more beneficial for differential gene expression analysis and cell trajectory inference when using default analysis parameters. Overall, we proved that the MRDR strategy was more effective in removing doublets and advantageous for downstream analyses. The strategy could be incorporated into the standard analysis pipeline for scRNA-seq experiments, and we recommend using cxds to remove doublets through two rounds of algorithm iteration.
format Article
id doaj-art-64d6af126e2e42c3b81ae78cd9f6e9cc
institution Kabale University
issn 2001-0370
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj-art-64d6af126e2e42c3b81ae78cd9f6e9cc
2025-01-24T04:44:53Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2025-01-01
27
451
460
Improving doublet cell removal efficiency through multiple algorithm runs
Yong She (0): State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
Chaoye Wang (1): State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
Qi Zhao (2): Corresponding author.; State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
http://www.sciencedirect.com/science/article/pii/S200103702500011X
Single-cell RNA sequencing
Doublet removal
Multi-round doublet removal strategy
Synthetic dataset
title Improving doublet cell removal efficiency through multiple algorithm runs
topic Single-cell RNA sequencing
Doublet removal
Multi-round doublet removal strategy
Synthetic dataset
url http://www.sciencedirect.com/science/article/pii/S200103702500011X
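
The multi-round doublet removal (MRDR) idea summarized in the description can be illustrated with a minimal Python sketch: score the cells, drop the top-scoring fraction, then re-score the remaining cells and drop again. Everything named here is an assumption made for illustration and does not come from the article: the function multi_round_removal, the per-round expected_rate cutoff, the toy library_size_score scorer, and the example data. The record does not state how the authors chose per-round thresholds, and the tools they evaluated (DoubletFinder, cxds, bcds, hybrid) are R packages that are not called here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical cell representation: (barcode, total UMI count).
Cell = Tuple[str, float]
ScoreFn = Callable[[Sequence[Cell]], List[float]]


def multi_round_removal(
    cells: Sequence[Cell],
    score_fn: ScoreFn,
    expected_rate: float,
    rounds: int = 2,
) -> Tuple[List[Cell], List[Cell]]:
    """Repeat scoring and removal for up to `rounds` rounds.

    Each round re-scores only the cells that survived the previous round
    and removes the top `expected_rate` fraction (rounded down).
    Returns (kept cells, removed cells).
    """
    kept = list(cells)
    removed: List[Cell] = []
    for _ in range(rounds):
        scores = score_fn(kept)
        n_drop = int(len(kept) * expected_rate)
        if n_drop == 0:
            break  # too few cells left to remove anything at this rate
        # Indices of the cells ranked from highest to lowest score.
        ranked = sorted(range(len(kept)), key=lambda i: scores[i], reverse=True)
        drop = set(ranked[:n_drop])
        removed.extend(kept[i] for i in sorted(drop))
        kept = [cell for i, cell in enumerate(kept) if i not in drop]
    return kept, removed


def library_size_score(cells: Sequence[Cell]) -> List[float]:
    """Toy stand-in scorer (assumption): larger library size looks more doublet-like."""
    return [count for _, count in cells]


# Toy data: seven singlet-like cells and three doublet-like cells.
cells: List[Cell] = [(f"cell{i}", 100.0 + i) for i in range(7)]
cells += [("dbl0", 210.0), ("dbl1", 220.0), ("dbl2", 190.0)]

kept, removed = multi_round_removal(cells, library_size_score, expected_rate=0.2, rounds=2)
print("removed:", [barcode for barcode, _ in removed])
print("kept:", [barcode for barcode, _ in kept])
```

With this toy data a single round at a 20% rate removes only two of the three doublet-like cells, while the second round catches the third. The toy scorer is deterministic; the tools named in the description involve randomness, so a second pass over the reduced dataset can produce different scores, which is the effect the description attributes to MRDR.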