Reference-informed evaluation of batch correction for single-cell omics data with overcorrection awareness

Bibliographic Details
Main Authors: Xiaoyue Hu, He Li, Ming Chen, Junbin Qian, Hangjin Jiang
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Communications Biology
Online Access: https://doi.org/10.1038/s42003-025-07947-7
Collection: DOAJ
Abstract: Batch effect correction (BEC) is fundamental to integrating multiple single-cell RNA sequencing datasets, and its success is critical to empowering in-depth interrogation for biological insights. However, no simple metric is available to evaluate BEC performance with sensitivity to data overcorrection, which erases true biological variation and leads to false biological discoveries. Here, we propose RBET, a reference-informed statistical framework for evaluating the success of BEC. Using extensive simulations and six real data examples, including scRNA-seq and scATAC-seq datasets with different numbers of batches, batch effect sizes and numbers of cell types, we demonstrate that RBET evaluates the performance of BEC methods more fairly and yields biologically meaningful insights from the data, while other methods may lead to false results. Moreover, RBET is computationally efficient, sensitive to overcorrection and robust to large batch effect sizes. Thus, RBET provides a robust guideline for selecting a case-specific BEC method, and the concept of RBET is extendable to other modalities.
ISSN: 2399-3642
Author Affiliations:
Xiaoyue Hu: Center for Data Science, Zhejiang University
He Li: Center for Data Science, Zhejiang University
Ming Chen: College of Life Sciences, Zhejiang University
Junbin Qian: Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women’s Hospital, Zhejiang University School of Medicine
Hangjin Jiang: Center for Data Science, Zhejiang University