scCobra allows contrastive cell embedding learning with domain adaptation for single cell data integration and harmonization

Bibliographic Details
Main Authors: Bowen Zhao, Kailu Song, Dong-Qing Wei, Yi Xiong, Jun Ding
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series: Communications Biology
Online Access: https://doi.org/10.1038/s42003-025-07692-x
Description
Summary: The rapid advancement of single-cell technologies has created an urgent need for effective methods to integrate and harmonize single-cell data. Technical and biological variations across studies complicate data integration, while conventional tools often struggle with reliance on gene expression distribution assumptions and over-correction. Here, we present scCobra, a deep generative neural network designed to overcome these challenges through contrastive learning with domain adaptation. scCobra effectively mitigates batch effects, minimizes over-correction, and ensures biologically meaningful data integration without assuming specific gene expression distributions. It enables online label transfer across datasets with batch effects, allowing continuous integration of new data without retraining. Additionally, scCobra supports batch effect simulation, advanced multi-omic integration, and scalable processing of large datasets. By integrating and harmonizing datasets from similar studies, scCobra expands the available data for investigating specific biological problems, improving cross-study comparability, and revealing insights that may be obscured in isolated datasets.
ISSN: 2399-3642