CYCLONE: recycle contrastive learning for integrating single-cell gene expression data

Bibliographic Details
Main Authors: Han Ji, Xinwei He, Hongwei Li
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-025-06214-0
Description
Summary: Abstract. Background: Integrating single-cell transcriptome sequencing results from several batches while reducing batch effects improves our understanding of cellular identity and function. Results: This paper introduces CYCLONE, a new method for integrating single-cell gene expression data using a recycle contrastive learning network. The contrastive learning network and a variational autoencoder (VAE) jointly train the low-dimensional representations and update the indices of inter-batch mutual nearest neighbor (MNN) pairs, so that positive pairs are generated from a reduced-noise low-dimensional space. CYCLONE cyclically updates the MNN pairs by iteratively training the low-dimensional space, gradually improving the confidence of the positive sample pairs, and augments the MNN pairs with k-nearest neighbor (KNN) pairs to identify batch-specific cell types, thus avoiding overcorrection of the batch effect. CYCLONE was evaluated on simulated and real scRNA-seq datasets, confirming its ability to improve clustering accuracy while eliminating batch effects. In addition, experiments on batch-specific cell type identification validated CYCLONE's ability to retain batch-specific information, and thus batch-specific cell types, while eliminating batch effects. Conclusion: CYCLONE is an effective integration method based on recycle contrastive learning that improves the accuracy of cell clustering while eliminating batch effects and preserving batch-specific information.
ISSN: 1471-2105
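
Illustration: the summary describes generating positive pairs from inter-batch MNN pairs. The following is a minimal sketch of how mutual nearest neighbor pairs between two batches could be found in a low-dimensional space. It is not the authors' implementation: the function names `nearest` and `mnn_pairs`, the Euclidean distance, the parameter `k`, and the toy coordinates are all assumptions made for illustration.

```python
from math import dist


def nearest(query, reference, k):
    """For each point in `query`, return the set of indices of its
    k nearest points in `reference` (Euclidean distance)."""
    result = []
    for q in query:
        order = sorted(range(len(reference)), key=lambda j: dist(q, reference[j]))
        result.append(set(order[:k]))
    return result


def mnn_pairs(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where cell j of batch_b is among the k
    nearest neighbors of cell i of batch_a, and vice versa."""
    a_to_b = nearest(batch_a, batch_b, k)
    b_to_a = nearest(batch_b, batch_a, k)
    return sorted(
        (i, j)
        for i, neighbors in enumerate(a_to_b)
        for j in neighbors
        if i in b_to_a[j]
    )


# Hypothetical 2-D embeddings of cells from two batches.
batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.1, 0.0), (10.0, 10.1), (4.0, 4.0)]

pairs = mnn_pairs(batch_a, batch_b, k=1)
print(pairs)  # [(0, 0), (1, 1)]
```

In this toy example the third cell of `batch_b` has no mutual partner at `k=1`, so it is left unpaired. In the method described above, such pairs would be recomputed as the learned low-dimensional space is retrained, and KNN pairs would be added alongside them; neither step is shown here.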