Estimating the effect of tissue- and blood-derived cell reference matrices on deconvolving bulk transcriptomic datasets

Bibliographic Details
Main Authors: Siqi Sun, Shweta Yadav, Mulini Pingili, Dan Chang, Jing Wang
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Computational and Structural Biotechnology Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037025003216
Description
Summary: Cell deconvolution is widely used to characterize the composition of mixed cell populations in bulk transcriptomic datasets. Although tissue- and blood-derived cell reference matrices (CRMs) are both commonly used, their impact on deconvolution accuracy has not been systematically evaluated. In this study, we developed tissue- and blood-derived CRMs using single-cell RNA sequencing (scRNA-seq) data from inflammatory bowel disease (IBD). Three publicly available blood-derived CRMs (IRIS, LM22, and ImmunoStates) were included for benchmarking. Deconvolution performance was evaluated on both public bulk transcriptomic datasets and simulated pseudobulk samples, using goodness-of-fit and the correlation of cell fractions as metrics. Two infliximab-treated bulk datasets were used to identify treatment-related cell types. Lung adenocarcinoma (LUAD) single-cell and bulk transcriptomic datasets were also used for deconvolution evaluation. We found that tissue-derived CRMs consistently outperformed blood-derived CRMs in deconvolving bulk tissue transcriptomes, exhibiting higher goodness-of-fit and more accurate cellular proportion estimates, particularly for immune and stromal cells. They also revealed more treatment-related cell types. In contrast, all CRMs performed similarly when applied to bulk blood transcriptomes. These trends were also observed in the LUAD datasets. Our results emphasize the importance of selecting appropriate CRMs for cell deconvolution of bulk tissue transcriptomes, particularly in immunology and oncology. Such considerations can be extended to other disease areas. The R package (DeconvRef) for building user-defined CRMs is available at https://github.com/alohasiqi/DeconvRef
ISSN: 2001-0370
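
Illustrative sketch (not part of the original record): the summary describes scoring a reference matrix by goodness-of-fit and by the correlation of cell fractions on simulated pseudobulk samples. The toy Python example below shows one way such a check could look. Everything in it is an assumption made for illustration: the signature values, the cell-type fractions, the helper names (pearson, mix, deconvolve), and the use of projected gradient descent for non-negative least squares. It is not the DeconvRef API and not necessarily the deconvolution algorithm used in the article.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mix(signature, fractions):
    """Per-gene expression as a fraction-weighted sum over cell types."""
    return [sum(s * f for s, f in zip(row, fractions)) for row in signature]


def deconvolve(signature, bulk, steps=2000):
    """Estimate cell fractions by non-negative least squares.

    Uses projected gradient descent (an illustrative choice): take a
    gradient step on 0.5 * ||S f - b||^2, then clip negatives to zero.
    """
    n_types = len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so its inverse is a safe (if conservative) step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    est = [1.0 / n_types] * n_types
    for _ in range(steps):
        resid = [m - b for m, b in zip(mix(signature, est), bulk)]
        grad = [sum(row[k] * r for row, r in zip(signature, resid))
                for k in range(n_types)]
        est = [max(0.0, e - step * g) for e, g in zip(est, grad)]
    total = sum(est)
    return [e / total for e in est]  # normalise so fractions sum to 1


# Hypothetical toy reference: 6 genes (rows) x 3 cell types (columns).
SIGNATURE = [
    [10, 1, 0],
    [8, 0, 1],
    [1, 9, 0],
    [0, 7, 2],
    [1, 0, 12],
    [0, 2, 6],
]
TRUE_FRACTIONS = [0.5, 0.3, 0.2]  # assumed ground truth for the pseudobulk

# Noise-free pseudobulk sample built from the known fractions.
bulk = mix(SIGNATURE, TRUE_FRACTIONS)

# Reference containing every cell type present in the sample.
est_full = deconvolve(SIGNATURE, bulk)
fit_full = pearson(bulk, mix(SIGNATURE, est_full))
frac_corr = pearson(TRUE_FRACTIONS, est_full)

# Reference missing the third cell type, loosely mimicking a reference
# that lacks a population present in the tissue.
REDUCED = [row[:2] for row in SIGNATURE]
est_reduced = deconvolve(REDUCED, bulk)
fit_reduced = pearson(bulk, mix(REDUCED, est_reduced))

print("full reference:    fractions", est_full, "fit", fit_full)
print("reduced reference: fractions", est_reduced, "fit", fit_reduced)
```

In this toy setting the complete reference recovers the assumed fractions and reconstructs the pseudobulk almost exactly, while the reference missing one cell type yields a visibly lower goodness-of-fit. Real benchmarks would add noise, many samples, and far more genes than this sketch does.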