Penalised regression improves imputation of cell-type specific expression using RNA-seq data from mixed cell populations compared to domain-specific methods.

Gene expression studies often use bulk RNA sequencing of mixed cell populations because single cell or sorted cell sequencing may be prohibitively expensive. However, mixed cell studies may miss expression patterns that are restricted to specific cell populations. Computational deconvolution can be...

Bibliographic Details
Main Authors: Wei-Yu Lin, Melissa Kartawinata, Bethany R Jebson, Restuadi Restuadi, Hannah Peckham, Anna Radziszewska, Claire T Deakin, Coziana Ciurtin, CLUSTER Consortium, Lucy R Wedderburn, Chris Wallace
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-03-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1012859
author Wei-Yu Lin
Melissa Kartawinata
Bethany R Jebson
Restuadi Restuadi
Hannah Peckham
Anna Radziszewska
Claire T Deakin
Coziana Ciurtin
CLUSTER Consortium
Lucy R Wedderburn
Chris Wallace
collection DOAJ
description Gene expression studies often use bulk RNA sequencing of mixed cell populations because single cell or sorted cell sequencing may be prohibitively expensive. However, mixed cell studies may miss expression patterns that are restricted to specific cell populations. Computational deconvolution can be used to estimate cell fractions from bulk expression data and infer average cell-type expression in a set of samples (e.g., cases or controls), but imputing sample-level cell-type expression is required for more detailed analyses, such as relating expression to quantitative traits, and is less commonly addressed. Here, we assessed the accuracy of imputing sample-level cell-type expression using a real dataset where mixed peripheral blood mononuclear cells (PBMC) and sorted (CD4, CD8, CD14, CD19) RNA sequencing data were generated from the same subjects (N=158), and pseudobulk datasets synthesised from eQTLgen single cell RNA-seq data. We compared three domain-specific methods, CIBERSORTx, bMIND and debCAM/swCAM, and two cross-domain machine learning methods, multiple response LASSO and ridge, that had not been used for this task before. We also assessed the methods according to their ability to recover differential gene expression (DGE) results. Compared with the deconvolution methods, LASSO/ridge showed higher sensitivity but lower specificity for recovering DGE signals seen in the observed data, although their area under the curve was higher. Machine learning methods have the potential to outperform domain-specific methods when suitable training data are available.
format Article
id doaj-art-513197c246284facb690f140e0fcd6d4
institution Kabale University
issn 1553-734X
1553-7358
language English
publishDate 2025-03-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj-art-513197c246284facb690f140e0fcd6d4 | 2025-08-20T03:52:48Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X | 1553-7358 | 2025-03-01 | volume 21, issue 3 | e1012859 | 10.1371/journal.pcbi.1012859 | https://doi.org/10.1371/journal.pcbi.1012859
title Penalised regression improves imputation of cell-type specific expression using RNA-seq data from mixed cell populations compared to domain-specific methods.
url https://doi.org/10.1371/journal.pcbi.1012859