Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer.

Bibliographic Details
Main Authors: Silvia Pineda, Francisco X Real, Manolis Kogevinas, Alfredo Carrato, Stephen J Chanock, Núria Malats, Kristel Van Steen
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-12-01
Series: PLoS Genetics
Online Access: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1005689&type=printable
collection DOAJ
description Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. The integration process raises many challenges, such as data heterogeneity, a number of individuals that is small relative to the number of parameters, multicollinearity, and the difficulty of interpreting and validating results given their complexity and the limited knowledge of the underlying biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method that simultaneously assesses significance and corrects for multiple testing using the MaxT algorithm. We applied it with penalized regression methods (LASSO and elastic net, ENET) to explore relationships between common genetic variants, DNA methylation, and gene expression measured in bladder tumor samples. The analysis consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked in a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The proposed approach reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data with appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease conditions.
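Steps (1) and (3) of the workflow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the feature tuple layout, and the toy numbers are hypothetical. The per-model statistic is assumed to be precomputed, for example from a LASSO or ENET fit on the observed data and on each permuted data set, with larger values meaning stronger association. The adjusted p-value uses the plain single-step MaxT proportion; a (count + 1) / (B + 1) variant is an equally common convention.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (feature id, chromosome, position in base pairs)
Feature = Tuple[str, str, int]


def features_in_window(gene_chrom: str, gene_start: int, gene_end: int,
                       features: Sequence[Feature],
                       flank_bp: int = 1_000_000) -> List[str]:
    """Step 1: keep SNPs/CpGs within flank_bp upstream or downstream of a gene."""
    lo, hi = gene_start - flank_bp, gene_end + flank_bp
    return [fid for fid, chrom, pos in features
            if chrom == gene_chrom and lo <= pos <= hi]


def maxt_adjusted_pvalues(observed: Sequence[float],
                          permuted: Sequence[Sequence[float]]) -> List[float]:
    """Step 3: single-step MaxT adjusted p-values.

    observed[j]    : statistic of model j on the real data.
    permuted[b][j] : the same statistic for model j after permutation b.
    """
    # One maximum per permutation, taken across all models.
    max_per_perm = [max(row) for row in permuted]
    n_perm = len(max_per_perm)
    # Share of permutations whose maximum reaches the observed statistic.
    return [sum(m >= t for m in max_per_perm) / n_perm for t in observed]


# Toy usage with made-up coordinates and statistics.
features = [("rs1", "chr1", 1_500_000),
            ("cg1", "chr1", 3_200_000),
            ("rs2", "chr2", 1_500_000)]
selected = features_in_window("chr1", 2_000_000, 2_050_000, features)

observed = [5.0, 1.0]
permuted = [[1.0, 2.0], [0.5, 0.3], [6.0, 0.1], [2.0, 1.5]]
adjusted = maxt_adjusted_pvalues(observed, permuted)
```

Because every observed statistic is compared against the same distribution of permutation maxima, one pass over the permutations yields significance and a family-wise multiple-testing correction together, which is the property the description attributes to the MaxT step.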
id doaj-art-fdab4e880735414896b74b34fefe0463
institution OA Journals
issn 1553-7390
1553-7404
spelling doaj-art-fdab4e880735414896b74b34fefe0463; 2025-08-20T02:22:40Z; eng; Public Library of Science (PLoS); PLoS Genetics; ISSN 1553-7390, 1553-7404; 2015-12-01; volume 11, issue 12, e1005689; DOI 10.1371/journal.pgen.1005689