Joint Screening for Ultra-High Dimensional Multi-Omics Data

Investigators often face ultra-high dimensional multi-omics data, where identifying significant genes and omics within a gene is of interest. In such data, each gene forms a group consisting of its multiple omics. Moreover, some genes may also be highly correlated. This leads to a tri-level hierarch...

Bibliographic Details
Main Authors:	Ulrich Kemmo Tsafack, Chien-Wei Lin, Kwang Woo Ahn
Format:	Article
Language:	English
Published:	MDPI AG 2024-11-01
Series:	Bioengineering
Subjects:	variable selection; screening; multi-omics; ultra-high dimensional data
Online Access:	https://www.mdpi.com/2306-5354/11/12/1193

_version_	1850036530717917184
author	Ulrich Kemmo Tsafack; Chien-Wei Lin; Kwang Woo Ahn
author_facet	Ulrich Kemmo Tsafack; Chien-Wei Lin; Kwang Woo Ahn
author_sort	Ulrich Kemmo Tsafack
collection	DOAJ
description	Investigators often face ultra-high dimensional multi-omics data, where identifying significant genes and omics within a gene is of interest. In such data, each gene forms a group consisting of its multiple omics. Moreover, some genes may also be highly correlated. This leads to tri-level hierarchically structured data: the cluster level, which is the group of correlated genes; the subgroup level, which is the group of omics of the same gene; and the individual level, which consists of omics. Screening is widely used to remove unimportant variables so that the number of remaining variables becomes smaller than the sample size. Penalized regression with the remaining variables after performing screening is then used to identify important variables. To screen unimportant genes, we propose to cluster genes and conduct screening. We show that the proposed screening method possesses the sure screening property. Extensive simulations show that the proposed screening method outperforms competing methods. We apply the proposed variable selection method to the TCGA breast cancer dataset to identify genes and omics that are related to breast cancer.
format	Article
id	doaj-art-3866e13a2c4647bcaa5a34bd0902e718
institution	DOAJ
issn	2306-5354
language	English
publishDate	2024-11-01
publisher	MDPI AG
record_format	Article
series	Bioengineering
spelling	doaj-art-3866e13a2c4647bcaa5a34bd0902e718; 2025-08-20T02:57:07Z; eng; MDPI AG; Bioengineering; 2306-5354; 2024-11-01; 11; 12; 1193; 10.3390/bioengineering11121193; Joint Screening for Ultra-High Dimensional Multi-Omics Data; Ulrich Kemmo Tsafack (0); Chien-Wei Lin (1); Kwang Woo Ahn (2); (0), (1), (2): Division of Biostatistics, Medical College of Wisconsin (MCW), Milwaukee, WI 53226, USA; https://www.mdpi.com/2306-5354/11/12/1193; variable selection; screening; multi-omics; ultra-high dimensional data
title	Joint Screening for Ultra-High Dimensional Multi-Omics Data
topic	variable selection; screening; multi-omics; ultra-high dimensional data
url	https://www.mdpi.com/2306-5354/11/12/1193

Similar Items