Stability of Feature Selection in Multi-Omics Data Analysis

Bibliographic Details
Main Authors: Tomasz Łukaszuk, Jerzy Krawczuk, Kamil Żyła, Jacek Kęsik
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/14/23/11103
Collection: DOAJ
Description: In the rapidly evolving field of multi-omics data analysis, understanding the stability of feature selection is critical for reliable biomarker discovery and clinical applications. This study investigates the stability of feature-selection methods across various cancer types using 15 datasets from The Cancer Genome Atlas (TCGA). We employed classifiers with embedded feature selection, including Support Vector Machines (SVM), Logistic Regression (LR), and Lasso regression, each incorporating L1 regularization. Using five-fold cross-validation, we measured feature-selection stability and assessed the accuracy of predicting TP53 mutations, a known indicator of poor clinical outcomes in cancer patients. For all three classifiers, feature-selection stability, measured by the Nogueira metric, was highest under stronger regularization (fewer selected features), while weaker regularization generally reduced stability across all omics layers. Stability also differed between omics layers: the mirna layer consistently exhibited the highest stability across classifiers, while the mutation and rna layers were generally less stable, particularly under weaker regularization. This work highlights the importance of careful feature selection and validation in high-dimensional datasets to enhance the robustness and reliability of multi-omics analyses.
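The record names the Nogueira stability metric but does not state it. Below is a minimal Python sketch of the estimator as commonly given by Nogueira et al. (2018), not the authors' code. It assumes each run (for example, one of the five cross-validation folds) contributes the set of feature indices it selected; for an L1-regularized model these would typically be the features with non-zero coefficients. With M runs, d candidate features, p_f the fraction of runs selecting feature f, s_f^2 = M/(M-1) * p_f(1 - p_f) and k̄ the mean subset size, the estimate is 1 - mean_f(s_f^2) / ((k̄/d)(1 - k̄/d)). The function name nogueira_stability and its arguments are hypothetical.

```python
from typing import Sequence, Set


def nogueira_stability(subsets: Sequence[Set[int]], n_features: int) -> float:
    """Hypothetical helper: sketch of the Nogueira et al. stability estimator.

    subsets    -- one set of selected feature indices per run (e.g. per CV fold)
    n_features -- total number of candidate features d
    Returns 1.0 when every run selected exactly the same features.
    """
    m = len(subsets)
    d = n_features
    if m < 2:
        raise ValueError("need at least two feature subsets")

    # p_hat[f]: fraction of runs in which feature f was selected
    counts = [0] * d
    for subset in subsets:
        for f in subset:
            counts[f] += 1
    p_hat = [c / m for c in counts]

    # k_bar: average number of selected features per run
    k_bar = sum(p_hat)
    if k_bar == 0 or k_bar == d:
        raise ValueError("stability is undefined when no or all features are selected")

    # Unbiased variance of each feature's selection indicator, averaged over features
    mean_var = sum(m / (m - 1) * p * (1 - p) for p in p_hat) / d
    # Normalising term (k_bar/d)(1 - k_bar/d)
    expected_var = (k_bar / d) * (1 - k_bar / d)
    return 1 - mean_var / expected_var


if __name__ == "__main__":
    # Illustrative subsets only, not data from the study
    identical = [{0, 1}, {0, 1}, {0, 1}]
    print(nogueira_stability(identical, n_features=5))  # 1.0
    partial = [{0, 1}, {0, 2}]
    print(nogueira_stability(partial, n_features=4))  # 0.0
```

Values close to 1 mean the folds agree on the selected features; values near 0 are what random selection of subsets with the same average size would give, and the estimate can fall below 0.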
ISSN: 2076-3417
DOI: 10.3390/app142311103
Citation: Applied Sciences, vol. 14, no. 23, article 11103
Affiliations:
Tomasz Łukaszuk, Jerzy Krawczuk: Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, 15-351 Bialystok, Poland
Kamil Żyła, Jacek Kęsik: Department of Computer Science, Faculty of Electrical Engineering and Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland
Keywords: multi-omics, high-dimensional data, cancer genomics, feature selection, stability, L1 regularization