Stability of Feature Selection in Multi-Omics Data Analysis
In the rapidly evolving field of multi-omics data analysis, understanding the stability of feature selection is critical for reliable biomarker discovery and clinical applications. This study investigates the stability of feature-selection methods across various cancer types by utilizing 15 datasets...
| Main Authors: | Tomasz Łukaszuk, Jerzy Krawczuk, Kamil Żyła, Jacek Kęsik |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Applied Sciences |
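| Subjects: | multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1 regularization |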
| Online Access: | https://www.mdpi.com/2076-3417/14/23/11103 |
| _version_ | 1850107393416888320 |
|---|---|
| author | Tomasz Łukaszuk; Jerzy Krawczuk; Kamil Żyła; Jacek Kęsik |
| author_facet | Tomasz Łukaszuk; Jerzy Krawczuk; Kamil Żyła; Jacek Kęsik |
| author_sort | Tomasz Łukaszuk |
| collection | DOAJ |
| description | In the rapidly evolving field of multi-omics data analysis, understanding the stability of feature selection is critical for reliable biomarker discovery and clinical applications. This study investigates the stability of feature-selection methods across various cancer types by utilizing 15 datasets from The Cancer Genome Atlas (TCGA). We employed classifiers with embedded feature selection, including Support Vector Machines (SVM), Logistic Regression (LR), and Lasso regression, each incorporating L1 regularization. Through a comprehensive evaluation using five-fold cross-validation, we measured feature-selection stability and assessed the accuracy of predictions regarding TP53 mutations, a known indicator of poor clinical outcomes in cancer patients. All three classifiers demonstrated optimal feature-selection stability, measured by the Nogueira metric, with higher regularization (fewer selected features), while lower regularization generally resulted in decreased stability across all omics layers. Our findings indicate differences in feature stability across the various omics layers; *mirna* consistently exhibited the highest stability across classifiers, while the *mutation* and *rna* layers were generally less stable, particularly with lower regularization. This work highlights the importance of careful feature selection and validation in high-dimensional datasets to enhance the robustness and reliability of multi-omics analyses. |
| format | Article |
| id | doaj-art-b0ad09f923e7441f885b57fbf5cd0597 |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-b0ad09f923e7441f885b57fbf5cd0597; 2025-08-20T02:38:35Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2024-11-01; vol. 14, no. 23, art. 11103; DOI 10.3390/app142311103; Stability of Feature Selection in Multi-Omics Data Analysis; Tomasz Łukaszuk (Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, 15-351 Bialystok, Poland); Jerzy Krawczuk (Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, 15-351 Bialystok, Poland); Kamil Żyła (Department of Computer Science, Faculty of Electrical Engineering and Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland); Jacek Kęsik (Department of Computer Science, Faculty of Electrical Engineering and Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland); abstract: see description; https://www.mdpi.com/2076-3417/14/23/11103; multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1 regularization |
| spellingShingle | Tomasz Łukaszuk; Jerzy Krawczuk; Kamil Żyła; Jacek Kęsik; Stability of Feature Selection in Multi-Omics Data Analysis; Applied Sciences; multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1 regularization |
| title | Stability of Feature Selection in Multi-Omics Data Analysis |
| title_full | Stability of Feature Selection in Multi-Omics Data Analysis |
| title_fullStr | Stability of Feature Selection in Multi-Omics Data Analysis |
| title_full_unstemmed | Stability of Feature Selection in Multi-Omics Data Analysis |
| title_short | Stability of Feature Selection in Multi-Omics Data Analysis |
| title_sort | stability of feature selection in multi omics data analysis |
| topic | multi-omics; high-dimensional data; cancer genomics; feature selection; stability; L1 regularization |
| url | https://www.mdpi.com/2076-3417/14/23/11103 |
| work_keys_str_mv | AT tomaszłukaszuk stabilityoffeatureselectioninmultiomicsdataanalysis AT jerzykrawczuk stabilityoffeatureselectioninmultiomicsdataanalysis AT kamilzyła stabilityoffeatureselectioninmultiomicsdataanalysis AT jacekkesik stabilityoffeatureselectioninmultiomicsdataanalysis |
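
The description names the Nogueira metric as the measure of feature-selection stability across five cross-validation folds. The sketch below shows one way that measure might be computed, assuming its commonly cited definition: one minus the mean per-feature selection variance, normalized by the variance expected for random subsets of the same average size. The function name `nogueira_stability` and the toy masks are illustrative assumptions, not the authors' code or data; in practice each mask would presumably mark the features with nonzero coefficients in an L1-regularized model fitted on one fold.

```python
from typing import Sequence


def nogueira_stability(masks: Sequence[Sequence[int]]) -> float:
    """Stability of M feature subsets, each given as a 0/1 mask over d features.

    Hypothetical helper written for illustration only.
    """
    m = len(masks)
    if m < 2:
        raise ValueError("need at least two selection masks")
    d = len(masks[0])
    if any(len(row) != d for row in masks):
        raise ValueError("all masks must have the same length")

    # Fraction of runs in which each feature was selected.
    freqs = [sum(row[f] for row in masks) / m for f in range(d)]
    # Unbiased sample variance of each feature's selection indicator.
    variances = [m / (m - 1) * p * (1 - p) for p in freqs]
    # Average number of selected features per run.
    k_bar = sum(sum(row) for row in masks) / m
    # Variance expected if k_bar features were drawn at random each run.
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        raise ValueError("undefined when no feature or every feature is always selected")
    return 1 - (sum(variances) / d) / denom


# Toy example: five folds that all select the same two of six features.
identical_folds = [[1, 1, 0, 0, 0, 0] for _ in range(5)]
print(nogueira_stability(identical_folds))  # prints 1.0
```

A value of 1 means every fold selected exactly the same features; values near 0 mean the overlap is no better than chance for subsets of that size.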