Efficient Hybrid-Robust Approach for Cancer Biomarker Discovery Using Omics Data

Bibliographic Details
Main Authors: Karima Sid, Soumia Zertal, Mohamed Batouche, Soumeya Zerabi
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10935339/
Description
Summary: DNA microarray datasets, a form of “omics” data, are important for the diagnosis of numerous diseases, including cancer and tumors. In the analysis of these data, feature selection techniques and classification algorithms are the workhorses for choosing candidate genes that serve as cancer biomarkers. However, microarray datasets present a challenge: they contain far more features than samples, which degrades the performance of the algorithms used in the analysis. Extracting precise information therefore requires a method that is both robust and performant. This paper emphasizes the importance of accurate and stable gene selection for knowledge discovery from high-dimensional data. A novel hybrid framework is proposed, comprising three stages: Clustering, Parallel Filtering, and Hybrid-Parallel Optimization. In each stage, a combination of techniques and algorithms is used to improve the results in terms of stability and/or accuracy. The proposal is evaluated under different scenarios using thirteen gene expression datasets and two classifiers: Artificial Neural Network (ANN) and Naïve Bayes (NB). Comparison with related work demonstrates the efficacy of this approach, which enhances classification accuracy and stability while reducing the number of selected genes.
ISSN: 2169-3536