Double Filter and Double Wrapper Feature Selection Algorithm for High-Dimensional Data Analysis

Bibliographic Details
Main Authors: Hong Chen, Yuefeng Zheng
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11002483/
Description
Summary: With the advent of the big data era, datasets often contain large numbers of redundant features, making dimensionality reduction crucial. To address this issue, this study proposes a double filter and double wrapper (DFDW) feature selection algorithm for high-dimensional data. In the double filter stage, the algorithm evaluates all features from two perspectives using two filter algorithms, ReliefF and the Pearson correlation coefficient; it then selects the top k features from each ranking and takes their intersection to obtain a candidate feature subset F. Next, the standard Cauchy distribution is used for population initialization. The algorithm then enters the double wrapper stage, where the Random Walk Whale Optimization Algorithm (RWWOA) and an improved Adaptive Differential Evolution (ADE) jointly optimize to obtain the optimal feature subset. To overcome the tendency of a single algorithm to fall into local optima, an Algorithm Iteration Mechanism is proposed, which selectively runs the two wrapper algorithms so that the search can escape local optima and explore a broader optimization space. Finally, the effectiveness of the algorithm is verified through three sets of comparative experiments. The experimental results show that the DFDW algorithm performed well in obtaining optimal feature subsets on 10 high-dimensional datasets, with an average classification accuracy above 95.1% on 8 datasets, a dimensionality reduction rate below 0.64% on all datasets, and a lowest dimensionality reduction rate of 0.19%.
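The double filter stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `pearson_scores` is the standard absolute Pearson correlation with the label, `relieff_scores` is a simplified ReliefF variant (one nearest hit and one nearest miss per sampled instance, Manhattan distance), and `double_filter` intersects the two top-k rankings to form the candidate subset F. All function names, the sampling budget `n_iter`, and the distance choice are assumptions for illustration.

```python
import numpy as np

def pearson_scores(X, y):
    # |Pearson correlation| between each feature column and the class label
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(num / den)

def relieff_scores(X, y, n_iter=50, rng=None):
    # Simplified ReliefF: for each sampled instance, reward features that
    # differ from the nearest miss and penalize those that differ from the
    # nearest hit (Manhattan distance; one neighbor each, an assumption here)
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for i in rng.integers(0, n, size=n_iter):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                      # exclude the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))   # nearest same-class
        miss = np.argmin(np.where(~same, dist, np.inf)) # nearest other-class
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

def double_filter(X, y, k):
    # Top-k features under each filter, intersected -> candidate subset F
    top_pearson = set(np.argsort(pearson_scores(X, y))[-k:])
    top_relieff = set(np.argsort(relieff_scores(X, y, rng=0))[-k:])
    return sorted(top_pearson & top_relieff)
```

Because both filters must rank a feature in their top k, the intersection is typically much smaller than k, which is what makes the subsequent wrapper stage tractable on high-dimensional data.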
ISSN: 2169-3536