Double Filter and Double Wrapper Feature Selection Algorithm for High-Dimensional Data Analysis

With the advent of the big data era, we often deal with datasets containing a large number of redundant features, and in this context, dimensionality reduction of data becomes crucial. To address this issue, this study proposes a double filter and double wrapper (DFDW) feature selection algorithm fo...

Bibliographic Details
Main Authors: Hong Chen, Yuefeng Zheng
Format: Article
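Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Feature selection; whale optimization algorithm; differential evolution; ReliefF; Pearson correlation coefficient
Online Access: https://ieeexplore.ieee.org/document/11002483/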
collection DOAJ
description With the advent of the big data era, datasets often contain a large number of redundant features, which makes dimensionality reduction crucial. To address this issue, this study proposes a double filter and double wrapper (DFDW) feature selection algorithm for high-dimensional data. In the double filter stage, the algorithm first evaluates all features from two perspectives using two filter algorithms: ReliefF and the Pearson correlation coefficient. It then selects the top k features from each ranking and obtains a candidate feature subset F by taking the intersection. Next, the population is initialized using the standard Cauchy distribution. The algorithm then enters the double wrapper stage, where the Random Walk Whale Optimization Algorithm (RWWOA) and an improved Adaptive Differential Evolution (ADE) jointly search for the optimal feature subset. To keep a single algorithm from becoming trapped in a local optimum, an Algorithm Iteration Mechanism is proposed that selectively runs the two wrapper algorithms, allowing the search to escape local optima and explore a broader optimization space. Finally, the effectiveness of the algorithm is verified through three sets of comparative experiments. The experimental results show that DFDW performed well in obtaining optimal feature subsets on 10 high-dimensional datasets, with an average classification accuracy above 95.1% on 8 datasets, a dimensionality reduction rate below 0.64% on all datasets, and a lowest dimensionality reduction rate of 0.19%.
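The double filter stage and the Cauchy initialization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, a simplified Relief (one nearest hit and one nearest miss per sample) stands in for full ReliefF, each class is assumed to have at least two samples, and thresholding the Cauchy draws at zero to obtain binary feature masks is an assumption, since the record does not say how the draws are mapped to feature subsets.

```python
import numpy as np


def pearson_scores(X, y):
    """Absolute Pearson correlation between each feature column and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features get a score of 0
    return np.abs(Xc.T @ yc) / denom


def relief_scores(X, y):
    """Simplified Relief weights (stand-in for ReliefF, an assumption)."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = (X - X.min(axis=0)) / span      # scale every feature to [0, 1]
    w = np.zeros(X.shape[1])
    for i in range(len(Xn)):
        diff = np.abs(Xn - Xn[i])        # per-feature distance to sample i
        dist = diff.sum(axis=1)          # L1 distance to sample i
        dist[i] = np.inf                 # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class
        w += diff[miss] - diff[hit]
    return w / len(Xn)


def double_filter(X, y, k):
    """Indices ranked in the top k by both filters (candidate subset F)."""
    top_p = np.argsort(pearson_scores(X, y))[-k:]
    top_r = np.argsort(relief_scores(X, y))[-k:]
    return np.intersect1d(top_p, top_r)


def cauchy_population(n_agents, n_features, rng):
    """Binary masks from standard Cauchy draws (zero threshold is an assumption)."""
    return rng.standard_cauchy((n_agents, n_features)) > 0


# Synthetic data: 60 samples, 20 features, features 3 and 7 carry the label.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 20))
X[:, 3] += 4 * y
X[:, 7] -= 4 * y

F = double_filter(X, y, k=5)
pop = cauchy_population(8, len(F), rng)
print("candidate subset F:", F)
print("population shape:", pop.shape)
```

Taking the intersection keeps only features that both criteria rank highly, so F can contain fewer than k features; each row of `pop` would then be one starting candidate for the wrapper stage.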
id doaj-art-8831c4064f19456ea07124d3db83d207
institution OA Journals
issn 2169-3536
spelling doaj-art-8831c4064f19456ea07124d3db83d207
2025-08-20T01:55:21Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2025-01-01, vol. 13, pp. 86185-86202
DOI 10.1109/ACCESS.2025.3569271, document 11002483
Double Filter and Double Wrapper Feature Selection Algorithm for High-Dimensional Data Analysis
Hong Chen (0), https://orcid.org/0009-0007-5194-7148, School of Mathematics and Computer Science, Jilin Normal University, Siping, China
Yuefeng Zheng (1), https://orcid.org/0000-0002-5764-6887, School of Mathematics and Computer Science, Jilin Normal University, Siping, China
https://ieeexplore.ieee.org/document/11002483/
Feature selection; whale optimization algorithm; differential evolution; ReliefF; Pearson correlation coefficient
title Double Filter and Double Wrapper Feature Selection Algorithm for High-Dimensional Data Analysis
topic Feature selection
whale optimization algorithm
differential evolution
ReliefF
Pearson correlation coefficient