Evaluating Sparse Feature Selection Methods: A Theoretical and Empirical Perspective


Bibliographic Details
Main Authors: Monica Fira, Liviu Goras, Hariton-Nicolae Costin
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/7/3752
Description
Summary: This paper analyzes two main categories of feature selection: filter methods (such as minimum redundancy maximum relevance, CHI2, Kruskal–Wallis, and ANOVA) and embedded methods (such as basis pursuit solved via the alternating direction method of multipliers (BP_ADMM), the least absolute shrinkage and selection operator (LASSO), and orthogonal matching pursuit (OMP)). The mathematical foundations of feature selection methods inspired by compressed sensing are presented, highlighting how the principles of sparse signal recovery can be applied to identify the most relevant features. The results were obtained on two biomedical databases. The algorithms take sparsity as their starting point, but the versions implemented and tested in this work are adapted for feature selection. The experimental results show that BP_ADMM achieves the highest classification accuracy (77% on arrhythmia_database and 100% on oncological_database), surpassing both the full feature set and the other methods tested in this study, which makes it the best feature selection option among those evaluated. The analysis shows that embedded methods strike a balance between accuracy and efficiency by selecting features during model training, unlike filter methods, which ignore feature interactions. Although more accurate, embedded methods are slower and depend on the chosen algorithm; and although less comprehensive than wrapper methods, they offer a strong trade-off between speed and performance when computational resources allow for it.
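To illustrate how a sparse-recovery algorithm can be repurposed for feature selection, the sketch below implements orthogonal matching pursuit (one of the embedded methods named in the abstract) in plain NumPy: it greedily picks the feature column most correlated with the current residual, refits least squares on the chosen columns, and repeats until k features are selected. This is a minimal, illustrative sketch, not the authors' implementation; the BP_ADMM and LASSO variants evaluated in the paper are not reproduced here, and the synthetic data and function name are assumptions.

```python
import numpy as np

def omp_feature_selection(X, y, k):
    """Select k feature indices by orthogonal matching pursuit.

    X : (n_samples, n_features) design matrix
    y : (n_samples,) target vector
    k : number of features (sparsity level) to keep
    """
    residual = y.astype(float).copy()
    selected = []
    for _ in range(k):
        # Correlation of every column with the current residual;
        # mask already-selected columns so they are not picked twice.
        corr = np.abs(X.T @ residual)
        corr[selected] = -np.inf
        selected.append(int(np.argmax(corr)))
        # Re-fit least squares on all selected columns and update the residual.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ coef
    return selected

# Usage on synthetic data: y depends only on features 0 and 3,
# so OMP should recover exactly those two columns.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3]
print(sorted(omp_feature_selection(X, y, k=2)))
```

In a classification pipeline like the one the abstract describes, the returned indices would then be used to subset the feature matrix before training the final classifier.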
ISSN:2076-3417