Benchmarking Variants of Recursive Feature Elimination: Insights from Predictive Tasks in Education and Healthcare

Bibliographic Details
Main Authors: Okan Bulut, Bin Tan, Elisabetta Mazzullo, Ali Syed
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/6/476
collection DOAJ
description Originally developed as an effective feature selection method in healthcare predictive analytics, Recursive Feature Elimination (RFE) has gained increasing popularity in Educational Data Mining (EDM) due to its ability to handle high-dimensional data and support interpretable modeling. Over time, various RFE variants have emerged, each introducing methodological enhancements. To help researchers better understand and apply RFE more effectively, this study organizes existing variants into four methodological categories: (1) integration with different machine learning models, (2) combinations of multiple feature importance metrics, (3) modifications to the original RFE process, and (4) hybridization with other feature selection or dimensionality reduction techniques. Rather than conducting a systematic review, we present a narrative synthesis supported by illustrative studies from EDM to demonstrate how different variants have been applied in practice. We also conduct an empirical evaluation of five representative RFE variants across two domains: a regression task using a large-scale educational dataset and a classification task using a clinical dataset on chronic heart failure. Our evaluation benchmarks predictive accuracy, feature selection stability, and runtime efficiency. Results show that the evaluation metrics vary significantly across RFE variants. For example, while RFE wrapped with tree-based models such as Random Forest and Extreme Gradient Boosting (XGBoost) yields strong predictive performance, these methods tend to retain large feature sets and incur high computational costs. In contrast, a variant known as Enhanced RFE achieves substantial feature reduction with only marginal accuracy loss, offering a favorable balance between efficiency and performance. These findings underscore the trade-offs among accuracy, interpretability, and computational cost across RFE variants, providing practical guidance for selecting the most appropriate algorithm based on domain-specific needs and constraints.
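The description above refers to the basic RFE process without spelling it out. As a rough illustration only, the generic loop (fit a model, rank the remaining features by importance, drop the weakest, refit) might be sketched as below. This is not the authors' code and not any of the five benchmarked variants: the function names `rfe`, `solve_linear`, and `make_ols_importance`, the use of absolute least-squares coefficients as the importance score, and the synthetic data are all assumptions made for this example. The coefficient-based score also assumes the features are on comparable scales.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: maps the currently remaining feature names to scores.
Importance = Callable[[List[str]], Dict[str, float]]


def rfe(features: Sequence[str], importance_fn: Importance,
        n_keep: int, step: int = 1) -> Tuple[List[str], List[str]]:
    """Drop the `step` least important features per round until n_keep remain."""
    if n_keep < 1 or step < 1:
        raise ValueError("n_keep and step must be positive")
    remaining = list(features)
    eliminated: List[str] = []
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)  # refit on the remaining features
        n_drop = min(step, len(remaining) - n_keep)
        weakest = sorted(remaining, key=lambda f: scores[f])[:n_drop]
        eliminated.extend(weakest)
        remaining = [f for f in remaining if f not in weakest]
    return remaining, eliminated


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def make_ols_importance(data: Dict[str, List[float]],
                        y: List[float]) -> Importance:
    """Importance = |coefficient| from a least-squares fit (an assumption)."""
    def centered(v: List[float]) -> List[float]:
        mean = sum(v) / len(v)
        return [value - mean for value in v]

    def dot(u: List[float], v: List[float]) -> float:
        return sum(p * q for p, q in zip(u, v))

    yc = centered(y)
    xc = {name: centered(col) for name, col in data.items()}

    def importance(cols: List[str]) -> Dict[str, float]:
        # Normal equations (X'X) w = X'y on centered data, so no intercept.
        xtx = [[dot(xc[p], xc[q]) for q in cols] for p in cols]
        xty = [dot(xc[p], yc) for p in cols]
        coefs = solve_linear(xtx, xty)
        return {c: abs(w) for c, w in zip(cols, coefs)}

    return importance


# Synthetic data: only x1 and x2 influence y; x3 and x4 are pure noise.
rng = random.Random(0)
n_rows = 200
data = {name: [rng.gauss(0, 1) for _ in range(n_rows)]
        for name in ("x1", "x2", "x3", "x4")}
y = [3.0 * p + 0.5 * q + rng.gauss(0, 0.1)
     for p, q in zip(data["x1"], data["x2"])]

kept, dropped = rfe(list(data), make_ols_importance(data, y), n_keep=2)
print("kept:", kept, "dropped:", dropped)
```

Swapping `make_ols_importance` for a function that fits a different model and returns its feature importances would correspond loosely to the first category in the description (integration with different machine learning models). In practice a library implementation, such as the `RFE` class in scikit-learn, is likely a better starting point than this hand-rolled loop.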
id doaj-art-c53cb5ccc3ea41aba64d71cb2f9f856c
institution OA Journals
issn 2078-2489
doi 10.3390/info16060476
citation Information, volume 16, issue 6, article 476
affiliation Okan Bulut: Centre for Research in Applied Measurement and Evaluation, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada
Bin Tan: Measurement, Evaluation, and Data Science, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada
Elisabetta Mazzullo: Measurement, Evaluation, and Data Science, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada
Ali Syed: Pharmacology, Faculty of Science, University of Alberta, Edmonton, AB T6G 2G5, Canada
topic feature selection
educational data mining
dimensionality
recursive feature elimination
healthcare