Benchmarking Variants of Recursive Feature Elimination: Insights from Predictive Tasks in Education and Healthcare

Bibliographic Details
Main Authors:	Okan Bulut, Bin Tan, Elisabetta Mazzullo, Ali Syed
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Information
Subjects:	feature selection; educational data mining; dimensionality; recursive feature elimination; healthcare
Online Access:	https://www.mdpi.com/2078-2489/16/6/476

collection	DOAJ
description	Originally developed as an effective feature selection method in healthcare predictive analytics, Recursive Feature Elimination (RFE) has gained increasing popularity in Educational Data Mining (EDM) due to its ability to handle high-dimensional data and support interpretable modeling. Over time, various RFE variants have emerged, each introducing methodological enhancements. To help researchers better understand and apply RFE more effectively, this study organizes existing variants into four methodological categories: (1) integration with different machine learning models, (2) combinations of multiple feature importance metrics, (3) modifications to the original RFE process, and (4) hybridization with other feature selection or dimensionality reduction techniques. Rather than conducting a systematic review, we present a narrative synthesis supported by illustrative studies from EDM to demonstrate how different variants have been applied in practice. We also conduct an empirical evaluation of five representative RFE variants across two domains: a regression task using a large-scale educational dataset and a classification task using a clinical dataset on chronic heart failure. Our evaluation benchmarks predictive accuracy, feature selection stability, and runtime efficiency. Results show that the evaluation metrics vary significantly across RFE variants. For example, while RFE wrapped with tree-based models such as Random Forest and Extreme Gradient Boosting (XGBoost) yields strong predictive performance, these methods tend to retain large feature sets and incur high computational costs. In contrast, a variant known as Enhanced RFE achieves substantial feature reduction with only marginal accuracy loss, offering a favorable balance between efficiency and performance. These findings underscore the trade-offs among accuracy, interpretability, and computational cost across RFE variants, providing practical guidance for selecting the most appropriate algorithm based on domain-specific needs and constraints.
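For orientation, the basic RFE loop that the variants described above build on (fit a model, rank features by importance, drop the weakest, refit) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`solve_linear`, `fit_coefficients`, `rfe`), the no-intercept least-squares model, the use of absolute coefficient size as the importance score, and the synthetic data are all assumptions made for the example.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_coefficients(rows, y, cols):
    """Ordinary least squares (no intercept) on the chosen columns."""
    xtx = [[sum(r[i] * r[j] for r in rows) for j in cols] for i in cols]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in cols]
    return solve_linear(xtx, xty)


def rfe(rows, y, n_keep):
    """Drop the weakest feature, refit, and repeat until n_keep remain."""
    cols = list(range(len(rows[0])))
    while len(cols) > n_keep:
        coefs = fit_coefficients(rows, y, cols)
        # Assumed importance score: absolute coefficient size, which is
        # only meaningful when features share a comparable scale.
        weakest = min(range(len(cols)), key=lambda k: abs(coefs[k]))
        del cols[weakest]
    return cols


# Synthetic data (an assumption for the demo): four unit-variance features,
# of which only features 0 and 2 influence the target.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
y = [3 * r[0] - 2 * r[2] for r in rows]

selected = rfe(rows, y, n_keep=2)
print(selected)
```

Swapping `fit_coefficients` for a different model's importance scores corresponds to the first category of variants listed in the abstract; removing one feature per iteration is the simplest choice and also the slowest, which is one source of the runtime differences the study benchmarks.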
id	doaj-art-c53cb5ccc3ea41aba64d71cb2f9f856c
institution	OA Journals
issn	2078-2489
doi	10.3390/info16060476
volume	16
issue	6
article_number	476
affiliation	Okan Bulut: Centre for Research in Applied Measurement and Evaluation, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada
affiliation	Bin Tan: Measurement, Evaluation, and Data Science, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada
affiliation	Elisabetta Mazzullo: Measurement, Evaluation, and Data Science, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada
affiliation	Ali Syed: Pharmacology, Faculty of Science, University of Alberta, Edmonton, AB T6G 2G5, Canada

Similar Items