Benchmarking Variants of Recursive Feature Elimination: Insights from Predictive Tasks in Education and Healthcare
Originally developed as an effective feature selection method in healthcare predictive analytics, Recursive Feature Elimination (RFE) has gained increasing popularity in Educational Data Mining (EDM) due to its ability to handle high-dimensional data and support interpretable modeling. Over time, va...
| Main Authors: | Okan Bulut, Bin Tan, Elisabetta Mazzullo, Ali Syed |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Information |
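| Subjects: | feature selection; educational data mining; dimensionality; recursive feature elimination; healthcare |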
| Online Access: | https://www.mdpi.com/2078-2489/16/6/476 |
| _version_ | 1850167800356667392 |
|---|---|
| author | Okan Bulut; Bin Tan; Elisabetta Mazzullo; Ali Syed |
| collection | DOAJ |
| description | Originally developed as an effective feature selection method in healthcare predictive analytics, Recursive Feature Elimination (RFE) has gained increasing popularity in Educational Data Mining (EDM) due to its ability to handle high-dimensional data and support interpretable modeling. Over time, various RFE variants have emerged, each introducing methodological enhancements. To help researchers better understand and apply RFE more effectively, this study organizes existing variants into four methodological categories: (1) integration with different machine learning models, (2) combinations of multiple feature importance metrics, (3) modifications to the original RFE process, and (4) hybridization with other feature selection or dimensionality reduction techniques. Rather than conducting a systematic review, we present a narrative synthesis supported by illustrative studies from EDM to demonstrate how different variants have been applied in practice. We also conduct an empirical evaluation of five representative RFE variants across two domains: a regression task using a large-scale educational dataset and a classification task using a clinical dataset on chronic heart failure. Our evaluation benchmarks predictive accuracy, feature selection stability, and runtime efficiency. Results show that the evaluation metrics vary significantly across RFE variants. For example, while RFE wrapped with tree-based models such as Random Forest and Extreme Gradient Boosting (XGBoost) yields strong predictive performance, these methods tend to retain large feature sets and incur high computational costs. In contrast, a variant known as Enhanced RFE achieves substantial feature reduction with only marginal accuracy loss, offering a favorable balance between efficiency and performance. These findings underscore the trade-offs among accuracy, interpretability, and computational cost across RFE variants, providing practical guidance for selecting the most appropriate algorithm based on domain-specific needs and constraints. |
| format | Article |
| id | doaj-art-c53cb5ccc3ea41aba64d71cb2f9f856c |
| institution | OA Journals |
| issn | 2078-2489 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Information |
| spelling | doaj-art-c53cb5ccc3ea41aba64d71cb2f9f856c; 2025-08-20T02:21:07Z; eng; MDPI AG; Information; 2078-2489; 2025-06-01; 16(6), 476; 10.3390/info16060476; Benchmarking Variants of Recursive Feature Elimination: Insights from Predictive Tasks in Education and Healthcare; Okan Bulut (Centre for Research in Applied Measurement and Evaluation, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada); Bin Tan (Measurement, Evaluation, and Data Science, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada); Elisabetta Mazzullo (Measurement, Evaluation, and Data Science, Faculty of Education, University of Alberta, Edmonton, AB T6G 2G5, Canada); Ali Syed (Pharmacology, Faculty of Science, University of Alberta, Edmonton, AB T6G 2G5, Canada); https://www.mdpi.com/2078-2489/16/6/476; feature selection; educational data mining; dimensionality; recursive feature elimination; healthcare |
| title | Benchmarking Variants of Recursive Feature Elimination: Insights from Predictive Tasks in Education and Healthcare |
| topic | feature selection; educational data mining; dimensionality; recursive feature elimination; healthcare |
| url | https://www.mdpi.com/2078-2489/16/6/476 |