Optimizing Heart Disease Prediction: A Comparative Analysis of Tree-Based Ensembles With Feature Expansion and Selection

Bibliographic Details
Main Authors: K. Aswini, Kriti Arya
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11084798/
Description
Summary: Cardiovascular disease (CVD) is the leading cause of death worldwide, which makes accurate early detection important. This study examines the efficacy of tree-based ensemble machine learning models enhanced with Feature Expansion and Selection (FES-EM). The study uses the Mendeley CVD dataset of 1,000 patient records: new features were generated with feature generation operators, and the most significant ones were then retained using four feature selection techniques. Seven tree-based ensemble models were trained and tuned with three hyperparameter tuning strategies to identify the best-performing approach. The most effective combination was Decision Tree-Based Recursive Feature Elimination (DTRFECV) for feature selection with an AdaBoost classifier tuned by grid search, which achieved 98.20% sensitivity, a 97.98% F1 score, and 97.75% accuracy. Validation on both primary and secondary datasets showed that models incorporating FES outperformed those without FES. The authors present the novelty of the work as a multi-stage machine learning pipeline that integrates clinically driven feature expansion with systematic optimization to build a robust and generalizable framework for early and accurate CVD diagnosis.
ISSN: 2169-3536
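
The pipeline described in the summary (feature expansion, decision-tree-based recursive feature elimination, then AdaBoost tuned by grid search) can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' implementation. Several pieces are assumptions: the data is synthetic (`make_classification` stands in for the Mendeley CVD records), `PolynomialFeatures` with interaction terms stands in for the unspecified feature generation operators, and the hyperparameter grid, fold counts, and the helper names `build_fes_pipeline` and `run_demo` are chosen only for illustration. The scores it produces have no relation to the figures reported in the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import RFECV
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeClassifier


def build_fes_pipeline(random_state=0):
    """Expansion -> selection -> classifier (hypothetical stand-in for FES)."""
    return Pipeline([
        # Assumption: pairwise interaction terms stand in for the paper's
        # feature generation operators, which the summary does not specify.
        ("expand", PolynomialFeatures(degree=2, interaction_only=True,
                                      include_bias=False)),
        # Recursive feature elimination with cross-validation, ranked by a
        # decision tree's feature importances.
        ("select", RFECV(DecisionTreeClassifier(random_state=random_state),
                         step=2, cv=3, scoring="f1")),
        ("clf", AdaBoostClassifier(random_state=random_state)),
    ])


def run_demo(random_state=0):
    """Fit the sketch on synthetic data and return held-out metrics."""
    # Synthetic binary-classification data; NOT the Mendeley CVD dataset.
    X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                               n_redundant=1, random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state)

    # Grid search over a small, illustrative AdaBoost grid.
    grid = GridSearchCV(
        build_fes_pipeline(random_state),
        param_grid={"clf__n_estimators": [50, 100],
                    "clf__learning_rate": [0.5, 1.0]},
        cv=3, scoring="f1")
    grid.fit(X_train, y_train)

    y_pred = grid.predict(X_test)
    return {
        "sensitivity": recall_score(y_test, y_pred),  # recall of class 1
        "f1": f1_score(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4f}")
```

Placing expansion and selection inside the `Pipeline` means both steps are refit on each training fold during grid search, so the held-out fold never influences which features are kept. Whether the authors structured their evaluation this way is not stated in the summary.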