Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification
Ensemble methods have proven highly effective in enhancing predictive performance by combining multiple models. We introduce a novel ensemble approach, the Random Generalized Additive Logistic Forest (RGALF), which integrates generalized additive models (GAMs) within a random forest framework to imp...
| Main Authors: | Oyebayo Ridwan Olaniran, Ali Rashash R. Alzahrani, Nada MohammedSaeed Alharbi, Asma Ahmad Alzahrani |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Mathematics |
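| Subjects: | generalized additive model (GAM); random forest (RF); logistic regression (LR); ensemble methods; binary classification; nonlinearity |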
| Online Access: | https://www.mdpi.com/2227-7390/13/7/1214 |
| _version_ | 1850188161950416896 |
|---|---|
| author | Oyebayo Ridwan Olaniran; Ali Rashash R. Alzahrani; Nada MohammedSaeed Alharbi; Asma Ahmad Alzahrani |
| author_sort | Oyebayo Ridwan Olaniran |
| collection | DOAJ |
| description | Ensemble methods have proven highly effective in enhancing predictive performance by combining multiple models. We introduce a novel ensemble approach, the Random Generalized Additive Logistic Forest (RGALF), which integrates generalized additive models (GAMs) within a random forest framework to improve binary classification tasks. Unlike traditional random forests, which rely on piecewise constant predictions in terminal nodes, RGALF fits GAM logistic regression (LR) models to the data in each terminal node, enabling it to capture complex nonlinear relationships and interactions among predictors. By aggregating these node-specific GAMs, RGALF addresses multicollinearity, enhances interpretability, and achieves superior bias–variance tradeoffs, particularly in nonlinear settings. Theoretical analysis confirms that RGALF achieves Stone’s optimal rates for additive models ($\mathcal{O}(n^{-2k/(2k+d)})$) under appropriate conditions, outperforming the slower convergence of traditional random forests ($\mathcal{O}(n^{-2/3})$). Furthermore, empirical results demonstrate RGALF’s effectiveness across both simulated and real-world datasets. In simulations, RGALF demonstrates superior performance over random forests (RFs), reducing variance by up to 69% and bias by 19% in nonlinear settings, with significant MSE improvements (0.032 vs. RF’s 0.054 at $n = 1000$), while achieving optimal convergence rates ($\mathcal{O}(n^{-0.48})$ vs. RF’s $\mathcal{O}(n^{-0.29})$). On real-world medical datasets, RGALF attains near-perfect accuracy and AUC: 100% accuracy/AUC for Heart Failure and Hepatitis C (HCV) prediction, 99% accuracy/100% AUC for Pima Diabetes, and 98.8% accuracy/100% AUC for Indian Liver Patient (ILPD), outperforming state-of-the-art methods. Notably, RGALF captures complex biomarker interactions (BMI–insulin in diabetes) missed by traditional models. |
| format | Article |
| id | doaj-art-211c4cf3f6a94d5c8403f74cda8831e5 |
| institution | OA Journals |
| issn | 2227-7390 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Mathematics |
| spelling | doaj-art-211c4cf3f6a94d5c8403f74cda8831e5; 2025-08-20T02:15:58Z; eng; MDPI AG; Mathematics; 2227-7390; 2025-04-01; vol. 13, no. 7, article 1214; 10.3390/math13071214; Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification; authors: Oyebayo Ridwan Olaniran (0), Ali Rashash R. Alzahrani (1), Nada MohammedSaeed Alharbi (2), Asma Ahmad Alzahrani (3); affiliations, in the order listed in the record: Department of Statistics, Faculty of Physical Sciences, University of Ilorin, Ilorin 1515, Nigeria; Mathematics Department, Faculty of Sciences, Umm Al-Qura University, Makkah 24382, Saudi Arabia; Department of Mathematics, Faculty of Science, Taibah University, Al-Madinah Al-Munawara 42353, Saudi Arabia; Department of Mathematics, Faculty of Science, Al-Baha University, Alaqiq, Al-Baha 65799, Saudi Arabia; https://www.mdpi.com/2227-7390/13/7/1214; keywords: generalized additive model (GAM), random forest (RF), logistic regression (LR), ensemble methods, binary classification, nonlinearity |
| title | Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification |
| topic | generalized additive model (GAM); random forest (RF); logistic regression (LR); ensemble methods; binary classification; nonlinearity |
| url | https://www.mdpi.com/2227-7390/13/7/1214 |
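
The description in this record says that RGALF replaces the piecewise constant prediction in each terminal node of a random forest tree with a GAM logistic regression model and then aggregates the node-specific models. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`basis`, `fit_leaf`, `build`, `fit_forest`, `predict_proba`) are hypothetical, a per-feature quadratic basis stands in for the spline smoothers a real GAM would use, each split considers only the median of a random feature subset, and the leaf models are fitted by plain gradient descent on features assumed to lie roughly in [-1, 1].

```python
import math
import random


def basis(row):
    """Additive basis: intercept plus (x, x^2) per feature, no cross terms."""
    out = [1.0]
    for x in row:
        out += [x, x * x]
    return out


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_leaf(X, y, steps=200, lr=1.0, l2=1e-2):
    """Ridge-penalised additive logistic model fitted by gradient descent.

    Stand-in for a GAM logistic fit; returns a function row -> P(y = 1).
    """
    if len(set(y)) < 2:  # pure leaf: fall back to the constant class rate
        rate = sum(y) / len(y)
        return lambda row: rate
    B = [basis(r) for r in X]
    w = [0.0] * len(B[0])
    for _ in range(steps):
        grad = [l2 * wj for wj in w]  # ridge term (also shrinks the intercept)
        for b, t in zip(B, y):
            err = sigmoid(sum(wj * bj for wj, bj in zip(w, b))) - t
            for j, bj in enumerate(b):
                grad[j] += err * bj / len(B)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return lambda row: sigmoid(sum(wj * bj for wj, bj in zip(w, basis(row))))


def gini(y):
    p = sum(y) / len(y)
    return 2.0 * p * (1.0 - p)


def build(X, y, depth, min_leaf, rng):
    """Grow a tree; each terminal node stores a fitted leaf model."""
    n, d = len(X), len(X[0])
    best = None
    if depth > 0 and n >= 2 * min_leaf:
        # Random feature subset, as in a random forest.
        for j in rng.sample(range(d), max(1, int(math.sqrt(d)))):
            thr = sorted(r[j] for r in X)[n // 2]  # median is the only candidate
            left = [i for i in range(n) if X[i][j] < thr]
            right = [i for i in range(n) if X[i][j] >= thr]
            if min(len(left), len(right)) < min_leaf:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / n
            if best is None or score < best[0]:
                best = (score, j, thr, left, right)
    if best is None:
        return ("leaf", fit_leaf(X, y))
    _, j, thr, left, right = best
    return ("split", j, thr,
            build([X[i] for i in left], [y[i] for i in left], depth - 1, min_leaf, rng),
            build([X[i] for i in right], [y[i] for i in right], depth - 1, min_leaf, rng))


def predict_tree(node, row):
    while node[0] == "split":
        _, j, thr, lo, hi = node
        node = lo if row[j] < thr else hi
    return node[1](row)


def fit_forest(X, y, n_trees=5, depth=2, min_leaf=20, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap resample
        trees.append(build([X[i] for i in idx], [y[i] for i in idx],
                           depth, min_leaf, rng))
    return trees


def predict_proba(trees, row):
    """Aggregate by averaging the leaf-model probabilities over all trees."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)


# Synthetic nonlinear problem: class 1 inside a disc, class 0 outside.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(300)]
y = [1 if a * a + b * b < 0.5 else 0 for a, b in X]
forest = fit_forest(X, y)
acc = sum((predict_proba(forest, r) > 0.5) == (t == 1) for r, t in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")
```

The disc-shaped class boundary is additive in the squared features, so a leaf model on this basis can follow it, whereas a constant leaf prediction can only approximate it with axis-aligned steps. A closer reproduction would swap `fit_leaf` for a penalised-spline GAM fit and use a full impurity-based split search.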