Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification

Ensemble methods have proven highly effective in enhancing predictive performance by combining multiple models. We introduce a novel ensemble approach, the Random Generalized Additive Logistic Forest (RGALF), which integrates generalized additive models (GAMs) within a random forest framework to imp...

Bibliographic Details
Main Authors: Oyebayo Ridwan Olaniran, Ali Rashash R. Alzahrani, Nada MohammedSaeed Alharbi, Asma Ahmad Alzahrani
Format: Article
Language: English
Published: MDPI AG 2025-04-01
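Series: Mathematics
Subjects: generalized additive model (GAM); random forest (RF); logistic regression (LR); ensemble methods; binary classification; nonlinearity
Online Access: https://www.mdpi.com/2227-7390/13/7/1214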
collection DOAJ
description Ensemble methods have proven highly effective in enhancing predictive performance by combining multiple models. We introduce a novel ensemble approach, the Random Generalized Additive Logistic Forest (RGALF), which integrates generalized additive models (GAMs) within a random forest framework to improve binary classification tasks. Unlike traditional random forests, which rely on piecewise constant predictions in terminal nodes, RGALF fits GAM logistic regression (LR) models to the data in each terminal node, enabling it to capture complex nonlinear relationships and interactions among predictors. By aggregating these node-specific GAMs, RGALF addresses multicollinearity, enhances interpretability, and achieves superior bias–variance tradeoffs, particularly in nonlinear settings. Theoretical analysis confirms that RGALF achieves Stone’s optimal rates for additive models ($\mathcal{O}(n^{-2k/(2k+d)})$) under appropriate conditions, outperforming the slower convergence of traditional random forests ($\mathcal{O}(n^{-2/3})$). Furthermore, empirical results demonstrate RGALF’s effectiveness across both simulated and real-world datasets. In simulations, RGALF demonstrates superior performance over random forests (RFs), reducing variance by up to 69% and bias by 19% in nonlinear settings, with significant MSE improvements (0.032 vs. RF’s 0.054 at $n = 1000$), while achieving optimal convergence rates ($\mathcal{O}(n^{-0.48})$ vs. RF’s $\mathcal{O}(n^{-0.29})$). On real-world medical datasets, RGALF attains near-perfect accuracy and AUC: 100% accuracy/AUC for Heart Failure and Hepatitis C (HCV) prediction, 99% accuracy/100% AUC for Pima Diabetes, and 98.8% accuracy/100% AUC for Indian Liver Patient (ILPD), outperforming state-of-the-art methods. Notably, RGALF captures complex biomarker interactions (BMI–insulin in diabetes) missed by traditional models.
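The description outlines the core idea: grow randomized trees, then replace the constant prediction in each terminal node with an additive logistic model, and average over trees. A minimal sketch of that idea follows, using only the Python standard library. It is not the authors' implementation. Several parts are assumptions made for illustration: the helper names (`basis`, `fit_leaf`, `fit_tree`, `fit_forest`) are hypothetical, splits use a random feature at its median rather than an impurity criterion, a quadratic term per feature stands in for the smooth terms of a real GAM, the leaf model is fitted by plain gradient descent with no smoothing penalty, and the data are simulated.

```python
import math
import random

def sigmoid(z):
    # Clip the linear predictor so exp() cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def basis(x):
    # Additive basis: intercept plus (x_j, x_j^2) for each feature.
    # Assumption: a crude stand-in for GAM smooth terms (no penalty, no interactions).
    row = [1.0]
    for v in x:
        row.extend([v, v * v])
    return row

def fit_leaf(X, y, steps=200, lr=0.1):
    # Pure or empty leaf: fall back to a constant probability.
    if not y or len(set(y)) == 1:
        p = y[0] if y else 0.5
        return lambda x: float(p)
    B = [basis(x) for x in X]
    w = [0.0] * len(B[0])
    for _ in range(steps):
        # Gradient of the mean logistic loss with respect to the weights.
        grad = [0.0] * len(w)
        for b, t in zip(B, y):
            err = sigmoid(sum(wi * bi for wi, bi in zip(w, b))) - t
            for j, bj in enumerate(b):
                grad[j] += err * bj
        w = [wi - lr * g / len(y) for wi, g in zip(w, grad)]
    return lambda x: sigmoid(sum(wi * bi for wi, bi in zip(w, basis(x))))

def fit_tree(X, y, depth, rng, min_leaf=20):
    # Stop splitting when the depth budget is spent or the node is small.
    if depth == 0 or len(y) < 2 * min_leaf:
        return fit_leaf(X, y)
    j = rng.randrange(len(X[0]))                # random split feature (assumption)
    cut = sorted(x[j] for x in X)[len(X) // 2]  # split at the median
    left = [i for i, x in enumerate(X) if x[j] < cut]
    right = [i for i, x in enumerate(X) if x[j] >= cut]
    if len(left) < min_leaf or len(right) < min_leaf:
        return fit_leaf(X, y)
    lo = fit_tree([X[i] for i in left], [y[i] for i in left], depth - 1, rng, min_leaf)
    hi = fit_tree([X[i] for i in right], [y[i] for i in right], depth - 1, rng, min_leaf)
    return lambda x: lo(x) if x[j] < cut else hi(x)

def fit_forest(X, y, n_trees=5, depth=2, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap resample of the training rows for each tree.
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], depth, rng))
    # Forest prediction: average of the per-tree leaf-model probabilities.
    return lambda x: sum(t(x) for t in trees) / len(trees)

# Simulated nonlinear binary data (illustrative only).
data_rng = random.Random(1)
X = [[data_rng.uniform(-2, 2), data_rng.uniform(-2, 2)] for _ in range(200)]
y = [1 if data_rng.random() < sigmoid(math.sin(2 * a) + b * b - 1) else 0 for a, b in X]

predict = fit_forest(X, y)
probs = [predict(x) for x in X]
accuracy = sum((p >= 0.5) == (t == 1) for p, t in zip(probs, y)) / len(y)
print(f"training accuracy: {accuracy:.3f}")
```

The accuracy printed here is measured on the training data of a toy simulation and says nothing about the results reported in the article; a faithful reproduction would need penalized smooth terms and the splitting rule described in the paper itself.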
id doaj-art-211c4cf3f6a94d5c8403f74cda8831e5
institution OA Journals
issn 2227-7390
spelling doaj-art-211c4cf3f6a94d5c8403f74cda8831e5 2025-08-20T02:15:58Z eng
Mathematics (MDPI AG), ISSN 2227-7390, vol. 13, no. 7, article 1214, published 2025-04-01
DOI: 10.3390/math13071214
Oyebayo Ridwan Olaniran: Department of Statistics, Faculty of Physical Sciences, University of Ilorin, Ilorin 1515, Nigeria
Ali Rashash R. Alzahrani: Mathematics Department, Faculty of Sciences, Umm Al-Qura University, Makkah 24382, Saudi Arabia
Nada MohammedSaeed Alharbi: Department of Mathematics, Faculty of Science, Taibah University, Al-Madinah Al-Munawara 42353, Saudi Arabia
Asma Ahmad Alzahrani: Department of Mathematics, Faculty of Science, Al-Baha University, Alaqiq, Al-Baha 65799, Saudi Arabia
topic generalized additive model (GAM)
random forest (RF)
logistic regression (LR)
ensemble methods
binary classification
nonlinearity