Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification

Ensemble methods have proven highly effective in enhancing predictive performance by combining multiple models. We introduce a novel ensemble approach, the Random Generalized Additive Logistic Forest (RGALF), which integrates generalized additive models (GAMs) within a random forest framework to imp...

Bibliographic Details
Main Authors:	Oyebayo Ridwan Olaniran, Ali Rashash R. Alzahrani, Nada MohammedSaeed Alharbi, Asma Ahmad Alzahrani
Format:	Article
Language:	English
Published:	MDPI AG 2025-04-01
Series:	Mathematics
Subjects:	generalized additive model (GAM); random forest (RF); logistic regression (LR); ensemble methods; binary classification; nonlinearity
Online Access:	https://www.mdpi.com/2227-7390/13/7/1214

_version_	1850188161950416896
author	Oyebayo Ridwan Olaniran; Ali Rashash R. Alzahrani; Nada MohammedSaeed Alharbi; Asma Ahmad Alzahrani
author_facet	Oyebayo Ridwan Olaniran; Ali Rashash R. Alzahrani; Nada MohammedSaeed Alharbi; Asma Ahmad Alzahrani
author_sort	Oyebayo Ridwan Olaniran
collection	DOAJ
description	Ensemble methods have proven highly effective in enhancing predictive performance by combining multiple models. We introduce a novel ensemble approach, the Random Generalized Additive Logistic Forest (RGALF), which integrates generalized additive models (GAMs) within a random forest framework to improve binary classification tasks. Unlike traditional random forests, which rely on piecewise constant predictions in terminal nodes, RGALF fits GAM logistic regression (LR) models to the data in each terminal node, enabling it to capture complex nonlinear relationships and interactions among predictors. By aggregating these node-specific GAMs, RGALF addresses multicollinearity, enhances interpretability, and achieves superior bias–variance tradeoffs, particularly in nonlinear settings. Theoretical analysis confirms that RGALF achieves Stone’s optimal rates for additive models ($\mathcal{O}(n^{-2k/(2k+d)})$) under appropriate conditions, outperforming the slower convergence of traditional random forests ($\mathcal{O}(n^{-2/3})$). Furthermore, empirical results demonstrate RGALF’s effectiveness across both simulated and real-world datasets. In simulations, RGALF demonstrates superior performance over random forests (RFs), reducing variance by up to 69% and bias by 19% in nonlinear settings, with significant MSE improvements (0.032 vs. RF’s 0.054 at $n = 1000$), while achieving optimal convergence rates ($\mathcal{O}(n^{-0.48})$ vs. RF’s $\mathcal{O}(n^{-0.29})$). On real-world medical datasets, RGALF attains near-perfect accuracy and AUC: 100% accuracy/AUC for Heart Failure and Hepatitis C (HCV) prediction, 99% accuracy/100% AUC for Pima Diabetes, and 98.8% accuracy/100% AUC for Indian Liver Patient (ILPD), outperforming state-of-the-art methods. Notably, RGALF captures complex biomarker interactions (BMI–insulin in diabetes) missed by traditional models.
format	Article
id	doaj-art-211c4cf3f6a94d5c8403f74cda8831e5
institution	OA Journals
issn	2227-7390
language	English
publishDate	2025-04-01
publisher	MDPI AG
record_format	Article
series	Mathematics
spelling	doaj-art-211c4cf3f6a94d5c8403f74cda8831e5 | 2025-08-20T02:15:58Z | eng | MDPI AG | Mathematics | 2227-7390 | 2025-04-01 | 13 | 7 | 1214 | 10.3390/math13071214 | Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification | Oyebayo Ridwan Olaniran (Department of Statistics, Faculty of Physical Sciences, University of Ilorin, Ilorin 1515, Nigeria) | Ali Rashash R. Alzahrani (Mathematics Department, Faculty of Sciences, Umm Al-Qura University, Makkah 24382, Saudi Arabia) | Nada MohammedSaeed Alharbi (Department of Mathematics, Faculty of Science, Taibah University, Al-Madinah Al-Munawara 42353, Saudi Arabia) | Asma Ahmad Alzahrani (Department of Mathematics, Faculty of Science, Al-Baha University, Alaqiq, Al-Baha 65799, Saudi Arabia) | Ensemble methods have proven highly effective in enhancing predictive performance by combining multiple models. We introduce a novel ensemble approach, the Random Generalized Additive Logistic Forest (RGALF), which integrates generalized additive models (GAMs) within a random forest framework to improve binary classification tasks. Unlike traditional random forests, which rely on piecewise constant predictions in terminal nodes, RGALF fits GAM logistic regression (LR) models to the data in each terminal node, enabling it to capture complex nonlinear relationships and interactions among predictors. By aggregating these node-specific GAMs, RGALF addresses multicollinearity, enhances interpretability, and achieves superior bias–variance tradeoffs, particularly in nonlinear settings. Theoretical analysis confirms that RGALF achieves Stone’s optimal rates for additive models ($\mathcal{O}(n^{-2k/(2k+d)})$) under appropriate conditions, outperforming the slower convergence of traditional random forests ($\mathcal{O}(n^{-2/3})$). Furthermore, empirical results demonstrate RGALF’s effectiveness across both simulated and real-world datasets. In simulations, RGALF demonstrates superior performance over random forests (RFs), reducing variance by up to 69% and bias by 19% in nonlinear settings, with significant MSE improvements (0.032 vs. RF’s 0.054 at $n = 1000$), while achieving optimal convergence rates ($\mathcal{O}(n^{-0.48})$ vs. RF’s $\mathcal{O}(n^{-0.29})$). On real-world medical datasets, RGALF attains near-perfect accuracy and AUC: 100% accuracy/AUC for Heart Failure and Hepatitis C (HCV) prediction, 99% accuracy/100% AUC for Pima Diabetes, and 98.8% accuracy/100% AUC for Indian Liver Patient (ILPD), outperforming state-of-the-art methods. Notably, RGALF captures complex biomarker interactions (BMI–insulin in diabetes) missed by traditional models. | https://www.mdpi.com/2227-7390/13/7/1214 | generalized additive model (GAM); random forest (RF); logistic regression (LR); ensemble methods; binary classification; nonlinearity
spellingShingle	Oyebayo Ridwan Olaniran; Ali Rashash R. Alzahrani; Nada MohammedSaeed Alharbi; Asma Ahmad Alzahrani | Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification | Mathematics | generalized additive model (GAM); random forest (RF); logistic regression (LR); ensemble methods; binary classification; nonlinearity
title	Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification
title_full	Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification
title_fullStr	Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification
title_full_unstemmed	Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification
title_short	Random Generalized Additive Logistic Forest: A Novel Ensemble Method for Robust Binary Classification
title_sort	random generalized additive logistic forest a novel ensemble method for robust binary classification
topic	generalized additive model (GAM); random forest (RF); logistic regression (LR); ensemble methods; binary classification; nonlinearity
url	https://www.mdpi.com/2227-7390/13/7/1214
work_keys_str_mv	AT oyebayoridwanolaniran randomgeneralizedadditivelogisticforestanovelensemblemethodforrobustbinaryclassification AT alirashashralzahrani randomgeneralizedadditivelogisticforestanovelensemblemethodforrobustbinaryclassification AT nadamohammedsaeedalharbi randomgeneralizedadditivelogisticforestanovelensemblemethodforrobustbinaryclassification AT asmaahmadalzahrani randomgeneralizedadditivelogisticforestanovelensemblemethodforrobustbinaryclassification
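
Note on the rates quoted in the description: the short Python sketch below is not from the article; it only works through the arithmetic behind the numbers in the abstract. It assumes that k and d in the exponent 2k/(2k+d) are positive integers (the abstract does not define them), and it treats the empirical rates as exact power laws with equal constants, which is a simplification. All function names are hypothetical.

```python
from fractions import Fraction

# Random forest rate O(n^{-2/3}) as quoted in the abstract.
RF_EXPONENT = Fraction(2, 3)


def stone_exponent(k: int, d: int) -> Fraction:
    """Exponent a in O(n^{-a}) with a = 2k / (2k + d), as written in the abstract.

    Assumption: k and d are positive integers; the abstract does not define them.
    """
    if k <= 0 or d <= 0:
        raise ValueError("k and d must be positive")
    return Fraction(2 * k, 2 * k + d)


def beats_rf_rate(k: int, d: int) -> bool:
    """True when 2k/(2k+d) > 2/3, which simplifies algebraically to k > d."""
    return stone_exponent(k, d) > RF_EXPONENT


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline`."""
    return (baseline - improved) / baseline


def sample_growth_to_halve_error(exponent: float) -> float:
    """Factor by which n must grow to halve an error scaling exactly as n^{-exponent}.

    Assumption: a pure power law with a fixed constant, which O(.) does not guarantee.
    """
    return 2.0 ** (1.0 / exponent)


if __name__ == "__main__":
    print(stone_exponent(2, 1))                        # 4/5
    print(beats_rf_rate(2, 1), beats_rf_rate(1, 1))    # True False
    # MSE 0.032 (RGALF) vs. 0.054 (RF) at n = 1000, from the abstract.
    print(f"{relative_reduction(0.054, 0.032):.1%}")   # 40.7%
    # Empirical exponents 0.48 (RGALF) and 0.29 (RF), from the abstract.
    print(sample_growth_to_halve_error(0.48))
    print(sample_growth_to_halve_error(0.29))
```

Under these assumptions, the theoretical additive-model exponent exceeds 2/3 only when k > d, and the reported MSE values correspond to a relative reduction of roughly 41%.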

Similar Items