Mixed effect gradient boosting for high-dimensional longitudinal data

Bibliographic Details
Main Authors: Oyebayo Ridwan Olaniran, Saidat Fehintola Olaniran, Jeza Allohibi, Abdulmajeed Atiah Alharbi, Nada MohammedSaeed Alharbi
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Reports
Subjects: Mixed Effect Model; Longitudinal Data; Gradient Boosting; High-dimensional Data
Online Access: https://doi.org/10.1038/s41598-025-16526-z
author Oyebayo Ridwan Olaniran
Saidat Fehintola Olaniran
Jeza Allohibi
Abdulmajeed Atiah Alharbi
Nada MohammedSaeed Alharbi
collection DOAJ
description Abstract: High-dimensional longitudinal data present significant analytical challenges due to intricate within-subject correlations and an overwhelming ratio of predictors to observations. To address these challenges, we introduce Mixed-Effect Gradient Boosting (MEGB), a novel R package that synergises gradient boosting with mixed-effects modelling to simultaneously account for population-level fixed effects and subject-specific random variability. MEGB provides a unified framework for analysing repeated measures data that accommodates complex covariance structures while harnessing gradient boosting’s inherent regularisation for robust feature selection and prediction. In comprehensive simulations spanning linear and nonlinear data-generating processes, MEGB achieved 35-76% lower mean squared error (MSE) compared to state-of-the-art alternatives like Mixed-Effect Random Forests (MERF) and REEMForest, while maintaining 55-70% true positive rates for variable selection in ultra-high-dimensional regimes ($p = 2000$). Demonstrating practical utility, we applied MEGB to maternal cell-free plasma RNA data ($n = 12$ subjects, $p = 33{,}297$ transcripts), where it identified 9 key placental transcripts driving fetal RNA dynamics across pregnancy trimesters.
format Article
id doaj-art-40c7585868ba45b0b01ee59dde8cd644
institution Kabale University
issn 2045-2322
language English
publishDate 2025-08-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-40c7585868ba45b0b01ee59dde8cd644; 2025-08-24T11:26:01Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-08-01; volume 15, issue 1, pages 1-24; 10.1038/s41598-025-16526-z
Mixed effect gradient boosting for high-dimensional longitudinal data
Oyebayo Ridwan Olaniran (Department of Statistics, Faculty of Physical Sciences, University of Ilorin)
Saidat Fehintola Olaniran (Department of Statistics and Mathematical Sciences, Faculty of Pure and Applied Sciences, Kwara State University)
Jeza Allohibi (Department of Mathematics, Taibah University, Faculty of Science)
Abdulmajeed Atiah Alharbi (Department of Mathematics, Taibah University, Faculty of Science)
Nada MohammedSaeed Alharbi (Department of Mathematics, Taibah University, Faculty of Science)
title Mixed effect gradient boosting for high-dimensional longitudinal data
topic Mixed Effect Model
Longitudinal Data
Gradient Boosting
High-dimensional Data
url https://doi.org/10.1038/s41598-025-16526-z