Mixed effect gradient boosting for high-dimensional longitudinal data
Abstract: High-dimensional longitudinal data present significant analytical challenges due to intricate within-subject correlations and an overwhelming ratio of predictors to observations. To address these challenges, we introduce Mixed-Effect Gradient Boosting (MEGB), a novel R package that synergis...
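| Main Authors: | Oyebayo Ridwan Olaniran, Saidat Fehintola Olaniran, Jeza Allohibi, Abdulmajeed Atiah Alharbi, Nada MohammedSaeed Alharbi |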
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
| Subjects: | Mixed Effect Model; Longitudinal Data; Gradient Boosting; High-dimensional Data |
| Online Access: | https://doi.org/10.1038/s41598-025-16526-z |
| _version_ | 1849226344194375680 |
|---|---|
| author | Oyebayo Ridwan Olaniran, Saidat Fehintola Olaniran, Jeza Allohibi, Abdulmajeed Atiah Alharbi, Nada MohammedSaeed Alharbi |
| collection | DOAJ |
| description | High-dimensional longitudinal data present significant analytical challenges due to intricate within-subject correlations and an overwhelming ratio of predictors to observations. To address these challenges, we introduce Mixed-Effect Gradient Boosting (MEGB), a novel R package that synergises gradient boosting with mixed-effects modelling to simultaneously account for population-level fixed effects and subject-specific random variability. MEGB provides a unified framework for analysing repeated-measures data that accommodates complex covariance structures while harnessing gradient boosting’s inherent regularisation for robust feature selection and prediction. In comprehensive simulations spanning linear and nonlinear data-generating processes, MEGB achieved 35-76% lower mean squared error (MSE) than state-of-the-art alternatives such as Mixed-Effect Random Forests (MERF) and REEMForest, while maintaining 55-70% true positive rates for variable selection in ultra-high-dimensional regimes ($p = 2000$). Demonstrating practical utility, we applied MEGB to maternal cell-free plasma RNA data ($n = 12$ subjects, $p = 33{,}297$ transcripts), where it identified 9 key placental transcripts driving fetal RNA dynamics across pregnancy trimesters. |
| format | Article |
| id | doaj-art-40c7585868ba45b0b01ee59dde8cd644 |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| affiliation | Oyebayo Ridwan Olaniran: Department of Statistics, Faculty of Physical Sciences, University of Ilorin; Saidat Fehintola Olaniran: Department of Statistics and Mathematical Sciences, Faculty of Pure and Applied Sciences, Kwara State University; Jeza Allohibi, Abdulmajeed Atiah Alharbi, Nada MohammedSaeed Alharbi: Department of Mathematics, Taibah University, Faculty of Science |
| title | Mixed effect gradient boosting for high-dimensional longitudinal data |
| topic | Mixed Effect Model; Longitudinal Data; Gradient Boosting; High-dimensional Data |
| url | https://doi.org/10.1038/s41598-025-16526-z |
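
The abstract describes MEGB only at a high level. As a rough illustration of what combining gradient boosting with mixed-effects modelling can look like, here is a minimal Python sketch. MEGB itself is an R package, and nothing below reflects its code or API. The sketch assumes a random-intercept-only model $y_{ij} = f(x_{ij}) + b_i + \varepsilon_{ij}$. It alternates, in an EM style similar in spirit to MERF, between boosting the fixed part on $y$ minus the current random intercepts and updating the intercepts and variance components. For brevity it swaps in componentwise linear L2-boosting as the base learner, which also performs implicit variable selection. MEGB's own base learners and update rules may differ. The function names and the simulated data are hypothetical.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2-boosting with single-column least-squares base learners.

    Each step fits every column of X to the current residuals by least squares
    and updates only the column with the largest drop in residual sum of
    squares, shrunk by the learning rate nu. Columns of X must be centred.
    """
    intercept = y.mean()
    resid = y - intercept
    beta = np.zeros(X.shape[1])
    col_ss = np.maximum((X ** 2).sum(axis=0), 1e-12)  # guard constant columns
    for _ in range(n_steps):
        slopes = X.T @ resid / col_ss        # univariate LS slope per column
        rss_drop = slopes ** 2 * col_ss      # RSS reduction for each column
        j = int(np.argmax(rss_drop))         # best single predictor this step
        beta[j] += nu * slopes[j]
        resid = resid - nu * slopes[j] * X[:, j]
    return intercept, beta


def mixed_effect_boost(X, y, groups, n_iter=20, n_steps=200, nu=0.1):
    """Hypothetical sketch (not the MEGB package): random-intercept model
    y_ij = f(x_ij) + b_i + e_ij, b_i ~ N(0, tau2), e_ij ~ N(0, sigma2)."""
    Xc = X - X.mean(axis=0)
    _, g = np.unique(groups, return_inverse=True)  # subject index per row
    m = g.max() + 1
    n_i = np.bincount(g, minlength=m)
    b = np.zeros(m)
    sigma2, tau2 = 1.0, 1.0
    for _ in range(n_iter):
        # 1) Fixed part: boost on the response with current intercepts removed.
        a, beta = componentwise_l2_boost(Xc, y - b[g], n_steps, nu)
        r = y - (a + Xc @ beta)
        # 2) Random part: BLUP b_i = tau2 * sum_j r_ij / (sigma2 + n_i * tau2).
        cond_var = tau2 * sigma2 / (sigma2 + n_i * tau2)  # Var(b_i | y), old variances
        b = tau2 * np.bincount(g, weights=r, minlength=m) / (sigma2 + n_i * tau2)
        # 3) EM updates of the two variance components.
        eps = r - b[g]
        sigma2 = (np.sum(eps ** 2) + np.sum(n_i * cond_var)) / len(y)
        tau2 = np.mean(b ** 2 + cond_var)
    return a, beta, b, sigma2, tau2


# Usage on simulated data: 30 subjects, 5 visits each, 50 predictors, 3 active.
rng = np.random.default_rng(0)
m, n_per, p = 30, 5, 50
groups = np.repeat(np.arange(m), n_per)
X = rng.normal(size=(m * n_per, p))
true_b = rng.normal(scale=1.0, size=m)
y = (3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2]
     + true_b[groups] + rng.normal(scale=0.5, size=m * n_per))

a, beta, b_hat, sigma2, tau2 = mixed_effect_boost(X, y, groups)
top3 = {int(k) for k in np.argsort(-np.abs(beta))[:3]}
print("top predictors:", sorted(top3))
print("corr(b_hat, true_b):", np.corrcoef(b_hat, true_b)[0, 1])
```

In this sketch the regularisation comes from stopping the inner boosting early through a small `n_steps` and `nu`. Both are fixed here rather than tuned, for example by cross-validation over subjects.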