Regularized regression outperforms trees for predicting cognitive function in the Health and Retirement Study

Bibliographic Details
Main Authors: Kyle Masato Ishikawa, Deborah Taira, Joseph Keaweʻaimoku Kaholokula, Matthew Uechi, James Davis, Eunjung Lim
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Machine Learning with Applications
Online Access: http://www.sciencedirect.com/science/article/pii/S2666827025000775
Description
Summary: Background: Generalized linear models have been favored in healthcare research for their interpretability. In contrast, tree-based models, such as random forests or boosted trees, are often preferred in machine learning (ML) and commercial settings for their strong predictive performance. For clinical applications, however, model interpretability remains essential for actionable results and patient understanding. This study used ML to detect cognitive decline for the purpose of timely screening and uncovering associations with psychosocial determinants. All models were interpreted to enhance transparency and understanding of their predictions.

Methods: Data from the 2018 to 2020 Health and Retirement Study were used to create three linear regression models and three tree-based models. Ten percent of the sample was withheld for estimating performance, and model tuning used five-fold cross-validation with two repeats. Survey frequency weights were applied during tuning, training, and final evaluation. Model performance was evaluated using RMSE and R²; interpretability was assessed via coefficients, variable importance, and decision trees.

Results: The elastic net model had the best performance (RMSE = 3.520, R² = 0.435), followed by standard linear regression, boosted trees, random forest, multivariate adaptive regression splines, and lastly decision trees. Across all models, baseline cognitive function and frequency of computer use were the most influential predictors.

Conclusion: Elastic net regression outperformed tree-based models, suggesting that cognitive outcomes may be best modeled with additive linear relationships. Its ability to remove correlated and weak predictors contributed to its balance of interpretability and predictive performance for this particular dataset.
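The evaluation protocol described in the Methods (10% holdout, weighted five-fold cross-validation with two repeats, elastic net tuning, RMSE and R² scoring) can be sketched with scikit-learn. This is a minimal illustration only: the data, weights, and hyperparameter grid below are synthetic stand-ins, not the study's HRS variables or settings.

```python
# Illustrative sketch of the abstract's pipeline: elastic net with a 10% holdout,
# repeated 5-fold CV for tuning, and sample weights standing in for survey
# frequency weights. All data and hyperparameters here are synthetic.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, RepeatedKFold, train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
# Simulate an additive linear outcome (e.g., a cognitive score) plus noise
y = X @ rng.normal(size=p) + rng.normal(scale=3.0, size=n)
w = rng.uniform(0.5, 2.0, size=n)  # stand-in for survey frequency weights

# Withhold 10% of the sample for final performance estimation
X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(
    X, y, w, test_size=0.10, random_state=0)

# Five-fold cross-validation with two repeats for hyperparameter tuning
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)
grid = GridSearchCV(
    ElasticNet(max_iter=10_000),
    param_grid={"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]},
    cv=cv,
    scoring="neg_root_mean_squared_error",
)
# sample_weight is forwarded to each fold's ElasticNet.fit call
grid.fit(X_tr, y_tr, sample_weight=w_tr)

# Weighted RMSE and R² on the held-out 10%
pred = grid.predict(X_te)
rmse = mean_squared_error(y_te, pred, sample_weight=w_te) ** 0.5
r2 = r2_score(y_te, pred, sample_weight=w_te)
print(f"holdout RMSE = {rmse:.3f}, R² = {r2:.3f}")
```

Elastic net's L1 component drives the coefficients of weak or redundant predictors to zero, which is the interpretability-performance balance the Conclusion attributes to it.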
ISSN: 2666-8270