A SuperLearner-based pipeline for the development of DNA methylation-derived predictors of phenotypic traits.


Bibliographic Details
Main Authors: Dennis Khodasevich, Nina Holland, Lars van der Laan, Andres Cardenas
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-02-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1012768
collection DOAJ
description
Background: DNA methylation (DNAm) provides a window to characterize the impacts of environmental exposures and the biological aging process. Epigenetic clocks are often trained on DNAm using penalized regression of CpG sites, but recent evidence suggests potential benefits of training epigenetic predictors on principal components.
Methodology/findings: We developed a pipeline to simultaneously train three epigenetic predictors: a traditional CpG Clock, a PCA Clock, and a SuperLearner PCA Clock (SL PCA). We gathered publicly available DNAm datasets to generate i) a novel childhood epigenetic clock, ii) a reconstructed Hannum adult blood clock, and iii) as a proof of concept, a predictor of polybrominated biphenyl exposure using the three developmental methodologies. We used correlation coefficients and median absolute error to assess fit between predicted and observed measures, as well as agreement between duplicates. The SL PCA clocks improved fit with observed phenotypes relative to the PCA clocks or CpG clocks across several datasets. We found evidence for higher agreement between duplicate samples run on alternate DNAm arrays when using SL PCA clocks relative to traditional methods. Analyses examining associations between relevant exposures and epigenetic age acceleration (EAA) produced more precise effect estimates when using predictions derived from SL PCA clocks.
Conclusions: We introduce a novel method for the development of DNAm-based predictors that combines the improved reliability conferred by training on principal components with advanced ensemble-based machine learning. Coupling SuperLearner with PCA in the predictor development process may be especially relevant for studies with longitudinal designs utilizing multiple array types, as well as for the development of predictors of more complex phenotypic traits.
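The abstract names two fit metrics, correlation coefficients and median absolute error, for comparing predicted and observed measures. The following is a minimal Python sketch of how such metrics could be computed. It is not the authors' code: the function names `pearson_r` and `median_absolute_error` and the example ages are hypothetical, and Pearson correlation is an assumption, since the record does not say which correlation coefficient was used.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: the record only says "correlation coefficients";
    Pearson is used here purely for illustration.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs, mean_pred = mean(observed), mean(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_obs * var_pred)


def median_absolute_error(observed, predicted):
    """Median of the absolute differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty, equal-length sequences")
    return median(abs(o - p) for o, p in zip(observed, predicted))


if __name__ == "__main__":
    # Hypothetical chronological ages and clock predictions (years), not study data.
    observed_age = [2.0, 5.0, 9.0, 13.0, 17.0]
    predicted_age = [2.5, 4.0, 9.5, 14.0, 16.0]
    print("correlation:", round(pearson_r(observed_age, predicted_age), 3))
    print("median absolute error:", median_absolute_error(observed_age, predicted_age))
```

The same two functions could in principle be applied to predictions from paired duplicate samples to gauge agreement, although the record does not specify how duplicate agreement was quantified.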
id doaj-art-b482c4187e3d468db2def109da6b08fc
institution Kabale University
issn 1553-734X
1553-7358
citation PLoS Computational Biology 21(2): e1012768, published 2025-02-01, https://doi.org/10.1371/journal.pcbi.1012768