Learning Gaussian graphical models from correlated data

Bibliographic Details
Main Authors: Zeyuan Song, Sophia Gunn, Stefano Monti, Gina M. Peloso, Ching-Ti Liu, Kathryn Lunetta, Paola Sebastiani
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-07-01
Series: Frontiers in Systems Biology
Subjects: Gaussian graphical models; correlated data; bootstrap; polygenic risk score; partial correlation
Online Access: https://www.frontiersin.org/articles/10.3389/fsysb.2025.1589079/full
Collection: DOAJ
Description: Gaussian Graphical Models (GGMs) are a type of network model that uses partial correlation rather than correlation to represent complex relationships among multiple variables. Partial correlation shows the relation between two variables after “adjusting” for the effects of the other variables, which leads to more parsimonious and interpretable models. There are well-established procedures to build GGMs from a sample of independent and identically distributed observations. However, many studies include clustered and longitudinal data that result in correlated observations, and ignoring this correlation among observations can lead to inflated Type I error. In this paper, we propose a cluster-based bootstrap algorithm to infer GGMs from correlated data. We use extensive simulations of correlated data from family-based studies to show that, when there is a sufficient number of clusters, the proposed bootstrap method does not inflate the Type I error while retaining statistical power compared to alternative solutions. We apply our method to learn the Gaussian Graphical Model that represents complex relations among 47 Polygenic Risk Scores generated using genome-wide genotype data from the Long Life Family Study. By comparing it to conventional methods that ignore within-cluster correlation, we show that our method controls the Type I error well without power loss.
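The abstract names a cluster-based bootstrap but does not spell out its steps, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that whole clusters (e.g. families) are resampled with replacement, that partial correlations are recomputed from the inverse sample covariance matrix Ω using the standard identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), and that an edge is kept when a percentile interval for that pair excludes zero. The function names, the percentile decision rule, the defaults for `n_boot` and `alpha`, and the toy family data are all hypothetical choices made for illustration; the pure-Python matrix inversion is only practical for a handful of variables.

```python
import random
from collections import defaultdict


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I]; row operations turn it into [I | A^-1].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def covariance(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def partial_correlations(rows):
    """Partial correlation of each pair given all other variables,
    computed from the precision matrix omega = inverse covariance."""
    omega = invert(covariance(rows))
    p = len(omega)
    return [[1.0 if i == j else -omega[i][j] / (omega[i][i] * omega[j][j]) ** 0.5
             for j in range(p)] for i in range(p)]


def cluster_bootstrap_edges(rows, clusters, n_boot=200, alpha=0.05, seed=0):
    """Hypothetical sketch: resample whole clusters with replacement, recompute
    partial correlations, and flag an edge when the percentile interval
    for that pair excludes zero. Returns {(i, j): (lower, upper, is_edge)}."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for row, c in zip(rows, clusters):
        by_cluster[c].append(row)
    ids = list(by_cluster)
    p = len(rows[0])
    draws = defaultdict(list)
    for _ in range(n_boot):
        # Draw clusters, not individuals, so within-cluster correlation is kept intact.
        sample = [row for c in rng.choices(ids, k=len(ids)) for row in by_cluster[c]]
        pc = partial_correlations(sample)
        for i in range(p):
            for j in range(i + 1, p):
                draws[(i, j)].append(pc[i][j])
    edges = {}
    for pair, vals in draws.items():
        vals.sort()
        lower = vals[int(alpha / 2 * (n_boot - 1))]
        upper = vals[int((1 - alpha / 2) * (n_boot - 1))]
        edges[pair] = (lower, upper, lower > 0 or upper < 0)
    return edges


# Toy data: 100 hypothetical families of 4 members, with a chain X0 -> X1 -> X2
# and a shared family effect on X0 that makes members of a family correlated.
rng = random.Random(1)
rows, families = [], []
for fam in range(100):
    shared = rng.gauss(0, 1)
    for _ in range(4):
        x = shared + rng.gauss(0, 1)
        y = 0.8 * x + rng.gauss(0, 1)
        z = 0.8 * y + rng.gauss(0, 1)
        rows.append([x, y, z])
        families.append(fam)

edges = cluster_bootstrap_edges(rows, families)
for (i, j), (lower, upper, keep) in sorted(edges.items()):
    print(f"X{i}-X{j}: interval [{lower:.2f}, {upper:.2f}] edge={keep}")
```

In this toy chain X0 and X2 are conditionally independent given X1, so the X0-X2 interval would be expected to cover zero in most runs while the X0-X1 and X1-X2 edges are retained. Resampling families rather than individuals is the design choice that keeps the within-cluster dependence in every bootstrap replicate; a naive individual-level bootstrap would treat family members as independent.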
Record ID: doaj-art-850c62e08435461dbfc02f64a138c2de
ISSN: 2674-0702
Volume: 5
Article number: 1589079
DOI: 10.3389/fsysb.2025.1589079
Affiliations:
Zeyuan Song: Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States; Department of Medicine, Tufts University School of Medicine, Boston, MA, United States
Sophia Gunn: The New York Genome Center, New York, NY, United States
Stefano Monti: Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States; Bioinformatics Program, Boston University, Boston, MA, United States
Gina M. Peloso: Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
Ching-Ti Liu: Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
Kathryn Lunetta: Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
Paola Sebastiani: Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States; Department of Medicine, Tufts University School of Medicine, Boston, MA, United States; Data Intensive Study Center, Tufts University, Medford, MA, United States
Keywords: Gaussian graphical models; correlated data; bootstrap; polygenic risk score; partial correlation