Learning Gaussian graphical models from correlated data
Gaussian Graphical Models (GGMs) are a type of network model that uses partial correlation rather than correlation to represent complex relationships among multiple variables. The advantage of partial correlation is that it shows the relation between two variables after “adjusting” for the ef...
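To make the contrast between correlation and partial correlation concrete, here is a minimal sketch in plain Python. It is not taken from the article: the toy data, the variable names, and the helper functions `pearson` and `partial_corr` are all assumptions made for illustration. It uses the standard first-order formula for the partial correlation of two variables given a single third variable.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after adjusting for one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical toy data: x and y are both driven by z and have no
# direct relation to each other.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

marginal = pearson(x, y)          # clearly non-zero, induced by the shared driver z
adjusted = partial_corr(x, y, z)  # close to zero once z is adjusted for
print(f"correlation: {marginal:.2f}, partial correlation: {adjusted:.2f}")
```

In a GGM built on data like this, an edge would join x to z and y to z but not x to y, which is the sense in which partial correlation gives a more parsimonious network.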
| Main Authors: | Zeyuan Song, Sophia Gunn, Stefano Monti, Gina M. Peloso, Ching-Ti Liu, Kathryn Lunetta, Paola Sebastiani |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A. 2025-07-01 |
| Series: | Frontiers in Systems Biology |
| Subjects: | Gaussian graphical models; correlated data; bootstrap; polygenic risk score; partial correlation |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fsysb.2025.1589079/full |
| _version_ | 1849708983507484672 |
|---|---|
| author | Zeyuan Song; Sophia Gunn; Stefano Monti; Gina M. Peloso; Ching-Ti Liu; Kathryn Lunetta; Paola Sebastiani |
| author_sort | Zeyuan Song |
| collection | DOAJ |
| description | Gaussian Graphical Models (GGMs) are a type of network model that uses partial correlation rather than correlation to represent complex relationships among multiple variables. The advantage of partial correlation is that it shows the relation between two variables after “adjusting” for the effects of other variables, which leads to more parsimonious and interpretable models. There are well-established procedures for building GGMs from a sample of independent and identically distributed observations. However, many studies include clustered and longitudinal data that result in correlated observations, and ignoring this correlation can lead to inflated Type I error. In this paper, we propose a cluster-based bootstrap algorithm to infer GGMs from correlated data. We use extensive simulations of correlated data from family-based studies to show that the proposed bootstrap method does not inflate the Type I error while retaining statistical power compared to alternative solutions when there is a sufficient number of clusters. We apply our method to learn the Gaussian Graphical Model that represents complex relations among 47 Polygenic Risk Scores generated using genome-wide genotype data from the Long Life Family Study. By comparing it to conventional methods that ignore within-cluster correlation, we show that our method controls the Type I error well without loss of power. |
| format | Article |
| id | doaj-art-850c62e08435461dbfc02f64a138c2de |
| institution | DOAJ |
| issn | 2674-0702 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Systems Biology |
| spelling | doaj-art-850c62e08435461dbfc02f64a138c2de; 2025-08-20T03:15:27Z; eng; Frontiers Media S.A.; Frontiers in Systems Biology; 2674-0702; 2025-07-01; 5; 10.3389/fsysb.2025.1589079; 1589079; Learning Gaussian graphical models from correlated data; Zeyuan Song (Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States; Department of Medicine, Tufts University School of Medicine, Boston, MA, United States); Sophia Gunn (The New York Genome Center, New York, NY, United States); Stefano Monti (Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States; Bioinformatics Program, Boston University, Boston, MA, United States); Gina M. Peloso (Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States); Ching-Ti Liu (Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States); Kathryn Lunetta (Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States); Paola Sebastiani (Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, United States; Department of Medicine, Tufts University School of Medicine, Boston, MA, United States; Data Intensive Study Center, Tufts University, Medford, MA, United States); https://www.frontiersin.org/articles/10.3389/fsysb.2025.1589079/full; Gaussian graphical models; correlated data; bootstrap; polygenic risk score; partial correlation |
| title | Learning Gaussian graphical models from correlated data |
| topic | Gaussian graphical models; correlated data; bootstrap; polygenic risk score; partial correlation |
| url | https://www.frontiersin.org/articles/10.3389/fsysb.2025.1589079/full |
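
The description in this record mentions a cluster-based bootstrap for correlated observations but does not spell out its steps. The sketch below therefore shows only the generic idea of a cluster bootstrap and should not be read as the authors' algorithm: whole clusters (for example, families) are resampled with replacement so that within-cluster correlation is preserved in every replicate, and a percentile interval is formed for one partial correlation. The simulated families, the function names, and the choice of a percentile interval are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr_xy_given_z(rows):
    """Partial correlation of columns x and y given z, for rows of (x, y, z)."""
    x, y, z = zip(*rows)
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def cluster_bootstrap_interval(clusters, stat, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval for stat, resampling whole clusters with replacement."""
    rng = random.Random(seed)
    ids = list(clusters)
    estimates = []
    for _ in range(n_boot):
        # Draw as many cluster ids as there are clusters, with replacement,
        # and keep every member of each drawn cluster together.
        drawn = [rng.choice(ids) for _ in ids]
        rows = [row for cid in drawn for row in clusters[cid]]
        estimates.append(stat(rows))
    estimates.sort()
    low = estimates[int(alpha / 2 * n_boot)]
    high = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Hypothetical toy data: 100 families of 4 members. A family-level effect
# makes members of the same family correlated, and y depends directly on x.
rng = random.Random(1)
clusters = {}
for fam in range(100):
    shared = rng.gauss(0, 1)
    members = []
    for _ in range(4):
        z = rng.gauss(0, 1)
        x = z + shared + rng.gauss(0, 1)
        y = x + z + rng.gauss(0, 1)
        members.append((x, y, z))
    clusters[fam] = members

ci_low, ci_high = cluster_bootstrap_interval(clusters, partial_corr_xy_given_z)
# In this sketch an edge is kept when the interval excludes zero.
edge_supported = not (ci_low <= 0.0 <= ci_high)
print(f"95% interval: ({ci_low:.2f}, {ci_high:.2f}), edge kept: {edge_supported}")
```

Resampling individual rows instead of clusters would treat correlated family members as independent, which is the situation the description warns can inflate the Type I error.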