Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages

Abstract Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR)-based biobanks contain genetic data and diagnostic information, which can identify preclinical individuals at risk for progression. Biobanks typically have small numbers of cases, which ar...

Bibliographic Details
Main Authors: Chen Wang, Havell Markus, Avantika R. Diwadkar, Chachrit Khunsriraksakul, Laura Carrel, Bingshan Li, Xue Zhong, Xingyan Wang, Xiaowei Zhan, Galen T. Foulke, Nancy J. Olsen, Dajiang J. Liu, Bibo Jiang
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-024-55636-6
author Chen Wang
Havell Markus
Avantika R. Diwadkar
Chachrit Khunsriraksakul
Laura Carrel
Bingshan Li
Xue Zhong
Xingyan Wang
Xiaowei Zhan
Galen T. Foulke
Nancy J. Olsen
Dajiang J. Liu
Bibo Jiang
author_sort Chen Wang
collection DOAJ
description Abstract Autoimmune diseases often exhibit a preclinical stage before diagnosis. Electronic health record (EHR)-based biobanks contain genetic data and diagnostic information, which can identify preclinical individuals at risk for progression. Biobanks typically have small numbers of cases, which are not sufficient to construct accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may share a genetic basis, which we can exploit to improve prediction accuracy. We propose a novel method, Genetic Progression Score (GPS), that integrates biobank and case-control studies to predict disease progression risk. Via penalized regression, GPS incorporates PRS weights from case-control studies as a prior and forces model parameters to be similar to the prior if the prior improves prediction accuracy. In simulations, GPS consistently yields better prediction accuracy than alternative strategies that rely on biobank or case-control samples alone, and than those that combine biobank and case-control samples. The improvement is particularly evident when the biobank sample is smaller or the genetic correlation is lower. We derive PRS for progression from preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us. For both diseases, GPS achieves the highest prediction $R^{2}$, and the resulting PRS yields the strongest correlation with progression prevalence.
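The description above says that GPS uses penalized regression to pull model parameters toward prior PRS weights when the prior helps prediction. The following is a minimal sketch of that general idea only, not the authors' GPS implementation: it assumes a squared-error loss, a ridge-style penalty on the distance to the prior, and penalty selection on a held-out set. The function names `ridge_toward_prior` and `choose_lambda`, and all parameter names, are hypothetical.

```python
import numpy as np


def ridge_toward_prior(X, y, beta_prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta_prior||^2.

    Illustrative assumption: a ridge-style penalty toward the prior weights.
    Closed form: b = (X'X + lam I)^{-1} (X'y + lam * beta_prior).
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    rhs = X.T @ y + lam * beta_prior
    return np.linalg.solve(A, rhs)


def choose_lambda(X_train, y_train, X_val, y_val, beta_prior, lams):
    """Return (lam, validation MSE, weights) with the lowest validation MSE.

    A large selected lam means the prior is trusted; a small one means the
    biobank data alone drives the fit.
    """
    best = None
    for lam in lams:
        b = ridge_toward_prior(X_train, y_train, beta_prior, lam)
        mse = float(np.mean((y_val - X_val @ b) ** 2))
        if best is None or mse < best[1]:
            best = (lam, mse, b)
    return best
```

With `lam = 0` the estimate reduces to ordinary least squares on the biobank sample, and as `lam` grows the estimate approaches the prior weights. Choosing `lam` on held-out data is one simple way to keep the prior's influence only when it improves prediction.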
format Article
id doaj-art-dcdb9d33eb044eb0897989b8468b8fa4
institution Kabale University
issn 2041-1723
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-dcdb9d33eb044eb0897989b8468b8fa4; 2025-01-05T12:37:58Z; eng; Nature Portfolio; Nature Communications; ISSN 2041-1723; 2025-01-01; vol. 16, no. 1, pp. 1-17; doi:10.1038/s41467-024-55636-6
Author affiliations:
Chen Wang: Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University
Havell Markus: Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University
Avantika R. Diwadkar: Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University
Chachrit Khunsriraksakul: Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University
Laura Carrel: Department of Biochemistry and Molecular Biology, College of Medicine, Penn State University
Bingshan Li: Department of Molecular Physiology & Biophysics, Vanderbilt University
Xue Zhong: Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center
Xingyan Wang: Department of Public Health Sciences, College of Medicine, Penn State University
Xiaowei Zhan: Department of Statistical Science, Southern Methodist University
Galen T. Foulke: Department of Public Health Sciences, College of Medicine, Penn State University
Nancy J. Olsen: Department of Medicine, College of Medicine, Penn State University
Dajiang J. Liu: Bioinformatics and Genomics Graduate Program, College of Medicine, Penn State University
Bibo Jiang: Department of Public Health Sciences, College of Medicine, Penn State University
title Integrating electronic health records and GWAS summary statistics to predict the progression of autoimmune diseases from preclinical stages
url https://doi.org/10.1038/s41467-024-55636-6