Efficient and accurate framework for genome-wide gene-environment interaction analysis in large-scale biobanks
| Main Authors: | Yuzhuo Ma, Yanlong Zhao, Ji-Feng Zhang, Wenjian Bi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-025-57887-3 |

| author | Yuzhuo Ma, Yanlong Zhao, Ji-Feng Zhang, Wenjian Bi |
|---|---|
| collection | DOAJ |
| description | Gene-environment interaction (G×E) analysis elucidates the interplay between genetic and environmental factors. Genome-wide association studies (GWAS) have expanded to encompass complex traits such as time-to-event and ordinal traits, which provide richer phenotypic information. However, most existing scalable approaches focus only on quantitative or binary traits. Here we propose SPAGxECCT, a scalable and accurate framework for diverse trait types. SPAGxECCT fits a genotype-independent model and employs a hybrid strategy including saddlepoint approximation (SPA) for accurate p value calculation, especially for low-frequency variants and unbalanced phenotypic distributions. We extend SPAGxECCT to SPAGxEmixCCT, which accounts for population stratification and is applicable to multi-ancestry or admixed populations. SPAGxEmixCCT can further be extended to SPAGxEmixCCT-local, which identifies ancestry-specific G×E effects using local ancestry. Through extensive simulations and real data analyses of UK Biobank data, we demonstrate that SPAGxECCT and SPAGxEmixCCT scale to large study cohorts, effectively control type I error rates, and maintain power. |
| format | Article |
| id | doaj-art-c5b50c36dacf4291983fde5acdf1e236 |
| institution | OA Journals |
| issn | 2041-1723 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| series | Nature Communications |
| affiliation | Yuzhuo Ma, Wenjian Bi: Department of Medical Genetics, School of Basic Medical Sciences, Peking University; Yanlong Zhao, Ji-Feng Zhang: State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences |
| title | Efficient and accurate framework for genome-wide gene-environment interaction analysis in large-scale biobanks |
| url | https://doi.org/10.1038/s41467-025-57887-3 |
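
The abstract mentions a hybrid strategy that uses saddlepoint approximation (SPA) for p values. The sketch below is a generic illustration of that idea and is not the SPAGxECCT implementation, because this record does not give its test statistic, residual definition, or cutoffs. It assumes a score statistic S = Σ(G_i − 2p)·r_i with fixed residuals r_i from a genotype-independent null model and i.i.d. Binomial(2, p) genotypes. It uses the normal approximation when |S|/sd < 2 (an assumed cutoff) and otherwise uses the Lugannani-Rice saddlepoint formula for both tails. All function names are hypothetical.

```python
import math


def norm_sf(x):
    """Upper-tail probability P(Z > x) of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _softplus(a):
    # Numerically stable log(1 + exp(a)).
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def _sigmoid(a):
    # Numerically stable 1 / (1 + exp(-a)).
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def cgf(t, resid, maf):
    """K(t), K'(t), K''(t) of S = sum_i (G_i - 2p) r_i with G_i ~ Binomial(2, p) i.i.d."""
    logit = math.log(maf / (1.0 - maf))
    k0 = k1 = k2 = 0.0
    for r in resid:
        a = logit + t * r
        q = _sigmoid(a)  # allele probability under the exponentially tilted law
        k0 += 2.0 * (math.log(1.0 - maf) + _softplus(a)) - 2.0 * maf * t * r
        k1 += 2.0 * r * (q - maf)
        k2 += 2.0 * r * r * q * (1.0 - q)
    return k0, k1, k2


def solve_saddlepoint(s, resid, maf):
    """Solve K'(t) = s by bracketing and bisection (K' is increasing in t)."""
    if s == 0.0:
        return 0.0
    lo, hi = (0.0, 1.0) if s > 0 else (-1.0, 0.0)
    for _ in range(60):  # widen the bracket until K'(lo) <= s <= K'(hi)
        if cgf(lo, resid, maf)[1] <= s <= cgf(hi, resid, maf)[1]:
            break
        if s > 0:
            lo, hi = hi, hi * 2.0
        else:
            lo, hi = lo * 2.0, lo
    else:
        raise ValueError("s lies outside the range of the score statistic")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cgf(mid, resid, maf)[1] < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def spa_tail(s, resid, maf, upper=True):
    """Lugannani-Rice approximation of P(S >= s) if upper, else P(S <= s)."""
    zeta = solve_saddlepoint(s, resid, maf)
    k0, _, k2 = cgf(zeta, resid, maf)
    w = math.copysign(math.sqrt(max(2.0 * (zeta * s - k0), 0.0)), zeta)
    v = zeta * math.sqrt(k2)
    if upper:
        return norm_sf(w) + norm_pdf(w) * (1.0 / v - 1.0 / w)
    return norm_sf(-w) + norm_pdf(w) * (1.0 / w - 1.0 / v)


def hybrid_pvalue(s, resid, maf, cutoff=2.0):
    """Two-sided p value: normal approximation near zero, SPA in the tails."""
    var = 2.0 * maf * (1.0 - maf) * sum(r * r for r in resid)
    z = s / math.sqrt(var)
    if abs(z) < cutoff:
        return 2.0 * norm_sf(abs(z))
    a = abs(s)
    return spa_tail(a, resid, maf, True) + spa_tail(-a, resid, maf, False)
```

For example, with `resid = [((i % 7) - 3) / 3.0 for i in range(1000)]` and `maf = 0.01`, `hybrid_pvalue` returns the plain normal p value for statistics within two standard deviations of zero and a saddlepoint-based value beyond that. The saddlepoint branch matters most for rare variants, where S is skewed and the normal tail is least reliable.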