Efficient and accurate framework for genome-wide gene-environment interaction analysis in large-scale biobanks


Bibliographic Details
Main Authors: Yuzhuo Ma, Yanlong Zhao, Ji-Feng Zhang, Wenjian Bi
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-57887-3
author Yuzhuo Ma
Yanlong Zhao
Ji-Feng Zhang
Wenjian Bi
collection DOAJ
description Abstract: Gene-environment interaction (G×E) analysis elucidates the interplay between genetic and environmental factors. Genome-wide association studies (GWAS) have expanded to encompass complex traits like time-to-event and ordinal traits, which provide richer phenotypic information. However, most existing scalable approaches focus only on quantitative or binary traits. Here we propose SPAGxECCT, a scalable and accurate framework for diverse trait types. SPAGxECCT fits a genotype-independent model and employs a hybrid strategy including saddlepoint approximation (SPA) for accurate p-value calculation, especially for low-frequency variants and unbalanced phenotypic distributions. We extend SPAGxECCT to SPAGxEmixCCT, which accounts for population stratification and is applicable to multi-ancestry or admixed populations. SPAGxEmixCCT can further be extended to SPAGxEmixCCT-local, which identifies ancestry-specific G×E effects using local ancestry. Through extensive simulations and real data analyses of UK Biobank data, we demonstrate that SPAGxECCT and SPAGxEmixCCT scale to large study cohorts, control type I error rates effectively, and maintain power.
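The abstract mentions a hybrid strategy that includes saddlepoint approximation (SPA) for p-value calculation. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. It assumes a score statistic S = sum_i r_i (G_i - 2*maf), where the r_i are residuals from a genotype-independent model and, under the null, each genotype G_i follows Binomial(2, maf). The function names (`cgf`, `spa_upper_tail`, `hybrid_pvalue`), the 2-standard-deviation cutoff for switching from the normal approximation to SPA, and the bisection root finder are all assumptions made for illustration.

```python
import math


def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cgf(t, resid, maf):
    """Return K(t), K'(t), K''(t) for S = sum_i r_i * (G_i - 2*maf).

    Assumes G_i ~ Binomial(2, maf) independently under the null.
    """
    k0 = k1 = k2 = 0.0
    for r in resid:
        e = math.exp(r * t)
        m = 1.0 - maf + maf * e
        k0 += 2.0 * math.log(m) - 2.0 * maf * r * t
        k1 += 2.0 * maf * r * e / m - 2.0 * maf * r
        k2 += 2.0 * maf * (1.0 - maf) * r * r * e / (m * m)
    return k0, k1, k2


def spa_upper_tail(s, resid, maf):
    """Approximate P(S > s) for s > 0 with the Barndorff-Nielsen form of SPA."""
    # Largest value S can take; beyond it the tail probability is zero.
    s_max = sum(2.0 * r * (1.0 - maf) if r > 0 else -2.0 * maf * r for r in resid)
    if s >= s_max:
        return 0.0
    # Bracket the saddlepoint t solving K'(t) = s (K' is increasing in t).
    lo, hi = 0.0, 1.0
    while cgf(hi, resid, maf)[1] < s:
        hi *= 2.0
        if hi > 256.0:
            raise ValueError("saddlepoint not bracketed; statistic too extreme for this sketch")
    # Plain bisection; slower than Newton's method but simple and safe.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cgf(mid, resid, maf)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    k0, _, k2 = cgf(t, resid, maf)
    w = math.sqrt(2.0 * (t * s - k0))
    v = t * math.sqrt(k2)
    return normal_sf(w + math.log(v / w) / w)


def hybrid_pvalue(s, resid, maf, cutoff=2.0):
    """Two-sided p-value: normal approximation near the mean, SPA in the tails.

    The cutoff (in standard deviations) is an assumed tuning parameter.
    """
    var = cgf(0.0, resid, maf)[2]
    z = s / math.sqrt(var)
    if abs(z) < cutoff:
        return 2.0 * normal_sf(abs(z))
    upper = spa_upper_tail(abs(s), resid, maf)
    # P(S < -|s|) equals P(-S > |s|); -S has the residuals negated.
    lower = spa_upper_tail(abs(s), [-r for r in resid], maf)
    return upper + lower


if __name__ == "__main__":
    # Hypothetical unbalanced design: 50 cases, 950 controls, rare variant.
    resid = [0.95] * 50 + [-0.05] * 950
    print(hybrid_pvalue(5.0, resid, maf=0.01))
```

With unbalanced residuals and a low-frequency variant, as in the hypothetical example at the bottom, the distribution of S is skewed, so the normal approximation can badly understate the tail probability. That is the situation the abstract singles out as motivating SPA.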
format Article
id doaj-art-c5b50c36dacf4291983fde5acdf1e236
institution OA Journals
issn 2041-1723
language English
publishDate 2025-03-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj-art-c5b50c36dacf4291983fde5acdf1e236
2025-08-20T02:10:10Z
eng
Nature Portfolio
Nature Communications
2041-1723
2025-03-01
161121
10.1038/s41467-025-57887-3
Efficient and accurate framework for genome-wide gene-environment interaction analysis in large-scale biobanks
Yuzhuo Ma (0): Department of Medical Genetics, School of Basic Medical Sciences, Peking University
Yanlong Zhao (1): State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Ji-Feng Zhang (2): State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Wenjian Bi (3): Department of Medical Genetics, School of Basic Medical Sciences, Peking University
https://doi.org/10.1038/s41467-025-57887-3
title Efficient and accurate framework for genome-wide gene-environment interaction analysis in large-scale biobanks
url https://doi.org/10.1038/s41467-025-57887-3