Optimizing UK biobank cloud-based research analysis platform to fine-map coronary artery disease loci in whole genome sequencing data
Abstract We conducted the first comprehensive association analysis of a coronary artery disease (CAD) cohort within the recently released UK Biobank (UKB) whole genome sequencing dataset. We employed the fine-mapping tool PolyFun and pinpointed rs10757274 as the most likely causal SNV within the 9p21.3 CA...
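| Main Authors: | Letitia M.F. Sng, Anubhav Kaphle, Mitchell J. O’Brien, Brendan Hosking, Roc Reguant, Johan Verjans, Yatish Jain, Natalie A. Twine, Denis C. Bauer |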
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Reports |
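| Subjects: | Population-scale genetics; UK Biobank; DNAnexus; Cloud-computing; GWAS; Trusted research environments |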
| Online Access: | https://doi.org/10.1038/s41598-025-95286-2 |
| _version_ | 1850208727407263744 |
|---|---|
| author | Letitia M.F. Sng; Anubhav Kaphle; Mitchell J. O’Brien; Brendan Hosking; Roc Reguant; Johan Verjans; Yatish Jain; Natalie A. Twine; Denis C. Bauer |
| author_sort | Letitia M.F. Sng |
| collection | DOAJ |
| description | Abstract: We conducted the first comprehensive association analysis of a coronary artery disease (CAD) cohort within the recently released UK Biobank (UKB) whole genome sequencing dataset. We employed the fine-mapping tool PolyFun and pinpointed rs10757274 as the most likely causal SNV within the 9p21.3 CAD risk locus. Notably, we show that machine-learning (ML) approaches, REGENIE and VariantSpark, exhibited greater sensitivity than traditional single-SNV logistic regression, uncovering rs28451064, a known risk locus in 21q22.11. Our findings underscore the utility of advanced computational techniques and cloud-based resources for mega-biobank analyses. Aligning with the paradigm shift of bringing compute to data, we demonstrate a 44% cost reduction and a 94% speedup through compute architecture optimisation on UK Biobank’s Research Analysis Platform using our RAPpoet approach. We discuss three considerations for researchers implementing novel workflows for datasets hosted on cloud platforms, to pave the way for harnessing mega-biobank-sized data through scalable, cost-effective cloud computing solutions. |
| format | Article |
| id | doaj-art-ca75fc9fbf1048d9818e8e489ad35b50 |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-ca75fc9fbf1048d9818e8e489ad35b50; 2025-08-20T02:10:10Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-03-01; 15; 1; 1; 19; 10.1038/s41598-025-95286-2; Optimizing UK biobank cloud-based research analysis platform to fine-map coronary artery disease loci in whole genome sequencing data; Letitia M.F. Sng (Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO)); Anubhav Kaphle (Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO)); Mitchell J. O’Brien (Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO)); Brendan Hosking (Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO)); Roc Reguant (Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO)); Johan Verjans (Australian Institute for Machine Learning, University of Adelaide); Yatish Jain (Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO)); Natalie A. Twine (Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO)); Denis C. Bauer (Applied Biosciences, Faculty of Science and Engineering, Macquarie University); https://doi.org/10.1038/s41598-025-95286-2; Population-scale genetics; UK Biobank; DNAnexus; Cloud-computing; GWAS; Trusted research environments |
| title | Optimizing UK biobank cloud-based research analysis platform to fine-map coronary artery disease loci in whole genome sequencing data |
| topic | Population-scale genetics; UK Biobank; DNAnexus; Cloud-computing; GWAS; Trusted research environments |
| url | https://doi.org/10.1038/s41598-025-95286-2 |