Learning-augmented sketching offers improved performance for privacy preserving and secure GWAS

Summary: Trusted execution environments (TEEs), such as Intel SGX, enable secure, privacy-preserving computations but may have computational resource constraints. To address this, methods like SkSES use sketching for genome-wide association studies (GWAS) across distributed datasets while maintainin...

Bibliographic Details
Main Authors: Junyan Xu, Kaiyuan Zhu, Jieling Cai, Can Kockan, Natnatee Dokmai, Hyunghoon Cho, David P. Woodruff, S. Cenk Sahinalp
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: iScience
Subjects: Genetics, Health technology
Online Access: http://www.sciencedirect.com/science/article/pii/S2589004225002718
collection DOAJ
description Summary: Trusted execution environments (TEEs), such as Intel SGX, enable secure, privacy-preserving computations but may have computational resource constraints. To address this, methods like SkSES use sketching for genome-wide association studies (GWAS) across distributed datasets while maintaining privacy. Here, we present a learning-augmented version of SkSES for more accurate identification of significant SNPs. Our method first conducts GWAS on a public training dataset to locally identify significant SNPs. These SNPs are assigned dedicated memory to enable more precise selection of significant SNPs over the entire dataset while optimizing memory usage. Our method maintains the stringent privacy guarantees of SkSES, ensuring sensitive genotype data remains undisclosed to other institutions or cloud providers. Experimental results on benchmark datasets show the learning-augmented version achieves up to 40% higher accuracy compared to the original SkSES under identical memory constraints. This advancement improves the scalability and effectiveness of collaborative GWAS studies in TEEs.
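The dedicated-memory idea in the summary above can be illustrated with a small learning-augmented frequency sketch. This is a minimal sketch under stated assumptions, not the authors' SkSES implementation: it assumes a Count-Min sketch with non-negative updates, and the names `CountMinSketch`, `LearnedSketch`, `predict_heavy`, and the SNP identifiers are all hypothetical. SNPs predicted to be significant from a training dataset receive exact counters, while all remaining SNPs share the compact sketch.

```python
import hashlib


class CountMinSketch:
    """Plain Count-Min sketch for non-negative updates (illustrative only)."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, derived by salting the key with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions only add to a cell, so the minimum never underestimates.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))


class LearnedSketch:
    """Exact counters for predicted-heavy keys, shared sketch for the rest."""

    def __init__(self, predicted_heavy, width, depth):
        self.exact = {key: 0 for key in predicted_heavy}  # dedicated memory
        self.sketch = CountMinSketch(width, depth)

    def update(self, key, count=1):
        if key in self.exact:
            self.exact[key] += count
        else:
            self.sketch.update(key, count)

    def estimate(self, key):
        if key in self.exact:
            return self.exact[key]
        return self.sketch.estimate(key)


def predict_heavy(training_counts, budget):
    """Hypothetical predictor: the `budget` keys with the largest training signal."""
    ranked = sorted(training_counts, key=training_counts.get, reverse=True)
    return set(ranked[:budget])


# Toy usage with made-up SNP identifiers and counts.
training = {"rs1": 90, "rs2": 75, "rs3": 4, "rs4": 2}
heavy = predict_heavy(training, budget=2)
learned = LearnedSketch(heavy, width=64, depth=4)
for snp, count in [("rs1", 100), ("rs2", 80), ("rs3", 5), ("rs4", 3), ("rs1", 20)]:
    learned.update(snp, count)
```

In this toy setup the predicted-heavy SNPs are tracked exactly, so their estimates carry no collision error, and they no longer inflate the shared cells used by the other SNPs. The real method operates on association statistics inside a TEE; the counts here only stand in for that signal.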
format Article
id doaj-art-0bbc87a2f45d44f4a3b64ab452a64cd9
institution DOAJ
issn 2589-0042
language English
publishDate 2025-03-01
publisher Elsevier
record_format Article
series iScience
spelling doaj-art-0bbc87a2f45d44f4a3b64ab452a64cd9; 2025-08-20T03:02:06Z; eng; Elsevier; iScience; ISSN 2589-0042; 2025-03-01; volume 28, issue 3, article 112011; DOI 10.1016/j.isci.2025.112011
Junyan Xu (0), Kaiyuan Zhu (1), Jieling Cai (2), Can Kockan (3), Natnatee Dokmai (4), Hyunghoon Cho (5), David P. Woodruff (6), S. Cenk Sahinalp (7)
0 Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
1 Department of Computer Science and Engineering, UC San Diego, La Jolla, CA, USA
2 Department of Computer Science, Tufts University, Medford, MA, USA
3 Broad Institute of MIT and Harvard, Cambridge, MA, USA
4 Department of Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA
5 Department of Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA
6 Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
7 Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA; Corresponding author
topic Genetics
Health technology
url http://www.sciencedirect.com/science/article/pii/S2589004225002718