Synthetic data for privacy-preserving clinical risk prediction
Abstract Synthetic data promise privacy-preserving data sharing for healthcare research and development. Compared with other privacy-enhancing approaches—such as federated learning—analyses performed on synthetic data can be applied downstream without modification, such that synthetic data can act i...
| Main Authors: | Zhaozhi Qian, Thomas Callender, Bogdan Cebere, Sam M. Janes, Neal Navani, Mihaela van der Schaar |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-10-01 |
| Series: | Scientific Reports |
| Subjects: | Synthetic data; Machine learning; Risk-prediction |
| Online Access: | https://doi.org/10.1038/s41598-024-72894-y |
| _version_ | 1849235242774167552 |
|---|---|
| author | Zhaozhi Qian; Thomas Callender; Bogdan Cebere; Sam M. Janes; Neal Navani; Mihaela van der Schaar |
| author_facet | Zhaozhi Qian; Thomas Callender; Bogdan Cebere; Sam M. Janes; Neal Navani; Mihaela van der Schaar |
| author_sort | Zhaozhi Qian |
| collection | DOAJ |
| description | Abstract Synthetic data promise privacy-preserving data sharing for healthcare research and development. Compared with other privacy-enhancing approaches—such as federated learning—analyses performed on synthetic data can be applied downstream without modification, such that synthetic data can act in place of real data for a wide range of use cases. However, the role that synthetic data might play in all aspects of clinical model development remains unknown. In this work, we used state-of-the-art generators explicitly designed for privacy preservation to create a synthetic version of ever-smokers in the UK Biobank before building prognostic models for lung cancer under several data release assumptions. We demonstrate that synthetic data can be effectively used throughout the medical prognostic modeling pipeline even without eventual access to the real data. Furthermore, we show the implications of different data release approaches on how synthetic biobank data could be deployed within the healthcare system. |
| format | Article |
| id | doaj-art-7630251e41d64bfea3d915e53a907c70 |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-7630251e41d64bfea3d915e53a907c70; 2025-08-20T04:02:51Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-10-01; 14; 1; 1; 14; 10.1038/s41598-024-72894-y; Synthetic data for privacy-preserving clinical risk prediction; Zhaozhi Qian (0); Thomas Callender (1); Bogdan Cebere (2); Sam M. Janes (3); Neal Navani (4); Mihaela van der Schaar (5); University of Cambridge; University College London; University of Cambridge; University College London; University College London; University of Cambridge; Abstract Synthetic data promise privacy-preserving data sharing for healthcare research and development. Compared with other privacy-enhancing approaches—such as federated learning—analyses performed on synthetic data can be applied downstream without modification, such that synthetic data can act in place of real data for a wide range of use cases. However, the role that synthetic data might play in all aspects of clinical model development remains unknown. In this work, we used state-of-the-art generators explicitly designed for privacy preservation to create a synthetic version of ever-smokers in the UK Biobank before building prognostic models for lung cancer under several data release assumptions. We demonstrate that synthetic data can be effectively used throughout the medical prognostic modeling pipeline even without eventual access to the real data. Furthermore, we show the implications of different data release approaches on how synthetic biobank data could be deployed within the healthcare system.; https://doi.org/10.1038/s41598-024-72894-y; Synthetic data; Machine learning; Risk-prediction |
| spellingShingle | Zhaozhi Qian; Thomas Callender; Bogdan Cebere; Sam M. Janes; Neal Navani; Mihaela van der Schaar; Synthetic data for privacy-preserving clinical risk prediction; Scientific Reports; Synthetic data; Machine learning; Risk-prediction |
| title | Synthetic data for privacy-preserving clinical risk prediction |
| title_sort | synthetic data for privacy preserving clinical risk prediction |
| topic | Synthetic data; Machine learning; Risk-prediction |
| url | https://doi.org/10.1038/s41598-024-72894-y |
| work_keys_str_mv | AT zhaozhiqian syntheticdataforprivacypreservingclinicalriskprediction AT thomascallender syntheticdataforprivacypreservingclinicalriskprediction AT bogdancebere syntheticdataforprivacypreservingclinicalriskprediction AT sammjanes syntheticdataforprivacypreservingclinicalriskprediction AT nealnavani syntheticdataforprivacypreservingclinicalriskprediction AT mihaelavanderschaar syntheticdataforprivacypreservingclinicalriskprediction |