Federated Analysis With Differential Privacy in Oncology Research: Longitudinal Observational Study Across Hospital Data Warehouses

Bibliographic Details
Main Authors: Théo Ryffel, Perrine Créquit, Maëlle Baillet, Jason Paumier, Yasmine Marfoq, Olivier Girardot, Thierry Chanet, Ronan Sy, Louise Bayssat, Julien Mazières, Vincent Vuiblet, Julien Ancel, Maxime Dewolf, François Margraff, Camille Bachot, Jacek Chmiel
Format: Article
Language: English
Published: JMIR Publications 2025-07-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2025/1/e59685
Description
Summary: Abstract
Background: Federated analytics in health care allows researchers to perform statistical queries on remote datasets without access to the raw data. This method arose from the need to perform statistical analysis on larger datasets collected at multiple health care centers while avoiding regulatory, governance, and privacy issues that might arise if raw data were collected at a central location outside the health care centers. Despite some pioneering work, federated analytics is still not widely used on real-world data, and to our knowledge, no real-world study has yet combined it with other privacy-enhancing techniques such as differential privacy (DP).
Objective: The first objective of this study was to deploy a federated architecture in a real-world setting. The oncology study used for this deployment compared the medical health care management of patients with metastatic non–small cell lung cancer before and after the first wave of the COVID-19 pandemic. The second objective was to test DP in this real-world scenario to assess its practicality and use as a privacy-enhancing technology.
Methods: A federated architecture platform was set up in the Toulouse, Reims, and Foch centers. After harmonization of the data in each center, statistical analyses were performed using DataSHIELD (Data aggregation through anonymous summary-statistics from harmonized individual-level databases), a federated analysis R library, and a new open-source DP DataSHIELD package (dsPrivacy) was implemented.
Results: A total of 50 patients were enrolled in the Toulouse and Reims centers and 49 in the Foch center. We have shown that DataSHIELD is a practical tool to efficiently conduct our study across all 3 centers without exposing data on a central node, once a sufficient setup has been established to configure a secure network between hospitals. All planned aggregated results were successfully generated. We also observed that DP can be implemented in practice with promising trade-offs between privacy and accuracy, and we built a library that will prove useful for future work.
Conclusions: The federated architecture platform made it possible to run a multicenter study on real-world oncology data while ensuring strong privacy guarantees using differential privacy.
ISSN: 2291-9694
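
Illustrative note: The summary describes combining federated analysis with differential privacy but gives no implementation detail. The Python sketch below shows one common way such a combination can work, under stated assumptions: each site clips its values to a public range, adds Laplace noise to a local sum and count, and releases only those two noisy aggregates; a coordinator then pools them into a mean. This is not the dsPrivacy or DataSHIELD API (both are R packages), and it is not claimed to be the authors' method. All function names, the choice of the Laplace mechanism, the even split of the privacy budget, and the toy values are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_site_summary(values, lower, upper, epsilon, rng):
    """Return a noisy (sum, count) for one site (hypothetical helper).

    Assumes neighbouring datasets differ by adding or removing one record,
    and that [lower, upper] is a public bound chosen without looking at data.
    """
    # Clipping bounds how much a single record can change the sum.
    clipped = [min(max(v, lower), upper) for v in values]
    sum_sensitivity = max(abs(lower), abs(upper))
    # Assumption: split the site's budget evenly between sum and count.
    half_epsilon = epsilon / 2.0
    noisy_sum = sum(clipped) + laplace_noise(sum_sensitivity / half_epsilon, rng)
    noisy_count = len(clipped) + laplace_noise(1.0 / half_epsilon, rng)
    return noisy_sum, noisy_count


def federated_mean(site_summaries):
    """Pool per-site (sum, count) pairs into one mean; no raw data needed."""
    total_sum = sum(s for s, _ in site_summaries)
    total_count = sum(c for _, c in site_summaries)
    return total_sum / total_count


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic toy values (not study data): one list per hypothetical site.
    sites = [[61.0, 70.0, 58.0], [66.0, 74.0], [59.0, 63.0, 68.0, 72.0]]
    summaries = [dp_site_summary(v, 0.0, 100.0, 1.0, rng) for v in sites]
    print(federated_mean(summaries))
```

With samples this small the noise dominates the estimate, which illustrates the privacy-accuracy trade-off the summary mentions. A production system would also need a noise sampler hardened against floating-point attacks, which this sketch does not attempt.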