Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics

Abstract Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternative...

Bibliographic Details
Main Authors: Félix Camirand Lemyre, Simon Lévesque, Marie-Pier Domingue, Klaus Herrmann, Jean-François Ethier
Format: Article
Language: English
Published: JMIR Publications 2024-11-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2024/1/e53622
author Félix Camirand Lemyre
Simon Lévesque
Marie-Pier Domingue
Klaus Herrmann
Jean-François Ethier
collection DOAJ
description Abstract
Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for health frameworks.
Objective: This study aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data, (2) describing the methods applicable to generalized linear models (GLMs) and assessing their underlying distributional assumptions, and (3) adapting existing methods to make them fully usable in health settings.
Methods: A scoping review methodology was used for the literature mapping, from which methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in health settings. Statistical theory was used to adapt methods and derive the properties of the resulting estimators.
Results: From the review, 41 articles were selected and 6 approaches were extracted to conduct standard GLM-based statistical analysis. However, these approaches assumed evenly and identically distributed data across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information sharing requirements and operational complexity.
Conclusions: This study contributes to the field of health analytics by providing an overview of the methods that can be used with horizontally partitioned data, adapting these methods to the context of heterogeneous health data, and clarifying the workflows and quantities exchanged by the methods discussed. Further analysis of the confidentiality preserved by these methods is needed to fully understand the risk associated with the sharing of summary statistics.
format Article
id doaj-art-6df7c87a3c544485b6db280f6f86cb7b
institution DOAJ
issn 2291-9694
language English
publishDate 2024-11-01
publisher JMIR Publications
record_format Article
series JMIR Medical Informatics
spelling doaj-art-6df7c87a3c544485b6db280f6f86cb7b
2025-08-20T02:43:36Z
eng
JMIR Publications
JMIR Medical Informatics
2291-9694
2024-11-01
12
e53622
e53622
10.2196/53622
Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics
Félix Camirand Lemyre http://orcid.org/0000-0003-3277-2729
Simon Lévesque http://orcid.org/0009-0002-6994-2752
Marie-Pier Domingue http://orcid.org/0009-0002-2582-6071
Klaus Herrmann http://orcid.org/0000-0002-8044-5717
Jean-François Ethier http://orcid.org/0000-0001-9408-0109
https://medinform.jmir.org/2024/1/e53622
title Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics
url https://medinform.jmir.org/2024/1/e53622