COLA-GLM: collaborative one-shot and lossless algorithms of generalized linear models for decentralized observational healthcare data

Abstract: Clinical insights from real-world data often require aggregating information from institutions to ensure sufficient sample sizes and generalizability. However, patient privacy concerns often limit the sharing of patient-level data, and traditional federated learning algorithms, relying on ex...

Bibliographic Details
Main Authors: Qiong Wu, Jenna M. Reps, Lu Li, Bingyu Zhang, Yiwen Lu, Jiayi Tong, Dazheng Zhang, Thomas Lumley, Milou T. Brand, Mui Van Zandt, Thomas Falconer, Xing He, Yu Huang, Haoyang Li, Chao Yan, Guojun Tang, Andrew E. Williams, Fei Wang, Jiang Bian, Bradley Malin, George Hripcsak, Martijn J. Schuemie, Yun Lu, Steve Drew, Jiayu Zhou, David A. Asch, Yong Chen
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01781-1
author Qiong Wu
Jenna M. Reps
Lu Li
Bingyu Zhang
Yiwen Lu
Jiayi Tong
Dazheng Zhang
Thomas Lumley
Milou T. Brand
Mui Van Zandt
Thomas Falconer
Xing He
Yu Huang
Haoyang Li
Chao Yan
Guojun Tang
Andrew E. Williams
Fei Wang
Jiang Bian
Bradley Malin
George Hripcsak
Martijn J. Schuemie
Yun Lu
Steve Drew
Jiayu Zhou
David A. Asch
Yong Chen
author_sort Qiong Wu
collection DOAJ
description Abstract: Clinical insights from real-world data often require aggregating information from institutions to ensure sufficient sample sizes and generalizability. However, patient privacy concerns often limit the sharing of patient-level data, and traditional federated learning algorithms, relying on extensive back-and-forth communications, can be inefficient to implement. We introduce the Collaborative One-shot Lossless Algorithm for Generalized Linear Models (COLA-GLM), a novel federated learning algorithm that supports diverse outcome types via generalized linear models and achieves results identical to a pooled patient-level data analysis (lossless) with only a single round of aggregated data exchange (one-shot). To further protect aggregated institutional data, we developed a secure extension, secure-COLA-GLM, utilizing homomorphic encryption. We demonstrated the effectiveness and lossless property of COLA-GLM through applications to an international influenza cohort and a decentralized U.S. COVID-19 mortality study. COLA-GLM and secure-COLA-GLM offer a scalable, efficient solution for decentralized collaborative learning involving multiple data partners and diverse security requirements.
format Article
id doaj-art-a6ac6f3c86984b538bac08b4a37b0784
institution Kabale University
issn 2398-6352
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series npj Digital Medicine
spelling doaj-art-a6ac6f3c86984b538bac08b4a37b0784; 2025-08-20T04:03:11Z; eng; Nature Portfolio; npj Digital Medicine; ISSN 2398-6352; 2025-07-01; vol. 8, no. 1, pp. 1-11; 10.1038/s41746-025-01781-1
COLA-GLM: collaborative one-shot and lossless algorithms of generalized linear models for decentralized observational healthcare data
Qiong Wu: Department of Biostatistics and Health Data Science, University of Pittsburgh
Jenna M. Reps: Observational Health Data Sciences and Informatics
Lu Li: The Center for Health AI and Synthesis of Evidence (CHASE), University of Pennsylvania
Bingyu Zhang: The Center for Health AI and Synthesis of Evidence (CHASE), University of Pennsylvania
Yiwen Lu: The Center for Health AI and Synthesis of Evidence (CHASE), University of Pennsylvania
Jiayi Tong: Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine
Dazheng Zhang: Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine
Thomas Lumley: Department of Statistics, Faculty of Science, University of Auckland
Milou T. Brand: Real World Solutions, IQVIA
Mui Van Zandt: Observational Health Data Sciences and Informatics
Thomas Falconer: Department of Biomedical Informatics, Columbia University Irving Medical Center
Xing He: Department of Biostatistics and Health Data Science, Indiana University
Yu Huang: Department of Biostatistics and Health Data Science, Indiana University
Haoyang Li: Department of Population Health Sciences, Weill Cornell Medicine
Chao Yan: Department of Biomedical Informatics, Vanderbilt University Medical Center
Guojun Tang: Department of Electrical and Software Engineering, University of Calgary
Andrew E. Williams: Clinical and Translational Science Institute, Tufts Medical Center
Fei Wang: Department of Population Health Sciences, Weill Cornell Medicine
Jiang Bian: Department of Biostatistics and Health Data Science, Indiana University
Bradley Malin: Department of Biomedical Informatics, Vanderbilt University Medical Center
George Hripcsak: Department of Biomedical Informatics, Columbia University Irving Medical Center
Martijn J. Schuemie: Observational Health Data Sciences and Informatics
Yun Lu: Center for Biologics Evaluation and Research, Food and Drug Administration
Steve Drew: Department of Electrical and Software Engineering, University of Calgary
Jiayu Zhou: School of Information, University of Michigan
David A. Asch: Leonard Davis Institute of Health Economics, University of Pennsylvania
Yong Chen: Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine
title COLA-GLM: collaborative one-shot and lossless algorithms of generalized linear models for decentralized observational healthcare data
url https://doi.org/10.1038/s41746-025-01781-1
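
Illustrative note: the abstract's terms "one-shot" (a single round of aggregated data exchange) and "lossless" (results identical to a pooled patient-level analysis) can be made concrete with a toy case. The sketch below is not the COLA-GLM algorithm from the article; it assumes a logistic model with one binary covariate, where cell counts are sufficient statistics. The function names and the data are hypothetical.

```python
import math
from collections import Counter

def site_summary(rows):
    """Run locally at one site: count patients in each (x, y) cell.

    Only these counts leave the site, never the patient-level rows.
    """
    return Counter(rows)

def loglik_rows(rows, b0, b1):
    """Logistic log-likelihood computed from patient-level rows."""
    return sum(y * (b0 + b1 * x) - math.log1p(math.exp(b0 + b1 * x))
               for x, y in rows)

def loglik_counts(counts, b0, b1):
    """The same log-likelihood computed from aggregated cell counts alone."""
    return sum(n * (y * (b0 + b1 * x) - math.log1p(math.exp(b0 + b1 * x)))
               for (x, y), n in counts.items())

def mle_from_counts(counts):
    """Closed-form MLE for a single binary covariate: log-odds per group."""
    def log_odds(x):
        return math.log(counts[(x, 1)] / counts[(x, 0)])
    b0 = log_odds(0)
    return b0, log_odds(1) - b0

# Hypothetical toy data: (exposure x, outcome y) for each patient at two sites.
site_a = [(0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0)]
site_b = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]

# One round of communication: each site sends counts once; the coordinator adds them.
aggregated = site_summary(site_a) + site_summary(site_b)

# Reference: what an analyst holding all patient-level rows would compute.
pooled_rows = site_a + site_b
pooled = site_summary(pooled_rows)

b0, b1 = mle_from_counts(aggregated)
print("intercept:", b0, "log odds ratio:", b1)
print("same counts as pooled data:", aggregated == pooled)
```

Because the summed counts equal the counts of the pooled data, the likelihood and its maximizer are the same in this toy setting, which is the sense of "lossless" here. Real applications involve more covariates and other outcome types, and the abstract notes that the aggregated data can additionally be protected with homomorphic encryption.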