A privacy-enhanced framework for collaborative Big Data analysis in healthcare using adaptive federated learning aggregation

Bibliographic Details
Main Authors: Rahul Haripriya, Nilay Khare, Manish Pandey, Sreemoyee Biswas
Format: Article
Language: English
Published: SpringerOpen 2025-05-01
Series: Journal of Big Data
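Subjects: Federated learning; Big Data; Machine learning; Artificial intelligence; Data privacy; Transfer learning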
Online Access: https://doi.org/10.1186/s40537-025-01169-8
_version_ 1849312139478564864
collection DOAJ
description Abstract The exponential growth of Big Data in healthcare, particularly in AI-driven medical diagnostics, has raised critical concerns about data privacy in medical image classification. With over 30% of healthcare organizations worldwide experiencing data breaches in the past year, the demand for secure, privacy-preserving solutions is more urgent than ever. This study explores a federated learning approach combined with transfer learning to enhance privacy in medical image classification using ResNet and VGG16 architectures. Pre-trained on ImageNet and fine-tuned on three specialized medical datasets (TB chest X-rays, brain tumor MRI scans, and diabetic retinopathy images), these models were deployed in a simulated multi-center healthcare environment. A major contribution of this work is the development of an adaptive aggregation methodology, which dynamically selects between Federated Averaging (FedAvg) and Federated Stochastic Gradient Descent (FedSGD) based on real-time data divergence observed across participating clients. Unlike conventional static aggregation methods, which apply the same update rule regardless of data heterogeneity, the proposed adaptive approach monitors gradients and data distributions at each communication round and selects the most suitable aggregation method for that round. This adaptive strategy not only improves convergence but also optimizes resource utilization, making it suitable for multi-center healthcare networks where data heterogeneity is prevalent. The novelty of the proposed adaptive aggregation lies in its ability to maintain robust performance while minimizing computational costs, making it feasible for large-scale healthcare AI networks, such as hospital federated learning systems.
Comparative analysis with baseline FL models, including FedAvg and FedSGD, shows that the adaptive aggregation method achieves comparable accuracy (up to 96.3%) while significantly reducing execution time by approximately 20% and maintaining a competitive F1-score. Additionally, the integration of privacy-preserving techniques ensures that sensitive patient data remains secure throughout the learning process. By integrating transfer learning with federated learning, this study presents a scalable and privacy-preserving framework for Big Data analytics in healthcare. The findings underscore the potential of adaptive aggregation to enhance federated learning efficiency across heterogeneous datasets, enabling medical institutions to develop high-accuracy diagnostic models without direct access to patient data.
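The per-round choice between FedAvg and FedSGD described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the divergence measure, the threshold, or which rule is chosen under high divergence. The function names, the spread-based divergence score, the threshold of 0.5, and the decision to use FedSGD when divergence is high are all assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_mean(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Element-wise average, weighting each vector by its client's sample count."""
    total = float(sum(weights))
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]


def divergence(gradients: Sequence[Vector], weights: Sequence[float]) -> float:
    """Hypothetical heterogeneity score (an assumption, not from the paper):
    weighted mean distance of client gradients from their weighted mean,
    divided by the norm of that mean."""
    mean = weighted_mean(gradients, weights)
    mean_norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid dividing by 0
    total = float(sum(weights))
    spread = sum(w * math.dist(g, mean) for g, w in zip(gradients, weights)) / total
    return spread / mean_norm


def adaptive_round(
    global_w: Vector,
    local_weights: Sequence[Vector],
    gradients: Sequence[Vector],
    n_samples: Sequence[float],
    threshold: float = 0.5,  # assumed value, would need tuning
    lr: float = 0.1,         # assumed server learning rate for the FedSGD step
) -> Tuple[Vector, str, float]:
    """Pick an aggregation rule for one communication round."""
    score = divergence(gradients, n_samples)
    if score > threshold:
        # Assumed policy: clients disagree strongly, so take one server-side
        # step along the weighted mean gradient (FedSGD-style).
        g = weighted_mean(gradients, n_samples)
        return [w - lr * gi for w, gi in zip(global_w, g)], "FedSGD", score
    # Clients agree: average their locally trained weights (FedAvg-style).
    return weighted_mean(local_weights, n_samples), "FedAvg", score
```

In this sketch each client reports both its locally trained weights and a gradient computed at the current global weights, so the server can apply either rule after computing the score. A real system would likely send only what the selected rule needs.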
id doaj-art-9fc4e916e90d4b86897a5b2ef1d3ec39
institution Kabale University
issn 2196-1115
spelling doaj-art-9fc4e916e90d4b86897a5b2ef1d3ec39 2025-08-20T03:53:12Z eng SpringerOpen Journal of Big Data 2196-1115 2025-05-01 121156 10.1186/s40537-025-01169-8
Rahul Haripriya, Nilay Khare, Manish Pandey, Sreemoyee Biswas (all four authors: Department of Computer Science and Engineering, Maulana Azad National Institute of Technology)
topic Federated learning
Big Data
Machine learning
Artificial intelligence
Data privacy
Transfer learning