Feature-Based Dataset Fingerprinting for Clustered Federated Learning on Medical Image Data


Bibliographic Details
Main Authors: Daniel Scheliga, Patrick Mäder, Marco Seeland
Format: Article
Language: English
Published: Taylor & Francis Group 2024-12-01
Series: Applied Artificial Intelligence
Online Access: https://www.tandfonline.com/doi/10.1080/08839514.2024.2394756
Collection: DOAJ
Description: Federated Learning (FL) allows multiple clients to train a common model without sharing their private training data. In practice, federated optimization suffers from sub-optimal model utility when client data is not independent and identically distributed (non-IID). Recent work has proposed clustering clients according to dataset fingerprints to improve model utility in such situations. These fingerprints aim to capture the key characteristics of clients’ local data distributions. Recently, a mechanism was proposed to calculate dataset fingerprints from raw client data. We find that this fingerprinting mechanism incurs substantial time and memory consumption, limiting its practical use to small datasets. Additionally, shared raw data fingerprints can directly leak sensitive visual information, in certain cases even resembling the original client training data. To alleviate these problems, we propose a Feature-based dataset FingerPrinting mechanism (FFP). We use the MedMNIST database to develop a highly realistic case study for FL on medical image data. Compared to existing methods, our proposed FFP reduces the computational overhead of fingerprint calculation while achieving similar model utility. Furthermore, FFP mitigates the risk of raw data leakage from fingerprints by design.
ISSN: 0883-9514, 1087-6545
Volume: 38, Issue: 1
Author Affiliation (all authors): Department of Computer Science and Automation, Data-intensive Systems and Visualization Group (dAI.SY), Technische Universität Ilmenau, Ilmenau, Germany