Adding Data Quality to Federated Learning Performance Improvement

Massive data generation from Internet of Things (IoT) devices increases the demand for efficient data analysis to extract relevant and actionable insights. As a result, Federated Learning (FL) allows IoT devices to collaborate in Artificial Intelligence (AI) training models while preserving data pri...

Bibliographic Details
Main Authors: Ernesto Gurgel Valente Neto, Solon Alves Peixoto, Valderi Reis Quietinho Leithardt, Juan Francisco de Paz Santana, Julio C. S. Dos Anjos
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Data quality; deep learning; federated learning; IoT; IID; non-IID
Online Access: https://ieeexplore.ieee.org/document/11029230/
description Massive data generation from Internet of Things (IoT) devices increases the demand for efficient data analysis to extract relevant and actionable insights. Federated Learning (FL) allows IoT devices to collaboratively train Artificial Intelligence (AI) models while preserving data privacy. However, selecting high-quality training data remains a critical challenge in FL environments with non-independent and identically distributed (non-IID) data. Poor-quality data introduces errors, delays convergence, and increases computational costs. To address these challenges, this study develops a data quality analysis algorithm for both FL and centralized environments. The proposed algorithm reduces computational costs, eliminates unnecessary data processing, and accelerates the convergence of AI models. The experiments used the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, and performance was evaluated with metrics common in the literature: accuracy, recall, F1 score, and precision. Results show a maximum observed execution-time reduction of 56.49%, with an accuracy loss of approximately 0.50%.
id doaj-art-66ad9fc86cf34677aca0b8869c31147e
institution DOAJ
issn 2169-3536
spelling doaj-art-66ad9fc86cf34677aca0b8869c31147e
Record timestamp: 2025-08-20T02:45:49Z
Volume: 13
Pages: 126623-126648
DOI: 10.1109/ACCESS.2025.3578301
IEEE Xplore document: 11029230
Authors (index, ORCID):
Ernesto Gurgel Valente Neto [0] https://orcid.org/0000-0002-9881-199X
Solon Alves Peixoto [1] https://orcid.org/0000-0002-3864-2506
Valderi Reis Quietinho Leithardt [2] https://orcid.org/0000-0003-0446-9271
Juan Francisco de Paz Santana [3] https://orcid.org/0000-0001-9461-7922
Julio C. S. Dos Anjos [4] https://orcid.org/0000-0003-3623-2762
Affiliations (in record order):
[0] PPGETI, Federal University of Ceará, Fortaleza, Brazil
[1] Department of Data Science, Federal University of Ceará Campus Itapajé, Itapajé, Brazil
[2] Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR, Lisboa, Portugal
[3] Expert Systems and Applications Laboratory, University of Salamanca, Salamanca, Spain
[4] Department of Data Science, Federal University of Ceará Campus Itapajé, Graduate Program in Teleinformatics Engineering (PPGETI/UFC) Technological Center Campus of Pici, Ceará, Fortaleza, Brazil
topic Data quality
deep learning
federated learning
IoT
IID
non-IID