DRACO: Decentralized Asynchronous Federated Learning Over Row-Stochastic Wireless Networks

Bibliographic Details
Main Authors: Eunjeong Jeong, Marios Kountouris
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of the Communications Society
Online Access: https://ieeexplore.ieee.org/document/11016099/
Description
Summary: Emerging technologies and use cases, such as smart Internet of Things (IoT), Internet of Agents, and Edge AI, have generated significant interest in training neural networks over fully decentralized, serverless networks. A major obstacle in this context is ensuring stable convergence without imposing stringent assumptions, such as identical data distributions across devices or synchronized updates. In this paper, we introduce DRACO, a novel framework for decentralized asynchronous Stochastic Gradient Descent (SGD) over row-stochastic gossip wireless networks. Our approach leverages continuous communication, allowing edge devices to perform local training and exchange model updates along a continuous timeline, thereby eliminating the need for synchronized timing. Additionally, our algorithm decouples communication and computation schedules, enabling complete autonomy for all users while effectively addressing straggler issues. Through a thorough convergence analysis, we show that DRACO achieves high performance in decentralized optimization while maintaining low variance across users even without predefined scheduling policies. Numerical experiments further validate the effectiveness of our approach, demonstrating that controlling the maximum number of received messages per client significantly reduces redundant communication costs while maintaining robust learning performance.
ISSN:2644-125X
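
Illustrative sketch: the abstract describes asynchronous local SGD with row-stochastic mixing and a cap on the number of messages each client keeps. The toy Python simulation below is not the authors' DRACO implementation; the synthetic regression task, the uniform mixing weights, and every constant (NUM_NODES, MAX_INBOX, LR, STEPS) are assumptions chosen only to show the mechanism.

import numpy as np

rng = np.random.default_rng(0)

# Sketch only (not the paper's code): asynchronous, decentralized SGD with
# row-stochastic mixing and a cap on buffered incoming messages per client.

NUM_NODES = 5        # number of edge devices (assumption for this toy example)
DIM = 10             # model dimension
MAX_INBOX = 3        # cap on received messages kept per client
STEPS = 2000         # total asynchronous events simulated
LR = 0.05            # local SGD step size

# Synthetic, non-identical local data: each node regresses on its own slice.
true_w = rng.normal(size=DIM)
data = []
for i in range(NUM_NODES):
    X = rng.normal(loc=i * 0.2, size=(50, DIM))   # shifted features -> non-IID
    y = X @ true_w + 0.1 * rng.normal(size=50)
    data.append((X, y))

models = [rng.normal(size=DIM) for _ in range(NUM_NODES)]
inbox = [[] for _ in range(NUM_NODES)]            # buffered neighbor models

def local_sgd_step(i):
    """One minibatch SGD step on node i's private data."""
    X, y = data[i]
    idx = rng.choice(len(y), size=8, replace=False)
    grad = 2 * X[idx].T @ (X[idx] @ models[i] - y[idx]) / len(idx)
    models[i] = models[i] - LR * grad

def mix(i):
    """Row-stochastic averaging: node i mixes its model with whatever it has
    received so far; weights sum to 1 over its own row, no global knowledge."""
    received = inbox[i]
    if not received:
        return
    stacked = np.vstack([models[i]] + received)
    weights = np.full(len(stacked), 1.0 / len(stacked))   # uniform row weights
    models[i] = weights @ stacked
    inbox[i].clear()

for _ in range(STEPS):
    i = rng.integers(NUM_NODES)          # an arbitrary node wakes up (asynchrony)
    local_sgd_step(i)                    # computation decoupled from communication
    if rng.random() < 0.5:               # sometimes broadcast the current model
        j = rng.integers(NUM_NODES)
        if j != i and len(inbox[j]) < MAX_INBOX:
            inbox[j].append(models[i].copy())   # messages beyond the cap are dropped
    mix(i)                               # mix whenever something has arrived

spread = max(np.linalg.norm(m - models[0]) for m in models)
print(f"disagreement across nodes after {STEPS} events: {spread:.4f}")

In this sketch, row-stochasticity holds by construction because each node normalizes weights over its own inbox; the paper's actual weight design, scheduling-free operation, and convergence guarantees are given in the article itself.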