A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
Main Authors:
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10854424/
Summary: This paper studies a deep reinforcement learning technique for distributed resource allocation among cognitive radios operating under an underlay dynamic spectrum access paradigm that does not require coordination between agents during learning. The key challenge addressed in this work is the non-stationary reinforcement learning environment that arises from the uncoordinated and distributed operation of the radios over a shared wireless environment. This challenge is illustrated with a simulation result in which a standard single-agent deep reinforcement learning approach fails to converge when applied to the uncoordinated, interacting multi-radio scenario. To address this challenge, this work presents the uncoordinated and distributed multi-agent DQL (UDMA-DQL) technique, which combines a deep neural network trained during exploration phases with a Best Reply Process with Inertia for the gradual learning of the best policy. A key result of the study of UDMA-DQL herein, obtained by considering aspects specific to deep reinforcement learning, is that over an arbitrarily long time UDMA-DQL converges with probability one to equilibrium policies in the non-stationary environment resulting from the distributed and uncoordinated operation of cognitive radios. This analytical result is confirmed through simulation results showing that, in cases where an optimal policy can be identified, UDMA-DQL finds that policy in 99% of cases given a sufficiently long learning time. Importantly, further simulations show that the presented UDMA-DQL approach learns much faster than an equivalent table-based Q-learning implementation.
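To make the Best Reply Process with Inertia mentioned in the summary concrete, the following is a minimal sketch, not taken from the paper: the inertia probability, the function name, and the Q-value array are all hypothetical, and the DQN training that would produce the Q estimates during exploration phases is omitted. The idea is simply that each radio keeps its current action with some fixed probability and otherwise switches to the greedy best reply, which slows policy churn in the non-stationary multi-agent environment.

```python
import random
import numpy as np

# Minimal sketch of a best-reply-with-inertia action update for one radio.
# INERTIA and all identifiers here are illustrative assumptions, not names
# or values from the paper; the DQN that produces q_values is omitted.

INERTIA = 0.7  # hypothetical probability of keeping the current action

def best_reply_with_inertia(q_values: np.ndarray, current_action: int) -> int:
    """With probability INERTIA keep the current action; otherwise switch
    to the greedy best reply under the radio's current Q-value estimates."""
    if random.random() < INERTIA:
        return current_action
    return int(np.argmax(q_values))

# Example: Q estimates over four hypothetical transmit-power actions.
q = np.array([0.1, 0.8, 0.3, 0.5])
action = best_reply_with_inertia(q, current_action=0)
```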
ISSN: 2169-3536