A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
Main Authors:
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10854424/
Summary: This paper studies a deep reinforcement learning technique for distributed resource allocation among cognitive radios operating under an underlay dynamic spectrum access paradigm that does not require coordination between agents during learning. The key challenge addressed in this work is the non-stationary reinforcement learning environment that arises from the uncoordinated and distributed operation of the radios over a shared wireless environment. This challenge is illustrated with a simulation result in which a standard single-agent deep reinforcement learning approach fails to converge when applied to the uncoordinated, interacting multi-radio scenario. To address this challenge, this work presents the uncoordinated and distributed multi-agent DQL (UDMA-DQL) technique, which combines a deep neural network trained during exploration phases with a Best Reply Process with Inertia for the gradual learning of the best policy. A key result of the study of UDMA-DQL herein, obtained by considering aspects specific to deep reinforcement learning, is that over an arbitrarily long time UDMA-DQL converges with probability one to equilibrium policies in the non-stationary environment resulting from the distributed and uncoordinated operation of cognitive radios. This analytical result is confirmed through simulation results showing that, in cases where an optimal policy can be identified, UDMA-DQL finds that policy in 99% of cases given a sufficiently long learning time. Importantly, further simulations show that the presented UDMA-DQL approach learns much faster than an equivalent table-based Q-learning implementation.
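To make the Best Reply Process with Inertia mentioned in the summary concrete, the following is a minimal sketch, not taken from the paper: the inertia probability, the function name, and the Q-value array are all hypothetical, and the DQN training that would produce the Q estimates during exploration phases is omitted. The idea is simply that each radio keeps its current action with some fixed probability and otherwise switches to the greedy best reply, which slows policy churn in the non-stationary multi-agent environment.

```python
import random
import numpy as np

# Minimal sketch of a best-reply-with-inertia action update for one radio.
# INERTIA and all identifiers here are illustrative assumptions, not names
# or values from the paper; the DQN that produces q_values is omitted.

INERTIA = 0.7  # hypothetical probability of keeping the current action

def best_reply_with_inertia(q_values: np.ndarray, current_action: int) -> int:
    """With probability INERTIA keep the current action; otherwise switch
    to the greedy best reply under the radio's current Q-value estimates."""
    if random.random() < INERTIA:
        return current_action
    return int(np.argmax(q_values))

# Example: Q estimates over four hypothetical transmit-power actions.
q = np.array([0.1, 0.8, 0.3, 0.5])
action = best_reply_with_inertia(q, current_action=0)
```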
ISSN: 2169-3536