A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios


Saved in:
Bibliographic Details
Main Authors: Ankita Tondwalkar, Andres Kwasinski
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10854424/
_version_ 1832575583718998016
author Ankita Tondwalkar
Andres Kwasinski
author_facet Ankita Tondwalkar
Andres Kwasinski
author_sort Ankita Tondwalkar
collection DOAJ
description This paper studies a deep reinforcement learning technique for distributed resource allocation among cognitive radios operating under an underlay dynamic spectrum access paradigm that does not require coordination between agents during learning. The key challenge addressed in this work is the non-stationary reinforcement learning environment that arises from the uncoordinated, distributed operation of radios over a shared wireless environment. This challenge is illustrated with a simulation result in which a standard single-agent deep reinforcement learning approach fails to converge when applied in the uncoordinated, interacting multi-radio scenario. To address this challenge, this work presents the uncoordinated and distributed multi-agent DQL (UDMA-DQL) technique, which combines a deep neural network with learning during exploration phases and a Best Reply Process with Inertia for the gradual learning of the best policy. A key result of the study of UDMA-DQL herein, shown by considering aspects specific to deep reinforcement learning, is that, given an arbitrarily long learning time, the UDMA-DQL technique converges with probability one to equilibrium policies in the non-stationary environment resulting from the distributed and uncoordinated operation of cognitive radios. This analytical study is confirmed through simulation results showing that, in cases where an optimal policy can be identified, UDMA-DQL finds such a policy in 99% of cases given a sufficiently long learning time. Importantly, further simulations show that the presented UDMA-DQL approach learns much faster than an equivalent table-based Q-learning implementation.
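The abstract's Best Reply Process with Inertia can be illustrated with a minimal sketch: at each decision step an agent keeps its current action with some inertia probability, and otherwise switches to a best reply, i.e. an action maximizing its current Q-value estimates. The function name, the inertia value, and the Q-value representation below are illustrative assumptions, not the paper's actual implementation (which uses a deep neural network to estimate Q-values).

```python
import random

def best_reply_with_inertia(q_values, current_action, inertia=0.7):
    """One step of a Best Reply Process with Inertia (illustrative sketch).

    q_values: the agent's current Q-value estimates, indexed by action.
    current_action: the action the agent played last.
    inertia: probability of repeating the current action (assumed value).
    """
    if random.random() < inertia:
        # Inertia: keep playing the same action, which damps the policy
        # oscillations that uncoordinated simultaneous updates can cause.
        return current_action
    # Best reply: switch to an action maximizing the estimated Q-values.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

The inertia term is what allows the multi-agent process to settle: because each radio only occasionally revises its action, the environment seen by the other radios changes slowly enough for their own estimates to catch up.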
format Article
id doaj-art-560b5f47f6c74352a1cd09e0de0afeee
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-560b5f47f6c74352a1cd09e0de0afeee
last_indexed 2025-01-31T23:04:45Z
language eng
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2025-01-01
volume 13
pages 19678-19693
doi 10.1109/ACCESS.2025.3534657
document 10854424
title A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
author Ankita Tondwalkar (https://orcid.org/0000-0003-0648-8134), Kate Gleason College of Engineering, Rochester Institute of Technology, Rochester, NY, USA
author Andres Kwasinski (https://orcid.org/0000-0002-8083-8318), Kate Gleason College of Engineering, Rochester Institute of Technology, Rochester, NY, USA
url https://ieeexplore.ieee.org/document/10854424/
topic Cognitive radios; uncoordinated multi-agent deep Q-learning; underlay dynamic spectrum access and sharing
spellingShingle Ankita Tondwalkar
Andres Kwasinski
A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
IEEE Access
Cognitive radios
uncoordinated multi-agent deep Q-learning
underlay dynamic spectrum access and sharing
title A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
title_full A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
title_fullStr A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
title_full_unstemmed A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
title_short A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
title_sort deep q learning algorithm with guaranteed convergence for distributed and uncoordinated operation of cognitive radios
topic Cognitive radios
uncoordinated multi-agent deep Q-learning
underlay dynamic spectrum access and sharing
url https://ieeexplore.ieee.org/document/10854424/
work_keys_str_mv AT ankitatondwalkar adeepqlearningalgorithmwithguaranteedconvergencefordistributedanduncoordinatedoperationofcognitiveradios
AT andreskwasinski adeepqlearningalgorithmwithguaranteedconvergencefordistributedanduncoordinatedoperationofcognitiveradios
AT ankitatondwalkar deepqlearningalgorithmwithguaranteedconvergencefordistributedanduncoordinatedoperationofcognitiveradios
AT andreskwasinski deepqlearningalgorithmwithguaranteedconvergencefordistributedanduncoordinatedoperationofcognitiveradios