A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
This paper studies a deep reinforcement learning technique for distributed resource allocation among cognitive radios operating under an underlay dynamic spectrum access paradigm that does not require coordination between agents during learning. The key challenge addressed in this work is the non-stationary reinforcement learning environment that arises from the uncoordinated and distributed operation of the radios over a shared wireless environment. This challenge is illustrated with a simulation result in which a standard single-agent deep reinforcement learning approach fails to converge when applied to the uncoordinated, interacting multi-radio scenario. To address this challenge, this work presents the uncoordinated and distributed multi-agent DQL (UDMA-DQL) technique, which combines a deep neural network with learning during exploration phases and a Best Reply Process with Inertia for the gradual learning of the best policy. A key result of the study of UDMA-DQL herein, shown by considering aspects specific to deep reinforcement learning, is that given an arbitrarily long learning time the UDMA-DQL technique converges with probability one to equilibrium policies in the non-stationary environment resulting from the distributed and uncoordinated operation of cognitive radios. This analytical study is confirmed through simulation results showing that, in cases where an optimal policy can be identified, UDMA-DQL finds such a policy in 99% of cases given a sufficiently long learning time. Importantly, further simulations show that the presented UDMA-DQL approach learns much faster than an equivalent table-based Q-learning implementation.
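The paper's Best Reply Process with Inertia lets uncoordinated agents settle on an equilibrium by revising their policies only occasionally. As a loose illustration of that inertia idea only — a toy sketch with made-up parameters, a Q-table in place of a deep network, and not the paper's UDMA-DQL algorithm — consider two radios repeatedly choosing between two channels, rewarded when they avoid colliding:

```python
import random

def run_inertia_best_reply(n_steps=3000, inertia=0.7, eps=0.2, alpha=0.1, seed=0):
    """Toy 2-agent, 2-channel selection game: reward 1 when the agents pick
    different channels, 0 on a collision.  Each agent keeps stateless
    Q-values and, when revising its policy, adopts the greedy best reply
    only with probability (1 - inertia); otherwise it keeps its current
    action -- the 'inertia' that lets joint play settle on an equilibrium."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]   # per-agent, per-channel Q estimates
    policy = [0, 0]                # both agents start on the same channel
    for _ in range(n_steps):
        # epsilon-greedy exploration around the current policies
        acts = [rng.randrange(2) if rng.random() < eps else policy[i]
                for i in range(2)]
        reward = 1.0 if acts[0] != acts[1] else 0.0
        for i in range(2):
            # stateless Q-learning update toward the observed reward
            q[i][acts[i]] += alpha * (reward - q[i][acts[i]])
            best = 0 if q[i][0] >= q[i][1] else 1
            if rng.random() < 1.0 - inertia:
                policy[i] = best   # revise only occasionally (inertia)
    return policy

# example run; with enough steps the policies typically end on distinct channels
print(run_inertia_best_reply())
```

With enough steps the two policies typically settle on distinct channels (an anti-coordination equilibrium); raising `inertia` slows policy revision but reduces the chance of both agents switching simultaneously and cycling, which is the role inertia plays in the paper's convergence argument.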
Main Authors: | Ankita Tondwalkar, Andres Kwasinski |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Cognitive radios; uncoordinated multi-agent deep Q-learning; underlay dynamic spectrum access and sharing |
Online Access: | https://ieeexplore.ieee.org/document/10854424/ |
---|---|
author | Ankita Tondwalkar, Andres Kwasinski |
collection | DOAJ |
description | This paper studies a deep reinforcement learning technique for distributed resource allocation among cognitive radios operating under an underlay dynamic spectrum access paradigm that does not require coordination between agents during learning. The key challenge addressed in this work is the non-stationary reinforcement learning environment that arises from the uncoordinated and distributed operation of the radios over a shared wireless environment. This challenge is illustrated with a simulation result in which a standard single-agent deep reinforcement learning approach fails to converge when applied to the uncoordinated, interacting multi-radio scenario. To address this challenge, this work presents the uncoordinated and distributed multi-agent DQL (UDMA-DQL) technique, which combines a deep neural network with learning during exploration phases and a Best Reply Process with Inertia for the gradual learning of the best policy. A key result of the study of UDMA-DQL herein, shown by considering aspects specific to deep reinforcement learning, is that given an arbitrarily long learning time the UDMA-DQL technique converges with probability one to equilibrium policies in the non-stationary environment resulting from the distributed and uncoordinated operation of cognitive radios. This analytical study is confirmed through simulation results showing that, in cases where an optimal policy can be identified, UDMA-DQL finds such a policy in 99% of cases given a sufficiently long learning time. Importantly, further simulations show that the presented UDMA-DQL approach learns much faster than an equivalent table-based Q-learning implementation. |
format | Article |
id | doaj-art-560b5f47f6c74352a1cd09e0de0afeee |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
series | IEEE Access |
spelling | doaj-art-560b5f47f6c74352a1cd09e0de0afeee; 2025-01-31T23:04:45Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 19678-19693; DOI 10.1109/ACCESS.2025.3534657; article 10854424. Authors: Ankita Tondwalkar (https://orcid.org/0000-0003-0648-8134) and Andres Kwasinski (https://orcid.org/0000-0002-8083-8318), both with the Kate Gleason College of Engineering, Rochester Institute of Technology, Rochester, NY, USA. |
title | A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios |
topic | Cognitive radios; uncoordinated multi-agent deep Q-learning; underlay dynamic spectrum access and sharing |
url | https://ieeexplore.ieee.org/document/10854424/ |