A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios
This paper studies a deep reinforcement learning technique for distributed resource allocation among cognitive radios operating under an underlay dynamic spectrum access paradigm that does not require coordination between agents during learning. The key challenge addressed in this work is the non-stationary reinforcement learning environment that arises from the uncoordinated and distributed operation of the radios over a shared wireless environment. This challenge is illustrated with a simulation result in which a standard single-agent deep reinforcement learning approach fails to converge when applied to the uncoordinated, interacting multi-radio scenario. To address this challenge, this work presents the uncoordinated and distributed multi-agent DQL (UDMA-DQL) technique, which combines a deep neural network with learning during exploration phases and a Best Reply Process with Inertia for the gradual learning of the best policy. A key result of the study of UDMA-DQL herein, shown by considering aspects specific to deep reinforcement learning, is that given an arbitrarily long learning time the UDMA-DQL technique converges with probability one to equilibrium policies in the non-stationary environment resulting from the distributed and uncoordinated operation of cognitive radios. This analytical study is confirmed through simulation results showing that, in cases where an optimal policy can be identified, UDMA-DQL finds such a policy in 99% of cases given a sufficiently long learning time. Importantly, further simulations show that the presented UDMA-DQL approach learns much faster than an equivalent table-based Q-learning implementation.
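The paper's Best Reply Process with Inertia lets uncoordinated agents settle on an equilibrium by revising their policies only occasionally. As a loose illustration of that inertia idea only — a toy sketch with made-up parameters, a Q-table in place of a deep network, and not the paper's UDMA-DQL algorithm — consider two radios repeatedly choosing between two channels, rewarded when they avoid colliding:

```python
import random

def run_inertia_best_reply(n_steps=3000, inertia=0.7, eps=0.2, alpha=0.1, seed=0):
    """Toy 2-agent, 2-channel selection game: reward 1 when the agents pick
    different channels, 0 on a collision.  Each agent keeps stateless
    Q-values and, when revising its policy, adopts the greedy best reply
    only with probability (1 - inertia); otherwise it keeps its current
    action -- the 'inertia' that lets joint play settle on an equilibrium."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]   # per-agent, per-channel Q estimates
    policy = [0, 0]                # both agents start on the same channel
    for _ in range(n_steps):
        # epsilon-greedy exploration around the current policies
        acts = [rng.randrange(2) if rng.random() < eps else policy[i]
                for i in range(2)]
        reward = 1.0 if acts[0] != acts[1] else 0.0
        for i in range(2):
            # stateless Q-learning update toward the observed reward
            q[i][acts[i]] += alpha * (reward - q[i][acts[i]])
            best = 0 if q[i][0] >= q[i][1] else 1
            if rng.random() < 1.0 - inertia:
                policy[i] = best   # revise only occasionally (inertia)
    return policy

# example run; with enough steps the policies typically end on distinct channels
print(run_inertia_best_reply())
```

With enough steps the two policies typically settle on distinct channels (an anti-coordination equilibrium); raising `inertia` slows policy revision but reduces the chance of both agents switching simultaneously and cycling, which is the role inertia plays in the paper's convergence argument.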
Main Authors: | Ankita Tondwalkar, Andres Kwasinski |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Cognitive radios; uncoordinated multi-agent deep Q-learning; underlay dynamic spectrum access and sharing |
Online Access: | https://ieeexplore.ieee.org/document/10854424/ |
---|---|
author | Ankita Tondwalkar, Andres Kwasinski |
collection | DOAJ |
description | This paper studies a deep reinforcement learning technique for distributed resource allocation among cognitive radios operating under an underlay dynamic spectrum access paradigm that does not require coordination between agents during learning. The key challenge addressed in this work is the non-stationary reinforcement learning environment that arises from the uncoordinated and distributed operation of the radios over a shared wireless environment. This challenge is illustrated with a simulation result in which a standard single-agent deep reinforcement learning approach fails to converge when applied to the uncoordinated, interacting multi-radio scenario. To address this challenge, this work presents the uncoordinated and distributed multi-agent DQL (UDMA-DQL) technique, which combines a deep neural network with learning during exploration phases and a Best Reply Process with Inertia for the gradual learning of the best policy. A key result of the study of UDMA-DQL herein, shown by considering aspects specific to deep reinforcement learning, is that given an arbitrarily long learning time the UDMA-DQL technique converges with probability one to equilibrium policies in the non-stationary environment resulting from the distributed and uncoordinated operation of cognitive radios. This analytical study is confirmed through simulation results showing that, in cases where an optimal policy can be identified, UDMA-DQL finds such a policy in 99% of cases given a sufficiently long learning time. Importantly, further simulations show that the presented UDMA-DQL approach learns much faster than an equivalent table-based Q-learning implementation. |
format | Article |
id | doaj-art-560b5f47f6c74352a1cd09e0de0afeee |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
series | IEEE Access |
spelling | doaj-art-560b5f47f6c74352a1cd09e0de0afeee; 2025-01-31T23:04:45Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 19678-19693; DOI 10.1109/ACCESS.2025.3534657; article 10854424. Authors: Ankita Tondwalkar (https://orcid.org/0000-0003-0648-8134) and Andres Kwasinski (https://orcid.org/0000-0002-8083-8318), both with the Kate Gleason College of Engineering, Rochester Institute of Technology, Rochester, NY, USA. |
title | A Deep Q-Learning Algorithm With Guaranteed Convergence for Distributed and Uncoordinated Operation of Cognitive Radios |
topic | Cognitive radios; uncoordinated multi-agent deep Q-learning; underlay dynamic spectrum access and sharing |
url | https://ieeexplore.ieee.org/document/10854424/ |