A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents

Bibliographic Details
Main Authors: Zhen Zhang, Dongqing Wang, Dongbin Zhao, Qiaoni Han, Tingting Song
Format: Article
Language: English
Published: IEEE, 2018-01-01
Series: IEEE Access
Subjects: Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks
Online Access: https://ieeexplore.ieee.org/document/8517104/
author Zhen Zhang
Dongqing Wang
Dongbin Zhao
Qiaoni Han
Tingting Song
collection DOAJ
description Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games. Theoretical analyses show that in a finite-player, finite-action repeated game with two pure optimal joint actions that share no common component action, both optimal joint actions are stable critical points of the PMR-IGA model. Furthermore, we apply the Q-value function to estimate the gradient and derive the probability of maximal reward based on estimated gradient ascent (PMR-EGA) algorithm. Theoretical analyses and simulations of case studies of repeated games show that the maximal total reward can be achieved under any initial conditions. PMR-EGA extends naturally to cooperative stochastic games. Two stochastic games, box pushing and a distributed sensor network, are used as test beds. The simulations show that PMR-EGA consistently delivers excellent performance in both stochastic games.
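The abstract describes agents that ascend an estimated reward gradient, using Q-values to approximate the gradient of the expected reward with respect to each agent's action probabilities. The sketch below illustrates that general idea in the paper's setting of a cooperative repeated game with two pure optimal joint actions sharing no common component action. The payoff matrix, step sizes, exploration rate, and clipping are assumptions chosen for illustration; this is not the paper's exact PMR-EGA update rule.

```python
import random

# Common reward: two pure optimal joint actions, (0, 0) and (1, 1), with no
# shared component action -- the setting analyzed in the paper.
REWARD = [[10, 0], [0, 10]]

def clip01(p):
    """Keep a probability inside [0, 1] after a gradient step."""
    return min(1.0, max(0.0, p))

def run(episodes=10000, eta=0.01, alpha=0.1, eps=0.05, seed=1):
    rng = random.Random(seed)
    p1 = p2 = 0.5                      # probability each agent plays action 0
    q1, q2 = [0.0, 0.0], [0.0, 0.0]    # per-agent Q-estimates of own actions
    for _ in range(episodes):
        # small epsilon-exploration keeps both actions sampled
        a1 = rng.randrange(2) if rng.random() < eps else (0 if rng.random() < p1 else 1)
        a2 = rng.randrange(2) if rng.random() < eps else (0 if rng.random() < p2 else 1)
        r = REWARD[a1][a2]             # both agents receive the common reward
        q1[a1] += alpha * (r - q1[a1]) # Q-learning-style running estimates
        q2[a2] += alpha * (r - q2[a2])
        # gradient of expected reward w.r.t. p is estimated by Q(0) - Q(1)
        p1 = clip01(p1 + eta * (q1[0] - q1[1]))
        p2 = clip01(p2 + eta * (q2[0] - q2[1]))
    return p1, p2

if __name__ == "__main__":
    p1, p2 = run()
    print(p1, p2)  # the agents typically coordinate on one optimal joint action
```

Because the two agents share the reward signal, whichever joint action is sampled more often early on is reinforced for both, so the pair tends to settle on one of the two coordinated optima rather than a miscoordinated pair, mirroring the stability result stated in the abstract.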
format Article
id doaj-art-8eaebd97e1574567be7a7f56af0df567
institution Kabale University
issn 2169-3536
language English
publishDate 2018-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling IEEE Access, vol. 6, pp. 70223-70235, 2018-01-01. DOI: 10.1109/ACCESS.2018.2878853; IEEE document 8517104. Record indexed 2025-01-15T00:01:05Z.
Author affiliations: Zhen Zhang (https://orcid.org/0000-0002-6615-629X), Dongqing Wang, Qiaoni Han, and Tingting Song: School of Automation, Qingdao University, Qingdao, China. Dongbin Zhao: State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
title A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
topic Multi-agent reinforcement learning
gradient ascent
Q-learning
cooperative tasks
url https://ieeexplore.ieee.org/document/8517104/