A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents

Bibliographic Details
Main Authors: Zhen Zhang, Dongqing Wang, Dongbin Zhao, Qiaoni Han, Tingting Song
Format: Article
Language: English
Published: IEEE, 2018-01-01
Series: IEEE Access
Subjects: Multi-agent reinforcement learning; gradient ascent; Q-learning; cooperative tasks
Online Access: https://ieeexplore.ieee.org/document/8517104/
author Zhen Zhang
Dongqing Wang
Dongbin Zhao
Qiaoni Han
Tingting Song
collection DOAJ
description Multi-agent reinforcement learning (MARL) can be used to design intelligent agents for solving cooperative tasks. Within the MARL category, this paper proposes the probability of maximal reward based on infinitesimal gradient ascent (PMR-IGA) algorithm to reach the maximal total reward in repeated games. Theoretical analyses show that in a finite-player, finite-action repeated game with two pure optimal joint actions that share no common component action, both optimal joint actions are stable critical points of the PMR-IGA model. Furthermore, we apply the Q-value function to estimate the gradient and derive the probability of maximal reward based on estimated gradient ascent (PMR-EGA) algorithm. Theoretical analyses and simulations of case studies of repeated games show that the maximal total reward can be achieved under any initial conditions. PMR-EGA extends naturally to cooperative stochastic games. Two stochastic games, box pushing and a distributed sensor network, are used as test beds. The simulations show that PMR-EGA consistently delivers excellent performance in both stochastic games.
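The abstract describes agents that ascend an estimated reward gradient, using Q-values to approximate the gradient of the expected reward with respect to each agent's action probabilities. The sketch below illustrates that general idea in the paper's setting of a cooperative repeated game with two pure optimal joint actions sharing no common component action. The payoff matrix, step sizes, exploration rate, and clipping are assumptions chosen for illustration; this is not the paper's exact PMR-EGA update rule.

```python
import random

# Common reward: two pure optimal joint actions, (0, 0) and (1, 1), with no
# shared component action -- the setting analyzed in the paper.
REWARD = [[10, 0], [0, 10]]

def clip01(p):
    """Keep a probability inside [0, 1] after a gradient step."""
    return min(1.0, max(0.0, p))

def run(episodes=10000, eta=0.01, alpha=0.1, eps=0.05, seed=1):
    rng = random.Random(seed)
    p1 = p2 = 0.5                      # probability each agent plays action 0
    q1, q2 = [0.0, 0.0], [0.0, 0.0]    # per-agent Q-estimates of own actions
    for _ in range(episodes):
        # small epsilon-exploration keeps both actions sampled
        a1 = rng.randrange(2) if rng.random() < eps else (0 if rng.random() < p1 else 1)
        a2 = rng.randrange(2) if rng.random() < eps else (0 if rng.random() < p2 else 1)
        r = REWARD[a1][a2]             # both agents receive the common reward
        q1[a1] += alpha * (r - q1[a1]) # Q-learning-style running estimates
        q2[a2] += alpha * (r - q2[a2])
        # gradient of expected reward w.r.t. p is estimated by Q(0) - Q(1)
        p1 = clip01(p1 + eta * (q1[0] - q1[1]))
        p2 = clip01(p2 + eta * (q2[0] - q2[1]))
    return p1, p2

if __name__ == "__main__":
    p1, p2 = run()
    print(p1, p2)  # the agents typically coordinate on one optimal joint action
```

Because the two agents share the reward signal, whichever joint action is sampled more often early on is reinforced for both, so the pair tends to settle on one of the two coordinated optima rather than a miscoordinated pair, mirroring the stability result stated in the abstract.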
format Article
id doaj-art-8eaebd97e1574567be7a7f56af0df567
institution Kabale University
issn 2169-3536
language English
publishDate 2018-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling IEEE Access, vol. 6, pp. 70223-70235, 2018-01-01. DOI: 10.1109/ACCESS.2018.2878853; IEEE document 8517104. Record indexed 2025-01-15T00:01:05Z.
Author affiliations: Zhen Zhang (https://orcid.org/0000-0002-6615-629X), Dongqing Wang, Qiaoni Han, and Tingting Song: School of Automation, Qingdao University, Qingdao, China. Dongbin Zhao: State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
title A Gradient-Based Reinforcement Learning Algorithm for Multiple Cooperative Agents
topic Multi-agent reinforcement learning
gradient ascent
Q-learning
cooperative tasks
url https://ieeexplore.ieee.org/document/8517104/