A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition

Bibliographic Details
Main Authors: Lei Sheng, Honghui Chen, Xiliang Chen
Format: Article
Language: English
Published: MDPI AG, 2024-12-01
Series: Algorithms
Subjects: automatic representation; state transition; exploration; exploitation; multi-agent reinforcement learning; deterministic strategy gradient algorithm
Online Access: https://www.mdpi.com/1999-4893/17/12/579
author Lei Sheng
Honghui Chen
Xiliang Chen
author_facet Lei Sheng
Honghui Chen
Xiliang Chen
author_sort Lei Sheng
collection DOAJ
description The prevalent use of deterministic strategy algorithms in Multi-Agent Deep Reinforcement Learning (MADRL) for collaborative tasks has made it challenging to achieve stable, high-performance cooperative behavior. Addressing the need to balance exploration and exploitation for multi-agent ant robots in a partially observable continuous action space, this study introduces a multi-agent centralized strategy gradient algorithm grounded in a local state transition mechanism. The algorithm learns local state and local state-action representations from local observations and action values, thereby autonomously establishing a “local state transition” mechanism. Used as the input of the actor network, the automatically extracted local observation representation reduces the input state dimension, strengthens the local state features closely related to the local state transition, and encourages the agent to exploit the local state features that influence the next observed state. To mitigate non-stationarity and credit assignment issues in multi-agent environments, a centralized critic network evaluates the current joint strategy. The proposed algorithm, NST-FACMAC, is evaluated against other multi-agent deterministic strategy algorithms in a continuous control simulation environment built around a multi-agent ant robot. The experimental results show accelerated convergence and higher average reward in cooperative multi-agent ant simulation environments. Notably, in the four simulated environments Ant-v2 (2 × 4), Ant-v2 (2 × 4d), Ant-v2 (4 × 2), and Manyant (2 × 3), the algorithm improves performance by approximately 1.9%, 4.8%, 11.9%, and 36.1%, respectively, over the best baseline algorithm. These findings underscore the algorithm’s effectiveness in enhancing the stability of multi-agent ant robot control in dynamic environments.
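As a reading aid for the architecture sketched in the description, the snippet below illustrates one way the pieces could fit together: a per-agent encoder produces a compact local state representation, an auxiliary head predicts the next representation from the current representation and action to realize the "local state transition" mechanism, the deterministic actor acts on that representation, and a centralized critic scores the joint observation-action pair. This is a minimal sketch inferred from the abstract only; class names, layer sizes, tensor shapes, and the single monolithic critic (FACMAC-style methods usually factor the critic through a mixing network) are assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of the architecture described in the abstract.
# All names, sizes, and shapes below are illustrative assumptions.
import torch
import torch.nn as nn


class LocalEncoder(nn.Module):
    """Maps one agent's local observation to a compact state representation."""
    def __init__(self, obs_dim, repr_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, repr_dim))

    def forward(self, obs):
        return self.net(obs)


class TransitionHead(nn.Module):
    """Predicts the next local representation from (representation, action).
    Training this head is how we read the "local state transition" mechanism:
    it pushes the encoder toward features that drive the next observation."""
    def __init__(self, repr_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(repr_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, repr_dim))

    def forward(self, z, act):
        return self.net(torch.cat([z, act], dim=-1))


class Actor(nn.Module):
    """Deterministic per-agent policy acting on the learned representation."""
    def __init__(self, repr_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(repr_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, z):
        return self.net(z)


class CentralCritic(nn.Module):
    """Centralized critic over joint observations and joint actions
    (centralized training, decentralized execution)."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(joint_obs_dim + joint_act_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))


# Illustrative forward pass on random data (2 agents, batch of 32 transitions).
n_agents, obs_dim, act_dim, repr_dim = 2, 27, 4, 16
enc, trans = LocalEncoder(obs_dim, repr_dim), TransitionHead(repr_dim, act_dim)
actor = Actor(repr_dim, act_dim)
critic = CentralCritic(n_agents * obs_dim, n_agents * act_dim)

obs = torch.randn(32, obs_dim)
act = torch.randn(32, act_dim)
next_obs = torch.randn(32, obs_dim)

# Auxiliary representation loss: predict the next representation from (z, action).
z, z_next = enc(obs), enc(next_obs)
repr_loss = nn.functional.mse_loss(trans(z, act), z_next.detach())

# Centralized evaluation of the joint strategy induced by the shared actor.
joint_obs = torch.randn(32, n_agents * obs_dim)
joint_act = torch.cat(
    [actor(enc(joint_obs[:, i * obs_dim:(i + 1) * obs_dim])) for i in range(n_agents)],
    dim=-1)
q_joint = critic(joint_obs, joint_act)  # one value per transition in the batch
```

In the actual algorithm the representation loss would be combined with the critic's temporal-difference loss during training; the abstract does not give the weighting, so none is shown here.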
format Article
id doaj-art-972010c0e91d40d3be242eefb050f166
institution DOAJ
issn 1999-4893
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Algorithms
spelling doaj-art-972010c0e91d40d3be242eefb050f166; 2025-08-20T02:53:27Z; eng; MDPI AG; Algorithms; ISSN 1999-4893; 2024-12-01; Vol. 17, No. 12, Art. 579; doi:10.3390/a17120579; A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition; Lei Sheng and Honghui Chen (National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha 410073, China), Xiliang Chen (School of Command and Control Engineering, Army Engineering University, Nanjing 210007, China); https://www.mdpi.com/1999-4893/17/12/579
spellingShingle Lei Sheng
Honghui Chen
Xiliang Chen
A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition
Algorithms
automatic representation
state transition
exploration
exploitation
multi-agent reinforcement learning
deterministic strategy gradient algorithm
title A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition
title_full A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition
title_fullStr A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition
title_full_unstemmed A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition
title_short A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition
title_sort multi agent centralized strategy gradient reinforcement learning algorithm based on state transition
topic automatic representation
state transition
exploration
exploitation
multi-agent reinforcement learning
deterministic strategy gradient algorithm
url https://www.mdpi.com/1999-4893/17/12/579
work_keys_str_mv AT leisheng amultiagentcentralizedstrategygradientreinforcementlearningalgorithmbasedonstatetransition
AT honghuichen amultiagentcentralizedstrategygradientreinforcementlearningalgorithmbasedonstatetransition
AT xiliangchen amultiagentcentralizedstrategygradientreinforcementlearningalgorithmbasedonstatetransition
AT leisheng multiagentcentralizedstrategygradientreinforcementlearningalgorithmbasedonstatetransition
AT honghuichen multiagentcentralizedstrategygradientreinforcementlearningalgorithmbasedonstatetransition
AT xiliangchen multiagentcentralizedstrategygradientreinforcementlearningalgorithmbasedonstatetransition