Accelerated Transfer Learning for Cooperative Transportation Formation Change via SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization)

Methods for cooperative transportation that require formation changes in a traveling environment are gaining interest, and deep reinforcement learning is widely used for multi-robot formation change. The MADDPG (Multi-Agent Deep Deterministic Policy Gradient) method is popular for known environments, but it may require re-learning in unfamiliar circumstances. Although extensions of MADDPG based on model-based learning and imitation learning have been applied to reduce learning time, it remains unclear how learned results transfer when the number of robots changes. For example, in the GASIL-MADDPG (Generative Adversarial Self-Imitation Learning with Multi-Agent Deep Deterministic Policy Gradient) method, it is uncertain how the results of training three robots can be transferred to the neural networks of four robots. Meanwhile, Scaled Dot Product Attention (SDPA) has attracted attention for its speed and accuracy in natural language processing. When transfer learning is combined with such fast computation, the efficiency of edge-level re-learning improves. This paper proposes a formation change algorithm that enables easier and faster multi-robot knowledge transfer than existing methods by combining SDPA with MAPPO (Multi-Agent Proximal Policy Optimization). The algorithm applies SDPA to multi-robot formation learning and accelerates learning by transferring the acquired formation change knowledge to a different number of robots. The proposed algorithm is verified in simulations of robot formation change and achieves dramatically faster learning: SDPA-MAPPO learned 20.83 times faster than the Deep Dyna-Q method, and with transfer learning from a three-robot to a five-robot case, the learning time was reduced by about 56.57 percent. The three-to-five-robot scenario was chosen because these team sizes are common in cooperative robotics.
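
For context, SDPA is the standard operation Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of this general mechanism, not the authors' implementation; treating each robot's encoded observation as one attention token is an assumption made here for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard SDPA: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: arrays of shape (n_tokens, d_k); V: (n_tokens, d_v).
    In a multi-robot setting, each row could be one robot's encoded
    observation, so the output mixes information across robots
    (an illustrative assumption, not taken from the paper).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarity, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # attention-weighted combination

# Example: 3 robots, 8-dimensional encoded observations (self-attention)
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 8)
```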

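The record does not detail how weights trained with three robots initialize a five-robot policy. One common reason attention-based policies transfer across team sizes is that SDPA and weight-shared per-robot layers have no parameters tied to the number of robots, so the same weights fit any team size. The PyTorch sketch below illustrates that idea; the module names, layer sizes, and architecture are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class AttentionPolicy(nn.Module):
    """Toy policy: per-robot encoder -> self-attention over robots -> action head.

    No layer depends on the number of robots, so weights trained with one
    team size can initialize a policy for another (hypothetical architecture)."""
    def __init__(self, obs_dim=8, hidden=32, act_dim=2):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)             # shared per robot
        self.attn = nn.MultiheadAttention(hidden, num_heads=4,
                                          batch_first=True)   # SDPA over robots
        self.head = nn.Linear(hidden, act_dim)                # shared per robot

    def forward(self, obs):                  # obs: (batch, n_robots, obs_dim)
        h = torch.relu(self.encoder(obs))
        h, _ = self.attn(h, h, h)            # scaled dot-product self-attention
        return self.head(h)                  # (batch, n_robots, act_dim)

# Train (conceptually) with 3 robots, then reuse the weights for 5 robots:
policy3 = AttentionPolicy()
policy5 = AttentionPolicy()
policy5.load_state_dict(policy3.state_dict())  # identical shapes for any team size

print(policy5(torch.randn(1, 5, 8)).shape)     # torch.Size([1, 5, 2])
```
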
Bibliographic Details
Main Authors: Almira Budiyanto (Graduate School of Science and Technology, Kumamoto University, Japan), Keisuke Azetsu (Graduate School of Science and Technology, Kumamoto University, Japan), Nobutomo Matsunaga (Faculty of Advanced Science and Technology, Kumamoto University, Japan)
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Automation, Vol. 5, Issue 4, pp. 597-612
DOI: 10.3390/automation5040034
ISSN: 2673-4052
Subjects: multi-robots; formation change; Scaled Dot Product Attention; transfer learning
Online Access: https://www.mdpi.com/2673-4052/5/4/34