Accelerated Transfer Learning for Cooperative Transportation Formation Change via SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization)

Methods for cooperative transportation that require formation changes in a traveling environment are gaining interest, and deep reinforcement learning is widely used for multi-robot formation change. The MADDPG (Multi-Agent Deep Deterministic Policy Gradient) method is popular for known environments, but it may require re-learning in unfamiliar circumstances. Although extensions of MADDPG based on model-based learning and imitation learning have been applied to reduce learning time, it remains unclear how learned results transfer when the number of robots changes. For example, in the GASIL-MADDPG (Generative Adversarial Self-Imitation Learning with Multi-Agent Deep Deterministic Policy Gradient) method, it is uncertain how the results of training three robots can be transferred to the neural networks of four robots. Meanwhile, Scaled Dot Product Attention (SDPA) has attracted attention for its speed and accuracy in natural language processing. When transfer learning is combined with such fast computation, the efficiency of edge-level re-learning improves. This paper proposes a formation change algorithm that enables easier and faster multi-robot knowledge transfer than existing methods by combining SDPA with MAPPO (Multi-Agent Proximal Policy Optimization). The algorithm applies SDPA to multi-robot formation learning and accelerates learning by transferring the acquired formation change knowledge to a different number of robots. The proposed algorithm is verified in simulations of robot formation change and achieves dramatically faster learning: SDPA-MAPPO learned 20.83 times faster than the Deep Dyna-Q method, and with transfer learning from a three-robot to a five-robot case, the learning time was reduced by about 56.57 percent. The three-to-five-robot scenario was chosen because these team sizes are common in cooperative robotics.
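
For context, SDPA is the standard operation Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of this general mechanism, not the authors' implementation; treating each robot's encoded observation as one attention token is an assumption made here for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard SDPA: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: arrays of shape (n_tokens, d_k); V: (n_tokens, d_v).
    In a multi-robot setting, each row could be one robot's encoded
    observation, so the output mixes information across robots
    (an illustrative assumption, not taken from the paper).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarity, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # attention-weighted combination

# Example: 3 robots, 8-dimensional encoded observations (self-attention)
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 8)
```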

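The record does not detail how weights trained with three robots initialize a five-robot policy. One common reason attention-based policies transfer across team sizes is that SDPA and weight-shared per-robot layers have no parameters tied to the number of robots, so the same weights fit any team size. The PyTorch sketch below illustrates that idea; the module names, layer sizes, and architecture are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class AttentionPolicy(nn.Module):
    """Toy policy: per-robot encoder -> self-attention over robots -> action head.

    No layer depends on the number of robots, so weights trained with one
    team size can initialize a policy for another (hypothetical architecture)."""
    def __init__(self, obs_dim=8, hidden=32, act_dim=2):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)             # shared per robot
        self.attn = nn.MultiheadAttention(hidden, num_heads=4,
                                          batch_first=True)   # SDPA over robots
        self.head = nn.Linear(hidden, act_dim)                # shared per robot

    def forward(self, obs):                  # obs: (batch, n_robots, obs_dim)
        h = torch.relu(self.encoder(obs))
        h, _ = self.attn(h, h, h)            # scaled dot-product self-attention
        return self.head(h)                  # (batch, n_robots, act_dim)

# Train (conceptually) with 3 robots, then reuse the weights for 5 robots:
policy3 = AttentionPolicy()
policy5 = AttentionPolicy()
policy5.load_state_dict(policy3.state_dict())  # identical shapes for any team size

print(policy5(torch.randn(1, 5, 8)).shape)     # torch.Size([1, 5, 2])
```
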
Bibliographic Details
Main Authors: Almira Budiyanto (Graduate School of Science and Technology, Kumamoto University, Japan), Keisuke Azetsu (Graduate School of Science and Technology, Kumamoto University, Japan), Nobutomo Matsunaga (Faculty of Advanced Science and Technology, Kumamoto University, Japan)
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Automation, Vol. 5, Issue 4, pp. 597-612
DOI: 10.3390/automation5040034
ISSN: 2673-4052
Subjects: multi-robots; formation change; Scaled Dot Product Attention; transfer learning
Online Access: https://www.mdpi.com/2673-4052/5/4/34