DTPPO: Dual-Transformer Encoder-Based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments

Existing multi-agent deep reinforcement learning (MADRL) methods for multi-UAV navigation face challenges in generalization, particularly when applied to unseen complex environments. To address these limitations, we propose a Dual-Transformer Encoder-Based Proximal Policy Optimization (DTPP...

Bibliographic Details
Main Authors: Anning Wei, Jintao Liang, Kaiyuan Lin, Ziyue Li, Rui Zhao
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Drones
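Subjects: multi-UAV navigation; partially observable Markov decision process; multi-agent deep reinforcement learning; cross-scenario transferability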
Online Access: https://www.mdpi.com/2504-446X/8/12/720
author Anning Wei
Jintao Liang
Kaiyuan Lin
Ziyue Li
Rui Zhao
author_sort Anning Wei
collection DOAJ
description Existing multi-agent deep reinforcement learning (MADRL) methods for multi-UAV navigation face challenges in generalization, particularly when applied to unseen complex environments. To address these limitations, we propose a Dual-Transformer Encoder-Based Proximal Policy Optimization (DTPPO) method. DTPPO enhances multi-UAV collaboration through a Spatial Transformer, which models inter-agent dynamics, and a Temporal Transformer, which captures temporal dependencies to improve generalization across diverse environments. This architecture allows UAVs to navigate new, unseen environments without retraining. Extensive simulations demonstrate that DTPPO outperforms current MADRL methods in terms of transferability, obstacle avoidance, and navigation efficiency across environments with varying obstacle densities. The results confirm DTPPO’s effectiveness as a robust solution for multi-UAV navigation in both known and unseen scenarios.
format Article
id doaj-art-5e3fa4bf906b4122883aec9725527edb
institution Kabale University
issn 2504-446X
language English
publishDate 2024-11-01
publisher MDPI AG
record_format Article
series Drones
spelling doaj-art-5e3fa4bf906b4122883aec9725527edb
2024-12-27T14:21:46Z
eng
MDPI AG
Drones
2504-446X
2024-11-01
vol. 8, no. 12, art. 720
10.3390/drones8120720
DTPPO: Dual-Transformer Encoder-Based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments
Anning Wei (0)
Jintao Liang (1)
Kaiyuan Lin (2)
Ziyue Li (3)
Rui Zhao (4)
(0) Department of Automation, Tsinghua University, Beijing 100190, China
(1) State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
(2) Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90007, USA
(3) Department of Information Systems, University of Cologne, 50923 Köln, Germany
(4) SenseTime Research
https://www.mdpi.com/2504-446X/8/12/720
multi-UAV navigation
partially observable Markov decision process
multi-agent deep reinforcement learning
cross-scenario transferability
title DTPPO: Dual-Transformer Encoder-Based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments
topic multi-UAV navigation
partially observable Markov decision process
multi-agent deep reinforcement learning
cross-scenario transferability
url https://www.mdpi.com/2504-446X/8/12/720