Collision-Free Path Planning for Multiple Drones Based on Safe Reinforcement Learning
Reinforcement learning (RL) has been shown to be effective in path planning. However, it usually requires exploring a sufficient number of state–action pairs, some of which may be unsafe when deployed in practical obstacle environments. To this end, this paper proposes an end-to-end planning method...
| Main Authors: | Hong Chen, Dan Huang, Chenggang Wang, Lu Ding, Lei Song, Hongtao Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-09-01 |
| Series: | Drones |
| Subjects: | reinforcement learning; control barrier function; multiple agents |
| Online Access: | https://www.mdpi.com/2504-446X/8/9/481 |
| _version_ | 1850261171823706112 |
|---|---|
| author | Hong Chen Dan Huang Chenggang Wang Lu Ding Lei Song Hongtao Liu |
| author_sort | Hong Chen |
| collection | DOAJ |
| description | Reinforcement learning (RL) has been shown to be effective in path planning. However, it usually requires exploring a sufficient number of state–action pairs, some of which may be unsafe when deployed in practical obstacle environments. To this end, this paper proposes an end-to-end planning method based on a model-free RL framework with optimization, which can achieve better learning performance with a safety guarantee. First, for second-order drone systems, a differentiable high-order control barrier function (HOCBF) is introduced to ensure that the output of the planning algorithm falls within a safe range. Then, a safety layer based on the HOCBF is proposed, which projects RL actions into a feasible solution set to guarantee safe exploration. Finally, the proposed method is validated in a simulated drone obstacle-avoidance environment. The experimental results demonstrate a significant enhancement over the baseline approach. Specifically, the proposed method achieved a substantial reduction in the average cumulative number of collisions per drone during training compared to the baseline. Additionally, in the testing phase, the proposed method realized a 43% improvement in the task success rate relative to MADDPG. |
| format | Article |
| id | doaj-art-db990fb209bf414c8588491f33312aad |
| institution | OA Journals |
| issn | 2504-446X |
| language | English |
| publishDate | 2024-09-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Drones |
| spelling | doaj-art-db990fb209bf414c8588491f33312aad; 2025-08-20T01:55:30Z; eng; MDPI AG; Drones; ISSN 2504-446X; 2024-09-01; vol. 8, no. 9, art. 481; DOI 10.3390/drones8090481; Collision-Free Path Planning for Multiple Drones Based on Safe Reinforcement Learning; Hong Chen (School of Electrical Engineering, Guangxi University, Nanning 530004, China); Dan Huang (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China); Chenggang Wang (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China); Lu Ding (School of Electrical Engineering, Guangxi University, Nanning 530004, China); Lei Song (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China); Hongtao Liu (92281 Branch, Zhucheng 262200, China); https://www.mdpi.com/2504-446X/8/9/481; reinforcement learning; control barrier function; multiple agents |
| title | Collision-Free Path Planning for Multiple Drones Based on Safe Reinforcement Learning |
| topic | reinforcement learning control barrier function multiple agents |
| url | https://www.mdpi.com/2504-446X/8/9/481 |