A Long-Term Target Search Method for Unmanned Aerial Vehicles Based on Reinforcement Learning
Unmanned aerial vehicles (UAVs) are increasingly being employed in search operations. Deep reinforcement learning (DRL), owing to its robust self-learning and adaptive capabilities, has been extensively applied to drone search tasks. However, traditional DRL approaches often suffer from long trainin...
| Main Authors: | Dexing Wei, Lun Zhang, Mei Yang, Hanqiang Deng, Jian Huang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-09-01 |
| Series: | Drones |
| Subjects: | UAVs; reinforcement learning; long-term search task; multi-agent |
| Online Access: | https://www.mdpi.com/2504-446X/8/10/536 |
| _version_ | 1850204949210726400 |
|---|---|
| author | Dexing Wei; Lun Zhang; Mei Yang; Hanqiang Deng; Jian Huang |
| author_sort | Dexing Wei |
| collection | DOAJ |
| description | Unmanned aerial vehicles (UAVs) are increasingly being employed in search operations. Deep reinforcement learning (DRL), owing to its robust self-learning and adaptive capabilities, has been extensively applied to drone search tasks. However, traditional DRL approaches often suffer from long training times, especially in long-term search missions for UAVs, where the interaction cycles between the agent and the environment are extended. This paper addresses this critical issue by introducing a novel method—temporally asynchronous grouped environment reinforcement learning (TAGRL). Our key innovation lies in recognizing that as the number of training environments increases, agents can learn knowledge from discontinuous trajectories. This insight leads to the design of grouped environments, allowing agents to explore only a limited number of steps within each interaction cycle rather than completing full sequences. Consequently, TAGRL demonstrates faster learning speeds and lower memory consumption compared to existing parallel environment learning methods. The results indicate that this framework enhances the efficiency of UAV search tasks, paving the way for more scalable and effective applications of RL in complex scenarios. |
| format | Article |
| id | doaj-art-44aeccfd3a0b4f2f9730a9f35c48e588 |
| institution | OA Journals |
| issn | 2504-446X |
| language | English |
| publishDate | 2024-09-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Drones |
| spelling | doaj-art-44aeccfd3a0b4f2f9730a9f35c48e588; 2025-08-20T02:11:12Z; eng; MDPI AG; Drones; ISSN 2504-446X; 2024-09-01; vol. 8, no. 10, art. 536; DOI 10.3390/drones8100536; A Long-Term Target Search Method for Unmanned Aerial Vehicles Based on Reinforcement Learning; Dexing Wei, Lun Zhang, Mei Yang, Hanqiang Deng, Jian Huang (all: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China); https://www.mdpi.com/2504-446X/8/10/536; UAVs; reinforcement learning; long-term search task; multi-agent |
| title | A Long-Term Target Search Method for Unmanned Aerial Vehicles Based on Reinforcement Learning |
| topic | UAVs; reinforcement learning; long-term search task; multi-agent |
| url | https://www.mdpi.com/2504-446X/8/10/536 |
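
The description field in this record says that TAGRL uses grouped environments in which agents explore only a limited number of steps per interaction cycle, learning from discontinuous trajectories. The record gives no algorithmic detail, so the following is only a rough sketch of one way such a collection loop might look. It is not the authors' implementation. `ToySearchEnv`, `make_groups`, `collect_cycle`, the warm-up staggering, and all parameters are hypothetical names and choices introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToySearchEnv:
    """Hypothetical stand-in for a long-horizon search environment.

    The observation is simply the current time step.
    """
    horizon: int
    t: int = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # placeholder reward
        done = self.t >= self.horizon
        return self.t, reward, done


def make_groups(num_groups: int, envs_per_group: int, horizon: int):
    """Build groups of environments at staggered episode phases.

    Assumption: staggering is done by warm-up stepping, so each group
    starts training at a different point of the long episode.
    """
    groups = []
    for g in range(num_groups):
        offset = g * horizon // num_groups  # always < horizon
        envs = [ToySearchEnv(horizon) for _ in range(envs_per_group)]
        for env in envs:
            env.reset()
            for _ in range(offset):
                env.step(0)
        groups.append(envs)
    return groups


def collect_cycle(groups, steps_per_cycle: int, policy):
    """Advance every environment by a fixed number of steps.

    Returns trajectory fragments as
    (obs, action, reward, next_obs, done) tuples. Only
    steps_per_cycle transitions per environment are buffered,
    never a full episode.
    """
    batch = []
    for envs in groups:
        for env in envs:
            obs = env.t
            for _ in range(steps_per_cycle):
                action = policy(obs)
                next_obs, reward, done = env.step(action)
                batch.append((obs, action, reward, next_obs, done))
                # Restart finished episodes so the group keeps running.
                obs = env.reset() if done else next_obs
    return batch


if __name__ == "__main__":
    groups = make_groups(num_groups=4, envs_per_group=2, horizon=100)
    batch = collect_cycle(groups, steps_per_cycle=5, policy=lambda obs: 1)
    print(len(batch), "transitions collected in one cycle")
```

In this sketch, one cycle over 4 groups of 2 environments with 5 steps each yields 40 transitions that sample four separate stretches of the 100-step episode, rather than buffering a complete episode from each environment. Whether this matches the grouping and update schedule used in the paper cannot be determined from the record alone.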