Trajectory Aware Deep Reinforcement Learning Navigation Using Multichannel Cost Maps

Deep reinforcement learning (DRL)-based navigation in an environment with dynamic obstacles is a challenging task due to the partially observable nature of the problem. While DRL algorithms structure the learning process around the Markov property (the assumption that all the information needed for a decision is contained in a single observation of the current state), the partial observability of the navigation problem is significantly amplified when dealing with dynamic obstacles. A single observation or measurement of the environment is often insufficient to capture the dynamic behavior of obstacles, which hinders the agent's decision-making. This study addresses this challenge by using an environment-specific heuristic approach that augments the observation with temporal information about dynamic obstacles to guide the agent's decision-making. We propose the Multichannel Cost Map Observation for Spatial and Temporal Information (M-COST) to mitigate these limitations. Our results show that the M-COST approach more than doubles the convergence rate in concentrated tunnel situations, where successful navigation is possible only if the agent learns to avoid dynamic obstacles. Additionally, navigation efficiency improved by 35% in tunnel scenarios and by 12% in dense-environment navigation compared with standard methods that rely on raw sensor data or frame stacking.
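
The record contains only the abstract, so the paper's exact observation layout is unknown; the following is a minimal Python sketch of the idea the abstract describes, encoding dynamic obstacles' temporal information as extra cost-map channels alongside a spatial channel. The function name, the constant-velocity projection, and the fading scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_mcost_observation(static_map, obstacle_tracks, horizon=3, dt=0.5):
    """Sketch of a multichannel cost-map observation (M-COST-style).

    Channel 0 holds the static occupancy grid; channels 1..horizon project
    each tracked dynamic obstacle forward under a constant-velocity
    assumption, so the policy sees where obstacles are heading rather than
    only where they are. The layout is illustrative, not the paper's exact
    formulation.
    """
    h, w = static_map.shape
    obs = np.zeros((horizon + 1, h, w), dtype=np.float32)
    obs[0] = static_map  # channel 0: walls and fixed obstacles

    for (r0, c0), (vr, vc) in obstacle_tracks:  # position/velocity in grid cells
        for k in range(1, horizon + 1):         # channel k: predicted state at t + k*dt
            r = int(round(r0 + vr * k * dt))
            c = int(round(c0 + vc * k * dt))
            if 0 <= r < h and 0 <= c < w:
                # fade the cost with lookahead depth to reflect growing uncertainty
                obs[k, r, c] = max(obs[k, r, c], 1.0 - 0.2 * (k - 1))
    return obs  # (horizon + 1, H, W) tensor consumed by the DRL policy network
```

A policy network would then consume this tensor as a (horizon + 1)-channel image, replacing the raw sensor history of frame stacking with an explicit trajectory encoding.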

Bibliographic Details
Main Authors: Tareq A. Fahmy, Omar M. Shehata, Shady A. Maged (all: Mechatronics Engineering Department, Ain Shams University, Cairo 11535, Egypt)
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Robotics, vol. 13, no. 11, article 166
ISSN: 2218-6581
DOI: 10.3390/robotics13110166
Subjects: deep reinforcement learning; navigation; multichannel cost map; trajectory aware; spatial and temporal representation; POMDP (partially observable Markov decision process)
Online Access: https://www.mdpi.com/2218-6581/13/11/166