Trajectory Aware Deep Reinforcement Learning Navigation Using Multichannel Cost Maps

Deep reinforcement learning (DRL)-based navigation in an environment with dynamic obstacles is a challenging task due to the partially observable nature of the problem. While DRL algorithms are built around the Markov property (assumption that all the necessary information for making a decision is c...

Full description

Saved in:
Bibliographic Details
Main Authors: Tareq A. Fahmy, Omar M. Shehata, Shady A. Maged
Format: Article
Language:English
Published: MDPI AG 2024-11-01
Series:Robotics
Subjects:
Online Access:https://www.mdpi.com/2218-6581/13/11/166
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Deep reinforcement learning (DRL)-based navigation in an environment with dynamic obstacles is a challenging task due to the partially observable nature of the problem. While DRL algorithms are built around the Markov property (assumption that all the necessary information for making a decision is contained in a single observation of the current state) for structuring the learning process; the partially observable Markov property in the DRL navigation problem is significantly amplified when dealing with dynamic obstacles. A single observation or measurement of the environment is often insufficient for capturing the dynamic behavior of obstacles, thereby hindering the agent’s decision-making. This study addresses this challenge by using an environment-specific heuristic approach to augment the dynamic obstacles’ temporal information in observation to guide the agent’s decision-making. We proposed Multichannel Cost Map Observation for Spatial and Temporal Information (M-COST) to mitigate these limitations. Our results show that the M-COST approach more than doubles the convergence rate in concentrated tunnel situations, where successful navigation is only possible if the agent learns to avoid dynamic obstacles. Additionally, navigation efficiency improved by 35% in tunnel scenarios and by 12% in dense-environment navigation compared to standard methods that rely on raw sensor data or frame stacking.
ISSN:2218-6581