Monocular Vision-Based Depth Estimation of Forward-Looking Scenes for Mobile Platforms

Bibliographic Details
Main Authors: Li Wei, Meng Ding, Shuai Li
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/8/4267
Description
Summary: The depth estimation of forward-looking scenes is one of the fundamental tasks an intelligent mobile platform must perform to perceive its surrounding environment. To meet this requirement, this paper proposes a self-supervised monocular depth estimation method that can be applied across various mobile platforms, including unmanned aerial vehicles (UAVs) and autonomous ground vehicles (AGVs). Building on the framework of Monodepth2, we introduce an intermediate module between the encoder and decoder of the depth estimation network to enable multiscale fusion of feature maps. Additionally, we integrate the ECANet channel attention mechanism into the depth estimation network to emphasize important channels. The proposed method thereby addresses the loss of critical features, which would otherwise diminish accuracy and robustness. The experiments in this paper are conducted on two datasets: KITTI, a publicly available dataset collected in real-world environments, used to evaluate depth estimation performance for AGV platforms; and AirSim, a custom dataset generated with simulation software, used to assess depth estimation performance for UAV platforms. The experimental results demonstrate that the proposed method overcomes the adverse effects of varying working conditions and accurately perceives detailed depth information in challenging regions, such as object edges and targets of different scales. Furthermore, the depth predicted by the proposed method is quantitatively compared with the ground-truth depth, and a variety of evaluation metrics confirm that our method exhibits superior inference capability and robustness.
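The ECA-style channel attention the abstract mentions can be illustrated with a minimal NumPy sketch: global average pooling produces one descriptor per channel, a 1-D convolution across channels captures local cross-channel interaction without dimensionality reduction, and a sigmoid gate rescales each channel. This is an assumption-laden sketch, not the authors' implementation; the uniform kernel stands in for the learned 1-D convolution weights, and `eca_channel_attention` is a hypothetical name.

```python
import numpy as np

def eca_channel_attention(feat, k=3):
    """Sketch of ECA-style channel attention on one feature map.

    feat: array of shape (C, H, W); k: odd 1-D kernel size.
    Returns the channel-reweighted feature map.
    """
    c, h, w = feat.shape
    # 1. Global average pooling -> one descriptor per channel, shape (C,).
    desc = feat.mean(axis=(1, 2))
    # 2. 1-D convolution across neighboring channels (no dimensionality
    #    reduction). A uniform kernel stands in for learned weights here.
    kernel = np.full(k, 1.0 / k)
    pad = k // 2
    padded = np.pad(desc, pad, mode="edge")
    conv = np.array([np.dot(padded[i:i + k], kernel) for i in range(c)])
    # 3. Sigmoid gate, then rescale each channel of the input.
    gate = 1.0 / (1.0 + np.exp(-conv))
    return feat * gate[:, None, None]
```

In a real network the gate would sit between encoder and decoder stages and the kernel weights would be learned end-to-end; the sketch only shows the data flow that makes ECA cheaper than squeeze-and-excitation-style bottlenecks.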
ISSN: 2076-3417