Leveraging learned monocular depth prediction for pose estimation and mapping on unmanned underwater vehicles

Bibliographic Details
Main Authors: Marco Job, David Botta, Victor Reijgwart, Luca Ebner, Andrej Studer, Roland Siegwart, Eleni Kelasidi
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-06-01
Series: Frontiers in Robotics and AI
Online Access: https://www.frontiersin.org/articles/10.3389/frobt.2025.1609765/full
Description
Summary: This paper presents a general framework that integrates visual and acoustic sensor data to enhance localization and mapping in complex, highly dynamic underwater environments, with a particular focus on fish farming. The pipeline enables net-relative pose estimation for Unmanned Underwater Vehicles (UUVs) and depth prediction within net pens solely from visual data by combining deep learning-based monocular depth prediction with sparse depth priors derived from a classical Fast Fourier Transform (FFT)-based method. We further introduce a method to estimate a UUV’s global pose by fusing these net-relative estimates with acoustic measurements, and demonstrate how the predicted depth images can be integrated into the wavemap mapping framework to generate detailed 3D maps in real time. Extensive evaluations on datasets collected in industrial-scale fish farms confirm that the presented framework can be used to accurately estimate a UUV’s net-relative and global position in real time and to provide 3D maps suitable for autonomous navigation and inspection.
ISSN: 2296-9144
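
The abstract states that dense monocular depth predictions are combined with sparse depth priors from an FFT-based method, but it gives no implementation details. As a loose illustration only, the NumPy sketch below aligns an up-to-scale depth map to sparse metric priors with a per-image least-squares scale and shift; the function name, the scale-and-shift alignment model, and the synthetic data are assumptions for illustration, not the paper's actual pipeline.

import numpy as np

def align_depth_to_priors(relative_depth, prior_depth, prior_mask):
    """Fit a scale and shift so the network's up-to-scale depth matches sparse
    metric priors (e.g. net distances from an FFT-based estimator) in a
    least-squares sense, then return the metric-scaled dense depth map.

    relative_depth : (H, W) dense, up-to-scale depth from the network
    prior_depth    : (H, W) sparse metric depth, valid where prior_mask is True
    prior_mask     : (H, W) boolean validity mask for the priors
    """
    d = relative_depth[prior_mask]          # predicted values at prior pixels
    z = prior_depth[prior_mask]             # metric prior values
    # Solve [d 1] [s, t]^T = z for scale s and shift t.
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, z, rcond=None)
    return s * relative_depth + t

# Synthetic check: a "relative" depth that is the true depth scaled and
# shifted, plus a sparse grid of metric priors.
true_depth = np.random.uniform(1.0, 5.0, size=(48, 64))
rel_depth = 0.5 * true_depth - 0.2
mask = np.zeros_like(true_depth, dtype=bool)
mask[::8, ::8] = True
metric = align_depth_to_priors(rel_depth, true_depth, mask)
print(float(np.abs(metric - true_depth).max()))  # ~0 up to numerical error

A scale-and-shift fit is one common way to turn up-to-scale network output into metric depth when a few trusted range measurements are available; the FFT-derived priors mentioned in the abstract could play that role, but the exact fusion used in the paper may differ.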