Leveraging learned monocular depth prediction for pose estimation and mapping on unmanned underwater vehicles

Bibliographic Details
Main Authors: Marco Job, David Botta, Victor Reijgwart, Luca Ebner, Andrej Studer, Roland Siegwart, Eleni Kelasidi
Format: Article
Language:English
Published: Frontiers Media S.A. 2025-06-01
Series:Frontiers in Robotics and AI
Subjects:
Online Access:https://www.frontiersin.org/articles/10.3389/frobt.2025.1609765/full
collection DOAJ
description This paper presents a general framework that integrates visual and acoustic sensor data to enhance localization and mapping in complex, highly dynamic underwater environments, with a particular focus on fish farming. The pipeline enables net-relative pose estimation for Unmanned Underwater Vehicles (UUVs) and depth prediction within net pens solely from visual data by combining deep learning-based monocular depth prediction with sparse depth priors derived from a classical Fast Fourier Transform (FFT)-based method. We further introduce a method to estimate a UUV’s global pose by fusing these net-relative estimates with acoustic measurements, and demonstrate how the predicted depth images can be integrated into the wavemap mapping framework to generate detailed 3D maps in real-time. Extensive evaluations on datasets collected in industrial-scale fish farms confirm that the presented framework can be used to accurately estimate a UUV’s net-relative and global position in real-time, and provide 3D maps suitable for autonomous navigation and inspection.
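The description states that the pipeline combines deep learning-based monocular depth prediction (which is typically only correct up to scale) with sparse metric depth priors. A minimal sketch of one common way to perform such an alignment, a least-squares scale-and-shift fit of the relative depth map against the sparse priors (the function name, NumPy formulation, and prior format are illustrative assumptions, not the paper's exact FFT-based method):

```python
import numpy as np

def align_depth_to_priors(rel_depth, prior_uv, prior_depth):
    """Fit scale s and shift t so that s * rel_depth + t matches the
    sparse metric depth priors in a least-squares sense, and return
    the resulting metric depth map.

    rel_depth   : (H, W) relative (up-to-scale) depth map
    prior_uv    : (N, 2) integer pixel coordinates (u, v) of the priors
    prior_depth : (N,)   metric depth values at those pixels
    """
    # Sample the relative depth at the prior pixel locations (v = row, u = col).
    d = rel_depth[prior_uv[:, 1], prior_uv[:, 0]]
    # Design matrix [d, 1] for the linear model  s * d + t  ≈  prior_depth.
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, prior_depth, rcond=None)
    return s * rel_depth + t
```

With two or more well-spread priors the fit is determined; in practice a robust variant (e.g. RANSAC over the priors) would guard against outliers in the sparse measurements.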
format Article
id doaj-art-d9fd88c966dc41bc80da9e243b3ddbdf
institution OA Journals
issn 2296-9144
language English
publishDate 2025-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Robotics and AI
spelling doaj-art-d9fd88c966dc41bc80da9e243b3ddbdf
doi 10.3389/frobt.2025.1609765
Author affiliations:
Marco Job: Autonomous Systems Lab, Institute of Robotics and Intelligent Systems, Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland; Department of Mechanical and Industrial Engineering, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
David Botta: Autonomous Systems Lab, Institute of Robotics and Intelligent Systems, Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland
Victor Reijgwart: Autonomous Systems Lab, Institute of Robotics and Intelligent Systems, Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland
Luca Ebner: Tethys Robotics, Zurich, Switzerland
Andrej Studer: Tethys Robotics, Zurich, Switzerland
Roland Siegwart: Autonomous Systems Lab, Institute of Robotics and Intelligent Systems, Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland
Eleni Kelasidi: Autonomous Systems Lab, Institute of Robotics and Intelligent Systems, Department of Mechanical and Process Engineering, ETH Zurich, Zurich, Switzerland; Department of Mechanical and Industrial Engineering, Norwegian University of Science and Technology (NTNU), Trondheim, Norway; Aquaculture Robotics and Automation Group, SINTEF Ocean, Trondheim, Norway
title Leveraging learned monocular depth prediction for pose estimation and mapping on unmanned underwater vehicles
topic localization
mapping
UUVs
depth prediction
aquaculture
url https://www.frontiersin.org/articles/10.3389/frobt.2025.1609765/full