Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image

Bibliographic Details
Main Authors: Nanda Febri Istighfarin, Seongwon Lee, HyungGi Jo
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
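Subjects: Visual localization; pose estimation; sampling module; edge detector; structural context; attention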
Online Access: https://ieeexplore.ieee.org/document/10723272/
author Nanda Febri Istighfarin
Seongwon Lee
HyungGi Jo
collection DOAJ
description Visual localization has become a crucial task in robotics, especially for autonomous vehicles and virtual reality, because it can achieve high accuracy with inexpensive sensors. Among the various methods, scene coordinate regression is a recent approach: a neural network regresses 2D-3D correspondences from images, and a pose solver such as PnP-RANSAC uses these correspondences to estimate the pose of the query image. A common challenge is that regressing these correspondences often involves sampling across the entire 2D image, which is inefficient because not all areas contain useful information for the network. To address this, we propose sampling only the essential regions of an image to improve the network’s learning efficiency. Our method selectively captures informative features by integrating the structural and edge contexts within images, identifying robust regions for sampling. This refinement allows the network to learn 2D-3D correspondences more effectively. We tested our approach on both a publicly available outdoor dataset and our custom dataset, where it achieved state-of-the-art results on a large dataset.
format Article
id doaj-art-e809f62ed7cf4c388f0a51f00d721566
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-e809f62ed7cf4c388f0a51f00d721566
2024-12-12T00:00:46Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
Volume 12, pages 154963-154974
DOI: 10.1109/ACCESS.2024.3483963
IEEE document number: 10723272
Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image
Nanda Febri Istighfarin (https://orcid.org/0009-0007-3720-4356), Division of Electronic Engineering, Jeonbuk National University, Jeonju, South Korea
Seongwon Lee (https://orcid.org/0000-0002-7077-5595), School of Electrical Engineering, Kookmin University, Seoul, South Korea
HyungGi Jo (https://orcid.org/0000-0003-2689-1940), Division of Electronic Engineering, Jeonbuk National University, Jeonju, South Korea
title Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image
topic Visual localization
pose estimation
sampling module
edge detector
structural context
attention
url https://ieeexplore.ieee.org/document/10723272/
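
Illustrative note: the description in this record says the method samples only informative image regions, guided by structural and edge context, instead of sampling the whole image. The sketch below is not the authors' implementation. It is a minimal stand-in that assumes a plain gradient-magnitude map as the "edge context" and ignores structural context entirely. The function names `edge_sampling_probs` and `sample_pixels` are hypothetical and chosen only for this example.

```python
import numpy as np


def edge_sampling_probs(gray, eps=1e-6):
    """Convert gradient magnitude into a per-pixel sampling distribution.

    Assumption: finite-difference gradients stand in for the edge detector
    mentioned in the record; the paper's detector may differ.
    """
    gy, gx = np.gradient(gray.astype(np.float64))
    # eps keeps every pixel at a small non-zero probability.
    magnitude = np.hypot(gx, gy) + eps
    return magnitude / magnitude.sum()


def sample_pixels(gray, n, rng):
    """Draw n distinct pixel locations, favouring high-gradient regions."""
    probs = edge_sampling_probs(gray)
    flat = rng.choice(gray.size, size=n, replace=False, p=probs.ravel())
    rows, cols = np.unravel_index(flat, gray.shape)
    # Return (x, y) pairs, the order a PnP-style solver usually expects.
    return np.stack([cols, rows], axis=1)


if __name__ == "__main__":
    # Synthetic image with a single vertical step edge.
    image = np.zeros((32, 32))
    image[:, 16:] = 1.0
    points = sample_pixels(image, 20, np.random.default_rng(0))
    print(points[:5])
```

In a scene coordinate regression pipeline, the sampled 2D locations would be the pixels for which the network predicts 3D scene coordinates before the correspondences are passed to a solver such as PnP-RANSAC; that stage is omitted here.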