Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image
Visual localization has become a crucial task in robotics, especially in autonomous vehicles and virtual reality, due to its ability to utilize inexpensive sensors and achieve high accuracy. Among various methods, the scene coordinate regression network is a recent approach. This method uses a neura...
| Main Authors: | Nanda Febri Istighfarin, Seongwon Lee, HyungGi Jo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2024-01-01 |
| Series: | IEEE Access |
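| Subjects: | Visual localization; pose estimation; sampling module; edge detector; structural context; attention |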
| Online Access: | https://ieeexplore.ieee.org/document/10723272/ |
| _version_ | 1846127488567083008 |
|---|---|
| author | Nanda Febri Istighfarin; Seongwon Lee; HyungGi Jo |
| author_facet | Nanda Febri Istighfarin; Seongwon Lee; HyungGi Jo |
| author_sort | Nanda Febri Istighfarin |
| collection | DOAJ |
| description | Visual localization has become a crucial task in robotics, especially in autonomous vehicles and virtual reality, due to its ability to utilize inexpensive sensors and achieve high accuracy. Among various methods, the scene coordinate regression network is a recent approach. This method uses a neural network to regress the 2D-3D correspondences from images and utilizes these correspondences in a pose solver like PnP-RANSAC to estimate the pose of the query image. A common challenge is that regressing these correspondences often involves sampling across the entire 2D image, which is inefficient as not all areas contain useful information for the network. To address this, we propose sampling only the essential regions of an image to enhance the network’s learning efficiency. Our method selectively captures informative features by integrating the structural and edge contexts within images, identifying robust regions for sampling. This refinement allows the network to learn 2D-3D correspondences better. We tested our approach using both the publicly available outdoor dataset and our custom dataset, where it achieved state-of-the-art results in a large dataset. |
| format | Article |
| id | doaj-art-e809f62ed7cf4c388f0a51f00d721566 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-e809f62ed7cf4c388f0a51f00d721566; 2024-12-12T00:00:46Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 154963; 154974; 10.1109/ACCESS.2024.3483963; 10723272; Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image; Nanda Febri Istighfarin [0] https://orcid.org/0009-0007-3720-4356; Seongwon Lee [1] https://orcid.org/0000-0002-7077-5595; HyungGi Jo [2] https://orcid.org/0000-0003-2689-1940; Division of Electronic Engineering, Jeonbuk National University, Jeonju, South Korea; School of Electrical Engineering, Kookmin University, Seoul, South Korea; Division of Electronic Engineering, Jeonbuk National University, Jeonju, South Korea; Visual localization has become a crucial task in robotics, especially in autonomous vehicles and virtual reality, due to its ability to utilize inexpensive sensors and achieve high accuracy. Among various methods, the scene coordinate regression network is a recent approach. This method uses a neural network to regress the 2D-3D correspondences from images and utilizes these correspondences in a pose solver like PnP-RANSAC to estimate the pose of the query image. A common challenge is that regressing these correspondences often involves sampling across the entire 2D image, which is inefficient as not all areas contain useful information for the network. To address this, we propose sampling only the essential regions of an image to enhance the network’s learning efficiency. Our method selectively captures informative features by integrating the structural and edge contexts within images, identifying robust regions for sampling. This refinement allows the network to learn 2D-3D correspondences better. We tested our approach using both the publicly available outdoor dataset and our custom dataset, where it achieved state-of-the-art results in a large dataset.; https://ieeexplore.ieee.org/document/10723272/; Visual localization; pose estimation; sampling module; edge detector; structural context; attention |
| spellingShingle | Nanda Febri Istighfarin; Seongwon Lee; HyungGi Jo; Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image; IEEE Access; Visual localization; pose estimation; sampling module; edge detector; structural context; attention |
| title | Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image |
| title_full | Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image |
| title_fullStr | Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image |
| title_full_unstemmed | Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image |
| title_short | Back to the Context: Tuning Visual Localization Using Structural and Edge Context Within Image |
| title_sort | back to the context tuning visual localization using structural and edge context within image |
| topic | Visual localization; pose estimation; sampling module; edge detector; structural context; attention |
| url | https://ieeexplore.ieee.org/document/10723272/ |
| work_keys_str_mv | AT nandafebriistighfarin backtothecontexttuningvisuallocalizationusingstructuralandedgecontextwithinimage AT seongwonlee backtothecontexttuningvisuallocalizationusingstructuralandedgecontextwithinimage AT hyunggijo backtothecontexttuningvisuallocalizationusingstructuralandedgecontextwithinimage |
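
The description in this record speaks of sampling only informative image regions, guided by edge context, instead of sampling across the whole 2D image. The sketch below illustrates that general idea only and is not the authors' sampling module: it assumes a plain finite-difference gradient magnitude as a stand-in edge detector, and the function names (`edge_strength`, `sample_edge_pixels`) and the `eps` parameter are hypothetical, introduced here for illustration.

```python
import numpy as np


def edge_strength(gray):
    """Gradient magnitude from finite differences (a simple stand-in edge detector)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)


def sample_edge_pixels(gray, n_samples, rng=None, eps=1e-8):
    """Draw pixel locations with probability proportional to edge strength.

    Returns an (n_samples, 2) integer array of (x, y) coordinates.
    `eps` keeps every pixel at a tiny non-zero probability so that sampling
    without replacement stays valid even on a flat image.
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = edge_strength(gray).ravel() + eps
    probs = weights / weights.sum()
    flat_idx = rng.choice(weights.size, size=n_samples, replace=False, p=probs)
    rows, cols = np.unravel_index(flat_idx, gray.shape)
    return np.stack([cols, rows], axis=1)


if __name__ == "__main__":
    # Synthetic 32x32 image with a single vertical step edge between columns 15 and 16.
    image = np.zeros((32, 32))
    image[:, 16:] = 1.0
    points = sample_edge_pixels(image, n_samples=50, rng=np.random.default_rng(0))
    print(points[:5])
```

In a scene coordinate regression pipeline, locations chosen this way would select which 2D pixels contribute 2D-3D correspondences during training; the resulting correspondences would then go to a pose solver such as PnP-RANSAC, which is outside the scope of this sketch.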