Aerial–Terrestrial Image Feature Matching: An Evaluation of Recent Deep Learning Methods

Bibliographic Details
Main Authors: Hui Wang, Jiangxue Yu, San Jiang, Dejin Zhang, Qingquan Li
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/10976566/
Description
Summary: The 3-D reconstruction of complex urban areas is becoming increasingly important for various applications. To achieve precise and complete 3-D reconstruction, current approaches aim to combine aerial and terrestrial images. The main challenge is achieving reliable feature matching between aerial and terrestrial images under large viewing-angle and illumination differences, conditions under which traditional handcrafted methods suffer a significant decline in matching performance. In this context, deep-learning-based feature matching methods have developed rapidly and gained extensive attention; however, their performance on challenging large-angle aerial–terrestrial data still needs to be evaluated. To assess their suitability for aerial–terrestrial images, this study reviews and evaluates four types of recent deep-learning-based feature matching networks on four aerial–terrestrial datasets. Extensive experiments and evaluations cover both feature matching and image orientation based on structure from motion (SfM). The results demonstrate that graph-neural-network-based and detector-free methods exhibit significant advantages on aerial–terrestrial datasets, generating effective and correct matches for image pairs with large scale and viewpoint differences. In particular, the combination of SuperPoint and LightGlue achieves the best performance, producing approximately ten times as many aerial–terrestrial feature matches as the scale-invariant feature transform (SIFT), and all images can be registered in SfM reconstruction using its matching results. However, the precision of deep-learning-based methods in SfM reconstruction is still inferior to that of the classical handcrafted method, so there remains significant room for improvement.
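The classical baseline discussed in the abstract, SIFT, pairs keypoints by nearest-neighbor descriptor matching filtered with Lowe's ratio test, which the learned matchers evaluated in the article aim to outperform. As a minimal illustrative sketch of that baseline matching step (not the authors' implementation; the toy descriptors and the 0.8 threshold here are assumptions):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Lowe-style ratio-test matching between two descriptor sets.

    desc_a: (N, D) array, desc_b: (M, D) array.
    Returns (i, j) pairs where descriptor i in A matches descriptor j in B.
    """
    # Pairwise Euclidean distances between all descriptor pairs.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Accept only if the best match clearly beats the runner-up;
        # this rejects ambiguous matches, at the cost of recall under
        # the large viewpoint changes the article studies.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches

# Tiny synthetic example (invented 2-D descriptors).
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.0, 0.9], [0.95, 0.05], [10.0, 10.0]])
print(match_descriptors(a, b))  # -> [(0, 1), (1, 0)]
```

Learned matchers such as LightGlue replace this fixed ratio heuristic with a trained assignment over both descriptor sets, which is one reason they yield far more matches on wide-baseline aerial–terrestrial pairs.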
ISSN: 1939-1404
2151-1535