Stereo Matching of High-Resolution Satellite Images via Hierarchical ViT and Self-Supervised DINO

Dense matching plays an important role in 3D modeling from satellite images. Its purpose is to establish pixel-by-pixel correspondences between two stereo images. This study presents a learning-based dense matching approach that integrates self-supervised learning with a multi-head attention mechanis...

Bibliographic Details
Main Authors: X. He, M. Yang, S. Jiang, W. Jiang, Q. Li
Format: Article
Language: English
Published: Copernicus Publications 2025-07-01
Series: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Online Access: https://isprs-annals.copernicus.org/articles/X-G-2025/357/2025/isprs-annals-X-G-2025-357-2025.pdf
author X. He
M. Yang
S. Jiang
S. Jiang
W. Jiang
Q. Li
collection DOAJ
description Dense matching plays an important role in 3D modeling from satellite images. Its purpose is to establish pixel-by-pixel correspondences between two stereo images. This study presents a learning-based dense matching approach that integrates self-supervised learning with a multi-head attention mechanism to achieve feature fusion. Because stereo matching on satellite datasets is restricted by the disparity range, the pixel-by-pixel method can reduce this limitation. In the feature extraction module, attention-based in-depth learning is performed on the smallest-scale feature using the self-supervised DINO. In addition, a CEP (Context-Enhanced Path) module is added outside the main matching path, and continuously enhanced position embedding is used to improve relative position encoding. The effectiveness of the method is demonstrated through experiments on the US3D and WHU-Stereo datasets.
format Article
id doaj-art-de74f4b2e7e14387ba3ba2e65eea1c69
institution Kabale University
issn 2194-9042
2194-9050
language English
publishDate 2025-07-01
publisher Copernicus Publications
record_format Article
series ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
spelling doaj-art-de74f4b2e7e14387ba3ba2e65eea1c69
2025-08-20T03:49:50Z
eng
Copernicus Publications
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
2194-9042
2194-9050
2025-07-01
X-G-2025
357
364
10.5194/isprs-annals-X-G-2025-357-2025
Stereo Matching of High-Resolution Satellite Images via Hierarchical ViT and Self-Supervised DINO
X. He (0), M. Yang (1), S. Jiang (2), S. Jiang (3), W. Jiang (4), Q. Li (5)
(0) School of Computer Science, China University of Geosciences, Wuhan 430074, China
(1) School of Computer Science, China University of Geosciences, Wuhan 430074, China
(2) Guangdong Key Laboratory of Urban Informatics, Shenzhen University, Shenzhen 518060, China
(3) Engineering Research Center of Natural Resource Information Management and Digital Twin Engineering Software, Ministry of Education, Wuhan 430074, China
(4) State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430072, China
(5) Guangdong Key Laboratory of Urban Informatics, Shenzhen University, Shenzhen 518060, China
title Stereo Matching of High-Resolution Satellite Images via Hierarchical ViT and Self-Supervised DINO
url https://isprs-annals.copernicus.org/articles/X-G-2025/357/2025/isprs-annals-X-G-2025-357-2025.pdf