Multi-View Stereo Using Perspective-Aware Features and Metadata to Improve Cost Volume

Bibliographic Details
Main Authors: Zongcheng Zuo, Yuanxiang Li, Yu Zhou, Fan Mo
Format: Article
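Language: English
Published: MDPI AG 2025-04-01
Series: Sensors
Subjects: 3D reconstruction; drone remote sensing; multi-view stereo; feature matching; deep learning; MVSNet
Online Access: https://www.mdpi.com/1424-8220/25/7/2233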
author Zongcheng Zuo
Yuanxiang Li
Yu Zhou
Fan Mo
collection DOAJ
description Feature matching is pivotal when using multi-view stereo (MVS) to reconstruct dense 3D models from calibrated images. This paper proposes PAC-MVSNet, which integrates perspective-aware convolution (PAC) and metadata-enhanced cost volumes to address the challenges posed by reflective and texture-less regions. PAC dynamically aligns convolutional kernels with scene perspective lines, while the use of metadata (e.g., camera pose distance) enables geometric reasoning during cost aggregation. In PAC-MVSNet, we introduce feature matching with long-range tracking that utilizes both internal and external focuses to integrate extensive contextual data within individual images as well as across multiple images. To improve this long-range feature matching, we also propose a perspective-aware convolution module that directs the convolutional kernel to capture features along the perspective lines. This enables the module to extract perspective-aware features from images, which improves feature matching. Finally, we design a dedicated 2D CNN that fuses image priors, integrating keyframes and geometric metadata within the cost volume to evaluate depth planes. Our method is the first attempt to embed existing physical model knowledge into a network for MVS tasks, and it achieved optimal performance on multiple benchmark datasets.
format Article
id doaj-art-0fdfca1c8ceb499f8e8f12fb13bfc9f8
institution OA Journals
issn 1424-8220
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-0fdfca1c8ceb499f8e8f12fb13bfc9f8
2025-08-20T02:15:46Z
eng
MDPI AG
Sensors
1424-8220
2025-04-01
Volume 25, Issue 7, Article 2233
DOI 10.3390/s25072233
Zongcheng Zuo (School of Design, Shanghai Jiao Tong University, Shanghai 200240, China)
Yuanxiang Li (School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China)
Yu Zhou (Xi’an Institute of Surveying and Mapping, Xi’an 710054, China)
Fan Mo (Land Satellite Remote Sensing Application Center, Ministry of Natural Resources, Beijing 100048, China)
title Multi-View Stereo Using Perspective-Aware Features and Metadata to Improve Cost Volume
topic 3D reconstruction
drone remote sensing
multi-view stereo
feature matching
deep learning
MVSNet
url https://www.mdpi.com/1424-8220/25/7/2233