Multi-View Stereo Using Perspective-Aware Features and Metadata to Improve Cost Volume
Feature matching is pivotal when using multi-view stereo (MVS) to reconstruct dense 3D models from calibrated images. This paper proposes PAC-MVSNet, which integrates perspective-aware convolution (PAC) and metadata-enhanced cost volumes to address the challenges in reflective and texture-less regio...
| Main Authors: | Zongcheng Zuo, Yuanxiang Li, Yu Zhou, Fan Mo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Sensors |
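| Subjects: | 3D reconstruction; drone remote sensing; multi-view stereo; feature matching; deep learning; MVSNet |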
| Online Access: | https://www.mdpi.com/1424-8220/25/7/2233 |
| _version_ | 1850188952368054272 |
|---|---|
| author | Zongcheng Zuo; Yuanxiang Li; Yu Zhou; Fan Mo |
| author_facet | Zongcheng Zuo; Yuanxiang Li; Yu Zhou; Fan Mo |
| author_sort | Zongcheng Zuo |
| collection | DOAJ |
| description | Feature matching is pivotal when using multi-view stereo (MVS) to reconstruct dense 3D models from calibrated images. This paper proposes PAC-MVSNet, which integrates perspective-aware convolution (PAC) and metadata-enhanced cost volumes to address the challenges in reflective and texture-less regions. PAC dynamically aligns convolutional kernels with scene perspective lines, while the use of metadata (e.g., camera pose distance) enables geometric reasoning during cost aggregation. In PAC-MVSNet, we introduce feature matching with long-range tracking that utilizes both internal and external focuses to integrate extensive contextual data within individual images as well as across multiple images. To enhance the performance of the feature matching with long-range tracking, we also propose a perspective-aware convolution module that directs the convolutional kernel to capture features along the perspective lines. This enables the module to extract perspective-aware features from images, improving the feature matching. Finally, we crafted a specific 2D CNN that fuses image priors, thereby integrating keyframes and geometric metadata within the cost volume to evaluate depth planes. Our method represents the first attempt to embed the existing physical model knowledge into a network for completing MVS tasks, which achieved optimal performance using multiple benchmark datasets. |
| format | Article |
| id | doaj-art-0fdfca1c8ceb499f8e8f12fb13bfc9f8 |
| institution | OA Journals |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-0fdfca1c8ceb499f8e8f12fb13bfc9f8; 2025-08-20T02:15:46Z; eng; MDPI AG; Sensors; 1424-8220; 2025-04-01; 25(7), 2233; 10.3390/s25072233; Multi-View Stereo Using Perspective-Aware Features and Metadata to Improve Cost Volume; Zongcheng Zuo (School of Design, Shanghai Jiao Tong University, Shanghai 200240, China); Yuanxiang Li (School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China); Yu Zhou (Xi’an Institute of Surveying and Mapping, Xi’an 710054, China); Fan Mo (Land Satellite Remote Sensing Application Center, Ministry of Natural Resources, Beijing 100048, China); https://www.mdpi.com/1424-8220/25/7/2233; 3D reconstruction; drone remote sensing; multi-view stereo; feature matching; deep learning; MVSNet |
| title | Multi-View Stereo Using Perspective-Aware Features and Metadata to Improve Cost Volume |
| topic | 3D reconstruction; drone remote sensing; multi-view stereo; feature matching; deep learning; MVSNet |
| url | https://www.mdpi.com/1424-8220/25/7/2233 |
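
The description above names "camera pose distance" as an example of the metadata fed into the cost volume, but this record does not say how the article defines it. The sketch below is only an illustration under stated assumptions: poses are 4x4 camera-to-world matrices, and the distance is reported as a translation norm plus a relative rotation angle, which is a common convention and not necessarily the one used in PAC-MVSNet. The function names `pose_distance` and `pose_metadata` are hypothetical.

```python
import math


def pose_distance(pose_a, pose_b):
    """Return (translation_distance, rotation_angle_rad) between two poses.

    Assumption: each pose is a 4x4 camera-to-world matrix given as nested
    lists, with the rotation in the upper-left 3x3 block and the translation
    in the last column. This is an illustrative definition, not the paper's.
    """
    # Euclidean distance between the two camera centres.
    t_dist = math.sqrt(sum((pose_a[i][3] - pose_b[i][3]) ** 2 for i in range(3)))

    # trace(R_a^T R_b) equals the element-wise product sum of the 3x3 blocks.
    trace = sum(pose_a[i][j] * pose_b[i][j] for i in range(3) for j in range(3))

    # Angle of the relative rotation; clamp to guard against rounding error.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return t_dist, math.acos(cos_angle)


def pose_metadata(ref_pose, src_poses):
    """Collect one (translation, angle) pair per source view.

    Hypothetical helper: such pairs could be broadcast as extra channels
    alongside feature-matching scores when a cost volume is assembled.
    """
    return [pose_distance(ref_pose, src) for src in src_poses]


if __name__ == "__main__":
    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    shifted = [[1, 0, 0, 3], [0, 1, 0, 4], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(pose_metadata(identity, [shifted]))
```

Keeping translation and rotation as separate values, rather than collapsing them into one scalar, leaves it to a downstream network to weight them; whether the article does the same is not stated in this record.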