DP-AMF: Depth-Prior–Guided Adaptive Multi-Modal and Global–Local Fusion for Single-View 3D Reconstruction


Bibliographic Details
Main Authors: Luoxi Zhang, Chun Xie, Itaru Kitahara
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Journal of Imaging
Online Access: https://www.mdpi.com/2313-433X/11/7/246
Description
Summary: Single-view 3D reconstruction remains fundamentally ill-posed, as a single RGB image lacks scale and depth cues, often yielding ambiguous results under occlusion or in texture-poor regions. We propose DP-AMF, a novel Depth-Prior–Guided Adaptive Multi-Modal and Global–Local Fusion framework that integrates high-fidelity depth priors (generated offline by the MARIGOLD diffusion-based estimator and cached to avoid extra training cost) with hierarchical local features from ResNet-32/ResNet-18 and semantic global features from DINO-ViT. A learnable fusion module dynamically adjusts per-channel weights to balance these modalities according to local texture and occlusion, and an implicit signed-distance field decoder reconstructs the final mesh. Extensive experiments on 3D-FRONT and Pix3D demonstrate that DP-AMF reduces Chamfer Distance by 7.64%, increases F-Score by 2.81%, and boosts Normal Consistency by 5.88% compared to strong baselines, while qualitative results show sharper edges and more complete geometry in challenging scenes. DP-AMF achieves these gains without substantially increasing model size or inference time, offering a robust and effective solution for complex single-view reconstruction tasks.
ISSN: 2313-433X
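
The summary above describes a learnable fusion module that re-weights depth-prior, local CNN, and global ViT features per channel before SDF decoding. The PyTorch sketch below illustrates one plausible reading of that idea; the class name AdaptiveChannelFusion, the channel widths, the softmax gating design, and the assumption that all three feature maps share a spatial grid are our illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of DP-AMF's adaptive multi-modal fusion, written from
# the abstract alone. Names, channel sizes, and the gating design are
# assumptions; the paper's actual module may differ.
import torch
import torch.nn as nn


class AdaptiveChannelFusion(nn.Module):
    """Fuse depth-prior, local CNN, and global ViT features with learned
    per-channel, per-modality weights (softmax over the three modalities)."""

    def __init__(self, c_depth: int, c_local: int, c_global: int, c_fused: int = 256):
        super().__init__()
        # 1x1 convs project each modality to a shared channel width.
        self.proj = nn.ModuleList([
            nn.Conv2d(c, c_fused, kernel_size=1)
            for c in (c_depth, c_local, c_global)
        ])
        # Gating net: globally pooled concatenation -> per-channel logits
        # for each of the three modalities.
        self.gate = nn.Sequential(
            nn.Linear(3 * c_fused, c_fused),
            nn.ReLU(inplace=True),
            nn.Linear(c_fused, 3 * c_fused),
        )
        self.c_fused = c_fused

    def forward(self, f_depth, f_local, f_global):
        # Assumes all three streams arrive on the same (H, W) grid.
        feats = [p(f) for p, f in zip(self.proj, (f_depth, f_local, f_global))]
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)  # (B, 3C)
        logits = self.gate(pooled).view(-1, 3, self.c_fused)            # (B, 3, C)
        w = torch.softmax(logits, dim=1)             # weights sum to 1 per channel
        stacked = torch.stack(feats, dim=1)          # (B, 3, C, H, W)
        return (w[..., None, None] * stacked).sum(dim=1)  # (B, C, H, W)


if __name__ == "__main__":
    fuse = AdaptiveChannelFusion(c_depth=64, c_local=512, c_global=768)
    f_d = torch.randn(2, 64, 32, 32)   # stand-in for cached MARIGOLD depth features
    f_l = torch.randn(2, 512, 32, 32)  # stand-in for ResNet local features
    f_g = torch.randn(2, 768, 32, 32)  # stand-in for DINO-ViT tokens reshaped to a grid
    print(fuse(f_d, f_l, f_g).shape)   # torch.Size([2, 256, 32, 32])
```

In this reading, the gate pools each projected stream and emits one weight per channel per modality, so textured regions can lean on local ResNet features while occluded or textureless regions can lean on the depth prior; the fused map would then feed the implicit signed-distance field decoder mentioned in the abstract.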