BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/7/3896 |
| Summary: | This research paper presents BEVCorner, a novel framework that synergistically integrates monocular and multi-view pipelines for enhanced 3D object detection in autonomous driving. By fusing dense depth maps from the Bird's-Eye View (BEV) pipeline with object-centric depth estimates from monocular detection, BEVCorner combines global context with local precision, addressing the limitations of existing methods in depth precision, occlusion robustness, and computational efficiency. The paper explores four fusion techniques—direct replacement, weighted fusion, region-of-interest refinement, and hard combine—to balance the strengths of monocular and BEV depth estimation. Initial experiments on the NuScenes dataset yield 38.72% NDS, below the 43.59% NDS of the BEVDepth baseline, highlighting the challenges of monocular pipeline alignment. Nevertheless, the upper-bound performance of BEVCorner, assessed under ground-truth depth supervision, improves significantly to 53.21% NDS, at the cost of a 21.96% increase in parameters (from 76.4 M to 97.9 M). This upper-bound analysis highlights the promise of camera-only fusion for resource-constrained scenarios. |
| ISSN: | 2076-3417 |
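Of the four fusion techniques the summary names, the "weighted fusion" strategy is the most natural to illustrate: a convex blend of the BEV depth map and the monocular object-centric depth inside detected object regions. The sketch below is an assumption-laden illustration, not the paper's implementation — the function name, the `alpha` weight, and the use of a boolean object mask are all hypothetical.

```python
import numpy as np

def weighted_depth_fusion(bev_depth, mono_depth, mono_mask, alpha=0.6):
    """Blend a dense BEV depth map with monocular object-centric depth.

    Hypothetical sketch of a weighted-fusion scheme: inside detected
    object regions (mono_mask == True) the fused depth is a convex
    combination of the two estimates; elsewhere the BEV depth is kept.
    """
    fused = bev_depth.copy()
    fused[mono_mask] = (alpha * mono_depth[mono_mask]
                        + (1.0 - alpha) * bev_depth[mono_mask])
    return fused

# Toy example: a 2x2 depth map (metres) with one object pixel.
bev = np.array([[10.0, 12.0], [14.0, 16.0]])
mono = np.array([[11.0, 0.0], [0.0, 0.0]])
mask = np.array([[True, False], [False, False]])
print(weighted_depth_fusion(bev, mono, mask))
# object pixel: 0.6 * 11.0 + 0.4 * 10.0 = 10.6; others unchanged
```

Setting `alpha=1.0` would reduce this to the "direct replacement" strategy inside object regions, which is one way the four variants can be seen as points on a spectrum.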