BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
This research paper presents BEVCorner, a novel framework that synergistically integrates monocular and multi-view pipelines for enhanced 3D object detection in autonomous driving. By fusing depth maps from Bird’s-Eye View (BEV) with object-centric depth estimates from monocular detection, BEVCorner...
| Main Authors: | Jesslyn Nathania, Qiyuan Liu, Zhiheng Li, Liming Liu, Yipeng Gao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG 2025-04-01 |
| Series: | Applied Sciences |
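| Subjects: | bird’s eye view (BEV), monocular 3D object detection, depth fusion, depth estimation, autonomous driving |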
| Online Access: | https://www.mdpi.com/2076-3417/15/7/3896 |
| _version_ | 1849739476474003456 |
|---|---|
| author | Jesslyn Nathania Qiyuan Liu Zhiheng Li Liming Liu Yipeng Gao |
| author_facet | Jesslyn Nathania Qiyuan Liu Zhiheng Li Liming Liu Yipeng Gao |
| author_sort | Jesslyn Nathania |
| collection | DOAJ |
| description | This research paper presents BEVCorner, a novel framework that synergistically integrates monocular and multi-view pipelines for enhanced 3D object detection in autonomous driving. By fusing depth maps from Bird’s-Eye View (BEV) with object-centric depth estimates from monocular detection, BEVCorner enhances both global context and local precision, addressing the limitations of existing methods in depth precision, occlusion robustness, and computational efficiency. The paper explores four fusion techniques—direct replacement, weighted fusion, region-of-interest refinement, and hard combine—to balance the strengths of monocular and BEV depth estimation. Initial experiments on the NuScenes dataset yield a 38.72% NDS, which is lower than the baseline BEVDepth’s 43.59% NDS, highlighting the challenges in monocular pipeline alignment. Nevertheless, the upper-bound performance of BEVCorner is assessed under ground-truth depth supervision, and the results show a significant improvement, achieving a 53.21% NDS, despite a 21.96% increase in parameters (from 76.4 M to 97.9 M). The upper-bound analysis highlights the promise of camera-only fusion for resource-constrained scenarios. |
| format | Article |
| id | doaj-art-6d33eb702efa47f39e568b113f24cea3 |
| institution | DOAJ |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-6d33eb702efa47f39e568b113f24cea3; 2025-08-20T03:06:16Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-04-01; 15; 7; 3896; 10.3390/app15073896; BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion; Jesslyn Nathania 0; Qiyuan Liu 1; Zhiheng Li 2; Liming Liu 3; Yipeng Gao 4; Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Streamax Technology Co., Ltd., 21-23/F B1 Building, Zhiyuan, No. 1001 Xueyuan Avenue, Shenzhen 518057, China; Streamax Technology Co., Ltd., 21-23/F B1 Building, Zhiyuan, No. 1001 Xueyuan Avenue, Shenzhen 518057, China; This research paper presents BEVCorner, a novel framework that synergistically integrates monocular and multi-view pipelines for enhanced 3D object detection in autonomous driving. By fusing depth maps from Bird’s-Eye View (BEV) with object-centric depth estimates from monocular detection, BEVCorner enhances both global context and local precision, addressing the limitations of existing methods in depth precision, occlusion robustness, and computational efficiency. The paper explores four fusion techniques—direct replacement, weighted fusion, region-of-interest refinement, and hard combine—to balance the strengths of monocular and BEV depth estimation. Initial experiments on the NuScenes dataset yield a 38.72% NDS, which is lower than the baseline BEVDepth’s 43.59% NDS, highlighting the challenges in monocular pipeline alignment. Nevertheless, the upper-bound performance of BEVCorner is assessed under ground-truth depth supervision, and the results show a significant improvement, achieving a 53.21% NDS, despite a 21.96% increase in parameters (from 76.4 M to 97.9 M). The upper-bound analysis highlights the promise of camera-only fusion for resource-constrained scenarios.; https://www.mdpi.com/2076-3417/15/7/3896; bird’s eye view (BEV); monocular 3D object detection; depth fusion; depth estimation; autonomous driving |
| spellingShingle | Jesslyn Nathania Qiyuan Liu Zhiheng Li Liming Liu Yipeng Gao BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion Applied Sciences bird’s eye view (BEV) monocular 3D object detection depth fusion depth estimation autonomous driving |
| title | BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion |
| title_full | BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion |
| title_fullStr | BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion |
| title_full_unstemmed | BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion |
| title_short | BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion |
| title_sort | bevcorner enhancing bird s eye view object detection with monocular features via depth fusion |
| topic | bird’s eye view (BEV) monocular 3D object detection depth fusion depth estimation autonomous driving |
| url | https://www.mdpi.com/2076-3417/15/7/3896 |
| work_keys_str_mv | AT jesslynnathania bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion AT qiyuanliu bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion AT zhihengli bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion AT limingliu bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion AT yipenggao bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion |
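
The NDS and parameter figures quoted in the description can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the rounded values given in the abstract; the variable and function names are illustrative assumptions and do not come from the paper.

```python
# Rounded figures taken from the abstract (assumed exact for this sketch).
BASELINE_NDS = 43.59      # BEVDepth baseline, in % NDS
INITIAL_NDS = 38.72       # BEVCorner, initial experiments, in % NDS
UPPER_BOUND_NDS = 53.21   # BEVCorner with ground-truth depth supervision, in % NDS
BASELINE_PARAMS_M = 76.4  # parameters in millions
FUSED_PARAMS_M = 97.9     # parameters in millions


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# NDS differences against the baseline, in percentage points.
initial_gap = INITIAL_NDS - BASELINE_NDS
upper_bound_gain = UPPER_BOUND_NDS - BASELINE_NDS

# Parameters added on top of the baseline, in millions.
added_params_m = FUSED_PARAMS_M - BASELINE_PARAMS_M

# The same 21.5 M expressed two ways.
share_of_fused_model = percent(added_params_m, FUSED_PARAMS_M)
increase_over_baseline = percent(added_params_m, BASELINE_PARAMS_M)

print(f"initial gap vs. baseline:   {initial_gap:+.2f} points")
print(f"upper-bound gain:           {upper_bound_gain:+.2f} points")
print(f"added parameters:           {added_params_m:.1f} M")
print(f"share of fused model:       {share_of_fused_model:.2f} %")
print(f"increase over the baseline: {increase_over_baseline:.2f} %")
```

With these rounded inputs, the added 21.5 M parameters amount to 21.96% of the 97.9 M fused model, or about 28.14% when measured against the 76.4 M baseline.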