BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion

This research paper presents BEVCorner, a novel framework that synergistically integrates monocular and multi-view pipelines for enhanced 3D object detection in autonomous driving. By fusing depth maps from Bird’s-Eye View (BEV) with object-centric depth estimates from monocular detection, BEVCorner enhances both global context and local precision, addressing the limitations of existing methods in depth precision, occlusion robustness, and computational efficiency. The paper explores four fusion techniques—direct replacement, weighted fusion, region-of-interest refinement, and hard combine—to balance the strengths of monocular and BEV depth estimation. Initial experiments on the NuScenes dataset yield a 38.72% NDS, which is lower than the baseline BEVDepth’s 43.59% NDS, highlighting the challenges in monocular pipeline alignment. Nevertheless, the upper-bound performance of BEVCorner is assessed under ground-truth depth supervision, and the results show a significant improvement, achieving a 53.21% NDS, despite a 21.96% increase in parameters (from 76.4 M to 97.9 M). The upper-bound analysis highlights the promise of camera-only fusion for resource-constrained scenarios.

Bibliographic Details
Main Authors: Jesslyn Nathania, Qiyuan Liu, Zhiheng Li, Liming Liu, Yipeng Gao
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: Applied Sciences
Subjects: bird’s eye view (BEV); monocular 3D object detection; depth fusion; depth estimation; autonomous driving
Online Access: https://www.mdpi.com/2076-3417/15/7/3896
_version_ 1849739476474003456
author Jesslyn Nathania
Qiyuan Liu
Zhiheng Li
Liming Liu
Yipeng Gao
author_facet Jesslyn Nathania
Qiyuan Liu
Zhiheng Li
Liming Liu
Yipeng Gao
author_sort Jesslyn Nathania
collection DOAJ
description This research paper presents BEVCorner, a novel framework that synergistically integrates monocular and multi-view pipelines for enhanced 3D object detection in autonomous driving. By fusing depth maps from Bird’s-Eye View (BEV) with object-centric depth estimates from monocular detection, BEVCorner enhances both global context and local precision, addressing the limitations of existing methods in depth precision, occlusion robustness, and computational efficiency. The paper explores four fusion techniques—direct replacement, weighted fusion, region-of-interest refinement, and hard combine—to balance the strengths of monocular and BEV depth estimation. Initial experiments on the NuScenes dataset yield a 38.72% NDS, which is lower than the baseline BEVDepth’s 43.59% NDS, highlighting the challenges in monocular pipeline alignment. Nevertheless, the upper-bound performance of BEVCorner is assessed under ground-truth depth supervision, and the results show a significant improvement, achieving a 53.21% NDS, despite a 21.96% increase in parameters (from 76.4 M to 97.9 M). The upper-bound analysis highlights the promise of camera-only fusion for resource-constrained scenarios.
format Article
id doaj-art-6d33eb702efa47f39e568b113f24cea3
institution DOAJ
issn 2076-3417
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-6d33eb702efa47f39e568b113f24cea3
  indexed: 2025-08-20T03:06:16Z
  language: eng
  publisher: MDPI AG
  series: Applied Sciences
  issn: 2076-3417
  published: 2025-04-01, vol. 15, no. 7, article 3896
  doi: 10.3390/app15073896
  title: BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
  authors:
    Jesslyn Nathania (Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China)
    Qiyuan Liu (Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China)
    Zhiheng Li (Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China)
    Liming Liu (Streamax Technology Co., Ltd., 21-23/F B1 Building, Zhiyuan, No. 1001 Xueyuan Avenue, Shenzhen 518057, China)
    Yipeng Gao (Streamax Technology Co., Ltd., 21-23/F B1 Building, Zhiyuan, No. 1001 Xueyuan Avenue, Shenzhen 518057, China)
  abstract: (identical to the description field above)
  url: https://www.mdpi.com/2076-3417/15/7/3896
  topics: bird’s eye view (BEV); monocular 3D object detection; depth fusion; depth estimation; autonomous driving
spellingShingle Jesslyn Nathania
Qiyuan Liu
Zhiheng Li
Liming Liu
Yipeng Gao
BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
Applied Sciences
bird’s eye view (BEV)
monocular 3D object detection
depth fusion
depth estimation
autonomous driving
title BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
title_full BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
title_fullStr BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
title_full_unstemmed BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
title_short BEVCorner: Enhancing Bird’s-Eye View Object Detection with Monocular Features via Depth Fusion
title_sort bevcorner enhancing bird s eye view object detection with monocular features via depth fusion
topic bird’s eye view (BEV)
monocular 3D object detection
depth fusion
depth estimation
autonomous driving
url https://www.mdpi.com/2076-3417/15/7/3896
work_keys_str_mv AT jesslynnathania bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion
AT qiyuanliu bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion
AT zhihengli bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion
AT limingliu bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion
AT yipenggao bevcornerenhancingbirdseyeviewobjectdetectionwithmonocularfeaturesviadepthfusion