BEV-CAM3D: A Unified Bird’s-Eye View Architecture for Autonomous Driving with Monocular Cameras and 3D Point Clouds
Three-dimensional (3D) visual perception is pivotal for understanding surrounding environments in applications such as autonomous driving and mobile robotics. While LiDAR-based models dominate due to accurate depth sensing, their cost and sparse outputs have driven interest in camera-based systems....
Saved in:
| Main Authors: | Daniel Ayo Oladele, Elisha Didam Markus, Adnan M. Abu-Mahfouz |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | AI |
| Online Access: | https://www.mdpi.com/2673-2688/6/4/82 |
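The abstract describes fusing monocular camera features with LiDAR point clouds in a shared bird's-eye view (BEV). As a generic illustration of how a LiDAR point cloud is rasterised into such a BEV grid — not the paper's implementation; the function name `points_to_bev`, the ranges, and the resolution are assumptions for illustration — a minimal sketch:

```python
import numpy as np

def points_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                  resolution=0.5):
    """Scatter LiDAR points (N, 3) into a 2D bird's-eye-view occupancy grid.

    Each cell stores the number of points falling into it; real BEV
    backbones typically store richer per-cell features (max height,
    intensity, learned pillar embeddings).
    """
    nx = int((x_range[1] - x_range[0]) / resolution)
    ny = int((y_range[1] - y_range[0]) / resolution)
    grid = np.zeros((nx, ny), dtype=np.float32)

    # Keep only points inside the grid extent.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    # Map metric coordinates to integer cell indices and accumulate counts.
    ix = ((pts[:, 0] - x_range[0]) / resolution).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / resolution).astype(int)
    np.add.at(grid, (ix, iy), 1.0)
    return grid
```

Camera features projected into the same grid can then be fused cell-by-cell, which is the setting in which a cross-modality attention module such as the one the abstract mentions would operate.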
| _version_ | 1850144241979752448 |
|---|---|
| author | Daniel Ayo Oladele; Elisha Didam Markus; Adnan M. Abu-Mahfouz |
| author_sort | Daniel Ayo Oladele |
| collection | DOAJ |
| description | Three-dimensional (3D) visual perception is pivotal for understanding surrounding environments in applications such as autonomous driving and mobile robotics. While LiDAR-based models dominate due to accurate depth sensing, their cost and sparse outputs have driven interest in camera-based systems. However, challenges like cross-domain degradation and depth estimation inaccuracies persist. This paper introduces BEVCAM3D, a unified bird’s-eye view (BEV) architecture that fuses monocular cameras and LiDAR point clouds to overcome single-sensor limitations. BEVCAM3D integrates a deformable cross-modality attention module for feature alignment and a fast ground segmentation algorithm to reduce computational overhead by 40%. Evaluated on the nuScenes dataset, BEVCAM3D achieves state-of-the-art performance, with a 73.9% mAP and a 76.2% NDS, outperforming existing LiDAR-camera fusion methods like SparseFusion (72.0% mAP) and IS-Fusion (73.0% mAP). Notably, it excels in detecting pedestrians (91.0% AP) and traffic cones (89.9% AP), addressing the class imbalance in autonomous driving scenarios. The framework supports real-time inference at 11.2 FPS with an EfficientDet-B3 backbone and demonstrates robustness under low-light conditions (62.3% nighttime mAP). |
| format | Article |
| id | doaj-art-719e9fe62ac84cd58cb25aa0b69c4cea |
| institution | OA Journals |
| issn | 2673-2688 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | AI |
| spelling | doaj-art-719e9fe62ac84cd58cb25aa0b69c4cea (indexed 2025-08-20T02:28:27Z); eng; MDPI AG; AI, ISSN 2673-2688; published 2025-04-01; 6(4), article 82; DOI 10.3390/ai6040082. Daniel Ayo Oladele and Elisha Didam Markus: Department of Electrical, Electronic and Computer Engineering, Central University of Technology, Bloemfontein 9301, South Africa. Adnan M. Abu-Mahfouz: Emerging Digital Technologies for the Fourth Industrial Revolution (EDT4IR) Research Centre, Council for Scientific and Industrial Research (CSIR), Pretoria 0184, South Africa. |
| title | BEV-CAM3D: A Unified Bird’s-Eye View Architecture for Autonomous Driving with Monocular Cameras and 3D Point Clouds |
| topic | 3D perception; attention mechanisms; bird’s-eye view (BEV); multi-modal fusion; object detection; real-time processing |
| url | https://www.mdpi.com/2673-2688/6/4/82 |
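The abstract credits part of BEVCAM3D's efficiency to a fast ground segmentation algorithm that reduces computational overhead before fusion. The paper's actual algorithm is not given in this record, so the sketch below is only a generic illustration of the idea: iteratively fit a plane to the lowest points and label everything near it as ground, so the detector processes a smaller non-ground cloud. The name `segment_ground` and the thresholds are hypothetical.

```python
import numpy as np

def segment_ground(points, seed_height=0.5, plane_dist=0.2, iters=3):
    """Split a point cloud (N, 3) into ground / non-ground.

    Iteratively fits a plane z = a*x + b*y + c to the current ground set,
    starting from the lowest slab of points, then re-labels every point
    within plane_dist of the fitted plane as ground.
    """
    z_min = points[:, 2].min()
    ground = points[:, 2] < z_min + seed_height  # initial seed: lowest slab
    for _ in range(iters):
        seed = points[ground]
        # Least-squares plane fit: [x y 1] @ [a b c]^T ~= z
        A = np.c_[seed[:, 0], seed[:, 1], np.ones(len(seed))]
        coef, *_ = np.linalg.lstsq(A, seed[:, 2], rcond=None)
        # Point-to-plane distance |a*x + b*y - z + c| / sqrt(a^2 + b^2 + 1)
        dist = np.abs(points @ np.r_[coef[:2], -1.0] + coef[2]) / np.sqrt(
            coef[0] ** 2 + coef[1] ** 2 + 1.0)
        ground = dist < plane_dist  # re-label points near the fitted plane
    return ground  # boolean mask: True = ground
```

On a cloud with a flat road surface and raised objects, the mask keeps the road points and rejects the objects; the non-ground remainder is what a fusion detector would go on to process.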