BEV-CAM3D: A Unified Bird’s-Eye View Architecture for Autonomous Driving with Monocular Cameras and 3D Point Clouds

Three-dimensional (3D) visual perception is pivotal for understanding surrounding environments in applications such as autonomous driving and mobile robotics. While LiDAR-based models dominate due to accurate depth sensing, their cost and sparse outputs have driven interest in camera-based systems. However, challenges like cross-domain degradation and depth estimation inaccuracies persist. This paper introduces BEVCAM3D, a unified bird’s-eye view (BEV) architecture that fuses monocular cameras and LiDAR point clouds to overcome single-sensor limitations. BEVCAM3D integrates a deformable cross-modality attention module for feature alignment and a fast ground segmentation algorithm to reduce computational overhead by 40%. Evaluated on the nuScenes dataset, BEVCAM3D achieves state-of-the-art performance, with a 73.9% mAP and a 76.2% NDS, outperforming existing LiDAR-camera fusion methods like SparseFusion (72.0% mAP) and IS-Fusion (73.0% mAP). Notably, it excels in detecting pedestrians (91.0% AP) and traffic cones (89.9% AP), addressing the class imbalance in autonomous driving scenarios. The framework supports real-time inference at 11.2 FPS with an EfficientDet-B3 backbone and demonstrates robustness under low-light conditions (62.3% nighttime mAP).


Bibliographic Details
Main Authors: Daniel Ayo Oladele, Elisha Didam Markus, Adnan M. Abu-Mahfouz
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: AI
Subjects: 3D perception; attention mechanisms; bird’s-eye view (BEV); multi-modal fusion; object detection; real-time processing
Online Access: https://www.mdpi.com/2673-2688/6/4/82
collection DOAJ
description Three-dimensional (3D) visual perception is pivotal for understanding surrounding environments in applications such as autonomous driving and mobile robotics. While LiDAR-based models dominate due to accurate depth sensing, their cost and sparse outputs have driven interest in camera-based systems. However, challenges like cross-domain degradation and depth estimation inaccuracies persist. This paper introduces BEVCAM3D, a unified bird’s-eye view (BEV) architecture that fuses monocular cameras and LiDAR point clouds to overcome single-sensor limitations. BEVCAM3D integrates a deformable cross-modality attention module for feature alignment and a fast ground segmentation algorithm to reduce computational overhead by 40%. Evaluated on the nuScenes dataset, BEVCAM3D achieves state-of-the-art performance, with a 73.9% mAP and a 76.2% NDS, outperforming existing LiDAR-camera fusion methods like SparseFusion (72.0% mAP) and IS-Fusion (73.0% mAP). Notably, it excels in detecting pedestrians (91.0% AP) and traffic cones (89.9% AP), addressing the class imbalance in autonomous driving scenarios. The framework supports real-time inference at 11.2 FPS with an EfficientDet-B3 backbone and demonstrates robustness under low-light conditions (62.3% nighttime mAP).
id doaj-art-719e9fe62ac84cd58cb25aa0b69c4cea
institution OA Journals
issn 2673-2688
doi 10.3390/ai6040082
affiliation Daniel Ayo Oladele: Department of Electrical, Electronic and Computer Engineering, Central University of Technology, Bloemfontein 9301, South Africa
affiliation Elisha Didam Markus: Department of Electrical, Electronic and Computer Engineering, Central University of Technology, Bloemfontein 9301, South Africa
affiliation Adnan M. Abu-Mahfouz: Emerging Digital Technologies for the Fourth Industrial Revolution (EDT4IR) Research Centre, Council for Scientific and Industrial Research (CSIR), Pretoria 0184, South Africa
topic 3D perception
attention mechanisms
bird’s-eye view (BEV)
multi-modal fusion
object detection
real-time processing