CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View

Abstract In the wave of research on autonomous driving, 3D object detection from the Bird’s Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, a 3D object detection method tailored for sensor-data fusion in the BEV perspective. Our approach begins with a view transformation, in which an implicit learning module maps the camera perspective into BEV space, aligning it with the prediction module. To achieve modal fusion within the BEV framework, we voxelize the LiDAR point cloud into BEV space, generating LiDAR BEV spatial features. To integrate the BEV spatial features from camera and LiDAR, we develop a multi-modal cross-attention mechanism and an implicit multi-modal fusion network designed to strengthen the interaction between the two modalities. To counteract the deficiencies in global reasoning and feature interaction that can arise from multi-modal cross-attention alone, we further propose a BEV self-attention mechanism that performs global feature operations. We evaluated our method on a large-scale autonomous driving dataset, nuScenes. It achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, excelling in particular at detecting cars and pedestrians, with accuracies of 89% and 90.7%, respectively. CL-FusionBEV also outperforms existing comparative methods at identifying occluded and distant objects.

Bibliographic Details
Main Authors: Peicheng Shi, Zhiqiang Liu, Xinlong Dong, Aixi Yang
Format: Article
Language: English
Published: Springer, 2024-07-01
Series: Complex & Intelligent Systems
Subjects: Bird’s Eye View (BEV) perception; 3D object detection; Attention mechanism; Autonomous driving
Online Access: https://doi.org/10.1007/s40747-024-01567-0
_version_ 1850275651190259712
author Peicheng Shi
Zhiqiang Liu
Xinlong Dong
Aixi Yang
author_facet Peicheng Shi
Zhiqiang Liu
Xinlong Dong
Aixi Yang
author_sort Peicheng Shi
collection DOAJ
description Abstract In the wave of research on autonomous driving, 3D object detection from the Bird’s Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, a 3D object detection method tailored for sensor-data fusion in the BEV perspective. Our approach begins with a view transformation, in which an implicit learning module maps the camera perspective into BEV space, aligning it with the prediction module. To achieve modal fusion within the BEV framework, we voxelize the LiDAR point cloud into BEV space, generating LiDAR BEV spatial features. To integrate the BEV spatial features from camera and LiDAR, we develop a multi-modal cross-attention mechanism and an implicit multi-modal fusion network designed to strengthen the interaction between the two modalities. To counteract the deficiencies in global reasoning and feature interaction that can arise from multi-modal cross-attention alone, we further propose a BEV self-attention mechanism that performs global feature operations. We evaluated our method on a large-scale autonomous driving dataset, nuScenes. It achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, excelling in particular at detecting cars and pedestrians, with accuracies of 89% and 90.7%, respectively. CL-FusionBEV also outperforms existing comparative methods at identifying occluded and distant objects.
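The abstract describes two concrete steps: voxelizing a LiDAR point cloud onto a BEV grid, and a cross-attention in which features from one modality query features from the other. The following is a minimal NumPy sketch of those two ideas only, not the paper's actual implementation: the grid size, feature dimensions, and the random projections standing in for learned Q/K/V weights are all illustrative assumptions.

```python
import numpy as np

def lidar_to_bev(points, grid=(8, 8), extent=10.0):
    """Scatter LiDAR points onto a BEV occupancy grid over [-extent, extent] in x/y."""
    bev = np.zeros(grid)
    # Map metric x/y coordinates to integer BEV cell indices.
    ix = np.clip(((points[:, 0] + extent) / (2 * extent) * grid[0]).astype(int), 0, grid[0] - 1)
    iy = np.clip(((points[:, 1] + extent) / (2 * extent) * grid[1]).astype(int), 0, grid[1] - 1)
    np.add.at(bev, (ix, iy), 1.0)  # unbuffered scatter-add: counts points per cell
    return bev

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(cam_bev, lidar_bev, d=16, seed=0):
    """Camera BEV cells (queries) attend over LiDAR BEV cells (keys/values)."""
    rng = np.random.default_rng(seed)
    n, c = cam_bev.shape
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((c, d)) for _ in range(3))
    q, k, v = cam_bev @ Wq, lidar_bev @ Wk, lidar_bev @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (n, n) attention weights, rows sum to 1
    return attn @ v                       # fused per-cell features, (n, d)

# Toy usage: 200 random LiDAR points and stand-in per-cell camera features.
rng = np.random.default_rng(1)
pts = rng.uniform(-10, 10, size=(200, 3))
lidar_feat = lidar_to_bev(pts).reshape(-1, 1)   # (64, 1) occupancy as a feature
cam_feat = rng.standard_normal((64, 1))         # hypothetical camera BEV feature
fused = cross_attention(cam_feat, lidar_feat)
print(fused.shape)  # (64, 16)
```

A real implementation would use learned projections, multiple heads, and a dense BEV grid; this sketch only shows the data flow the abstract names.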
format Article
id doaj-art-e90820165ff64e448f3d1310127a12be
institution OA Journals
issn 2199-4536
2198-6053
language English
publishDate 2024-07-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj-art-e90820165ff64e448f3d1310127a12be 2025-08-20T01:50:39Z eng
Springer, Complex & Intelligent Systems, ISSN 2199-4536, eISSN 2198-6053, 2024-07-01, Vol. 10, Iss. 6, pp. 7681–7696, doi:10.1007/s40747-024-01567-0
CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
Peicheng Shi (School of Mechanical and Automotive Engineering, Anhui Polytechnic University)
Zhiqiang Liu (School of Mechanical and Automotive Engineering, Anhui Polytechnic University)
Xinlong Dong (School of Mechanical and Automotive Engineering, Anhui Polytechnic University)
Aixi Yang (Polytechnic Institute, Zhejiang University)
https://doi.org/10.1007/s40747-024-01567-0
Bird’s Eye View (BEV) perception; 3D object detection; Attention mechanism; Autonomous driving
spellingShingle Peicheng Shi
Zhiqiang Liu
Xinlong Dong
Aixi Yang
CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
Complex & Intelligent Systems
Bird’s Eye View (BEV) perception
3D object detection
Attention mechanism
Autonomous driving
title CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
title_full CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
title_fullStr CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
title_full_unstemmed CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
title_short CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
title_sort cl fusionbev 3d object detection method with camera lidar fusion in bird s eye view
topic Bird’s Eye View (BEV) perception
3D object detection
Attention mechanism
Autonomous driving
url https://doi.org/10.1007/s40747-024-01567-0
work_keys_str_mv AT peichengshi clfusionbev3dobjectdetectionmethodwithcameralidarfusioninbirdseyeview
AT zhiqiangliu clfusionbev3dobjectdetectionmethodwithcameralidarfusioninbirdseyeview
AT xinlongdong clfusionbev3dobjectdetectionmethodwithcameralidarfusioninbirdseyeview
AT aixiyang clfusionbev3dobjectdetectionmethodwithcameralidarfusioninbirdseyeview