CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
Abstract In the wave of research on autonomous driving, 3D object detection from the Bird’s Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, an innovative 3D object detection methodology tailored for sensor data fusion in the BEV perspective. Our approach initiates with a view transformation, facilitated by an implicit learning module that transitions the camera’s perspective to the BEV space, thereby aligning the prediction module. Subsequently, to achieve modal fusion within the BEV framework, we employ voxelization to convert the LiDAR point cloud into BEV space, thereby generating LiDAR BEV spatial features. Moreover, to integrate the BEV spatial features from both camera and LiDAR, we have developed a multi-modal cross-attention mechanism and an implicit multi-modal fusion network, designed to enhance the synergy and application of dual-modal data. To counteract potential deficiencies in global reasoning and feature interaction arising from multi-modal cross-attention, we propose a BEV self-attention mechanism that facilitates comprehensive global feature operations. Our methodology has undergone rigorous evaluation on a substantial dataset within the autonomous driving domain, the nuScenes dataset. The outcomes demonstrate that our method achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, particularly excelling in the detection of cars and pedestrians with high accuracies of 89% and 90.7%, respectively. Additionally, CL-FusionBEV exhibits superior performance in identifying occluded and distant objects, surpassing existing comparative methods.
Saved in:
| Main Authors: | Peicheng Shi, Zhiqiang Liu, Xinlong Dong, Aixi Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2024-07-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | Bird’s Eye View (BEV) perception; 3D object detection; Attention mechanism; Autonomous driving |
| Online Access: | https://doi.org/10.1007/s40747-024-01567-0 |
| author | Peicheng Shi; Zhiqiang Liu; Xinlong Dong; Aixi Yang |
|---|---|
| author_sort | Peicheng Shi |
| collection | DOAJ |
| description | Abstract In the wave of research on autonomous driving, 3D object detection from the Bird’s Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, an innovative 3D object detection methodology tailored for sensor data fusion in the BEV perspective. Our approach initiates with a view transformation, facilitated by an implicit learning module that transitions the camera’s perspective to the BEV space, thereby aligning the prediction module. Subsequently, to achieve modal fusion within the BEV framework, we employ voxelization to convert the LiDAR point cloud into BEV space, thereby generating LiDAR BEV spatial features. Moreover, to integrate the BEV spatial features from both camera and LiDAR, we have developed a multi-modal cross-attention mechanism and an implicit multi-modal fusion network, designed to enhance the synergy and application of dual-modal data. To counteract potential deficiencies in global reasoning and feature interaction arising from multi-modal cross-attention, we propose a BEV self-attention mechanism that facilitates comprehensive global feature operations. Our methodology has undergone rigorous evaluation on a substantial dataset within the autonomous driving domain, the nuScenes dataset. The outcomes demonstrate that our method achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, particularly excelling in the detection of cars and pedestrians with high accuracies of 89% and 90.7%, respectively. Additionally, CL-FusionBEV exhibits superior performance in identifying occluded and distant objects, surpassing existing comparative methods. |
| format | Article |
| id | doaj-art-e90820165ff64e448f3d1310127a12be |
| institution | OA Journals |
| issn | 2199-4536; 2198-6053 |
| language | English |
| publishDate | 2024-07-01 |
| publisher | Springer |
| record_format | Article |
| series | Complex & Intelligent Systems |
| spelling | Peicheng Shi, Zhiqiang Liu, Xinlong Dong (School of Mechanical and Automotive Engineering, Anhui Polytechnic University); Aixi Yang (Polytechnic Institute, Zhejiang University). CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View. Complex & Intelligent Systems, vol. 10, no. 6, pp. 7681–7696, 2024-07-01. https://doi.org/10.1007/s40747-024-01567-0 |
| title | CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View |
| topic | Bird’s Eye View (BEV) perception; 3D object detection; Attention mechanism; Autonomous driving |
| url | https://doi.org/10.1007/s40747-024-01567-0 |
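The abstract describes two steps that can be illustrated concretely: voxelizing the LiDAR point cloud into a BEV grid, and fusing camera and LiDAR BEV features with multi-modal cross-attention. The following is a minimal NumPy sketch of those two ideas, not the paper's implementation: the height-map binning is a crude stand-in for the paper's voxelization, and all projection matrices, grid sizes, and feature dimensions are hypothetical placeholders for learned components.

```python
import numpy as np

def lidar_to_bev(points, grid=(8, 8), xr=(-40.0, 40.0), yr=(-40.0, 40.0)):
    """Crude stand-in for voxelization: bin points into a BEV grid,
    keeping the max height (z) per cell. points: (N, 3) array of x, y, z."""
    H, W = grid
    xi = np.clip(((points[:, 0] - xr[0]) / (xr[1] - xr[0]) * W).astype(int), 0, W - 1)
    yi = np.clip(((points[:, 1] - yr[0]) / (yr[1] - yr[0]) * H).astype(int), 0, H - 1)
    bev = np.full((H, W), -np.inf)
    np.maximum.at(bev, (yi, xi), points[:, 2])  # unbuffered per-cell max
    return np.where(np.isinf(bev), 0.0, bev)    # empty cells -> 0

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bev_cross_attention(cam_bev, lidar_bev, d_k=32, seed=0):
    """Single-head cross-attention: camera BEV features act as queries over
    LiDAR BEV keys/values. Inputs are flattened (H*W, C) grids; the random
    projections below are placeholders for learned weights."""
    rng = np.random.default_rng(seed)
    C = cam_bev.shape[1]
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)
    Q, K, V = cam_bev @ Wq, lidar_bev @ Wk, lidar_bev @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (H*W, H*W) attention map
    return cam_bev + attn @ V               # residual fusion

rng = np.random.default_rng(42)
H = W = 8; C = 16
points = rng.uniform([-40, -40, -2], [40, 40, 2], size=(500, 3))
lidar_bev = lidar_to_bev(points)                         # (8, 8) height map
lidar_feat = lidar_bev.reshape(-1, 1) * np.ones((1, C))  # broadcast to C channels
cam_feat = rng.standard_normal((H * W, C))               # stand-in camera BEV features
fused = bev_cross_attention(cam_feat, lidar_feat)
print(fused.shape)  # (64, 16)
```

In the paper this fusion is followed by a BEV self-attention stage for global reasoning; structurally that is the same attention computation with queries, keys, and values all drawn from the fused BEV features.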