CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View
Abstract In the wave of research on autonomous driving, 3D object detection from the Bird’s Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, an innovative 3D object detection methodology tailored for sensor data fusion in the BEV perspective. Our approach initiates with a view transformation, facilitated by an implicit learning module that transitions the camera’s perspective to the BEV space, thereby aligning the prediction module. Subsequently, to achieve modal fusion within the BEV framework, we employ voxelization to convert the LiDAR point cloud into BEV space, thereby generating LiDAR BEV spatial features. Moreover, to integrate the BEV spatial features from both camera and LiDAR, we have developed a multi-modal cross-attention mechanism and an implicit multi-modal fusion network, designed to enhance the synergy and application of dual-modal data. To counteract potential deficiencies in global reasoning and feature interaction arising from multi-modal cross-attention, we propose a BEV self-attention mechanism that facilitates comprehensive global feature operations. Our methodology has undergone rigorous evaluation on a substantial dataset within the autonomous driving domain, the nuScenes dataset. The outcomes demonstrate that our method achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, particularly excelling in the detection of cars and pedestrians with high accuracies of 89% and 90.7%, respectively. Additionally, CL-FusionBEV exhibits superior performance in identifying occluded and distant objects, surpassing existing comparative methods.
Saved in:
| Main Authors: | Peicheng Shi, Zhiqiang Liu, Xinlong Dong, Aixi Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2024-07-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | Bird’s Eye View (BEV) perception; 3D object detection; Attention mechanism; Autonomous driving |
| Online Access: | https://doi.org/10.1007/s40747-024-01567-0 |
| author | Peicheng Shi; Zhiqiang Liu; Xinlong Dong; Aixi Yang |
|---|---|
| author_sort | Peicheng Shi |
| collection | DOAJ |
| description | Abstract In the wave of research on autonomous driving, 3D object detection from the Bird’s Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, an innovative 3D object detection methodology tailored for sensor data fusion in the BEV perspective. Our approach initiates with a view transformation, facilitated by an implicit learning module that transitions the camera’s perspective to the BEV space, thereby aligning the prediction module. Subsequently, to achieve modal fusion within the BEV framework, we employ voxelization to convert the LiDAR point cloud into BEV space, thereby generating LiDAR BEV spatial features. Moreover, to integrate the BEV spatial features from both camera and LiDAR, we have developed a multi-modal cross-attention mechanism and an implicit multi-modal fusion network, designed to enhance the synergy and application of dual-modal data. To counteract potential deficiencies in global reasoning and feature interaction arising from multi-modal cross-attention, we propose a BEV self-attention mechanism that facilitates comprehensive global feature operations. Our methodology has undergone rigorous evaluation on a substantial dataset within the autonomous driving domain, the nuScenes dataset. The outcomes demonstrate that our method achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, particularly excelling in the detection of cars and pedestrians with high accuracies of 89% and 90.7%, respectively. Additionally, CL-FusionBEV exhibits superior performance in identifying occluded and distant objects, surpassing existing comparative methods. |
| format | Article |
| id | doaj-art-e90820165ff64e448f3d1310127a12be |
| institution | OA Journals |
| issn | 2199-4536; 2198-6053 |
| language | English |
| publishDate | 2024-07-01 |
| publisher | Springer |
| record_format | Article |
| series | Complex & Intelligent Systems |
| spelling | Peicheng Shi, Zhiqiang Liu, Xinlong Dong (School of Mechanical and Automotive Engineering, Anhui Polytechnic University); Aixi Yang (Polytechnic Institute, Zhejiang University). CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View. Complex & Intelligent Systems, vol. 10, no. 6, pp. 7681–7696, 2024-07-01. https://doi.org/10.1007/s40747-024-01567-0 |
| title | CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird’s Eye View |
| topic | Bird’s Eye View (BEV) perception; 3D object detection; Attention mechanism; Autonomous driving |
| url | https://doi.org/10.1007/s40747-024-01567-0 |
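The abstract describes two steps that can be illustrated concretely: voxelizing the LiDAR point cloud into a BEV grid, and fusing camera and LiDAR BEV features with multi-modal cross-attention. The following is a minimal NumPy sketch of those two ideas, not the paper's implementation: the height-map binning is a crude stand-in for the paper's voxelization, and all projection matrices, grid sizes, and feature dimensions are hypothetical placeholders for learned components.

```python
import numpy as np

def lidar_to_bev(points, grid=(8, 8), xr=(-40.0, 40.0), yr=(-40.0, 40.0)):
    """Crude stand-in for voxelization: bin points into a BEV grid,
    keeping the max height (z) per cell. points: (N, 3) array of x, y, z."""
    H, W = grid
    xi = np.clip(((points[:, 0] - xr[0]) / (xr[1] - xr[0]) * W).astype(int), 0, W - 1)
    yi = np.clip(((points[:, 1] - yr[0]) / (yr[1] - yr[0]) * H).astype(int), 0, H - 1)
    bev = np.full((H, W), -np.inf)
    np.maximum.at(bev, (yi, xi), points[:, 2])  # unbuffered per-cell max
    return np.where(np.isinf(bev), 0.0, bev)    # empty cells -> 0

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bev_cross_attention(cam_bev, lidar_bev, d_k=32, seed=0):
    """Single-head cross-attention: camera BEV features act as queries over
    LiDAR BEV keys/values. Inputs are flattened (H*W, C) grids; the random
    projections below are placeholders for learned weights."""
    rng = np.random.default_rng(seed)
    C = cam_bev.shape[1]
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)
    Q, K, V = cam_bev @ Wq, lidar_bev @ Wk, lidar_bev @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (H*W, H*W) attention map
    return cam_bev + attn @ V               # residual fusion

rng = np.random.default_rng(42)
H = W = 8; C = 16
points = rng.uniform([-40, -40, -2], [40, 40, 2], size=(500, 3))
lidar_bev = lidar_to_bev(points)                         # (8, 8) height map
lidar_feat = lidar_bev.reshape(-1, 1) * np.ones((1, C))  # broadcast to C channels
cam_feat = rng.standard_normal((H * W, C))               # stand-in camera BEV features
fused = bev_cross_attention(cam_feat, lidar_feat)
print(fused.shape)  # (64, 16)
```

In the paper this fusion is followed by a BEV self-attention stage for global reasoning; structurally that is the same attention computation with queries, keys, and values all drawn from the fused BEV features.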