Combining Camera–LiDAR Fusion and Motion Planning Using Bird’s-Eye View Representation for End-to-End Autonomous Driving


Bibliographic Details
Main Authors: Ze Yu, Jun Li, Yuzhen Wei, Yuandong Lyu, Xiaojun Tan
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Drones
Subjects: deep learning; autonomous driving; end-to-end autonomous driving; sensor fusion; motion planning
Online Access: https://www.mdpi.com/2504-446X/9/4/281
collection DOAJ
description End-to-end autonomous driving has become a key research focus in autonomous vehicles. However, existing methods struggle with effectively fusing heterogeneous sensor inputs and converting dense perceptual features into sparse motion representations. To address these challenges, we propose BevDrive, a novel end-to-end autonomous driving framework that unifies camera–LiDAR fusion and motion planning through a bird’s-eye view (BEV) representation. BevDrive consists of three core modules: the bidirectionally guided BEV feature construction module, the dual-attention BEV feature fusion module, and the BEV-based motion planning module. The bidirectionally guided BEV feature construction module comprises two branches: depth-guided image BEV feature construction and image-guided LiDAR BEV feature construction. Depth-guided image BEV feature construction employs a lifting and projection approach guided by depth information from LiDAR, transforming image features into a BEV representation. Meanwhile, image-guided LiDAR BEV feature construction enriches sparse LiDAR BEV features by integrating complementary information from the images. Then, the dual-attention BEV feature fusion module combines multi-modal BEV features at both local and global levels using a hybrid approach of window self-attention and global self-attention mechanisms. Finally, the BEV-based motion planning module integrates perception and planning by refining control and trajectory queries through interactions with the scene context in the fused BEV features, generating precise trajectory points and control commands. Extensive experiments on the CARLA Town05 Long benchmark demonstrate that BevDrive achieves state-of-the-art performance. Furthermore, we validate the feasibility of the proposed algorithm on a real-world vehicle platform, confirming its practical applicability and robustness.
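The dual-attention fusion described above (local window self-attention followed by global self-attention over fused camera and LiDAR BEV features) can be sketched in a few lines. The following is a minimal NumPy illustration only, not the paper's implementation: it uses a single head, identity query/key/value projections, a simple additive merge of the two modalities, and hypothetical names (`dual_attention_fuse`, `window`); the actual model would use learned projections and multi-head attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Identity Q/K/V projections are used for brevity; a real model learns them."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)   # (N, N) pairwise similarities
    return softmax(scores, axis=-1) @ tokens  # (N, C) attended tokens

def dual_attention_fuse(bev_cam, bev_lidar, window=4):
    """Fuse two (H, W, C) BEV maps: window self-attention, then global self-attention."""
    H, W, C = bev_cam.shape
    fused = bev_cam + bev_lidar  # simple additive merge of the two modalities
    # Local stage: attend within non-overlapping window x window patches.
    out = np.empty_like(fused)
    for i in range(0, H, window):
        for j in range(0, W, window):
            patch = fused[i:i + window, j:j + window].reshape(-1, C)
            out[i:i + window, j:j + window] = self_attention(patch).reshape(window, window, C)
    # Global stage: attend across all BEV cells at once.
    return self_attention(out.reshape(-1, C)).reshape(H, W, C)

bev_cam = np.random.rand(8, 8, 16)    # toy camera BEV features
bev_lidar = np.random.rand(8, 8, 16)  # toy LiDAR BEV features
fused = dual_attention_fuse(bev_cam, bev_lidar)
print(fused.shape)  # (8, 8, 16)
```

The two stages mirror the local/global split in the abstract: the window stage keeps cost linear in the number of BEV cells while capturing fine-grained cross-modal detail, and the global stage propagates scene-level context across the whole grid.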
id doaj-art-79adf3e5edd244f0b9b33f00959729ec
institution OA Journals
issn 2504-446X
doi 10.3390/drones9040281
affiliation Ze Yu: School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
affiliation Jun Li: School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
affiliation Yuzhen Wei: School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
affiliation Yuandong Lyu: Li Auto Inc., Shanghai 201800, China
affiliation Xiaojun Tan: School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
topic deep learning
autonomous driving
end-to-end autonomous driving
sensor fusion
motion planning