Combining Camera–LiDAR Fusion and Motion Planning Using Bird’s-Eye View Representation for End-to-End Autonomous Driving
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Drones |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2504-446X/9/4/281 |
| Summary: | End-to-end autonomous driving has become a key research focus in the field of autonomous vehicles. However, existing methods struggle to fuse heterogeneous sensor inputs effectively and to convert dense perceptual features into sparse motion representations. To address these challenges, we propose BevDrive, a novel end-to-end autonomous driving framework that unifies camera–LiDAR fusion and motion planning through a bird’s-eye view (BEV) representation. BevDrive consists of three core modules: a bidirectionally guided BEV feature construction module, a dual-attention BEV feature fusion module, and a BEV-based motion planning module. The bidirectionally guided BEV feature construction module comprises two branches: depth-guided image BEV feature construction and image-guided LiDAR BEV feature construction. The former employs a lifting-and-projection approach guided by depth information from LiDAR to transform image features into a BEV representation, while the latter enriches sparse LiDAR BEV features with complementary information from the images. The dual-attention BEV feature fusion module then combines the multi-modal BEV features at both local and global levels using a hybrid of window self-attention and global self-attention. Finally, the BEV-based motion planning module integrates perception and planning by refining control and trajectory queries through interactions with the scene context in the fused BEV features, generating precise trajectory points and control commands. Extensive experiments on the CARLA Town05 Long benchmark demonstrate that BevDrive achieves state-of-the-art performance. Furthermore, we validate the proposed algorithm on a real-world vehicle platform, confirming its practical applicability and robustness. |
| ISSN: | 2504-446X |
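The depth-guided image-to-BEV lifting described in the summary can be illustrated with a greatly simplified NumPy sketch. This is not the paper's implementation: the pinhole-camera intrinsics (`fx`, `cx`), the grid parameters, and the mean-pooled scatter are all illustrative assumptions, and a real pipeline would operate on learned feature maps with full 3D geometry.

```python
import numpy as np

def image_to_bev(img_feats, depth, fx, cx, bev_size=32, cell=1.0):
    """Scatter image features into a BEV grid using per-pixel depth.

    img_feats: (Hi, Wi, C) image feature map
    depth:     (Hi, Wi) metric depth per pixel (e.g. from projected LiDAR)
    fx, cx:    assumed pinhole-camera intrinsics (illustrative only)
    """
    Hi, Wi, C = img_feats.shape
    bev = np.zeros((bev_size, bev_size, C))
    count = np.zeros((bev_size, bev_size, 1))
    for v in range(Hi):
        for u in range(Wi):
            z = depth[v, u]              # forward distance of this pixel
            x = (u - cx) * z / fx        # lateral offset via the pinhole model
            ix = int(bev_size // 2 + x / cell)   # lateral BEV cell index
            iz = int(z / cell)                   # forward BEV cell index
            if 0 <= ix < bev_size and 0 <= iz < bev_size:
                bev[iz, ix] += img_feats[v, u]
                count[iz, ix] += 1
    # mean-pool features that land in the same BEV cell
    return bev / np.maximum(count, 1)
```

In practice the lifting is differentiable and vectorized, but the geometric idea is the same: LiDAR depth resolves each pixel's ambiguity along its viewing ray, fixing where its feature lands on the ground plane.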
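The local-then-global fusion scheme (window self-attention followed by global self-attention) can likewise be sketched in a minimal, single-head NumPy form. The sum-based merge of the two modal BEV maps, the parameter-free attention (Q = K = V), and the function names are assumptions for illustration; the paper's module would use learned projections and multi-head attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # single-head scaled dot-product attention; learned Q/K/V
    # projections are omitted for brevity (Q = K = V = tokens)
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

def fuse_bev(cam_bev, lidar_bev, win=4):
    """Fuse camera and LiDAR BEV features at local and global levels."""
    H, W, C = cam_bev.shape
    fused = cam_bev + lidar_bev   # illustrative element-wise merge of modalities
    # local stage: self-attention within non-overlapping win x win windows
    local = np.empty_like(fused)
    for i in range(0, H, win):
        for j in range(0, W, win):
            patch = fused[i:i + win, j:j + win].reshape(-1, C)
            local[i:i + win, j:j + win] = self_attention(patch).reshape(win, win, C)
    # global stage: self-attention across all H*W BEV cells
    return self_attention(local.reshape(-1, C)).reshape(H, W, C)
```

The window stage keeps attention cost manageable on the dense BEV grid while capturing fine local structure; the global stage then lets distant cells (e.g. a far-away intersection and the ego lane) exchange context.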