AlignFusionNet: Efficient Cross-Modal Alignment and Fusion for 3D Semantic Occupancy Prediction
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11082274/ |
| Summary: | The environmental perception system is a critical component of autonomous vehicles, and multimodal perception systems significantly enhance perception capability by integrating camera and LiDAR data. This paper proposes AlignFusionNet, a novel framework that combines image and point-cloud data to construct an occupancy network, thereby improving target detection and representation. The framework introduces two modules: a point-level data alignment module based on geometric transformations, and an enhanced fusion module built on cross-attention. Together, these modules achieve precise point-level alignment and seamless feature fusion between point clouds and RGB images. Experiments on the nuScenes-Occupancy dataset demonstrate that AlignFusionNet outperforms baseline methods, with a 15.9% improvement in mIoU and a 4% increase in IoU; compared with the previous state-of-the-art method, OccGen, mIoU improves by 5.9%. Qualitative visualization further shows that the proposed method achieves higher representation accuracy for small objects. |
|---|---|
| ISSN: | 2169-3536 |
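The summary describes two generic building blocks: geometric point-to-pixel alignment between LiDAR and camera, and cross-attention fusion of the aligned features. The sketch below is a minimal illustration of those two ideas only; it is not the paper's implementation, and the function names, the extrinsic matrix `T_cam_from_lidar`, and the single-head attention form are assumptions for illustration.

```python
import numpy as np

def project_points_to_image(points_lidar, T_cam_from_lidar, K):
    """Illustrative point-level geometric alignment: project LiDAR
    points (N, 3) through extrinsics T (4, 4) and intrinsics K (3, 3)
    into image pixel coordinates."""
    n = points_lidar.shape[0]
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])   # homogeneous (N, 4)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]      # LiDAR -> camera frame
    in_front = pts_cam[:, 2] > 1e-6                      # keep points ahead of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                        # perspective divide -> (u, v)
    return uv, in_front

def cross_attention(queries, kv, d):
    """Single-head cross-attention sketch: point features (queries)
    attend over image features (keys == values here, for brevity)."""
    scores = queries @ kv.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)                 # softmax over image tokens
    return w @ kv                                        # fused point features
```

With identity extrinsics and a pinhole intrinsic matrix, a point on the optical axis projects to the principal point, which is a quick sanity check that the alignment geometry is wired correctly.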