AlignFusionNet: Efficient Cross-Modal Alignment and Fusion for 3D Semantic Occupancy Prediction

Bibliographic Details
Main Authors: Ziyi Xu, Legan Qi, Hongzhou Du, Jiaqi Yang, Zhenglin Chen
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11082274/
Description
Summary: The environmental perception system is a critical component of autonomous vehicles, and multimodal perception systems significantly enhance perception capabilities by integrating camera and LiDAR data. This paper proposes AlignFusionNet, a novel framework that combines image and point cloud data to construct an occupancy network, thereby improving object detection and representation. The framework introduces two innovative modules: a point-level data alignment module based on geometric transformations, and an enhanced fusion module utilizing cross-attention mechanisms. Together, these modules achieve precise point-level alignment and seamless feature fusion between point clouds and RGB images. Experiments on the nuScenes-Occupancy dataset demonstrate that AlignFusionNet outperforms baseline methods, achieving a 15.9% improvement in mIoU and a 4% increase in IoU; compared with the previous state-of-the-art method, OccGen, it improves mIoU by 5.9%. Qualitative visualization further shows that the proposed method represents small objects more accurately.
ISSN: 2169-3536
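
The summary names two modules, point-level geometric alignment and cross-attention fusion, without implementation detail. The following is a minimal PyTorch sketch of the standard techniques those names suggest: projecting LiDAR points into the image plane with calibration matrices, and letting point features attend to image features. All names here (project_points_to_image, CrossAttentionFusion, lidar_to_cam) are hypothetical illustrations, not the authors' released code.

    import torch
    import torch.nn as nn

    def project_points_to_image(points, lidar_to_cam, intrinsics):
        # Point-level geometric alignment: transform LiDAR points (N, 3)
        # into the camera frame with a 4x4 extrinsic matrix, then project
        # them to pixel coordinates with 3x3 pinhole intrinsics.
        ones = torch.ones(points.shape[0], 1, device=points.device)
        pts_h = torch.cat([points, ones], dim=1)     # homogeneous coords (N, 4)
        cam_pts = (lidar_to_cam @ pts_h.T).T[:, :3]  # camera frame (N, 3)
        valid = cam_pts[:, 2] > 1e-6                 # keep points in front of the camera
        cam_pts = cam_pts[valid]
        uv = (intrinsics @ cam_pts.T).T              # perspective projection
        uv = uv[:, :2] / uv[:, 2:3]                  # normalize by depth -> (u, v)
        return uv, valid

    class CrossAttentionFusion(nn.Module):
        # Point features act as queries over flattened image features,
        # so each point gathers the image context it is aligned with.
        def __init__(self, dim, num_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, point_feats, image_feats):
            # point_feats: (B, N, C) queries; image_feats: (B, H*W, C) keys/values
            fused, _ = self.attn(point_feats, image_feats, image_feats)
            return self.norm(point_feats + fused)    # residual connection + norm

In this sketch the projected (u, v) coordinates would be used to sample image features (for example via grid sampling) before the fusion step; the record does not specify how AlignFusionNet performs that sampling.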