AlignFusionNet: Efficient Cross-Modal Alignment and Fusion for 3D Semantic Occupancy Prediction

Bibliographic Details
Main Authors: Ziyi Xu, Legan Qi, Hongzhou Du, Jiaqi Yang, Zhenglin Chen
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11082274/
Description
Summary: The environmental perception system is a critical component of autonomous vehicles, and multimodal perception systems significantly enhance perception capabilities by integrating camera and LiDAR data. This paper proposes AlignFusionNet, a novel framework that combines image and point cloud data to construct an occupancy network, thereby improving object detection and representation. The framework introduces two innovative modules: a point-level data alignment module based on geometric transformations, and an enhanced fusion module utilizing cross-attention mechanisms. Together, these modules achieve precise point-level alignment and seamless feature fusion between point clouds and RGB images. Experiments on the nuScenes-Occupancy dataset demonstrate that AlignFusionNet outperforms baseline methods, achieving a 15.9% improvement in mIoU and a 4% increase in IoU; compared with the previous state-of-the-art method, OccGen, it improves mIoU by 5.9%. Qualitative visualization further shows that the proposed method represents small objects more accurately.
ISSN: 2169-3536
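
The summary names two modules, point-level geometric alignment and cross-attention fusion, without implementation detail. The following is a minimal PyTorch sketch of the standard techniques those names suggest: projecting LiDAR points into the image plane with calibration matrices, and letting point features attend to image features. All names here (project_points_to_image, CrossAttentionFusion, lidar_to_cam) are hypothetical illustrations, not the authors' released code.

    import torch
    import torch.nn as nn

    def project_points_to_image(points, lidar_to_cam, intrinsics):
        # Point-level geometric alignment: transform LiDAR points (N, 3)
        # into the camera frame with a 4x4 extrinsic matrix, then project
        # them to pixel coordinates with 3x3 pinhole intrinsics.
        ones = torch.ones(points.shape[0], 1, device=points.device)
        pts_h = torch.cat([points, ones], dim=1)     # homogeneous coords (N, 4)
        cam_pts = (lidar_to_cam @ pts_h.T).T[:, :3]  # camera frame (N, 3)
        valid = cam_pts[:, 2] > 1e-6                 # keep points in front of the camera
        cam_pts = cam_pts[valid]
        uv = (intrinsics @ cam_pts.T).T              # perspective projection
        uv = uv[:, :2] / uv[:, 2:3]                  # normalize by depth -> (u, v)
        return uv, valid

    class CrossAttentionFusion(nn.Module):
        # Point features act as queries over flattened image features,
        # so each point gathers the image context it is aligned with.
        def __init__(self, dim, num_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, point_feats, image_feats):
            # point_feats: (B, N, C) queries; image_feats: (B, H*W, C) keys/values
            fused, _ = self.attn(point_feats, image_feats, image_feats)
            return self.norm(point_feats + fused)    # residual connection + norm

In this sketch the projected (u, v) coordinates would be used to sample image features (for example via grid sampling) before the fusion step; the record does not specify how AlignFusionNet performs that sampling.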