TF-CMFA: Robust Multimodal 3D Object Detection for Dynamic Environments Using Temporal Fusion and Cross-Modal Alignment

In recent years, multimodal 3D object detection methods have garnered significant attention in autonomous driving systems due to their impressive detection performance. However, most existing research seldom addresses the issues of robustness and performance degradation in dynamic environments due t...



Bibliographic Details
Main Authors:	Yujing Wang, Abdul Hadi Abd Rahman, Fadilla 'Atyka Nor Rashid
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	3D object detection; feature alignment; multimodal; robustness
Online Access:	https://ieeexplore.ieee.org/document/10975058/

collection	DOAJ
description	In recent years, multimodal 3D object detection methods have garnered significant attention in autonomous driving systems due to their impressive detection performance. However, existing research seldom addresses robustness and performance degradation in dynamic environments, largely because modal features are difficult to align. In this paper, we introduce a novel, efficient fusion method that integrates temporal (time-series) features into multi-sensor fusion to improve 3D object detection accuracy, making it better suited to dynamic, realistic scenarios such as automated driving, and we verify its robustness. The proposed framework incorporates a Temporal Local Self-Fusion Module (TLSFM) in the LiDAR stream to enrich the representation of LiDAR BEV features. To better align the BEV features of the image stream and the point cloud stream, a Cross-Modal Fusion Alignment (CMFA) module is introduced. The Temporal Fusion-CMFA (TF-CMFA) framework, which contains the TLSFM and CMFA modules, achieves state-of-the-art performance, with a mean average precision (mAP) of 74.4% and a nuScenes detection score (NDS) of 75.7% on the nuScenes benchmark dataset. On the Waymo dataset, it improves on VoxelMamba by +2.1 and +2.3 in the ALL-L1 and ALL-L2 metrics, respectively. Finally, robustness experiments under sensor-failure conditions demonstrate the exceptional robustness of the proposed approach.
id	doaj-art-443de30e20b448b691dd0d67bcd82db6
institution	OA Journals
issn	2169-3536
doi	10.1109/ACCESS.2025.3563483
volume	13
pages	74821-74832
ieee_document	10975058
orcid	Yujing Wang: https://orcid.org/0009-0002-6414-8197; Abdul Hadi Abd Rahman: https://orcid.org/0000-0002-0261-073X
affiliation	Yujing Wang: Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia; Abdul Hadi Abd Rahman: Center for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia; Fadilla 'Atyka Nor Rashid: Center for Artificial Intelligence Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
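The NDS quoted in the abstract is the standard nuScenes composite metric: NDS = (1/10) [5 · mAP + Σ (1 − min(1, mTP))], summed over the five true-positive error metrics (translation, scale, orientation, velocity and attribute errors). The Python sketch below computes that formula. It assumes the class-averaged mAP and TP errors have already been computed (mAP as a fraction, not a percentage); the function and argument names are illustrative and are not taken from the paper or from the nuScenes devkit.

```python
from typing import Mapping

# The five nuScenes true-positive error metrics (lower is better).
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(mean_ap: float, tp_errors: Mapping[str, float]) -> float:
    """Compute NDS = (1/10) * [5 * mAP + sum over TP metrics of (1 - min(1, mTP))].

    mean_ap is a fraction in [0, 1]. Each TP error is clipped at 1, so a very
    large error contributes 0 to the score instead of a negative value.
    """
    if not 0.0 <= mean_ap <= 1.0:
        raise ValueError("mean_ap must be a fraction in [0, 1], not a percentage")
    missing = [name for name in TP_METRICS if name not in tp_errors]
    if missing:
        raise KeyError(f"missing TP metrics: {missing}")
    # Each clipped TP term lies in [0, 1] for non-negative errors.
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    # mAP carries half of the total weight; the five TP terms share the other half.
    return (5.0 * mean_ap + tp_score) / 10.0
```

Because mAP is weighted 5 and each TP term at most 1, the reported mAP of 74.4% and NDS of 75.7% imply that the five clipped TP terms sum to 7.57 − 3.72 = 3.85.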


Similar Items