MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion

3D object detection plays a pivotal role in achieving accurate environmental perception, particularly in complex traffic scenarios where single-modal detection methods often fail to meet precision requirements. This highlights the necessity of multi-modal fusion approaches to enhance detection perfo...

Bibliographic Details
Main Authors:	Peicheng Shi, Wenchao Wu, Aixi Yang
Format:	Article
Language:	English
Published:	MDPI AG 2025-03-01
Series:	Algorithms
Subjects:	3D object detection; autonomous driving; image; point cloud; multi-modal fusion
Online Access:	https://www.mdpi.com/1999-4893/18/3/172

author	Peicheng Shi; Wenchao Wu; Aixi Yang
author_facet	Peicheng Shi; Wenchao Wu; Aixi Yang
author_sort	Peicheng Shi
collection	DOAJ
description	3D object detection plays a pivotal role in achieving accurate environmental perception, particularly in complex traffic scenarios where single-modal detection methods often fail to meet precision requirements. This highlights the necessity of multi-modal fusion approaches to enhance detection performance. However, existing camera-LiDAR intermediate fusion methods suffer from insufficient interaction between local and global features and limited fine-grained feature extraction capabilities, which results in inadequate small object detection and unstable performance in complex scenes. To address these issues, the multi-modal 3D object detection algorithm with pointwise and voxelwise fusion (MPVF) is proposed, which enhances multi-modal feature interaction and optimizes feature extraction strategies to improve detection precision and robustness. First, the pointwise and voxelwise fusion (PVWF) module is proposed to combine local features from the pointwise fusion (PWF) module with global features from the voxelwise fusion (VWF) module, enhancing the interaction between features across modalities, improving small object detection capabilities, and boosting model performance in complex scenes. Second, an expressive feature extraction module, improved ResNet-101 and feature pyramid (IRFP), is developed, comprising the improved ResNet-101 (IR) and feature pyramid (FP) modules. The IR module uses a group convolution strategy to inject high-level semantic features into the PWF and VWF modules, improving extraction efficiency. The FP module, placed at an intermediate stage, captures fine-grained features at various resolutions, enhancing the model’s precision and robustness. Finally, evaluation on the KITTI dataset demonstrates a mean Average Precision (mAP) of 69.24%, a 2.75% improvement over GraphAlign++. Detection accuracy for cars, pedestrians, and cyclists reaches 85.12%, 48.61%, and 70.12%, respectively, with the proposed method excelling in pedestrian and cyclist detection.
format	Article
id	doaj-art-440d2c875a4143679f1d13ddd0dd810f
institution	OA Journals
issn	1999-4893
language	English
publishDate	2025-03-01
publisher	MDPI AG
record_format	Article
series	Algorithms
spelling	doaj-art-440d2c875a4143679f1d13ddd0dd810f | 2025-08-20T02:11:11Z | eng | MDPI AG | Algorithms | 1999-4893 | 2025-03-01 | 18 | 3 | 172 | 10.3390/a18030172 | MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion | Peicheng Shi (0); Wenchao Wu (1); Aixi Yang (2) | School of Mechanical and Automotive Engineering, Anhui Polytechnic University, Wuhu 241000, China; School of Mechanical and Automotive Engineering, Anhui Polytechnic University, Wuhu 241000, China; Polytechnic Institute, Zhejiang University, Hangzhou 310015, China | https://www.mdpi.com/1999-4893/18/3/172 | 3D object detection; autonomous driving; image; point cloud; multi-modal fusion
title	MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion
topic	3D object detection; autonomous driving; image; point cloud; multi-modal fusion
url	https://www.mdpi.com/1999-4893/18/3/172

Similar Items