MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion

Bibliographic Details
Main Authors: Peicheng Shi, Wenchao Wu, Aixi Yang
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Algorithms
Online Access:https://www.mdpi.com/1999-4893/18/3/172
Collection: DOAJ
Description:
3D object detection plays a pivotal role in accurate environmental perception, particularly in complex traffic scenarios where single-modal detection methods often fail to meet precision requirements. This highlights the need for multi-modal fusion approaches to improve detection performance. However, existing camera-LiDAR intermediate fusion methods suffer from insufficient interaction between local and global features and limited fine-grained feature extraction, which leads to poor small-object detection and unstable performance in complex scenes. To address these issues, this work proposes the multi-modal 3D object detection algorithm with pointwise and voxelwise fusion (MPVF), which strengthens multi-modal feature interaction and optimizes the feature extraction strategy to improve detection precision and robustness. First, the pointwise and voxelwise fusion (PVWF) module combines local features from the pointwise fusion (PWF) module with global features from the voxelwise fusion (VWF) module, strengthening cross-modal feature interaction, improving small-object detection, and boosting performance in complex scenes. Second, an expressive feature extraction module, improved ResNet-101 and feature pyramid (IRFP), is developed; it comprises the improved ResNet-101 (IR) and feature pyramid (FP) modules. The IR module uses a group convolution strategy to inject high-level semantic features into the PWF and VWF modules, improving extraction efficiency. The FP module, placed at an intermediate stage, captures fine-grained features at multiple resolutions, enhancing the model’s precision and robustness. Finally, evaluation on the KITTI dataset yields a mean Average Precision (mAP) of 69.24%, a 2.75% improvement over GraphAlign++. Detection accuracy for cars, pedestrians, and cyclists reaches 85.12%, 48.61%, and 70.12%, respectively, with the proposed method performing especially well in pedestrian and cyclist detection.
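
The idea behind PVWF, pairing per-point (local) image features with coarser voxel-level context, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion operators, so the sketch assumes nearest-neighbour sampling of an image feature map at each point's projected pixel (standing in for PWF), mean pooling of those fused features over a regular voxel grid (standing in for VWF), and plain concatenation to combine the two. The helper names (`project_point`, `pointwise_fuse`, `voxelwise_fuse`, `pvwf_fuse`), the 3x4 projection matrix `P`, and `voxel_size` are hypothetical, and points are assumed to be already expressed in the camera frame.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Vec = List[float]
Key = Tuple[int, ...]


def project_point(P: Sequence[Sequence[float]],
                  xyz: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Project a camera-frame 3D point to pixel (u, v) with a 3x4 matrix P.

    Returns None for points at or behind the image plane (depth <= 0).
    """
    x, y, z = xyz
    u, v, d = (P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3))
    if d <= 0:
        return None
    return u / d, v / d


def pointwise_fuse(points: Sequence[Sequence[float]], point_feats: Sequence[Vec],
                   image_feats: Sequence[Sequence[Vec]], P) -> List[Vec]:
    """Hypothetical stand-in for PWF: concatenate each point's own feature with the
    image feature at its nearest projected pixel (zeros if it falls off the image)."""
    h, w, c = len(image_feats), len(image_feats[0]), len(image_feats[0][0])
    fused = []
    for xyz, pf in zip(points, point_feats):
        img_f = [0.0] * c
        uv = project_point(P, xyz)
        if uv is not None:
            col, row = int(round(uv[0])), int(round(uv[1]))
            if 0 <= row < h and 0 <= col < w:
                img_f = list(image_feats[row][col])
        fused.append(list(pf) + img_f)
    return fused


def voxelwise_fuse(points: Sequence[Sequence[float]], feats: Sequence[Vec],
                   voxel_size: float) -> List[Vec]:
    """Hypothetical stand-in for VWF: average features within each voxel and give
    every point the mean feature of the voxel it falls in (voxel-level context)."""
    keys: List[Key] = [tuple(math.floor(coord / voxel_size) for coord in p) for p in points]
    sums: Dict[Key, Vec] = {}
    counts: Dict[Key, int] = defaultdict(int)
    for k, f in zip(keys, feats):
        prev = sums.get(k, [0.0] * len(f))
        sums[k] = [s + x for s, x in zip(prev, f)]
        counts[k] += 1
    return [[s / counts[k] for s in sums[k]] for k in keys]


def pvwf_fuse(points, point_feats, image_feats, P, voxel_size: float) -> List[Vec]:
    """Assumed combination rule: concatenate local (pointwise) and voxel-level features."""
    local = pointwise_fuse(points, point_feats, image_feats, P)
    context = voxelwise_fuse(points, local, voxel_size)
    return [loc + ctx for loc, ctx in zip(local, context)]
```

With an identity-like `P`, a 2x2 single-channel feature map, and `voxel_size=2.0`, two nearby points share a voxel and therefore receive the same averaged context features, while a point that projects outside the image gets zero image features. A learned model would replace the fixed concatenation and averaging with trainable layers, which this sketch deliberately leaves out.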
ID: doaj-art-440d2c875a4143679f1d13ddd0dd810f
Institution: OA Journals
ISSN: 1999-4893
DOI: 10.3390/a18030172
Citation: Algorithms, vol. 18, no. 3, article 172 (2025)
Affiliations: Peicheng Shi and Wenchao Wu (School of Mechanical and Automotive Engineering, Anhui Polytechnic University, Wuhu 241000, China); Aixi Yang (Polytechnic Institute, Zhejiang University, Hangzhou 310015, China)
Subjects: 3D object detection; autonomous driving; image; point cloud; multi-modal fusion