MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion
3D object detection plays a pivotal role in achieving accurate environmental perception, particularly in complex traffic scenarios where single-modal detection methods often fail to meet precision requirements. This highlights the necessity of multi-modal fusion approaches to enhance detection perfo...
| Main Authors: | Peicheng Shi, Wenchao Wu, Aixi Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Algorithms |
| Subjects: | 3D object detection; autonomous driving; image; point cloud; multi-modal fusion |
| Online Access: | https://www.mdpi.com/1999-4893/18/3/172 |
| _version_ | 1850205030914719744 |
|---|---|
| author | Peicheng Shi; Wenchao Wu; Aixi Yang |
| author_facet | Peicheng Shi; Wenchao Wu; Aixi Yang |
| author_sort | Peicheng Shi |
| collection | DOAJ |
| description | 3D object detection plays a pivotal role in achieving accurate environmental perception, particularly in complex traffic scenarios where single-modal detection methods often fail to meet precision requirements. This highlights the necessity of multi-modal fusion approaches to enhance detection performance. However, existing camera-LiDAR intermediate fusion methods suffer from insufficient interaction between local and global features and limited fine-grained feature extraction capabilities, which results in inadequate small object detection and unstable performance in complex scenes. To address these issues, the multi-modal 3D object detection algorithm with pointwise and voxelwise fusion (MPVF) is proposed, which enhances multi-modal feature interaction and optimizes feature extraction strategies to improve detection precision and robustness. First, the pointwise and voxelwise fusion (PVWF) module is proposed to combine local features from the pointwise fusion (PWF) module with global features from the voxelwise fusion (VWF) module, enhancing the interaction between features across modalities, improving small object detection capabilities, and boosting model performance in complex scenes. Second, an expressive feature extraction module, improved ResNet-101 and feature pyramid (IRFP), is developed, comprising the improved ResNet-101 (IR) and feature pyramid (FP) modules. The IR module uses a group convolution strategy to inject high-level semantic features into the PWF and VWF modules, improving extraction efficiency. The FP module, placed at an intermediate stage, captures fine-grained features at various resolutions, enhancing the model’s precision and robustness. Finally, evaluation on the KITTI dataset demonstrates a mean Average Precision (mAP) of 69.24%, a 2.75% improvement over GraphAlign++. Detection accuracy for cars, pedestrians, and cyclists reaches 85.12%, 48.61%, and 70.12%, respectively, with the proposed method excelling in pedestrian and cyclist detection. |
| format | Article |
| id | doaj-art-440d2c875a4143679f1d13ddd0dd810f |
| institution | OA Journals |
| issn | 1999-4893 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Algorithms |
| spelling | doaj-art-440d2c875a4143679f1d13ddd0dd810f; 2025-08-20T02:11:11Z; eng; MDPI AG; Algorithms; 1999-4893; 2025-03-01; vol. 18, no. 3, art. 172; 10.3390/a18030172; MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion; Peicheng Shi (School of Mechanical and Automotive Engineering, Anhui Polytechnic University, Wuhu 241000, China); Wenchao Wu (School of Mechanical and Automotive Engineering, Anhui Polytechnic University, Wuhu 241000, China); Aixi Yang (Polytechnic Institute, Zhejiang University, Hangzhou 310015, China); https://www.mdpi.com/1999-4893/18/3/172; 3D object detection; autonomous driving; image; point cloud; multi-modal fusion |
| spellingShingle | Peicheng Shi Wenchao Wu Aixi Yang MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion Algorithms 3D object detection autonomous driving image point cloud multi-modal fusion |
| title | MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion |
| title_full | MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion |
| title_fullStr | MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion |
| title_full_unstemmed | MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion |
| title_short | MPVF: Multi-Modal 3D Object Detection Algorithm with Pointwise and Voxelwise Fusion |
| title_sort | mpvf multi modal 3d object detection algorithm with pointwise and voxelwise fusion |
| topic | 3D object detection; autonomous driving; image; point cloud; multi-modal fusion |
| url | https://www.mdpi.com/1999-4893/18/3/172 |
| work_keys_str_mv | AT peichengshi mpvfmultimodal3dobjectdetectionalgorithmwithpointwiseandvoxelwisefusion AT wenchaowu mpvfmultimodal3dobjectdetectionalgorithmwithpointwiseandvoxelwisefusion AT aixiyang mpvfmultimodal3dobjectdetectionalgorithmwithpointwiseandvoxelwisefusion |