PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles

Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. We propose perspective-aware hierarchical vision transformer-based LiDAR-camera fusion (PLC-Fusion) for 3D object detection to address this. This efficient, multi-modal 3D object detection framework integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers: CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision.
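The Cross-Fusion step described above amounts to letting features from one modality attend to the other. Below is a minimal, hypothetical PyTorch sketch of that idea; the class name, dimensions, and token counts are assumptions for illustration only, and CamViT, LidViT, and the OPS module from the paper are not reproduced here.

```python
# Minimal, hypothetical sketch of the cross-attention fusion idea summarised
# above, NOT the authors' released implementation: LiDAR tokens (as a
# LidViT-style encoder might emit) attend to camera tokens (as a CamViT-style
# encoder might emit). Dimensions and token counts are invented for illustration.
import torch
import torch.nn as nn


class CrossFusion(nn.Module):
    """Cross-attention fusion of LiDAR and camera token streams (illustrative)."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, lidar_tokens: torch.Tensor, cam_tokens: torch.Tensor) -> torch.Tensor:
        # Queries come from the LiDAR stream; keys/values come from the camera stream.
        attended, _ = self.cross_attn(lidar_tokens, cam_tokens, cam_tokens)
        x = self.norm1(lidar_tokens + attended)   # residual + norm
        return self.norm2(x + self.ffn(x))        # feed-forward refinement


if __name__ == "__main__":
    lidar_tokens = torch.randn(2, 512, 256)  # e.g. 512 point-cloud ROI tokens
    cam_tokens = torch.randn(2, 196, 256)    # e.g. 14x14 image patch tokens
    print(CrossFusion()(lidar_tokens, cam_tokens).shape)  # torch.Size([2, 512, 256])
```

Using LiDAR tokens as queries keeps the fused output aligned with the 3D branch, which is one common way such a fusion block is wired; the paper's actual design may differ.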


Bibliographic Details
Main Authors: Husnain Mushtaq, Xiaoheng Deng, Fizza Azhar, Mubashir Ali, Hafiz Husnain Raza Sherazi
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Information
Subjects: LiDAR-camera fusion; object perspective sampling; ViT feature fusion; 3D object detection; autonomous vehicles
Online Access: https://www.mdpi.com/2078-2489/15/11/739
_version_ 1850266975089983488
author Husnain Mushtaq
Xiaoheng Deng
Fizza Azhar
Mubashir Ali
Hafiz Husnain Raza Sherazi
author_facet Husnain Mushtaq
Xiaoheng Deng
Fizza Azhar
Mubashir Ali
Hafiz Husnain Raza Sherazi
author_sort Husnain Mushtaq
collection DOAJ
description Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. We propose perspective-aware hierarchical vision transformer-based LiDAR-camera fusion (PLC-Fusion) for 3D object detection to address this. This efficient, multi-modal 3D object detection framework integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers—CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision.
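The description's first step, projecting LiDAR data onto a 2D plane, corresponds to a standard pinhole projection. The sketch below shows only that generic projection; the calibration values and point cloud are placeholders, and the paper's OPS probability-map sampling is not reproduced.

```python
# Generic LiDAR-to-image pinhole projection, illustrating the "project onto a
# 2D plane" step only. Calibration matrices and the point cloud are placeholder
# values; the paper's OPS module and probability-map sampling are not shown.
import numpy as np


def project_lidar_to_image(points_xyz: np.ndarray, K: np.ndarray,
                           T_cam_lidar: np.ndarray) -> np.ndarray:
    """Project Nx3 LiDAR points to pixel coordinates (intrinsics K, extrinsics T)."""
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])  # (N, 4)
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                          # camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0]                                # keep points in front
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]                                       # (M, 2) pixel coords


if __name__ == "__main__":
    K = np.array([[721.5, 0.0, 609.6],   # placeholder KITTI-like intrinsics
                  [0.0, 721.5, 172.9],
                  [0.0, 0.0, 1.0]])
    T = np.eye(4)                        # identity extrinsics for illustration
    pts = np.random.uniform([-20.0, -2.0, 2.0], [20.0, 2.0, 60.0], (1000, 3))
    print(project_lidar_to_image(pts, K, T).shape)  # (1000, 2)
```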
format Article
id doaj-art-da86c2f89e884a07b62ae39b8363f46e
institution OA Journals
issn 2078-2489
language English
publishDate 2024-11-01
publisher MDPI AG
record_format Article
series Information
spelling doaj-art-da86c2f89e884a07b62ae39b8363f46e 2025-08-20T01:54:01Z
eng | MDPI AG | Information | ISSN 2078-2489 | 2024-11-01 | Vol. 15, Iss. 11, Art. 739 | doi:10.3390/info15110739
PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
Husnain Mushtaq (School of Computer Science and Engineering, Central South University, Changsha 410083, China)
Xiaoheng Deng (School of Computer Science and Engineering, Central South University, Changsha 410083, China)
Fizza Azhar (Department of Computer Science, University of Chenab, Gujrat 50700, Pakistan)
Mubashir Ali (School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK)
Hafiz Husnain Raza Sherazi (School of Computing, Newcastle University, Newcastle Upon Tyne NE4 5TG, UK)
https://www.mdpi.com/2078-2489/15/11/739
LiDAR-camera fusion; object perspective sampling; ViT feature fusion; 3D object detection; autonomous vehicles
spellingShingle Husnain Mushtaq
Xiaoheng Deng
Fizza Azhar
Mubashir Ali
Hafiz Husnain Raza Sherazi
PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
Information
LiDAR-camera fusion
object perspective sampling
ViT feature fusion
3D object detection
autonomous vehicles
title PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
title_full PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
title_fullStr PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
title_full_unstemmed PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
title_short PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
title_sort plc fusion perspective based hierarchical and deep lidar camera fusion for 3d object detection in autonomous vehicles
topic LiDAR-camera fusion
object perspective sampling
ViT feature fusion
3D object detection
autonomous vehicles
url https://www.mdpi.com/2078-2489/15/11/739
work_keys_str_mv AT husnainmushtaq plcfusionperspectivebasedhierarchicalanddeeplidarcamerafusionfor3dobjectdetectioninautonomousvehicles
AT xiaohengdeng plcfusionperspectivebasedhierarchicalanddeeplidarcamerafusionfor3dobjectdetectioninautonomousvehicles
AT fizzaazhar plcfusionperspectivebasedhierarchicalanddeeplidarcamerafusionfor3dobjectdetectioninautonomousvehicles
AT mubashirali plcfusionperspectivebasedhierarchicalanddeeplidarcamerafusionfor3dobjectdetectioninautonomousvehicles
AT hafizhusnainrazasherazi plcfusionperspectivebasedhierarchicalanddeeplidarcamerafusionfor3dobjectdetectioninautonomousvehicles