Cross-Modal Collaboration and Robust Feature Classifier for Open-Vocabulary 3D Object Detection

Bibliographic Details
Main Authors: Hengsong Liu, Tongle Duan
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Sensors
Subjects: 3D object detection; multi-sensor fusion; zero-shot learning; autonomous driving
Online Access: https://www.mdpi.com/1424-8220/25/2/553
author Hengsong Liu
Tongle Duan
collection DOAJ
description Multi-sensor fusion, such as LiDAR- and camera-based 3D object detection, is a key technology in autonomous driving and robotics. However, traditional 3D detection models are limited to recognizing predefined categories and struggle with unknown or novel objects. Given the complexity of real-world environments, research into open-vocabulary 3D object detection is essential. This paper therefore addresses two key issues in this area: how to localize novel objects, and how to classify them. We propose Cross-Modal Collaboration and a Robust Feature Classifier to improve localization accuracy and classification robustness for novel objects. Cross-Modal Collaboration performs collaborative localization between the LiDAR and the camera: 2D images provide preliminary regions of interest for novel objects in the 3D point cloud, while the 3D point cloud feeds more precise positional information back to the 2D images. Through iterative updates between the two modalities, the preliminary regions and the positional information are refined, yielding accurate localization of novel objects. The Robust Feature Classifier aims to classify novel objects accurately. To prevent them from being misidentified as background or assigned to incorrect categories, it maps the semantic vector of each new category to multiple sets of visual features that are distinguishable from the background, and clusters these visual features around their individual semantic vectors to maintain inter-class separability. Our method achieves state-of-the-art performance across various scenarios and datasets.
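
To make the Cross-Modal Collaboration idea concrete, below is a minimal geometric sketch of the 2D-3D alternation the abstract describes: a 2D region of interest selects LiDAR points via a frustum crop, the selected points yield a 3D box, and that box is projected back to tighten the 2D region. The pinhole projection, the axis-aligned boxes, and the names `project` and `localize` are illustrative assumptions, not the authors' implementation, which the abstract describes only at a high level.

```python
# Illustrative toy of the iterative 2D<->3D localization loop; the names
# and the purely geometric refinement are assumptions, not the paper's code.
import numpy as np

def project(points, K):
    """Project (N, 3) camera-frame points (z > 0) to (N, 2) pixel coords."""
    uv = (K @ points.T).T          # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]  # divide by depth

def localize(points, K, box2d, n_iters=3):
    """Alternate between a 2D region and the 3D points it contains.

    points: (N, 3) LiDAR points in the camera frame
    K:      (3, 3) camera intrinsic matrix
    box2d:  preliminary region of interest [u_min, v_min, u_max, v_max]
    """
    box3d = None
    for _ in range(n_iters):
        # 2D -> 3D: keep points whose projection falls inside the region
        # (a frustum crop standing in for image-guided 3D localization).
        uv = project(points, K)
        inside = ((uv[:, 0] >= box2d[0]) & (uv[:, 0] <= box2d[2]) &
                  (uv[:, 1] >= box2d[1]) & (uv[:, 1] <= box2d[3]))
        obj = points[inside]
        if obj.shape[0] == 0:
            break
        # Fit a 3D box to the cropped points (axis-aligned for simplicity).
        box3d = np.concatenate([obj.min(axis=0), obj.max(axis=0)])
        # 3D -> 2D: project the object points back to tighten the region
        # for the next iteration.
        uv_obj = project(obj, K)
        box2d = np.concatenate([uv_obj.min(axis=0), uv_obj.max(axis=0)])
    return box2d, box3d
```

In the full method the per-modality updates are presumably learned rather than a plain frustum crop; this sketch only reproduces the data flow (image region to point subset to 3D box and back).

A similar toy for the Robust Feature Classifier: each novel category's semantic vector is expanded into several visual-feature prototypes clustered tightly around it, and a region feature is labeled background unless it beats the background similarity by a margin. The perturbation-based semantic-to-visual mapping, the margin test, and all names here are hypothetical; the paper's actual mapping and clustering are not specified at this level of detail.

```python
# Illustrative toy of the classifier idea: per-class prototype sets kept
# separable from a background prototype. All names and values are assumptions.
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def make_prototypes(semantic_vecs, n_proto=4, noise=0.05, seed=0):
    """Map each category's semantic vector to several visual prototypes,
    clustered around it so categories stay separable."""
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    for c, s in enumerate(semantic_vecs):
        # Perturbed copies stand in for a learned one-to-many
        # semantic-to-visual mapping.
        p = normalize(s + noise * rng.standard_normal((n_proto, s.size)))
        protos.append(p)
        labels.extend([c] * n_proto)
    return np.vstack(protos), np.array(labels)

def classify(feat, protos, labels, bg_proto, margin=0.05):
    """Assign the nearest prototype's class, unless the feature does not
    clearly beat the background similarity (then return -1)."""
    feat = normalize(feat)
    sims = protos @ feat
    best = int(sims.argmax())
    if sims[best] < float(bg_proto @ feat) + margin:
        return -1  # background
    return int(labels[best])

# Toy usage: 3 novel categories with 16-dim embeddings.
rng = np.random.default_rng(1)
sem = normalize(rng.standard_normal((3, 16)))
protos, labels = make_prototypes(sem)
bg = normalize(rng.standard_normal(16))
print(classify(sem[1] + 0.01 * rng.standard_normal(16), protos, labels, bg))  # likely 1
```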
format Article
id doaj-art-5b907241b0ed40c89ebd3aafefb8078a
institution Kabale University
issn 1424-8220
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Sensors
doi 10.3390/s25020553
author_affiliations The 54th Research Institute, China Electronics Technology Group Corporation, College of Signal and Information Processing, Shijiazhuang 050081, China (both authors)
title Cross-Modal Collaboration and Robust Feature Classifier for Open-Vocabulary 3D Object Detection
topic 3D object detection
multi-sensor fusion
zero-shot learning
autonomous driving
url https://www.mdpi.com/1424-8220/25/2/553