Object-Specific Multiview Classification Through View-Compatible Feature Fusion
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Sensors |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1424-8220/25/13/4127 |
| Summary: | Multi-view classification (MVC) typically focuses on categorizing objects into distinct classes by employing multiple perspectives of the same objects. However, in numerous real-world applications, such as industrial inspection and quality control, there is an increasing need to distinguish particular objects from a pool of similar ones while simultaneously disregarding unknown objects. In these scenarios, relying on a single image may not provide sufficient information to effectively identify the scrutinized object, as different perspectives may reveal distinct characteristics that are essential for accurate classification. Most existing approaches operate within closed-set environments and are focused on generalization, which makes them less effective in distinguishing individual objects from others. These limitations are particularly problematic in industrial quality assessment, where distinguishing between specific objects and discarding unknowns is crucial. To address this challenge, we introduce a View-Compatible Feature Fusion (VCFF) method that utilizes images from predetermined positions as an accurate solution for multi-view classification of specific objects. Unlike other approaches, VCFF explicitly integrates pose information during the fusion process. It does not merely use pose as auxiliary data but employs it to align and selectively fuse features from different views. This mathematically explicit fusion of rotations, based on relative poses, allows VCFF to effectively combine multi-view information, enhancing classification accuracy. Through experimental evaluations, we demonstrate that the proposed VCFF method outperforms state-of-the-art MVC algorithms, especially in open-set scenarios, where the set of possible objects is not fully known in advance. Remarkably, VCFF achieves an average precision of 1.0 using only eight cameras, whereas existing methods require 20 cameras to reach a maximum of 0.95. In terms of AUC-ROC under the constraint of fewer than 3σ false positives, a critical metric in industrial inspection, current state-of-the-art methods achieve up to 0.72, while VCFF attains a perfect score of 1.0 with just eight cameras. Furthermore, our approach delivers highly accurate rotation estimation, maintaining an error margin slightly above 2° when sampling at 4° intervals. |
| ISSN: | 1424-8220 |