Toward general object search in open reality

Bibliographic Details
Main Authors: Gang Shen, Wenjun Ma, Guangyao Chen, Yonghong Tian
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-97251-5
Description
Summary: Real-world scenarios are inherently dynamic and open-ended, necessitating that current deep models adapt to general objects in open realities to be practically useful. In this paper, we extend a valuable computer vision task called General Object Search in Open Reality (GOSO). The main objective of GOSO is to determine whether an object from the open world appears in another gallery image, even when composed of arbitrary entities and backgrounds. However, two significant challenges arise: the high scale variance among different instances of the same entity and the vast openness with an ever-expanding set of unknown categories in the open world. To address these issues, we formalize the GOSO problem and propose a simple yet effective architecture named Siamese Exchanged Attention Network (SEA-Net). Specifically, based on a standard siamese structure, SEA-Net introduces a novel branch that comprises multiple stage-stacked Siamese Exchanged Attention (SEA) layers followed by a Hierarchical Feature Fusion (HFF) module, enabling efficient scale adaptation and the extraction of matching-friendly deep features. Moreover, an Open Score Fusion (OSF) module is integrated into SEA-Net during inference to yield a more robust matching score in open-world scenarios. We construct two new evaluation benchmarks suitable for the GOSO task using the existing COCO and LVIS datasets, and extensive experiments consistently demonstrate the effectiveness of the proposed method.
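This record does not include the paper's implementation details, so the following is an illustration only: a minimal siamese matching pipeline with a hypothetical open-set score fusion. The random-projection "backbone", the `open_score_fusion` weighting, and all names below are placeholders standing in for the SEA layers, HFF, and OSF modules described in the abstract, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "backbone": a fixed random projection in place of the paper's
# siamese encoder (its actual architecture is not given in this record).
W = rng.standard_normal((512, 128))

def embed(x: np.ndarray) -> np.ndarray:
    """Project a raw feature vector and L2-normalise it."""
    z = x @ W
    return z / np.linalg.norm(z)

def cosine_score(query: np.ndarray, gallery: np.ndarray) -> float:
    """Siamese matching score: cosine similarity of shared-encoder embeddings."""
    return float(embed(query) @ embed(gallery))

def open_score_fusion(sim: float, max_known_sim: float, alpha: float = 0.5) -> float:
    """Hypothetical fusion rule: discount the raw similarity when the query
    resembles no known category (a guess at the role OSF plays)."""
    return alpha * sim + (1.0 - alpha) * sim * max_known_sim

query = rng.standard_normal(512)
same = query + 0.1 * rng.standard_normal(512)   # near-duplicate instance
other = rng.standard_normal(512)                # unrelated gallery image

s_pos = cosine_score(query, same)
s_neg = cosine_score(query, other)
print(s_pos > s_neg)  # a matching instance should score higher
```

With shared weights on both branches, the score is symmetric in query and gallery, which is the basic property any siamese search architecture preserves.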
ISSN: 2045-2322