Image region semantic enhancement and symmetric semantic completion for text-to-image person search

Bibliographic Details
Main Authors: Ting Tuo, Lijun Guo, Rong Zhang, Yirui Wang, Jiangbo Qian, Shangce Gao
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-00904-8
collection DOAJ
description Abstract: Mask learning has emerged as a promising approach for Text-to-Image Person Search (TIPS), yet it faces two key challenges: (1) image regions and text phrases tend to be semantically inconsistent; (2) current approaches primarily mask text tokens to facilitate cross-modal alignment, overlooking the role that text plays in guiding the learning of fine-grained image details, so those details are often missed. In this paper, we propose Image Region Semantic Enhancement and Symmetric Semantic Completion (RE-SSC), which comprises two components: Image Region Semantic Enhancement (IRSE) and Symmetric Semantic Completion (SSC). In IRSE, we first apply superpixel segmentation to partition images into distinct patches based on low-level semantics. We then use self-supervised consistency learning to transfer high-level semantic information from the global context of the image to the local patches, enhancing their semantics. In SSC, we design a symmetric semantic completion learning process that operates in both the textual and visual directions, emphasizing global as well as local token learning to align the two modalities. Evaluated on three public datasets, the method achieves competitive performance on text-to-image person search.
id doaj-art-b4b035a3ac7b470ebd6de7898c2e100b
institution Kabale University
issn 2045-2322
volume 15
issue 1
pages 1-16
affiliation Ting Tuo, Lijun Guo, Rong Zhang, Yirui Wang, Jiangbo Qian: Faculty of Electrical Engineering and Computer Science Ningbo, Ningbo University
affiliation Shangce Gao: Faculty of Engineering, University of Toyama
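The abstract describes a completion process that masks tokens in both the textual and the visual direction. The record gives no implementation details, so the following is only a minimal sketch of the masking step under stated assumptions: the `[MASK]` placeholder, the 0.3 ratio, and the function names `mask_tokens` and `symmetric_masking` are hypothetical, and image patches are represented by plain string labels rather than the superpixel-based patch features the abstract mentions. The completion model and its losses are not reproduced here.

```python
import random
from typing import Dict, List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_tokens(
    tokens: Sequence[str], ratio: float, rng: random.Random
) -> Tuple[List[str], List[int]]:
    """Replace a random subset of tokens with MASK.

    Returns the masked copy and the sorted positions that were masked.
    """
    # Mask at least one token when the sequence is non-empty.
    n_mask = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions


def symmetric_masking(
    text_tokens: Sequence[str],
    patch_tokens: Sequence[str],
    ratio: float = 0.3,  # assumed value, for illustration only
    seed: int = 0,
) -> Dict[str, Tuple[List[str], List[int]]]:
    """Mask both modalities so each can be completed from the other."""
    rng = random.Random(seed)
    masked_text, text_pos = mask_tokens(text_tokens, ratio, rng)
    masked_patches, patch_pos = mask_tokens(patch_tokens, ratio, rng)
    return {
        # Masked text, to be completed with help from the image.
        "text_to_complete": (masked_text, text_pos),
        # Masked patches, to be completed with help from the text.
        "image_to_complete": (masked_patches, patch_pos),
    }
```

Calling `symmetric_masking(["a", "woman", "in", "a", "red", "coat"], ["p0", "p1", "p2", "p3"])` returns one masked sequence per modality together with the positions a completion model would be asked to reconstruct.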