Image region semantic enhancement and symmetric semantic completion for text-to-image person search

Abstract: Mask learning has emerged as a promising approach for Text-to-Image Person Search (TIPS), yet it faces two key challenges: (1) There tends to be semantic inconsistency between image regions and text phrases. (2) Current approaches primarily focus on masking text tokens to facilitate cross-m...

Bibliographic Details
Main Authors:	Ting Tuo, Lijun Guo, Rong Zhang, Yirui Wang, Jiangbo Qian, Shangce Gao
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-07-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-025-00904-8

author	Ting Tuo, Lijun Guo, Rong Zhang, Yirui Wang, Jiangbo Qian, Shangce Gao
collection	DOAJ
description	Abstract: Mask learning has emerged as a promising approach for Text-to-Image Person Search (TIPS), yet it faces two key challenges: (1) There tends to be semantic inconsistency between image regions and text phrases. (2) Current approaches primarily focus on masking text tokens to facilitate cross-modal alignment, overlooking the important role that text plays in guiding the learning of intricate details within images, which can lead to missed opportunities for capturing these details. In this paper, we are excited to introduce our proposed method called Image Region Semantic Enhancement and Symmetric Semantic Completion (RE-SSC). Specifically, our approach comprises two main components: Image Region Semantic Enhancement (IRSE) and Symmetric Semantic Completion (SSC). In IRSE, we initially apply superpixel segmentation to partition images into distinct patches based on low-level semantics. Subsequently, we leverage self-supervised consistency learning to transfer high-level semantic information from the global context of the image for local patches, enhancing local patch semantics. Within the SSC component, we have designed a symmetric semantic completion learning process that operates in both textual and visual directions, emphasizing global as well as local token learning to achieve effective alignment across modalities. We evaluated our method on three public datasets and are pleased to report competitive performance in addressing text-to-image pedestrian searches.
format	Article
id	doaj-art-b4b035a3ac7b470ebd6de7898c2e100b
institution	Kabale University
issn	2045-2322
language	English
publishDate	2025-07-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-b4b035a3ac7b470ebd6de7898c2e100b; 2025-08-20T03:38:16Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; 15; 1; 1; 16; 10.1038/s41598-025-00904-8; Image region semantic enhancement and symmetric semantic completion for text-to-image person search; Ting Tuo (Faculty of Electrical Engineering and Computer Science Ningbo, Ningbo University); Lijun Guo (Faculty of Electrical Engineering and Computer Science Ningbo, Ningbo University); Rong Zhang (Faculty of Electrical Engineering and Computer Science Ningbo, Ningbo University); Yirui Wang (Faculty of Electrical Engineering and Computer Science Ningbo, Ningbo University); Jiangbo Qian (Faculty of Electrical Engineering and Computer Science Ningbo, Ningbo University); Shangce Gao (Faculty of Engineering, University of Toyama)
title	Image region semantic enhancement and symmetric semantic completion for text-to-image person search
url	https://doi.org/10.1038/s41598-025-00904-8


Similar Items