Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification

Attribute-image person re-identification (AIPR) is a meaningful and challenging task that retrieves person images based on attribute descriptions. In this paper, we propose a regularized dual modal meta metric learning (RDM3L) method for AIPR, which employs meta-learning training to enhance the transformer’s capacity to acquire latent knowledge. During training, the data are first divided into a single-modal support set containing images and a dual-modal query set containing both attributes and images. The RDM3L method introduces an attribute-image transformer (AIT) as a novel feature-extraction backbone, extending the vision transformer concept. Drawing on hard sample mining, the method designs attribute-image cross-modal meta metrics and image-image intra-modal meta metrics. A triplet loss function based on these meta metrics is then applied to pull samples of the same category together and push different categories apart, thereby enhancing cross-modal and intra-modal discrimination. Finally, a regularization term aggregates samples of different modalities in the query set to prevent overfitting, ensuring that RDM3L preserves the model’s generalization ability while aligning the two modalities and identifying unseen classes. Experimental results on the PETA and Market-1501 Attributes datasets demonstrate the superiority of the RDM3L method, which achieves mean average precision (mAP) scores of 36.7% on the Market-1501 Attributes dataset and 60.6% on the PETA dataset.

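The abstract describes two hard-sample-mined meta metrics, one between attribute and image embeddings and one among image embeddings, combined through a triplet loss. The record does not give the paper's exact formulation; the following is a minimal batch-hard triplet sketch under that reading, with all function and variable names hypothetical:

    import torch
    import torch.nn.functional as F

    def batch_hard_triplet(query_emb, query_ids, gallery_emb, gallery_ids, margin=0.3):
        # Pairwise Euclidean distances, shape (num_query, num_gallery).
        dist = torch.cdist(query_emb, gallery_emb)
        # Positive mask: True where query and gallery share a person ID.
        same_id = query_ids.unsqueeze(1) == gallery_ids.unsqueeze(0)
        # Hardest positive: farthest same-ID sample.
        d_pos = dist.masked_fill(~same_id, float('-inf')).max(dim=1).values
        # Hardest negative: closest different-ID sample.
        d_neg = dist.masked_fill(same_id, float('inf')).min(dim=1).values
        return F.relu(d_pos - d_neg + margin).mean()

    # Hypothetical usage, mirroring the abstract's two meta metrics:
    # one cross-modal term (attribute embeddings against image embeddings)
    # plus one intra-modal term (image embeddings against themselves).
    # loss = batch_hard_triplet(attr_emb, ids, img_emb, ids) + \
    #        batch_hard_triplet(img_emb, ids, img_emb, ids)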

Bibliographic Details
Main Authors: Xianri Xu, Rongxian Xu
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Transformer; meta learning; cross-modal; metric learning; person retrieval
Online Access: https://ieeexplore.ieee.org/document/10777022/
_version_ 1850126542482440192
author Xianri Xu
Rongxian Xu
author_facet Xianri Xu
Rongxian Xu
author_sort Xianri Xu
collection DOAJ
description Attribute-image person re-identification (AIPR) is a meaningful and challenging task that retrieves person images based on attribute descriptions. In this paper, we propose a regularized dual modal meta metric learning (RDM3L) method for AIPR, which employs meta-learning training to enhance the transformer’s capacity to acquire latent knowledge. During training, the data are first divided into a single-modal support set containing images and a dual-modal query set containing both attributes and images. The RDM3L method introduces an attribute-image transformer (AIT) as a novel feature-extraction backbone, extending the vision transformer concept. Drawing on hard sample mining, the method designs attribute-image cross-modal meta metrics and image-image intra-modal meta metrics. A triplet loss function based on these meta metrics is then applied to pull samples of the same category together and push different categories apart, thereby enhancing cross-modal and intra-modal discrimination. Finally, a regularization term aggregates samples of different modalities in the query set to prevent overfitting, ensuring that RDM3L preserves the model’s generalization ability while aligning the two modalities and identifying unseen classes. Experimental results on the PETA and Market-1501 Attributes datasets demonstrate the superiority of the RDM3L method, which achieves mean average precision (mAP) scores of 36.7% on the Market-1501 Attributes dataset and 60.6% on the PETA dataset.
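The description mentions an episodic split into a single-modal support set of images and a dual-modal query set of attribute-image pairs. A minimal sketch of such an episode sampler follows, assuming the dataset yields (image, attribute_vector, person_id) tuples; all names and episode sizes are illustrative, not taken from the paper:

    import random
    from collections import defaultdict

    def sample_episode(dataset, n_ids=8, k_support=4, k_query=2):
        # Group samples by person identity.
        by_id = defaultdict(list)
        for image, attrs, pid in dataset:
            by_id[pid].append((image, attrs))

        # Keep only identities with enough samples for one episode.
        eligible = [pid for pid, s in by_id.items() if len(s) >= k_support + k_query]

        support, query = [], []
        for pid in random.sample(eligible, n_ids):
            picks = random.sample(by_id[pid], k_support + k_query)
            # Support set: images only (single modal).
            support += [(img, pid) for img, _ in picks[:k_support]]
            # Query set: both attributes and images (dual modal).
            query += [(img, attrs, pid) for img, attrs in picks[k_support:]]
        return support, query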
format Article
id doaj-art-799a73f17ae04e53b98e1a63b46ae8e2
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-799a73f17ae04e53b98e1a63b46ae8e2
 updated: 2025-08-20T02:33:54Z
 language: eng
 publisher: IEEE
 series: IEEE Access (ISSN 2169-3536)
 published: 2024-01-01, Vol. 12, pp. 183344-183353
 doi: 10.1109/ACCESS.2024.3511034 (IEEE document 10777022)
 title: Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification
 authors: Xianri Xu (https://orcid.org/0009-0000-0367-8077), College of Information Engineering, Fujian Business University, Fuzhou, China; Rongxian Xu (https://orcid.org/0009-0001-6670-8242), College of Engineering, Huaqiao University, Quanzhou, China
 abstract: as given in the description field above
 url: https://ieeexplore.ieee.org/document/10777022/
 keywords: Transformer; meta learning; cross-modal; metric learning; person retrieval
spellingShingle Xianri Xu
Rongxian Xu
Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification
IEEE Access
Transformer
meta learning
cross-modal
metric learning
person retrieval
title Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification
title_full Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification
title_fullStr Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification
title_full_unstemmed Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification
title_short Transformer With Regularized Dual Modal Meta Metric Learning for Attribute-Image Person Re-Identification
title_sort transformer with regularized dual modal meta metric learning for attribute image person re identification
topic Transformer
meta learning
cross-modal
metric learning
person retrieval
url https://ieeexplore.ieee.org/document/10777022/
work_keys_str_mv AT xianrixu transformerwithregularizeddualmodalmetametriclearningforattributeimagepersonreidentification
AT rongxianxu transformerwithregularizeddualmodalmetametriclearningforattributeimagepersonreidentification