LViT-Net: a domain generalization person re-identification model combining local semantics and multi-feature cross fusion

Bibliographic Details
Main Authors: Xintong Hu, Peishun Liu, Xuefang Wang, Peiyao Wu, Ruichun Tang
Format: Article
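Language: English
Published: SpringerOpen 2025-04-01
Series: Visual Computing for Industry, Biomedicine, and Art
Subjects: Domain generalization; Person re-identification; Feature fusion; Semantic representation; Dual-branch network architecture
Online Access: https://doi.org/10.1186/s42492-025-00190-1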
_version_ 1850181854352637952
author Xintong Hu
Peishun Liu
Xuefang Wang
Peiyao Wu
Ruichun Tang
collection DOAJ
description Abstract: In the task of domain generalization person re-identification (ReID), pedestrian image features exhibit significant intra-class variability and inter-class similarity. Existing methods rely on a single feature extraction architecture and struggle to capture both global context and local spatial information, resulting in weaker generalization to unseen domains. To address this issue, an innovative domain generalization person ReID method, LViT-Net, which combines local semantics and multi-feature cross fusion, is proposed. LViT-Net adopts a dual-branch encoder with a parallel hierarchical structure to extract both local and global discriminative features. In the local branch, a local multi-scale feature fusion module fuses local feature units at different scales so that fine-grained local features at various levels are accurately captured, thereby enhancing the robustness of the features. In the global branch, a dual feature cross fusion module fuses local features and global semantic information, focusing on critical semantic information and enabling the mutual refinement and matching of local and global features. This allows the model to achieve a dynamic balance between detailed and holistic information, forming robust feature representations of pedestrians. Extensive experiments demonstrate the effectiveness of LViT-Net. In both single-source and multi-source comparison experiments, the proposed method outperforms existing state-of-the-art methods.
format Article
id doaj-art-082c2f3766e24bffba08808d4efec489
institution OA Journals
issn 2524-4442
language English
publishDate 2025-04-01
publisher SpringerOpen
record_format Article
series Visual Computing for Industry, Biomedicine, and Art
spelling doaj-art-082c2f3766e24bffba08808d4efec489 2025-08-20T02:17:49Z
eng
SpringerOpen
Visual Computing for Industry, Biomedicine, and Art
2524-4442
2025-04-01
vol. 8, no. 1, pp. 1-15
10.1186/s42492-025-00190-1
Xintong Hu (Faculty of Information Science and Engineering, Ocean University of China)
Peishun Liu (Faculty of Information Science and Engineering, Ocean University of China)
Xuefang Wang (School of Mathematical Sciences, Ocean University of China)
Peiyao Wu (Faculty of Information Science and Engineering, Ocean University of China)
Ruichun Tang (Faculty of Information Science and Engineering, Ocean University of China)
title LViT-Net: a domain generalization person re-identification model combining local semantics and multi-feature cross fusion
topic Domain generalization
Person re-identification
Feature fusion
Semantic representation
Dual-branch network architecture
url https://doi.org/10.1186/s42492-025-00190-1