CriSALAD: Robust Visual Place Recognition Using Cross-Image Information and Optimal Transport Aggregation

Bibliographic Details
Main Authors: Jinyi Xu, Yuhang Ming, Minyang Xu, Yaqi Fan, Yuan Zhang, Wanzeng Kong
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Subjects: visual place recognition; visual localization; transfer learning; feature aggregation
Online Access: https://www.mdpi.com/2076-3417/15/10/5287
_version_ 1849327371806572544
author Jinyi Xu
Yuhang Ming
Minyang Xu
Yaqi Fan
Yuan Zhang
Wanzeng Kong
author_sort Jinyi Xu
collection DOAJ
description Visual place recognition (VPR) is crucial for enabling autonomous agents to accurately localize themselves within a known environment. While existing methods leverage neural networks to enhance performance and robustness, they often suffer from the limited representation power of local feature extractors. To address this limitation, we propose CriSALAD, a novel VPR model that integrates visual foundation models (VFMs) and cross-image information to improve feature extraction robustness. Specifically, we adapt pre-trained VFMs for VPR by incorporating a parameter-efficient adapter inspired by Xception, ensuring effective task adaptation while preserving computational efficiency. Additionally, we employ the Sinkhorn Algorithm for Locally Aggregated Descriptors (SALAD) as a global descriptor to enhance place recognition accuracy. Furthermore, we introduce a transformer-like cross-image encoder that facilitates information sharing between neighboring images, thus enhancing feature representations. We evaluate CriSALAD on multiple publicly available place recognition datasets, achieving promising performance with a recall@1 of 89.3% on the Nordland dataset, while the closest rival achieves only 76.2%. CriSALAD outperforms both baseline models and advanced VFM-based VPR approaches.
format Article
id doaj-art-481dbca5cc9f4eff99dd431348c21b21
institution Kabale University
issn 2076-3417
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-481dbca5cc9f4eff99dd431348c21b21
2025-08-20T03:47:53Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-05-01
volume 15, issue 10, article 5287
DOI 10.3390/app15105287
CriSALAD: Robust Visual Place Recognition Using Cross-Image Information and Optimal Transport Aggregation
Jinyi Xu (0)
Yuhang Ming (1)
Minyang Xu (2)
Yaqi Fan (3)
Yuan Zhang (4)
Wanzeng Kong (5)
School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China (affiliation of authors 0-5)
https://www.mdpi.com/2076-3417/15/10/5287
visual place recognition
visual localization
transfer learning
feature aggregation
title CriSALAD: Robust Visual Place Recognition Using Cross-Image Information and Optimal Transport Aggregation
title_sort crisalad robust visual place recognition using cross image information and optimal transport aggregation
topic visual place recognition
visual localization
transfer learning
feature aggregation
url https://www.mdpi.com/2076-3417/15/10/5287