CriSALAD: Robust Visual Place Recognition Using Cross-Image Information and Optimal Transport Aggregation

Bibliographic Details
Main Authors:	Jinyi Xu, Yuhang Ming, Minyang Xu, Yaqi Fan, Yuan Zhang, Wanzeng Kong
Format:	Article
Language:	English
Published:	MDPI AG 2025-05-01
Series:	Applied Sciences
Subjects:	visual place recognition; visual localization; transfer learning; feature aggregation
Online Access:	https://www.mdpi.com/2076-3417/15/10/5287

Description
Summary:	Visual place recognition (VPR) is crucial for enabling autonomous agents to accurately localize themselves within a known environment. While existing methods leverage neural networks to enhance performance and robustness, they often suffer from the limited representation power of local feature extractors. To address this limitation, we propose CriSALAD, a novel VPR model that integrates visual foundation models (VFMs) and cross-image information to improve feature extraction robustness. Specifically, we adapt pre-trained VFMs for VPR by incorporating a parameter-efficient adapter inspired by Xception, ensuring effective task adaptation while preserving computational efficiency. Additionally, we employ the Sinkhorn Algorithm for Locally Aggregated Descriptors (SALAD) as a global descriptor to enhance place recognition accuracy. Furthermore, we introduce a transformer-like cross-image encoder that facilitates information sharing between neighboring images, thus enhancing feature representations. We evaluate CriSALAD on multiple publicly available place recognition datasets, achieving promising performance with a recall@1 of 89.3% on the Nordland dataset, while the closest rival achieves only 76.2%. CriSALAD outperforms both baseline models and advanced VFM-based VPR approaches.
ISSN:	2076-3417
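
The summary reports results as recall@1, that is, the fraction of queries whose single best-ranked database image is a correct match. As a minimal sketch of how such a metric can be computed, the snippet below ranks database descriptors by cosine similarity and checks the top k against a list of known correct matches. The function names, the toy two-dimensional descriptors, and the ground-truth lists are all hypothetical illustrations, not the evaluation code or protocol used in the article.

```python
import math


def cosine(a, b):
    """Cosine similarity between two descriptor vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(query_desc, db_desc, positives, k=1):
    """Fraction of queries with at least one true match among the top-k results.

    positives[i] lists the database indices counted as correct for query i
    (an assumed ground-truth format chosen for this illustration).
    """
    hits = 0
    for q, pos in zip(query_desc, positives):
        # Rank database indices from most to least similar to the query.
        ranked = sorted(
            range(len(db_desc)),
            key=lambda j: cosine(q, db_desc[j]),
            reverse=True,
        )
        if any(j in pos for j in ranked[:k]):
            hits += 1
    return hits / len(query_desc)


# Toy data: three database descriptors and two queries (hypothetical values).
db = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9]]
positives = [[0], [2]]

r1 = recall_at_k(queries, db, positives, k=1)
r2 = recall_at_k(queries, db, positives, k=2)
print(r1, r2)
```

In this toy case the first query retrieves its correct match at rank 1, while the second finds its match only at rank 2, so recall@1 is lower than recall@2.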

Similar Items