CriSALAD: Robust Visual Place Recognition Using Cross-Image Information and Optimal Transport Aggregation
Visual place recognition (VPR) is crucial for enabling autonomous agents to accurately localize themselves within a known environment. While existing methods leverage neural networks to enhance performance and robustness, they often suffer from the limited representation power of local feature extra...
| Main Authors: | Jinyi Xu, Yuhang Ming, Minyang Xu, Yaqi Fan, Yuan Zhang, Wanzeng Kong |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | visual place recognition, visual localization, transfer learning, feature aggregation |
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5287 |
| _version_ | 1849327371806572544 |
|---|---|
| author | Jinyi Xu, Yuhang Ming, Minyang Xu, Yaqi Fan, Yuan Zhang, Wanzeng Kong |
| author_sort | Jinyi Xu |
| collection | DOAJ |
| description | Visual place recognition (VPR) is crucial for enabling autonomous agents to accurately localize themselves within a known environment. While existing methods leverage neural networks to enhance performance and robustness, they often suffer from the limited representation power of local feature extractors. To address this limitation, we propose CriSALAD, a novel VPR model that integrates visual foundation models (VFMs) and cross-image information to improve feature extraction robustness. Specifically, we adapt pre-trained VFMs for VPR by incorporating a parameter-efficient adapter inspired by Xception, ensuring effective task adaptation while preserving computational efficiency. Additionally, we employ the Sinkhorn Algorithm for Locally Aggregated Descriptors (SALAD) as a global descriptor to enhance place recognition accuracy. Furthermore, we introduce a transformer-like cross-image encoder that facilitates information sharing between neighboring images, thus enhancing feature representations. We evaluate CriSALAD on multiple publicly available place recognition datasets, achieving promising performance with a recall@1 of 89.3% on the Nordland dataset, while the closest rival achieves only 76.2%. CriSALAD outperforms both baseline models and advanced VFM-based VPR approaches. |
| format | Article |
| id | doaj-art-481dbca5cc9f4eff99dd431348c21b21 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-481dbca5cc9f4eff99dd431348c21b21; 2025-08-20T03:47:53Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-05-01; vol. 15, no. 10, art. 5287; DOI 10.3390/app15105287; CriSALAD: Robust Visual Place Recognition Using Cross-Image Information and Optimal Transport Aggregation; Jinyi Xu, Yuhang Ming, Minyang Xu, Yaqi Fan, Yuan Zhang, Wanzeng Kong (all: School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China); https://www.mdpi.com/2076-3417/15/10/5287; keywords: visual place recognition, visual localization, transfer learning, feature aggregation |
| title | CriSALAD: Robust Visual Place Recognition Using Cross-Image Information and Optimal Transport Aggregation |
| topic | visual place recognition, visual localization, transfer learning, feature aggregation |
| url | https://www.mdpi.com/2076-3417/15/10/5287 |
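
The description above reports results as recall@1, the fraction of queries whose single best database match is a correct place. The following is a minimal sketch of how such a metric is commonly computed, not the authors' evaluation code. The function names `l2_normalize` and `recall_at_k`, the toy descriptors, and the ground-truth sets are all assumptions made for illustration, and cosine similarity is assumed as the ranking score.

```python
from math import sqrt


def l2_normalize(v):
    """Scale a descriptor to unit length (left unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def recall_at_k(query_descs, db_descs, ground_truth, k=1):
    """Fraction of queries with at least one correct match in the top-k results.

    ground_truth[i] is the set of database indices that count as the same
    place as query i (a hypothetical format chosen for this sketch).
    """
    db = [l2_normalize(d) for d in db_descs]
    hits = 0
    for qi, q in enumerate(query_descs):
        qn = l2_normalize(q)
        # Cosine similarity between the query and every database descriptor.
        sims = [sum(a * b for a, b in zip(qn, d)) for d in db]
        # Indices of the k most similar database descriptors.
        top_k = sorted(range(len(db)), key=lambda j: sims[j], reverse=True)[:k]
        if any(j in ground_truth[qi] for j in top_k):
            hits += 1
    return hits / len(query_descs)


# Toy data: three 2-D global descriptors per set (illustrative values only).
queries = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.8]]
database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
truth = [{0}, {1}, {0}]

r1 = recall_at_k(queries, database, truth, k=1)
r2 = recall_at_k(queries, database, truth, k=2)
print(f"recall@1 = {r1:.3f}, recall@2 = {r2:.3f}")
```

In this toy setup the third query ranks database item 2 first, so it counts as a miss at k=1 but as a hit at k=2, where its correct match (item 0) is ranked second.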