Self-Supervised Foundation Model for Template Matching
Finding a template location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when insufficient training data are available or big variations...
| Main Authors: | Anton Hristov, Dimo Dimov, Maria Nisheva-Pavlova |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | Big Data and Cognitive Computing |
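| Subjects: | self-supervised learning, template matching, foundation model, convolutional neural network, image matching |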
| Online Access: | https://www.mdpi.com/2504-2289/9/2/38 |
| _version_ | 1849719583654543360 |
|---|---|
| author | Anton Hristov Dimo Dimov Maria Nisheva-Pavlova |
| author_facet | Anton Hristov Dimo Dimov Maria Nisheva-Pavlova |
| author_sort | Anton Hristov |
| collection | DOAJ |
| description | Finding a template location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when insufficient training data are available or when the images exhibit large variations in texture, different modalities, or weak visual features, which limits their application to real-world tasks. We introduce Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning for template matching. The idea behind Self-TM is to learn hierarchical features incorporating localization properties from images without any annotations. Deeper in a convolutional neural network (CNN), the filters begin to react to more complex structures and their receptive fields increase, so localization information is lost relative to the early layers. Hierarchically propagating the last layers back to the first layer results in precise template localization. Due to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model. |
| format | Article |
| id | doaj-art-3bf0103f0f3e4e4ab9af8f2856799723 |
| institution | DOAJ |
| issn | 2504-2289 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Big Data and Cognitive Computing |
| spelling | doaj-art-3bf0103f0f3e4e4ab9af8f2856799723; 2025-08-20T03:12:08Z; eng; MDPI AG; Big Data and Cognitive Computing; 2504-2289; 2025-02-01; 9; 2; 38; 10.3390/bdcc9020038; Self-Supervised Foundation Model for Template Matching; Anton Hristov (0); Dimo Dimov (1); Maria Nisheva-Pavlova (2); (0) Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria; (1) Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev Str., Block 2, 1113 Sofia, Bulgaria; (2) Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria; Finding a template location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when insufficient training data are available or when the images exhibit large variations in texture, different modalities, or weak visual features, which limits their application to real-world tasks. We introduce Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning for template matching. The idea behind Self-TM is to learn hierarchical features incorporating localization properties from images without any annotations. Deeper in a convolutional neural network (CNN), the filters begin to react to more complex structures and their receptive fields increase, so localization information is lost relative to the early layers. Hierarchically propagating the last layers back to the first layer results in precise template localization. Due to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model.; https://www.mdpi.com/2504-2289/9/2/38; self-supervised learning; template matching; foundation model; convolutional neural network; image matching |
| spellingShingle | Anton Hristov Dimo Dimov Maria Nisheva-Pavlova Self-Supervised Foundation Model for Template Matching Big Data and Cognitive Computing self-supervised learning template matching foundation model convolutional neural network image matching |
| title | Self-Supervised Foundation Model for Template Matching |
| title_full | Self-Supervised Foundation Model for Template Matching |
| title_fullStr | Self-Supervised Foundation Model for Template Matching |
| title_full_unstemmed | Self-Supervised Foundation Model for Template Matching |
| title_short | Self-Supervised Foundation Model for Template Matching |
| title_sort | self supervised foundation model for template matching |
| topic | self-supervised learning template matching foundation model convolutional neural network image matching |
| url | https://www.mdpi.com/2504-2289/9/2/38 |
| work_keys_str_mv | AT antonhristov selfsupervisedfoundationmodelfortemplatematching AT dimodimov selfsupervisedfoundationmodelfortemplatematching AT marianishevapavlova selfsupervisedfoundationmodelfortemplatematching |