Self-Supervised Foundation Model for Template Matching

Bibliographic Details
Main Authors: Anton Hristov, Dimo Dimov, Maria Nisheva-Pavlova
Format: Article
Language: English
Published: MDPI AG, 2025-02-01
Series: Big Data and Cognitive Computing
Subjects: self-supervised learning; template matching; foundation model; convolutional neural network; image matching
Online Access: https://www.mdpi.com/2504-2289/9/2/38
ISSN: 2504-2289
DOI: 10.3390/bdcc9020038
Volume/Issue: Vol. 9, No. 2, Article 38
Author Affiliations: Anton Hristov and Maria Nisheva-Pavlova, Faculty of Mathematics and Informatics, Sofia University “St. Kliment Ohridski”, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria; Dimo Dimov, Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev Str., Block 2, 1113 Sofia, Bulgaria
Collection: DOAJ
Record ID: doaj-art-3bf0103f0f3e4e4ab9af8f2856799723

Description
Finding the location of a template in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when training data are insufficient or when the images exhibit large texture variations, different modalities, or weak visual features, which limits their applicability to real-world tasks. We introduce the Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning for template matching. The idea behind Self-TM is to learn hierarchical features with localization properties from images without any annotations. As we go deeper into the layers of a convolutional neural network (CNN), the filters respond to increasingly complex structures and their receptive fields grow, so localization information is lost relative to the early layers. Hierarchically propagating the responses of the last layers back to the first layer recovers precise template localization. Owing to its zero-shot generalization on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be regarded as a foundation model.
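
The mechanism outlined in the description, matching with deep but coarsely localized features and then propagating the match back through earlier, higher-resolution layers, can be illustrated with a short coarse-to-fine example. The following is a minimal sketch, not the authors' Self-TM implementation: the ResNet-18 backbone, the channel-normalized correlation score, the three pyramid levels, and the search radius are all assumptions made only for illustration.

```python
# Illustrative coarse-to-fine template localization over a CNN feature
# hierarchy. NOT the authors' Self-TM code: backbone, score, and radius
# are assumptions made for this example.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

net = resnet18(weights=None).eval()
stages = [  # shallow -> deep; strides 4, 8, 16 w.r.t. the input image
    torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool, net.layer1),
    net.layer2,
    net.layer3,
]

@torch.no_grad()
def pyramid(x):
    """Run the image through the stages and keep every intermediate feature map."""
    feats = []
    for stage in stages:
        x = stage(x)
        feats.append(x)
    return feats  # shallow -> deep

def correlate(query_feat, template_feat):
    """Channel-normalized cross-correlation: slide the template features over the query features."""
    q = F.normalize(query_feat, dim=1)
    t = F.normalize(template_feat, dim=1)
    return F.conv2d(q, t)[0, 0]  # 2-D similarity map

@torch.no_grad()
def coarse_to_fine_match(query, template, radius=2):
    """Locate the template in the query, refining the match from deep to shallow levels."""
    q_feats, t_feats = pyramid(query), pyramid(template)
    loc = None
    for qf, tf in reversed(list(zip(q_feats, t_feats))):  # deep -> shallow
        sim = correlate(qf, tf)
        if loc is not None:
            # Keep only scores near the position propagated from the coarser
            # level (each level here is 2x finer than the previous one).
            r0 = min(loc[0] * 2, sim.shape[0] - 1)
            c0 = min(loc[1] * 2, sim.shape[1] - 1)
            window = torch.full_like(sim, float("-inf"))
            rs = slice(max(r0 - radius, 0), r0 + radius + 1)
            cs = slice(max(c0 - radius, 0), c0 + radius + 1)
            window[rs, cs] = sim[rs, cs]
            sim = window
        flat = int(torch.argmax(sim))
        loc = (flat // sim.shape[1], flat % sim.shape[1])
    return loc  # (row, col) in the finest feature map (stride 4 here)

# Example with random stand-in images: a 96x96 template searched in a 256x256 query.
query = torch.randn(1, 3, 256, 256)
template = torch.randn(1, 3, 96, 96)
row, col = coarse_to_fine_match(query, template)
print("approximate top-left match position in pixels:", (row * 4, col * 4))
```

Note that this sketch only shows the coarse-to-fine propagation step with off-the-shelf, untrained features; in Self-TM the hierarchical features are themselves learned without annotations so that they retain localization properties.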