Attention Score Enhancement Model Through Pairwise Image Comparison

Bibliographic Details
Main Authors: Yeong Seok Ju, Zong Woo Geem, Joon Shik Lim
Format: Article
Language: English
Published: MDPI AG, 2024-10-01
Series: Applied Sciences
Subjects: vision transformer; clock drawing test; attention mechanism; dementia classification; image processing
Online Access: https://www.mdpi.com/2076-3417/14/21/9928
_version_ 1850197281488240640
author Yeong Seok Ju
Zong Woo Geem
Joon Shik Lim
author_facet Yeong Seok Ju
Zong Woo Geem
Joon Shik Lim
author_sort Yeong Seok Ju
collection DOAJ
description This study proposes the Pairwise Attention Enhancement (PAE) model to address the limitations of the Vision Transformer (ViT). While the ViT effectively models global relationships between image patches, it encounters challenges in medical image analysis, where fine-grained local features are crucial. Although the ViT excels at capturing global interactions within the entire image, it may underperform due to its inadequate representation of local features such as color, texture, and edges. The proposed PAE model enhances local features by calculating cosine similarity between the attention maps of training and reference images and integrating the attention maps in regions with high similarity. This approach complements the ViT’s global capture capability, allowing subtle visual differences to be reflected more accurately. Experiments on Clock Drawing Test data demonstrated that the PAE model achieved a precision of 0.9383, a recall of 0.8916, an F1-score of 0.9133, and an accuracy of 92.69%, a 12% improvement over API-Net and a 1% improvement over the ViT. These results suggest that, by overcoming the ViT’s limitations, the PAE model can improve performance in computer vision tasks where local features are crucial.
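The pairing mechanism described above can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the description only says that cosine similarity is computed between the attention maps of a training image and a reference image and that attention maps are integrated where similarity is high, so the row-wise pairing, the similarity threshold, the blending weight `alpha`, and the renormalization step below are all choices made for the example.

```python
import torch
import torch.nn.functional as F

def pairwise_attention_enhancement(attn_train, attn_ref, threshold=0.8, alpha=0.5):
    """Illustrative sketch of the pairing step described in the abstract.

    attn_train, attn_ref: attention maps of shape (num_patches, num_patches)
    for a training image and a reference image. Rows whose attention patterns
    are highly similar across the pair are blended; `threshold` and `alpha`
    are assumed hyperparameters, not values taken from the paper.
    """
    # Cosine similarity between corresponding patch-attention rows.
    sim = F.cosine_similarity(attn_train, attn_ref, dim=-1)      # (num_patches,)

    # Blend reference attention into the training map only where similarity is high.
    mask = (sim > threshold).float().unsqueeze(-1)                # (num_patches, 1)
    enhanced = attn_train + alpha * mask * attn_ref

    # Renormalize so each row remains a valid attention distribution.
    return enhanced / enhanced.sum(dim=-1, keepdim=True)

# Example with random maps standing in for ViT attention (196 patches + CLS token).
a_train = torch.softmax(torch.randn(197, 197), dim=-1)
a_ref = torch.softmax(torch.randn(197, 197), dim=-1)
print(pairwise_attention_enhancement(a_train, a_ref).shape)      # torch.Size([197, 197])
```

How the enhanced regions are selected in practice and how the modified maps are fed back into the ViT are design details of the PAE model; the article at the URL below should be consulted for the authors' actual formulation.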
format Article
id doaj-art-94b3504b7f6047caa9a6f5de14533eaa
institution OA Journals
issn 2076-3417
language English
publishDate 2024-10-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-94b3504b7f6047caa9a6f5de14533eaa (2025-08-20T02:13:12Z)
Language: eng
Publisher: MDPI AG
Journal: Applied Sciences, ISSN 2076-3417, 2024-10-01, vol. 14, no. 21, article 9928
DOI: 10.3390/app14219928
Title: Attention Score Enhancement Model Through Pairwise Image Comparison
Authors: Yeong Seok Ju (Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea); Zong Woo Geem (Department of Smart City, Gachon University, Seongnam 13120, Republic of Korea); Joon Shik Lim (Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea)
URL: https://www.mdpi.com/2076-3417/14/21/9928
Keywords: vision transformer; clock drawing test; attention mechanism; dementia classification; image processing
spellingShingle Yeong Seok Ju
Zong Woo Geem
Joon Shik Lim
Attention Score Enhancement Model Through Pairwise Image Comparison
Applied Sciences
vision transformer
clock drawing test
attention mechanism
dementia classification
image processing
title Attention Score Enhancement Model Through Pairwise Image Comparison
title_full Attention Score Enhancement Model Through Pairwise Image Comparison
title_fullStr Attention Score Enhancement Model Through Pairwise Image Comparison
title_full_unstemmed Attention Score Enhancement Model Through Pairwise Image Comparison
title_short Attention Score Enhancement Model Through Pairwise Image Comparison
title_sort attention score enhancement model through pairwise image comparison
topic vision transformer
clock drawing test
attention mechanism
dementia classification
image processing
url https://www.mdpi.com/2076-3417/14/21/9928
work_keys_str_mv AT yeongseokju attentionscoreenhancementmodelthroughpairwiseimagecomparison
AT zongwoogeem attentionscoreenhancementmodelthroughpairwiseimagecomparison
AT joonshiklim attentionscoreenhancementmodelthroughpairwiseimagecomparison