Attention Score Enhancement Model Through Pairwise Image Comparison
This study proposes the Pairwise Attention Enhancement (PAE) model to address the limitations of the Vision Transformer (ViT). While the ViT effectively models global relationships between image patches, it encounters challenges in medical image analysis where fine-grained local features are crucial...
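The full description in this record explains that PAE computes the cosine similarity between the attention maps of a training image and a reference image, then integrates the maps in regions where the similarity is high. Below is a minimal, hypothetical PyTorch sketch of that step; the function name, the thresholded blending rule, and the parameters `threshold` and `alpha` are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_attention_enhancement(attn_train: torch.Tensor,
                                   attn_ref: torch.Tensor,
                                   threshold: float = 0.8,
                                   alpha: float = 0.5) -> torch.Tensor:
    """Sketch of the PAE step described in the abstract (assumptions noted).

    attn_train, attn_ref: (num_patches, num_patches) ViT attention maps for a
    training image and a reference image. `threshold` and `alpha` are assumed
    hyperparameters, not values taken from the paper.
    """
    # Cosine similarity between corresponding attention rows of the two maps,
    # yielding one agreement score per patch.
    sim = F.cosine_similarity(attn_train, attn_ref, dim=-1)  # (num_patches,)

    # Integrate the reference attention only where the two maps agree
    # strongly, reinforcing shared local features.
    mask = (sim > threshold).float().unsqueeze(-1)           # (num_patches, 1)
    enhanced = attn_train + alpha * mask * attn_ref

    # Renormalize each row so it remains a valid attention distribution.
    return enhanced / enhanced.sum(dim=-1, keepdim=True)
```

Under this reading, patches whose attention patterns closely match those of a same-class reference image receive an additive boost from the reference map, sharpening local cues (e.g., clock hands and digits in a Clock Drawing Test drawing) that plain ViT attention may smooth over.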
| Main Authors: | Yeong Seok Ju, Zong Woo Geem, Joon Shik Lim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-10-01 |
| Series: | Applied Sciences |
| Subjects: | vision transformer; clock drawing test; attention mechanism; dementia classification; image processing |
| Online Access: | https://www.mdpi.com/2076-3417/14/21/9928 |
| _version_ | 1850197281488240640 |
|---|---|
| author | Yeong Seok Ju; Zong Woo Geem; Joon Shik Lim |
| author_facet | Yeong Seok Ju; Zong Woo Geem; Joon Shik Lim |
| author_sort | Yeong Seok Ju |
| collection | DOAJ |
| description | This study proposes the Pairwise Attention Enhancement (PAE) model to address the limitations of the Vision Transformer (ViT). While the ViT effectively models global relationships between image patches, it encounters challenges in medical image analysis where fine-grained local features are crucial. Although it excels at capturing global interactions within the entire image, the ViT may underperform because it inadequately represents local features such as color, texture, and edges. The proposed PAE model enhances local features by computing the cosine similarity between the attention maps of training and reference images and integrating the attention maps in regions with high similarity. This approach complements the ViT’s global capture capability, allowing subtle visual differences to be reflected more accurately. Experiments on Clock Drawing Test data showed that the PAE model achieved a precision of 0.9383, a recall of 0.8916, an F1-score of 0.9133, and an accuracy of 92.69%, a 12% improvement over API-Net and a 1% improvement over the ViT. These results suggest that the PAE model can improve performance in computer vision fields where local features are crucial by overcoming the limitations of the ViT. |
| format | Article |
| id | doaj-art-94b3504b7f6047caa9a6f5de14533eaa |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-94b3504b7f6047caa9a6f5de14533eaa; 2025-08-20T02:13:12Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-10-01; vol. 14, iss. 21, art. 9928; doi:10.3390/app14219928; Attention Score Enhancement Model Through Pairwise Image Comparison; Yeong Seok Ju (Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea); Zong Woo Geem (Department of Smart City, Gachon University, Seongnam 13120, Republic of Korea); Joon Shik Lim (Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea); https://www.mdpi.com/2076-3417/14/21/9928; vision transformer; clock drawing test; attention mechanism; dementia classification; image processing |
| spellingShingle | Yeong Seok Ju; Zong Woo Geem; Joon Shik Lim; Attention Score Enhancement Model Through Pairwise Image Comparison; Applied Sciences; vision transformer; clock drawing test; attention mechanism; dementia classification; image processing |
| title | Attention Score Enhancement Model Through Pairwise Image Comparison |
| title_full | Attention Score Enhancement Model Through Pairwise Image Comparison |
| title_fullStr | Attention Score Enhancement Model Through Pairwise Image Comparison |
| title_full_unstemmed | Attention Score Enhancement Model Through Pairwise Image Comparison |
| title_short | Attention Score Enhancement Model Through Pairwise Image Comparison |
| title_sort | attention score enhancement model through pairwise image comparison |
| topic | vision transformer; clock drawing test; attention mechanism; dementia classification; image processing |
| url | https://www.mdpi.com/2076-3417/14/21/9928 |
| work_keys_str_mv | AT yeongseokju attentionscoreenhancementmodelthroughpairwiseimagecomparison AT zongwoogeem attentionscoreenhancementmodelthroughpairwiseimagecomparison AT joonshiklim attentionscoreenhancementmodelthroughpairwiseimagecomparison |