Joint Detection and Removal of Specular Highlights using Vision Transformer with Multi-scale Patch Attention

Bibliographic Details
Main Author: Levent Karacan
Format: Article
Language: English
Published: Sakarya University 2025-03-01
Series: Sakarya University Journal of Computer and Information Sciences
Online Access: https://dergipark.org.tr/en/download/article-file/4077288
Description
Summary: Specular highlights play a pivotal role in scene understanding in visual environments. Nevertheless, their presence can degrade the performance of solutions to various computer vision tasks. Current methods typically use Convolutional Neural Network (CNN)-based U-Net architectures for specular highlight detection. However, although CNNs excel at analyzing local context, they are limited in capturing global contextual information. To exploit global context, this work proposes a novel network architecture that leverages Vision Transformers (ViTs) to jointly detect and remove specular highlights in a given image. The model incorporates a multi-scale patch-based self-attention mechanism to capture global context, alongside a CNN-based feed-forward network for local contextual cues. Quantitative and qualitative experimental evaluations show that the proposed approach achieves state-of-the-art performance.
ISSN: 2636-8129
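The summary describes the architecture only at a high level. The NumPy sketch below illustrates one plausible reading of it and is not the author's implementation. Each image feature map is split into non-overlapping patches at several sizes, self-attention runs across patches so every region can attend to every other (global context), and a feed-forward sub-layer with a depthwise 3x3 convolution supplies local cues. The patch sizes (2, 4, 8), the averaging of per-scale outputs, the omission of layer normalization, and the hypothetical `joint_heads` function (a sigmoid highlight mask that gates a restored image against the input) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patchify(x, p):
    # (H, W, C) -> (num_patches, p*p*C); H and W must be divisible by p.
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape((h // p) * (w // p), p * p * c)

def unpatchify(tokens, p, h, w, c):
    # Inverse of patchify: (num_patches, p*p*C) -> (H, W, C).
    x = tokens.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

def patch_attention(x, p, wq, wk, wv):
    # Each token is one p x p patch, so every patch attends to all others.
    h, w, c = x.shape
    t = patchify(x, p)                                   # (N, d), d = p*p*C
    q, k, v = t @ wq, t @ wk, t @ wv                     # (N, d) each
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))       # (N, N)
    return unpatchify(attn @ v, p, h, w, c)

def multi_scale_patch_attention(x, weights):
    # ASSUMPTION: weights maps patch size p -> (wq, wk, wv); scales are averaged.
    outs = [patch_attention(x, p, *w) for p, w in weights.items()]
    return sum(outs) / len(outs)

def depthwise_conv3x3(x, kernel):
    # kernel: (3, 3, C); zero padding keeps the spatial size unchanged.
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + h, dx:dx + w, :] * kernel[dy, dx]
    return out

def conv_ffn(x, w_in, dw_kernel, w_out):
    # Pointwise expand -> depthwise 3x3 (local cues) -> ReLU -> pointwise project.
    hidden = np.maximum(depthwise_conv3x3(x @ w_in, dw_kernel), 0.0)
    return hidden @ w_out

def transformer_block(x, attn_weights, ffn_weights):
    # Residual connections around both sub-layers; normalization omitted.
    x = x + multi_scale_patch_attention(x, attn_weights)
    return x + conv_ffn(x, *ffn_weights)

def joint_heads(features, image, w_mask, w_rgb):
    # HYPOTHETICAL heads: a highlight mask and a highlight-free image.
    mask = 1.0 / (1.0 + np.exp(-(features @ w_mask)))    # (H, W, 1), in [0, 1]
    restored = features @ w_rgb                           # (H, W, 3)
    # Keep input pixels where no highlight is detected; replace the rest.
    return mask, mask * restored + (1.0 - mask) * image

# Usage with random weights (shapes only; nothing here is trained).
rng = np.random.default_rng(0)
H, W, C, HID = 16, 16, 8, 16
x = rng.standard_normal((H, W, C))
attn_weights = {}
for p in (2, 4, 8):
    d = p * p * C
    attn_weights[p] = tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
ffn_weights = (rng.standard_normal((C, HID)) / np.sqrt(C),
               rng.standard_normal((3, 3, HID)) / 3.0,
               rng.standard_normal((HID, C)) / np.sqrt(HID))
y = transformer_block(x, attn_weights, ffn_weights)
print(y.shape)  # (16, 16, 8)
```

Smaller patches give many fine-grained tokens while larger patches give few coarse ones, so averaging across scales mixes detail with image-wide context; the feature map's height and width must be divisible by every patch size used.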