Non-Homogeneous Image Dehazing Model Based on Vision Transformer Shunt Self-Attention Aggregation


Bibliographic Details
Main Authors: Zhigang Zhang, Byung-Won Min, Zijiao Zhang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11028993/
Description
Summary: This study introduces an innovative dehazing technique utilizing a Vision Transformer to mitigate the image quality degradation caused by non-homogeneous haze in real-world environments. Initially, the model employs convolutional layers to augment the channel dimensions of the input image, enabling the extraction of basic local features. Subsequently, we incorporate a Multi-Scale Channel-Pixel Joint Attention Module, which refines the attention on various haze-affected areas, ensuring accurate capture of the complex characteristics associated with non-homogeneous haze. The cornerstone of our approach is the proposed improved Vision Transformer, which integrates a Shunt Self-Attention Aggregation Module. This module, leveraging a multi-head parallel self-attention mechanism, facilitates the efficient fusion of features across multiple scales, thereby enhancing feature reuse and integration. We conducted extensive experimental evaluations on both real-world and synthetic non-homogeneous haze datasets, rigorously validating the robustness and effectiveness of our model. The results demonstrate that our approach surpasses existing benchmark techniques across various evaluation metrics, confirming its strong performance in image dehazing.
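The abstract does not give the module's exact formulation, but the general idea of shunt (multi-scale) self-attention can be illustrated with a minimal NumPy sketch. Here, assumed for illustration only: heads are split into groups, each group's keys and values are average-pooled at a different rate so that different heads attend over the feature map at different scales, and the per-scale outputs are concatenated (aggregated) at the end. All function names, the pooling rates, and the random projections are hypothetical, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool_tokens(x, h, w, rate):
    # x: (h*w, d) tokens on an h x w grid; average-pool the grid by `rate`.
    d = x.shape[-1]
    g = x.reshape(h, w, d)
    hh, ww = h // rate, w // rate
    g = g[:hh * rate, :ww * rate].reshape(hh, rate, ww, rate, d).mean(axis=(1, 3))
    return g.reshape(hh * ww, d)

def shunt_self_attention(x, h, w, num_heads=4, rates=(1, 2, 2, 4), seed=0):
    # x: (h*w, d) token features. Each head uses K/V pooled at its own rate,
    # so the heads see the haze map at several spatial scales in parallel;
    # concatenating the head outputs aggregates the multi-scale features.
    n, d = x.shape
    dh = d // num_heads
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((d, d)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    outs = []
    for i, r in enumerate(rates):
        qi = q[:, i * dh:(i + 1) * dh]                      # full-resolution queries
        ki = avg_pool_tokens(k[:, i * dh:(i + 1) * dh], h, w, r)  # coarsened keys
        vi = avg_pool_tokens(v[:, i * dh:(i + 1) * dh], h, w, r)  # coarsened values
        attn = softmax(qi @ ki.T / np.sqrt(dh))             # (n, n / r^2)
        outs.append(attn @ vi)                              # (n, dh)
    return np.concatenate(outs, axis=-1)                    # (n, d) aggregated output

# Example: 8x8 token grid with 32-dim features.
tokens = np.random.default_rng(1).standard_normal((64, 32))
out = shunt_self_attention(tokens, h=8, w=8)
print(out.shape)  # (64, 32)
```

The pooling keeps the attention cost for coarse heads low (a rate-4 head attends over 4 keys instead of 64) while the rate-1 head preserves fine detail, which is one plausible reading of how a shunt design fuses scales efficiently.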
ISSN:2169-3536