Hourglass attention for image super-resolution


Bibliographic Details
Main Authors: Ling Xu, Yian Huang, Xiaoping Lin
Format: Article
Language: English
Published: Elsevier 2025-08-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects:
Online Access: https://doi.org/10.1007/s44443-025-00214-z
Description
Summary: Single-image super-resolution (SISR) is an important research topic in computer vision. Its goal is to reconstruct high-resolution (HR) images with rich details from low-resolution (LR) inputs. Early methods based on convolutional neural networks (CNNs) made some progress, but their performance plateaued due to limited model capacity and expressiveness. Recently, Transformer-based methods have shown significant improvements in this field: their ability to capture long-range dependencies makes them well suited to image reconstruction. However, these models often demand substantial computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and offers one key insight: reconstruction performance depends on both low-level and high-level features. Building on this insight, we propose a novel SR model called HGFormer. The model uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by converting the spatial information of mid-level features into the channel dimension, improving both the efficiency and the effectiveness of the model. HGFormer is the first method to expand the self-attention window to 32 × 32 while keeping computational costs low, and it achieves a performance gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public datasets show that HGFormer outperforms existing methods in both objective metrics and visual quality.
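The record does not detail how DSIC folds spatial information into the channel dimension. A common operation of this kind is space-to-depth (pixel unshuffle), which trades spatial resolution for channel depth and shrinks the token count that self-attention must process; the sketch below assumes that interpretation and is illustrative only, not the paper's actual module.

```python
import numpy as np

def space_to_channel(x, r):
    """Fold non-overlapping r x r spatial blocks of x (H, W, C) into the
    channel dimension, returning (H//r, W//r, C*r*r).

    Hypothetical stand-in for DSIC-style compression: self-attention over
    N = H*W tokens costs O(N^2) pairwise interactions, so halving each
    spatial side (r = 2) cuts that pairwise cost by a factor of 16 while
    preserving the information in the extra channels.
    """
    H, W, C = x.shape
    assert H % r == 0 and W % r == 0, "spatial dims must be divisible by r"
    # Split each spatial axis into (blocks, within-block) indices.
    x = x.reshape(H // r, r, W // r, r, C)
    # Group the two within-block axes next to the channel axis.
    x = x.transpose(0, 2, 1, 3, 4)           # (H//r, W//r, r, r, C)
    # Flatten the block contents into channels.
    return x.reshape(H // r, W // r, C * r * r)

# A 4x4 feature map with 3 channels becomes a 2x2 map with 12 channels.
x = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
y = space_to_channel(x, 2)
print(y.shape)  # (2, 2, 12)
```

Under this reading, a 32 × 32 attention window applied after one such compression step covers the receptive field of a 64 × 64 window at full resolution while attending over a quarter as many tokens, which is one plausible way to widen the window without a quadratic blow-up in cost.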
ISSN:1319-1578
2213-1248