Hourglass attention for image super-resolution

Bibliographic Details
Main Authors: Ling Xu, Yian Huang, Xiaoping Lin
Format: Article
Language: English
Published: Elsevier, 2025-08-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Super-resolution; Transformer; Window size; Image reconstruction
Online Access: https://doi.org/10.1007/s44443-025-00214-z
_version_ 1849225815188832256
author Ling Xu
Yian Huang
Xiaoping Lin
author_facet Ling Xu
Yian Huang
Xiaoping Lin
author_sort Ling Xu
collection DOAJ
description Abstract: Single-image super-resolution (SISR) is an important research topic in computer vision; its goal is to reconstruct high-resolution (HR) images with rich detail from low-resolution (LR) inputs. Early CNN-based methods made progress, but their performance plateaued owing to limited model capacity and expressiveness. Recently, Transformer-based methods have shown significant improvements in this field: their ability to capture long-range dependencies makes them well suited to image reconstruction. However, these models often demand substantial computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and offers one key insight: reconstruction performance depends on both low-level and high-level features. We then propose a novel SR model, HGFormer. It uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by converting the spatial information of mid-level features into the channel dimension, improving both the efficiency and the effectiveness of the model. HGFormer is the first method to expand the self-attention window to 32 × 32 while keeping computational costs low, and it achieves a performance gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public datasets show that HGFormer outperforms existing methods in both objective metrics and visual quality.
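The abstract describes DSIC as converting the spatial information of mid-level features into the channel dimension. A minimal NumPy space-to-depth sketch illustrates that kind of spatial-to-channel rearrangement; this is an assumption about the general mechanism, not the paper's actual DSIC implementation:

```python
import numpy as np

def space_to_depth(x, r):
    """Rearrange a (C, H, W) feature map into (C*r*r, H/r, W/r).

    Each r x r spatial block is folded into the channel dimension, so the
    token grid shrinks by a factor of r^2 while no information is lost --
    the flavor of spatial-to-channel compression the abstract attributes
    to DSIC (illustrative sketch only).
    """
    c, h, w = x.shape
    assert h % r == 0 and w % r == 0, "spatial dims must be divisible by r"
    x = x.reshape(c, h // r, r, w // r, r)       # split H and W into r-blocks
    x = x.transpose(0, 2, 4, 1, 3)               # move block offsets next to C
    return x.reshape(c * r * r, h // r, w // r)  # fold blocks into channels

feat = np.arange(64, dtype=np.float32).reshape(1, 8, 8)
packed = space_to_depth(feat, 2)
print(packed.shape)  # (4, 4, 4)
```

On such a compressed map, a 32 × 32 attention window covers the receptive field of a 64 × 64 region of the original features at a quarter of the token count, which is one plausible way to keep a large window affordable.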
format Article
id doaj-art-0b5da760a3e04d758e4105ca9d7c327f
institution Kabale University
issn 1319-1578
2213-1248
language English
publishDate 2025-08-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj-art-0b5da760a3e04d758e4105ca9d7c327f 2025-08-24T11:53:42Z eng Elsevier Journal of King Saud University: Computer and Information Sciences 1319-1578 2213-1248 2025-08-01 37 7 1 22 10.1007/s44443-025-00214-z Hourglass attention for image super-resolution Ling Xu (Department of Computer and Information Security Management) Yian Huang (College of Systems Engineering, City University of Hong Kong College) Xiaoping Lin (Department of Basic Education, Fujian Police College) Abstract: Single-image super-resolution (SISR) is an important research topic in computer vision; its goal is to reconstruct high-resolution (HR) images with rich detail from low-resolution (LR) inputs. Early CNN-based methods made progress, but their performance plateaued owing to limited model capacity and expressiveness. Recently, Transformer-based methods have shown significant improvements in this field: their ability to capture long-range dependencies makes them well suited to image reconstruction. However, these models often demand substantial computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and offers one key insight: reconstruction performance depends on both low-level and high-level features. We then propose a novel SR model, HGFormer. It uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by converting the spatial information of mid-level features into the channel dimension, improving both the efficiency and the effectiveness of the model. HGFormer is the first method to expand the self-attention window to 32 × 32 while keeping computational costs low, and it achieves a performance gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public datasets show that HGFormer outperforms existing methods in both objective metrics and visual quality. https://doi.org/10.1007/s44443-025-00214-z Super-resolution; Transformer; Window size; Image reconstruction
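The abstract's claim that a 32 × 32 window is normally expensive follows from how window self-attention scales. A back-of-the-envelope FLOP count (the feature-map size and channel width below are illustrative assumptions, not the paper's settings) makes the scaling concrete:

```python
def window_attention_flops(h, w, m, c):
    """Rough matmul cost of window self-attention: the QK^T and attn*V
    products over (h/m)*(w/m) non-overlapping windows of m*m tokens,
    each with c channels. Projections and softmax are ignored."""
    n_windows = (h // m) * (w // m)
    per_window = 2 * (m * m) ** 2 * c  # two matmuls touching (m^2)^2 * c terms
    return n_windows * per_window

# Illustrative 64x64 feature map with 64 channels.
small = window_attention_flops(64, 64, 8, 64)   # 8x8 windows
large = window_attention_flops(64, 64, 32, 64)  # 32x32 windows
print(large / small)  # 16.0
```

Total cost works out to H·W·m²·c, so quadrupling the window side (8 → 32) multiplies attention cost sixteenfold; that is the overhead a spatial-to-channel compression step would need to offset.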
spellingShingle Ling Xu
Yian Huang
Xiaoping Lin
Hourglass attention for image super-resolution
Journal of King Saud University: Computer and Information Sciences
Super-resolution
Transformer
Window size
Image reconstruction
title Hourglass attention for image super-resolution
title_full Hourglass attention for image super-resolution
title_fullStr Hourglass attention for image super-resolution
title_full_unstemmed Hourglass attention for image super-resolution
title_short Hourglass attention for image super-resolution
title_sort hourglass attention for image super resolution
topic Super-resolution
Transformer
Window size
Image reconstruction
url https://doi.org/10.1007/s44443-025-00214-z
work_keys_str_mv AT lingxu hourglassattentionforimagesuperresolution
AT yianhuang hourglassattentionforimagesuperresolution
AT xiaopinglin hourglassattentionforimagesuperresolution