Hourglass attention for image super-resolution
Abstract Single-image super-resolution (SISR) is an important research topic in computer vision; its goal is to reconstruct high-resolution (HR) images with rich details from low-resolution (LR) inputs. Early CNN-based methods made progress, but their performance plateaued owing to limited model capacity and expressiveness. Recently, Transformer-based methods have shown significant improvements in this field: their ability to capture long-range dependencies makes them well suited to image reconstruction. However, these models often require substantial computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and offers one key insight: reconstruction performance depends on both low-level and high-level features. Building on this, we propose a novel SR model called HGFormer. The model uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by folding the spatial information of mid-level features into the channel dimension, improving both the efficiency and the effectiveness of the model. HGFormer is the first method to expand the self-attention window to $$32 \times 32$$ while keeping computational costs low, achieving a gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public benchmarks show that HGFormer outperforms existing methods in both objective metrics and visual quality.
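The spatial-to-channel compression the abstract describes can be illustrated with a fixed space-to-depth rearrangement. This is a sketch only: the paper's DSIC module is learned and dynamic, and the helper name `space_to_depth`, the block size `r = 4`, and the tensor shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def space_to_depth(x, r):
    """Fold each r x r spatial block into the channel dimension.

    Illustrates the cost argument behind spatial-to-channel compression:
    x has shape (H, W, C) with H and W divisible by r; the result has
    shape (H//r, W//r, C*r*r), so no pixel values are discarded.
    """
    H, W, C = x.shape
    x = x.reshape(H // r, r, W // r, r, C)   # split H and W into blocks
    x = x.transpose(0, 2, 1, 3, 4)           # group block coords together
    return x.reshape(H // r, W // r, C * r * r)

# Self-attention over N tokens costs O(N^2). Compressing a 32x32 window
# with r = 4 leaves an 8x8 token grid: 16x fewer tokens, so the attention
# matrix shrinks by a factor of 256 while all pixel information is kept
# in the channels.
feat = np.random.rand(32, 32, 16)
packed = space_to_depth(feat, 4)
print(packed.shape)  # (8, 8, 256)
```

The rearrangement is lossless and invertible (a depth-to-space with the same `r` recovers the input), which is why compressing mid-level features this way trades spatial resolution for channels rather than throwing detail away.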
| Main Authors: | Ling Xu, Yian Huang, Xiaoping Lin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-08-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | Super-resolution; Transformer; Windows size; Image reconstruction |
| Online Access: | https://doi.org/10.1007/s44443-025-00214-z |
| _version_ | 1849225815188832256 |
|---|---|
| author | Ling Xu Yian Huang Xiaoping Lin |
| author_facet | Ling Xu Yian Huang Xiaoping Lin |
| author_sort | Ling Xu |
| collection | DOAJ |
| description | Abstract Single-image super-resolution (SISR) is an important research topic in computer vision; its goal is to reconstruct high-resolution (HR) images with rich details from low-resolution (LR) inputs. Early CNN-based methods made progress, but their performance plateaued owing to limited model capacity and expressiveness. Recently, Transformer-based methods have shown significant improvements in this field: their ability to capture long-range dependencies makes them well suited to image reconstruction. However, these models often require substantial computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and offers one key insight: reconstruction performance depends on both low-level and high-level features. Building on this, we propose a novel SR model called HGFormer. The model uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by folding the spatial information of mid-level features into the channel dimension, improving both the efficiency and the effectiveness of the model. HGFormer is the first method to expand the self-attention window to $$32 \times 32$$ while keeping computational costs low, achieving a gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public benchmarks show that HGFormer outperforms existing methods in both objective metrics and visual quality. |
| format | Article |
| id | doaj-art-0b5da760a3e04d758e4105ca9d7c327f |
| institution | Kabale University |
| issn | 1319-1578 2213-1248 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Journal of King Saud University: Computer and Information Sciences |
| spelling | doaj-art-0b5da760a3e04d758e4105ca9d7c327f2025-08-24T11:53:42ZengElsevierJournal of King Saud University: Computer and Information Sciences1319-15782213-12482025-08-0137712210.1007/s44443-025-00214-zHourglass attention for image super-resolutionLing Xu0Yian Huang1Xiaoping Lin2Department of Computer and Information Security ManagementCollege of Systems Engineering, City University of Hong Kong CollegeDepartment of Basic Education, Fujian Police CollegeAbstract SISR is an important research topic in computer vision. Its goal is to reconstruct HR images with rich details from LR inputs. Early methods based on CNNs made some progress, but their performance reached a limit due to limited model capacity and expressiveness. Recently, methods based on Transformers have shown significant improvements in this field. Their ability to capture long-range dependencies makes them well-suited for image reconstruction. However, these models often require high computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and provides one key insight: the reconstruction performance depends on both low-level and high-level features. Then, we propose a novel SR model called HGFormer. This model uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by converting the spatial information of mid-level features into the channel dimension. This improves both the efficiency and effectiveness of the model. HGFormer is the first method to expand the self-attention window to $$32 \times 32$$ while keeping low computational costs. It achieves a performance gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public datasets show that HGFormer outperforms existing methods in both objective metrics and visual quality.https://doi.org/10.1007/s44443-025-00214-zSuper-resolutionTransformerWindows sizeImage reconstruction |
| spellingShingle | Ling Xu Yian Huang Xiaoping Lin Hourglass attention for image super-resolution Journal of King Saud University: Computer and Information Sciences Super-resolution Transformer Windows size Image reconstruction |
| title | Hourglass attention for image super-resolution |
| title_full | Hourglass attention for image super-resolution |
| title_fullStr | Hourglass attention for image super-resolution |
| title_full_unstemmed | Hourglass attention for image super-resolution |
| title_short | Hourglass attention for image super-resolution |
| title_sort | hourglass attention for image super resolution |
| topic | Super-resolution Transformer Windows size Image reconstruction |
| url | https://doi.org/10.1007/s44443-025-00214-z |
| work_keys_str_mv | AT lingxu hourglassattentionforimagesuperresolution AT yianhuang hourglassattentionforimagesuperresolution AT xiaopinglin hourglassattentionforimagesuperresolution |
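For intuition on why the record's headline claim (a 32 × 32 self-attention window at low cost) is notable, a back-of-the-envelope FLOP count for generic scaled-dot-product window attention is sketched below. The formula is standard attention accounting, not the paper's own profiler, and `window_attention_cost` is a hypothetical helper name.

```python
def window_attention_cost(window, channels):
    """Rough multiply-add count for one self-attention window.

    For N = window * window tokens of dimension `channels`, computing
    Q @ K^T and A @ V each cost roughly N^2 * channels multiply-adds,
    so the total scales quadratically with the token count.
    """
    n = window * window
    return 2 * n * n * channels

# Growing the window from the common 8x8 to 32x32 multiplies the token
# count by 16 and the attention cost by 16^2 = 256, which is why large
# windows are normally considered prohibitively expensive.
print(window_attention_cost(32, 64) // window_attention_cost(8, 64))  # 256
```

This quadratic blow-up is exactly what a spatial-to-channel compression step sidesteps: fewer tokens per window, at the price of wider channels, whose cost grows only linearly in this estimate.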