Hourglass attention for image super-resolution
Abstract Single-image super-resolution (SISR) is an important research topic in computer vision; its goal is to reconstruct high-resolution (HR) images with rich details from low-resolution (LR) inputs. Early CNN-based methods made progress, but their performance plateaued owing to limited model capacity and expressiveness. Recently, Transformer-based methods have shown significant improvements in this field: their ability to capture long-range dependencies makes them well suited to image reconstruction. However, these models often require substantial computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and offers one key insight: reconstruction performance depends on both low-level and high-level features. Building on this, we propose a novel SR model called HGFormer. The model uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by folding the spatial information of mid-level features into the channel dimension, improving both the efficiency and the effectiveness of the model. HGFormer is the first method to expand the self-attention window to $$32 \times 32$$ while keeping computational costs low, achieving a gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public benchmarks show that HGFormer outperforms existing methods in both objective metrics and visual quality.
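The spatial-to-channel compression the abstract describes can be illustrated with a fixed space-to-depth rearrangement. This is a sketch only: the paper's DSIC module is learned and dynamic, and the helper name `space_to_depth`, the block size `r = 4`, and the tensor shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def space_to_depth(x, r):
    """Fold each r x r spatial block into the channel dimension.

    Illustrates the cost argument behind spatial-to-channel compression:
    x has shape (H, W, C) with H and W divisible by r; the result has
    shape (H//r, W//r, C*r*r), so no pixel values are discarded.
    """
    H, W, C = x.shape
    x = x.reshape(H // r, r, W // r, r, C)   # split H and W into blocks
    x = x.transpose(0, 2, 1, 3, 4)           # group block coords together
    return x.reshape(H // r, W // r, C * r * r)

# Self-attention over N tokens costs O(N^2). Compressing a 32x32 window
# with r = 4 leaves an 8x8 token grid: 16x fewer tokens, so the attention
# matrix shrinks by a factor of 256 while all pixel information is kept
# in the channels.
feat = np.random.rand(32, 32, 16)
packed = space_to_depth(feat, 4)
print(packed.shape)  # (8, 8, 256)
```

The rearrangement is lossless and invertible (a depth-to-space with the same `r` recovers the input), which is why compressing mid-level features this way trades spatial resolution for channels rather than throwing detail away.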
| Main Authors: | Ling Xu, Yian Huang, Xiaoping Lin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-08-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | Super-resolution; Transformer; Windows size; Image reconstruction |
| Online Access: | https://doi.org/10.1007/s44443-025-00214-z |
| _version_ | 1849225815188832256 |
|---|---|
| author | Ling Xu Yian Huang Xiaoping Lin |
| author_facet | Ling Xu Yian Huang Xiaoping Lin |
| author_sort | Ling Xu |
| collection | DOAJ |
| description | Abstract Single-image super-resolution (SISR) is an important research topic in computer vision; its goal is to reconstruct high-resolution (HR) images with rich details from low-resolution (LR) inputs. Early CNN-based methods made progress, but their performance plateaued owing to limited model capacity and expressiveness. Recently, Transformer-based methods have shown significant improvements in this field: their ability to capture long-range dependencies makes them well suited to image reconstruction. However, these models often require substantial computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and offers one key insight: reconstruction performance depends on both low-level and high-level features. Building on this, we propose a novel SR model called HGFormer. The model uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by folding the spatial information of mid-level features into the channel dimension, improving both the efficiency and the effectiveness of the model. HGFormer is the first method to expand the self-attention window to $$32 \times 32$$ while keeping computational costs low, achieving a gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public benchmarks show that HGFormer outperforms existing methods in both objective metrics and visual quality. |
| format | Article |
| id | doaj-art-0b5da760a3e04d758e4105ca9d7c327f |
| institution | Kabale University |
| issn | 1319-1578 2213-1248 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Journal of King Saud University: Computer and Information Sciences |
| spelling | doaj-art-0b5da760a3e04d758e4105ca9d7c327f2025-08-24T11:53:42ZengElsevierJournal of King Saud University: Computer and Information Sciences1319-15782213-12482025-08-0137712210.1007/s44443-025-00214-zHourglass attention for image super-resolutionLing Xu0Yian Huang1Xiaoping Lin2Department of Computer and Information Security ManagementCollege of Systems Engineering, City University of Hong Kong CollegeDepartment of Basic Education, Fujian Police CollegeAbstract SISR is an important research topic in computer vision. Its goal is to reconstruct HR images with rich details from LR inputs. Early methods based on CNNs made some progress, but their performance reached a limit due to limited model capacity and expressiveness. Recently, methods based on Transformers have shown significant improvements in this field. Their ability to capture long-range dependencies makes them well-suited for image reconstruction. However, these models often require high computational resources, which limits their practical use. This paper presents a detailed analysis of SISR and provides one key insight: the reconstruction performance depends on both low-level and high-level features. Then, we propose a novel SR model called HGFormer. This model uses a shallow architecture and introduces a Dynamic Spatial Information Compression (DSIC) module, which reduces computational complexity by converting the spatial information of mid-level features into the channel dimension. This improves both the efficiency and effectiveness of the model. HGFormer is the first method to expand the self-attention window to $$32 \times 32$$ while keeping low computational costs. It achieves a performance gain of 0.69 dB on the Urban100 dataset. Extensive experiments on public datasets show that HGFormer outperforms existing methods in both objective metrics and visual quality.https://doi.org/10.1007/s44443-025-00214-zSuper-resolutionTransformerWindows sizeImage reconstruction |
| spellingShingle | Ling Xu Yian Huang Xiaoping Lin Hourglass attention for image super-resolution Journal of King Saud University: Computer and Information Sciences Super-resolution Transformer Windows size Image reconstruction |
| title | Hourglass attention for image super-resolution |
| title_full | Hourglass attention for image super-resolution |
| title_fullStr | Hourglass attention for image super-resolution |
| title_full_unstemmed | Hourglass attention for image super-resolution |
| title_short | Hourglass attention for image super-resolution |
| title_sort | hourglass attention for image super resolution |
| topic | Super-resolution Transformer Windows size Image reconstruction |
| url | https://doi.org/10.1007/s44443-025-00214-z |
| work_keys_str_mv | AT lingxu hourglassattentionforimagesuperresolution AT yianhuang hourglassattentionforimagesuperresolution AT xiaopinglin hourglassattentionforimagesuperresolution |
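For intuition on why the record's headline claim (a 32 × 32 self-attention window at low cost) is notable, a back-of-the-envelope FLOP count for generic scaled-dot-product window attention is sketched below. The formula is standard attention accounting, not the paper's own profiler, and `window_attention_cost` is a hypothetical helper name.

```python
def window_attention_cost(window, channels):
    """Rough multiply-add count for one self-attention window.

    For N = window * window tokens of dimension `channels`, computing
    Q @ K^T and A @ V each cost roughly N^2 * channels multiply-adds,
    so the total scales quadratically with the token count.
    """
    n = window * window
    return 2 * n * n * channels

# Growing the window from the common 8x8 to 32x32 multiplies the token
# count by 16 and the attention cost by 16^2 = 256, which is why large
# windows are normally considered prohibitively expensive.
print(window_attention_cost(32, 64) // window_attention_cost(8, 64))  # 256
```

This quadratic blow-up is exactly what a spatial-to-channel compression step sidesteps: fewer tokens per window, at the price of wider channels, whose cost grows only linearly in this estimate.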