Local feature enhancement transformer for image super-resolution

Bibliographic Details
Main Authors:	Huang Weijie, Huang Detian
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-07-01
Series:	Scientific Reports
Subjects:	Image super-resolution; Transformer; Global context information; Local feature interaction; Attention mechanism
Online Access:	https://doi.org/10.1038/s41598-025-07650-x

Description
Summary:	Transformers have demonstrated remarkable success in image super-resolution (SR) owing to their powerful long-range dependency modeling capability. Although increasing the sliding window size of transformer-based models (e.g., SwinIR) can improve SR performance, it weakens the learning of fine-grained local features, resulting in blurry details in the reconstructed images. To address this limitation, we propose a local feature enhancement transformer for image super-resolution (LFESR) that benefits from global feature capture while enhancing local feature interaction. The basis of LFESR is the local feature enhancement transformer (LFET), which balances spatial processing and channel configuration in self-attention. LFET contains neighborhood self-attention (NSA) and a ghost head, both of which can be easily applied to existing SR networks based on window self-attention. First, NSA uses the Hadamard operation to implement a third-order mapping that enhances local interaction, thus providing clues for high-quality image reconstruction. Next, the novel ghost head combines attention maps with static matrices to increase the channel capacity, thereby enhancing the inference capability of local features. Finally, ConvFFN is incorporated to further strengthen high-frequency detail information in the reconstructed images. Extensive experiments were performed to validate the proposed LFESR, which significantly outperformed state-of-the-art methods in terms of both visual quality and quantitative metrics. In particular, LFESR exceeds SwinIR by 0.49 dB and 0.52 dB in PSNR at a scaling factor of 4 on the Urban100 and Manga109 datasets, respectively.
ISSN:	2045-2322
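
The summary reports its gains in PSNR (peak signal-to-noise ratio), measured in dB. As a reading aid, the following is a minimal sketch of the standard PSNR definition, not the authors' evaluation code. The function names `psnr` and `mse_ratio_for_gain`, the flat pixel-list input, and the 8-bit peak value of 255 are assumptions made for illustration; the record does not state which color channel or border cropping the paper uses.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """PSNR in dB between two equally sized flat pixel sequences.

    max_value=255.0 assumes 8-bit pixels (an assumption, not from the record).
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db at a fixed peak value."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    print(psnr([255, 0, 128, 64], [250, 4, 130, 60]))
    print(mse_ratio_for_gain(0.49), mse_ratio_for_gain(0.52))
```

For a single image pair, a PSNR gain of about 0.5 dB corresponds to a mean squared error roughly 10 to 11 percent lower. Dataset-level figures such as those in the summary are typically averages over many images, so this conversion is only indicative.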

Similar Items