Local feature enhancement transformer for image super-resolution

Abstract: Transformers have demonstrated remarkable success in image super-resolution (SR) owing to their powerful long-range dependency modeling capability. Although increasing the sliding window size of transformer-based models (e.g., SwinIR) can improve SR performance, it weakens the learning of fine-grained local features, resulting in blurry details in the reconstructed images. To address this limitation, we propose a local feature enhancement transformer for image super-resolution (LFESR) that benefits from global feature capture while enhancing local feature interaction. The basis of our LFESR is the local feature enhancement transformer (LFET), which balances spatial processing and channel configuration in self-attention. LFET comprises neighborhood self-attention (NSA) and a ghost head, both of which can be easily applied to existing SR networks based on window self-attention. First, NSA uses the Hadamard operation to implement a third-order mapping that enhances local interaction, thus providing clues for high-quality image reconstruction. Next, the novel ghost head combines attention maps with static matrices to increase channel capacity, thereby enhancing the inference capability of local features. Finally, a ConvFFN is incorporated to further strengthen high-frequency detail in the reconstructed images. Extensive experiments validate the proposed LFESR, which significantly outperforms state-of-the-art methods in both visual quality and quantitative metrics. In particular, LFESR exceeds SwinIR in PSNR by 0.49 dB and 0.52 dB at a scaling factor of 4 on the Urban100 and Manga109 datasets, respectively.
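
The abstract names three mechanisms: neighborhood self-attention (NSA) whose Hadamard product realizes a third-order mapping, a ghost head that combines attention maps with static matrices, and a ConvFFN. The record carries no implementation details, so the PyTorch sketch below is one plausible reading rather than the authors' code: non-overlapping window attention stands in for NSA, a learned elementwise gate supplies the third-order (Hadamard) interaction, ghost heads widen channel capacity by modulating existing attention maps with learned static matrices, and ConvFFN follows the common linear / depthwise 3x3 convolution / linear pattern. Every class name, argument, and default value below is hypothetical.

# Illustrative sketch only; the paper's exact NSA, ghost-head, and ConvFFN
# designs are not given in this record, and all names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalFeatureAttention(nn.Module):
    """Window-local attention with a Hadamard gate and ghost heads (sketch)."""

    def __init__(self, dim, window=8, heads=4, ghost_heads=2):
        super().__init__()
        assert dim % (heads + ghost_heads) == 0
        self.window, self.heads, self.ghost = window, heads, ghost_heads
        self.head_dim = dim // (heads + ghost_heads)
        inner = self.head_dim * heads              # channels from the real heads
        self.qkv = nn.Linear(dim, inner * 3)
        self.gate = nn.Linear(dim, dim)            # Hadamard gate: third-order term
        n = window * window                        # tokens per window
        # Learned static matrices, one per ghost head (window size is fixed).
        self.static = nn.Parameter(torch.randn(ghost_heads, n, n) * 0.02)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                  # x: (B, H, W, C); H, W divisible by window
        B, H, W, C = x.shape
        w, n = self.window, self.window * self.window
        # Partition the feature map into non-overlapping w x w windows.
        xw = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5).reshape(-1, n, C)
        q, k, v = (t.view(-1, n, self.heads, self.head_dim).transpose(1, 2)
                   for t in self.qkv(xw).chunk(3, dim=-1))
        attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = attn @ v                             # (B*windows, heads, n, head_dim)
        # Ghost heads: reuse the first attention maps, modulated elementwise by
        # static matrices, adding channel capacity at almost no attention cost.
        ghost = (attn[:, : self.ghost] * self.static) @ v[:, : self.ghost]
        out = torch.cat([out, ghost], dim=1).transpose(1, 2).reshape(-1, n, C)
        out = self.proj(out * self.gate(xw))       # Hadamard product: third-order mapping
        # Merge the windows back into a (B, H, W, C) feature map.
        return out.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


class ConvFFN(nn.Module):
    """FFN with a depthwise 3x3 conv to keep high-frequency detail (sketch)."""

    def __init__(self, dim, hidden_mult=2):
        super().__init__()
        hidden = dim * hidden_mult
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):                          # x: (B, H, W, C)
        h = F.gelu(self.fc1(x)).permute(0, 3, 1, 2)     # to (B, hidden, H, W)
        h = F.gelu(self.dwconv(h)).permute(0, 2, 3, 1)  # back to channels-last
        return self.fc2(h)


# Minimal smoke test with residual wiring typical of SwinIR-style blocks.
x = torch.randn(1, 32, 32, 96)
x = x + LocalFeatureAttention(96)(x)
x = x + ConvFFN(96)(x)
print(x.shape)                                     # torch.Size([1, 32, 32, 96])

The residual wiring in the smoke test mirrors typical SwinIR-style blocks; the actual LFESR block layout, neighborhood shape, and normalization may well differ.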

Bibliographic Details
Main Authors: Huang Weijie (School of Business, Huaqiao University); Huang Detian (College of Engineering, Huaqiao University)
Format: Article
Language: English
Published: Nature Portfolio, 2025-07-01
Series: Scientific Reports, vol. 15, no. 1 (2025), pp. 1-15
ISSN: 2045-2322
Collection: DOAJ (record doaj-art-58527b93cf4146a5b6e12f9a27b1acad)
Subjects: Image super-resolution; Transformer; Global context information; Local feature interaction; Attention mechanism
Online Access: https://doi.org/10.1038/s41598-025-07650-x