Local feature enhancement transformer for image super-resolution
Abstract: Transformers have demonstrated remarkable success in image super-resolution (SR) owing to their powerful long-range dependency modeling capability. Although increasing the sliding-window size of transformer-based models (e.g., SwinIR) can improve SR performance, it weakens the learning of fine-level local features, resulting in blurry details in the reconstructed images. To address this limitation, we propose a local feature enhancement transformer for image super-resolution (LFESR) that benefits from global feature capture while enhancing local feature interaction. The basis of our LFESR is the local feature enhancement transformer (LFET), which balances spatial processing and channel configuration in self-attention. LFET comprises neighborhood self-attention (NSA) and a ghost head, both of which can be easily applied to existing SR networks based on window self-attention. First, NSA uses the Hadamard product to implement a third-order mapping that enhances local interaction, providing clues for high-quality image reconstruction. Next, the novel ghost head combines attention maps with static matrices to increase channel capacity, thereby strengthening the inference of local features. Finally, a ConvFFN is incorporated to further reinforce high-frequency detail in the reconstructed images. Extensive experiments validate the proposed LFESR, which significantly outperforms state-of-the-art methods in both visual quality and quantitative metrics. In particular, LFESR exceeds SwinIR by 0.49 dB and 0.52 dB in PSNR at a scaling factor of ×4 on the Urban100 and Manga109 datasets, respectively.
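This record contains only the abstract, not the paper, so the following is a minimal, illustrative PyTorch sketch of the three mechanisms the abstract names: a window-attention stand-in for NSA whose Hadamard (element-wise) branch yields a third-order mapping, a ghost head modeled as a learned static matrix added to each attention map, and a ConvFFN with a depth-wise convolution. All class names, shapes, and wiring here are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only -- NOT the authors' code. Assumes PyTorch and
# tokens arranged as (batch, N, channels) with N = window_size**2.
import torch
import torch.nn as nn


class NSASketch(nn.Module):
    """Window self-attention stand-in for NSA. The Hadamard product of
    the attention output with a linear projection of the input makes the
    block cubic in x, i.e. a 'third-order mapping' as the abstract
    describes. The 'ghost head' is modeled as a learned static matrix
    added to each dynamic attention map, enlarging head capacity at
    negligible cost."""

    def __init__(self, dim: int, window_size: int = 8, num_heads: int = 4):
        super().__init__()
        n = window_size * window_size
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.local = nn.Linear(dim, dim)            # branch for the Hadamard product
        self.proj = nn.Linear(dim, dim)
        self.ghost = nn.Parameter(torch.zeros(num_heads, n, n))  # static matrices

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)        # each: (B, heads, N, C/heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1) + self.ghost    # dynamic map + static matrix
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        out = out * self.local(x)                   # Hadamard -> third-order term
        return self.proj(out)


class ConvFFNSketch(nn.Module):
    """Feed-forward block with a depth-wise 3x3 convolution between the
    two linear layers, re-injecting local high-frequency detail."""

    def __init__(self, dim: int, hidden: int | None = None):
        super().__init__()
        hidden = hidden or dim * 2
        self.fc1 = nn.Linear(dim, hidden)
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, hw: tuple[int, int]) -> torch.Tensor:
        B, N, C = x.shape
        H, W = hw                                   # N must equal H * W
        h = self.act(self.fc1(x))
        h = h.transpose(1, 2).reshape(B, -1, H, W)  # tokens -> feature map
        h = self.act(self.dw(h))                    # depth-wise local mixing
        h = h.flatten(2).transpose(1, 2)            # feature map -> tokens
        return self.fc2(h)


# Quick shape check on a hypothetical 8x8 window of 64-dim tokens:
x = torch.randn(2, 64, 64)                          # (B, N=8*8, C)
y = NSASketch(64)(x)
z = ConvFFNSketch(64)(y, (8, 8))
print(y.shape, z.shape)                             # both torch.Size([2, 64, 64])
```

In a SwinIR-style network one would presumably swap these blocks in for the window self-attention and MLP inside each residual transformer group; the sketch only pins down the tensor shapes the abstract implies.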
| Main Authors: | Huang Weijie (School of Business, Huaqiao University); Huang Detian (College of Engineering, Huaqiao University) |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| ISSN: | 2045-2322 |
| Subjects: | Image super-resolution; Transformer; Global context information; Local feature interaction; Attention mechanism |
| Online Access: | https://doi.org/10.1038/s41598-025-07650-x |