Better Skeleton Better Readability: Scene Text Image Super-Resolution via Skeleton-Aware Diffusion Model

Bibliographic Details
Main Authors: Shrey Singh, Prateek Keserwani, Partha Pratim Roy, Rajkumar Saini
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
Subjects: Scene text image super-resolution; diffusion model; skeleton networks; text recognition
Online Access: https://ieeexplore.ieee.org/document/10772209/
collection DOAJ
description Scene text image super-resolution (STISR) aims to increase the resolution of text images while improving their readability by reducing noise, blur, and other degradations. Existing diffusion-based approaches for STISR rely primarily on text-prior information and often overlook the importance of explicitly modeling the visual structure of the text. In this paper, we propose a novel Skeleton-Aware Diffusion Method (SADM) for STISR, which introduces text skeletons as structural guidance to the diffusion process. The text skeleton serves as a critical visual cue that helps the model restore fine text details, even in severely degraded low-resolution images. Generating high-quality skeletons from low-resolution scene text is challenging because of the blur and noise inherent in such images. To tackle this, we introduce a diffusion-based Skeleton Correction Network (SCN) that refines the initial skeletons produced by a convolutional neural network (CNN)-based skeletonization model. The SCN improves skeleton accuracy, providing more precise structural guidance during the diffusion process. Extensive experiments demonstrate the benefits of incorporating skeleton information into the STISR pipeline. The proposed SADM achieves state-of-the-art performance on the TextZoom dataset, with ASTER text-recognition accuracies of 81.4%, 64.9%, and 49.6% on the easy, medium, and hard subsets, respectively, surpassing the previous best results. Through detailed analysis, we also show that improving the quality of skeletons extracted from low-resolution images leads to better super-resolution results and improves the performance of text recognizers.
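The abstract says only that initial skeletons come from a CNN-based skeletonization model and are then refined by the diffusion-based SCN. To make the idea of a "text skeleton" concrete, the sketch below swaps in the classical Zhang-Suen thinning algorithm, a non-learned technique that is not the paper's method, and then stacks the skeleton with the image as a second channel, which is one plausible way to feed structural guidance to a denoiser. The names `zhang_suen_thin` and `skeleton_condition`, the 0.5 binarization threshold, the dark-text-on-light-background assumption, and the channel-stacking scheme are all hypothetical illustrations.

```python
import numpy as np


def zhang_suen_thin(binary: np.ndarray) -> np.ndarray:
    """Zhang-Suen thinning of a binary image (nonzero = text stroke).

    Illustrative stand-in only: a classical, non-learned thinning algorithm,
    not the CNN skeletonizer or the Skeleton Correction Network from the paper.
    """
    # Zero padding gives every original pixel a full 8-neighbourhood.
    img = np.pad((np.asarray(binary) > 0).astype(np.uint8), 1)
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            # Neighbours P2..P9, clockwise from north, read from the current
            # image so each sub-iteration deletes its pixels in parallel.
            p2, p3 = img[:-2, 1:-1], img[:-2, 2:]
            p4, p5 = img[1:-1, 2:], img[2:, 2:]
            p6, p7 = img[2:, 1:-1], img[2:, :-2]
            p8, p9 = img[1:-1, :-2], img[:-2, :-2]
            ring = [p2, p3, p4, p5, p6, p7, p8, p9]
            # B: number of stroke neighbours; A: 0 -> 1 transitions around the ring.
            b = sum(p.astype(int) for p in ring)
            a = sum(((ring[k] == 0) & (ring[(k + 1) % 8] == 1)).astype(int)
                    for k in range(8))
            if step == 0:
                side = (p2 * p4 * p6 == 0) & (p4 * p6 * p8 == 0)
            else:
                side = (p2 * p4 * p8 == 0) & (p2 * p6 * p8 == 0)
            center = img[1:-1, 1:-1]  # a view into img
            delete = (center == 1) & (b >= 2) & (b <= 6) & (a == 1) & side
            if delete.any():
                center[delete] = 0  # writes through the view into img
                changed = True
    return img[1:-1, 1:-1].copy()


def skeleton_condition(gray: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical conditioning tensor: image channel plus skeleton channel."""
    # Assumes dark text on a light background with intensities in [0, 1].
    ink = (gray < threshold).astype(np.uint8)
    skel = zhang_suen_thin(ink)
    return np.stack([gray.astype(np.float32), skel.astype(np.float32)], axis=0)


if __name__ == "__main__":
    # A 3-pixel-thick dark horizontal bar on a white 7x20 canvas.
    gray = np.ones((7, 20))
    gray[2:5, 2:18] = 0.0
    cond = skeleton_condition(gray)
    print(cond.shape)          # (2, 7, 20)
    print(int(cond[1].sum()))  # 13: a one-pixel-wide line along row 3
```

Concatenating the skeleton as an extra input channel is only one common conditioning choice; the abstract does not state how SADM injects the (SCN-corrected) skeleton into the diffusion process, and on real low-resolution text a fixed threshold like this would produce exactly the noisy skeletons the SCN is meant to correct.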
id doaj-art-1b6c8d72c9344b9dbeafb014dfcfa0f4
institution OA Journals
issn 2169-3536
volume 12
pages 187640-187651
doi 10.1109/ACCESS.2024.3510136
last_indexed 2025-08-20T01:58:00Z
affiliation Shrey Singh (ORCID 0000-0002-2685-1319): Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
Prateek Keserwani: Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
Partha Pratim Roy (ORCID 0000-0002-5735-5254): Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
Rajkumar Saini (ORCID 0000-0001-8532-0895): Department of Computer Science, Electrical and Space Engineering, Luleå Tekniska Universitet, Luleå, Sweden
title Better Skeleton Better Readability: Scene Text Image Super-Resolution via Skeleton-Aware Diffusion Model
topic Scene text image super-resolution
diffusion model
skeleton networks
text recognition
url https://ieeexplore.ieee.org/document/10772209/