Better Skeleton Better Readability: Scene Text Image Super-Resolution via Skeleton-Aware Diffusion Model

Scene text image super-resolution (STISR) aims to enhance the resolution of text images while simultaneously improving their readability by reducing noise, blur, and other degradations. Existing diffusion-based approaches for STISR primarily rely on text-prior information but often overlook the impo...

Bibliographic Details
Main Authors:	Shrey Singh, Prateek Keserwani, Partha Pratim Roy, Rajkumar Saini
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Scene text image super-resolution; diffusion model; skeleton networks; text recognition
Online Access:	https://ieeexplore.ieee.org/document/10772209/

collection	DOAJ
description	Scene text image super-resolution (STISR) aims to enhance the resolution of text images while simultaneously improving their readability by reducing noise, blur, and other degradations. Existing diffusion-based approaches for STISR primarily rely on text-prior information but often overlook the importance of explicitly modeling the visual structure of the text. In this paper, we propose a novel Skeleton-Aware Diffusion Method (SADM) for STISR, which introduces text skeletons as structural guidance to the diffusion process. The text skeleton serves as a critical visual cue, helping the model to better restore the fine details of text, even in severely degraded low-resolution images. Generating high-quality skeletons from low-resolution scene text is a challenging task due to the inherent blurring and noise present in such images. To tackle this, we introduce a diffusion-based Skeleton Correction Network (SCN), which refines the initial skeletons produced by a convolutional neural network-based skeletonization model. The SCN effectively improves the accuracy of the skeletons, allowing for more precise structural guidance during the diffusion process. Our extensive experiments demonstrate the significant benefits of incorporating skeleton information into the STISR pipeline. The proposed SADM achieves state-of-the-art performance on the TextZoom dataset, with accuracies of 81.4%, 64.9%, and 49.6% on the easy, medium, and hard subsets, respectively, compared to the previous best results by ASTER text recognizer. Through detailed analysis, we also show that improving the quality of skeletons from low-resolution images leads to better super-resolution outcomes and enhances the performance of text recognizers.
id	doaj-art-1b6c8d72c9344b9dbeafb014dfcfa0f4
institution	OA Journals
issn	2169-3536
spelling	doaj-art-1b6c8d72c9344b9dbeafb014dfcfa0f4; 2025-08-20T01:58:00Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 187640-187651; doi:10.1109/ACCESS.2024.3510136; IEEE Xplore document 10772209; Better Skeleton Better Readability: Scene Text Image Super-Resolution via Skeleton-Aware Diffusion Model; Shrey Singh (https://orcid.org/0000-0002-2685-1319), Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India; Prateek Keserwani, Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India; Partha Pratim Roy (https://orcid.org/0000-0002-5735-5254), Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India; Rajkumar Saini (https://orcid.org/0000-0001-8532-0895), Department of Computer Science, Electrical and Space Engineering, Luleå Tekniska Universitet, Luleå, Sweden; https://ieeexplore.ieee.org/document/10772209/; Scene text image super-resolution; diffusion model; skeleton networks; text recognition

Similar Items