Scene Text Recognition That Eliminates Background and Character Noise Interference

Bibliographic Details
Main Authors: Shancheng Tang, Yaoqian Cao, Shaojun Liang, Zicheng Jin, Kun Lai
Format: Article
Language:English
Published: MDPI AG 2025-03-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/15/7/3545
Description
Summary:In natural photographs, complex background noise and character noise frequently interfere with scene text recognition. To address these concerns, this paper proposes a novel scene text recognition model that eliminates noise from both the background and the characters (ENBC). The model consists of three modules. First, the high-level character feature extraction module uses ASPP dilated convolutions with varying dilation rates to obtain features at multiple scales, expanding the receptive field so that character feature regions are captured more effectively, noise interference from the characters themselves is suppressed, and character shape features are enhanced. Second, the multi-level character feature fusion module upsamples the high-level character features and merges them with the low-level character features from the backbone network, separating the foreground characters from background interference, removing background noise, and outputting the resulting image. Third, the recognition enhancement module strengthens character context modeling by considering both forward (left-to-right) and backward (right-to-left) information from the text sequence. The experimental results show that the model effectively minimizes background and character noise interference, improving recognition accuracy by at least 4.2% on the synthetic scene dataset. Compared with other popular techniques on the IIIT5K, ICDAR-2015, ICDAR-2003, and CUTE80 public datasets, recognition accuracy improves by an average of 6.97%.
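The ASPP idea in the first module, using parallel dilated convolutions with different rates to widen the receptive field without extra parameters, can be illustrated with a minimal sketch. The kernel size and dilation rates below are illustrative assumptions, not the configuration reported in the paper:

```python
def receptive_field(kernel_size: int, dilation: int) -> int:
    """Receptive field of a single dilated convolution layer."""
    return dilation * (kernel_size - 1) + 1

def dilated_conv1d(signal, kernel, dilation):
    """Pure-Python 1-D dilated convolution (valid padding, stride 1)."""
    k = len(kernel)
    span = dilation * (k - 1) + 1  # how far apart the taps reach
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(k))
        for i in range(len(signal) - span + 1)
    ]

# ASPP-style parallel branches: same 3-tap kernel, different dilation rates.
# Rates (1, 2, 4) are hypothetical; each branch sees a wider context than
# the last while its parameter count stays the same.
for rate in (1, 2, 4):
    print(f"dilation {rate}: receptive field {receptive_field(3, rate)}")
```

Concatenating the outputs of such branches gives each position features drawn from several context sizes at once, which is the mechanism the abstract credits with capturing character regions while suppressing character-level noise.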
ISSN:2076-3417