Advancing Optical Character Recognition for Low-Resource Scripts: A Siamese Meta-Learning Approach With PSN Framework

With the increasing demand for digitization, Optical Character Recognition (OCR) systems play a vital role in digitizing physical manuscripts. Several methods have been successfully deployed in the OCR domain. However, they often face challenges when dealing with low-resource regional scripts becaus...

Bibliographic Details
Main Authors: Anirudha Ghosh, Debaditya Barman, Abu Sufian, Ibrahim A. Hameed
Format: Article
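Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Deep learning; low resource regional languages; meta-learning; OCR; priority-smart network; Siamese network
Online Access: https://ieeexplore.ieee.org/document/10772081/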
_version_ 1850250115191668736
author Anirudha Ghosh
Debaditya Barman
Abu Sufian
Ibrahim A. Hameed
collection DOAJ
description With the increasing demand for digitization, Optical Character Recognition (OCR) systems play a vital role in digitizing physical manuscripts. Several methods have been successfully deployed in the OCR domain. However, they often struggle with low-resource regional scripts because of the limited training data and the complex structure of the characters. In such scenarios, Siamese Network (SN) meta-learning offers a promising solution by enabling quick adaptation to new tasks with minimal training data. Despite the success of SNs in various classification tasks, the traditional SN architecture needs an upgrade to better distinguish between similar-looking characters of regional scripts. In this paper, we propose a novel Priority-Smart Network (PSN) framework for traditional SN architectures, which can easily be incorporated into existing CNN backbones and improves their ability to identify characters in low-resource regional scripts. Furthermore, we propose the Enhanced Differential Edge Detection (EDED) preprocessing strategy, designed explicitly for OCR tasks. We rigorously evaluate the proposed techniques on three benchmark low-resource script datasets to establish their effectiveness. Our experimental results show significant improvements in character recognition accuracy and robustness, highlighting the potential of SNs combined with the PSN framework and the EDED strategy for improving OCR systems for low-resource scripts.
format Article
id doaj-art-2ae5c0be08514453a4c72ae700b4eb71
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-2ae5c0be08514453a4c72ae700b4eb71
2025-08-20T01:58:19Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
12
189651
189666
10.1109/ACCESS.2024.3509605
10772081
Advancing Optical Character Recognition for Low-Resource Scripts: A Siamese Meta-Learning Approach With PSN Framework
Anirudha Ghosh 0 https://orcid.org/0009-0004-1465-0386
Debaditya Barman 1 https://orcid.org/0000-0002-7562-119X
Abu Sufian 2 https://orcid.org/0000-0003-2035-2938
Ibrahim A. Hameed 3 https://orcid.org/0000-0003-1252-260X
Department of Computer and System Sciences, Visva-Bharati University, Santiniketan, India
Department of Computer and System Sciences, Visva-Bharati University, Santiniketan, India
Institute of Applied Sciences and Intelligent Systems of CNR, Lecce, Italy
Department of ICT and Natural Sciences, Norwegian University of Science and Technology, Trondheim, Norway
https://ieeexplore.ieee.org/document/10772081/
Deep learning
low resource regional languages
meta-learning
OCR
priority-smart network
Siamese network
title Advancing Optical Character Recognition for Low-Resource Scripts: A Siamese Meta-Learning Approach With PSN Framework
topic Deep learning
low resource regional languages
meta-learning
OCR
priority-smart network
Siamese network
url https://ieeexplore.ieee.org/document/10772081/