Advancing Optical Character Recognition for Low-Resource Scripts: A Siamese Meta-Learning Approach With PSN Framework
With the increasing demand for digitization, Optical Character Recognition (OCR) systems play a vital role in digitizing physical manuscripts. Several methods have been successfully deployed in the OCR domain. However, they often face challenges when dealing with low-resource regional scripts becaus...
| Main Authors: | Anirudha Ghosh, Debaditya Barman, Abu Sufian, Ibrahim A. Hameed |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
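| Subjects: | Deep learning; low resource regional languages; meta-learning; OCR; priority-smart network; Siamese network |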
| Online Access: | https://ieeexplore.ieee.org/document/10772081/ |
| _version_ | 1850250115191668736 |
|---|---|
| author | Anirudha Ghosh; Debaditya Barman; Abu Sufian; Ibrahim A. Hameed |
| author_facet | Anirudha Ghosh; Debaditya Barman; Abu Sufian; Ibrahim A. Hameed |
| author_sort | Anirudha Ghosh |
| collection | DOAJ |
| description | With the increasing demand for digitization, Optical Character Recognition (OCR) systems play a vital role in digitizing physical manuscripts. Several methods have been successfully deployed in the OCR domain. However, they often struggle with low-resource regional scripts because of limited training data and the complex structure of the characters. In such scenarios, Siamese Network (SN) meta-learning offers a promising solution by enabling quick adaptation to new tasks with minimal training data. Despite the success of SNs in various classification tasks, the traditional SN architecture needs an upgrade to better distinguish between similar-looking characters of regional scripts. In this paper, we propose a novel Priority-Smart Network (PSN) framework for traditional SN architectures, which can easily be incorporated into existing CNN backbones to improve their ability to identify characters in low-resource regional scripts. Furthermore, we propose the Enhanced Differential Edge Detection (EDED) preprocessing strategy, designed explicitly for OCR tasks. We evaluate our techniques on three benchmark low-resource script datasets to establish their effectiveness. Our experimental results show significant improvements in character recognition accuracy and robustness, highlighting the potential of SNs combined with the PSN framework and the EDED strategy for improving OCR systems for low-resource scripts. |
| format | Article |
| id | doaj-art-2ae5c0be08514453a4c72ae700b4eb71 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-2ae5c0be08514453a4c72ae700b4eb71; 2025-08-20T01:58:19Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12, pp. 189651-189666; DOI 10.1109/ACCESS.2024.3509605; IEEE document 10772081. Title: Advancing Optical Character Recognition for Low-Resource Scripts: A Siamese Meta-Learning Approach With PSN Framework. Authors: Anirudha Ghosh (https://orcid.org/0009-0004-1465-0386), Department of Computer and System Sciences, Visva-Bharati University, Santiniketan, India; Debaditya Barman (https://orcid.org/0000-0002-7562-119X), Department of Computer and System Sciences, Visva-Bharati University, Santiniketan, India; Abu Sufian (https://orcid.org/0000-0003-2035-2938), Institute of Applied Sciences and Intelligent Systems of CNR, Lecce, Italy; Ibrahim A. Hameed (https://orcid.org/0000-0003-1252-260X), Department of ICT and Natural Sciences, Norwegian University of Science and Technology, Trondheim, Norway. URL: https://ieeexplore.ieee.org/document/10772081/. Keywords: Deep learning; low resource regional languages; meta-learning; OCR; priority-smart network; Siamese network |
| title | Advancing Optical Character Recognition for Low-Resource Scripts: A Siamese Meta-Learning Approach With PSN Framework |
| topic | Deep learning; low resource regional languages; meta-learning; OCR; priority-smart network; Siamese network |
| url | https://ieeexplore.ieee.org/document/10772081/ |