Acceleration of Urdu Optical Character Recognition on Zynq UltraScale+ MPSoC Using Deep Convolutional Neural Network
Deploying deep learning–based optical character recognition (OCR) systems for low-resource, complex-script languages like Urdu remains a major challenge due to high computational costs, lack of annotated datasets, and limited hardware support for real-time applications. Existing FPGA-base...
| Main Authors: | Fauzia Yasir, Majida Kazmi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Computer vision; convolution neural network; edge computing; FPGA; low power; energy efficiency |
| Online Access: | https://ieeexplore.ieee.org/document/11098840/ |
| _version_ | 1849391094098296832 |
|---|---|
| author | Fauzia Yasir; Majida Kazmi |
| author_facet | Fauzia Yasir; Majida Kazmi |
| author_sort | Fauzia Yasir |
| collection | DOAJ |
| description | Deploying deep learning–based optical character recognition (OCR) systems for low-resource, complex-script languages like Urdu remains a major challenge due to high computational costs, lack of annotated datasets, and limited hardware support for real-time applications. Existing FPGA-based OCR implementations have primarily focused on simplified datasets such as MNIST digits, limiting their generalizability to scripts like Urdu that exhibit extensive intra-class variability, contextual shaping, and diacritics. This study presents a hardware-accelerated Urdu OCR framework using a custom-designed Convolutional Neural Network (CNN) optimized for deployment on the Xilinx Zynq UltraScale+ MPSoC (ZCU104). The proposed CNN is trained on a novel large-scale dataset of 336,000 labeled images spanning 48 Urdu characters across 230 font styles. Compared to MNIST-based FPGA implementations, our approach addresses significantly higher script complexity while achieving a classification accuracy of 96.73% (FP32) and 94.06% (INT8). Hardware-aware quantization and deployment using the Vitis AI toolchain enabled 75% model compression with minimal accuracy loss, achieving real-time inference of 0.189 ms per character and 4,886.95 FPS, while consuming only 1.32 W. Benchmarking against CPU and GPU platforms confirmed substantial improvements in speed and energy efficiency. This work establishes a high-performance, scalable, and energy-efficient FPGA-based OCR framework for Urdu and sets the foundation for extending such solutions to other cursive, low-resource languages like Arabic, Pashto, and Persian. |
| format | Article |
| id | doaj-art-846f68e910064324b788e5e739cc09b3 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-846f68e910064324b788e5e739cc09b3; 2025-08-20T03:41:11Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 135538-135557; DOI 10.1109/ACCESS.2025.3593294; document 11098840; Fauzia Yasir (https://orcid.org/0009-0004-2921-9046), Majida Kazmi (https://orcid.org/0000-0002-2767-3139); both authors: Faculty of Electrical and Computer Engineering, NED University of Engineering and Technology, Karachi, Pakistan |
| title | Acceleration of Urdu Optical Character Recognition on Zynq UltraScale+ MPSoC Using Deep Convolutional Neural Network |
| topic | Computer vision; convolution neural network; edge computing; FPGA; low power; energy efficiency |
| url | https://ieeexplore.ieee.org/document/11098840/ |
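
The headline figures in the description can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the paper. It assumes that the reported 75% compression corresponds to storing weights in 8 bits instead of 32, and that the 1.32 W figure is the average power drawn while the accelerator sustains the reported throughput. The function names are illustrative only.

```python
# Figures quoted from the record's description field.
ACC_FP32 = 96.73           # classification accuracy, FP32 model (%)
ACC_INT8 = 94.06           # classification accuracy, INT8 model (%)
LATENCY_MS = 0.189         # reported inference time per character (ms)
THROUGHPUT_FPS = 4886.95   # reported throughput (frames per second)
POWER_W = 1.32             # reported power consumption (W)


def compression_fraction(bits_before: int, bits_after: int) -> float:
    """Fraction of storage saved when each weight shrinks in bit width.

    Assumption: model size is dominated by weights of uniform width.
    """
    return 1.0 - bits_after / bits_before


def energy_per_frame_mj(power_w: float, fps: float) -> float:
    """Energy per classified frame in millijoules.

    Assumption: power_w is the average draw at the given throughput.
    """
    return power_w / fps * 1000.0


def frame_period_ms(fps: float) -> float:
    """Average time between completed frames implied by a throughput."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"FP32 -> INT8 size reduction: {compression_fraction(32, 8):.0%}")
    print(f"Accuracy drop after quantization: {ACC_FP32 - ACC_INT8:.2f} points")
    print(f"Derived energy per frame: {energy_per_frame_mj(POWER_W, THROUGHPUT_FPS):.4f} mJ")
    print(f"Frame period implied by FPS: {frame_period_ms(THROUGHPUT_FPS):.4f} ms "
          f"(reported latency: {LATENCY_MS} ms)")
```

The frame period implied by the throughput comes out slightly longer than the reported per-character latency. One possible reading is that the throughput figure includes per-frame overhead outside the inference call, but the record does not say how either number was measured.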