Enhancing Arabic handwritten word recognition: a CNN-BiLSTM-CTC architecture with attention mechanism and adaptive augmentation
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer 2025-05-01 |
| Series: | Discover Applied Sciences |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s42452-025-06952-z |
| Summary: | Optical character recognition (OCR) for Arabic presents unique challenges due to the script's cursive nature, contextual letter forms, multiple ligatures, the presence of diacritics, and the high variability of handwriting styles. This work introduces an enhanced Arabic handwritten word recognition architecture that integrates an attention mechanism (AM) into an end-to-end framework combining convolutional neural networks (CNN), bidirectional long short-term memory (BiLSTM), and connectionist temporal classification (CTC), with word beam search (WBS) used for decoding. To address data imbalance in the IFN/ENIT dataset, an adaptive data augmentation algorithm is proposed that focuses on underrepresented characters and words. Extensive experiments across multiple train-test configurations compare the model's performance with and without attention, using the original dataset, standard augmentation, and the proposed adaptive augmentation method. Results demonstrate that incorporating attention significantly improves the character accuracy rate (CAR) and word accuracy rate (WAR), with further improvements when adaptive augmentation is used. The proposed system outperforms previous state-of-the-art methods, particularly on challenging configurations, showing its potential for robust and scalable Arabic handwriting recognition. |
| ISSN: | 3004-9261 |
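
The summary reports results as character accuracy rate (CAR) and word accuracy rate (WAR) but does not give their formulas. The following is a minimal sketch under common conventions, not necessarily the definitions used in the article: CAR is assumed to be one minus the total Levenshtein edit distance divided by the total number of reference characters, and WAR is assumed to be the fraction of words recognized exactly. The function names are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy_rate(refs, hyps):
    """Assumed definition: 1 - (total edit distance / total reference characters)."""
    total_chars = sum(len(r) for r in refs)
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 1.0 - total_errors / total_chars


def word_accuracy_rate(refs, hyps):
    """Assumed definition: fraction of predictions that match the reference exactly."""
    correct = sum(r == h for r, h in zip(refs, hyps))
    return correct / len(refs)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from IFN/ENIT.
    references = ["كتاب", "قلم"]
    predictions = ["كتاب", "قلب"]
    print(character_accuracy_rate(references, predictions))
    print(word_accuracy_rate(references, predictions))
```

With these toy inputs, one of the two words is wrong by a single character, so the word-level score falls much further than the character-level score. This is why the two metrics are usually reported together.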