A hybrid approach to Bangla handwritten OCR: combining YOLO and an advanced CNN
Abstract Optical Character Recognition (OCR) plays a vital role in automating data entry from handwritten forms into digital systems. However, a significant gap exists in the research on OCR techniques tailored for handwritten texts in complex languages such as Bangla. Challenges in Bangla script ar...
| Main Authors: | Aye T. Maung, Sumaiya Salekin, Mohammad A. Haque |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-06-01 |
| Series: | Discover Artificial Intelligence |
| Subjects: | Bangla OCR, Character detection, Character recognition, Word recognition, YOLO, EfficientNet |
| Online Access: | https://doi.org/10.1007/s44163-025-00251-7 |
| _version_ | 1849685982205444096 |
|---|---|
| author | Aye T. Maung, Sumaiya Salekin, Mohammad A. Haque |
| collection | DOAJ |
| description | Abstract: Optical Character Recognition (OCR) plays a vital role in automating data entry from handwritten forms into digital systems. However, a significant gap exists in the research on OCR techniques tailored for handwritten texts in complex languages such as Bangla. Challenges in Bangla script arise from the presence of modifiers, compound characters, and diacritic marks, making accurate recognition difficult. Our research introduces a scalable and effective OCR pipeline for Bangla handwritten documents that addresses these complexities. The proposed pipeline leverages the YOLO (You Only Look Once) model for character detection, accurately isolating base alphabets, consonant conjuncts, and characters with modifiers (matras). For character recognition, the pipeline utilizes the EfficientNet-B4 model, which demonstrated a recognition accuracy of 93.87% for grapheme roots, 98.22% for vowel diacritics, and 98.0% for consonant diacritics on publicly available datasets, combined and adapted for our use. Additionally, the system’s resilience was enhanced using a Word2Vec-based spelling correction layer, reducing the Character Error Rate (CER) from 10.37% to 2.47%. Comparative evaluations on in-house data show that the proposed pipeline with spelling correction achieves the highest precision (0.9701) and lowest CER (0.0247), outperforming the Google Cloud Vision API’s OCR. In contrast, the Vision API has the highest CER (0.1389) and lower precision (0.8220), highlighting the effectiveness of the proposed approach for Bangla OCR. |
| format | Article |
| id | doaj-art-cbc52f1873054901ab8ecabba8a24a6e |
| institution | DOAJ |
| issn | 2731-0809 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Artificial Intelligence |
| spelling | doaj-art-cbc52f1873054901ab8ecabba8a24a6e; 2025-08-20T03:22:53Z; eng; Springer; Discover Artificial Intelligence; 2731-0809; 2025-06-01; 51126; 10.1007/s44163-025-00251-7; A hybrid approach to Bangla handwritten OCR: combining YOLO and an advanced CNN; Aye T. Maung, Sumaiya Salekin, Mohammad A. Haque (each: Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology); https://doi.org/10.1007/s44163-025-00251-7; Bangla OCR, Character detection, Character recognition, Word recognition, YOLO, EfficientNet |
| title | A hybrid approach to Bangla handwritten OCR: combining YOLO and an advanced CNN |
| topic | Bangla OCR, Character detection, Character recognition, Word recognition, YOLO, EfficientNet |
| url | https://doi.org/10.1007/s44163-025-00251-7 |
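
The abstract reports results as Character Error Rate (CER), for example 0.0247 for the proposed pipeline against 0.1389 for the Vision API. As a rough illustration of how such a figure is commonly computed, the sketch below uses the standard definition: Levenshtein edit distance between reference and recognized text, divided by the reference length. The record does not say how the authors counted characters, so treating each Unicode code point as one character is an assumption here, and the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn ref into hyp (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length.

    Assumption: characters are Unicode code points, so a Bangla vowel
    sign counts separately from its base letter.
    """
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One dropped vowel sign out of five code points.
    print(cer("বাংলা", "বাংল"))
```

Counting by grapheme cluster instead of code point would change the denominator for Bangla text, so CER values from different tools are comparable only when they share the same unit of counting.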