A hybrid approach to Bangla handwritten OCR: combining YOLO and an advanced CNN

Bibliographic Details
Main Authors: Aye T. Maung, Sumaiya Salekin, Mohammad A. Haque
Format: Article
Language: English
Published: Springer 2025-06-01
Series: Discover Artificial Intelligence
Subjects: Bangla OCR; Character detection; Character recognition; Word recognition; YOLO; EfficientNet
Online Access: https://doi.org/10.1007/s44163-025-00251-7
author Aye T. Maung
Sumaiya Salekin
Mohammad A. Haque
collection DOAJ
description Abstract Optical Character Recognition (OCR) plays a vital role in automating data entry from handwritten forms into digital systems. However, a significant gap exists in the research on OCR techniques tailored for handwritten texts in complex languages such as Bangla. Challenges in Bangla script arise from the presence of modifiers, compound characters, and diacritic marks, making accurate recognition difficult. Our research introduces a scalable and effective OCR pipeline for Bangla handwritten documents that addresses these complexities. The proposed pipeline leverages the YOLO (You Only Look Once) model for character detection, accurately isolating base alphabets, consonant conjuncts, and characters with modifiers (matras). For character recognition, the pipeline utilizes the EfficientNet-B4 model, which demonstrated a recognition accuracy of 93.87% for grapheme roots, 98.22% for vowel diacritics, and 98.0% for consonant diacritics on publicly available datasets, combined and adapted for our use. Additionally, the system’s resilience was enhanced using a Word2Vec-based spelling correction layer, reducing the Character Error Rate (CER) from 10.37% to 2.47%. Comparative evaluations on in-house data show that the proposed pipeline with spelling correction achieves the highest precision (0.9701) and lowest CER (0.0247), outperforming the Google Cloud Vision API’s OCR. In contrast, the Vision API has the highest CER (0.1389) and lower precision (0.8220), highlighting the effectiveness of the proposed approach for Bangla OCR.
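The abstract reports Character Error Rate (CER) figures without defining the metric. As a minimal sketch, assuming the common definition (Levenshtein edit distance between the recognized text and the reference, divided by the reference length), CER could be computed as below. The function names `levenshtein` and `character_error_rate` are hypothetical, and the record does not say whether the authors count Unicode code points or grapheme clusters; this sketch counts code points, so a Bangla character with a modifier counts as more than one unit.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed so
    # far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference.

    Assumption: characters are Unicode code points, not grapheme clusters.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(character_error_rate("abcd", "abxd"))
```

Under this definition, a CER of 0.0247 corresponds to roughly 2.5 character edits per 100 reference characters.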
format Article
id doaj-art-cbc52f1873054901ab8ecabba8a24a6e
institution DOAJ
issn 2731-0809
language English
publishDate 2025-06-01
publisher Springer
record_format Article
series Discover Artificial Intelligence
affiliation Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology (all three authors)
title A hybrid approach to Bangla handwritten OCR: combining YOLO and an advanced CNN
topic Bangla OCR
Character detection
Character recognition
Word recognition
YOLO
EfficientNet
url https://doi.org/10.1007/s44163-025-00251-7