Development of OCR Service for Page-Level Recognition for Camera-Captured Document Images

The emergence of Large Language Models (LLMs) has driven significant advancements in Natural Language Processing (NLP) and introduced new text-related applications, such as Visual Question Answering (VQA). As a result, there is a growing need for Optical Character Recognition (OCR) systems that can...

Bibliographic Details
Main Authors: Junyoung Park, Wonjun Kang, Seonji Park, Keuntek Lee, Hyung Il Koo, Nam Ik Cho
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Document image processing; layout analysis; optical character recognition; scene text detection
Online Access: https://ieeexplore.ieee.org/document/11007558/
author Junyoung Park
Wonjun Kang
Seonji Park
Keuntek Lee
Hyung Il Koo
Nam Ik Cho
author_sort Junyoung Park
collection DOAJ
description The emergence of Large Language Models (LLMs) has driven significant advancements in Natural Language Processing (NLP) and introduced new text-related applications, such as Visual Question Answering (VQA). As a result, there is a growing need for Optical Character Recognition (OCR) systems that can extract textual contents from document images for LLM applications. However, most existing methods have primarily focused on scene text or well-structured document images, and typically limit text detection and recognition to the word level. In this paper, we propose a novel OCR framework capable of detecting and recognizing text at both the text-line and text-block levels. Specifically, we design a new deep neural network (DNN) to replace the Connected Component (CC) extraction and state estimation processes used in conventional methods. Despite being trained solely on synthetic datasets, the proposed OCR system performs robust text detection and layout analysis. Furthermore, we propose a recognition metric to evaluate content preservation in OCR systems and introduce a new OCR benchmark consisting of camera-captured document images. Our method demonstrates superior performance on this benchmark, outperforming existing OCR APIs.
format Article
id doaj-art-0db1302502a948a3b70503dc02ac606a
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Record ID: doaj-art-0db1302502a948a3b70503dc02ac606a
Timestamp: 2025-08-20T03:06:05Z
Language: eng
Publisher: IEEE
Series: IEEE Access
ISSN: 2169-3536
Published: 2025-01-01
Volume: 13
Pages: 91263-91275
DOI: 10.1109/ACCESS.2025.3572001
IEEE Xplore document: 11007558
Title: Development of OCR Service for Page-Level Recognition for Camera-Captured Document Images
Authors: Junyoung Park [0]; Wonjun Kang [1] (https://orcid.org/0009-0009-6198-7906); Seonji Park [2]; Keuntek Lee [3] (https://orcid.org/0000-0003-4901-7842); Hyung Il Koo [4] (https://orcid.org/0000-0002-6955-8083); Nam Ik Cho [5] (https://orcid.org/0000-0001-5297-4649)
Affiliations (in record order): Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea; Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea; Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea; Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea; FuriosaAI, Seoul, South Korea; Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea
URL: https://ieeexplore.ieee.org/document/11007558/
Keywords: Document image processing; layout analysis; optical character recognition; scene text detection
title Development of OCR Service for Page-Level Recognition for Camera-Captured Document Images
topic Document image processing
layout analysis
optical character recognition
scene text detection
url https://ieeexplore.ieee.org/document/11007558/