Development of OCR Service for Page-Level Recognition for Camera-Captured Document Images
The emergence of Large Language Models (LLMs) has driven significant advancements in Natural Language Processing (NLP) and introduced new text-related applications, such as Visual Question Answering (VQA). As a result, there is a growing need for Optical Character Recognition (OCR) systems that can...
| Main Authors: | Junyoung Park, Wonjun Kang, Seonji Park, Keuntek Lee, Hyung Il Koo, Nam Ik Cho |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Document image processing; layout analysis; optical character recognition; scene text detection |
| Online Access: | https://ieeexplore.ieee.org/document/11007558/ |
| _version_ | 1849761232974774272 |
|---|---|
| author | Junyoung Park; Wonjun Kang; Seonji Park; Keuntek Lee; Hyung Il Koo; Nam Ik Cho |
| collection | DOAJ |
| description | The emergence of Large Language Models (LLMs) has driven significant advancements in Natural Language Processing (NLP) and introduced new text-related applications, such as Visual Question Answering (VQA). As a result, there is a growing need for Optical Character Recognition (OCR) systems that can extract textual contents from document images for LLM applications. However, most existing methods have primarily focused on scene text or well-structured document images, and typically limit text detection and recognition to the word level. In this paper, we propose a novel OCR framework capable of detecting and recognizing text at both the text-line and text-block levels. Specifically, we design a new deep neural network (DNN) to replace the Connected Component (CC) extraction and state estimation processes used in conventional methods. Despite being trained solely on synthetic datasets, the proposed OCR system performs robust text detection and layout analysis. Furthermore, we propose a recognition metric to evaluate content preservation in OCR systems and introduce a new OCR benchmark consisting of camera-captured document images. Our method demonstrates superior performance on this benchmark, outperforming existing OCR APIs. |
| format | Article |
| id | doaj-art-0db1302502a948a3b70503dc02ac606a |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-0db1302502a948a3b70503dc02ac606a; 2025-08-20T03:06:05Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 91263-91275; DOI 10.1109/ACCESS.2025.3572001; 11007558; Development of OCR Service for Page-Level Recognition for Camera-Captured Document Images; Junyoung Park; Wonjun Kang (https://orcid.org/0009-0009-6198-7906); Seonji Park; Keuntek Lee (https://orcid.org/0000-0003-4901-7842); Hyung Il Koo (https://orcid.org/0000-0002-6955-8083); Nam Ik Cho (https://orcid.org/0000-0001-5297-4649); affiliations in author order: (1-4) Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea; (5) FuriosaAI, Seoul, South Korea; (6) Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea; https://ieeexplore.ieee.org/document/11007558/; Document image processing; layout analysis; optical character recognition; scene text detection |
| title | Development of OCR Service for Page-Level Recognition for Camera-Captured Document Images |
| topic | Document image processing; layout analysis; optical character recognition; scene text detection |
| url | https://ieeexplore.ieee.org/document/11007558/ |