Visual Complexity in Korean Documents: Toward Language-Specific Datasets for Deep Learning-Based Forgery Detection
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/8/4319 |
| Summary: | Recent advancements in information and communication technology have driven various organizations, including businesses, government agencies, and institutions, to digitize and manage critical documents. Document digitization mitigates spatial constraints on storage and offers significant advantages in transmission and management. However, the development of image processing software has also increased the risk of forgery and manipulation of digital documents. Digital documents, ranging from everyday documents to those handled by major institutions, can become targets of forgery, and the unrestricted distribution of forged documents may cause social disruption. As a result, research on digital document forgery detection has been actively conducted in various countries, with recent studies focusing on improving detection accuracy using deep learning techniques. However, most document image datasets created for developing deep learning models consist of English documents. Consequently, forgery detection models trained on these English-based datasets may perform well on English documents but may not achieve the same level of accuracy on documents in other languages. This study systematically examines the necessity of language-specific datasets by analyzing the impact of visual complexity on forgery detection accuracy. Specifically, it analyzes differences in forgery characteristics between English and Korean documents as representative cases and evaluates the classification performance of a forgery detection model trained on an English dataset when applied to both English and Korean documents. The experimental results indicate that forged document images exhibit distinct visual alterations depending on the language. Furthermore, the detection performance of models trained on English-based datasets varies according to the language of the training and test data. These findings underscore the necessity of developing datasets and model architectures tailored to the linguistic and structural characteristics of each language to enhance forgery detection efficacy. Additionally, the results highlight the importance of multilingual datasets in deep learning-based forgery detection, providing a foundation for the advancement of language-specific detection models. |
| ISSN: | 2076-3417 |
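The summary describes scoring one English-trained forgery detector separately on English and Korean documents. The snippet below is a minimal sketch of that kind of per-language evaluation, not the authors' code. It assumes each test image yields a hypothetical `(language, y_true, y_pred)` triple with the binary convention 1 = forged, 0 = genuine; the function name `per_language_metrics` and the `Metrics` type are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float  # for the "forged" (positive) class
    recall: float     # for the "forged" (positive) class
    f1: float


def per_language_metrics(records):
    """Group (language, y_true, y_pred) triples by language and score each group.

    Hypothetical convention: labels are binary, 1 = forged, 0 = genuine.
    """
    # Confusion-matrix counts per language.
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for lang, y_true, y_pred in records:
        c = counts[lang]
        if y_pred == 1 and y_true == 1:
            c["tp"] += 1
        elif y_pred == 1 and y_true == 0:
            c["fp"] += 1
        elif y_pred == 0 and y_true == 0:
            c["tn"] += 1
        else:  # predicted genuine, actually forged
            c["fn"] += 1

    results = {}
    for lang, c in counts.items():
        total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        # Guard against empty denominators by reporting 0.0.
        precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[lang] = Metrics((c["tp"] + c["tn"]) / total, precision, recall, f1)
    return results
```

Reporting each language as its own group, rather than pooling all test images, is what makes a language-dependent gap visible: a pooled score can look acceptable while one language subset performs poorly. Any records passed in are the reader's own predictions; the sketch computes no results from the study.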