Direction-Aware Lightweight Framework for Traditional Mongolian Document Layout Analysis

Bibliographic Details
Main Authors: Chenyang Zhou, Monghjaya Ha, Licheng Wu
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/8/4594
Description
Summary: Traditional Mongolian document layout analysis faces unique challenges due to its vertical writing system and complex structural arrangements. Existing methods often struggle with the directional nature of traditional Mongolian text and require substantial computational resources. In this paper, we propose a direction-aware lightweight framework that effectively addresses these challenges. Our framework introduces three key innovations: a modified MobileNetV3 backbone with asymmetric convolutions for efficient vertical feature extraction, a dynamic feature enhancement module with channel attention for adaptive multi-scale information fusion, and a direction-aware detection head with a $(\sin\theta, \cos\theta)$ vector representation for accurate orientation modeling. We evaluate our method on TMDLAD, a newly constructed traditional Mongolian document layout analysis dataset, comparing it with both heavy ResNet-50-based models and lightweight alternatives. The experimental results demonstrate that our approach achieves state-of-the-art performance, reaching 0.715 mAP and 92.3% direction accuracy with a mean absolute error of only 2.5°, while maintaining high efficiency at 28.6 FPS using only 8.3 M parameters. Our model outperforms the best ResNet-50-based model by 3.6% in mAP and the best lightweight model by 4.3% in mAP, while uniquely providing direction prediction capability that other lightweight models lack. The proposed framework significantly outperforms existing methods in both accuracy and efficiency, providing a practical solution for traditional Mongolian document layout analysis that can be extended to other vertical writing systems.
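The summary mentions a (sin θ, cos θ) vector representation for orientation and reports a mean absolute error in degrees. The sketch below illustrates why such an encoding is convenient: it avoids the discontinuity at 0°/360°, and an angle can be recovered with `atan2` even when the predicted vector is not unit length. This record does not describe how the paper decodes angles or computes its error, so the function names and the wraparound-aware error used here are assumptions for illustration only, not the authors' implementation.

```python
import math

def encode_angle(theta_deg):
    """Map an angle in degrees to a (sin, cos) pair (hypothetical helper)."""
    t = math.radians(theta_deg)
    return (math.sin(t), math.cos(t))

def decode_angle(sin_v, cos_v):
    """Recover an angle in [0, 360) from a (sin, cos) pair.

    atan2 depends only on the direction of the vector, so the pair
    does not need to be normalized first.
    """
    return math.degrees(math.atan2(sin_v, cos_v)) % 360.0

def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees.

    Assumed metric: treats 359 and 1 as 2 degrees apart, not 358.
    """
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)

def mean_absolute_angular_error(preds, trues):
    """Average wraparound-aware error over paired predictions and targets."""
    return sum(angular_error(p, t) for p, t in zip(preds, trues)) / len(preds)
```

For example, `decode_angle(*encode_angle(350.0))` recovers approximately 350, and `angular_error(359.0, 1.0)` gives 2.0 rather than 358.0, which is the behavior a raw scalar angle regression would not provide near the wraparound point.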
ISSN: 2076-3417