LMGDoc: Light Multigranular GNN for Efficient Document Understanding
| Main Authors: | , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10994464/ |
| Summary: | Document understanding is a critical task in extracting structured information from documents such as forms, receipts, and reports. Visually Rich Documents (VRDs) present unique challenges due to their complex layouts, heterogeneous content (text, tables, images), and diverse structural patterns. While recent transformer-based models have achieved strong performance, they typically rely on large-scale pretraining and have high computational demands, which limits their usability in real-time or resource-constrained settings. In this paper, we propose LMGDoc, a lightweight model for understanding and processing VRDs. It comprises three main components: a Feature Extractor that uses GloVe for text embedding together with a novel layout encoding method, a Graph Constructor that represents document structure at multiple levels of granularity (word, region, and page), and a Graph Convolution Network (GNN). LMGDoc achieves competitive results on classification and Key Information Extraction (KIE) benchmarks with only 0.4M parameters, far fewer than state-of-the-art models. |
|---|---|
| ISSN: | 2169-3536 |
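
The word, region, and page granularity described in the summary can be illustrated with a minimal sketch. This is not the paper's Graph Constructor or its GNN: the record gives no details of either, so the edge rules (each word linked to its region, each region linked to the page), the `Word` type, and the unweighted `mean_aggregate` step are all assumptions chosen only to show what a multigranular document graph might look like.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record: its text and the index of its region."""
    text: str
    region: int


def build_multigranular_graph(words, num_regions):
    """Return (labels, edges) for an assumed word/region/page hierarchy.

    Node order: all words, then all regions, then one page node.
    Edges are undirected pairs (i, j) with i < j.
    """
    n_words = len(words)
    page = n_words + num_regions
    labels = [f"word:{w.text}" for w in words]
    labels += [f"region:{r}" for r in range(num_regions)]
    labels.append("page")

    edges = set()
    for i, w in enumerate(words):
        edges.add((i, n_words + w.region))   # word belongs to its region
    for r in range(num_regions):
        edges.add((n_words + r, page))       # region belongs to the page
    return labels, sorted(edges)


def mean_aggregate(features, edges):
    """One round of neighbourhood averaging, self included.

    A stand-in for a graph convolution layer with no learned weights.
    """
    neighbours = [[i] for i in range(len(features))]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[k][d] for k in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbours
    ]


# Toy document: two words in region 0, one word in region 1.
words = [Word("Total", 0), Word("42.00", 0), Word("Date", 1)]
labels, edges = build_multigranular_graph(words, num_regions=2)

# One-dimensional toy features; region and page nodes start at zero.
features = [[1.0], [3.0], [5.0], [0.0], [0.0], [0.0]]
smoothed = mean_aggregate(features, edges)
```

In this toy setup, information from the word nodes reaches the region nodes after one aggregation round and would reach the page node after a second, which is the general motivation for connecting granularity levels in a single graph.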