LMGDoc: Light Multigranular GNN for Efficient Document Understanding

Bibliographic Details
Main Authors: Abdellatif Sassioui, Yasser Elouargui, Mohamed El Kamili, Rachid Benouini, El Mehdi Benyoussef, Meriyem Chergui, Mohammed Ouzzif
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10994464/
Description
Summary: Document understanding is a critical task in extracting structured information from documents such as forms, receipts, and reports. Visually Rich Documents (VRDs) present unique challenges due to their complex layouts, heterogeneous content (text, tables, images), and diverse structural patterns. While recent transformer-based models have achieved strong performance, they typically rely on large-scale pretraining and have high computational demands, limiting their usability in real-time or resource-constrained settings. In this paper, we propose LMGDoc, a lightweight model for understanding and processing VRDs. It comprises three main components: a Feature Extractor that uses GloVe for text embedding together with a novel layout encoding method, a Graph Constructor that represents document structure at multiple levels of granularity (word, region, and page), and a graph-convolution-based Graph Neural Network (GNN). LMGDoc achieves competitive results on classification and Key Information Extraction (KIE) benchmarks with only 0.4M parameters, far fewer than state-of-the-art models.
ISSN: 2169-3536
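
Illustrative sketch: the summary describes a graph built at word, region, and page granularity and then processed by a GNN. The Python below is a minimal sketch of that general idea, not the authors' implementation. The input schema (`text` and `region` keys), the function names `build_multigranular_graph` and `mean_aggregate`, and the edge scheme (page to region, region to word) are all assumptions made for illustration. The aggregation step is a parameter-free neighbour mean standing in for a learned graph convolution; GloVe embeddings and the paper's layout encoding are not reproduced.

```python
from collections import defaultdict


def build_multigranular_graph(words):
    """Build a three-level graph: page -> regions -> words.

    `words` is a list of dicts with hypothetical keys 'text' and 'region'.
    Returns (nodes, edges): nodes are (level, label) tuples, edges are
    sorted (parent_index, child_index) pairs.
    """
    nodes = [("page", "page")]  # node 0 is the single page node
    edges = set()
    region_index = {}

    # First pass: one node per distinct region, linked to the page.
    for w in words:
        region = w["region"]
        if region not in region_index:
            region_index[region] = len(nodes)
            nodes.append(("region", region))
            edges.add((0, region_index[region]))

    # Second pass: one node per word, linked to its region.
    for w in words:
        word_idx = len(nodes)
        nodes.append(("word", w["text"]))
        edges.add((region_index[w["region"]], word_idx))

    return nodes, sorted(edges)


def mean_aggregate(features, edges):
    """One message-passing step without learned weights (an assumption).

    Each node's new feature vector is the mean of its own vector and the
    vectors of its neighbours, treating edges as undirected.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    updated = []
    for i, own in enumerate(features):
        group = [i] + sorted(neighbours[i])
        updated.append(
            [sum(features[j][d] for j in group) / len(group)
             for d in range(len(own))]
        )
    return updated


if __name__ == "__main__":
    demo_words = [
        {"text": "Total", "region": "footer"},
        {"text": "42.00", "region": "footer"},
        {"text": "Invoice", "region": "header"},
    ]
    nodes, edges = build_multigranular_graph(demo_words)
    # Toy one-dimensional features; a real system would use text and layout embeddings.
    feats = [[float(i)] for i in range(len(nodes))]
    print(nodes)
    print(edges)
    print(mean_aggregate(feats, edges))
```

In this sketch, information from a word reaches the page node after two aggregation steps (word to region, then region to page), which is one plausible reading of why a multigranular graph can stay small while still mixing local and global context.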