ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned images


Bibliographic Details
Main Authors: Duy Ho Vo Hoang, Huy Vo Quoc, Bui Thanh Hung
Format: Article
Language: English
Published: PeerJ Inc. 2024-11-01
Series: PeerJ Computer Science
Subjects:
Online Access: https://peerj.com/articles/cs-2536.pdf
Description
Summary: Extracting information from scanned images is a critical task with far-reaching practical implications. Traditional methods often fall short by inadequately leveraging both image and text features, leading to less accurate and less efficient outcomes. In this study, we introduce ConBGAT, a cutting-edge model that seamlessly integrates convolutional neural networks (CNNs), Transformers, and graph attention networks to address these shortcomings. Our approach constructs detailed graphs from text regions within images, utilizing advanced Optical Character Recognition (OCR) to accurately detect and interpret characters. By combining features extracted by CNNs for images with text features from Distilled Bidirectional Encoder Representations from Transformers (DistilBERT), our model achieves a comprehensive and efficient data representation. Rigorous testing on real-world datasets shows that ConBGAT significantly outperforms existing methods, demonstrating its superior capability across multiple evaluation metrics. This advancement not only enhances accuracy but also sets a new benchmark for information extraction from scanned images.
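The graph-attention aggregation step the abstract alludes to can be illustrated with a toy sketch. This is a minimal, self-contained illustration only: the node names, the 2-d "fused" features, and the dot-product scoring function are hypothetical, and the paper's actual layer would use learned linear projections and multi-head attention (e.g. as in standard GAT implementations).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gat_layer(features, edges, score):
    """One simplified graph-attention step: each node's new feature
    vector is an attention-weighted sum of its neighbours' features.

    features -- dict node -> feature vector (list of floats)
    edges    -- dict node -> list of neighbour nodes (self included)
    score    -- pairwise scoring function (vec, vec) -> float
    """
    out = {}
    for node, nbrs in edges.items():
        # Attention coefficients: softmax of scores against each neighbour.
        attn = softmax([score(features[node], features[n]) for n in nbrs])
        dim = len(features[node])
        agg = [0.0] * dim
        for a, n in zip(attn, nbrs):
            for i in range(dim):
                agg[i] += a * features[n][i]
        out[node] = agg
    return out

# Toy graph of three text regions with hypothetical 2-d fused features.
feats = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [1.0, 1.0]}
edges = {"r1": ["r1", "r2"], "r2": ["r1", "r2", "r3"], "r3": ["r2", "r3"]}
dot = lambda u, v: sum(a * b for a, b in zip(u, v))
updated = gat_layer(feats, edges, dot)
```

In a model like the one described, each node would represent an OCR-detected text region, with its feature vector formed by fusing the CNN image embedding and the DistilBERT text embedding before attention-based message passing.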
ISSN:2376-5992