ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image

Bibliographic Details
Main Authors: Duy Ho Vo Hoang, Huy Vo Quoc, Bui Thanh Hung
Format: Article
Language: English
Published: PeerJ Inc. 2024-11-01
Series: PeerJ Computer Science
Subjects: Information extraction; CNN; BERT; GAT; Deep learning; Scanned image
Online Access: https://peerj.com/articles/cs-2536.pdf
collection DOAJ
description Extracting information from scanned images is a critical task with far-reaching practical implications. Traditional methods often fall short because they do not adequately exploit both image and text features, which leads to less accurate and less efficient results. In this study, we introduce ConBGAT, a model that integrates convolutional neural networks (CNNs), Transformers, and graph attention networks to address these shortcomings. Our approach constructs detailed graphs from the text regions within an image, using optical character recognition (OCR) to detect and interpret characters. By combining image features extracted by CNNs with text features from Distilled Bidirectional Encoder Representations from Transformers (DistilBERT), our model achieves a comprehensive and efficient data representation. Rigorous testing on real-world datasets shows that ConBGAT significantly outperforms existing methods across multiple evaluation metrics. This advance not only improves accuracy but also sets a new benchmark for information extraction from scanned images.
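The pipeline the description outlines (a graph over text regions, per-region image and text features, then graph attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbor edge rule, the function names (box_center, knn_edges, gat_layer), the toy feature vectors standing in for CNN and DistilBERT outputs, and the weights are all assumptions made for illustration. The attention step follows the standard single-head graph attention formulation.

```python
import math

def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def knn_edges(boxes, k=2):
    """Assumed edge rule: link each text region to its k nearest regions."""
    centers = [box_center(b) for b in boxes]
    neighbors = {}
    for i, ci in enumerate(centers):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        neighbors[i] = [i] + [j for _, j in dists[:k]]  # self-loop first
    return neighbors

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(features, neighbors, weight, attn):
    """Single-head graph attention: h_i' = sum_j alpha_ij * (W h_j)."""
    # Linear projection W h for every node.
    projected = [
        [sum(w * x for w, x in zip(row, h)) for row in weight] for h in features
    ]
    out = []
    for i, nbrs in neighbors.items():
        # Unnormalized score e_ij = LeakyReLU(a^T [W h_i || W h_j]).
        scores = [
            leaky_relu(sum(a * v for a, v in zip(attn, projected[i] + projected[j])))
            for j in nbrs
        ]
        # Softmax over the neighborhood (shifted by the max for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        dim = len(projected[i])
        out.append(
            [sum(a * projected[j][d] for a, j in zip(alphas, nbrs)) for d in range(dim)]
        )
    return out

# Toy inputs: three OCR boxes with made-up per-region features.
boxes = [(0, 0, 10, 5), (12, 0, 30, 5), (0, 20, 10, 25)]
image_feats = [[0.1, 0.3], [0.2, 0.1], [0.4, 0.0]]  # stand-in for CNN outputs
text_feats = [[1.0], [0.5], [0.0]]                  # stand-in for DistilBERT outputs
features = [img + txt for img, txt in zip(image_feats, text_feats)]  # concatenate

weight = [[0.5, 0.0, 0.1], [0.0, 0.5, 0.2]]  # hypothetical 2x3 projection W
attn = [0.1, -0.2, 0.3, 0.4]                 # hypothetical attention vector a

neighbors = knn_edges(boxes, k=1)
updated = gat_layer(features, neighbors, weight, attn)
```

Concatenating the two feature vectors is only one plausible way to fuse modalities. The record does not say how ConBGAT fuses them or how it defines edges.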
id doaj-art-cbfb35cdfd9444aa9fab31e7a2892892
institution OA Journals
issn 2376-5992
citation PeerJ Computer Science, vol. 10, e2536 (2024-11-01); ISSN 2376-5992; DOI 10.7717/peerj-cs.2536
topic Information extraction
CNN
BERT
GAT
Deep learning
Scanned image