ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image
Extracting information from scanned images is a critical task with far-reaching practical implications. Traditional methods often fall short by inadequately leveraging both image and text features, leading to less accurate and efficient outcomes. In this study, we introduce ConBGAT, a cutting-edge model that seamlessly integrates convolutional neural networks (CNNs), Transformers, and graph attention networks to address these shortcomings. Our approach constructs detailed graphs from text regions within images, utilizing advanced optical character recognition (OCR) to accurately detect and interpret characters. By combining image features extracted by CNNs with text features from Distilled Bidirectional Encoder Representations from Transformers (DistilBERT), our model achieves a comprehensive and efficient data representation. Rigorous testing on real-world datasets shows that ConBGAT significantly outperforms existing methods, demonstrating its superior capability across multiple evaluation metrics. This advancement not only enhances accuracy but also sets a new benchmark for information extraction from scanned images.
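The abstract describes the architecture only at a high level. As a rough illustration of the idea, the snippet below is a minimal, hypothetical sketch, assuming PyTorch, Hugging Face `transformers`, and PyTorch Geometric: each OCR-detected text region becomes a graph node whose features concatenate a CNN embedding of the region's image crop with a DistilBERT embedding of its text, and graph attention layers then classify each node. The class name `ConBGATSketch`, the feature dimensions, the tiny CNN backbone, and the graph construction are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a CNN + DistilBERT + GAT pipeline over OCR text regions.
# Not the authors' code; dimensions and backbone are placeholders for illustration.
import torch
import torch.nn as nn
from transformers import DistilBertTokenizerFast, DistilBertModel
from torch_geometric.nn import GATConv

class ConBGATSketch(nn.Module):
    def __init__(self, num_classes: int, img_dim: int = 128, txt_dim: int = 768, hidden: int = 256):
        super().__init__()
        # Small CNN over fixed-size crops of each text region (stand-in for the paper's CNN backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, img_dim),
        )
        # DistilBERT encodes the OCR transcription of each region.
        self.tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
        self.bert = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # Two graph attention layers over the text-region graph, then a per-node classifier.
        self.gat1 = GATConv(img_dim + txt_dim, hidden, heads=4, concat=True)
        self.gat2 = GATConv(hidden * 4, hidden, heads=1, concat=False)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, crops: torch.Tensor, texts: list[str], edge_index: torch.Tensor) -> torch.Tensor:
        # crops: (N, 3, H, W) image crops of the OCR-detected regions
        # texts: OCR text of each region; edge_index: (2, num_edges) graph connectivity
        img_feat = self.cnn(crops)                                  # (N, img_dim)
        tokens = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        txt_feat = self.bert(**tokens).last_hidden_state[:, 0, :]   # [CLS]-position embedding, (N, 768)
        x = torch.cat([img_feat, txt_feat], dim=-1)                 # fused node features
        x = torch.relu(self.gat1(x, edge_index))
        x = torch.relu(self.gat2(x, edge_index))
        return self.classifier(x)                                   # per-region label logits
```

For a quick experiment, `edge_index` could simply contain every ordered pair of distinct boxes (a fully connected graph); the paper constructs its graphs from the text regions themselves, so a faithful reproduction should follow the published construction.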
| Main Authors: | Duy Ho Vo Hoang, Huy Vo Quoc, Bui Thanh Hung |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | PeerJ Inc., 2024-11-01 |
| Series: | PeerJ Computer Science |
| Subjects: | Information extraction; CNN; Bert; GAT; Deep learning; Scanned image |
| Online Access: | https://peerj.com/articles/cs-2536.pdf |
| _version_ | 1850219911804092416 |
|---|---|
| author | Duy Ho Vo Hoang; Huy Vo Quoc; Bui Thanh Hung |
| author_facet | Duy Ho Vo Hoang; Huy Vo Quoc; Bui Thanh Hung |
| author_sort | Duy Ho Vo Hoang |
| collection | DOAJ |
| description | Extracting information from scanned images is a critical task with far-reaching practical implications. Traditional methods often fall short by inadequately leveraging both image and text features, leading to less accurate and efficient outcomes. In this study, we introduce ConBGAT, a cutting-edge model that seamlessly integrates convolutional neural networks (CNNs), Transformers, and graph attention networks to address these shortcomings. Our approach constructs detailed graphs from text regions within images, utilizing advanced optical character recognition (OCR) to accurately detect and interpret characters. By combining image features extracted by CNNs with text features from Distilled Bidirectional Encoder Representations from Transformers (DistilBERT), our model achieves a comprehensive and efficient data representation. Rigorous testing on real-world datasets shows that ConBGAT significantly outperforms existing methods, demonstrating its superior capability across multiple evaluation metrics. This advancement not only enhances accuracy but also sets a new benchmark for information extraction from scanned images. |
| format | Article |
| id | doaj-art-cbfb35cdfd9444aa9fab31e7a2892892 |
| institution | OA Journals |
| issn | 2376-5992 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | PeerJ Inc. |
| record_format | Article |
| series | PeerJ Computer Science |
| spelling | doaj-art-cbfb35cdfd9444aa9fab31e7a2892892; 2025-08-20T02:07:13Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2024-11-01; 10; e2536; 10.7717/peerj-cs.2536; ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image; Duy Ho Vo Hoang; Huy Vo Quoc; Bui Thanh Hung; https://peerj.com/articles/cs-2536.pdf; Information extraction; CNN; Bert; GAT; Deep learning; Scanned image |
| spellingShingle | Duy Ho Vo Hoang; Huy Vo Quoc; Bui Thanh Hung; ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image; PeerJ Computer Science; Information extraction; CNN; Bert; GAT; Deep learning; Scanned image |
| title | ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image |
| title_full | ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image |
| title_fullStr | ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image |
| title_full_unstemmed | ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image |
| title_short | ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image |
| title_sort | conbgat a novel model combining convolutional neural networks transformer and graph attention network for information extraction from scanned image |
| topic | Information extraction; CNN; Bert; GAT; Deep learning; Scanned image |
| url | https://peerj.com/articles/cs-2536.pdf |
| work_keys_str_mv | AT duyhovohoang conbgatanovelmodelcombiningconvolutionalneuralnetworkstransformerandgraphattentionnetworkforinformationextractionfromscannedimage AT huyvoquoc conbgatanovelmodelcombiningconvolutionalneuralnetworkstransformerandgraphattentionnetworkforinformationextractionfromscannedimage AT buithanhhung conbgatanovelmodelcombiningconvolutionalneuralnetworkstransformerandgraphattentionnetworkforinformationextractionfromscannedimage |