HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images
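Main Authors: | Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, Hyun-Soo Kang |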
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Mathematics |
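Subjects: | table detection; vision transformer; document processing; multiscale feature extraction; deformable attention; document image analysis |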
Online Access: | https://www.mdpi.com/2227-7390/13/2/266 |
author | Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, Hyun-Soo Kang |
---|---|
collection | DOAJ |
description | Table detection in document images is a challenging problem due to diverse layouts, irregular structures, and embedded graphical elements. In this study, we present HTTD (Hierarchical Transformer for Table Detection), a cutting-edge model that combines a Swin-L Transformer backbone with advanced Transformer-based mechanisms to achieve superior performance. HTTD addresses three key challenges: handling diverse document layouts, including historical and modern structures; improving computational efficiency and training convergence; and demonstrating adaptability to non-standard tasks like medical imaging and receipt key detection. Evaluated on benchmark datasets, HTTD achieves state-of-the-art results, with precision rates of 96.98% on ICDAR-2019 cTDaR, 96.43% on TNCR, and 93.14% on TabRecSet. These results validate its effectiveness and efficiency, paving the way for advanced document analysis and data digitization tasks. |
format | Article |
id | doaj-art-e73292d4f5fb401c941a408cc8f0cc9d |
institution | Kabale University |
issn | 2227-7390 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
series | Mathematics |
spelling | doaj-art-e73292d4f5fb401c941a408cc8f0cc9d; 2025-01-24T13:39:57Z; eng; MDPI AG; Mathematics; ISSN 2227-7390; 2025-01-01; vol. 13, no. 2, art. 266; DOI 10.3390/math13020266; HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images; Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, Hyun-Soo Kang (all: Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Republic of Korea); https://www.mdpi.com/2227-7390/13/2/266; table detection; vision transformer; document processing; multiscale feature extraction; deformable attention; document image analysis |
title | HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images |
topic | table detection; vision transformer; document processing; multiscale feature extraction; deformable attention; document image analysis |
url | https://www.mdpi.com/2227-7390/13/2/266 |