HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images

Bibliographic Details
Main Authors: Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, Hyun-Soo Kang
Format: Article
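Language: English
Published: MDPI AG 2025-01-01
Series: Mathematics
Subjects: table detection; vision transformer; document processing; multiscale feature extraction; deformable attention; document image analysis
Online Access: https://www.mdpi.com/2227-7390/13/2/266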
author Mahmoud SalahEldin Kasem
Mohamed Mahmoud
Bilel Yagoub
Mostafa Farouk Senussi
Mahmoud Abdalla
Hyun-Soo Kang
collection DOAJ
description Table detection in document images is a challenging problem due to diverse layouts, irregular structures, and embedded graphical elements. In this study, we present HTTD (Hierarchical Transformer for Table Detection), a cutting-edge model that combines a Swin-L Transformer backbone with advanced Transformer-based mechanisms to achieve superior performance. HTTD addresses three key challenges: handling diverse document layouts, including historical and modern structures; improving computational efficiency and training convergence; and demonstrating adaptability to non-standard tasks like medical imaging and receipt key detection. Evaluated on benchmark datasets, HTTD achieves state-of-the-art results, with precision rates of 96.98% on ICDAR-2019 cTDaR, 96.43% on TNCR, and 93.14% on TabRecSet. These results validate its effectiveness and efficiency, paving the way for advanced document analysis and data digitization tasks.
format Article
id doaj-art-e73292d4f5fb401c941a408cc8f0cc9d
institution Kabale University
issn 2227-7390
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj-art-e73292d4f5fb401c941a408cc8f0cc9d
2025-01-24T13:39:57Z
eng
MDPI AG
Mathematics
2227-7390
2025-01-01
vol. 13, no. 2, art. 266
10.3390/math13020266
HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images
Mahmoud SalahEldin Kasem
Mohamed Mahmoud
Bilel Yagoub
Mostafa Farouk Senussi
Mahmoud Abdalla
Hyun-Soo Kang
Affiliation (all six authors): Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Republic of Korea
https://www.mdpi.com/2227-7390/13/2/266
table detection
vision transformer
document processing
multiscale feature extraction
deformable attention
document image analysis
title HTTD: A Hierarchical Transformer for Accurate Table Detection in Document Images
topic table detection
vision transformer
document processing
multiscale feature extraction
deformable attention
document image analysis
url https://www.mdpi.com/2227-7390/13/2/266