HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features
Surface defect detection is essential for ensuring the quality and safety of steel products. While Transformer-based methods have achieved state-of-the-art performance, they face several limitations, including high computational costs due to the quadratic complexity of the attention mechanism, inade...
| Main Authors: | Xiyin Chen, Xiaohu Zhang, Yonghua Shi, Junjie Pang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | Sensors |
| Subjects: | transformer; convolutional neural network; defect detection |
| Online Access: | https://www.mdpi.com/1424-8220/25/5/1333 |
| _version_ | 1850031195540160512 |
|---|---|
| author | Xiyin Chen; Xiaohu Zhang; Yonghua Shi; Junjie Pang |
| author_facet | Xiyin Chen; Xiaohu Zhang; Yonghua Shi; Junjie Pang |
| author_sort | Xiyin Chen |
| collection | DOAJ |
| description | Surface defect detection is essential for ensuring the quality and safety of steel products. While Transformer-based methods have achieved state-of-the-art performance, they face several limitations, including high computational costs due to the quadratic complexity of the attention mechanism, inadequate detection accuracy for small-scale defects due to substantial downsampling, inconsistencies between classification scores and localization confidence, and feature resolution loss caused by simple upsampling and downsampling strategies. To address these challenges, we propose the HCT-Det model, which incorporates a window-based self-attention residual (WSA-R) block structure. This structure combines window-based self-attention (WSA) blocks to reduce computational overhead and parallel residual convolutional (Res) blocks to enhance local feature continuity. The model’s backbone generates three cross-scale features as encoder inputs, which undergo Intra-Scale Feature Interaction (ISFI) and Cross-Scale Feature Interaction (CSFI) to improve detection accuracy for targets of various sizes. A Soft IoU-Aware mechanism ensures alignment between classification scores and intersection-over-union (IoU) metrics during training. Additionally, Hybrid Downsampling (HDownsample) and Hybrid Upsampling (HUpsample) modules minimize feature degradation. Our experiments demonstrate that HCT-Det achieved a mean average precision (mAP@0.5) of 0.795 on the NEU-DET dataset and 0.733 on the GC10-DET dataset, outperforming other state-of-the-art approaches. These results highlight the model’s effectiveness in improving computational efficiency and detection accuracy for steel surface defect detection. |
| format | Article |
| id | doaj-art-7a1f33e8562a4efbaebf8334bcaa4fe0 |
| institution | DOAJ |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-7a1f33e8562a4efbaebf8334bcaa4fe0; 2025-08-20T02:59:01Z; eng; MDPI AG; Sensors; 1424-8220; 2025-02-01; 25; 5; 1333; 10.3390/s25051333; HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features; Xiyin Chen (0), Xiaohu Zhang (1), Yonghua Shi (2), Junjie Pang (3); (0) School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, China; (1) Hymson Laser Technology Group Co., Ltd., Shenzhen 518110, China; (2) School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, China; (3) School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, China; Surface defect detection is essential for ensuring the quality and safety of steel products. While Transformer-based methods have achieved state-of-the-art performance, they face several limitations, including high computational costs due to the quadratic complexity of the attention mechanism, inadequate detection accuracy for small-scale defects due to substantial downsampling, inconsistencies between classification scores and localization confidence, and feature resolution loss caused by simple upsampling and downsampling strategies. To address these challenges, we propose the HCT-Det model, which incorporates a window-based self-attention residual (WSA-R) block structure. This structure combines window-based self-attention (WSA) blocks to reduce computational overhead and parallel residual convolutional (Res) blocks to enhance local feature continuity. The model’s backbone generates three cross-scale features as encoder inputs, which undergo Intra-Scale Feature Interaction (ISFI) and Cross-Scale Feature Interaction (CSFI) to improve detection accuracy for targets of various sizes. A Soft IoU-Aware mechanism ensures alignment between classification scores and intersection-over-union (IoU) metrics during training. Additionally, Hybrid Downsampling (HDownsample) and Hybrid Upsampling (HUpsample) modules minimize feature degradation. Our experiments demonstrate that HCT-Det achieved a mean average precision (mAP@0.5) of 0.795 on the NEU-DET dataset and 0.733 on the GC10-DET dataset, outperforming other state-of-the-art approaches. These results highlight the model’s effectiveness in improving computational efficiency and detection accuracy for steel surface defect detection.; https://www.mdpi.com/1424-8220/25/5/1333; transformer; convolutional neural network; defect detection |
| spellingShingle | Xiyin Chen Xiaohu Zhang Yonghua Shi Junjie Pang HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features Sensors transformer convolutional neural network defect detection |
| title | HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features |
| title_full | HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features |
| title_fullStr | HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features |
| title_full_unstemmed | HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features |
| title_short | HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features |
| title_sort | hct det a high accuracy end to end model for steel defect detection based on hierarchical cnn transformer features |
| topic | transformer; convolutional neural network; defect detection |
| url | https://www.mdpi.com/1424-8220/25/5/1333 |
| work_keys_str_mv | AT xiyinchen hctdetahighaccuracyendtoendmodelforsteeldefectdetectionbasedonhierarchicalcnntransformerfeatures AT xiaohuzhang hctdetahighaccuracyendtoendmodelforsteeldefectdetectionbasedonhierarchicalcnntransformerfeatures AT yonghuashi hctdetahighaccuracyendtoendmodelforsteeldefectdetectionbasedonhierarchicalcnntransformerfeatures AT junjiepang hctdetahighaccuracyendtoendmodelforsteeldefectdetectionbasedonhierarchicalcnntransformerfeatures |
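
The description above attributes the high cost of Transformer detectors to the quadratic complexity of attention and says that window-based self-attention (WSA) reduces it. The following is a minimal sketch of that arithmetic only, not the authors' implementation: it counts query-key pairs for global attention and for attention restricted to non-overlapping square windows. The function names, the 40 x 40 feature map, and the window size of 8 are illustrative assumptions and do not come from the record.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every token attends to every other token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height: int, width: int, window: int) -> int:
    """Query-key pairs when attention stays inside non-overlapping
    window x window tiles of the feature map."""
    if height % window or width % window:
        raise ValueError("feature map must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the comparison concrete.
    h, w, m = 40, 40, 8
    full = global_attention_pairs(h, w)
    windowed = window_attention_pairs(h, w, m)
    print(f"global: {full}, windowed: {windowed}, ratio: {full // windowed}")
```

Under these assumed sizes the global count grows with the square of the token count, while the windowed count grows linearly in the token count for a fixed window size, which is the saving the description refers to.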