HCT-Det: A High-Accuracy End-to-End Model for Steel Defect Detection Based on Hierarchical CNN–Transformer Features

Surface defect detection is essential for ensuring the quality and safety of steel products. While Transformer-based methods have achieved state-of-the-art performance, they face several limitations, including high computational costs due to the quadratic complexity of the attention mechanism, inade...

Bibliographic Details
Main Authors: Xiyin Chen, Xiaohu Zhang, Yonghua Shi, Junjie Pang
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/5/1333
collection DOAJ
description Surface defect detection is essential for ensuring the quality and safety of steel products. While Transformer-based methods have achieved state-of-the-art performance, they face several limitations, including high computational costs due to the quadratic complexity of the attention mechanism, inadequate detection accuracy for small-scale defects due to substantial downsampling, inconsistencies between classification scores and localization confidence, and feature resolution loss caused by simple upsampling and downsampling strategies. To address these challenges, we propose the HCT-Det model, which incorporates a window-based self-attention residual (WSA-R) block structure. This structure combines window-based self-attention (WSA) blocks to reduce computational overhead and parallel residual convolutional (Res) blocks to enhance local feature continuity. The model’s backbone generates three cross-scale features as encoder inputs, which undergo Intra-Scale Feature Interaction (ISFI) and Cross-Scale Feature Interaction (CSFI) to improve detection accuracy for targets of various sizes. A Soft IoU-Aware mechanism ensures alignment between classification scores and intersection-over-union (IoU) metrics during training. Additionally, Hybrid Downsampling (HDownsample) and Hybrid Upsampling (HUpsample) modules minimize feature degradation. Our experiments demonstrate that HCT-Det achieved a mean average precision (mAP@0.5) of 0.795 on the NEU-DET dataset and 0.733 on the GC10-DET dataset, outperforming other state-of-the-art approaches. These results highlight the model’s effectiveness in improving computational efficiency and detection accuracy for steel surface defect detection.
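The abstract leans on two quantities that are easy to make concrete: the intersection-over-union (IoU) that underlies both mAP@0.5 and IoU-aware classification targets, and the cost gap between global and window-based self-attention. The Python sketch below is a generic illustration under stated assumptions, not the HCT-Det implementation: the function names, the `(x1, y1, x2, y2)` box format, the non-overlapping window layout, and the soft-label rule (the true class receives the IoU instead of 1) are all hypothetical choices made for this example, and the record does not describe how the paper's Soft IoU-Aware mechanism is actually defined.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_aware_target(num_classes, gt_class, iou):
    """Hypothetical soft label: the true class gets the IoU, all others 0."""
    target = [0.0] * num_classes
    target[gt_class] = iou
    return target


def attention_pairs(height, width, window=None):
    """Count query-key pairs for global or non-overlapping window attention."""
    tokens = height * width
    if window is None:
        # Global attention: every token attends to every token.
        return tokens * tokens
    if height % window or width % window:
        raise ValueError("feature map must be divisible by the window size")
    num_windows = (height // window) * (width // window)
    # Each window attends only within its own window * window tokens.
    return num_windows * (window * window) ** 2


if __name__ == "__main__":
    pred, truth = (0, 0, 10, 10), (0, 0, 10, 5)
    iou = box_iou(pred, truth)
    print("IoU:", iou, "counts as a match at the 0.5 threshold:", iou >= 0.5)
    print("soft target:", iou_aware_target(6, 2, iou))
    print("global pairs:", attention_pairs(64, 64))
    print("window pairs:", attention_pairs(64, 64, window=8))
```

With the illustrative 64 x 64 token map and 8 x 8 windows used above, the pair count falls from 16,777,216 to 262,144, a factor of 64, which is the kind of saving that motivates window-based attention in general.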
id doaj-art-7a1f33e8562a4efbaebf8334bcaa4fe0
institution DOAJ
issn 1424-8220
spelling doaj-art-7a1f33e8562a4efbaebf8334bcaa4fe0; 2025-08-20T02:59:01Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2025-02-01; vol. 25, no. 5, art. 1333; DOI 10.3390/s25051333
Xiyin Chen (School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, China)
Xiaohu Zhang (Hymson Laser Technology Group Co., Ltd., Shenzhen 518110, China)
Yonghua Shi (School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, China)
Junjie Pang (School of Mechanical and Automotive Engineering, South China University of Technology, Guangzhou 510640, China)
topic transformer
convolutional neural network
defect detection