Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection
Traditional knowledge distillation methods for industrial anomaly detection have two problems. First, they mostly align features only between corresponding layers. Second, the teacher and student models are usually built with similar or even identical structures, thus limi...
| Main Authors: | Junning Xu, Sanxin Jiang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Journal of Imaging |
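| Subjects: | anomaly detection; knowledge distillation; cross-layer transfer; attention mechanism; feature extraction |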
| Online Access: | https://www.mdpi.com/2313-433X/11/4/102 |
| _version_ | 1849714275354935296 |
|---|---|
| author | Junning Xu; Sanxin Jiang |
| collection | DOAJ |
| description | Traditional knowledge distillation methods for industrial anomaly detection have two problems. First, they mostly align features only between corresponding layers. Second, the teacher and student models are usually built with similar or even identical structures, which limits the ability to represent anomalies in multiple ways. To address these issues, this work proposes a Hierarchical Knowledge Transfer (HKT) framework for detecting industrial surface anomalies. First, HKT uses the deep knowledge of the highest feature layer in the teacher network to guide student learning at every level, enabling cross-layer interactions; multiple projectors built inside the model help the teacher transfer knowledge to each layer of the student. Second, the teacher-student structural symmetry is decoupled by embedding Convolutional Block Attention Modules (CBAM) in the student network. Finally, based on HKT, a more powerful anomaly detection model, HKT+, is developed. By adding two additional convolutional layers to the teacher and student networks of HKT, HKT+ achieves enhanced detection capabilities at the cost of a relatively small increase in model parameters. Experiments on the MVTec AD and BeanTech AD (BTAD) datasets show that HKT+ achieves average area under the receiver operating characteristic curve (AUROC) scores of 98.69% and 94.58%, respectively, outperforming most current state-of-the-art methods. |
| format | Article |
| id | doaj-art-05e7bcdc600f4b0ea7d64cb392592009 |
| institution | DOAJ |
| issn | 2313-433X |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Journal of Imaging |
| spelling | doaj-art-05e7bcdc600f4b0ea7d64cb392592009; 2025-08-20T03:13:45Z; eng; MDPI AG; Journal of Imaging; 2313-433X; 2025-03-01; 11; 4; 102; 10.3390/jimaging11040102; Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection; Junning Xu (College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China); Sanxin Jiang (College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China); https://www.mdpi.com/2313-433X/11/4/102; anomaly detection; knowledge distillation; cross-layer transfer; attention mechanism; feature extraction |
| title | Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection |
| topic | anomaly detection; knowledge distillation; cross-layer transfer; attention mechanism; feature extraction |
| url | https://www.mdpi.com/2313-433X/11/4/102 |
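
The description above says that the teacher's highest feature layer guides every level of the student through a set of projectors. The following is a minimal sketch of that cross-layer idea, not the authors' implementation. It assumes flat feature vectors, plain linear projectors, and a cosine-distance loss summed over levels; the record specifies none of these, and all function names are hypothetical.

```python
import math


def project(matrix, vec):
    """Apply a linear projector (a list of weight rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine_distance(a, b):
    """Return 1 - cosine similarity. Assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cross_layer_loss(teacher_top, projectors, student_feats):
    """Compare the projected top-level teacher feature with each student level.

    One projector per student level maps the single deepest teacher feature
    into that level's dimensionality (an assumption made for this sketch).
    """
    return sum(
        cosine_distance(project(p, teacher_top), s)
        for p, s in zip(projectors, student_feats)
    )


# Toy data: one teacher feature, two student levels of different widths.
teacher_top = [1.0, 2.0]
projectors = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # maps 2 dims to 3 dims
    [[2.0, 0.0], [0.0, 2.0]],              # maps 2 dims to 2 dims
]
student_aligned = [[1.0, 2.0, 3.0], [2.0, 4.0]]  # matches both projections
student_off = [[1.0, 2.0, 3.0], [4.0, -2.0]]     # second level is orthogonal

loss_aligned = cross_layer_loss(teacher_top, projectors, student_aligned)
loss_off = cross_layer_loss(teacher_top, projectors, student_off)
print(loss_aligned, loss_off)
```

In distillation-based anomaly detection generally, a large teacher-student discrepancy of this kind is treated as evidence of an anomaly; whether HKT scores images exactly this way is not stated in this record.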