Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection

Bibliographic Details
Main Authors: Junning Xu, Sanxin Jiang
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Journal of Imaging
Subjects:
Online Access: https://www.mdpi.com/2313-433X/11/4/102
collection DOAJ
description Traditional knowledge distillation methods for industrial anomaly detection have two problems. First, they mostly align features only between the same layers of the teacher and the student. Second, the teacher and student models are usually built with similar or even identical structures, which limits their ability to represent anomalies in diverse ways. To address these issues, this work proposes a Hierarchical Knowledge Transfer (HKT) framework for detecting industrial surface anomalies. First, HKT uses the deep knowledge in the highest feature layer of the teacher network to guide student learning at every level, which enables cross-layer interaction. Multiple projectors built into the model help the teacher transfer knowledge to each layer of the student. Second, the structural symmetry between teacher and student is broken by embedding Convolutional Block Attention Modules (CBAM) in the student network. Finally, a more powerful anomaly detection model, HKT+, is developed on top of HKT. By adding two convolutional layers to the teacher and student networks of HKT, HKT+ improves detection at the cost of a relatively small increase in model parameters. Experiments on the MVTec AD and BeanTech AD (BTAD) datasets show that HKT+ achieves average area under the receiver operating characteristic curve (AUROC) scores of 98.69% and 94.58%, respectively, outperforming most current state-of-the-art methods.
id doaj-art-05e7bcdc600f4b0ea7d64cb392592009
institution DOAJ
issn 2313-433X
doi 10.3390/jimaging11040102
volume 11
issue 4
article_number 102
author_affiliation College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China (both authors)
topic anomaly detection
knowledge distillation
cross-layer transfer
attention mechanism
feature extraction
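
The cross-layer transfer described in the abstract (one top-layer teacher feature guiding every student layer through its own projector) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the loss function or the form of the projectors, so the linear projectors, the cosine-distance loss, and the names project, cosine_distance, and cross_layer_loss are all assumptions made for illustration.

```python
import math

# Hypothetical sketch: linear projectors and a cosine-distance loss are
# assumptions; the record does not specify either.

def project(matrix, vec):
    """Apply a linear projector (a list of weight rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def cosine_distance(a, b):
    """Return 1 - cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def cross_layer_loss(teacher_top, student_feats, projectors):
    """Compare every student layer with the projected top-layer teacher feature."""
    return sum(
        cosine_distance(project(p, teacher_top), s)
        for p, s in zip(projectors, student_feats)
    )

# Toy data: one 2-d teacher feature, two student layers of width 2 and 3.
teacher_top = [1.0, 2.0]
projectors = [
    [[1.0, 0.0], [0.0, 1.0]],              # maps 2-d to 2-d (identity)
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # maps 2-d to 3-d
]
aligned_student = [[2.0, 4.0], [1.0, 2.0, 3.0]]
opposed_student = [[-1.0, -2.0], [-1.0, -2.0, -3.0]]

loss_aligned = cross_layer_loss(teacher_top, aligned_student, projectors)
loss_opposed = cross_layer_loss(teacher_top, opposed_student, projectors)
```

In this toy setup the loss is near zero when every student layer points in the same direction as its projected teacher feature, and it grows as the layers diverge. At test time, a per-layer distance of this kind is the sort of quantity a distillation-based detector could use as an anomaly score, though the record does not say how HKT scores anomalies.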