Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection

Traditional knowledge distillation methods for industrial anomaly detection have two problems. First, they mostly align features between the same layers. Second, similar or even identical structures are usually used to build teacher-student models, thus limi...

Bibliographic Details
Main Authors:	Junning Xu, Sanxin Jiang
Format:	Article
Language:	English
Published:	MDPI AG 2025-03-01
Series:	Journal of Imaging
Subjects:	anomaly detection, knowledge distillation, cross-layer transfer, attention mechanism, feature extraction
Online Access:	https://www.mdpi.com/2313-433X/11/4/102

_version_	1849714275354935296
author	Junning Xu Sanxin Jiang
author_facet	Junning Xu Sanxin Jiang
author_sort	Junning Xu
collection	DOAJ
description	Traditional knowledge distillation methods for industrial anomaly detection have two problems. First, they mostly align features between the same layers of the teacher and student. Second, they usually build the teacher and student models from similar or even identical structures, which limits the ability to represent anomalies in multiple ways. To address these issues, this work proposes a Hierarchical Knowledge Transfer (HKT) framework for detecting industrial surface anomalies. First, HKT uses the deep knowledge of the highest feature layer in the teacher’s network to guide student learning at every level, thus enabling cross-layer interactions. Multiple projectors are built inside the model to help the teacher transfer knowledge to each layer of the student. Second, the teacher-student structural symmetry is decoupled by embedding Convolutional Block Attention Modules (CBAM) in the student network. Finally, based on HKT, a more powerful anomaly detection model, HKT+, is developed. By adding two convolutional layers to the teacher and student networks of HKT, HKT+ achieves enhanced detection capabilities at the cost of a relatively small increase in model parameters. Experiments on the MVTec AD and BeanTech AD (BTAD) datasets show that HKT+ achieves average area under the receiver operating characteristic curve (AUROC) scores of 98.69% and 94.58%, respectively, outperforming most current state-of-the-art methods.
format	Article
id	doaj-art-05e7bcdc600f4b0ea7d64cb392592009
institution	DOAJ
issn	2313-433X
language	English
publishDate	2025-03-01
publisher	MDPI AG
record_format	Article
series	Journal of Imaging
spelling	doaj-art-05e7bcdc600f4b0ea7d64cb392592009 2025-08-20T03:13:45Z eng MDPI AG Journal of Imaging 2313-433X 2025-03-01 11 4 102 10.3390/jimaging11040102 Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection Junning Xu [0], Sanxin Jiang [1] [0] College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China [1] College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China https://www.mdpi.com/2313-433X/11/4/102 anomaly detection, knowledge distillation, cross-layer transfer, attention mechanism, feature extraction
spellingShingle	Junning Xu Sanxin Jiang Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection Journal of Imaging anomaly detection knowledge distillation cross-layer transfer attention mechanism feature extraction
title	Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection
title_full	Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection
title_fullStr	Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection
title_full_unstemmed	Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection
title_short	Hierarchical Knowledge Transfer: Cross-Layer Distillation for Industrial Anomaly Detection
title_sort	hierarchical knowledge transfer cross layer distillation for industrial anomaly detection
topic	anomaly detection knowledge distillation cross-layer transfer attention mechanism feature extraction
url	https://www.mdpi.com/2313-433X/11/4/102
work_keys_str_mv	AT junningxu hierarchicalknowledgetransfercrosslayerdistillationforindustrialanomalydetection AT sanxinjiang hierarchicalknowledgetransfercrosslayerdistillationforindustrialanomalydetection
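
The description in this record says that HKT uses the teacher's highest feature layer to guide every student layer, with one projector per student layer. The idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the linear per-position projector, the cosine-distance loss, and the assumption that all layers share the same number of spatial positions are hypothetical choices that the record does not specify. CBAM and the extra HKT+ layers are left out.

```python
import math
import random


def project(feature, weight):
    """Map each position vector through a linear projector.

    feature: list of positions, each a list of C_in floats.
    weight:  C_out rows of C_in floats (hypothetical projector weights).
    """
    return [[sum(w * x for w, x in zip(row, vec)) for row in weight]
            for vec in feature]


def cosine_distance(a, b, eps=1e-12):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def cross_layer_loss(teacher_top, student_layers, projectors):
    """Sum, over student layers, of the mean cosine distance between the
    projected teacher top-layer feature and that student layer's feature.

    Assumption: the loss form is illustrative; the record does not state it.
    """
    total = 0.0
    for student_feat, weight in zip(student_layers, projectors):
        target = project(teacher_top, weight)
        distances = [cosine_distance(t, s)
                     for t, s in zip(target, student_feat)]
        total += sum(distances) / len(distances)
    return total


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
teacher_top = random_matrix(4, 8, rng)            # 4 positions, 8 channels
student_channels = (2, 4, 6)                      # hypothetical layer widths
projectors = [random_matrix(c, 8, rng) for c in student_channels]

# A student that reproduces the projected teacher feature at every layer.
matched_student = [project(teacher_top, w) for w in projectors]
# A student with unrelated features at every layer.
random_student = [random_matrix(4, c, rng) for c in student_channels]

matched_loss = cross_layer_loss(teacher_top, matched_student, projectors)
random_loss = cross_layer_loss(teacher_top, random_student, projectors)
```

In this sketch, a student whose features equal the projected teacher feature at every layer gives a loss close to zero, while unrelated features give a larger one. At test time, a distillation-based detector would typically treat a large discrepancy of this kind as a sign of an anomaly, though the record does not describe the scoring step used by HKT.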


Similar Items