Category semantic and global relation distillation for object detection


Bibliographic Details
Main Authors: Yanpeng LIANG, Zhonggui MA, Zongjie WANG, Zhuo LI
Format: Article
Language: zho
Published: Science Press, 2025-04-01
Series: 工程科学学报
Subjects:
Online Access: http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2024.04.25.001
author Yanpeng LIANG
Zhonggui MA
Zongjie WANG
Zhuo LI
collection DOAJ
description Object detection, a fundamental task in computer vision, has achieved remarkable success in domains such as autonomous driving, robotics, and facial recognition, owing to advances in convolutional neural networks. Despite these successes, state-of-the-art object detection models often have a large number of parameters, pushing the limits of modern hardware and posing challenges for deployment on resource-constrained devices. To address this challenge, various model compression techniques have been developed, including network pruning, lightweight architecture design, neural network quantization, and knowledge distillation. Knowledge distillation stands out because it transfers knowledge from large teacher models to compact student models without modifying the network structure, enabling student models to perform nearly as well as their larger counterparts. However, most distillation techniques have been designed for image classification rather than object detection, which involves simultaneously localizing and classifying multiple objects within natural images. These objects often vary in scale, exhibit intricate interclass relationships, and are dispersed across different locations, making it difficult to balance the contributions of different elements, such as bounding box centers and backgrounds, during distillation. Consequently, incorporating knowledge distillation into object detection models poses substantial challenges. To address these challenges, this study proposes a novel attention-based knowledge distillation framework for object detection that strikes a better balance between efficiency and accuracy. The contributions are as follows: first, the study introduces category semantic attention to accurately identify and focus on the foreground semantic regions of each class in the output feature map of the teacher detector's neck feature pyramid.
This process effectively conveys crucial positional information for each class to the student model and helps handle multiscale targets. To mitigate differences between teacher and student feature maps, the feature maps used for distillation are normalized to zero mean and unit variance. Furthermore, to improve the handling of background information in category semantic distillation, and to address both the disrupted relationships between foreground and background regions and the overlooked relationships among targets of different classes, this study proposes a criss-cross attention mechanism. This mechanism captures long-range dependencies between target pixels in the teacher model, which are then transmitted to the student model to further enhance its detection capability. Combining these two distillation techniques, the study introduces the category semantic and global relation (CSGR) distillation approach: the first technique targets crucial foreground positions for each class, whereas the second captures global relationships among target pixels across different classes. To validate the effectiveness and generalization of the proposed method, extensive experiments were conducted on challenging benchmarks, including SODA10M, PASCAL VOC, and MiniCOCO. Across various object detectors, student models distilled with CSGR exhibit notable improvements over those trained from scratch. Compared with other baseline methods, the proposed approach achieves competitive gains in mean average precision without considerably increasing the number of parameters or FLOPs during distillation training, thereby striking a better balance between accuracy and efficiency.
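The abstract states that teacher and student feature maps are normalized to zero mean and unit variance before distillation, and that a foreground attention mask weights the loss. The paper's exact formulation is not reproduced here; the following NumPy sketch is a minimal illustration, with function names, shapes, and the MSE form chosen for the example rather than taken from the paper:

```python
import numpy as np

def normalize_feature_map(fmap, eps=1e-6):
    """Normalize each channel of a (C, H, W) feature map to zero mean,
    unit variance, so teacher/student magnitude differences cancel out."""
    mean = fmap.mean(axis=(1, 2), keepdims=True)
    std = fmap.std(axis=(1, 2), keepdims=True)
    return (fmap - mean) / (std + eps)

def masked_distill_loss(teacher_fmap, student_fmap, attention_mask):
    """Squared error between normalized teacher and student features,
    weighted per pixel by an attention mask (e.g. category-semantic
    foreground weights); shapes: (C, H, W) features, (1, H, W) mask."""
    t = normalize_feature_map(teacher_fmap)
    s = normalize_feature_map(student_fmap)
    return float((attention_mask * (t - s) ** 2).mean())
```

With identical teacher and student maps the loss is zero after normalization, which is the point of normalizing first: the student is pushed to match the spatial pattern of the teacher's features rather than their absolute scale.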
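The criss-cross attention mechanism mentioned in the abstract restricts each position's attention to the positions in its own row and column, capturing long-range dependencies more cheaply than full self-attention. The paper's exact design is not given in this record; the sketch below illustrates the general idea in NumPy, omitting the learned query/key/value projections and the duplicate-center handling of CCNet-style implementations:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def criss_cross_attention(q, k, v):
    """Single-head criss-cross attention over (C, H, W) maps: each
    position attends only to its own row and column (H + W keys),
    instead of all H * W positions as in full self-attention."""
    C, H, W = q.shape
    out = np.zeros_like(v)
    for i in range(H):
        for j in range(W):
            query = q[:, i, j]                      # (C,)
            # keys/values along the same column and row (criss-cross path)
            keys = np.concatenate([k[:, :, j], k[:, i, :]], axis=1)  # (C, H+W)
            vals = np.concatenate([v[:, :, j], v[:, i, :]], axis=1)  # (C, H+W)
            attn = softmax(query @ keys / np.sqrt(C))                # (H+W,)
            out[:, i, j] = vals @ attn
    return out
```

Per position this costs O(H + W) instead of O(HW); stacking two such passes lets information propagate between any pair of positions, which is how this style of attention approximates global context at reduced cost.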
format Article
id doaj-art-7ecfe5aeebf74c4cb732e58f61360092
institution DOAJ
issn 2095-9389
language zho
publishDate 2025-04-01
publisher Science Press
record_format Article
series 工程科学学报
spelling doaj-art-7ecfe5aeebf74c4cb732e58f61360092 (record updated 2025-08-20T03:10:07Z)
工程科学学报, Science Press, ISSN 2095-9389, published 2025-04-01, Vol. 47, Iss. 4, pp. 850-861
DOI 10.13374/j.issn2095-9389.2024.04.25.001 (article no. 240425-0001)
Category semantic and global relation distillation for object detection
Yanpeng LIANG, Zhonggui MA, Zongjie WANG, Zhuo LI (all: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China)
title Category semantic and global relation distillation for object detection
topic knowledge distillation
object detection
model compression
attention mechanism
computer vision
url http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2024.04.25.001