Compressing Neural Networks on Limited Computing Resources

Bibliographic Details
Main Authors: Seunghyun Lee, Dongjun Lee, Minju Hyun, Heeje Kim, Byung Cheol Song
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Deep neural network compression; filter pruning; knowledge transfer; efficient neural networks; lightweight deep learning models
Online Access: https://ieeexplore.ieee.org/document/10988545/
author Seunghyun Lee
Dongjun Lee
Minju Hyun
Heeje Kim
Byung Cheol Song
collection DOAJ
description Network compression is a crucial technique for deploying deep learning models on edge or mobile devices. However, the cost of achieving higher benchmark performance through compression keeps rising, making network compression a significant burden, especially for small industries focused on developing compact models. Specifically, existing network compression techniques often require extensive computational resources, rendering them impractical for edge devices and small-scale applications. To democratize network compression, we propose a general-purpose framework that combines novel filter pruning and knowledge distillation techniques. First, unlike conventional filter pruning methods based on static heuristics or costly neural architecture search (NAS), our method uses meta-learning to examine the importance of each gate quickly and precisely, enabling fast and stable sub-network discovery and significantly improving the pruning process. Second, to minimize the computational cost of knowledge distillation, we introduce a synthetic teacher assistant that leverages precomputed fixed knowledge, i.e., the stored feature maps and logits of the teacher network. Using fixed knowledge removes the cost of running the teacher network during student training, and transmitting it to the student via synthetic teacher assistants prevents distribution collapse. Our framework dramatically reduces compression overhead while maintaining high accuracy: it cuts the FLOPs of ResNet-50 trained on ImageNet by 55.2% while preserving 76.2% top-1 accuracy in only 199 GPU hours, significantly less than previous state-of-the-art methods. Overall, our framework democratizes deep learning compression by offering a cost-effective and computationally feasible solution, enabling broader adoption in low-resource environments.
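The two techniques named in the abstract, gate-based filter pruning and distillation from precomputed "fixed knowledge," can be sketched concretely. The PyTorch snippet below is a minimal illustration under assumed names (ChannelGate, fixed_knowledge_kd_loss); it is not the authors' implementation. In particular, it substitutes a plain learnable gate with a magnitude threshold for the paper's meta-learned gate importance, and plain temperature-scaled KL distillation for the synthetic teacher assistant.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelGate(nn.Module):
    """Learnable per-channel gate inserted after a convolution.

    The gate values stand in for the per-gate importance scores that the
    paper estimates via meta-learning; channels whose gate magnitude falls
    below a threshold become candidates for pruning.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each feature-map channel: (N, C, H, W) * (1, C, 1, 1).
        return x * self.gate.view(1, -1, 1, 1)

    def keep_mask(self, threshold: float = 0.05) -> torch.Tensor:
        # Boolean mask of channels to keep after pruning.
        return self.gate.detach().abs() > threshold


def fixed_knowledge_kd_loss(student_logits: torch.Tensor,
                            stored_teacher_logits: torch.Tensor,
                            temperature: float = 4.0) -> torch.Tensor:
    """Distillation loss against logits precomputed by the teacher.

    The teacher is never run during student training; its outputs were
    stored once ("fixed knowledge"), which is where the compute savings
    described in the abstract come from.
    """
    t = temperature
    soft_student = F.log_softmax(student_logits / t, dim=1)
    soft_teacher = F.softmax(stored_teacher_logits / t, dim=1)
    return F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (t * t)

A training step would combine this distillation loss with the task loss and a sparsity penalty on the gates (e.g., an L1 term) so that unimportant channels are driven toward zero before pruning.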
format Article
id doaj-art-de6b40cacc844e4c942d46dd7ddeb89f
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Seunghyun Lee (https://orcid.org/0000-0001-7139-1764), Dongjun Lee, Minju Hyun (https://orcid.org/0009-0004-5523-4573), Heeje Kim (https://orcid.org/0009-0006-4654-1832), and Byung Cheol Song (https://orcid.org/0000-0001-8742-3433), all with the Department of Electrical and Computer Engineering, Inha University, Incheon, Republic of Korea. "Compressing Neural Networks on Limited Computing Resources," IEEE Access, vol. 13, pp. 80063-80075, 2025, doi: 10.1109/ACCESS.2025.3567102. https://ieeexplore.ieee.org/document/10988545/
title Compressing Neural Networks on Limited Computing Resources
topic Deep neural network compression
filter pruning
knowledge transfer
efficient neural networks
lightweight deep learning models
url https://ieeexplore.ieee.org/document/10988545/