Lightweight Pre-Trained Korean Language Model Based on Knowledge Distillation and Low-Rank Factorization

Natural Language Processing (NLP) stands at the forefront of artificial intelligence research, empowering computational systems to comprehend and process human language as used in everyday contexts. Language models (LMs) underpin this field, striving to capture the intricacies of linguistic structure and semantics by assigning probabilities to sequences of words. The trend towards large language models (LLMs) has shown significant performance improvements with increasing model size. However, the deployment of LLMs on resource-limited devices such as mobile and edge devices remains a challenge. This issue is particularly pronounced in languages other than English, including Korean, where pre-trained models are relatively scarce. Addressing this gap, we introduce a novel lightweight pre-trained Korean language model that leverages knowledge distillation and low-rank factorization techniques. Our approach distills knowledge from a 432 MB (approximately 110 M parameters) teacher model into student models of substantially reduced sizes (e.g., 53 MB ≈ 14 M parameters, 35 MB ≈ 13 M parameters, 30 MB ≈ 11 M parameters, and 18 MB ≈ 4 M parameters). The smaller student models further employ low-rank factorization to minimize the parameter count within the Transformer's feed-forward network (FFN) and embedding layer. We evaluate the efficacy of our lightweight model across six established Korean NLP tasks. Notably, our most compact model, KR-ELECTRA-Small-KD, attains over 97.387% of the teacher model's performance despite an 8.15× reduction in size. Remarkably, on the NSMC sentiment classification benchmark, KR-ELECTRA-Small-KD surpasses the teacher model with an accuracy of 89.720%. These findings underscore the potential of our model as an efficient solution for NLP applications in resource-constrained settings.
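
The abstract describes distilling the 432 MB teacher into much smaller students. As a rough illustration only (the paper's exact distillation objective is not given here), a common formulation blends a temperature-scaled soft-target term with the ordinary hard-label loss; the PyTorch-style function below is a minimal sketch under that assumption, with `temperature`, `alpha`, and the use of classification logits all being illustrative choices rather than the authors' recipe.

```python
# Minimal sketch of a generic knowledge-distillation loss (illustrative only;
# temperature, weighting, and the distilled outputs are assumptions).
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * (temperature ** 2)

    # Hard-label term: standard cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term
```

The temperature-squared factor is the usual scaling that keeps the soft-target gradients comparable in magnitude to the hard-label term as the temperature grows.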
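
The abstract also notes that the smaller students apply low-rank factorization to the Transformer's FFN and embedding layer. The sketch below shows one generic way such a factorization can be structured: a dense d_in × d_out projection is replaced by a rank-r pair, and the embedding matrix is factorized through a narrow intermediate dimension. Module names, dimensions, and ranks are assumptions for illustration, not the paper's configuration.

```python
# Hypothetical sketch of low-rank factorization for an FFN block and an
# embedding layer; all sizes and ranks are illustrative.
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Approximates a dense d_in x d_out projection with a rank-r pair,
    cutting parameters from d_in*d_out to roughly r*(d_in + d_out)."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # d_in -> r
        self.up = nn.Linear(rank, d_out, bias=True)    # r -> d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class LowRankFFN(nn.Module):
    """Feed-forward block whose two projections are replaced by low-rank pairs."""

    def __init__(self, d_model: int = 256, d_ff: int = 1024, rank: int = 64):
        super().__init__()
        self.fc1 = LowRankLinear(d_model, d_ff, rank)
        self.fc2 = LowRankLinear(d_ff, d_model, rank)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(x)))


class FactorizedEmbedding(nn.Module):
    """Factorized embedding: vocab -> small embedding dim -> model dim,
    so parameters scale with vocab_size*embed_dim rather than vocab_size*d_model."""

    def __init__(self, vocab_size: int = 32000, embed_dim: int = 128, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.project = nn.Linear(embed_dim, d_model, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.project(self.embed(token_ids))
```

With a small rank r, the factorized projection stores r*(d_in + d_out) weights instead of d_in*d_out, which is where the reported parameter savings in the FFN and embedding layer would come from.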

Bibliographic Details
Main Authors: Jin-Hwan Kim (Korea Telecom Corporation Agentic AI Lab, Seongnam-si 13606, Republic of Korea); Young-Seok Choi (Department of Electronics and Communications Engineering, Kwangwoon University, Seoul 01897, Republic of Korea)
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: Entropy, Vol. 27, No. 4, Article 379
ISSN: 1099-4300
DOI: 10.3390/e27040379
Subjects: natural language processing; pre-trained language model; Korean language model; knowledge distillation; low-rank factorization; resource-constrained environment
Online Access: https://www.mdpi.com/1099-4300/27/4/379