Probing the Pitfalls: Understanding SVD’s Shortcomings in Language Model Compression

Full description:

Background: Modern computational linguistics relies heavily on large language models that demonstrate strong performance on various Natural Language Inference (NLI) tasks. These models, however, require substantial computational resources for both training and deployment. To address this challenge, a range of compression and acceleration techniques has been developed, including quantization, pruning, and factorization. Each of these approaches operates differently, can be applied at various levels of the model architecture, and is suited to different deployment scenarios.

Purpose: The objective of this study is to analyze and evaluate a factorization-based compression technique that reduces the computational footprint of large language models while preserving their accuracy on NLI tasks, particularly for resource-constrained or latency-sensitive applications.

Method: To evaluate the impact of factorization-based compression, we conducted probing experiments. First, we chose widely used pre-trained models (BERT-base and Llama 2) as our baselines. We then applied low-rank factorization to their transformer layers using several singular value decomposition (SVD) algorithms at different compression rates. Finally, we used probing tasks to analyze the changes in the internal representations and linguistic knowledge of the compressed models, and we compared these changes with the models' ability to solve NLI tasks and with the compression rate achieved through factorization.

Results: Naive uniform factorization often led to significant accuracy drops, even at small compression rates, reflecting a noticeable degradation in the models' ability to recognize textual entailment. Probing tasks showed that these uniformly compressed models lost important syntactic and semantic information, which aligned with the observed performance decline. However, targeted compression approaches, such as selectively compressing the most redundant parts of the model or applying weighted factorization algorithms, mitigated these negative effects.

Conclusion: These results demonstrate that factorization, when used properly, can significantly reduce computational requirements while preserving the core linguistic capabilities of large language models. Our research can inform the development of future compression techniques that adapt factorization strategies to the inherent structure of models and their tasks. These insights can help deploy LLMs in scenarios with limited computational resources.
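
As a concrete illustration of the method described above (a minimal sketch, not taken from the paper's code), the following PyTorch example shows how a single linear layer of a transformer can be replaced by a truncated-SVD factorization whose rank is chosen from a target compression rate. The helper names rank_for_compression and factorize_linear are hypothetical, introduced only for this illustration.

```python
# Minimal sketch: SVD-based low-rank factorization of one linear layer,
# with the rank derived from a target parameter-keep ratio. Assumes PyTorch.
import torch
import torch.nn as nn


def rank_for_compression(out_features: int, in_features: int, keep_ratio: float) -> int:
    """Pick rank r so the factorized layer stores roughly keep_ratio of the
    original out_features * in_features weights. A rank-r factorization
    W ~ U @ V stores r * (out_features + in_features) parameters instead."""
    full_params = out_features * in_features
    r = int(keep_ratio * full_params / (out_features + in_features))
    return max(1, min(r, min(out_features, in_features)))


def factorize_linear(layer: nn.Linear, keep_ratio: float) -> nn.Sequential:
    """Replace a dense nn.Linear with two smaller linear layers via truncated SVD."""
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    r = rank_for_compression(*W.shape, keep_ratio)

    # Keep the top-r singular triplets; split sqrt(S) between the two factors.
    sqrt_S = torch.diag(S[:r].sqrt())
    first = nn.Linear(layer.in_features, r, bias=False)
    second = nn.Linear(r, layer.out_features, bias=layer.bias is not None)
    first.weight.data = sqrt_S @ Vh[:r]        # (r, in_features)
    second.weight.data = U[:, :r] @ sqrt_S     # (out_features, r)
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)


# Example: compress one feed-forward projection of a BERT-base-sized block to
# roughly half its parameters. (Pretrained weights, unlike random ones, tend to
# have a decaying spectrum, so the approximation error there is much smaller.)
dense = nn.Linear(768, 3072)
compressed = factorize_linear(dense, keep_ratio=0.5)
n_dense = sum(p.numel() for p in dense.parameters())
n_comp = sum(p.numel() for p in compressed.parameters())
print(f"params: {n_dense} -> {n_comp} ({n_comp / n_dense:.2f}x)")
```

In the setup described in the abstract, such a replacement would be applied either uniformly or selectively across the transformer layers before running the probing and NLI evaluations.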

Bibliographic Details
Main Author: Сергей Александрович Плетенев (Sergej Aleksandrovič Pletenev), AIRI, Moscow, Russia; Skoltech, Moscow, Russia
Format: Article
Language: English
Published: National Research University Higher School of Economics, 2024-12-01
Series: Journal of Language and Education, Vol. 10, No. 4 (2024)
ISSN: 2411-7390
DOI: 10.17323/jle.2024.22368
Subjects: factorization-based compression; large language model optimization; linguistic representation probing; resource-efficient NLP
Online Access: https://jle.hse.ru/article/view/22368