Probing the Pitfalls: Understanding SVD’s Shortcomings in Language Model Compression

Full description:

Background: Modern computational linguistics relies heavily on large language models that demonstrate strong performance on various Natural Language Inference (NLI) tasks. These models, however, require substantial computational resources for both training and deployment. To address this challenge, a range of compression and acceleration techniques has been developed, including quantization, pruning, and factorization. Each of these approaches operates differently, can be applied at various levels of the model architecture, and is suited to different deployment scenarios.

Purpose: The objective of this study is to analyze and evaluate a factorization-based compression technique that reduces the computational footprint of large language models while preserving their accuracy on NLI tasks, particularly for resource-constrained or latency-sensitive applications.

Method: To evaluate the impact of factorization-based compression, we conducted probing experiments. First, we chose widely used pre-trained models (BERT-base and Llama 2) as our baselines. We then applied low-rank factorization to their transformer layers using several singular value decomposition (SVD) algorithms at different compression rates. Finally, we used probing tasks to analyze the changes in the internal representations and linguistic knowledge of the compressed models, and we compared these changes with the models' ability to solve NLI tasks and with the compression rate achieved through factorization.

Results: Naive uniform factorization often led to significant accuracy drops, even at small compression rates, reflecting a noticeable degradation in the models' ability to recognize textual entailment. Probing tasks showed that these uniformly compressed models lost important syntactic and semantic information, which aligned with the observed performance decline. However, targeted compression approaches, such as selectively compressing the most redundant parts of the model or applying weighted factorization algorithms, mitigated these negative effects.

Conclusion: These results demonstrate that factorization, when used properly, can significantly reduce computational requirements while preserving the core linguistic capabilities of large language models. Our research can inform the development of future compression techniques that adapt factorization strategies to the inherent structure of models and their tasks. These insights can help deploy LLMs in scenarios with limited computational resources.
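
As a concrete illustration of the method described above (a minimal sketch, not taken from the paper's code), the following PyTorch example shows how a single linear layer of a transformer can be replaced by a truncated-SVD factorization whose rank is chosen from a target compression rate. The helper names rank_for_compression and factorize_linear are hypothetical, introduced only for this illustration.

```python
# Minimal sketch: SVD-based low-rank factorization of one linear layer,
# with the rank derived from a target parameter-keep ratio. Assumes PyTorch.
import torch
import torch.nn as nn


def rank_for_compression(out_features: int, in_features: int, keep_ratio: float) -> int:
    """Pick rank r so the factorized layer stores roughly keep_ratio of the
    original out_features * in_features weights. A rank-r factorization
    W ~ U @ V stores r * (out_features + in_features) parameters instead."""
    full_params = out_features * in_features
    r = int(keep_ratio * full_params / (out_features + in_features))
    return max(1, min(r, min(out_features, in_features)))


def factorize_linear(layer: nn.Linear, keep_ratio: float) -> nn.Sequential:
    """Replace a dense nn.Linear with two smaller linear layers via truncated SVD."""
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    r = rank_for_compression(*W.shape, keep_ratio)

    # Keep the top-r singular triplets; split sqrt(S) between the two factors.
    sqrt_S = torch.diag(S[:r].sqrt())
    first = nn.Linear(layer.in_features, r, bias=False)
    second = nn.Linear(r, layer.out_features, bias=layer.bias is not None)
    first.weight.data = sqrt_S @ Vh[:r]        # (r, in_features)
    second.weight.data = U[:, :r] @ sqrt_S     # (out_features, r)
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)


# Example: compress one feed-forward projection of a BERT-base-sized block to
# roughly half its parameters. (Pretrained weights, unlike random ones, tend to
# have a decaying spectrum, so the approximation error there is much smaller.)
dense = nn.Linear(768, 3072)
compressed = factorize_linear(dense, keep_ratio=0.5)
n_dense = sum(p.numel() for p in dense.parameters())
n_comp = sum(p.numel() for p in compressed.parameters())
print(f"params: {n_dense} -> {n_comp} ({n_comp / n_dense:.2f}x)")
```

In the setup described in the abstract, such a replacement would be applied either uniformly or selectively across the transformer layers before running the probing and NLI evaluations.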

Bibliographic Details
Main Author: Сергей Александрович Плетенев (Sergej Aleksandrovič Pletenev), AIRI, Moscow, Russia; Skoltech, Moscow, Russia
Format: Article
Language: English
Published: National Research University Higher School of Economics, 2024-12-01
Series: Journal of Language and Education, Vol. 10, No. 4 (2024)
ISSN: 2411-7390
DOI: 10.17323/jle.2024.22368
Subjects: factorization-based compression; large language model optimization; linguistic representation probing; resource-efficient NLP
Online Access: https://jle.hse.ru/article/view/22368