Long-context inference optimization for large language models: a survey

With the rapid development of large language model (LLM) technology, the demand for processing long-text inputs has been increasing. However, long-text inference faces challenges such as high memory consumption and latency. To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted.

Bibliographic Details
Main Authors: TAO Wei, WANG Jianzong, ZHANG Xulong, QU Xiaoyang
Format: Article
Language: zho
Published: China InfoCom Media Group 2025-01-01
Series: 大数据
Subjects:
Online Access: http://www.j-bigdataresearch.com.cn/thesisDetails?columnId=109257920&Fpath=home&index=0
_version_ 1850136772734877696
author TAO Wei
WANG Jianzong
ZHANG Xulong
QU Xiaoyang
author_facet TAO Wei
WANG Jianzong
ZHANG Xulong
QU Xiaoyang
author_sort TAO Wei
collection DOAJ
description With the rapid development of large language model (LLM) technology, the demand for processing long-text inputs has been increasing. However, long-text inference faces challenges such as high memory consumption and latency. To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted. The study first identified three key factors that affect efficiency: the huge model size, the attention mechanism with quadratic computational complexity, and the autoregressive decoding strategy. These factors together restrict the overall performance of the model. Subsequently, a taxonomy was proposed, categorizing optimization techniques into model optimization, computation optimization, and system optimization, with detailed introductions to key technologies such as quantization, sparse attention, and operator fusion. The research results demonstrate that these optimization techniques can effectively enhance the performance of long-text inference. Finally, future research directions were outlined, emphasizing the importance of further optimizing LLMs for long-text inference to meet the growing demands of context length.
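The quadratic-attention bottleneck named in the abstract can be made concrete with a toy cost count: dense self-attention scores every query against every key, forming an n × n matrix, while a sliding-window (sparse) variant caps each token's scores at the window size. A minimal sketch, not taken from the survey itself (the helper names here are illustrative):

```python
def full_attention_scores(n: int) -> int:
    """Number of query-key score entries in dense self-attention over n tokens."""
    return n * n

def sliding_window_scores(n: int, w: int) -> int:
    """Score entries when each token attends only to itself and the previous
    w - 1 tokens (a common sparse-attention pattern)."""
    return sum(min(i + 1, w) for i in range(n))

if __name__ == "__main__":
    # Doubling the context quadruples dense attention cost,
    # but only doubles the sliding-window cost once n >> w.
    for n in (1024, 4096, 16384):
        print(n, full_attention_scores(n), sliding_window_scores(n, 256))
```

This is why the survey's "computation optimization" category centers on sparse attention: at 16k tokens, the dense score count is roughly 64× the windowed count for a 256-token window.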
format Article
id doaj-art-e23dc9d6d575493cb01595ee91ccb1b8
institution OA Journals
issn 2096-0271
language zho
publishDate 2025-01-01
publisher China InfoCom Media Group
record_format Article
series 大数据
spellingShingle TAO Wei
WANG Jianzong
ZHANG Xulong
QU Xiaoyang
Long-context inference optimization for large language models: a survey
大数据
long-context inference
model optimization
computation optimization
system optimization
title Long-context inference optimization for large language models: a survey
title_full Long-context inference optimization for large language models: a survey
title_fullStr Long-context inference optimization for large language models: a survey
title_full_unstemmed Long-context inference optimization for large language models: a survey
title_short Long-context inference optimization for large language models: a survey
title_sort long context inference optimization for large language models a survey
topic long-context inference
model optimization
computation optimization
system optimization
url http://www.j-bigdataresearch.com.cn/thesisDetails?columnId=109257920&Fpath=home&index=0
work_keys_str_mv AT taowei longcontextinferenceoptimizationforlargelanguagemodelsasurvey
AT wangjianzong longcontextinferenceoptimizationforlargelanguagemodelsasurvey
AT zhangxulong longcontextinferenceoptimizationforlargelanguagemodelsasurvey
AT quxiaoyang longcontextinferenceoptimizationforlargelanguagemodelsasurvey