Long-context inference optimization for large language models: a survey

With the rapid development of large language model (LLM) technology, the demand for processing long-text inputs has been increasing. However, long-text inference faces challenges such as high memory consumption and latency. To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted.

Bibliographic Details
Main Authors: TAO Wei, WANG Jianzong, ZHANG Xulong, QU Xiaoyang
Format: Article
Language: zho
Published: China InfoCom Media Group 2025-01-01
Series: 大数据
Subjects:
Online Access: http://www.j-bigdataresearch.com.cn/thesisDetails?columnId=109257920&Fpath=home&index=0
_version_ 1850136772734877696
author TAO Wei
WANG Jianzong
ZHANG Xulong
QU Xiaoyang
author_facet TAO Wei
WANG Jianzong
ZHANG Xulong
QU Xiaoyang
author_sort TAO Wei
collection DOAJ
description With the rapid development of large language model (LLM) technology, the demand for processing long-text inputs has been increasing. However, long-text inference faces challenges such as high memory consumption and latency. To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted. The study first identified three key factors that affect efficiency: the huge model size, the attention mechanism with quadratic computational complexity, and the autoregressive decoding strategy. These factors together restrict the overall performance of the model. Subsequently, a taxonomy was proposed, categorizing optimization techniques into model optimization, computation optimization, and system optimization, with detailed introductions to key technologies such as quantization, sparse attention, and operator fusion. The research results demonstrate that these optimization techniques can effectively enhance the performance of long-text inference. Finally, future research directions were outlined, emphasizing the importance of further optimizing LLMs for long-text inference to meet the growing demands of context length.
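The quadratic-attention bottleneck named in the abstract can be made concrete with a toy cost count: dense self-attention scores every query against every key, forming an n × n matrix, while a sliding-window (sparse) variant caps each token's scores at the window size. A minimal sketch, not taken from the survey itself (the helper names here are illustrative):

```python
def full_attention_scores(n: int) -> int:
    """Number of query-key score entries in dense self-attention over n tokens."""
    return n * n

def sliding_window_scores(n: int, w: int) -> int:
    """Score entries when each token attends only to itself and the previous
    w - 1 tokens (a common sparse-attention pattern)."""
    return sum(min(i + 1, w) for i in range(n))

if __name__ == "__main__":
    # Doubling the context quadruples dense attention cost,
    # but only doubles the sliding-window cost once n >> w.
    for n in (1024, 4096, 16384):
        print(n, full_attention_scores(n), sliding_window_scores(n, 256))
```

This is why the survey's "computation optimization" category centers on sparse attention: at 16k tokens, the dense score count is roughly 64× the windowed count for a 256-token window.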
format Article
id doaj-art-e23dc9d6d575493cb01595ee91ccb1b8
institution OA Journals
issn 2096-0271
language zho
publishDate 2025-01-01
publisher China InfoCom Media Group
record_format Article
series 大数据
spellingShingle TAO Wei
WANG Jianzong
ZHANG Xulong
QU Xiaoyang
Long-context inference optimization for large language models: a survey
大数据
long-context inference
model optimization
computation optimization
system optimization
title Long-context inference optimization for large language models: a survey
title_full Long-context inference optimization for large language models: a survey
title_fullStr Long-context inference optimization for large language models: a survey
title_full_unstemmed Long-context inference optimization for large language models: a survey
title_short Long-context inference optimization for large language models: a survey
title_sort long context inference optimization for large language models a survey
topic long-context inference
model optimization
computation optimization
system optimization
url http://www.j-bigdataresearch.com.cn/thesisDetails?columnId=109257920&Fpath=home&index=0
work_keys_str_mv AT taowei longcontextinferenceoptimizationforlargelanguagemodelsasurvey
AT wangjianzong longcontextinferenceoptimizationforlargelanguagemodelsasurvey
AT zhangxulong longcontextinferenceoptimizationforlargelanguagemodelsasurvey
AT quxiaoyang longcontextinferenceoptimizationforlargelanguagemodelsasurvey