Long-context inference optimization for large language models: a survey

With the rapid development of large language model (LLM) technology, the demand for processing long-text inputs has been increasing. However, long-text inference faces challenges such as high memory consumption and latency. To improve the efficiency of LLMs in long-text inference, a comprehensive re...

Bibliographic Details
Main Authors:	TAO Wei, WANG Jianzong, ZHANG Xulong, QU Xiaoyang
Format:	Article
Language:	zho
Published:	China InfoCom Media Group 2025-01-01
Series:	大数据
Subjects:	long-context inference; model optimization; computation optimization; system optimization
Online Access:	http://www.j-bigdataresearch.com.cn/thesisDetails?columnId=109257920&Fpath=home&index=0

Similar Items

Long-context inference optimization for large language models: a survey
by: TAO Wei, et al.
Published: (2025-01-01)

Opt-CoInfer: Optimal collaborative inference across IoT and cloud for fast and accurate CNN inference
by: Zhanhua Zhang, et al.
Published: (2023-01-01)

Efficient LLMs Training and Inference: An Introduction
by: Rui Li, et al.
Published: (2025-01-01)

NUDIF: A Non-Uniform Deployment Framework for Distributed Inference in Heterogeneous Edge Clusters
by: Peng Li, et al.
Published: (2025-04-01)

INNOVATIVE BUSINESS DEVELOPMENT. STATISTICAL INFERENCE OF SIX SIGMA
by: DENIS ANATOL’EVICH Zhevnov
Published: (2018-04-01)

An optimization similarity fuzzy inference method for traffic signal control at an isolated intersection
by: Mahin Esmaeili, et al.
Published: (2025-12-01)

AsymGroup: Asymmetric Grouping and Communication Optimization for 2D Tensor Parallelism in LLM Inference
by: Ki Tae Kim, et al.
Published: (2025-01-01)

An Optimized Hybrid Model for Perishable Product Quality Inference in the Food Supply Chain
by: Muhammad Asrol, et al.
Published: (2025-02-01)

Quantum Long Short-Term Memory-Assisted Optimization for Efficient Vehicle Platooning in Connected and Autonomous Systems
by: Mahzabeen Emu, et al.
Published: (2025-01-01)

Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization
by: Qiangxing Tian, et al.
Published: (2025-03-01)

Evaluation and adjustment of clothing comfort based on fuzzy inference
by: Xueyun Zhang, et al.
Published: (2025-07-01)

Bayesian topology inference of regulatory networks under partial observability
by: Mohammad Alali, et al.
Published: (2025-06-01)

scEGOT: single-cell trajectory inference framework based on entropic Gaussian mixture optimal transport
by: Toshiaki Yachimura, et al.
Published: (2024-12-01)

Comparing AI versus optimization workflows for simulation-based inference of spatial-stochastic systems
by: Michael Alexander Ramirez Sierra, et al.
Published: (2025-01-01)

Entropy-Guided KV Caching for Efficient LLM Inference
by: Heekyum Kim, et al.
Published: (2025-07-01)

Cryptographic inference for large language model via secret sharing
by: CHENG Ke, et al.
Published: (2025-06-01)

Fog Service Placement Optimization: A Survey of State-of-the-Art Strategies and Techniques
by: Hemant Kumar Apat, et al.
Published: (2025-03-01)

Deep Learning for Sector-Specific Labor Market Forecasting: Integrating Job Postings and Macroeconomic Indicators
by: Haojun Ding
Published: (2025-01-01)

Multi-Area, Multi-Service and Multi-Tier Edge-Cloud Continuum Planning
by: Anargyros J. Roumeliotis, et al.
Published: (2025-06-01)

Simultaneous Multi-Objective and Topology Optimization: Effect of Mesh Refinement and Number of Iterations on Computational Cost
by: Daniel Miler, et al.
Published: (2025-07-01)

A Pareto Front Transformation Model for Multi-Objective-Based Constrained Optimization
by: Sanyou Zeng, et al.
Published: (2025-01-01)

Large Language Model Enhanced Particle Swarm Optimization for Hyperparameter Tuning for Deep Learning Models
by: Saad Hameed, et al.
Published: (2025-01-01)

An analytical risk mitigation framework for steel fabrication supply chains using fuzzy inference and house of risk
by: Fadhil Adita Ramadhan, et al.
Published: (2025-06-01)

Stochastic Optimal Control for Uncertain Structural Systems Under Random Excitations Based on Bayes Optimal Estimation
by: Hua Lei, et al.
Published: (2025-05-01)

Hybridization of Galactic Swarm and Evolution Whale Optimization for Global Search Problem
by: Binh Minh Nguyen, et al.
Published: (2020-01-01)

Enhancing air quality index forecast with string reduction, entropy weight and similarity measure using K-means clustering for fuzzy inference system
by: Kulandhainadar Mariavalavan Ordenshiya, et al.
Published: (2025-12-01)

Private Collaborative Edge Inference via Over-the-Air Computation
by: Selim F. Yilmaz, et al.
Published: (2025-01-01)

Cognitive biases as Bayesian probability weighting in context
by: Bruno Kopp
Published: (2025-08-01)

Robotic optimization of powdered beverages leveraging computer vision and Bayesian optimization
by: Emilia Szymańska, et al.
Published: (2025-06-01)

A Novel Maximum Power Point Inference Method for Distributed Marine Photovoltaic Monitoring
by: Yujie Chen, et al.
Published: (2025-05-01)

Joint Optimization of Computation Offloading and Task Scheduling Using Multi-Objective Arithmetic Optimization Algorithm in Cloud-Fog Computing
by: Asad Ali, et al.
Published: (2024-01-01)

Automated Pruning Framework for Large Language Models Using Combinatorial Optimization
by: Patcharapol Ratsapa, et al.
Published: (2025-05-01)

End-to-End Latency Optimization for Resilient Distributed Convolutional Neural Network Inference in Resource-Constrained Unmanned Aerial Vehicle Swarms
by: Jeongho Kim, et al.
Published: (2024-11-01)

BALI—A Benchmark for Accelerated Language Model Inference
by: Lena Jurkschat, et al.
Published: (2025-01-01)

Context-aware and boundary-optimized model for road marking instance segmentation using MLS point cloud intensity images
by: Dehui Li, et al.
Published: (2025-08-01)

Systematic Initialization Approaches for Portfolio Optimization Problems
by: Mehmet Altinoz, et al.
Published: (2019-01-01)

On the development of a practical Bayesian optimization algorithm for expensive experiments and simulations with changing environmental conditions
by: Mike Diessner, et al.
Published: (2024-01-01)

Mathematical Optimization in Machine Learning for Computational Chemistry
by: Ana Zekić
Published: (2025-07-01)

Fuel Costs Optimization for Long-Haul Flight with Refueling Layovers
by: Chartchai Leenawong, et al.
Published: (2024-12-01)

Bidirectional Collaborative Optimization Scheduling of Adjustable Resources in Computing Node and Power Node
by: ZHOU Qianyufan, YANG Ping, WAN Siyang, CUI Jiayan, LI Fengneng, WEI Zhichu
Published: (2025-02-01)