Entropy-Guided KV Caching for Efficient LLM Inference
Large language models (LLMs), built upon Transformer architectures, have demonstrated remarkable performance in a wide range of natural language processing tasks. However, their practical deployment—especially in long-context scenarios—is often hindered by the computational and memory costs associat...
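The abstract above is cut off before it describes the method itself, so the paper's actual algorithm is not shown in this record. Purely as an illustration of the general idea named in the title, the sketch below scores cached key/value entries by the Shannon entropy of the attention they receive across recent queries and evicts the lowest-scoring ones; the function names, shapes, and the scoring heuristic are all assumptions for this example, not the authors' method.

```python
# Minimal sketch of one plausible entropy-guided KV-cache eviction policy.
# NOT the paper's algorithm: a generic heuristic, assumed for illustration.
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy of probability distributions along `axis`."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum(axis=axis)

def evict_by_received_entropy(attn, keys, values, budget):
    """
    attn:   (num_queries, num_keys) attention weights; each row sums to 1
    keys:   (num_keys, d) cached key vectors
    values: (num_keys, d) cached value vectors
    budget: number of KV entries to retain

    Heuristic (assumed): normalize the attention each cached position
    receives over the recent queries, then keep the positions whose
    received-attention distribution has the highest entropy, i.e. the
    ones attended to broadly rather than by a single transient query.
    """
    received = attn / (attn.sum(axis=0, keepdims=True) + 1e-12)  # (Q, K)
    scores = entropy(received, axis=0)                           # (K,)
    keep = np.sort(np.argsort(scores)[-budget:])  # retain top-`budget`, in order
    return keys[keep], values[keep], keep

# Toy usage: prune a 16-slot cache down to 8 entries.
rng = np.random.default_rng(0)
Q, K, d, budget = 4, 16, 8, 8
attn = rng.dirichlet(np.ones(K), size=Q)          # each query's weights sum to 1
keys, values = rng.standard_normal((K, d)), rng.standard_normal((K, d))
k_kept, v_kept, kept_idx = evict_by_received_entropy(attn, keys, values, budget)
print(kept_idx)  # indices of the 8 retained cache slots
```

In a real decoder this scoring would run per layer and per head on the live attention maps, but the cache-budget mechanics are the same as in the toy call above.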
Saved in:
| Main Authors: | Heekyum Kim, Yuchul Jung |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Mathematics |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2227-7390/13/15/2366 |
Similar Items
- Data caching technologies in modern microprocessors
  by: V. A. Egunov, et al.
  Published: (2024-10-01)
- Using Retrieval vs. Cache Augmented Generation for a Pokémon Chatbot
  by: Cengiz Gunay, et al.
  Published: (2025-05-01)
- A Hierarchical Cache Architecture-Oriented Cache Management Scheme for Information-Centric Networking
  by: Yichao Chao, et al.
  Published: (2025-01-01)
- THE IMPACT ANALYSIS OF PREFETCH IN THE CACHE ON THE MICROPROCESSOR PERFORMANCE
  by: B. Z. Shmeylin
  Published: (2016-04-01)
- BALI—A Benchmark for Accelerated Language Model Inference
  by: Lena Jurkschat, et al.
  Published: (2025-01-01)