Edge-LLM Inference With Cost-Aware Layer Allocation and Adaptive Scheduling

Bibliographic Details
Main Authors:	Sama Habibi, Ozgur Ercetin
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Adaptive scheduling; distributed AI; edge computing; fair incentive mechanism; large language models; resource allocation
Online Access:	https://ieeexplore.ieee.org/document/11095716/

Description
Summary:	This paper addresses two key challenges in distributed Large Language Model (LLM) inference at the edge: 1) cost-efficient and fair task allocation, and 2) dynamic scheduling under deadline constraints. We propose two mechanisms: the Fair Cost-Efficient Incentive Mechanism (FCIM) for task and layer assignment, and the Adaptive Dynamic Scheduling Algorithm (ADSA) for execution scheduling on individual devices. FCIM is an auction-based mechanism that selects cost-effective, memory-feasible devices while minimizing task latency, reward cost, and device usage. Its adaptive reward design ensures positive utility and fairness, even under shifting system priorities. ADSA enables preemption-aware, deadline-driven scheduling by dynamically reordering tasks based on arrival time and workload characteristics. Simulations demonstrate that FCIM reduces communication overhead by 54.7% and task completion time by 36.9% compared to static and performance-driven baselines, while ADSA reduces queueing delay by 39% under strict deadline constraints.
ISSN:	2169-3536
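
Illustration (not from the paper): the summary describes ADSA as preemption-aware, deadline-driven scheduling on a single device. The record does not give the algorithm itself, so the sketch below is plain preemptive earliest-deadline-first (EDF) scheduling, a standard technique swapped in to show the general idea. The `Task` fields and the function name `edf_preemptive` are hypothetical and chosen only for this example.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task model; the paper's actual task attributes may differ.
    name: str
    arrival: float   # time the task reaches the device
    work: float      # processing time the task needs on this device
    deadline: float  # absolute deadline


def edf_preemptive(tasks):
    """Simulate one device under preemptive EDF; return {name: completion_time}."""
    pending = sorted(tasks, key=lambda t: t.arrival)
    ready = []   # min-heap of (deadline, seq, remaining_work, name)
    done = {}
    now = 0.0
    i = 0        # index of the next task that has not arrived yet
    while i < len(pending) or ready:
        if not ready:
            # Device is idle: jump ahead to the next arrival.
            now = max(now, pending[i].arrival)
        # Admit every task that has arrived by the current time.
        while i < len(pending) and pending[i].arrival <= now:
            t = pending[i]
            heapq.heappush(ready, (t.deadline, i, t.work, t.name))
            i += 1
        # Run the task with the earliest deadline until it finishes
        # or until the next arrival, which may preempt it.
        deadline, seq, remaining, name = heapq.heappop(ready)
        next_arrival = pending[i].arrival if i < len(pending) else float("inf")
        run = min(remaining, next_arrival - now)
        now += run
        if run < remaining:
            heapq.heappush(ready, (deadline, seq, remaining - run, name))
        else:
            done[name] = now
    return done


if __name__ == "__main__":
    demo = [Task("A", 0, 4, 10), Task("B", 1, 2, 4)]
    print(edf_preemptive(demo))
```

In the demo, task B arrives while A is running and has the tighter deadline, so it preempts A and finishes first. A scheduler like the one the summary describes would also weigh workload characteristics, which this sketch does not model.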

Similar Items