Edge-LLM Inference With Cost-Aware Layer Allocation and Adaptive Scheduling

Bibliographic Details
Main Authors: Sama Habibi, Ozgur Ercetin
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11095716/
Description
Summary: This paper addresses two key challenges in distributed Large Language Model (LLM) inference at the edge: 1) cost-efficient and fair task allocation, and 2) dynamic scheduling under deadline constraints. We propose two mechanisms: the Fair Cost-Efficient Incentive Mechanism (FCIM) for task and layer assignment, and the Adaptive Dynamic Scheduling Algorithm (ADSA) for execution scheduling on individual devices. FCIM is an auction-based mechanism that selects cost-effective, memory-feasible devices while minimizing task latency, reward cost, and device usage. Its adaptive reward design ensures positive utility and fairness, even under shifting system priorities. ADSA enables preemption-aware, deadline-driven scheduling by dynamically reordering tasks based on arrival time and workload characteristics. Simulations demonstrate that FCIM reduces communication overhead by 54.7% and task completion time by 36.9% compared to static and performance-driven baselines, while ADSA reduces queueing delay by 39% under strict deadline constraints.
ISSN: 2169-3536
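
Illustrative note: the summary describes selecting cost-effective, memory-feasible devices through an auction. The sketch below is not the paper's FCIM; it is a minimal greedy stand-in that only shows the general idea of a reverse auction over layer hosting. The names `Bid`, `allocate_layers`, and `total_cost`, the per-layer pricing model, and the uniform layer size are all assumptions made for illustration. Latency, reward design, and fairness, which the paper says FCIM handles, are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    device: str            # hypothetical device identifier
    cost_per_layer: float  # assumed asking price for hosting one layer
    memory_mb: int         # memory the device offers


def allocate_layers(bids, num_layers, layer_mem_mb):
    """Greedily give contiguous layer ranges to the cheapest devices that
    can hold at least one layer.

    Returns {device: (first_layer, last_layer)}, or None if the offered
    memory cannot hold all layers.
    """
    allocation = {}
    next_layer = 0
    for bid in sorted(bids, key=lambda b: b.cost_per_layer):
        if next_layer >= num_layers:
            break
        capacity = bid.memory_mb // layer_mem_mb  # whole layers that fit
        if capacity == 0:
            continue  # not memory-feasible for even one layer
        take = min(capacity, num_layers - next_layer)
        allocation[bid.device] = (next_layer, next_layer + take - 1)
        next_layer += take
    return allocation if next_layer >= num_layers else None


def total_cost(bids, allocation):
    """Sum of each selected device's asking price times its layer count."""
    price = {b.device: b.cost_per_layer for b in bids}
    return sum(price[d] * (hi - lo + 1) for d, (lo, hi) in allocation.items())


bids = [
    Bid("edge-a", 2.0, 4000),
    Bid("edge-b", 1.0, 1500),
    Bid("edge-c", 3.0, 8000),
    Bid("edge-d", 0.5, 300),   # cheapest, but too small for a 500 MB layer
]
plan = allocate_layers(bids, num_layers=8, layer_mem_mb=500)
print(plan, total_cost(bids, plan))
```

In this toy run the cheapest bidder is skipped because it cannot fit a single layer, so the layers go to the next two cheapest devices. A real mechanism would also have to weigh communication overhead between consecutive layer ranges, which this sketch ignores.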