Edge-LLM Inference With Cost-Aware Layer Allocation and Adaptive Scheduling

This paper addresses two key challenges in distributed Large Language Model (LLM) inference at the edge: 1) cost-efficient and fair task allocation, and 2) dynamic scheduling under deadline constraints. We propose two mechanisms: the Fair Cost-Efficient Incentive Mechanism (FCIM) for task and layer assignment, and the Adaptive Dynamic Scheduling Algorithm (ADSA) for execution scheduling on individual devices. FCIM is an auction-based mechanism that selects cost-effective, memory-feasible devices while minimizing task latency, reward cost, and device usage. Its adaptive reward design ensures positive utility and fairness, even under shifting system priorities. ADSA enables preemption-aware, deadline-driven scheduling by dynamically reordering tasks based on arrival time and workload characteristics. Simulations demonstrate that FCIM reduces communication overhead by 54.7% and task completion time by 36.9% compared to static and performance-driven baselines, while ADSA reduces queueing delay by 39% under strict deadline constraints.
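
The abstract describes FCIM only at a high level. As a rough illustration of the kind of allocation it summarizes, below is a minimal sketch of a greedy reverse-auction assignment that picks memory-feasible devices by a weighted score over latency, reward cost, and device usage. The class names, weights, and scoring rule are assumptions made for illustration; this is not the authors' actual mechanism.

```python
# Illustrative sketch only: a greedy reverse-auction assignment in the spirit
# of FCIM as summarized in the abstract. All names, weights, and the scoring
# rule are assumptions, not the published mechanism.
from dataclasses import dataclass

@dataclass
class Bid:
    device_id: str
    free_memory_gb: float   # memory currently available on the device
    latency_ms: float       # estimated latency to run one layer block
    reward: float           # payment the device asks for per layer block

def assign_layers(bids, n_blocks, block_mem_gb, w_lat=1.0, w_cost=1.0, w_use=5.0):
    """Greedily assign n_blocks layer blocks to memory-feasible devices,
    scoring each candidate by a weighted mix of latency, reward cost, and
    a penalty for opening a new device (to keep device usage low)."""
    assignment = {}          # device_id -> number of blocks assigned
    remaining = {b.device_id: b.free_memory_gb for b in bids}
    for _ in range(n_blocks):
        feasible = [b for b in bids if remaining[b.device_id] >= block_mem_gb]
        if not feasible:
            raise RuntimeError("no memory-feasible device for this block")
        def score(b):
            # Devices not yet used pay an extra penalty, keeping the
            # allocation compact across as few devices as possible.
            new_dev = 0.0 if b.device_id in assignment else 1.0
            return w_lat * b.latency_ms + w_cost * b.reward + w_use * new_dev
        best = min(feasible, key=score)
        assignment[best.device_id] = assignment.get(best.device_id, 0) + 1
        remaining[best.device_id] -= block_mem_gb
    return assignment

bids = [Bid("edge-a", 8.0, 12.0, 1.0), Bid("edge-b", 4.0, 9.0, 2.5)]
print(assign_layers(bids, n_blocks=6, block_mem_gb=1.5))
```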
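
Likewise, ADSA is described only as preemption-aware, deadline-driven reordering by arrival time and workload. The sketch below shows a generic preemptive earliest-deadline-first loop with arrival-time tie-breaking, under the same caveat: the task fields, time-step granularity, and tie-break are assumptions for illustration, not the published algorithm.

```python
# Illustrative sketch only: a preemptive earliest-deadline-first (EDF) loop
# in the spirit of ADSA as summarized in the abstract. Fields, tie-break, and
# time-step granularity are assumptions, not the authors' algorithm.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    deadline: float                      # primary key: earlier deadline first
    arrival: float                       # tie-break: earlier arrival first
    name: str = field(compare=False)
    work: float = field(compare=False)   # remaining processing time

def simulate(tasks, dt=1.0):
    """Run tasks one time step at a time; a newly arrived task with an
    earlier deadline preempts the current one, since unfinished tasks are
    pushed back and the heap pops the earliest deadline next."""
    t, ready, done = 0.0, [], []
    pending = sorted(tasks, key=lambda x: x.arrival)
    while pending or ready:
        while pending and pending[0].arrival <= t:
            heapq.heappush(ready, pending.pop(0))
        if not ready:
            t = pending[0].arrival       # idle until the next arrival
            continue
        cur = heapq.heappop(ready)
        cur.work -= dt
        t += dt
        if cur.work > 0:
            heapq.heappush(ready, cur)   # preemption point: re-enters the queue
        else:
            done.append((cur.name, t, t <= cur.deadline))
    return done  # (task, finish time, met deadline?)

tasks = [Task(10.0, 0.0, "t1", 4.0), Task(6.0, 1.0, "t2", 2.0)]
print(simulate(tasks))
```
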
Bibliographic Details
Main Authors: Sama Habibi (ORCID: 0000-0001-8556-5657), Ozgur Ercetin (ORCID: 0000-0002-3454-5610)
Affiliation: Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Türkiye
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access, vol. 13, pp. 131614–131637
DOI: 10.1109/ACCESS.2025.3592308
ISSN: 2169-3536
Subjects: Adaptive scheduling; distributed AI; edge computing; fair incentive mechanism; large language models; resource allocation
Online Access:https://ieeexplore.ieee.org/document/11095716/