Edge-LLM Inference With Cost-Aware Layer Allocation and Adaptive Scheduling
This paper addresses two key challenges in distributed Large Language Model (LLM) inference at the edge: 1) cost-efficient and fair task allocation, and 2) dynamic scheduling under deadline constraints. We propose two mechanisms: the Fair Cost-Efficient Incentive Mechanism (FCIM) for task and layer assignment, and the Adaptive Dynamic Scheduling Algorithm (ADSA) for execution scheduling on individual devices. FCIM is an auction-based mechanism that selects cost-effective, memory-feasible devices while minimizing task latency, reward cost, and device usage. Its adaptive reward design ensures positive utility and fairness, even under shifting system priorities. ADSA enables preemption-aware, deadline-driven scheduling by dynamically reordering tasks based on arrival time and workload characteristics. Simulations demonstrate that FCIM reduces communication overhead by 54.7% and task completion time by 36.9% compared to static and performance-driven baselines, while ADSA reduces queueing delay by 39% under strict deadline constraints.
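The abstract describes FCIM only at a high level, so the sketch below is a minimal illustration rather than the paper's actual mechanism: an auction-style placement that filters out memory-infeasible devices and awards each layer to the bidder with the lowest combined latency/reward/device-usage score. Every name and parameter here (`Device`, `bid_per_layer`, the `w_*` weights) is an assumption for illustration, and FCIM's adaptive reward design is not modeled.

```python
from dataclasses import dataclass

# Hypothetical device model; field names are illustrative, not from the paper.
@dataclass
class Device:
    name: str
    free_mem_gb: float    # memory available for hosting model layers
    bid_per_layer: float  # reward the device asks per hosted layer (its "bid")
    latency_ms: float     # estimated per-layer compute plus link latency

def fcim_assign(devices, n_layers, layer_mem_gb,
                w_latency=1.0, w_reward=1.0, w_usage=1.0):
    """Greedy auction-style layer placement in the spirit of FCIM:
    each layer goes to the memory-feasible device with the lowest
    combined latency / reward / device-usage score."""
    assignment = {}                      # layer index -> device name
    load = {d.name: 0 for d in devices}  # layers already placed per device

    def score(d):
        # Penalize opening a yet-unused device, mirroring a device-usage term.
        usage_penalty = 0.0 if load[d.name] > 0 else 1.0
        return (w_latency * d.latency_ms
                + w_reward * d.bid_per_layer
                + w_usage * usage_penalty)

    for layer in range(n_layers):
        feasible = [d for d in devices
                    if d.free_mem_gb - load[d.name] * layer_mem_gb >= layer_mem_gb]
        if not feasible:
            raise RuntimeError(f"no memory-feasible device for layer {layer}")
        winner = min(feasible, key=score)
        assignment[layer] = winner.name
        load[winner.name] += 1
    return assignment

# Toy run: two devices, a six-layer model, 1 GB per layer.
devices = [Device("pi-5", 4.0, 0.2, 35.0), Device("jetson", 8.0, 0.5, 12.0)]
print(fcim_assign(devices, n_layers=6, layer_mem_gb=1.0))
```

ADSA is likewise described only as preemption-aware and deadline-driven; the sketch below uses preemptive earliest-deadline-first scheduling as a plausible stand-in, with all task fields assumed rather than taken from the paper.

```python
import heapq

def adsa_schedule(tasks, dt=1.0):
    """Preemptive earliest-deadline-first simulation on one device, a
    stand-in for ADSA's deadline-driven reordering. Each task is a dict
    with "name", "arrival", "deadline", and "work" (all in time units)."""
    pending = sorted(tasks, key=lambda t: t["arrival"])
    ready = []      # min-heap ordered by (deadline, arrival)
    finished = {}   # task name -> completion time
    t, i = 0.0, 0
    while i < len(pending) or ready:
        # Admit every task that has arrived by time t. A newcomer with an
        # earlier deadline rises to the heap top, preempting the current head.
        while i < len(pending) and pending[i]["arrival"] <= t:
            task = dict(pending[i])  # copy so the caller's list is untouched
            heapq.heappush(ready, (task["deadline"], task["arrival"], i, task))
            i += 1
        if not ready:
            t = pending[i]["arrival"]  # device idles until the next arrival
            continue
        deadline, arrival, idx, task = heapq.heappop(ready)
        task["work"] -= dt             # run the most urgent task for one slice
        t += dt
        if task["work"] > 0:
            heapq.heappush(ready, (deadline, arrival, idx, task))
        else:
            finished[task["name"]] = t
    return finished

# Toy run: task B arrives after A but has the tighter deadline, so it
# preempts A and completes first (B at t=3, A at t=6).
jobs = [{"name": "A", "arrival": 0, "deadline": 10, "work": 4},
        {"name": "B", "arrival": 1, "deadline": 5, "work": 2}]
print(adsa_schedule(jobs))
```

In the toy run, a non-preemptive FIFO order would finish B at t=6 and miss its deadline of 5; the preemptive, deadline-keyed queue is what closes that gap while A still meets its own deadline.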
| Main Authors: | Sama Habibi (ORCID: 0000-0001-8556-5657); Ozgur Ercetin (ORCID: 0000-0002-3454-5610) |
|---|---|
| Affiliation: | Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Türkiye (both authors) |
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access, vol. 13, pp. 131614-131637 |
| ISSN: | 2169-3536 |
| DOI: | 10.1109/ACCESS.2025.3592308 |
| Subjects: | Adaptive scheduling; distributed AI; edge computing; fair incentive mechanism; large language models; resource allocation |
| Online Access: | https://ieeexplore.ieee.org/document/11095716/ |