BALI—A Benchmark for Accelerated Language Model Inference
| Main Authors: | Lena Jurkschat, Preetam Gattogi, Sahar Vahdati, Jens Lehmann |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | LLM inference; transformer decoder; LLM inference benchmarking; generation speed; performance analysis; inference standardization |
| Online Access: | https://ieeexplore.ieee.org/document/11026002/ |
| _version_ | 1849421656245665792 |
|---|---|
| author | Lena Jurkschat; Preetam Gattogi; Sahar Vahdati; Jens Lehmann |
| author_facet | Lena Jurkschat; Preetam Gattogi; Sahar Vahdati; Jens Lehmann |
| author_sort | Lena Jurkschat |
| collection | DOAJ |
| description | The rise of Large Language Models (LLMs) has revolutionized natural language processing, enabling advances across diverse applications, including chatbots, live translators, content generation, virtual assistants, and domain-specific automation tools. These applications rely on real-time or near-real-time responses to sequential LLM requests, creating a critical demand for efficient, accelerated inference. These developments have led to numerous frameworks that optimize inference speed and resource utilization. However, such frameworks are often mutually incomparable or inadequately described due to the lack of standardized benchmarks. Consequently, there is a notable lack of comparison frameworks, owing to the vast configuration space bounded by factors such as hardware specifications, inference framework parameters, and dataset variations. We propose BALI, an open-source Benchmark for Accelerated Language Model Inference, which aims to provide comprehensive analysis and standardized evaluation metrics to enhance the comparability of LLM performance across configurations. With BALI, we propose substantial measurements to evaluate and rank the efficiency of LLM frameworks across multiple aspects, including sequential decoding, parallelization, and setup efficiency. We show results mainly for small to medium-sized models (1-30B parameters) in a sequential, non-batched setup, which is highly relevant for various real-time LLM applications. These observations reveal that the design decisions for such a framework constitute an application-dependent and multidimensional challenge. Thus, our objective is to provide an LLM inference benchmark with a clearly defined evaluation, incorporating multidimensional criteria to yield comparable performance assessments. |
| format | Article |
| id | doaj-art-1cae2238947b431cafffd9ed72168330 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-1cae2238947b431cafffd9ed72168330; 2025-08-20T03:31:24Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 98976-98989; DOI 10.1109/ACCESS.2025.3576898; IEEE document 11026002; BALI—A Benchmark for Accelerated Language Model Inference; Lena Jurkschat (https://orcid.org/0009-0002-7332-5861), Preetam Gattogi (https://orcid.org/0009-0001-4604-2009), Sahar Vahdati, Jens Lehmann, all affiliated with the Center for Interdisciplinary Digital Sciences (CIDS), ScaDS.AI Dresden/Leipzig, Technische Universität Dresden, Dresden, Germany; abstract identical to the description field above; https://ieeexplore.ieee.org/document/11026002/; keywords: LLM inference, transformer decoder, LLM inference benchmarking, generation speed, performance analysis, inference standardization |
| spellingShingle | Lena Jurkschat; Preetam Gattogi; Sahar Vahdati; Jens Lehmann; BALI—A Benchmark for Accelerated Language Model Inference; IEEE Access; LLM inference; transformer decoder; LLM inference benchmarking; generation speed; performance analysis; inference standardization |
| title | BALI—A Benchmark for Accelerated Language Model Inference |
| title_full | BALI—A Benchmark for Accelerated Language Model Inference |
| title_fullStr | BALI—A Benchmark for Accelerated Language Model Inference |
| title_full_unstemmed | BALI—A Benchmark for Accelerated Language Model Inference |
| title_short | BALI—A Benchmark for Accelerated Language Model Inference |
| title_sort | bali a benchmark for accelerated language model inference |
| topic | LLM inference; transformer decoder; LLM inference benchmarking; generation speed; performance analysis; inference standardization |
| url | https://ieeexplore.ieee.org/document/11026002/ |
| work_keys_str_mv | AT lenajurkschat balix2014abenchmarkforacceleratedlanguagemodelinference AT preetamgattogi balix2014abenchmarkforacceleratedlanguagemodelinference AT saharvahdati balix2014abenchmarkforacceleratedlanguagemodelinference AT jenslehmann balix2014abenchmarkforacceleratedlanguagemodelinference |
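
The description field above names sequential decoding, parallelization, and setup efficiency as the aspects BALI evaluates, with a focus on sequential (non-batched) requests. As a minimal illustrative sketch only, not the BALI implementation described in the article, the Python snippet below shows how per-request latency and rough throughput could be measured in such a non-batched setting; `generate_fn` and `dummy_generate` are hypothetical stand-ins for whichever inference framework's generation entry point is under test.

```python
import time
import statistics
from typing import Callable, List


def time_sequential_decoding(generate_fn: Callable[[str], str],
                             prompts: List[str]) -> dict:
    """Send prompts one at a time (non-batched) and record wall-clock latency."""
    latencies: List[float] = []
    generated_chars = 0
    for prompt in prompts:
        start = time.perf_counter()
        output = generate_fn(prompt)  # one sequential request, no batching
        latencies.append(time.perf_counter() - start)
        generated_chars += len(output)
    total_time = sum(latencies)
    return {
        "requests": len(prompts),
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
        "chars_per_second": generated_chars / total_time if total_time else 0.0,
    }


if __name__ == "__main__":
    # Dummy stand-in for a real inference framework so the sketch runs as-is.
    def dummy_generate(prompt: str) -> str:
        time.sleep(0.01)  # pretend decoding takes about 10 ms
        return prompt[::-1]

    print(time_sequential_decoding(dummy_generate, ["hello world"] * 20))
```

Character-based throughput is used here only because the sketch assumes no particular tokenizer; a benchmark such as BALI, per the abstract, reports framework-level efficiency across multiple dimensions rather than this single ad hoc metric.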