Infrastructure for the deployment of Large Language Models: challenges and solutions


Bibliographic Details
Main Authors: Tomasz Walkowiak, Bartosz Walkowiak
Format: Article
Language: English
Published: Polish Academy of Sciences 2025-07-01
Series:International Journal of Electronics and Telecommunications
Online Access: https://journals.pan.pl/Content/135740/12_4999_Walkowiak_L_sk.pdf
Description
Summary: Large Language Models are increasingly prevalent, and their capabilities are advancing rapidly due to extensive research in this field. A growing number of models are being developed, with sizes significantly surpassing 70 billion parameters. As a result, the ability to perform efficient and scalable inference on these models is becoming crucial to maximizing the utilization of valuable resources such as GPUs and CPUs. This paper outlines a process for selecting the most effective tools for efficient inference, supported by experimental results. Additionally, it provides a comprehensive description of an end-to-end system for the inference process, encompassing all components from model inference and communication to user management and a user-friendly web interface. Furthermore, we detail the development of an LLM chatbot that leverages the function-calling capabilities of LLMs and integrates various external tools, including weather prediction, Wikipedia information, symbolic math, and image generation.
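The function-calling pattern the abstract refers to can be sketched as a tool registry plus a dispatcher: the model emits a structured call naming a tool and its arguments, and the host application routes it to a registered function. The tool names, handlers, and call format below are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of LLM function-calling dispatch (assumed design, not the paper's code).
# Tools register themselves under a name; the chatbot routes model-emitted
# calls of the form {"tool": name, "args": {...}} to the matching handler.

TOOLS = {}

def tool(name):
    """Decorator: register a function under a tool name the LLM may request."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("weather")
def get_weather(city: str) -> str:
    # Placeholder: a real chatbot would query a forecast service here.
    return f"Forecast for {city}: (stub)"

@tool("math")
def evaluate(expression: str) -> str:
    # Placeholder for a symbolic-math backend; demo only, not safe
    # for untrusted input.
    return str(eval(expression, {"__builtins__": {}}))

def dispatch(call: dict) -> str:
    """Route a model-emitted tool call to its registered handler."""
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"Unknown tool: {call['tool']}"
    return fn(**call["args"])
```

In a full system, the dispatcher's result would be fed back to the model as a tool message so it can compose the final user-facing reply.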
ISSN:2081-8491
2300-1933