Infrastructure for the deployment of Large Language Models: challenges and solutions
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Polish Academy of Sciences, 2025-07-01 |
| Series: | International Journal of Electronics and Telecommunications |
| Subjects: | |
| Online Access: | https://journals.pan.pl/Content/135740/12_4999_Walkowiak_L_sk.pdf |
| Summary: | Large Language Models are increasingly prevalent, and their capabilities are advancing rapidly due to extensive research in this field. A growing number of models are being developed, with sizes significantly surpassing 70 billion parameters. As a result, the ability to perform efficient and scalable inference on these models is becoming crucial to maximize the utilization of valuable resources such as GPUs and CPUs. This thesis outlines a process for selecting the most effective tools for efficient inference, supported by experimental results. Additionally, it provides a comprehensive description of an end-to-end system for the inference process, encompassing all components from model inference and communication to user management and a user-friendly web interface. Furthermore, we detail the development of an LLM chatbot that leverages the function-calling capabilities of LLMs and integrates various external tools, including weather prediction, Wikipedia information, symbolic math, and image generation. |
| ISSN: | 2081-8491, 2300-1933 |
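
The summary above mentions an LLM chatbot that relies on function calling to reach external tools (weather prediction, Wikipedia, symbolic math, image generation). The article's own code is not part of this record; the snippet below is only a minimal, provider-agnostic sketch of the general function-calling pattern, where the `get_weather` schema, the registry, and the `dispatch` helper are illustrative assumptions rather than the authors' implementation.

```python
# Minimal, provider-agnostic sketch of LLM function calling: the model is shown a
# tool schema, emits a structured "tool call", and the application dispatches it
# to a local function. Tool names and the dispatcher are illustrative assumptions,
# not the implementation described in the article.
import json

# Tool schemas advertised to the model (JSON-Schema style, as used by most LLM APIs).
TOOLS = [
    {
        "name": "get_weather",
        "description": "Return a short weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
]

def get_weather(city: str) -> str:
    # Placeholder for a real weather-service lookup.
    return f"Forecast for {city}: 18 degC, light rain."

# Registry mapping advertised tool names to local Python callables.
REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Execute a tool call emitted by the model and return its result as text."""
    call = json.loads(tool_call_json)
    func = REGISTRY[call["name"]]
    return func(**call["arguments"])

if __name__ == "__main__":
    # A tool call in the shape an LLM with function-calling support might emit.
    example_call = json.dumps({"name": "get_weather", "arguments": {"city": "Poznan"}})
    print(dispatch(example_call))
```

In a full system the schema list would accompany each request to the model, and the tool's result would be appended to the conversation for a follow-up generation; the dispatch step shown here is the provider-independent part of that loop.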