Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
Large language models (LLMs), which have emerged from advances in natural language processing (NLP), enable chatbots, virtual assistants, and numerous domain-specific applications. These models, often comprising billions of parameters, leverage the Transformer architecture and Attention mechanisms to process context effectively and address long-term dependencies more efficiently than earlier approaches, such as recurrent neural networks (RNNs). Notably, since the introduction of Llama, the architectural development of LLMs has significantly converged, predominantly settling on a Transformer-based decoder-only architecture. The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. However, it also introduces new challenges: meeting compute and memory efficiency requirements across diverse deployment targets, ranging from data center servers to resource-constrained edge devices. To address these challenges, we survey the evolution of LLMs at two complementary levels: architectural trends and their underlying operational mechanisms. Furthermore, we quantify how hyperparameter settings influence inference latency by profiling kernel-level execution on a modern GPU architecture. Our findings reveal that identical models can exhibit varying performance based on hyperparameter configurations and deployment contexts, emphasizing the need for scalable and efficient solutions. The insights distilled from this analysis guide the optimization of performance and efficiency within these converged LLM architectures, thereby extending their applicability across a broader range of environments.
| Main Authors: | Seongho Kim, Jihyun Moon, Juntaek Oh, Insu Choi, Joon-Sung Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Open Journal of the Computer Society |
| Subjects: | Edge computing; LLM; NLP; transformer architecture; server deployment |
| Online Access: | https://ieeexplore.ieee.org/document/11072851/ |
| author | Seongho Kim; Jihyun Moon; Juntaek Oh; Insu Choi; Joon-Sung Yang |
|---|---|
| collection | DOAJ |
| description | Large language models (LLMs), which have emerged from advances in natural language processing (NLP), enable chatbots, virtual assistants, and numerous domain-specific applications. These models, often comprising billions of parameters, leverage the Transformer architecture and Attention mechanisms to process context effectively and address long-term dependencies more efficiently than earlier approaches, such as recurrent neural networks (RNNs). Notably, since the introduction of Llama, the architectural development of LLMs has significantly converged, predominantly settling on a Transformer-based decoder-only architecture. The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. However, it also introduces new challenges: meeting compute and memory efficiency requirements across diverse deployment targets, ranging from data center servers to resource-constrained edge devices. To address these challenges, we survey the evolution of LLMs at two complementary levels: architectural trends and their underlying operational mechanisms. Furthermore, we quantify how hyperparameter settings influence inference latency by profiling kernel-level execution on a modern GPU architecture. Our findings reveal that identical models can exhibit varying performance based on hyperparameter configurations and deployment contexts, emphasizing the need for scalable and efficient solutions. The insights distilled from this analysis guide the optimization of performance and efficiency within these converged LLM architectures, thereby extending their applicability across a broader range of environments. |
| format | Article |
| id | doaj-art-94a3ad65896745de8afdf0519079bd05 |
| institution | Kabale University |
| issn | 2644-1268 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Open Journal of the Computer Society |
| spelling | doaj-art-94a3ad65896745de8afdf0519079bd05 (record updated 2025-08-20T03:44:00Z). Seongho Kim (https://orcid.org/0009-0008-9306-9301), Jihyun Moon (https://orcid.org/0009-0006-5280-3392), Juntaek Oh (https://orcid.org/0009-0006-6869-6069), Insu Choi (https://orcid.org/0009-0009-2016-6714), and Joon-Sung Yang (https://orcid.org/0000-0002-1502-5353), "Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations," IEEE Open Journal of the Computer Society, vol. 6, pp. 1214-1226, 2025-01-01, IEEE. ISSN 2644-1268. DOI: 10.1109/OJCS.2025.3587005. IEEE Xplore document 11072851. Affiliations: Department of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea (Kim, Oh, Choi, Yang); Department of Systems Semiconductor Engineering, Yonsei University, Seoul, South Korea (Moon). Keywords: Edge computing; LLM; NLP; transformer architecture; server deployment. https://ieeexplore.ieee.org/document/11072851/ |
| title | Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations |
| topic | Edge computing; LLM; NLP; transformer architecture; server deployment |
| url | https://ieeexplore.ieee.org/document/11072851/ |