Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
Large language models (LLMs), which have emerged from advances in natural language processing (NLP), enable chatbots, virtual assistants, and numerous domain-specific applications. These models, often comprising billions of parameters, leverage the Transformer architecture and Attention mechanisms to process context effectively and address long-term dependencies more efficiently than earlier approaches, such as recurrent neural networks (RNNs). Notably, since the introduction of Llama, the architectural development of LLMs has significantly converged, predominantly settling on a Transformer-based decoder-only architecture. The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. However, it also introduces new challenges: meeting compute and memory efficiency requirements across diverse deployment targets, ranging from data center servers to resource-constrained edge devices. To address these challenges, we survey the evolution of LLMs at two complementary levels: architectural trends and their underlying operational mechanisms. Furthermore, we quantify how hyperparameter settings influence inference latency by profiling kernel-level execution on a modern GPU architecture. Our findings reveal that identical models can exhibit varying performance based on hyperparameter configurations and deployment contexts, emphasizing the need for scalable and efficient solutions. The insights distilled from this analysis guide the optimization of performance and efficiency within these converged LLM architectures, thereby extending their applicability across a broader range of environments.
| Main Authors: | Seongho Kim, Jihyun Moon, Juntaek Oh, Insu Choi, Joon-Sung Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Open Journal of the Computer Society |
| Subjects: | Edge computing; LLM; NLP; transformer architecture; server deployment |
| Online Access: | https://ieeexplore.ieee.org/document/11072851/ |
| author | Seongho Kim; Jihyun Moon; Juntaek Oh; Insu Choi; Joon-Sung Yang |
|---|---|
| collection | DOAJ |
| description | Large language models (LLMs), which have emerged from advances in natural language processing (NLP), enable chatbots, virtual assistants, and numerous domain-specific applications. These models, often comprising billions of parameters, leverage the Transformer architecture and Attention mechanisms to process context effectively and address long-term dependencies more efficiently than earlier approaches, such as recurrent neural networks (RNNs). Notably, since the introduction of Llama, the architectural development of LLMs has significantly converged, predominantly settling on a Transformer-based decoder-only architecture. The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. However, it also introduces new challenges: meeting compute and memory efficiency requirements across diverse deployment targets, ranging from data center servers to resource-constrained edge devices. To address these challenges, we survey the evolution of LLMs at two complementary levels: architectural trends and their underlying operational mechanisms. Furthermore, we quantify how hyperparameter settings influence inference latency by profiling kernel-level execution on a modern GPU architecture. Our findings reveal that identical models can exhibit varying performance based on hyperparameter configurations and deployment contexts, emphasizing the need for scalable and efficient solutions. The insights distilled from this analysis guide the optimization of performance and efficiency within these converged LLM architectures, thereby extending their applicability across a broader range of environments. |
| format | Article |
| id | doaj-art-94a3ad65896745de8afdf0519079bd05 |
| institution | Kabale University |
| issn | 2644-1268 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Open Journal of the Computer Society |
| spelling | doaj-art-94a3ad65896745de8afdf0519079bd05 (record updated 2025-08-20T03:44:00Z). Seongho Kim (https://orcid.org/0009-0008-9306-9301), Jihyun Moon (https://orcid.org/0009-0006-5280-3392), Juntaek Oh (https://orcid.org/0009-0006-6869-6069), Insu Choi (https://orcid.org/0009-0009-2016-6714), and Joon-Sung Yang (https://orcid.org/0000-0002-1502-5353), "Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations," IEEE Open Journal of the Computer Society, vol. 6, pp. 1214-1226, 2025-01-01, IEEE. ISSN 2644-1268. DOI: 10.1109/OJCS.2025.3587005. IEEE Xplore document 11072851. Affiliations: Department of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea (Kim, Oh, Choi, Yang); Department of Systems Semiconductor Engineering, Yonsei University, Seoul, South Korea (Moon). Keywords: Edge computing; LLM; NLP; transformer architecture; server deployment. https://ieeexplore.ieee.org/document/11072851/ |
| title | Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations |
| topic | Edge computing; LLM; NLP; transformer architecture; server deployment |
| url | https://ieeexplore.ieee.org/document/11072851/ |