Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations

Large language models (LLMs), which have emerged from advances in natural language processing (NLP), enable chatbots, virtual assistants, and numerous domain-specific applications. These models, often comprising billions of parameters, leverage the Transformer architecture and Attention mechanisms t...

Full description

Saved in:
Bibliographic Details
Main Authors: Seongho Kim, Jihyun Moon, Juntaek Oh, Insu Choi, Joon-Sung Yang
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Open Journal of the Computer Society
Subjects:
Online Access:https://ieeexplore.ieee.org/document/11072851/
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1849340004817436672
author Seongho Kim
Jihyun Moon
Juntaek Oh
Insu Choi
Joon-Sung Yang
author_facet Seongho Kim
Jihyun Moon
Juntaek Oh
Insu Choi
Joon-Sung Yang
author_sort Seongho Kim
collection DOAJ
description Large language models (LLMs), which have emerged from advances in natural language processing (NLP), enable chatbots, virtual assistants, and numerous domain-specific applications. These models, often comprising billions of parameters, leverage the Transformer architecture and Attention mechanisms to process context effectively and address long-term dependencies more efficiently than earlier approaches, such as recurrent neural networks (RNNs). Notably, since the introduction of Llama, the architectural development of LLMs has significantly converged, predominantly settling on a Transformer-based decoder-only architecture. The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. However, it also introduces new challenges: meeting compute and memory efficiency requirements across diverse deployment targets, ranging from data center servers to resource-constrained edge devices. To address these challenges, we survey the evolution of LLMs at two complementary levels: architectural trends and their underlying operational mechanisms. Furthermore, we quantify how hyperparameter settings influence inference latency by profiling kernel-level execution on a modern GPU architecture. Our findings reveal that identical models can exhibit varying performance based on hyperparameter configurations and deployment contexts, emphasizing the need for scalable and efficient solutions. The insights distilled from this analysis guide the optimization of performance and efficiency within these converged LLM architectures, thereby extending their applicability across a broader range of environments.
format Article
id doaj-art-94a3ad65896745de8afdf0519079bd05
institution Kabale University
issn 2644-1268
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of the Computer Society
spelling doaj-art-94a3ad65896745de8afdf0519079bd052025-08-20T03:44:00ZengIEEEIEEE Open Journal of the Computer Society2644-12682025-01-0161214122610.1109/OJCS.2025.358700511072851Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of OperationsSeongho Kim0https://orcid.org/0009-0008-9306-9301Jihyun Moon1https://orcid.org/0009-0006-5280-3392Juntaek Oh2https://orcid.org/0009-0006-6869-6069Insu Choi3https://orcid.org/0009-0009-2016-6714Joon-Sung Yang4https://orcid.org/0000-0002-1502-5353Department of Electrical and Electronic Engineering, Yonsei University, Seoul, South KoreaDepartment of Systems Semiconductor Engineering, Yonsei University, Seoul, South KoreaDepartment of Electrical and Electronic Engineering, Yonsei University, Seoul, South KoreaDepartment of Electrical and Electronic Engineering, Yonsei University, Seoul, South KoreaDepartment of Electrical and Electronic Engineering, Yonsei University, Seoul, South KoreaLarge language models (LLMs), which have emerged from advances in natural language processing (NLP), enable chatbots, virtual assistants, and numerous domain-specific applications. These models, often comprising billions of parameters, leverage the Transformer architecture and Attention mechanisms to process context effectively and address long-term dependencies more efficiently than earlier approaches, such as recurrent neural networks (RNNs). Notably, since the introduction of Llama, the architectural development of LLMs has significantly converged, predominantly settling on a Transformer-based decoder-only architecture. The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. However, it also introduces new challenges: meeting compute and memory efficiency requirements across diverse deployment targets, ranging from data center servers to resource-constrained edge devices. To address these challenges, we survey the evolution of LLMs at two complementary levels: architectural trends and their underlying operational mechanisms. Furthermore, we quantify how hyperparameter settings influence inference latency by profiling kernel-level execution on a modern GPU architecture. Our findings reveal that identical models can exhibit varying performance based on hyperparameter configurations and deployment contexts, emphasizing the need for scalable and efficient solutions. The insights distilled from this analysis guide the optimization of performance and efficiency within these converged LLM architectures, thereby extending their applicability across a broader range of environments.https://ieeexplore.ieee.org/document/11072851/Edge computingLLMNLPtransformer architectureand server deployment
spellingShingle Seongho Kim
Jihyun Moon
Juntaek Oh
Insu Choi
Joon-Sung Yang
Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
IEEE Open Journal of the Computer Society
Edge computing
LLM
NLP
transformer architecture
and server deployment
title Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
title_full Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
title_fullStr Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
title_full_unstemmed Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
title_short Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
title_sort survey and evaluation of converging architecture in llms based on footsteps of operations
topic Edge computing
LLM
NLP
transformer architecture
and server deployment
url https://ieeexplore.ieee.org/document/11072851/
work_keys_str_mv AT seonghokim surveyandevaluationofconvergingarchitectureinllmsbasedonfootstepsofoperations
AT jihyunmoon surveyandevaluationofconvergingarchitectureinllmsbasedonfootstepsofoperations
AT juntaekoh surveyandevaluationofconvergingarchitectureinllmsbasedonfootstepsofoperations
AT insuchoi surveyandevaluationofconvergingarchitectureinllmsbasedonfootstepsofoperations
AT joonsungyang surveyandevaluationofconvergingarchitectureinllmsbasedonfootstepsofoperations