-
2
ELO-Mask: Effective and Layerwise Optimization of Mask for Sparse LLMs
Published 2024-01-01
Article -
3
Long-context inference optimization for large language models: a survey
Published 2025-01-01 “…To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted. …”
Article -
5
A study on classification based concurrent API calls and optimal model combination for tool augmented LLMs for AI agent
Published 2025-07-01 “…With the rapid advancement of LLMs, enhanced models continue to emerge. Considering the trade-offs between performance and cost in models, it is crucial to find an optimal combination of models in each stage of tool augmented LLM. …”
Article -
6
Entropy-Guided KV Caching for Efficient LLM Inference
Published 2025-07-01 “…However, their practical deployment—especially in long-context scenarios—is often hindered by the computational and memory costs associated with managing the key–value (KV) cache during inference. Optimizing this process is therefore crucial for improving LLM efficiency and scalability. …”
Article -
7
AsymGroup: Asymmetric Grouping and Communication Optimization for 2D Tensor Parallelism in LLM Inference
Published 2025-01-01 “…Recent advances in Large Language Models (LLMs), such as GPT and LLaMA, have demonstrated remarkable capabilities across a wide array of natural language processing tasks. …”
Article -
8
ORANSight-2.0: Foundational LLMs for O-RAN
Published 2025-01-01 “…We thoroughly evaluate the energy characteristics of ORANSight-2.0, demonstrating its efficiency in training, inference, and inference with RAG augmentation, ensuring optimal performance while maintaining low computational and energy costs. …”
Article -
9
Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
Published 2025-01-01 “…The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. …”
Article -
10
LLMs on a Budget: System-Level Approaches to Power-Efficient and Scalable Fine-Tuning
Published 2025-01-01 “…Large Language Models (LLMs) have shown remarkable capabilities in various applications, including robotics, telecommunications, and scientific discovery. …”
Article -
11
Few-Shot Optimization for Sensor Data Using Large Language Models: A Case Study on Fatigue Detection
Published 2025-05-01 “…In this paper, we propose a novel few-shot optimization with Hybrid Euclidean Distance with Large Language Models (HED-LM) to improve example selection for sensor-based classification tasks. …”
Article -
13
BALI—A Benchmark for Accelerated Language Model Inference
Published 2025-01-01 “…These applications rely on real-time or near-real-time responses to process sequential LLM requests, creating a critical demand for efficient and accelerated inference. These developments have led to numerous frameworks optimizing inference speed and resource utilization. …”
Article -
14
Impact of Developer Queries on the Effectiveness of Conversational Large Language Models in Programming
Published 2025-06-01 “…These findings suggest that the nature of the queries made to LLMs influences the success of programming tasks and provides insights into how AI tools can assist learning in software development.…”
Article -
15
Agentic AI Systems: Architecture and Evaluation Using a Frictionless Parking Scenario
Published 2025-01-01 “…Key metrics (agent’s response time or latency and lexical consistency) show that a lightweight gpt-4o-mini backbone and concise verbosity minimize latency, while medium prompt specificity and moderate query complexity optimize consistency. Decoding entropy influences stylistic diversity without significant latency costs but reduces consistency at high settings. …”
Article -
16
Human-Centered AI for Migrant Integration Through LLM and RAG Optimization
Published 2024-12-01 “…Our proposal involves the optimal tuning of key hyperparameters for LLMs and RAG through multi-criteria decision-making (MCDM) methods to ensure the solutions are fair, equitable, and non-discriminatory. …”
Article -
17
AI driven cardiovascular risk prediction using NLP and Large Language Models for personalized medicine in athletes
Published 2025-06-01 “…Furthermore, the research underscores the role of LLMs in personalized medicine, identifying patient-specific risk factors and optimizing treatment pathways for cardiac patients. …”
Article -
18
Exploring the Joint Influence of Built Environment Factors on Urban Rail Transit Peak-Hour Ridership Using DeepSeek
Published 2025-05-01 “…Recent advancements in the reasoning capabilities of large language models (LLMs) offer a robust methodological foundation for analyzing the complex joint influence of multiple built environment factors. …”
Article -
19
Probing the Pitfalls: Understanding SVD’s Shortcomings in Language Model Compression
Published 2024-12-01
Article -
20
Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study
Published 2025-06-01 “…The optimized LLMs show a high degree of similarity in reasoning results, consistent with the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. …”
Article