2
ELO-Mask: Effective and Layerwise Optimization of Mask for Sparse LLMs
Published 2024-01-01
Article -
3
Long-context inference optimization for large language models: a survey
Published 2025-01-01
“…To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted. …”
Article -
4
Long-context inference optimization for large language models: a survey
Published 2025-01-01
“…To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted. …”
Article -
5
A study on classification based concurrent API calls and optimal model combination for tool augmented LLMs for AI agent
Published 2025-07-01
“…With the rapid advancement of LLMs, enhanced models continue to emerge. Considering the trade-offs between performance and cost in models, it is crucial to find an optimal combination of models in each stage of tool augmented LLM. …”
Article -
6
Entropy-Guided KV Caching for Efficient LLM Inference
Published 2025-07-01
“…However, their practical deployment—especially in long-context scenarios—is often hindered by the computational and memory costs associated with managing the key–value (KV) cache during inference. Optimizing this process is therefore crucial for improving LLM efficiency and scalability. …”
Article -
7
AsymGroup: Asymmetric Grouping and Communication Optimization for 2D Tensor Parallelism in LLM Inference
Published 2025-01-01
“…Recent advances in Large Language Models (LLMs), such as GPT and LLaMA, have demonstrated remarkable capabilities across a wide array of natural language processing tasks. …”
Article -
8
ORANSight-2.0: Foundational LLMs for O-RAN
Published 2025-01-01
“…We thoroughly evaluate the energy characteristics of ORANSight-2.0, demonstrating its efficiency in training, inference, and inference with RAG augmentation, ensuring optimal performance while maintaining low computational and energy costs. …”
Article -
9
Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations
Published 2025-01-01
“…The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. …”
Article -
10
LLMs on a Budget: System-Level Approaches to Power-Efficient and Scalable Fine-Tuning
Published 2025-01-01
“…Large Language Models (LLMs) have shown remarkable capabilities in various applications, including robotics, telecommunications, and scientific discovery. …”
Article -
11
Few-Shot Optimization for Sensor Data Using Large Language Models: A Case Study on Fatigue Detection
Published 2025-05-01
“…In this paper, we propose a novel few-shot optimization with Hybrid Euclidean Distance with Large Language Models (HED-LM) to improve example selection for sensor-based classification tasks. …”
Article -
13
BALI—A Benchmark for Accelerated Language Model Inference
Published 2025-01-01
“…These applications rely on real-time or near-real-time responses to process sequential LLM requests, creating a critical demand for efficient and accelerated inference. These developments have led to numerous frameworks optimizing inference speed and resource utilization. …”
Article -
14
Probing the Pitfalls: Understanding SVD’s Shortcomings in Language Model Compression
Published 2024-12-01
Article -
15
Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study
Published 2025-06-01
“…The optimized LLMs show a high degree of similarity in reasoning results, consistent with the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. …”
Article -
16
Benchmarking Open-Source Large Language Models for Sentiment and Emotion Classification in Indonesian Tweets
Published 2025-01-01
“…However, certain classes—particularly Neutral sentiment and Fear emotion—remain challenging, with lower agreement even among human annotators. Inference time varies significantly: optimized models complete predictions in under an hour, while some large models require several days. …”
Article -
17
LAMARS: Large Language Model-Based Anticipation Mechanism Acceleration in Real-Time Robotic Systems
Published 2025-01-01
“…Large language models (LLMs) have assumed an increasingly crucial role in robotic systems because of their ability to leverage the extensive knowledge they possess in robotic inference and task handling. …”
Article -
18
Efficient Management of Safety Documents Using Text-Based Analytics to Extract Safety Attributes From Construction Accident Reports
Published 2025-01-01
“…Future work should focus on API creation, secure machine learning pipelines, and optimized deployment of LLMs, particularly in complex contexts.…”
Article -
19
Data extraction from polymer literature using large language models
Published 2024-12-01
“…We suggest methodologies to optimize costs, provide insights on effective inference via in-context few-shots learning, and illuminate gaps and opportunities for future studies utilizing LLMs for natural language processing in polymer science. …”
Article -
20
InMemQK: A Product Quantization Based MatMul Module for Compute-in-Memory Attention Macro
Published 2024-12-01
“…Large Language Models (LLMs), based on transformer architecture, have demonstrated remarkable capabilities in natural language processing tasks, enabling machines to generate human-like text and engage in meaningful dialogues. …”
Article