Search Results - llms inference optimization :: Kabale University Library Catalog

1

Efficient LLMs Training and Inference: An Introduction by Rui Li, Deji Fu, Chunyu Shi, Zhilan Huang, Gang Lu

Published 2025-01-01

Article
2

ELO-Mask: Effective and Layerwise Optimization of Mask for Sparse LLMs by Bingjie Xiang, Jiarui Wu, Xiaoying Han, Qian Gu, Fei Chao, Xiao Yang, Fan Wu, Xin Fu

Published 2024-01-01

Article
3

Long-context inference optimization for large language models: a survey by TAO Wei, WANG Jianzong, ZHANG Xulong, QU Xiaoyang

Published 2025-01-01
“…To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted. …”

Article
5

A study on classification based concurrent API calls and optimal model combination for tool augmented LLMs for AI agent by HeounMo Go, SangHyun Park

Published 2025-07-01
“…With the rapid advancement of LLMs, enhanced models continue to emerge. Considering the trade-offs between performance and cost in models, it is crucial to find an optimal combination of models in each stage of tool augmented LLM. …”

Article
6

Entropy-Guided KV Caching for Efficient LLM Inference by Heekyum Kim, Yuchul Jung

Published 2025-07-01
“…However, their practical deployment—especially in long-context scenarios—is often hindered by the computational and memory costs associated with managing the key–value (KV) cache during inference. Optimizing this process is therefore crucial for improving LLM efficiency and scalability. …”

Article
7

AsymGroup: Asymmetric Grouping and Communication Optimization for 2D Tensor Parallelism in LLM Inference by Ki Tae Kim, Seok-Ju Im, Eui-Young Chung

Published 2025-01-01
“…Recent advances in Large Language Models (LLMs), such as GPT and LLaMA, have demonstrated remarkable capabilities across a wide array of natural language processing tasks. …”

Article
8

ORANSight-2.0: Foundational LLMs for O-RAN by Pranshav Gajjar, Vijay K. Shah

Published 2025-01-01
“…We thoroughly evaluate the energy characteristics of ORANSight-2.0, demonstrating its efficiency in training, inference, and inference with RAG augmentation, ensuring optimal performance while maintaining low computational and energy costs. …”

Article
9

Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations by Seongho Kim, Jihyun Moon, Juntaek Oh, Insu Choi, Joon-Sung Yang

Published 2025-01-01
“…The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. …”

Article
10

LLMs on a Budget: System-Level Approaches to Power-Efficient and Scalable Fine-Tuning by Kailash Gogineni, Ali Suvizi, Guru Venkataramani

Published 2025-01-01
“…Large Language Models (LLMs) have shown remarkable capabilities in various applications, including robotics, telecommunications, and scientific discovery. …”

Article
11

Few-Shot Optimization for Sensor Data Using Large Language Models: A Case Study on Fatigue Detection by Elsen Ronando, Sozo Inoue

Published 2025-05-01
“…In this paper, we propose a novel few-shot optimization with Hybrid Euclidean Distance with Large Language Models (HED-LM) to improve example selection for sensor-based classification tasks. …”

Article
12

Unveiling the Power of Large Language Models: A Comparative Study of Retrieval-Augmented Generation, Fine-Tuning, and Their Synergistic Fusion for Enhanced Performance by Gulsum Budakoglu, Hakan Emekci

Published 2025-01-01

Article
13

BALI—A Benchmark for Accelerated Language Model Inference by Lena Jurkschat, Preetam Gattogi, Sahar Vahdati, Jens Lehmann

Published 2025-01-01
“…These applications rely on real-time or near-real-time responses to process sequential LLM requests, creating a critical demand for efficient and accelerated inference. These developments have led to numerous frameworks optimizing inference speed and resource utilization. …”

Article
14

Probing the Pitfalls: Understanding SVD’s Shortcomings in Language Model Compression by Сергей Александрович Плетенев

Published 2024-12-01

Article
15

Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study by Chunliang Chen, Xinyu Wang, Ming Guan, Wenjing Yue, Yuanbin Wu, Ya Zhou, Xiaoling Wang

Published 2025-06-01
“…The optimized LLMs show a high degree of similarity in reasoning results, consistent with the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. …”

Article
16

Benchmarking Open-Source Large Language Models for Sentiment and Emotion Classification in Indonesian Tweets by Arbi Haza Nasution, Aytug Onan, Yohei Murakami, Winda Monika, Anggi Hanafiah

Published 2025-01-01
“…However, certain classes—particularly Neutral sentiment and Fear emotion—remain challenging, with lower agreement even among human annotators. Inference time varies significantly: optimized models complete predictions in under an hour, while some large models require several days. …”

Article
17

LAMARS: Large Language Model-Based Anticipation Mechanism Acceleration in Real-Time Robotic Systems by Yifang Gao, Wei Luo, Xuye Wang, Shunshun Zhang, Patrick Goh

Published 2025-01-01
“…Large language models (LLMs) have assumed an increasingly crucial role in robotic systems because of their ability to leverage the extensive knowledge they possess in robotic inference and task handling. …”

Article
18

Efficient Management of Safety Documents Using Text-Based Analytics to Extract Safety Attributes From Construction Accident Reports by Vedat Togan, Fatemeh Mostofi, Onur Behzat Tokdemir, Fethi Kadioglu

Published 2025-01-01
“…Future work should focus on API creation, secure machine learning pipelines, and optimized deployment of LLMs, particularly in complex contexts.…”

Article
19

Data extraction from polymer literature using large language models by Sonakshi Gupta, Akhlak Mahmood, Pranav Shetty, Aishat Adeboye, Rampi Ramprasad

Published 2024-12-01
“…We suggest methodologies to optimize costs, provide insights on effective inference via in-context few-shots learning, and illuminate gaps and opportunities for future studies utilizing LLMs for natural language processing in polymer science. …”

Article
20

InMemQK: A Product Quantization Based MatMul Module for Compute-in-Memory Attention Macro by Pengcheng Feng, Yihao Chen, Jinke Yu, Hao Yue, Zhelong Jiang, Yi Xiao, Wan’ang Xiao, Huaxiang Lu, Gang Chen

Published 2024-12-01
“…Large Language Models (LLMs), based on transformer architecture, have demonstrated remarkable capabilities in natural language processing tasks, enabling machines to generate human-like text and engage in meaningful dialogues. …”

Article