Showing 1 - 20 results of 22 for search 'llms inference optimization', query time: 0.11s
  1. 1
  2. 2
  3. 3

    Long-context inference optimization for large language models: a survey by TAO Wei, WANG Jianzong, ZHANG Xulong, QU Xiaoyang

    Published 2025-01-01
    “…To improve the efficiency of LLMs in long-text inference, a comprehensive review and analysis of existing optimization techniques were conducted. …”
    Article
  5. 5

    A study on classification based concurrent API calls and optimal model combination for tool augmented LLMs for AI agent by HeounMo Go, SangHyun Park

    Published 2025-07-01
    “…With the rapid advancement of LLMs, enhanced models continue to emerge. Considering the trade-offs between performance and cost in models, it is crucial to find an optimal combination of models in each stage of tool augmented LLM. …”
    Article
  6. 6

    Entropy-Guided KV Caching for Efficient LLM Inference by Heekyum Kim, Yuchul Jung

    Published 2025-07-01
    “…However, their practical deployment—especially in long-context scenarios—is often hindered by the computational and memory costs associated with managing the key–value (KV) cache during inference. Optimizing this process is therefore crucial for improving LLM efficiency and scalability. …”
    Article
  7. 7

    AsymGroup: Asymmetric Grouping and Communication Optimization for 2D Tensor Parallelism in LLM Inference by Ki Tae Kim, Seok-Ju Im, Eui-Young Chung

    Published 2025-01-01
    “…Recent advances in Large Language Models (LLMs), such as GPT and LLaMA, have demonstrated remarkable capabilities across a wide array of natural language processing tasks. …”
    Article
  8. 8

    ORANSight-2.0: Foundational LLMs for O-RAN by Pranshav Gajjar, Vijay K. Shah

    Published 2025-01-01
    “…We thoroughly evaluate the energy characteristics of ORANSight-2.0, demonstrating its efficiency in training, inference, and inference with RAG augmentation, ensuring optimal performance while maintaining low computational and energy costs. …”
    Article
  9. 9

    Survey and Evaluation of Converging Architecture in LLMs Based on Footsteps of Operations by Seongho Kim, Jihyun Moon, Juntaek Oh, Insu Choi, Joon-Sung Yang

    Published 2025-01-01
    “…The evolution of LLMs has been driven by advances in high-bandwidth memory, specialized accelerators, and optimized architectures, enabling models to scale to billions of parameters. …”
    Article
  10. 10

    LLMs on a Budget: System-Level Approaches to Power-Efficient and Scalable Fine-Tuning by Kailash Gogineni, Ali Suvizi, Guru Venkataramani

    Published 2025-01-01
    “…Large Language Models (LLMs) have shown remarkable capabilities in various applications, including robotics, telecommunications, and scientific discovery. …”
    Article
  11. 11

    Few-Shot Optimization for Sensor Data Using Large Language Models: A Case Study on Fatigue Detection by Elsen Ronando, Sozo Inoue

    Published 2025-05-01
    “…In this paper, we propose a novel few-shot optimization with Hybrid Euclidean Distance with Large Language Models (HED-LM) to improve example selection for sensor-based classification tasks. …”
    Article
  12. 12
  13. 13

    BALI—A Benchmark for Accelerated Language Model Inference by Lena Jurkschat, Preetam Gattogi, Sahar Vahdati, Jens Lehmann

    Published 2025-01-01
    “…These applications rely on real-time or near-real-time responses to process sequential LLM requests, creating a critical demand for efficient and accelerated inference. These developments have led to numerous frameworks optimizing inference speed and resource utilization. …”
    Article
  14. 14
  15. 15

    Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Method Development Study by Chunliang Chen, Xinyu Wang, Ming Guan, Wenjing Yue, Yuanbin Wu, Ya Zhou, Xiaoling Wang

    Published 2025-06-01
    “…The optimized LLMs show a high degree of similarity in reasoning results, consistent with the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. …”
    Article
  16. 16

    Benchmarking Open-Source Large Language Models for Sentiment and Emotion Classification in Indonesian Tweets by Arbi Haza Nasution, Aytug Onan, Yohei Murakami, Winda Monika, Anggi Hanafiah

    Published 2025-01-01
    “…However, certain classes—particularly Neutral sentiment and Fear emotion—remain challenging, with lower agreement even among human annotators. Inference time varies significantly: optimized models complete predictions in under an hour, while some large models require several days. …”
    Article
  17. 17

    LAMARS: Large Language Model-Based Anticipation Mechanism Acceleration in Real-Time Robotic Systems by Yifang Gao, Wei Luo, Xuye Wang, Shunshun Zhang, Patrick Goh

    Published 2025-01-01
    “…Large language models (LLMs) have assumed an increasingly crucial role in robotic systems because of their ability to leverage the extensive knowledge they possess in robotic inference and task handling. …”
    Article
  18. 18

    Efficient Management of Safety Documents Using Text-Based Analytics to Extract Safety Attributes From Construction Accident Reports by Vedat Togan, Fatemeh Mostofi, Onur Behzat Tokdemir, Fethi Kadioglu

    Published 2025-01-01
    “…Future work should focus on API creation, secure machine learning pipelines, and optimized deployment of LLMs, particularly in complex contexts.…”
    Article
  19. 19

    Data extraction from polymer literature using large language models by Sonakshi Gupta, Akhlak Mahmood, Pranav Shetty, Aishat Adeboye, Rampi Ramprasad

    Published 2024-12-01
    “…We suggest methodologies to optimize costs, provide insights on effective inference via in-context few-shots learning, and illuminate gaps and opportunities for future studies utilizing LLMs for natural language processing in polymer science. …”
    Article
  20. 20

    InMemQK: A Product Quantization Based MatMul Module for Compute-in-Memory Attention Macro by Pengcheng Feng, Yihao Chen, Jinke Yu, Hao Yue, Zhelong Jiang, Yi Xiao, Wan’ang Xiao, Huaxiang Lu, Gang Chen

    Published 2024-12-01
    “…Large Language Models (LLMs), based on transformer architecture, have demonstrated remarkable capabilities in natural language processing tasks, enabling machines to generate human-like text and engage in meaningful dialogues. …”
    Article