ResDecode: Accelerating Large Language Models Inference via Residual Decoding Heads

Large Language Models (LLMs) have immense potential to enhance the capabilities of Cyber-Physical-Social Intelligence (CPSI) systems, enabling them to better engage with complex cyber, physical, and social environments. However, the high inference latency of LLMs, which is inherited from the autoreg...

Bibliographic Details
Main Authors: Ziqian Zeng, Jiahong Yu, Qianshi Pang, Zihao Wang, Huiping Zhuang, Fan Yu, Hongen Shao, Xiaofeng Zou
Format: Article
Language: English
Published: Tsinghua University Press 2025-06-01
Series: Big Data Mining and Analytics
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020074