FPGA Acceleration With Hessian-Based Comprehensive Intra-Layer Mixed-Precision Quantization for Transformer Models
Recent advances in using FPGAs as co-processors for language-model acceleration, prized for their energy efficiency and flexibility, are constrained by limited memory capacity, which hinders the deployment of transformer-based language models. To address this challenge, we propose...
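The truncated abstract names the core technique: Hessian-based intra-layer mixed-precision quantization. As an illustration only, not a reproduction of the paper's method, the Python sketch below shows one common form such schemes take: estimate per-group curvature of the loss with a Hutchinson trace estimator, then assign wider bit-widths to the more sensitive weight groups within a single layer. The group granularity, the {2, 4, 8}-bit palette, and the toy diagonal Hessian are all assumptions for the sake of the example.

```python
# A minimal sketch (not the paper's implementation) of Hessian-aware
# intra-layer mixed-precision quantization: channel groups within one
# weight matrix whose loss is more curvature-sensitive (larger Hessian
# trace) receive wider bit-widths. The bit palette {2, 4, 8} and the
# diagonal toy Hessian are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def hutchinson_trace(hvp, dim, n_samples=64):
    """Estimate tr(H) from Hessian-vector products with Rademacher probes."""
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / n_samples

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of a weight group to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

# Toy layer: a weight matrix split into 4 channel groups of 64 weights;
# a diagonal Hessian stands in for the true curvature of the task loss.
W = rng.normal(size=(4, 64))
H_diag = np.concatenate([np.full(64, s) for s in (0.1, 1.0, 5.0, 0.3)])

traces = [
    hutchinson_trace(lambda v, g=g: H_diag[g * 64:(g + 1) * 64] * v, 64)
    for g in range(4)
]

# Rank groups by sensitivity: least sensitive get 2 bits, most gets 8.
order = np.argsort(traces)
bits = np.empty(4, dtype=int)
bits[order[:2]] = 2   # low-sensitivity groups: aggressive 2-bit
bits[order[2]] = 4
bits[order[3]] = 8    # highest-sensitivity group keeps 8 bits

W_q = np.stack([quantize_uniform(W[g], bits[g]) for g in range(4)])
print("per-group tr(H):", np.round(traces, 2), " bits:", bits)
```

The point this mirrors is the intra-layer granularity in the article's title: precision varies across weight groups of one matrix rather than per layer, so memory-constrained FPGA deployments can spend bits only where the Hessian says they matter.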
| Main Authors: | Woohong Byun, Jongseok Woo, Saibal Mukhopadhyay |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/10973048/ |
Similar Items

- FPGA-QNN: Quantized Neural Network Hardware Acceleration on FPGAs
  by: Mustafa Tasci, et al.
  Published: (2025-01-01)
- An Accelerated FPGA-Based Parallel CNN-LSTM Computing Device
  by: Xin Zhou, et al.
  Published: (2024-01-01)
- Survey of FPGA based recurrent neural network accelerator
  by: Chen GAO, et al.
  Published: (2019-08-01)
- Image Processing Hardware Acceleration—A Review of Operations Involved and Current Hardware Approaches
  by: Costin-Emanuel Vasile, et al.
  Published: (2024-11-01)
- An NVMe-Based Secure Computing Platform With FPGA-Based TFHE Accelerator
  by: Yoshihiro Ohba, et al.
  Published: (2025-01-01)