Fully Quantized Matrix Arithmetic-Only BERT Model and Its FPGA-Based Accelerator

In this paper, we propose a fully quantized matrix arithmetic-only BERT (FQ MA-BERT) model to enable efficient natural language processing. Conventionally, the BERT model relies on floating-point arithmetic for inference and requires not only linear matrix multiplication but also nonlinear functions...

Bibliographic Details
Main Authors: Hiroshi Fuketa, Toshihiro Katashita, Yohei Hori, Masakazu Hioki
Format: Article
Language: English
Published: IEEE 2025-01-01
Series:IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11045889/