A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm

Lattice-based post-quantum cryptography (PQC) algorithms demand number theoretic transform (NTT)-based polynomial multiplications. NTT-based polynomials’ multiplication relies on the computation of forward number theoretic transform (FNTT) and inverse number theoretic transform (INTT), respectively....

Full description

Saved in:
Bibliographic Details
Main Authors: Muhammad Rashid, Omar S. Sonbul, Sajjad Shaukat Jamal, Amar Y. Jaffar, Azamat Kakhorov
Format: Article
Language:English
Published: MDPI AG 2024-12-01
Series:Information
Subjects:
Online Access:https://www.mdpi.com/2078-2489/16/1/17
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Lattice-based post-quantum cryptography (PQC) algorithms demand number theoretic transform (NTT)-based polynomial multiplications. NTT-based polynomials’ multiplication relies on the computation of forward number theoretic transform (FNTT) and inverse number theoretic transform (INTT), respectively. Therefore, this work presents a unified NTT hardware accelerator architecture to facilitate the polynomial multiplications of the CRYSTALS-Kyber PQC algorithm. Moreover, a unified butterfly unit design of Cooley–Tukey and Gentleman–Sande configurations is proposed to implement the FNTT and INTT operations using one adder, one multiplier, and one subtractor, sharing four routing multiplexers and one Barrett-based modular reduction unit. The critical path of the proposed butterfly unit is minimized using pipelining. An efficient controller is implemented for control functionalities. The simulation results after the post-place and -route step are provided on Xilinx Virtex-6 and Virtex-7 field-programmable gate array devices. Also, the proposed design is physically implemented for validation on Virtex-7 FPGA. The number of slices utilized on Virtex-6 and Virtex-7 devices is 398 and 312, the required number of clock cycles for one set of FNTT and INTT computations is 1410 and 1540, and the maximum operating frequency is 256 and 290 MHz, respectively. The average figure of merit (FoM), where FoM is the ratio of throughput to slices, illustrates 62% better performance than the most relevant NTT design from the literature.
ISSN:2078-2489