A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm
Lattice-based post-quantum cryptography (PQC) algorithms demand number theoretic transform (NTT)-based polynomial multiplications. NTT-based polynomials’ multiplication relies on the computation of forward number theoretic transform (FNTT) and inverse number theoretic transform (INTT), respectively....
Saved in:
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2024-12-01
|
Series: | Information |
Subjects: | |
Online Access: | https://www.mdpi.com/2078-2489/16/1/17 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
_version_ | 1832588359311032320 |
---|---|
author | Muhammad Rashid Omar S. Sonbul Sajjad Shaukat Jamal Amar Y. Jaffar Azamat Kakhorov |
author_facet | Muhammad Rashid Omar S. Sonbul Sajjad Shaukat Jamal Amar Y. Jaffar Azamat Kakhorov |
author_sort | Muhammad Rashid |
collection | DOAJ |
description | Lattice-based post-quantum cryptography (PQC) algorithms demand number theoretic transform (NTT)-based polynomial multiplications. NTT-based polynomials’ multiplication relies on the computation of forward number theoretic transform (FNTT) and inverse number theoretic transform (INTT), respectively. Therefore, this work presents a unified NTT hardware accelerator architecture to facilitate the polynomial multiplications of the CRYSTALS-Kyber PQC algorithm. Moreover, a unified butterfly unit design of Cooley–Tukey and Gentleman–Sande configurations is proposed to implement the FNTT and INTT operations using one adder, one multiplier, and one subtractor, sharing four routing multiplexers and one Barrett-based modular reduction unit. The critical path of the proposed butterfly unit is minimized using pipelining. An efficient controller is implemented for control functionalities. The simulation results after the post-place and -route step are provided on Xilinx Virtex-6 and Virtex-7 field-programmable gate array devices. Also, the proposed design is physically implemented for validation on Virtex-7 FPGA. The number of slices utilized on Virtex-6 and Virtex-7 devices is 398 and 312, the required number of clock cycles for one set of FNTT and INTT computations is 1410 and 1540, and the maximum operating frequency is 256 and 290 MHz, respectively. The average figure of merit (FoM), where FoM is the ratio of throughput to slices, illustrates 62% better performance than the most relevant NTT design from the literature. |
format | Article |
id | doaj-art-9557e23be65248309505890f2acd893f |
institution | Kabale University |
issn | 2078-2489 |
language | English |
publishDate | 2024-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Information |
spelling | doaj-art-9557e23be65248309505890f2acd893f2025-01-24T13:35:09ZengMDPI AGInformation2078-24892024-12-011611710.3390/info16010017A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC AlgorithmMuhammad Rashid0Omar S. Sonbul1Sajjad Shaukat Jamal2Amar Y. Jaffar3Azamat Kakhorov4Computer and Network Engineering Department, Umm Al-Qura University, Makkah 24382, Saudi ArabiaComputer and Network Engineering Department, Umm Al-Qura University, Makkah 24382, Saudi ArabiaDepartment of Mathematics, College of Science, King Khalid University, Abha 61413, Saudi ArabiaComputer and Network Engineering Department, Umm Al-Qura University, Makkah 24382, Saudi ArabiaDepartment of Artificial intelligence, Tashkent State University of Economics, Tashkent 100066, UzbekistanLattice-based post-quantum cryptography (PQC) algorithms demand number theoretic transform (NTT)-based polynomial multiplications. NTT-based polynomials’ multiplication relies on the computation of forward number theoretic transform (FNTT) and inverse number theoretic transform (INTT), respectively. Therefore, this work presents a unified NTT hardware accelerator architecture to facilitate the polynomial multiplications of the CRYSTALS-Kyber PQC algorithm. Moreover, a unified butterfly unit design of Cooley–Tukey and Gentleman–Sande configurations is proposed to implement the FNTT and INTT operations using one adder, one multiplier, and one subtractor, sharing four routing multiplexers and one Barrett-based modular reduction unit. The critical path of the proposed butterfly unit is minimized using pipelining. An efficient controller is implemented for control functionalities. The simulation results after the post-place and -route step are provided on Xilinx Virtex-6 and Virtex-7 field-programmable gate array devices. Also, the proposed design is physically implemented for validation on Virtex-7 FPGA. The number of slices utilized on Virtex-6 and Virtex-7 devices is 398 and 312, the required number of clock cycles for one set of FNTT and INTT computations is 1410 and 1540, and the maximum operating frequency is 256 and 290 MHz, respectively. The average figure of merit (FoM), where FoM is the ratio of throughput to slices, illustrates 62% better performance than the most relevant NTT design from the literature.https://www.mdpi.com/2078-2489/16/1/17Number theoretic transform (NTT)CRYSTALS-Kyberhardware acceleratorPQCFPGA |
spellingShingle | Muhammad Rashid Omar S. Sonbul Sajjad Shaukat Jamal Amar Y. Jaffar Azamat Kakhorov A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm Information Number theoretic transform (NTT) CRYSTALS-Kyber hardware accelerator PQC FPGA |
title | A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm |
title_full | A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm |
title_fullStr | A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm |
title_full_unstemmed | A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm |
title_short | A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm |
title_sort | pipelined hardware design of fntt and intt of crystals kyber pqc algorithm |
topic | Number theoretic transform (NTT) CRYSTALS-Kyber hardware accelerator PQC FPGA |
url | https://www.mdpi.com/2078-2489/16/1/17 |
work_keys_str_mv | AT muhammadrashid apipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT omarssonbul apipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT sajjadshaukatjamal apipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT amaryjaffar apipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT azamatkakhorov apipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT muhammadrashid pipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT omarssonbul pipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT sajjadshaukatjamal pipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT amaryjaffar pipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm AT azamatkakhorov pipelinedhardwaredesignoffnttandinttofcrystalskyberpqcalgorithm |