FalconSign: An Efficient and High-Throughput Hardware Architecture for Falcon Signature Generation

Bibliographic Details
Main Authors: Yi Ouyang, Yihong Zhu, Wenping Zhu, Bohan Yang, Zirui Zhang, Hanning Wang, Qichao Tao, Min Zhu, Shaojun Wei, Leibo Liu
Format: Article
Language: English
Published: Ruhr-Universität Bochum 2024-12-01
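Series: Transactions on Cryptographic Hardware and Embedded Systems
Subjects: Post-quantum cryptography; Falcon; Lattice; Fast-Fourier Sampling; Floating-point; High-performance
Online Access: https://tosc.iacr.org/index.php/TCHES/article/view/11927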
author Yi Ouyang
Yihong Zhu
Wenping Zhu
Bohan Yang
Zirui Zhang
Hanning Wang
Qichao Tao
Min Zhu
Shaojun Wei
Leibo Liu
collection DOAJ
description Falcon is a lattice-based, quantum-resistant digital signature scheme known for its fast signature generation and verification and its compact signature size. The scheme was selected for standardization in the third round of the post-quantum cryptography (PQC) standardization process due to its unique attributes and robust security features. Despite these strengths, research on hardware acceleration has been scarce, primarily because of the scheme's complex calculation flow and floating-point operations, and this gap hinders its widespread adoption. To address this issue, we propose FalconSign, a high-performance, configurable crypto-processor designed to accelerate Falcon signature generation on FPGA/ASIC through algorithm-hardware co-design. Our approach involves a new scheduling flow and architecture for Fast-Fourier Sampling to enhance computing unit reuse and reduce processing time. Additionally, we introduce several optimized modules, including configurable randomness generation units, parallel floating-point processing units, and an optimized SamplerZ module, to improve execution efficiency. Furthermore, this paper presents a finely optimized hardware accelerator for the Falcon scheme. Our FPGA implementation results demonstrate a throughput improvement of approximately 5.1x compared to state-of-the-art designs, with 2.8x/4.5x/4.2x/3.2x lower area-time products (LUTs/FFs/DSPs/BRAMs), for NIST security level V. The crypto-processor occupies an area of 0.71 mm² and achieves a throughput of 5.2k OPS on the TSMC 28 nm process for NIST security level I.
format Article
id doaj-art-5988255d9aac4eafa6ad2e0e58e917ba
institution OA Journals
issn 2569-2925
language English
publishDate 2024-12-01
publisher Ruhr-Universität Bochum
record_format Article
series Transactions on Cryptographic Hardware and Embedded Systems
spelling doaj-art-5988255d9aac4eafa6ad2e0e58e917ba 2025-08-20T02:38:03Z eng
Ruhr-Universität Bochum
Transactions on Cryptographic Hardware and Embedded Systems, ISSN 2569-2925, 2024-12-01, volume 2025, issue 1
DOI: 10.46586/tches.v2025.i1.203-226
FalconSign: An Efficient and High-Throughput Hardware Architecture for Falcon Signature Generation
Yi Ouyang, Yihong Zhu, Wenping Zhu, Bohan Yang, Zirui Zhang, Hanning Wang, Qichao Tao, Shaojun Wei, Leibo Liu: Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, Beijing, China
Min Zhu: Wuxi Micro Innovation Integrated Circuit Design Co., Ltd., Wuxi, China
https://tosc.iacr.org/index.php/TCHES/article/view/11927
Keywords: Post-quantum cryptography; Falcon; Lattice; Fast-Fourier Sampling; Floating-point; High-performance
title FalconSign: An Efficient and High-Throughput Hardware Architecture for Falcon Signature Generation
topic Post-quantum cryptography
Falcon
Lattice
Fast-Fourier Sampling
Floating-point
High-performance
url https://tosc.iacr.org/index.php/TCHES/article/view/11927