FalconSign: An Efficient and High-Throughput Hardware Architecture for Falcon Signature Generation
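| Main Authors: | Yi Ouyang, Yihong Zhu, Wenping Zhu, Bohan Yang, Zirui Zhang, Hanning Wang, Qichao Tao, Min Zhu, Shaojun Wei, Leibo Liu |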
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Ruhr-Universität Bochum, 2024-12-01 |
| Series: | Transactions on Cryptographic Hardware and Embedded Systems |
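| Subjects: | Post-quantum cryptography; Falcon; Lattice; Fast-Fourier Sampling; Floating-point; High-performance |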
| Online Access: | https://tosc.iacr.org/index.php/TCHES/article/view/11927 |
| _version_ | 1850109477019189248 |
|---|---|
| author | Yi Ouyang, Yihong Zhu, Wenping Zhu, Bohan Yang, Zirui Zhang, Hanning Wang, Qichao Tao, Min Zhu, Shaojun Wei, Leibo Liu |
| author_sort | Yi Ouyang |
| collection | DOAJ |
| description | Falcon is a lattice-based quantum-resistant digital signature scheme renowned for its high signature generation/verification speed and compact signature size. The scheme was selected for standardization in the third round of the post-quantum cryptography (PQC) standardization process due to its unique attributes and robust security features. Despite its strengths, there has been a lack of research on hardware acceleration, primarily due to its complex calculation flow and floating-point operations, which hinders its widespread adoption. To address this issue, we propose FalconSign, a high-performance, configurable crypto-processor designed to accelerate Falcon signature generation on FPGA/ASIC through algorithm-hardware co-design. Our approach involves a new scheduling flow and architecture for Fast-Fourier Sampling to enhance computing unit reuse and reduce processing time. Additionally, we introduce several optimized modules, including configurable randomness generation units, parallel floating-point processing units, and an optimized SamplerZ module, to improve execution efficiency. Furthermore, this paper presents a finely optimized hardware accelerator for the Falcon scheme. Our FPGA implementation results demonstrate a throughput improvement of approximately 5.1x compared to state-of-the-art designs, with 2.8x/4.5x/4.2x/3.2x lower area-time products (LUTs/FFs/DSPs/BRAMs) for NIST security level V. The crypto-processor occupies an area of 0.71 mm² and achieves a throughput of 5.2k OPS on the TSMC 28 nm process for NIST security level I. |
| format | Article |
| id | doaj-art-5988255d9aac4eafa6ad2e0e58e917ba |
| institution | OA Journals |
| issn | 2569-2925 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Ruhr-Universität Bochum |
| record_format | Article |
| series | Transactions on Cryptographic Hardware and Embedded Systems |
| spelling | ID: doaj-art-5988255d9aac4eafa6ad2e0e58e917ba; record timestamp: 2025-08-20T02:38:03Z; language: eng; publisher: Ruhr-Universität Bochum; series: Transactions on Cryptographic Hardware and Embedded Systems; ISSN: 2569-2925; published: 2024-12-01; volume 2025, issue 1; DOI: 10.46586/tches.v2025.i1.203-226; title: FalconSign: An Efficient and High-Throughput Hardware Architecture for Falcon Signature Generation; authors: Yi Ouyang (0), Yihong Zhu (1), Wenping Zhu (2), Bohan Yang (3), Zirui Zhang (4), Hanning Wang (5), Qichao Tao (6), Min Zhu (7), Shaojun Wei (8), Leibo Liu (9); affiliation of authors 0-6, 8 and 9: Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, Beijing, China; affiliation of author 7: Wuxi Micro Innovation Integrated Circuit Design Co., Ltd., Wuxi, China; URL: https://tosc.iacr.org/index.php/TCHES/article/view/11927; keywords: Post-quantum cryptography, Falcon, Lattice, Fast-Fourier Sampling, Floating-point, High-performance |
| title | FalconSign: An Efficient and High-Throughput Hardware Architecture for Falcon Signature Generation |
| topic | Post-quantum cryptography; Falcon; Lattice; Fast-Fourier Sampling; Floating-point; High-performance |
| url | https://tosc.iacr.org/index.php/TCHES/article/view/11927 |