A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier

Zero-knowledge proof (ZKP) is an attractive cryptographic paradigm that allows a party to prove the correctness of a given statement without revealing any additional information. It offers both computation integrity and privacy, witnessing many celebrated deployments, such as computation outsourcin...

Full description

Saved in:
Bibliographic Details
Main Authors: Xiangren Chen, Bohan Yang, Wenping Zhu, Hanning Wang, Qichao Tao, Shuying Yin, Min Zhu, Shaojun Wei, Leibo Liu
Format: Article
Language:English
Published: Ruhr-Universität Bochum 2024-12-01
Series:Transactions on Cryptographic Hardware and Embedded Systems
Subjects:
Online Access:https://tosc.iacr.org/index.php/TCHES/article/view/11930
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1850171041241890816
author Xiangren Chen
Bohan Yang
Wenping Zhu
Hanning Wang
Qichao Tao
Shuying Yin
Min Zhu
Shaojun Wei
Leibo Liu
author_facet Xiangren Chen
Bohan Yang
Wenping Zhu
Hanning Wang
Qichao Tao
Shuying Yin
Min Zhu
Shaojun Wei
Leibo Liu
author_sort Xiangren Chen
collection DOAJ
description Zero-knowledge proof (ZKP) is an attractive cryptographic paradigm that allows a party to prove the correctness of a given statement without revealing any additional information. It offers both computation integrity and privacy, witnessing many celebrated deployments, such as computation outsourcing and cryptocurrencies. Recent general-purpose ZKP schemes, e.g., zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK), suffer from time-consuming proof generation, which is mainly bottlenecked by the large-scale number theoretic transformation (NTT) and multi-scalar point multiplication (MSM). To boost its wide application, great interest has been shown in expediting the proof generation on various platforms like GPU, FPGA and ASIC. So far as we know, current works on the hardware designs for ZKP employ two separated data-paths for NTT and MSM, overlooking the potential of resource reusage. In this work, we particularly explore the feasibility and profit of implementing both NTT and MSM with a unified and high-performance hardware architecture. For the crucial operator design, we propose a dual-precision, load-balanced and fully-pipelined Montgomery multiplier (LBFP MM) by introducing the new mixed-radix technique and improving the prior quotient-decoupled strategy. Collectively, we also integrate orthogonal ideas to further enhance the performance of LBFP MM, including the customized constant multiplication, truncated LSB/MSB multiplication/addition and Karatsuba technique. On top of that, we present the unified, scalable and highperformance hardware architecture that conducts both NTT and MSM in a versatile pipelined execution mechanism, intensively sharing the common computation and memory resource. The proposed accelerator manages to overlap the on-chip memory computation with off-chip memory access, considerably reducing the overall cycle counts for NTT and MSM. We showcase the implementation of modular multiplier and overall architecture on the BLS12-381 elliptic curve for zk-SNARK. Extensive experiments are carried out under TSMC 28nm synthesis and similar simulation set, which demonstrate impressive improvements: (1) the proposed LBFP MM obtains 1.8x speed-up and 1.3x less area cost versus the state-of-the-art design; (2) the unified accelerator achieves 12.1x and 5.8x acceleration for NTT and MSM while also consumes 4.3x lower overall on-chip area overhead, when compared to the most related and advanced work PipeZK.
format Article
id doaj-art-80bd397b87084316bb7d8a4a4ac2e643
institution OA Journals
issn 2569-2925
language English
publishDate 2024-12-01
publisher Ruhr-Universität Bochum
record_format Article
series Transactions on Cryptographic Hardware and Embedded Systems
spelling doaj-art-80bd397b87084316bb7d8a4a4ac2e6432025-08-20T02:20:21ZengRuhr-Universität BochumTransactions on Cryptographic Hardware and Embedded Systems2569-29252024-12-012025110.46586/tches.v2025.i1.275-313A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery MultiplierXiangren Chen0Bohan Yang1Wenping Zhu2Hanning Wang3Qichao Tao4Shuying Yin5Min Zhu6Shaojun Wei7Leibo Liu8Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaWuxi Micro Innovation Integrated Circuit Design Co., LtdBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China Zero-knowledge proof (ZKP) is an attractive cryptographic paradigm that allows a party to prove the correctness of a given statement without revealing any additional information. It offers both computation integrity and privacy, witnessing many celebrated deployments, such as computation outsourcing and cryptocurrencies. Recent general-purpose ZKP schemes, e.g., zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK), suffer from time-consuming proof generation, which is mainly bottlenecked by the large-scale number theoretic transformation (NTT) and multi-scalar point multiplication (MSM). To boost its wide application, great interest has been shown in expediting the proof generation on various platforms like GPU, FPGA and ASIC. So far as we know, current works on the hardware designs for ZKP employ two separated data-paths for NTT and MSM, overlooking the potential of resource reusage. In this work, we particularly explore the feasibility and profit of implementing both NTT and MSM with a unified and high-performance hardware architecture. For the crucial operator design, we propose a dual-precision, load-balanced and fully-pipelined Montgomery multiplier (LBFP MM) by introducing the new mixed-radix technique and improving the prior quotient-decoupled strategy. Collectively, we also integrate orthogonal ideas to further enhance the performance of LBFP MM, including the customized constant multiplication, truncated LSB/MSB multiplication/addition and Karatsuba technique. On top of that, we present the unified, scalable and highperformance hardware architecture that conducts both NTT and MSM in a versatile pipelined execution mechanism, intensively sharing the common computation and memory resource. The proposed accelerator manages to overlap the on-chip memory computation with off-chip memory access, considerably reducing the overall cycle counts for NTT and MSM. We showcase the implementation of modular multiplier and overall architecture on the BLS12-381 elliptic curve for zk-SNARK. Extensive experiments are carried out under TSMC 28nm synthesis and similar simulation set, which demonstrate impressive improvements: (1) the proposed LBFP MM obtains 1.8x speed-up and 1.3x less area cost versus the state-of-the-art design; (2) the unified accelerator achieves 12.1x and 5.8x acceleration for NTT and MSM while also consumes 4.3x lower overall on-chip area overhead, when compared to the most related and advanced work PipeZK. https://tosc.iacr.org/index.php/TCHES/article/view/11930ZKPNTTMontgomery multiplicationMSM
spellingShingle Xiangren Chen
Bohan Yang
Wenping Zhu
Hanning Wang
Qichao Tao
Shuying Yin
Min Zhu
Shaojun Wei
Leibo Liu
A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
Transactions on Cryptographic Hardware and Embedded Systems
ZKP
NTT
Montgomery multiplication
MSM
title A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_full A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_fullStr A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_full_unstemmed A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_short A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_sort high performance ntt msm accelerator for zero knowledge proof using load balanced fully pipelined montgomery multiplier
topic ZKP
NTT
Montgomery multiplication
MSM
url https://tosc.iacr.org/index.php/TCHES/article/view/11930
work_keys_str_mv AT xiangrenchen ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT bohanyang ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT wenpingzhu ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT hanningwang ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT qichaotao ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT shuyingyin ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT minzhu ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT shaojunwei ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT leiboliu ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT xiangrenchen highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT bohanyang highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT wenpingzhu highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT hanningwang highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT qichaotao highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT shuyingyin highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT minzhu highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT shaojunwei highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier
AT leiboliu highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier