A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier

Zero-knowledge proof (ZKP) is an attractive cryptographic paradigm that allows a party to prove the correctness of a given statement without revealing any additional information. It offers both computation integrity and privacy, witnessing many celebrated deployments, such as computation outsourcin...

Full description

Saved in:

Bibliographic Details
Main Authors:	Xiangren Chen, Bohan Yang, Wenping Zhu, Hanning Wang, Qichao Tao, Shuying Yin, Min Zhu, Shaojun Wei, Leibo Liu
Format:	Article
Language:	English
Published:	Ruhr-Universität Bochum 2024-12-01
Series:	Transactions on Cryptographic Hardware and Embedded Systems
Subjects:	ZKP NTT Montgomery multiplication MSM
Online Access:	https://tosc.iacr.org/index.php/TCHES/article/view/11930
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1850171041241890816
author	Xiangren Chen Bohan Yang Wenping Zhu Hanning Wang Qichao Tao Shuying Yin Min Zhu Shaojun Wei Leibo Liu
author_facet	Xiangren Chen Bohan Yang Wenping Zhu Hanning Wang Qichao Tao Shuying Yin Min Zhu Shaojun Wei Leibo Liu
author_sort	Xiangren Chen
collection	DOAJ
description	Zero-knowledge proof (ZKP) is an attractive cryptographic paradigm that allows a party to prove the correctness of a given statement without revealing any additional information. It offers both computation integrity and privacy, witnessing many celebrated deployments, such as computation outsourcing and cryptocurrencies. Recent general-purpose ZKP schemes, e.g., zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK), suffer from time-consuming proof generation, which is mainly bottlenecked by the large-scale number theoretic transformation (NTT) and multi-scalar point multiplication (MSM). To boost its wide application, great interest has been shown in expediting the proof generation on various platforms like GPU, FPGA and ASIC. So far as we know, current works on the hardware designs for ZKP employ two separated data-paths for NTT and MSM, overlooking the potential of resource reusage. In this work, we particularly explore the feasibility and profit of implementing both NTT and MSM with a unified and high-performance hardware architecture. For the crucial operator design, we propose a dual-precision, load-balanced and fully-pipelined Montgomery multiplier (LBFP MM) by introducing the new mixed-radix technique and improving the prior quotient-decoupled strategy. Collectively, we also integrate orthogonal ideas to further enhance the performance of LBFP MM, including the customized constant multiplication, truncated LSB/MSB multiplication/addition and Karatsuba technique. On top of that, we present the unified, scalable and highperformance hardware architecture that conducts both NTT and MSM in a versatile pipelined execution mechanism, intensively sharing the common computation and memory resource. The proposed accelerator manages to overlap the on-chip memory computation with off-chip memory access, considerably reducing the overall cycle counts for NTT and MSM. We showcase the implementation of modular multiplier and overall architecture on the BLS12-381 elliptic curve for zk-SNARK. Extensive experiments are carried out under TSMC 28nm synthesis and similar simulation set, which demonstrate impressive improvements: (1) the proposed LBFP MM obtains 1.8x speed-up and 1.3x less area cost versus the state-of-the-art design; (2) the unified accelerator achieves 12.1x and 5.8x acceleration for NTT and MSM while also consumes 4.3x lower overall on-chip area overhead, when compared to the most related and advanced work PipeZK.
format	Article
id	doaj-art-80bd397b87084316bb7d8a4a4ac2e643
institution	OA Journals
issn	2569-2925
language	English
publishDate	2024-12-01
publisher	Ruhr-Universität Bochum
record_format	Article
series	Transactions on Cryptographic Hardware and Embedded Systems
spelling	doaj-art-80bd397b87084316bb7d8a4a4ac2e6432025-08-20T02:20:21ZengRuhr-Universität BochumTransactions on Cryptographic Hardware and Embedded Systems2569-29252024-12-012025110.46586/tches.v2025.i1.275-313A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery MultiplierXiangren Chen0Bohan Yang1Wenping Zhu2Hanning Wang3Qichao Tao4Shuying Yin5Min Zhu6Shaojun Wei7Leibo Liu8Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaWuxi Micro Innovation Integrated Circuit Design Co., LtdBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, ChinaBeijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Tsinghua University, China Zero-knowledge proof (ZKP) is an attractive cryptographic paradigm that allows a party to prove the correctness of a given statement without revealing any additional information. It offers both computation integrity and privacy, witnessing many celebrated deployments, such as computation outsourcing and cryptocurrencies. Recent general-purpose ZKP schemes, e.g., zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK), suffer from time-consuming proof generation, which is mainly bottlenecked by the large-scale number theoretic transformation (NTT) and multi-scalar point multiplication (MSM). To boost its wide application, great interest has been shown in expediting the proof generation on various platforms like GPU, FPGA and ASIC. So far as we know, current works on the hardware designs for ZKP employ two separated data-paths for NTT and MSM, overlooking the potential of resource reusage. In this work, we particularly explore the feasibility and profit of implementing both NTT and MSM with a unified and high-performance hardware architecture. For the crucial operator design, we propose a dual-precision, load-balanced and fully-pipelined Montgomery multiplier (LBFP MM) by introducing the new mixed-radix technique and improving the prior quotient-decoupled strategy. Collectively, we also integrate orthogonal ideas to further enhance the performance of LBFP MM, including the customized constant multiplication, truncated LSB/MSB multiplication/addition and Karatsuba technique. On top of that, we present the unified, scalable and highperformance hardware architecture that conducts both NTT and MSM in a versatile pipelined execution mechanism, intensively sharing the common computation and memory resource. The proposed accelerator manages to overlap the on-chip memory computation with off-chip memory access, considerably reducing the overall cycle counts for NTT and MSM. We showcase the implementation of modular multiplier and overall architecture on the BLS12-381 elliptic curve for zk-SNARK. Extensive experiments are carried out under TSMC 28nm synthesis and similar simulation set, which demonstrate impressive improvements: (1) the proposed LBFP MM obtains 1.8x speed-up and 1.3x less area cost versus the state-of-the-art design; (2) the unified accelerator achieves 12.1x and 5.8x acceleration for NTT and MSM while also consumes 4.3x lower overall on-chip area overhead, when compared to the most related and advanced work PipeZK. https://tosc.iacr.org/index.php/TCHES/article/view/11930ZKPNTTMontgomery multiplicationMSM
spellingShingle	Xiangren Chen Bohan Yang Wenping Zhu Hanning Wang Qichao Tao Shuying Yin Min Zhu Shaojun Wei Leibo Liu A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier Transactions on Cryptographic Hardware and Embedded Systems ZKP NTT Montgomery multiplication MSM
title	A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_full	A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_fullStr	A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_full_unstemmed	A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_short	A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier
title_sort	high performance ntt msm accelerator for zero knowledge proof using load balanced fully pipelined montgomery multiplier
topic	ZKP NTT Montgomery multiplication MSM
url	https://tosc.iacr.org/index.php/TCHES/article/view/11930
work_keys_str_mv	AT xiangrenchen ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT bohanyang ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT wenpingzhu ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT hanningwang ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT qichaotao ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT shuyingyin ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT minzhu ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT shaojunwei ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT leiboliu ahighperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT xiangrenchen highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT bohanyang highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT wenpingzhu highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT hanningwang highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT qichaotao highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT shuyingyin highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT minzhu highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT shaojunwei highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier AT leiboliu highperformancenttmsmacceleratorforzeroknowledgeproofusingloadbalancedfullypipelinedmontgomerymultiplier

A High-performance NTT/MSM Accelerator for Zero-knowledge Proof Using Load-balanced Fully-pipelined Montgomery Multiplier

Similar Items