Sieving with Streaming Memory Access

We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the...

Full description

Saved in:

Bibliographic Details
Main Authors:	Ziyu Zhao, Jintai Ding, Bo-Yin Yang
Format:	Article
Language:	English
Published:	Ruhr-Universität Bochum 2025-03-01
Series:	Transactions on Cryptographic Hardware and Embedded Systems
Subjects:	Sieving Lattice Cryptanalysis SVP
Online Access:	https://tches.iacr.org/index.php/TCHES/article/view/12051
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1850252204977422336
author	Ziyu Zhao Jintai Ding Bo-Yin Yang
author_facet	Ziyu Zhao Jintai Ding Bo-Yin Yang
author_sort	Ziyu Zhao
collection	DOAJ
description	We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the BDGL (Becker–Ducas– Gama–Laahoven 2016) sieve, which uses asymptotically the fewest logical operations. In particular, a dimension-n BGJ sieve uses only 20.2075n+o(n) streaming (non-random) main memory accesses. A key insight: Bucket sizes decrease by orders of magnitude after each BGJ filtering layer, so that sub-buckets fit into successively much smaller (hence faster) storage areas. Our refined BGJ is competitive at cryptographic sizes and should outperform BDGL for all practically achievable dimensions. The above is corroborated by the results from our efficient CPU-based BGJ implementation in an optimized framework, which saves about 40% RAM footprint and is ≥ 24.5x more efficient gate-count-wise compared to the Ducas–Stevens–van Woerden 2021 4-GPU implementation, which like most prior sieving-based SVP computations is a HK3 (Herold–Kirshanova 2017) sieve. Notably, we solved the 183-dimensional SVP Darmstadt Challenge in 30 days on a 112-core server and 0.87 TB of RAM; similarly we also found a short vector in the 796-dimensional Ideal-SVP Challenge. Our implementation may offer further insights into the behavior of asymptotically “fast” sieving algorithms when applied to large-scale problems. Moreover, our refined cost estimation of SVP based on this implementation suggests that some NIST PQC candidates (e.g. Falcon-512), are not sure to meet NIST’s security requirements.
format	Article
id	doaj-art-1cc01cddf88944e9a68c863d8de6ba16
institution	OA Journals
issn	2569-2925
language	English
publishDate	2025-03-01
publisher	Ruhr-Universität Bochum
record_format	Article
series	Transactions on Cryptographic Hardware and Embedded Systems
spelling	doaj-art-1cc01cddf88944e9a68c863d8de6ba162025-08-20T01:57:43ZengRuhr-Universität BochumTransactions on Cryptographic Hardware and Embedded Systems2569-29252025-03-012025210.46586/tches.v2025.i2.362-384Sieving with Streaming Memory AccessZiyu Zhao0Jintai Ding1Bo-Yin Yang2Tsinghua University, Beijing, ChinaXi’an Jiaotong-Liverpool University, Suzhou, China; Basque Center For Applied Mathematics, Bilbao, SpainAcademia Sinica, Taipei, Taiwan We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the BDGL (Becker–Ducas– Gama–Laahoven 2016) sieve, which uses asymptotically the fewest logical operations. In particular, a dimension-n BGJ sieve uses only 20.2075n+o(n) streaming (non-random) main memory accesses. A key insight: Bucket sizes decrease by orders of magnitude after each BGJ filtering layer, so that sub-buckets fit into successively much smaller (hence faster) storage areas. Our refined BGJ is competitive at cryptographic sizes and should outperform BDGL for all practically achievable dimensions. The above is corroborated by the results from our efficient CPU-based BGJ implementation in an optimized framework, which saves about 40% RAM footprint and is ≥ 24.5x more efficient gate-count-wise compared to the Ducas–Stevens–van Woerden 2021 4-GPU implementation, which like most prior sieving-based SVP computations is a HK3 (Herold–Kirshanova 2017) sieve. Notably, we solved the 183-dimensional SVP Darmstadt Challenge in 30 days on a 112-core server and 0.87 TB of RAM; similarly we also found a short vector in the 796-dimensional Ideal-SVP Challenge. Our implementation may offer further insights into the behavior of asymptotically “fast” sieving algorithms when applied to large-scale problems. Moreover, our refined cost estimation of SVP based on this implementation suggests that some NIST PQC candidates (e.g. Falcon-512), are not sure to meet NIST’s security requirements. https://tches.iacr.org/index.php/TCHES/article/view/12051SievingLattice CryptanalysisSVP
spellingShingle	Ziyu Zhao Jintai Ding Bo-Yin Yang Sieving with Streaming Memory Access Transactions on Cryptographic Hardware and Embedded Systems Sieving Lattice Cryptanalysis SVP
title	Sieving with Streaming Memory Access
title_full	Sieving with Streaming Memory Access
title_fullStr	Sieving with Streaming Memory Access
title_full_unstemmed	Sieving with Streaming Memory Access
title_short	Sieving with Streaming Memory Access
title_sort	sieving with streaming memory access
topic	Sieving Lattice Cryptanalysis SVP
url	https://tches.iacr.org/index.php/TCHES/article/view/12051
work_keys_str_mv	AT ziyuzhao sievingwithstreamingmemoryaccess AT jintaiding sievingwithstreamingmemoryaccess AT boyinyang sievingwithstreamingmemoryaccess

Sieving with Streaming Memory Access

Similar Items