Sieving with Streaming Memory Access

We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the...

Full description

Saved in:
Bibliographic Details
Main Authors: Ziyu Zhao, Jintai Ding, Bo-Yin Yang
Format: Article
Language:English
Published: Ruhr-Universität Bochum 2025-03-01
Series:Transactions on Cryptographic Hardware and Embedded Systems
Subjects:
Online Access:https://tches.iacr.org/index.php/TCHES/article/view/12051
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1850252204977422336
author Ziyu Zhao
Jintai Ding
Bo-Yin Yang
author_facet Ziyu Zhao
Jintai Ding
Bo-Yin Yang
author_sort Ziyu Zhao
collection DOAJ
description We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the BDGL (Becker–Ducas– Gama–Laahoven 2016) sieve, which uses asymptotically the fewest logical operations. In particular, a dimension-n BGJ sieve uses only 20.2075n+o(n) streaming (non-random) main memory accesses. A key insight: Bucket sizes decrease by orders of magnitude after each BGJ filtering layer, so that sub-buckets fit into successively much smaller (hence faster) storage areas. Our refined BGJ is competitive at cryptographic sizes and should outperform BDGL for all practically achievable dimensions. The above is corroborated by the results from our efficient CPU-based BGJ implementation in an optimized framework, which saves about 40% RAM footprint and is ≥ 24.5x more efficient gate-count-wise compared to the Ducas–Stevens–van Woerden 2021 4-GPU implementation, which like most prior sieving-based SVP computations is a HK3 (Herold–Kirshanova 2017) sieve. Notably, we solved the 183-dimensional SVP Darmstadt Challenge in 30 days on a 112-core server and 0.87 TB of RAM; similarly we also found a short vector in the 796-dimensional Ideal-SVP Challenge. Our implementation may offer further insights into the behavior of asymptotically “fast” sieving algorithms when applied to large-scale problems. Moreover, our refined cost estimation of SVP based on this implementation suggests that some NIST PQC candidates (e.g. Falcon-512), are not sure to meet NIST’s security requirements.
format Article
id doaj-art-1cc01cddf88944e9a68c863d8de6ba16
institution OA Journals
issn 2569-2925
language English
publishDate 2025-03-01
publisher Ruhr-Universität Bochum
record_format Article
series Transactions on Cryptographic Hardware and Embedded Systems
spelling doaj-art-1cc01cddf88944e9a68c863d8de6ba162025-08-20T01:57:43ZengRuhr-Universität BochumTransactions on Cryptographic Hardware and Embedded Systems2569-29252025-03-012025210.46586/tches.v2025.i2.362-384Sieving with Streaming Memory AccessZiyu Zhao0Jintai Ding1Bo-Yin Yang2Tsinghua University, Beijing, ChinaXi’an Jiaotong-Liverpool University, Suzhou, China; Basque Center For Applied Mathematics, Bilbao, SpainAcademia Sinica, Taipei, Taiwan We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the BDGL (Becker–Ducas– Gama–Laahoven 2016) sieve, which uses asymptotically the fewest logical operations. In particular, a dimension-n BGJ sieve uses only 20.2075n+o(n) streaming (non-random) main memory accesses. A key insight: Bucket sizes decrease by orders of magnitude after each BGJ filtering layer, so that sub-buckets fit into successively much smaller (hence faster) storage areas. Our refined BGJ is competitive at cryptographic sizes and should outperform BDGL for all practically achievable dimensions. The above is corroborated by the results from our efficient CPU-based BGJ implementation in an optimized framework, which saves about 40% RAM footprint and is ≥ 24.5x more efficient gate-count-wise compared to the Ducas–Stevens–van Woerden 2021 4-GPU implementation, which like most prior sieving-based SVP computations is a HK3 (Herold–Kirshanova 2017) sieve. Notably, we solved the 183-dimensional SVP Darmstadt Challenge in 30 days on a 112-core server and 0.87 TB of RAM; similarly we also found a short vector in the 796-dimensional Ideal-SVP Challenge. Our implementation may offer further insights into the behavior of asymptotically “fast” sieving algorithms when applied to large-scale problems. Moreover, our refined cost estimation of SVP based on this implementation suggests that some NIST PQC candidates (e.g. Falcon-512), are not sure to meet NIST’s security requirements. https://tches.iacr.org/index.php/TCHES/article/view/12051SievingLattice CryptanalysisSVP
spellingShingle Ziyu Zhao
Jintai Ding
Bo-Yin Yang
Sieving with Streaming Memory Access
Transactions on Cryptographic Hardware and Embedded Systems
Sieving
Lattice Cryptanalysis
SVP
title Sieving with Streaming Memory Access
title_full Sieving with Streaming Memory Access
title_fullStr Sieving with Streaming Memory Access
title_full_unstemmed Sieving with Streaming Memory Access
title_short Sieving with Streaming Memory Access
title_sort sieving with streaming memory access
topic Sieving
Lattice Cryptanalysis
SVP
url https://tches.iacr.org/index.php/TCHES/article/view/12051
work_keys_str_mv AT ziyuzhao sievingwithstreamingmemoryaccess
AT jintaiding sievingwithstreamingmemoryaccess
AT boyinyang sievingwithstreamingmemoryaccess