Sieving with Streaming Memory Access
We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the...
Saved in:
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Ruhr-Universität Bochum
2025-03-01
|
| Series: | Transactions on Cryptographic Hardware and Embedded Systems |
| Subjects: | |
| Online Access: | https://tches.iacr.org/index.php/TCHES/article/view/12051 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1850252204977422336 |
|---|---|
| author | Ziyu Zhao Jintai Ding Bo-Yin Yang |
| author_facet | Ziyu Zhao Jintai Ding Bo-Yin Yang |
| author_sort | Ziyu Zhao |
| collection | DOAJ |
| description |
We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the BDGL (Becker–Ducas– Gama–Laahoven 2016) sieve, which uses asymptotically the fewest logical operations. In particular, a dimension-n BGJ sieve uses only 20.2075n+o(n) streaming (non-random) main memory accesses. A key insight: Bucket sizes decrease by orders of magnitude after each BGJ filtering layer, so that sub-buckets fit into successively much smaller (hence faster) storage areas. Our refined BGJ is competitive at cryptographic sizes and should outperform BDGL for all practically achievable dimensions.
The above is corroborated by the results from our efficient CPU-based BGJ implementation in an optimized framework, which saves about 40% RAM footprint and is ≥ 24.5x more efficient gate-count-wise compared to the Ducas–Stevens–van Woerden 2021 4-GPU implementation, which like most prior sieving-based SVP computations is a HK3 (Herold–Kirshanova 2017) sieve. Notably, we solved the 183-dimensional SVP Darmstadt Challenge in 30 days on a 112-core server and 0.87 TB of RAM; similarly we also found a short vector in the 796-dimensional Ideal-SVP Challenge. Our implementation may offer further insights into the behavior of asymptotically “fast” sieving algorithms when applied to large-scale problems. Moreover, our refined cost estimation of SVP based on this implementation suggests that some NIST PQC candidates (e.g. Falcon-512), are not sure to meet NIST’s security requirements.
|
| format | Article |
| id | doaj-art-1cc01cddf88944e9a68c863d8de6ba16 |
| institution | OA Journals |
| issn | 2569-2925 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Ruhr-Universität Bochum |
| record_format | Article |
| series | Transactions on Cryptographic Hardware and Embedded Systems |
| spelling | doaj-art-1cc01cddf88944e9a68c863d8de6ba162025-08-20T01:57:43ZengRuhr-Universität BochumTransactions on Cryptographic Hardware and Embedded Systems2569-29252025-03-012025210.46586/tches.v2025.i2.362-384Sieving with Streaming Memory AccessZiyu Zhao0Jintai Ding1Bo-Yin Yang2Tsinghua University, Beijing, ChinaXi’an Jiaotong-Liverpool University, Suzhou, China; Basque Center For Applied Mathematics, Bilbao, SpainAcademia Sinica, Taipei, Taiwan We implement an optimized BGJ (Becker–Gama–Joux 2015) sieve and analyze its behavior in a study of RAM access overheads (and their minimization) in sieving algorithms for large lattice problems. Both experiment and theory points to BGJ’s inherent structure being much more memory-efficient than the BDGL (Becker–Ducas– Gama–Laahoven 2016) sieve, which uses asymptotically the fewest logical operations. In particular, a dimension-n BGJ sieve uses only 20.2075n+o(n) streaming (non-random) main memory accesses. A key insight: Bucket sizes decrease by orders of magnitude after each BGJ filtering layer, so that sub-buckets fit into successively much smaller (hence faster) storage areas. Our refined BGJ is competitive at cryptographic sizes and should outperform BDGL for all practically achievable dimensions. The above is corroborated by the results from our efficient CPU-based BGJ implementation in an optimized framework, which saves about 40% RAM footprint and is ≥ 24.5x more efficient gate-count-wise compared to the Ducas–Stevens–van Woerden 2021 4-GPU implementation, which like most prior sieving-based SVP computations is a HK3 (Herold–Kirshanova 2017) sieve. Notably, we solved the 183-dimensional SVP Darmstadt Challenge in 30 days on a 112-core server and 0.87 TB of RAM; similarly we also found a short vector in the 796-dimensional Ideal-SVP Challenge. Our implementation may offer further insights into the behavior of asymptotically “fast” sieving algorithms when applied to large-scale problems. Moreover, our refined cost estimation of SVP based on this implementation suggests that some NIST PQC candidates (e.g. Falcon-512), are not sure to meet NIST’s security requirements. https://tches.iacr.org/index.php/TCHES/article/view/12051SievingLattice CryptanalysisSVP |
| spellingShingle | Ziyu Zhao Jintai Ding Bo-Yin Yang Sieving with Streaming Memory Access Transactions on Cryptographic Hardware and Embedded Systems Sieving Lattice Cryptanalysis SVP |
| title | Sieving with Streaming Memory Access |
| title_full | Sieving with Streaming Memory Access |
| title_fullStr | Sieving with Streaming Memory Access |
| title_full_unstemmed | Sieving with Streaming Memory Access |
| title_short | Sieving with Streaming Memory Access |
| title_sort | sieving with streaming memory access |
| topic | Sieving Lattice Cryptanalysis SVP |
| url | https://tches.iacr.org/index.php/TCHES/article/view/12051 |
| work_keys_str_mv | AT ziyuzhao sievingwithstreamingmemoryaccess AT jintaiding sievingwithstreamingmemoryaccess AT boyinyang sievingwithstreamingmemoryaccess |