VeloFHE: GPU Acceleration for FHEW and TFHE Bootstrapping

Bit-wise Fully Homomorphic Encryption schemes like FHEW and TFHE offer efficient functional bootstrapping, enabling concurrent function evaluation and noise reduction. While advantageous for secure computations, these schemes suffer from high data expansion, posing significant performance challenge...

Bibliographic Details
Main Authors:	Shiyu Shen, Hao Yang, Zhe Liu, Ying Liu, Xianhui Lu, Wangchen Dai, Lu Zhou, Yunlei Zhao, Ray C. C. Cheung
Format:	Article
Language:	English
Published:	Ruhr-Universität Bochum 2025-06-01
Series:	Transactions on Cryptographic Hardware and Embedded Systems
Subjects:	Fully Homomorphic Encryption; Bootstrapping; FHEW; TFHE; GPU acceleration
Online Access:	https://tches.iacr.org/index.php/TCHES/article/view/12211

_version_	1850233313774534656
author	Shiyu Shen Hao Yang Zhe Liu Ying Liu Xianhui Lu Wangchen Dai Lu Zhou Yunlei Zhao Ray C. C. Cheung
collection	DOAJ
description	Bit-wise Fully Homomorphic Encryption schemes like FHEW and TFHE offer efficient functional bootstrapping, enabling concurrent function evaluation and noise reduction. While advantageous for secure computations, these schemes suffer from high data expansion, posing significant performance challenges in practical applications due to massive ciphertexts. To address these issues, we propose VeloFHE, a CUDA-accelerated design to enhance the efficiency of FHEW and TFHE schemes on GPUs. We develop a novel hybrid four-step Number Theoretic Transform (NTT) approach for fast polynomial multiplication. By decomposing large-scale NTTs into highly parallelizable submodules, incorporating cyclic and negacyclic convolutions, and introducing several memory-oriented optimizations, we significantly reduce both the computational complexity and memory requirements. For blind rotation, besides the gadget decomposition approach, we also apply a recently proposed modulus raising technique to both schemes to alleviate memory pressure. We further optimize it by refining the computational flow to reduce noise from scaling and maintain accumulator compatibility. For key switching, we address input-output parallelism mismatches and offload suitable computations to the CPU, effectively hiding latency through asynchronous execution. Additionally, we explore batching in bootstrapping, developing a general framework that accommodates both schemes with either the gadget decomposition or the modulus raising method. Our experimental results demonstrate significant performance improvements. The proposed NTT implementation shows over 35% improvement compared to recent GPU implementations. On an RTX 4090 GPU, we achieve speedups of 371.86x and 390.44x for FHEW and TFHE gate bootstrapping, respectively, compared to OpenFHE running on a 48-thread CPU at a 128-bit security level. The corresponding throughputs are 7,007 and 11,378 operations per second. Furthermore, relative to the state-of-the-art GPU implementation [XLK+25], our approach provides speedups of 2.56x, 2.24x, and 2.33x for TFHE gate bootstrapping, homomorphic evaluation of arbitrary functions, and the homomorphic flooring operation, respectively. VeloFHE surpasses some current hardware designs, offering an effective solution for more practical and efficient privacy-preserving computations.
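The four-step decomposition and the negacyclic convolution named in the description can be illustrated with a small sketch. This is the textbook four-step NTT (column transforms, twiddle scaling, row transforms, transposed write-back) followed by the standard twist-by-psi trick for multiplication modulo X^N + 1; it is not the hybrid GPU design of the paper, whose details are not given in this record. The toy prime 257, the length 16, and all function names are assumptions chosen for illustration only.

```python
# Toy parameters (assumptions): P is prime and 2*N divides P - 1.
P = 257


def root_of_unity(order, p=P):
    """Return an element of exact multiplicative order `order` (a power of two)."""
    for g in range(2, p):
        w = pow(g, (p - 1) // order, p)
        # For a power-of-two order, w^(order/2) == -1 means the order is exact.
        if pow(w, order // 2, p) == p - 1:
            return w
    raise ValueError("no root of unity of that order")


def naive_ntt(x, w, p=P):
    """Reference O(n^2) transform: X[k] = sum_j x[j] * w^(j*k) mod p."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, p) for j in range(n)) % p for k in range(n)]


def four_step_ntt(x, w, n1, n2, p=P):
    """Length n1*n2 NTT built from short NTTs; input index is n2*r + c."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n2 independent column NTTs of length n1, using the root w^n2.
    w1 = pow(w, n2, p)
    cols = [naive_ntt([x[n2 * r + c] for r in range(n1)], w1, p) for c in range(n2)]
    w2 = pow(w, n1, p)
    out = [0] * n
    for k1 in range(n1):
        # Step 2: scale entry (k1, c) by the twiddle factor w^(c*k1).
        row = [cols[c][k1] * pow(w, c * k1, p) % p for c in range(n2)]
        # Step 3: one row NTT of length n2, using the root w^n1.
        row_hat = naive_ntt(row, w2, p)
        # Step 4: transposed write-back, output index is k1 + n1*k2.
        for k2 in range(n2):
            out[k1 + n1 * k2] = row_hat[k2]
    return out


def negacyclic_mul(a, b, n1, n2, p=P):
    """Multiply a and b modulo (X^n + 1, p) via twisting and a cyclic NTT."""
    n = n1 * n2
    psi = root_of_unity(2 * n, p)   # primitive 2n-th root of unity
    w = psi * psi % p               # primitive n-th root of unity
    twist = [pow(psi, i, p) for i in range(n)]
    fa = four_step_ntt([ai * t % p for ai, t in zip(a, twist)], w, n1, n2, p)
    fb = four_step_ntt([bi * t % p for bi, t in zip(b, twist)], w, n1, n2, p)
    fc = [u * v % p for u, v in zip(fa, fb)]
    # Inverse transform: same routine with w^-1, then scale by n^-1.
    c = four_step_ntt(fc, pow(w, -1, p), n1, n2, p)
    n_inv, psi_inv = pow(n, -1, p), pow(psi, -1, p)
    return [ci * n_inv * pow(psi_inv, i, p) % p for i, ci in enumerate(c)]


def schoolbook_negacyclic(a, b, p=P):
    """Reference O(n^2) product modulo X^n + 1 (wrap-around terms change sign)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % p
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % p
    return c
```

In this sketch the short transforms in steps 1 and 3 are independent of one another, which is the property that makes the decomposition attractive for parallel hardware; a real implementation would replace `naive_ntt` with a fast butterfly kernel. Note that `pow(x, -1, p)` requires Python 3.8 or later.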
format	Article
id	doaj-art-d79068f1abab4e38b98704e96f510fd6
institution	OA Journals
issn	2569-2925
language	English
publishDate	2025-06-01
publisher	Ruhr-Universität Bochum
record_format	Article
series	Transactions on Cryptographic Hardware and Embedded Systems
spelling	doaj-art-d79068f1abab4e38b98704e96f510fd6 | 2025-08-20T02:02:57Z | eng | Ruhr-Universität Bochum | Transactions on Cryptographic Hardware and Embedded Systems | 2569-2925 | 2025-06-01 | 2025 | 3 | 10.46586/tches.v2025.i3.81-114 | VeloFHE: GPU Acceleration for FHEW and TFHE Bootstrapping | Shiyu Shen [0], Hao Yang [1], Zhe Liu [2], Ying Liu [3], Xianhui Lu [4], Wangchen Dai [5], Lu Zhou [6], Yunlei Zhao [7], Ray C. C. Cheung [8] | City University of Hong Kong, Hong Kong, China | City University of Hong Kong, Hong Kong, China, Zhejiang Lab, Hangzhou, China | Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China | Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China | Sun Yat-sen University, Shenzhen, China | Nanjing University of Aeronautics and Astronautics, Nanjing, China | Fudan University, Shanghai, China | City University of Hong Kong, Hong Kong, China | https://tches.iacr.org/index.php/TCHES/article/view/12211 | Fully Homomorphic Encryption; Bootstrapping; FHEW; TFHE; GPU acceleration
title	VeloFHE: GPU Acceleration for FHEW and TFHE Bootstrapping
topic	Fully Homomorphic Encryption; Bootstrapping; FHEW; TFHE; GPU acceleration
url	https://tches.iacr.org/index.php/TCHES/article/view/12211

Similar Items