Enhancement of GPU-accelerated smoothed particle hydrodynamics (SPH) method with dynamic parallelism

Bibliographic Details
Main Authors: Liwen Xue, Shenglong Gu, Songdong Shao
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Results in Engineering
Online Access: http://www.sciencedirect.com/science/article/pii/S2590123025028634
Description
Summary: This study introduces a GPU programming architecture based on CUDA Dynamic Parallelism (CDP) to improve the computational efficiency of Smoothed Particle Hydrodynamics (SPH) simulations. Compared with the conventional CPU-GPU collaborative framework, Dynamic Parallelism enables device-side kernel launching and synchronization, which mitigates the latency caused by CPU-GPU communication bottlenecks and thereby boosts the end-to-end computational throughput of SPH solvers. Within this framework, the neighboring particle search and the particle interaction computations in SPH are parallelized on the GPU. Furthermore, Dynamic Parallelism manages tasks through CUDA streams, enabling direct on-device data synchronization and concurrent task scheduling, which bypass the traditional CPU/GPU-controlled barriers that impede computational efficiency. In two benchmark tests conducted on an NVIDIA GeForce RTX 4080 SUPER GPU, the proposed Dynamic Parallelism SPH implementations achieved a speedup of approximately 1.5x to 3.0x over the conventional CPU-GPU SPH solvers for large-scale and dynamically evolving particle systems. The RMS errors between the SPH simulations and experimental data were in the range of 0.03 to 0.12 m/s for the dam break flow and 0.004 to 0.008 m/s for the water entry. With the same SPH algorithms, the CDP architecture achieves higher computational efficiency than the traditional CUDA technique. No other well-known SPH software uses the CDP concept, and this is believed to be the first use of CDP in GPU-based SPH simulations.
ISSN: 2590-1230
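
Note on the RMS error quoted in the summary: the record does not state how the authors computed it, so the following is only a minimal sketch of the usual root-mean-square definition applied to paired simulated and measured velocities. The function name `rms_error` and the sample values are hypothetical and are not taken from the article.

```python
import math


def rms_error(simulated, measured):
    """Root-mean-square difference between paired samples (same units)."""
    if len(simulated) != len(measured) or not simulated:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(s - m) ** 2 for s, m in zip(simulated, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical velocity samples in m/s, invented for illustration only;
# they are NOT data from the dam break or water entry benchmarks.
sph_velocity = [1.00, 1.50, 2.10, 2.60]
measured_velocity = [1.05, 1.45, 2.00, 2.70]

print(f"RMS error: {rms_error(sph_velocity, measured_velocity):.3f} m/s")
```

A value computed this way carries the same units as the inputs, which is consistent with the summary reporting its error ranges in m/s.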