A GPU parallelization of the neXtSIM-DG dynamical core (v0.3.1)

The cryosphere plays a crucial role in the Earth's climate system, making accurate sea-ice simulation essential for improving climate projections. To achieve higher-resolution simulations, graphics processing units (GPUs) have become increasingly appealing due to their higher floating-...

Bibliographic Details
Main Authors: R. Jendersie, C. Lessig, T. Richter
Format: Article
Language: English
Published: Copernicus Publications 2025-05-01
Series: Geoscientific Model Development
Online Access: https://gmd.copernicus.org/articles/18/3017/2025/gmd-18-3017-2025.pdf
author R. Jendersie
C. Lessig
T. Richter
collection DOAJ
description The cryosphere plays a crucial role in the Earth's climate system, making accurate sea-ice simulation essential for improving climate projections. To achieve higher-resolution simulations, graphics processing units (GPUs) have become increasingly appealing due to their higher floating-point peak performance compared to central processing units (CPUs). However, harnessing the full theoretical performance of GPUs often requires significant effort in redesigning algorithms and careful implementation. Recently, several frameworks have emerged that aim to simplify general-purpose GPU programming. In this study, we evaluate multiple such frameworks, including CUDA, SYCL, Kokkos, and PyTorch, for the parallelization of neXtSIM-DG, a finite-element-based dynamical core for sea ice. Based on our assessment of usability and performance, CUDA demonstrates the best performance while Kokkos is a suitable option for its robust heterogeneous computing capabilities. Our complete implementation of the momentum equation using Kokkos achieves a 6-fold speedup on the GPU compared to our OpenMP-based CPU code, while maintaining competitiveness when run on the CPU. Additionally, we explore the use of lower-precision floating-point types on the GPU, showing that switching to single precision can further accelerate sea-ice codes.
format Article
id doaj-art-2dde190a52934f19b44dc98cb13a7b5e
institution OA Journals
issn 1991-959X
1991-9603
language English
publishDate 2025-05-01
publisher Copernicus Publications
record_format Article
series Geoscientific Model Development
spelling doaj-art-2dde190a52934f19b44dc98cb13a7b5e; 2025-08-20T01:57:08Z; eng; Copernicus Publications; Geoscientific Model Development; ISSN 1991-959X, 1991-9603; 2025-05-01; vol. 18, pp. 3017-3040; doi:10.5194/gmd-18-3017-2025; A GPU parallelization of the neXtSIM-DG dynamical core (v0.3.1)
R. Jendersie (Institute of Simulation and Graphics, Otto von Guericke University, Magdeburg, Germany; Institute of Analysis and Numerics, Otto von Guericke University, Magdeburg, Germany)
C. Lessig (Institute of Simulation and Graphics, Otto von Guericke University, Magdeburg, Germany; European Centre for Medium-Range Weather Forecasts, Bonn, Germany)
T. Richter (Institute of Analysis and Numerics, Otto von Guericke University, Magdeburg, Germany)
https://gmd.copernicus.org/articles/18/3017/2025/gmd-18-3017-2025.pdf
title A GPU parallelization of the neXtSIM-DG dynamical core (v0.3.1)
url https://gmd.copernicus.org/articles/18/3017/2025/gmd-18-3017-2025.pdf