PyPOD-GP: Using PyTorch for accelerated chip-level thermal simulation of the GPU

Bibliographic Details
Main Authors: Neil He, Ming-Cheng Cheng, Yu Liu
Format: Article
Language: English
Published: Elsevier 2025-05-01
Series: SoftwareX
Online Access: http://www.sciencedirect.com/science/article/pii/S2352711025001141
Description
Summary: The rising demand for high-performance computing (HPC) has made full-chip dynamic thermal simulation in many-core GPUs critical for optimizing performance and extending device lifespans. Proper orthogonal decomposition (POD) with Galerkin projection (GP) has been shown to offer high accuracy and massive runtime improvements over direct numerical simulation (DNS). However, previous implementations of POD-GP use MPI-based libraries like PETSc and FEniCS and face significant runtime bottlenecks. We propose a PyTorch-based POD-GP library (PyPOD-GP), a GPU-optimized library for chip-level thermal simulation. PyPOD-GP achieves a speedup of over 23.4× in training and over 10× in inference on a GPU with over 13,000 cores, with just 1.2% error over the device layer.
ISSN: 2352-7110