Implementing and Evaluating an Heterogeneous, Scalable, Tridiagonal Linear System Solver with OpenCL to Target FPGAs, GPUs, and CPUs

Solving diagonally dominant tridiagonal linear systems is a common problem in scientific high-performance computing (HPC). Furthermore, it is becoming more commonplace for HPC platforms to utilise a heterogeneous combination of computing devices. Whilst it is desirable to design faster implementatio...

Bibliographic Details
Main Authors:	Hamish J. Macintosh, Jasmine E. Banks, Neil A. Kelson
Format:	Article
Language:	English
Published:	Wiley 2019-01-01
Series:	International Journal of Reconfigurable Computing
Online Access:	http://dx.doi.org/10.1155/2019/3679839

author	Hamish J. Macintosh, Jasmine E. Banks, Neil A. Kelson
collection	DOAJ
description	Solving diagonally dominant tridiagonal linear systems is a common problem in scientific high-performance computing (HPC). Furthermore, it is becoming more commonplace for HPC platforms to utilise a heterogeneous combination of computing devices. Whilst it is desirable to design faster implementations of parallel linear system solvers, power consumption concerns are increasing in priority. This work presents the oclspkt routine. The oclspkt routine is a heterogeneous OpenCL implementation of the truncated SPIKE algorithm that can use FPGAs, GPUs, and CPUs to concurrently accelerate the solving of diagonally dominant tridiagonal linear systems. The routine is designed to solve tridiagonal systems of any size and can dynamically allocate optimised workloads to each accelerator in a heterogeneous environment depending on the accelerator’s compute performance. The truncated SPIKE FPGA solver is developed first for optimising OpenCL device kernel performance, global memory bandwidth, and interleaved host to device memory transactions. The FPGA OpenCL kernel code is then refactored and optimised to best exploit the underlying architecture of the CPU and GPU. An optimised TDMA OpenCL kernel is also developed to act as a serial baseline performance comparison for the parallel truncated SPIKE kernel since no FPGA tridiagonal solver capable of solving large tridiagonal systems was available at the time of development. The individual GPU, CPU, and FPGA solvers of the oclspkt routine are 110%, 150%, and 170% faster, respectively, than comparable device-optimised third-party solvers and applicable baselines. Assessing heterogeneous combinations of compute devices, the GPU + FPGA combination is found to have the best compute performance and the FPGA-only configuration is found to have the best overall estimated energy efficiency.
format	Article
id	doaj-art-8ce726d78eaf490790f1c238edb216fd
institution	Kabale University
issn	1687-7195 1687-7209
language	English
publishDate	2019-01-01
publisher	Wiley
record_format	Article
series	International Journal of Reconfigurable Computing
author_affiliation	Hamish J. Macintosh: School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Queensland 4001, Australia; Jasmine E. Banks: School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Queensland 4001, Australia; Neil A. Kelson: eResearch Office, Division of Research and Innovation, Queensland University of Technology, Brisbane, Queensland 4001, Australia
title	Implementing and Evaluating an Heterogeneous, Scalable, Tridiagonal Linear System Solver with OpenCL to Target FPGAs, GPUs, and CPUs
url	http://dx.doi.org/10.1155/2019/3679839

Similar Items