Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture

Abstract One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is to find an optimal way of managing communication flows in the massively parallel processing network. Message‐passing and global synchronization are powerful abstractions in this rega...

Full description

Saved in:
Bibliographic Details
Main Authors: Ashur Rafiev, Alex Yakovlev, Ghaith Tarawneh, Matthew F. Naylor, Simon W. Moore, David B. Thomas, Graeme M. Bragg, Mark L. Vousden, Andrew D. Brown
Format: Article
Language:English
Published: Wiley 2022-03-01
Series:IET Computers & Digital Techniques
Online Access:https://doi.org/10.1049/cdt2.12041
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1832559619247964160
author Ashur Rafiev
Alex Yakovlev
Ghaith Tarawneh
Matthew F. Naylor
Simon W. Moore
David B. Thomas
Graeme M. Bragg
Mark L. Vousden
Andrew D. Brown
author_facet Ashur Rafiev
Alex Yakovlev
Ghaith Tarawneh
Matthew F. Naylor
Simon W. Moore
David B. Thomas
Graeme M. Bragg
Mark L. Vousden
Andrew D. Brown
author_sort Ashur Rafiev
collection DOAJ
description Abstract One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is to find an optimal way of managing communication flows in the massively parallel processing network. Message‐passing and global synchronization are powerful abstractions in this regard, especially when used in combination. This paper studies the use of a hardware‐implemented refutable global barrier as a design optimization technique aimed at unifying these abstractions at the API level. The paper explores the trade‐offs between the related overheads and performance factors on a message‐passing prototype machine with 49,152 RISC‐V threads distributed over 48 FPGAs (called the Partially Ordered Event‐Triggered Systems platform). Our experiments show that some graph applications favour synchronized communication, but the effect is hard to predict in general because of the interplay between multiple hardware and software factors. A classifier model is therefore proposed and implemented to perform such a prediction based on the application graph topology parameters: graph diameter, degree of connectivity, and reconvergence metric. The presented experimental results demonstrate that the correct choice of communication mode, granted by the new model‐driven approach, helps to achieve 3.22 times faster computation time on average compared to the baseline platform operation.
format Article
id doaj-art-46d70bd482054b00926b17b26650c27a
institution Kabale University
issn 1751-8601
1751-861X
language English
publishDate 2022-03-01
publisher Wiley
record_format Article
series IET Computers & Digital Techniques
spelling doaj-art-46d70bd482054b00926b17b26650c27a2025-02-03T01:29:40ZengWileyIET Computers & Digital Techniques1751-86011751-861X2022-03-01162-3718810.1049/cdt2.12041Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architectureAshur Rafiev0Alex Yakovlev1Ghaith Tarawneh2Matthew F. Naylor3Simon W. Moore4David B. Thomas5Graeme M. Bragg6Mark L. Vousden7Andrew D. Brown8School of Engineering Newcastle University Newcastle upon Tyne UKSchool of Engineering Newcastle University Newcastle upon Tyne UKSchool of Engineering Newcastle University Newcastle upon Tyne UKComputer Architecture Group Cambridge University Cambridge UKComputer Architecture Group Cambridge University Cambridge UKDepartment of Electrical and Electronic Engineering Imperial College London London UKElectronics and Computer Science University of Southampton Southampton UKElectronics and Computer Science University of Southampton Southampton UKElectronics and Computer Science University of Southampton Southampton UKAbstract One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is to find an optimal way of managing communication flows in the massively parallel processing network. Message‐passing and global synchronization are powerful abstractions in this regard, especially when used in combination. This paper studies the use of a hardware‐implemented refutable global barrier as a design optimization technique aimed at unifying these abstractions at the API level. The paper explores the trade‐offs between the related overheads and performance factors on a message‐passing prototype machine with 49,152 RISC‐V threads distributed over 48 FPGAs (called the Partially Ordered Event‐Triggered Systems platform). Our experiments show that some graph applications favour synchronized communication, but the effect is hard to predict in general because of the interplay between multiple hardware and software factors. A classifier model is therefore proposed and implemented to perform such a prediction based on the application graph topology parameters: graph diameter, degree of connectivity, and reconvergence metric. The presented experimental results demonstrate that the correct choice of communication mode, granted by the new model‐driven approach, helps to achieve 3.22 times faster computation time on average compared to the baseline platform operation.https://doi.org/10.1049/cdt2.12041
spellingShingle Ashur Rafiev
Alex Yakovlev
Ghaith Tarawneh
Matthew F. Naylor
Simon W. Moore
David B. Thomas
Graeme M. Bragg
Mark L. Vousden
Andrew D. Brown
Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture
IET Computers & Digital Techniques
title Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture
title_full Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture
title_fullStr Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture
title_full_unstemmed Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture
title_short Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture
title_sort synchronization in graph analysis algorithms on the partially ordered event triggered systems many core architecture
url https://doi.org/10.1049/cdt2.12041
work_keys_str_mv AT ashurrafiev synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture
AT alexyakovlev synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture
AT ghaithtarawneh synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture
AT matthewfnaylor synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture
AT simonwmoore synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture
AT davidbthomas synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture
AT graemembragg synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture
AT marklvousden synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture
AT andrewdbrown synchronizationingraphanalysisalgorithmsonthepartiallyorderedeventtriggeredsystemsmanycorearchitecture