Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture
Abstract One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is to find an optimal way of managing communication flows in the massively parallel processing network. Message‐passing and global synchronization are powerful abstractions in this rega...
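Main Authors: | Ashur Rafiev, Alex Yakovlev, Ghaith Tarawneh, Matthew F. Naylor, Simon W. Moore, David B. Thomas, Graeme M. Bragg, Mark L. Vousden, Andrew D. Brown |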
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2022-03-01 |
Series: | IET Computers & Digital Techniques |
Online Access: | https://doi.org/10.1049/cdt2.12041 |
_version_ | 1832559619247964160 |
---|---|
author | Ashur Rafiev; Alex Yakovlev; Ghaith Tarawneh; Matthew F. Naylor; Simon W. Moore; David B. Thomas; Graeme M. Bragg; Mark L. Vousden; Andrew D. Brown |
author_sort | Ashur Rafiev |
collection | DOAJ |
description | One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is finding an optimal way to manage communication flows in the massively parallel processing network. Message‐passing and global synchronization are powerful abstractions in this regard, especially when used in combination. This paper studies the use of a hardware‐implemented refutable global barrier as a design optimization technique aimed at unifying these abstractions at the API level. The paper explores the trade‐offs between the related overheads and performance factors on a message‐passing prototype machine with 49,152 RISC‐V threads distributed over 48 FPGAs (the Partially Ordered Event‐Triggered Systems platform). Our experiments show that some graph applications favour synchronized communication, but the effect is hard to predict in general because of the interplay between multiple hardware and software factors. A classifier model is therefore proposed and implemented to make this prediction from the topology parameters of the application graph: graph diameter, degree of connectivity, and reconvergence metric. The experimental results demonstrate that the correct choice of communication mode, enabled by the new model‐driven approach, makes computation 3.22 times faster on average than the baseline platform operation. |
format | Article |
id | doaj-art-46d70bd482054b00926b17b26650c27a |
institution | Kabale University |
issn | 1751-8601 1751-861X |
language | English |
publishDate | 2022-03-01 |
publisher | Wiley |
record_format | Article |
series | IET Computers & Digital Techniques |
spelling | doaj-art-46d70bd482054b00926b17b26650c27a; 2025-02-03T01:29:40Z; eng; Wiley; IET Computers & Digital Techniques; ISSN 1751-8601, 1751-861X; 2022-03-01; vol. 16, no. 2-3, pp. 71-88; 10.1049/cdt2.12041; Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture; Ashur Rafiev, Alex Yakovlev, Ghaith Tarawneh (School of Engineering, Newcastle University, Newcastle upon Tyne, UK); Matthew F. Naylor, Simon W. Moore (Computer Architecture Group, Cambridge University, Cambridge, UK); David B. Thomas (Department of Electrical and Electronic Engineering, Imperial College London, London, UK); Graeme M. Bragg, Mark L. Vousden, Andrew D. Brown (Electronics and Computer Science, University of Southampton, Southampton, UK); https://doi.org/10.1049/cdt2.12041 |
title | Synchronization in graph analysis algorithms on the Partially Ordered Event‐Triggered Systems many‐core architecture |
url | https://doi.org/10.1049/cdt2.12041 |