Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
Finding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
Belarusian National Technical University
2022-12-01
|
Series: | Системный анализ и прикладная информатика |
Subjects: | |
Online Access: | https://sapi.bntu.by/jour/article/view/582 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
_version_ | 1832543676369207296 |
---|---|
author | O. N. Karasik A. A. Prihozhy |
author_facet | O. N. Karasik A. A. Prihozhy |
author_sort | O. N. Karasik |
collection | DOAJ |
description | Finding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its hierarchical cache memory on the parameters of algorithm implementation depending on the size of the graph and the size of distance matrix’s block. It proposes a technique of tuning the block-size to the given multi-core system. The technique involves profiling tools in the tuning process and allows the increase of the parallel algorithm throughput. Computational experiments carried out on a rack server equipped with two Intel Xeon E5-2620 v4 processors of 8 cores and 16 hardware threads each have convincingly shown for various graph sizes that the behavior and parameters of the hierarchical cache memory operation don’t depend on the graph size and are determined only by the distance matrix’s block size. To tune the algorithm to the target multi-core system, the preferable block size can be found once for the graph size whose in-memory matrix representation is larger than the size of cache shared among all processor’s cores. Then this blocksize can be reused on graphs of bigger size for efficient solving the all-pairs shortest path problem |
format | Article |
id | doaj-art-e90b68184c5d4ec786431f306c1f1f03 |
institution | Kabale University |
issn | 2309-4923 2414-0481 |
language | English |
publishDate | 2022-12-01 |
publisher | Belarusian National Technical University |
record_format | Article |
series | Системный анализ и прикладная информатика |
spelling | doaj-art-e90b68184c5d4ec786431f306c1f1f032025-02-03T11:37:40ZengBelarusian National Technical UniversityСистемный анализ и прикладная информатика2309-49232414-04812022-12-0103576510.21122/2309-4923-2022-3-57-65434Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementationO. N. Karasik0A. A. Prihozhy1Belarusian National Technical UniversityBelarusian National Technical UniversityFinding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its hierarchical cache memory on the parameters of algorithm implementation depending on the size of the graph and the size of distance matrix’s block. It proposes a technique of tuning the block-size to the given multi-core system. The technique involves profiling tools in the tuning process and allows the increase of the parallel algorithm throughput. Computational experiments carried out on a rack server equipped with two Intel Xeon E5-2620 v4 processors of 8 cores and 16 hardware threads each have convincingly shown for various graph sizes that the behavior and parameters of the hierarchical cache memory operation don’t depend on the graph size and are determined only by the distance matrix’s block size. To tune the algorithm to the target multi-core system, the preferable block size can be found once for the graph size whose in-memory matrix representation is larger than the size of cache shared among all processor’s cores. Then this blocksize can be reused on graphs of bigger size for efficient solving the all-pairs shortest path problemhttps://sapi.bntu.by/jour/article/view/582shortest pathfloyd-warshall algorithmblocked algorithmmultithreaded applicationmulti-core systemhierarchical cache memoryparallelismthroughput. |
spellingShingle | O. N. Karasik A. A. Prihozhy Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation Системный анализ и прикладная информатика shortest path floyd-warshall algorithm blocked algorithm multithreaded application multi-core system hierarchical cache memory parallelism throughput. |
title | Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation |
title_full | Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation |
title_fullStr | Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation |
title_full_unstemmed | Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation |
title_short | Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation |
title_sort | tuning block parallel all pairs shortest path algorithm for efficient multi core implementation |
topic | shortest path floyd-warshall algorithm blocked algorithm multithreaded application multi-core system hierarchical cache memory parallelism throughput. |
url | https://sapi.bntu.by/jour/article/view/582 |
work_keys_str_mv | AT onkarasik tuningblockparallelallpairsshortestpathalgorithmforefficientmulticoreimplementation AT aaprihozhy tuningblockparallelallpairsshortestpathalgorithmforefficientmulticoreimplementation |