Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation

Finding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its...

Full description

Saved in:
Bibliographic Details
Main Authors: O. N. Karasik, A. A. Prihozhy
Format: Article
Language:English
Published: Belarusian National Technical University 2022-12-01
Series:Системный анализ и прикладная информатика
Subjects:
Online Access:https://sapi.bntu.by/jour/article/view/582
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1832543676369207296
author O. N. Karasik
A. A. Prihozhy
author_facet O. N. Karasik
A. A. Prihozhy
author_sort O. N. Karasik
collection DOAJ
description Finding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its hierarchical cache memory on the parameters of algorithm implementation depending on the size of the graph and the size of distance matrix’s block. It proposes a technique of tuning the block-size to the given multi-core system. The technique involves profiling tools in the tuning process and allows the increase of the parallel algorithm throughput. Computational experiments carried out on a rack server equipped with two Intel Xeon E5-2620 v4 processors of 8 cores and 16 hardware threads each have convincingly shown for various graph sizes that the behavior and parameters of the hierarchical cache memory operation don’t depend on the graph size and are determined only by the distance matrix’s block size. To tune the algorithm to the target multi-core system, the preferable block size can be found once for the graph size whose in-memory matrix representation is larger than the size of cache shared among all processor’s cores. Then this blocksize can be reused on graphs of bigger size for efficient solving the all-pairs shortest path problem
format Article
id doaj-art-e90b68184c5d4ec786431f306c1f1f03
institution Kabale University
issn 2309-4923
2414-0481
language English
publishDate 2022-12-01
publisher Belarusian National Technical University
record_format Article
series Системный анализ и прикладная информатика
spelling doaj-art-e90b68184c5d4ec786431f306c1f1f032025-02-03T11:37:40ZengBelarusian National Technical UniversityСистемный анализ и прикладная информатика2309-49232414-04812022-12-0103576510.21122/2309-4923-2022-3-57-65434Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementationO. N. Karasik0A. A. Prihozhy1Belarusian National Technical UniversityBelarusian National Technical UniversityFinding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its hierarchical cache memory on the parameters of algorithm implementation depending on the size of the graph and the size of distance matrix’s block. It proposes a technique of tuning the block-size to the given multi-core system. The technique involves profiling tools in the tuning process and allows the increase of the parallel algorithm throughput. Computational experiments carried out on a rack server equipped with two Intel Xeon E5-2620 v4 processors of 8 cores and 16 hardware threads each have convincingly shown for various graph sizes that the behavior and parameters of the hierarchical cache memory operation don’t depend on the graph size and are determined only by the distance matrix’s block size. To tune the algorithm to the target multi-core system, the preferable block size can be found once for the graph size whose in-memory matrix representation is larger than the size of cache shared among all processor’s cores. Then this blocksize can be reused on graphs of bigger size for efficient solving the all-pairs shortest path problemhttps://sapi.bntu.by/jour/article/view/582shortest pathfloyd-warshall algorithmblocked algorithmmultithreaded applicationmulti-core systemhierarchical cache memoryparallelismthroughput.
spellingShingle O. N. Karasik
A. A. Prihozhy
Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
Системный анализ и прикладная информатика
shortest path
floyd-warshall algorithm
blocked algorithm
multithreaded application
multi-core system
hierarchical cache memory
parallelism
throughput.
title Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_full Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_fullStr Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_full_unstemmed Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_short Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_sort tuning block parallel all pairs shortest path algorithm for efficient multi core implementation
topic shortest path
floyd-warshall algorithm
blocked algorithm
multithreaded application
multi-core system
hierarchical cache memory
parallelism
throughput.
url https://sapi.bntu.by/jour/article/view/582
work_keys_str_mv AT onkarasik tuningblockparallelallpairsshortestpathalgorithmforefficientmulticoreimplementation
AT aaprihozhy tuningblockparallelallpairsshortestpathalgorithmforefficientmulticoreimplementation