Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation

Finding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its...

Full description

Saved in:

Bibliographic Details
Main Authors:	O. N. Karasik, A. A. Prihozhy
Format:	Article
Language:	English
Published:	Belarusian National Technical University 2022-12-01
Series:	Системный анализ и прикладная информатика
Subjects:	shortest path floyd-warshall algorithm blocked algorithm multithreaded application multi-core system hierarchical cache memory parallelism throughput.
Online Access:	https://sapi.bntu.by/jour/article/view/582
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1832543676369207296
author	O. N. Karasik A. A. Prihozhy
author_facet	O. N. Karasik A. A. Prihozhy
author_sort	O. N. Karasik
collection	DOAJ
description	Finding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its hierarchical cache memory on the parameters of algorithm implementation depending on the size of the graph and the size of distance matrix’s block. It proposes a technique of tuning the block-size to the given multi-core system. The technique involves profiling tools in the tuning process and allows the increase of the parallel algorithm throughput. Computational experiments carried out on a rack server equipped with two Intel Xeon E5-2620 v4 processors of 8 cores and 16 hardware threads each have convincingly shown for various graph sizes that the behavior and parameters of the hierarchical cache memory operation don’t depend on the graph size and are determined only by the distance matrix’s block size. To tune the algorithm to the target multi-core system, the preferable block size can be found once for the graph size whose in-memory matrix representation is larger than the size of cache shared among all processor’s cores. Then this blocksize can be reused on graphs of bigger size for efficient solving the all-pairs shortest path problem
format	Article
id	doaj-art-e90b68184c5d4ec786431f306c1f1f03
institution	Kabale University
issn	2309-4923 2414-0481
language	English
publishDate	2022-12-01
publisher	Belarusian National Technical University
record_format	Article
series	Системный анализ и прикладная информатика
spelling	doaj-art-e90b68184c5d4ec786431f306c1f1f032025-02-03T11:37:40ZengBelarusian National Technical UniversityСистемный анализ и прикладная информатика2309-49232414-04812022-12-0103576510.21122/2309-4923-2022-3-57-65434Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementationO. N. Karasik0A. A. Prihozhy1Belarusian National Technical UniversityBelarusian National Technical UniversityFinding shortest paths in a weighted graph is one of the key problems in computer-science, which has numerous practical applications in multiple domains. This paper analyzes the parallel blocked all-pairs shortest path algorithm at the aim of evaluating the influence of the multi-core system and its hierarchical cache memory on the parameters of algorithm implementation depending on the size of the graph and the size of distance matrix’s block. It proposes a technique of tuning the block-size to the given multi-core system. The technique involves profiling tools in the tuning process and allows the increase of the parallel algorithm throughput. Computational experiments carried out on a rack server equipped with two Intel Xeon E5-2620 v4 processors of 8 cores and 16 hardware threads each have convincingly shown for various graph sizes that the behavior and parameters of the hierarchical cache memory operation don’t depend on the graph size and are determined only by the distance matrix’s block size. To tune the algorithm to the target multi-core system, the preferable block size can be found once for the graph size whose in-memory matrix representation is larger than the size of cache shared among all processor’s cores. Then this blocksize can be reused on graphs of bigger size for efficient solving the all-pairs shortest path problemhttps://sapi.bntu.by/jour/article/view/582shortest pathfloyd-warshall algorithmblocked algorithmmultithreaded applicationmulti-core systemhierarchical cache memoryparallelismthroughput.
spellingShingle	O. N. Karasik A. A. Prihozhy Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation Системный анализ и прикладная информатика shortest path floyd-warshall algorithm blocked algorithm multithreaded application multi-core system hierarchical cache memory parallelism throughput.
title	Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_full	Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_fullStr	Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_full_unstemmed	Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_short	Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation
title_sort	tuning block parallel all pairs shortest path algorithm for efficient multi core implementation
topic	shortest path floyd-warshall algorithm blocked algorithm multithreaded application multi-core system hierarchical cache memory parallelism throughput.
url	https://sapi.bntu.by/jour/article/view/582
work_keys_str_mv	AT onkarasik tuningblockparallelallpairsshortestpathalgorithmforefficientmulticoreimplementation AT aaprihozhy tuningblockparallelallpairsshortestpathalgorithmforefficientmulticoreimplementation

Tuning block-parallel all-pairs shortest path algorithm for efficient multi-core implementation

Similar Items