A scalable parallel sorting algorithm by regular sampling for big data

Sorting is one of the basic algorithms in computer science, and has been extensively used in a variety of applications.In the big data era, as the volumes of data increase rapidly, parallel sorting has attracted much attention.Existing parallel sorting algorithms suffered from excessive communicatio...

Full description

Saved in:
Bibliographic Details
Main Authors: Ying WANG, Zhiguang CHEN, Yutong LU
Format: Article
Language:zho
Published: China InfoCom Media Group 2024-07-01
Series:大数据
Subjects:
Online Access:http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2024021
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1850212284598583296
author Ying WANG
Zhiguang CHEN
Yutong LU
author_facet Ying WANG
Zhiguang CHEN
Yutong LU
author_sort Ying WANG
collection DOAJ
description Sorting is one of the basic algorithms in computer science, and has been extensively used in a variety of applications.In the big data era, as the volumes of data increase rapidly, parallel sorting has attracted much attention.Existing parallel sorting algorithms suffered from excessive communication overhead and load imbalance, making it difficult to scale massively.To solve above problems, a scalable parallel algorithm sorting by regular sampling (ScaPSRS) was proposed, which sampled the p-1 pivot elements to divide the entire data set into p disjoint subsets by all parallel processes, rather than by only one given process as PSRS did.Furthermore, ScaPSRS adopted a novel iterative update strategy of selecting pivots to guarantee that the workloads and data were evenly scheduled among the parallel processes, thus ensuring superior overall performance.A variety of experiments conducted on the Tianhe-Ⅱ supercomputer demonstrated that ScaPSRS succeeded in scaling to 32 000 cores and outperformed state-of-the-art works significantly.
format Article
id doaj-art-0da6f3e6cd194181bf2ddc4b394a4ad3
institution OA Journals
issn 2096-0271
language zho
publishDate 2024-07-01
publisher China InfoCom Media Group
record_format Article
series 大数据
spelling doaj-art-0da6f3e6cd194181bf2ddc4b394a4ad32025-08-20T02:09:22ZzhoChina InfoCom Media Group大数据2096-02712024-07-01108910567330299A scalable parallel sorting algorithm by regular sampling for big dataYing WANGZhiguang CHENYutong LUSorting is one of the basic algorithms in computer science, and has been extensively used in a variety of applications.In the big data era, as the volumes of data increase rapidly, parallel sorting has attracted much attention.Existing parallel sorting algorithms suffered from excessive communication overhead and load imbalance, making it difficult to scale massively.To solve above problems, a scalable parallel algorithm sorting by regular sampling (ScaPSRS) was proposed, which sampled the p-1 pivot elements to divide the entire data set into p disjoint subsets by all parallel processes, rather than by only one given process as PSRS did.Furthermore, ScaPSRS adopted a novel iterative update strategy of selecting pivots to guarantee that the workloads and data were evenly scheduled among the parallel processes, thus ensuring superior overall performance.A variety of experiments conducted on the Tianhe-Ⅱ supercomputer demonstrated that ScaPSRS succeeded in scaling to 32 000 cores and outperformed state-of-the-art works significantly.http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2024021parallel sorting;regular sampling;load balance;big data
spellingShingle Ying WANG
Zhiguang CHEN
Yutong LU
A scalable parallel sorting algorithm by regular sampling for big data
大数据
parallel sorting;regular sampling;load balance;big data
title A scalable parallel sorting algorithm by regular sampling for big data
title_full A scalable parallel sorting algorithm by regular sampling for big data
title_fullStr A scalable parallel sorting algorithm by regular sampling for big data
title_full_unstemmed A scalable parallel sorting algorithm by regular sampling for big data
title_short A scalable parallel sorting algorithm by regular sampling for big data
title_sort scalable parallel sorting algorithm by regular sampling for big data
topic parallel sorting;regular sampling;load balance;big data
url http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2024021
work_keys_str_mv AT yingwang ascalableparallelsortingalgorithmbyregularsamplingforbigdata
AT zhiguangchen ascalableparallelsortingalgorithmbyregularsamplingforbigdata
AT yutonglu ascalableparallelsortingalgorithmbyregularsamplingforbigdata
AT yingwang scalableparallelsortingalgorithmbyregularsamplingforbigdata
AT zhiguangchen scalableparallelsortingalgorithmbyregularsamplingforbigdata
AT yutonglu scalableparallelsortingalgorithmbyregularsamplingforbigdata