A scalable parallel sorting algorithm by regular sampling for big data
Sorting is one of the basic algorithms in computer science, and has been extensively used in a variety of applications.In the big data era, as the volumes of data increase rapidly, parallel sorting has attracted much attention.Existing parallel sorting algorithms suffered from excessive communicatio...
Saved in:
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: |
China InfoCom Media Group
2024-07-01
|
| Series: | 大数据 |
| Subjects: | |
| Online Access: | http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2024021 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1850212284598583296 |
|---|---|
| author | Ying WANG Zhiguang CHEN Yutong LU |
| author_facet | Ying WANG Zhiguang CHEN Yutong LU |
| author_sort | Ying WANG |
| collection | DOAJ |
| description | Sorting is one of the basic algorithms in computer science, and has been extensively used in a variety of applications.In the big data era, as the volumes of data increase rapidly, parallel sorting has attracted much attention.Existing parallel sorting algorithms suffered from excessive communication overhead and load imbalance, making it difficult to scale massively.To solve above problems, a scalable parallel algorithm sorting by regular sampling (ScaPSRS) was proposed, which sampled the p-1 pivot elements to divide the entire data set into p disjoint subsets by all parallel processes, rather than by only one given process as PSRS did.Furthermore, ScaPSRS adopted a novel iterative update strategy of selecting pivots to guarantee that the workloads and data were evenly scheduled among the parallel processes, thus ensuring superior overall performance.A variety of experiments conducted on the Tianhe-Ⅱ supercomputer demonstrated that ScaPSRS succeeded in scaling to 32 000 cores and outperformed state-of-the-art works significantly. |
| format | Article |
| id | doaj-art-0da6f3e6cd194181bf2ddc4b394a4ad3 |
| institution | OA Journals |
| issn | 2096-0271 |
| language | zho |
| publishDate | 2024-07-01 |
| publisher | China InfoCom Media Group |
| record_format | Article |
| series | 大数据 |
| spelling | doaj-art-0da6f3e6cd194181bf2ddc4b394a4ad32025-08-20T02:09:22ZzhoChina InfoCom Media Group大数据2096-02712024-07-01108910567330299A scalable parallel sorting algorithm by regular sampling for big dataYing WANGZhiguang CHENYutong LUSorting is one of the basic algorithms in computer science, and has been extensively used in a variety of applications.In the big data era, as the volumes of data increase rapidly, parallel sorting has attracted much attention.Existing parallel sorting algorithms suffered from excessive communication overhead and load imbalance, making it difficult to scale massively.To solve above problems, a scalable parallel algorithm sorting by regular sampling (ScaPSRS) was proposed, which sampled the p-1 pivot elements to divide the entire data set into p disjoint subsets by all parallel processes, rather than by only one given process as PSRS did.Furthermore, ScaPSRS adopted a novel iterative update strategy of selecting pivots to guarantee that the workloads and data were evenly scheduled among the parallel processes, thus ensuring superior overall performance.A variety of experiments conducted on the Tianhe-Ⅱ supercomputer demonstrated that ScaPSRS succeeded in scaling to 32 000 cores and outperformed state-of-the-art works significantly.http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2024021parallel sorting;regular sampling;load balance;big data |
| spellingShingle | Ying WANG Zhiguang CHEN Yutong LU A scalable parallel sorting algorithm by regular sampling for big data 大数据 parallel sorting;regular sampling;load balance;big data |
| title | A scalable parallel sorting algorithm by regular sampling for big data |
| title_full | A scalable parallel sorting algorithm by regular sampling for big data |
| title_fullStr | A scalable parallel sorting algorithm by regular sampling for big data |
| title_full_unstemmed | A scalable parallel sorting algorithm by regular sampling for big data |
| title_short | A scalable parallel sorting algorithm by regular sampling for big data |
| title_sort | scalable parallel sorting algorithm by regular sampling for big data |
| topic | parallel sorting;regular sampling;load balance;big data |
| url | http://www.j-bigdataresearch.com.cn/thesisDetails#10.11959/j.issn.2096-0271.2024021 |
| work_keys_str_mv | AT yingwang ascalableparallelsortingalgorithmbyregularsamplingforbigdata AT zhiguangchen ascalableparallelsortingalgorithmbyregularsamplingforbigdata AT yutonglu ascalableparallelsortingalgorithmbyregularsamplingforbigdata AT yingwang scalableparallelsortingalgorithmbyregularsamplingforbigdata AT zhiguangchen scalableparallelsortingalgorithmbyregularsamplingforbigdata AT yutonglu scalableparallelsortingalgorithmbyregularsamplingforbigdata |