EGA: An Efficient GPU Accelerated Groupby Aggregation Algorithm
With the exponential growth of big data, efficient groupby aggregation (GA) has become critical for real-time analytics across industries. GA is a key method for extracting valuable information. Current CPU-based solutions (such as large-scale parallel processing platforms) face computational throug...
| Main Authors: | Zhe Wang, Yao Shen, Zhou Lei |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Applied Sciences |
| Subjects: | GPU; hash; groupby aggregation |
| Online Access: | https://www.mdpi.com/2076-3417/15/7/3693 |
| _version_ | 1850188252266364928 |
|---|---|
| author | Zhe Wang; Yao Shen; Zhou Lei |
| collection | DOAJ |
| description | With the exponential growth of big data, efficient groupby aggregation (GA) has become critical for real-time analytics across industries. GA is a key method for extracting valuable information. Current CPU-based solutions (such as large-scale parallel processing platforms) face computational throughput limitations. Since CPU-based platforms struggle to support real-time big data analysis, the GPU is introduced to support real-time GA analysis. Most GPU GA algorithms are based on hashing methods, and these algorithms experience performance degradation when the load factor of the hash table is too high or when the data volume exceeds the GPU memory capacity limit. This paper proposes an efficient hash-based GPU-accelerated groupby aggregation algorithm (EGA) that addresses these limitations. EGA features different designs for different scenarios: single-pass EGA (SP-EGA) maintains high efficiency when data fit in the GPU memory, while multipass EGA (MP-EGA) supports GA for data exceeding the GPU memory capacity. EGA demonstrates significant acceleration: SP-EGA outperforms SOTA hash-based GPU algorithms by 1.16–5.39× at load factors >0.90 and surpasses SOTA sort-based GPU methods by 1.30–2.48×. MP-EGA achieves 6.45–29.12× speedup over SOTA CPU implementations. |
| format | Article |
| id | doaj-art-7d1501130dfd440cb143d28ec44299fc |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-7d1501130dfd440cb143d28ec44299fc; 2025-08-20T02:15:55Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-03-01; vol. 15, no. 7, art. 3693; DOI 10.3390/app15073693; EGA: An Efficient GPU Accelerated Groupby Aggregation Algorithm; Zhe Wang, Yao Shen, Zhou Lei (each: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China); https://www.mdpi.com/2076-3417/15/7/3693; GPU; hash; groupby aggregation |
| title | EGA: An Efficient GPU Accelerated Groupby Aggregation Algorithm |
| topic | GPU; hash; groupby aggregation |
| url | https://www.mdpi.com/2076-3417/15/7/3693 |
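
The description above notes that hash-based groupby aggregation slows down as the hash table's load factor rises. The following is a minimal sequential sketch of that general technique (open addressing with linear probing), not the EGA algorithm from the article and not GPU code. The function name `hash_groupby_sum` and its parameters are hypothetical, chosen only for illustration; the returned probe count is included so the effect of a fuller table can be observed.

```python
from typing import Dict, List, Optional, Tuple


def hash_groupby_sum(
    keys: List[int], values: List[int], capacity: int
) -> Tuple[Dict[int, int], int]:
    """Group-by SUM with an open-addressing hash table (linear probing).

    Hypothetical illustration only; this is not the article's EGA method.
    Returns the per-key sums and the total number of slots probed.
    """
    if len(set(keys)) > capacity:
        raise ValueError("capacity must be at least the number of distinct keys")

    slot_keys: List[Optional[int]] = [None] * capacity  # None marks an empty slot
    slot_sums: List[int] = [0] * capacity
    probes = 0

    for key, value in zip(keys, values):
        i = hash(key) % capacity
        while True:
            probes += 1
            if slot_keys[i] is None:      # empty slot: start a new group
                slot_keys[i] = key
                slot_sums[i] = value
                break
            if slot_keys[i] == key:       # existing group: accumulate
                slot_sums[i] += value
                break
            i = (i + 1) % capacity        # collision: try the next slot

    result = {k: s for k, s in zip(slot_keys, slot_sums) if k is not None}
    return result, probes
```

Calling `hash_groupby_sum([1, 2, 1, 3, 2], [10, 20, 30, 40, 50], capacity=8)` groups the five rows into three sums. The load factor here is the number of distinct keys divided by `capacity`; as it approaches 1, colliding keys tend to need more probes per row, which is the kind of degradation the description refers to.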