EGA: An Efficient GPU Accelerated Groupby Aggregation Algorithm

Bibliographic Details
Main Authors: Zhe Wang, Yao Shen, Zhou Lei
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/7/3693
collection DOAJ
description With the exponential growth of big data, efficient groupby aggregation (GA) has become critical for real-time analytics across industries, because GA is a key method for extracting valuable information. Current CPU-based solutions (such as large-scale parallel processing platforms) face computational throughput limitations, and because they struggle to support real-time big data analysis, GPUs have been adopted to support real-time GA. Most GPU GA algorithms are hash-based, and their performance degrades when the load factor of the hash table is too high or when the data volume exceeds GPU memory capacity. This paper proposes EGA, an efficient hash-based GPU-accelerated groupby aggregation algorithm that addresses these limitations. EGA provides a design for each scenario: single-pass EGA (SP-EGA) maintains high efficiency when the data fit in GPU memory, while multipass EGA (MP-EGA) supports GA on data that exceed GPU memory capacity. EGA demonstrates significant acceleration: SP-EGA outperforms state-of-the-art (SOTA) hash-based GPU algorithms by 1.16–5.39× at load factors above 0.90 and SOTA sort-based GPU methods by 1.30–2.48×, and MP-EGA achieves a 6.45–29.12× speedup over SOTA CPU implementations.
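The abstract does not detail EGA's internals, so the Python sketch below is only a generic illustration of the two ideas it names, not the authors' SP-EGA or MP-EGA design. `hash_groupby_sum` (a hypothetical name) performs a single-pass SUM aggregation in an open-addressing hash table, using linear probing as an assumed collision strategy, and reports the load factor (distinct groups divided by table capacity) and the number of slots probed. `multipass_groupby_sum` (also hypothetical) keeps the table size fixed by aggregating one hash partition of the keys per pass and merging the disjoint partial results, which is one common way to handle more groups than a single table can hold.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# Hypothetical illustration only: a generic CPU sketch of hash-based groupby
# aggregation, not the EGA algorithm described in the article.


def hash_groupby_sum(
    keys: Sequence[Hashable], values: Sequence[float], capacity: int
) -> Tuple[Dict[Hashable, float], float, int]:
    """Single-pass SUM per key in an open-addressing table (linear probing).

    Returns (per-group sums, load factor = groups / capacity, slots probed).
    Raises OverflowError if there are more distinct keys than slots.
    """
    occupied = [False] * capacity
    slot_keys: List[Optional[Hashable]] = [None] * capacity
    slot_sums: List[float] = [0] * capacity
    groups = 0
    probes = 0
    for k, v in zip(keys, values):
        i = hash(k) % capacity
        for _ in range(capacity):  # examine each slot at most once
            probes += 1
            if not occupied[i]:  # empty slot: start a new group
                occupied[i] = True
                slot_keys[i] = k
                slot_sums[i] = v
                groups += 1
                break
            if slot_keys[i] == k:  # existing group: accumulate
                slot_sums[i] += v
                break
            i = (i + 1) % capacity  # collision: try the next slot
        else:
            raise OverflowError("more distinct keys than table slots")
    sums = {slot_keys[j]: slot_sums[j] for j in range(capacity) if occupied[j]}
    return sums, groups / capacity, probes


def multipass_groupby_sum(
    keys: Sequence[Hashable],
    values: Sequence[float],
    capacity: int,
    num_passes: int,
) -> Dict[Hashable, float]:
    """Aggregate one hash partition of the keys per pass with a fixed-size table.

    Each pass rescans the input but keeps only keys with
    hash(k) % num_passes == p, so each table holds roughly 1/num_passes of
    the groups. Partitions are disjoint, so partial results merge by union.
    """
    result: Dict[Hashable, float] = {}
    for p in range(num_passes):
        part = [(k, v) for k, v in zip(keys, values) if hash(k) % num_passes == p]
        partial, _, _ = hash_groupby_sum(
            [k for k, _ in part], [v for _, v in part], capacity
        )
        result.update(partial)
    return result
```

For the integer keys 0 to 19 (each repeated three times) and a 16-slot table, the single-pass version raises `OverflowError`, while two passes with the same table size succeed. As the load factor approaches 1, linear probing walks longer runs of occupied slots per record, so `probes` grows; this is the kind of high-load-factor slowdown the abstract attributes to hash-based GPU GA.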
id doaj-art-7d1501130dfd440cb143d28ec44299fc
institution OA Journals
issn 2076-3417
doi 10.3390/app15073693
volume 15
issue 7
article 3693
affiliation Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (Zhe Wang, Yao Shen, Zhou Lei)
timestamp 2025-08-20T02:15:55Z
topic GPU
hash
groupby aggregation