EGA: An Efficient GPU Accelerated Groupby Aggregation Algorithm

Bibliographic Details
Main Authors: Zhe Wang, Yao Shen, Zhou Lei
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/15/7/3693
Description
Summary: With the exponential growth of big data, efficient groupby aggregation (GA) has become critical for real-time analytics across industries. GA is a key method for extracting valuable information. Current CPU-based solutions (such as large-scale parallel processing platforms) face computational throughput limitations. Because CPU-based platforms struggle to support real-time big data analysis, GPUs have been introduced to support real-time GA. Most GPU GA algorithms are hash-based, and their performance degrades when the load factor of the hash table is too high or when the data volume exceeds the GPU memory capacity. This paper proposes an efficient hash-based GPU-accelerated groupby aggregation algorithm (EGA) that addresses these limitations. EGA uses different designs for different scenarios: single-pass EGA (SP-EGA) maintains high efficiency when the data fit in GPU memory, while multipass EGA (MP-EGA) supports GA for data exceeding the GPU memory capacity. EGA demonstrates significant acceleration: SP-EGA outperforms state-of-the-art (SOTA) hash-based GPU algorithms by 1.16–5.39× at load factors >0.90 and surpasses SOTA sort-based GPU methods by 1.30–2.48×. MP-EGA achieves a 6.45–29.12× speedup over SOTA CPU implementations.
ISSN: 2076-3417
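
Illustration: The summary refers to hash-based groupby aggregation, the hash table's load factor, and a multipass mode for inputs larger than memory. The sketch below is a generic, CPU-side Python illustration of those three ideas only; it is not the EGA algorithm from the article. The function names `hash_groupby_sum` and `multipass_groupby_sum`, the use of linear probing, the SUM aggregate, and the partition-by-hash strategy are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def hash_groupby_sum(
    keys: Sequence[int], values: Sequence[int], capacity: int
) -> Tuple[Dict[int, int], float, int]:
    """Groupby-SUM with an open-addressing (linear probing) hash table.

    Hypothetical helper for illustration. Returns the aggregated result,
    the final load factor (occupied slots / capacity), and the total
    number of slots probed.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    slot_keys: List[Optional[int]] = [None] * capacity
    slot_sums: List[int] = [0] * capacity
    used = 0
    probes = 0
    for k, v in zip(keys, values):
        start = hash(k) % capacity
        for step in range(capacity):
            i = (start + step) % capacity
            probes += 1
            if slot_keys[i] is None:
                # Empty slot: claim it for this group.
                slot_keys[i] = k
                used += 1
            if slot_keys[i] == k:
                # Slot belongs to this group: fold the value in.
                slot_sums[i] += v
                break
        else:
            # Every slot was probed and none was free or matching.
            raise OverflowError("hash table is full")
    result = {k: s for k, s in zip(slot_keys, slot_sums) if k is not None}
    return result, used / capacity, probes


def multipass_groupby_sum(
    keys: Sequence[int], values: Sequence[int], num_partitions: int, capacity: int
) -> Dict[int, int]:
    """Assumed multipass scheme: one pass per hash partition.

    Each pass aggregates only the keys that hash to one partition, so the
    table only has to hold that partition's groups at a time.
    """
    merged: Dict[int, int] = {}
    for p in range(num_partitions):
        pairs = [(k, v) for k, v in zip(keys, values) if hash(k) % num_partitions == p]
        if not pairs:
            continue
        part_keys, part_values = zip(*pairs)
        partial, _, _ = hash_groupby_sum(part_keys, part_values, capacity)
        # Partitions own disjoint key sets, so a plain update is enough.
        merged.update(partial)
    return merged


if __name__ == "__main__":
    keys = [3, 1, 3, 2, 1, 3]
    values = [10, 20, 30, 40, 50, 60]
    result, load_factor, probes = hash_groupby_sum(keys, values, capacity=4)
    print(sorted(result.items()), load_factor, probes)
    print(sorted(multipass_groupby_sum(keys, values, 2, 2).items()))
```

In a linear-probing table like this one, probe sequences tend to lengthen as the load factor approaches 1, which is one common reason hash-based aggregation slows down at high load factors. Splitting the input by key hash lets each pass work with a table sized for a fraction of the groups, at the cost of rereading the input once per pass.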