Novel quality-of-service-oriented Spark job scheduler
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | Chinese (zho) |
| Published: | China InfoCom Media Group, 2025-01-01 |
| Series: | 大数据 |
| Subjects: | |
| Online Access: | http://www.j-bigdataresearch.com.cn/thesisDetails?columnId=109252703&Fpath=home&index=0 |
| Summary: | Spark is a widely used big data computing framework for processing and analyzing explosively growing data. The cloud provides on-demand, pay-as-you-go computing resources to satisfy users' requirements, and many organizations have deployed big data computing clusters on the cloud. These clusters must handle the Spark job scheduling problem efficiently so as to meet the QoS requirements of various users, such as reducing the cost of resource usage and shortening the job response time. However, most existing methods do not consider the requirements of multiple users together, and fail to take into account the characteristics of Spark cluster environments and workloads. To address this challenge, a new Spark job scheduler based on deep reinforcement learning (DRL) was designed to adapt to multiple QoS requirements by modeling the job scheduling problem of Spark clusters deployed in the cloud. A DRL cluster simulation environment was built to train the job scheduler's core DRL agent. In this scheduling environment, training methods based on the deep *Q*-network and on a combination of proximal policy optimization and generalized advantage estimation were implemented, enabling the DRL agent to adaptively learn the characteristics of different types of jobs as well as of dynamic and bursty cluster environments. This enables rational scheduling of Spark jobs that reduces the total usage cost of the cluster and shortens the average response time of jobs. Testing the DRL agent on the benchmark suite shows that, compared with existing Spark job scheduling solutions, the DRL agent job scheduler designed in this paper has significant advantages in total cluster usage cost, average job response time, and QoS achievement rate, confirming its feasibility and effectiveness. |
|---|---|
| ISSN: | 2096-0271 |
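The summary mentions that the scheduler's DRL agent is trained with a combination of proximal policy optimization and generalized advantage estimation (GAE). As background, a minimal sketch of the standard GAE recursion is shown below; this is not the paper's code, and the function name and inputs are illustrative assumptions:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimation for one episode (sketch).

    rewards: per-step rewards r_t
    values:  value estimates V(s_t), length len(rewards) + 1
             (the final entry bootstraps the value of the last state)
    gamma:   discount factor; lam: GAE smoothing parameter (assumed values)
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Work backwards through time: A_t = delta_t + gamma * lam * A_{t+1},
    # where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) is the TD error.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

In a PPO-style trainer such as the one the abstract describes, these advantages would weight the clipped policy-gradient objective; how the scheduler maps cluster state and job features to rewards is specific to the paper and not reproduced here.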