Efficient Deep Learning Job Allocation in Cloud Systems by Predicting Resource Consumptions including GPU and CPU

One objective of GPU scheduling in cloud systems is to minimize the completion times of given deep learning models. This matters in cloud environments because deep learning workloads take a long time to finish, and misallocating them can greatly increase job completion time. The difficulty of GPU scheduling stems from diverse parameters, including model architectures and GPU types. Some model architectures are CPU-intensive rather than GPU-intensive, which creates different hardware requirements when training different models. Previous GPU scheduling research used a small set of parameters that did not include CPU parameters, making it difficult to reduce the job completion time (JCT). This paper introduces an improved GPU scheduling approach that reduces JCT by predicting execution time and several resource consumption parameters, including GPU Utilization (%), GPU Memory Utilization (%), GPU Memory, and CPU Utilization (%). Experimental results show that the proposed model improves JCT by up to 40.9% under GPU Allocation based on Computing Efficiency compared to Driple.

Bibliographic Details
Main Authors: Abuda Chad Ferrino, Tae Young Choe
Affiliation: Department of Computer AI Convergence Engineering, Kumoh National Institute of Technology, 61 Daehak-ro, Gumi-si, Gyeongsangbuk-do, 39177, Republic of Korea (both authors)
Format: Article
Language: English
Published: University North, 2025-01-01
Series: Tehnički Glasnik, Vol. 19, No. 3 (2025), pp. 461-472
ISSN: 1846-6168; 1848-5588
DOI: 10.31803/tg-20240112104444
Subjects: cloud computing; convolutional neural network; deep learning; GPU job scheduling; performance estimation
Online Access: https://hrcak.srce.hr/file/480464
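The abstract above outlines a predict-then-allocate scheme: estimate each job's execution time and resource consumption per GPU type, then place the job where its predicted completion time is lowest. The following Python sketch illustrates that general idea only; the learned performance predictor is replaced by a hypothetical lookup table, all numbers are invented, and the greedy placement rule is an assumption for illustration, not the paper's GPU Allocation based on Computing Efficiency algorithm.

# Sketch only: a toy predict-then-allocate scheduler. PREDICTIONS is a
# hypothetical lookup table standing in for a learned performance model.
from dataclasses import dataclass

@dataclass
class Prediction:
    exec_time_s: float       # predicted training time on this GPU type (s)
    gpu_util_pct: float      # predicted GPU Utilization (%)
    gpu_mem_util_pct: float  # predicted GPU Memory Utilization (%)
    gpu_mem_gb: float        # predicted GPU memory footprint (GB)
    cpu_util_pct: float      # predicted CPU Utilization (%)

@dataclass
class Gpu:
    name: str
    mem_gb: float
    free_at_s: float = 0.0   # earliest time this GPU becomes free (queue wait)

# Hypothetical per-(model, GPU type) predictions; invented numbers.
PREDICTIONS = {
    ("resnet50", "V100"): Prediction(3600, 90, 60, 10, 35),
    ("resnet50", "T4"):   Prediction(7200, 95, 80, 12, 30),
    ("bert",     "V100"): Prediction(5400, 85, 75, 14, 20),
    ("bert",     "T4"):   Prediction(9000, 92, 90, 15, 18),
}

def completion_time(job, gpu):
    """Predicted finish time of `job` on `gpu`: queue wait plus run time.
    Returns None when the predicted memory footprint does not fit."""
    pred = PREDICTIONS[(job, gpu.name)]
    if pred.gpu_mem_gb > gpu.mem_gb:
        return None
    return gpu.free_at_s + pred.exec_time_s

def schedule(jobs, gpus):
    """Greedily place each job on the GPU minimizing its predicted JCT,
    then extend that GPU's queue so later jobs see the wait."""
    placement = {}
    for job in jobs:
        candidates = []
        for g in gpus:
            t = completion_time(job, g)
            if t is not None:
                candidates.append((t, g))
        finish, best = min(candidates, key=lambda c: c[0])
        best.free_at_s = finish
        placement[job] = best.name
    return placement

if __name__ == "__main__":
    cluster = [Gpu("V100", mem_gb=16), Gpu("T4", mem_gb=16)]
    print(schedule(["resnet50", "bert"], cluster))
    # e.g. {'resnet50': 'V100', 'bert': 'V100'}

Keeping a per-GPU earliest-free time lets the sketch account for queue wait as well as run time, which is what distinguishes minimizing completion time from simply picking the fastest GPU type; the predicted CPU utilization field is where a CPU-aware scheduler, as the abstract suggests, would additionally penalize placements that starve CPU-intensive models.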