Leveraging Pre-Built Catalogs and Object-Level Scheduling to Eliminate I/O Bottlenecks in HPC Environments

Bibliographic Details
Main Authors: Seoyeong Lee, Junghwan Park, Yoochan Kim, Safdar Jamil, Awais Khan, Seung Woo Son, Jae-Kook Lee, Do-Sik An, Taeyoung Hong, Youngjae Kim
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10935346/
Description
Summary: Modern High-Performance Computing (HPC) environments face mounting challenges due to the shift from large to small file datasets, along with an increasing number of users and parallelized applications. As HPC systems rely on Parallel File Systems (PFS), such as Lustre, for data processing, performance bottlenecks stemming from Object Storage Target (OST) contention have become a significant concern. Existing solutions, such as LADS with its object-level scheduling approach, fall short in large-scale HPC environments because they cannot effectively address metadata I/O bottlenecks and the growing number of I/O processes. This study highlights the pressing need for a comprehensive solution that tackles both OST contention and metadata I/O challenges in diverse HPC workloads. To address these challenges, we propose SwiftLoad, an object-level I/O scheduling framework that leverages a metadata catalog to enhance the performance and efficiency of parallel HPC utilities. The adoption of the metadata catalog mitigates the metadata I/O bottlenecks that commonly occur in HPC utilities, a challenge that is particularly pronounced in object-level I/O scheduling. SwiftLoad addresses OST contention and the uneven distribution of I/O processes across different OSTs through mathematical modeling and incorporates a Loader Configuration Module to regulate the number of I/O processes. Evaluated with two representative utilities, data deduplication profiling and data augmentation, SwiftLoad achieved performance improvements of up to $5.63\times$ and $11.0\times$, respectively, on a production supercomputer.
ISSN: 2169-3536
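
Illustration: The summary describes two ideas that a short sketch may make more concrete: consulting a pre-built metadata catalog instead of querying the file system for each object, and spreading a capped number of I/O processes across OSTs. The Python below is only a minimal sketch of that general idea and is not SwiftLoad's implementation. The `CatalogEntry` fields, the function names, and the greedy byte-balancing rule are all assumptions made for illustration; the article's own mathematical model is not given in this record.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    """One row of a hypothetical pre-built metadata catalog."""
    path: str  # file path recorded when the catalog was built
    ost: int   # index of the OST that stores the object (assumed field)
    size: int  # object size in bytes


def build_ost_queues(catalog):
    """Group catalog entries by OST using only catalog data (no metadata I/O)."""
    queues = defaultdict(deque)
    for entry in catalog:
        queues[entry.ost].append(entry)
    return queues


def schedule(catalog, max_procs):
    """Assign whole OST queues to at most max_procs loader processes.

    Greedy rule (an assumption, not the paper's model): visit OSTs from the
    largest total byte count to the smallest and hand each one to the loader
    with the fewest bytes so far. Each OST is read by exactly one loader, so
    no two loaders contend for the same OST.
    """
    queues = build_ost_queues(catalog)
    n_loaders = max(1, min(max_procs, len(queues)))
    loads = [0] * n_loaders
    plan = [[] for _ in range(n_loaders)]

    def total_bytes(queue):
        return sum(entry.size for entry in queue)

    # Sort by descending bytes, then by OST index to keep the result deterministic.
    ordered = sorted(queues.items(), key=lambda kv: (-total_bytes(kv[1]), kv[0]))
    for _ost, queue in ordered:
        target = loads.index(min(loads))  # least-loaded loader so far
        plan[target].extend(queue)
        loads[target] += total_bytes(queue)
    return plan, loads
```

Capping `max_procs` stands in, very loosely, for the role the summary assigns to the Loader Configuration Module; how that module actually chooses the number of I/O processes is not stated here.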