Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems

Modern multi-tiered data storage systems optimize file access by managing data across a hybrid composition of caches and storage tiers while using policies whose decisions can severely impact the storage system’s performance. Recently, different Machine-Learning (ML) algorithms have been used to mod...

Bibliographic Details
Main Authors: Edson Ramiro Lucas Filho, George Savva, Lun Yang, Kebo Fu, Jianqiang Shen, Herodotos Herodotou
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Future Internet
Subjects: multi-tiered data storage systems; streaming machine learning; workload patterns
Online Access: https://www.mdpi.com/1999-5903/17/4/170
author Edson Ramiro Lucas Filho
George Savva
Lun Yang
Kebo Fu
Jianqiang Shen
Herodotos Herodotou
collection DOAJ
description Modern multi-tiered data storage systems optimize file access by managing data across a hybrid composition of caches and storage tiers while using policies whose decisions can severely impact the storage system’s performance. Recently, different Machine-Learning (ML) algorithms have been used to model access patterns from complex workloads. Yet, current approaches train their models offline in a batch-based approach, even though storage systems are processing a stream of file requests with dynamic workloads. In this manuscript, we advocate the streaming ML paradigm for modeling access patterns in multi-tiered storage systems as it introduces various advantages, including high efficiency, high accuracy, and high adaptability. Moreover, representative file access patterns, including temporal, spatial, length, and frequency patterns, are identified for individual files, directories, and file formats, and used as features. Streaming ML models are developed, trained, and tested on different file system traces for making two types of predictions: the next offset to be read in a file and the future file hotness. An extensive evaluation is performed with production traces provided by Huawei Technologies, showing that the models are practical, with low memory consumption (<1.3 MB) and low training delay (<1.8 ms per training instance), and can make accurate predictions online (0.98 F1 score and 0.07 MAE on average).
format Article
id doaj-art-837fbb03dc9047cc9c4bdf7ba2c64528
institution DOAJ
issn 1999-5903
spelling doaj-art-837fbb03dc9047cc9c4bdf7ba2c64528
2025-08-20T03:13:58Z
eng
MDPI AG
Future Internet, ISSN 1999-5903
2025-04-01, vol. 17, no. 4, article 170
DOI: 10.3390/fi17040170
Edson Ramiro Lucas Filho (Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus)
George Savva (Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus)
Lun Yang (Huawei Technologies Co., Ltd., Shenzhen 518100, China)
Kebo Fu (Huawei Technologies Co., Ltd., Shenzhen 518100, China)
Jianqiang Shen (Huawei Technologies Co., Ltd., Shenzhen 518100, China)
Herodotos Herodotou (Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus)
topic multi-tiered data storage systems
streaming machine learning
workload patterns