Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems
Modern multi-tiered data storage systems optimize file access by managing data across a hybrid composition of caches and storage tiers while using policies whose decisions can severely impact the storage system’s performance. Recently, different Machine-Learning (ML) algorithms have been used to mod...
| Main Authors: | Edson Ramiro Lucas Filho, George Savva, Lun Yang, Kebo Fu, Jianqiang Shen, Herodotos Herodotou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Future Internet |
| Subjects: | multi-tiered data storage systems; streaming machine learning; workload patterns |
| Online Access: | https://www.mdpi.com/1999-5903/17/4/170 |
| _version_ | 1849713368576819200 |
|---|---|
| author | Edson Ramiro Lucas Filho; George Savva; Lun Yang; Kebo Fu; Jianqiang Shen; Herodotos Herodotou |
| author_facet | Edson Ramiro Lucas Filho; George Savva; Lun Yang; Kebo Fu; Jianqiang Shen; Herodotos Herodotou |
| author_sort | Edson Ramiro Lucas Filho |
| collection | DOAJ |
| description | Modern multi-tiered data storage systems optimize file access by managing data across a hybrid composition of caches and storage tiers while using policies whose decisions can severely impact the storage system’s performance. Recently, different Machine-Learning (ML) algorithms have been used to model access patterns from complex workloads. Yet, current approaches train their models offline in a batch-based approach, even though storage systems are processing a stream of file requests with dynamic workloads. In this manuscript, we advocate the streaming ML paradigm for modeling access patterns in multi-tiered storage systems as it introduces various advantages, including high efficiency, high accuracy, and high adaptability. Moreover, representative file access patterns, including temporal, spatial, length, and frequency patterns, are identified for individual files, directories, and file formats, and used as features. Streaming ML models are developed, trained, and tested on different file system traces for making two types of predictions: the next offset to be read in a file and the future file hotness. An extensive evaluation is performed with production traces provided by Huawei Technologies, showing that the models are practical, with low memory consumption (<1.3 MB) and low training delay (<1.8 ms per training instance), and can make accurate predictions online (0.98 F1 score and 0.07 MAE on average). |
| format | Article |
| id | doaj-art-837fbb03dc9047cc9c4bdf7ba2c64528 |
| institution | DOAJ |
| issn | 1999-5903 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Future Internet |
| spelling | doaj-art-837fbb03dc9047cc9c4bdf7ba2c64528; 2025-08-20T03:13:58Z; eng; MDPI AG; Future Internet; 1999-5903; 2025-04-01; 17(4), 170; 10.3390/fi17040170; Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems; Edson Ramiro Lucas Filho (Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus); George Savva (Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus); Lun Yang (Huawei Technologies Co., Ltd., Shenzhen 518100, China); Kebo Fu (Huawei Technologies Co., Ltd., Shenzhen 518100, China); Jianqiang Shen (Huawei Technologies Co., Ltd., Shenzhen 518100, China); Herodotos Herodotou (Department of Electrical Engineering and Computer Engineering and Informatics, Cyprus University of Technology, Limassol 3036, Cyprus); https://www.mdpi.com/1999-5903/17/4/170; multi-tiered data storage systems; streaming machine learning; workload patterns |
| spellingShingle | Edson Ramiro Lucas Filho; George Savva; Lun Yang; Kebo Fu; Jianqiang Shen; Herodotos Herodotou; Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems; Future Internet; multi-tiered data storage systems; streaming machine learning; workload patterns |
| title | Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems |
| title_full | Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems |
| title_fullStr | Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems |
| title_full_unstemmed | Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems |
| title_short | Employing Streaming Machine Learning for Modeling Workload Patterns in Multi-Tiered Data Storage Systems |
| title_sort | employing streaming machine learning for modeling workload patterns in multi tiered data storage systems |
| topic | multi-tiered data storage systems; streaming machine learning; workload patterns |
| url | https://www.mdpi.com/1999-5903/17/4/170 |
| work_keys_str_mv | AT edsonramirolucasfilho employingstreamingmachinelearningformodelingworkloadpatternsinmultitiereddatastoragesystems AT georgesavva employingstreamingmachinelearningformodelingworkloadpatternsinmultitiereddatastoragesystems AT lunyang employingstreamingmachinelearningformodelingworkloadpatternsinmultitiereddatastoragesystems AT kebofu employingstreamingmachinelearningformodelingworkloadpatternsinmultitiereddatastoragesystems AT jianqiangshen employingstreamingmachinelearningformodelingworkloadpatternsinmultitiereddatastoragesystems AT herodotosherodotou employingstreamingmachinelearningformodelingworkloadpatternsinmultitiereddatastoragesystems |
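
The description field above contrasts offline batch training with streaming ML, where a model is updated one request at a time and scored online with metrics such as MAE. The following is a minimal sketch of that test-then-train loop, not the authors' implementation: it assumes a plain linear model updated by stochastic gradient descent, and the names `OnlineLinearRegressor`, `prequential_mae`, and `synthetic_stream`, the two features, and the synthetic "hotness" target are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineLinearRegressor:
    """Linear model updated one instance at a time with SGD.

    Hypothetical stand-in for a streaming ML model; not taken from the article.
    """
    n_features: int
    lr: float = 0.05
    bias: float = 0.0
    weights: list = field(default_factory=list)

    def __post_init__(self):
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def predict_one(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        # Gradient step on the squared error of a single instance.
        error = self.predict_one(x) - y
        self.weights = [w - self.lr * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * error


def prequential_mae(model, stream):
    """Test-then-train: predict each instance first, then learn from it."""
    total_abs_error = 0.0
    count = 0
    for x, y in stream:
        total_abs_error += abs(model.predict_one(x) - y)
        model.learn_one(x, y)
        count += 1
    return total_abs_error / count if count else 0.0


def synthetic_stream(n):
    """Assumed toy trace: hotness depends linearly on two made-up features."""
    for i in range(n):
        recency = (i % 10) / 10.0
        frequency = ((i * 7) % 10) / 10.0
        yield [recency, frequency], 0.6 * recency + 0.3 * frequency


if __name__ == "__main__":
    model = OnlineLinearRegressor(n_features=2)
    print("MAE, first 100 instances:", prequential_mae(model, synthetic_stream(100)))
    prequential_mae(model, synthetic_stream(5000))  # keep learning from the stream
    print("MAE, later 100 instances:", prequential_mae(model, synthetic_stream(100)))
```

Because every instance is scored before it is used for training, the running error reflects how the model would have performed online, and no separate held-out batch is needed.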