Open Power System Datasets and Open Simulation Engines: A Survey Toward Machine Learning Applications

Bibliographic Details
Main Authors: Ignacio Aravena, Chih-Che Sun, Ranyu Shi, Subir Majumder, Weihang Yan, Jhi-Young Joo, Le Xie, Jiyu Wang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Access Journal of Power and Energy
Online Access: https://ieeexplore.ieee.org/document/11015807/
Description
Summary: A major factor behind the success of machine learning (ML) models in multiple domains is the availability and accessibility of large, labeled, and well-organized datasets for training and benchmarking. In comparison, power grid datasets face three major challenges: (i) real-world data is often restricted by regulatory constraints, privacy reasons, or security concerns, making it difficult to obtain and work with; (ii) synthetic datasets, which are created to address these limitations, often have incomplete information and are released using specialized tools, making them inaccessible to the broader community; and (iii) input-output datasets are difficult to generate through simulation for non-experts because open-source simulators are not known outside the power system community. This survey addresses these challenges by serving as an entry point to publicly available datasets and simulators for researchers venturing into this area. We review the current landscape of open-source power network data, machine models, consumer demand profiles, renewable generation data, and inverter models. We also examine open-source power system simulators, which are crucial for generating high-quality, high-fidelity power grid datasets. We aim to provide a foundation for overcoming data scarcity and to advance toward a structured web of datasets and simulators that supports the development of ML for power systems.
ISSN: 2687-7910