A deep Q-networks model for optimising the decision-making process in the context of energy transition modelling

Bibliographic Details
Main Authors: Ana Tănăsescu, Cristian Bucur, Jean Vasile Andrei, Bogdan-George Tudorică, Dorel Mihai Paraschiv, Dorel Dușmănescu
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Environmental Research Communications
Online Access: https://doi.org/10.1088/2515-7620/adf530
Description
Summary: Ambitious decarbonization mandates and rising volatility in energy markets demand decision-support tools that can learn, adapt, and generalize beyond the static optimization frameworks that dominate contemporary energy transition modelling. Addressing this gap, we articulate and validate a Deep Q-Network agent that learns investment policies in a custom OpenAI Gym environment tailored to the dynamics of national energy mixes. The environment is parameterized with an open Eurostat panel (2012–2021, 24 indicators, 430 country–year records), cleaned to 173 observations and compressed to a 16-dimensional state vector via PCA after k-NN imputation, outlier filtering, and min-max normalization. The agent selects among marginal allocations to renewables, fossil fuels, or nuclear capacity. Rewards balance renewable-share growth against carbon-intensity reduction, and episodes span 25 simulated years. After 1,200 training episodes (≈5 min on a Google TPU), the Deep Q-Network converges within 200 episodes and outperforms a myopic greedy baseline by 4.98 utility units (average reward = –19.34 ± 11.06 versus –24.32 ± 10.09), while delivering a 27% improvement in CO₂-intensity metrics (0.421 versus 0.331 t CO₂/MWh). From this paper, researchers gain an extensible benchmark that integrates publicly reproducible data, an openly specified environment, and a modular MLOps pipeline. Practitioners (policy-makers, regulators, utilities, and clean-energy investors) obtain an interpretable agent whose learned policies expose the long-run trade-offs between renewable deployment speed and carbon-abatement efficacy. By demonstrating that deep reinforcement learning can produce stable, near-optimal strategies under structural uncertainty and high-dimensional state spaces, the study furnishes both methodological and practical impetus for embedding adaptive, data-driven intelligence in future energy-transition analyses.
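For readers who want to experiment before obtaining the paper, the summary pins down the setup in enough detail to sketch it. The Python snippet below is an illustrative reconstruction, not the authors' released code: the preprocessing chain (k-NN imputation, min-max normalization, PCA to 16 components) is expressed with scikit-learn, and the environment follows the Gymnasium API (the maintained successor to OpenAI Gym) with a 16-dimensional state, three marginal-allocation actions, a 25-year horizon, and a reward trading renewable-share growth against carbon-intensity reduction. All class names, transition dynamics, reward weights, and starting values are assumptions.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

# Preprocessing chain as described in the summary: k-NN imputation,
# min-max normalization, PCA to a 16-dimensional state. Outlier
# filtering would be applied to the raw Eurostat panel beforehand;
# n_neighbors=5 is an assumed value.
preprocess = make_pipeline(
    KNNImputer(n_neighbors=5),
    MinMaxScaler(),
    PCA(n_components=16),
)
# Usage (X = cleaned country-year panel, 24 indicator columns):
#   states = preprocess.fit_transform(X)

class EnergyMixEnv(gym.Env):
    """Toy energy-mix environment: 16-D state, three discrete
    marginal-allocation actions, 25 simulated years per episode."""

    def __init__(self, horizon=25, alpha=1.0, beta=1.0):
        super().__init__()
        self.observation_space = spaces.Box(0.0, 1.0, shape=(16,), dtype=np.float32)
        # 0: add renewables, 1: add fossil fuels, 2: add nuclear
        self.action_space = spaces.Discrete(3)
        self.horizon = horizon
        self.alpha, self.beta = alpha, beta  # assumed reward weights

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.renewable_share = 0.2    # illustrative starting mix
        self.carbon_intensity = 0.45  # t CO2/MWh, illustrative
        return self._obs(), {}

    def step(self, action):
        prev_share, prev_ci = self.renewable_share, self.carbon_intensity
        # Crude illustrative dynamics for each marginal allocation.
        if action == 0:    # renewables: share up, intensity down
            self.renewable_share = min(1.0, self.renewable_share + 0.02)
            self.carbon_intensity = max(0.0, self.carbon_intensity - 0.01)
        elif action == 1:  # fossil fuels: share down, intensity up
            self.renewable_share = max(0.0, self.renewable_share - 0.01)
            self.carbon_intensity += 0.01
        else:              # nuclear: intensity down, share unchanged
            self.carbon_intensity = max(0.0, self.carbon_intensity - 0.005)
        # Reward balances renewable-share growth against
        # carbon-intensity reduction, as in the summary.
        reward = (self.alpha * (self.renewable_share - prev_share)
                  + self.beta * (prev_ci - self.carbon_intensity))
        self.t += 1
        terminated = self.t >= self.horizon
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        # Stand-in for the 16-D PCA-compressed Eurostat state vector.
        obs = np.zeros(16, dtype=np.float32)
        obs[0], obs[1] = self.renewable_share, self.carbon_intensity
        return obs

# Quick smoke test with a random policy.
env = EnergyMixEnv()
obs, _ = env.reset(seed=0)
total, terminated = 0.0, False
while not terminated:
    obs, r, terminated, _, _ = env.step(env.action_space.sample())
    total += r
print(f"episode return: {total:.3f}")
```

A DQN trained on this interface (e.g., via a standard replay-buffer implementation) would stand in for the agent the paper benchmarks against the myopic greedy baseline; the dynamics above are deliberately simplistic placeholders.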
ISSN: 2515-7620