Explainable Data-driven Deep Reinforcement Learning Methods for Optimal Energy Management in Buildings
2026-06-01 • Artificial Intelligence
AI summary
The authors explore how to better control energy use in homes that have solar panels and batteries using a type of AI called deep reinforcement learning (DRL). Because DRL decisions are usually hard to understand, they created a method to explain how these AI systems make decisions. They tested their approach on both simulated and real data and found that on-policy DRL algorithms, such as A2C and PPO, manage energy better than off-policy ones. Their explainable method reduces electricity costs and also makes clear why the AI chooses certain actions.
Deep Reinforcement Learning, Explainable AI, Photovoltaic Panels, Energy Storage Systems, On-policy Algorithms, Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Battery Management, Dynamic Electricity Pricing
Authors
Hallah Shahid Butt, Qiong Huang, Gökhan Demirel, Kevin Förderer, Erfan Tajalli-Ardekani, Simon Waczowicz, Luigi Spatafora, Veit Hagenmeyer, Benjamin Schäfer
Abstract
The increasing integration of renewable energy sources into power systems, particularly in buildings equipped with photovoltaic (PV) panels and energy storage systems, introduces significant complexity into energy systems. Volatile power generation, varying electricity tariffs, and a growing number of components, e.g., PV systems and heat pumps, make these systems harder to operate. This creates demand for additional control and optimization approaches, including data-driven control such as reinforcement learning. Deep reinforcement learning (DRL) has emerged as a promising solution for optimizing building operations in dynamic and increasingly complex environments, but its black-box nature impedes user trust and practical adoption. This paper presents a framework for explainable deep reinforcement learning (XRL) applied to energy management in residential buildings. We demonstrate it on both synthetic data and real-world data from the Living Lab Energy Campus (LLEC) at KIT. We train and compare on-policy and off-policy DRL agents on an expanded state space. This state space incorporates real-time measurements (demand, PV generation, battery power, state of charge), external signals (dynamic electricity price, local weather data), calendrical and holiday indicators, and forecasts of demand and price. Our experimental results indicate that on-policy algorithms, particularly Advantage Actor Critic (A2C) and Proximal Policy Optimization (PPO), outperform off-policy methods in terms of cumulative reward and policy stability. To explain these models, we employ post-hoc interpretation techniques to elucidate the learned control policies. Our findings demonstrate that the XRL framework not only reduces electricity costs through optimal battery management but also provides transparent, actionable insights into the agent's decision-making process.
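The abstract describes the pipeline only at a high level. The following is a minimal, stdlib-only sketch of its ingredients, built on assumptions of its own rather than the authors' setup:

- **State:** a feature vector grouped as in the abstract.
- **Battery:** a lossless model stepped hourly.
- **Reward:** the negative of the grid cost.
- **Agent:** a hand-written rule stands in for a trained A2C/PPO actor.
- **Explanation:** permutation importance stands in for whatever post-hoc interpretation technique the paper actually uses. The abstract does not name it.

All names (`FEATURES`, `Battery`, `step_cost`, `toy_policy`, `episode_cost`, `permutation_importance`, `sample_states`) and all numbers are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical feature names mirroring the state groups listed in the abstract.
FEATURES = [
    "demand_kw", "pv_kw", "battery_kw", "soc",        # real-time measurements
    "price_eur_kwh", "temperature_c",                # external signals
    "hour", "weekday", "is_holiday",                 # calendrical and holiday indicators
    "demand_forecast_kw", "price_forecast_eur_kwh",  # forecasts
]


@dataclass
class Battery:
    """Lossless battery model (an assumption); power > 0 means charging."""
    capacity_kwh: float = 10.0
    max_power_kw: float = 5.0
    soc: float = 0.5  # state of charge as a fraction of capacity

    def apply(self, setpoint_kw: float, dt_h: float = 1.0) -> float:
        """Clip the setpoint to power and SoC limits, update SoC, return realised power."""
        p = max(-self.max_power_kw, min(self.max_power_kw, setpoint_kw))
        headroom_kw = (1.0 - self.soc) * self.capacity_kwh / dt_h  # max charge this step
        available_kw = self.soc * self.capacity_kwh / dt_h         # max discharge this step
        p = max(-available_kw, min(headroom_kw, p))
        self.soc += p * dt_h / self.capacity_kwh
        return p


def step_cost(demand_kw, pv_kw, battery_kw, price, feed_in_price=0.0, dt_h=1.0):
    """Grid cost of one step; the per-step RL reward is assumed to be its negative."""
    grid_kw = demand_kw - pv_kw + battery_kw  # > 0 imports from the grid, < 0 exports
    rate = price if grid_kw >= 0 else feed_in_price
    return grid_kw * dt_h * rate


def toy_policy(state):
    """Hand-written stand-in for a trained A2C/PPO actor: returns a setpoint in kW."""
    surplus_kw = state["pv_kw"] - state["demand_kw"]
    if surplus_kw > 0:
        return surplus_kw  # store the PV surplus
    if state["price_eur_kwh"] > state["price_forecast_eur_kwh"]:
        return surplus_kw  # price above forecast: discharge to cover the deficit
    return 0.0  # otherwise keep the battery idle


def episode_cost(policy, states, battery):
    """Roll a policy over a sequence of states and sum the grid cost."""
    total = 0.0
    for s in states:
        obs = dict(s, soc=battery.soc)  # expose the current SoC to the policy
        realised_kw = battery.apply(policy(obs))
        total += step_cost(s["demand_kw"], s["pv_kw"], realised_kw, s["price_eur_kwh"])
    return total


def permutation_importance(policy, states, seed=0):
    """Mean absolute change in the action when one feature column is shuffled."""
    rng = random.Random(seed)
    base = [policy(s) for s in states]
    scores = {}
    for f in FEATURES:
        column = [s[f] for s in states]
        rng.shuffle(column)
        perturbed = [dict(s, **{f: v}) for s, v in zip(states, column)]
        scores[f] = sum(abs(policy(p) - b) for p, b in zip(perturbed, base)) / len(states)
    return scores


def sample_states(n, seed=0):
    """Synthetic hourly states with a midday PV bump (illustrative numbers only)."""
    rng = random.Random(seed)
    states = []
    for t in range(n):
        hour = t % 24
        pv = max(0.0, 4.0 * (1.0 - abs(hour - 12) / 6.0)) * rng.uniform(0.5, 1.0)
        demand = rng.uniform(0.3, 2.5)
        states.append({
            "demand_kw": demand, "pv_kw": pv, "battery_kw": 0.0, "soc": 0.5,
            "price_eur_kwh": rng.uniform(0.15, 0.45), "temperature_c": rng.uniform(-5.0, 30.0),
            "hour": hour, "weekday": (t // 24) % 7, "is_holiday": 0,
            "demand_forecast_kw": demand * rng.uniform(0.9, 1.1),
            "price_forecast_eur_kwh": rng.uniform(0.15, 0.45),
        })
    return states
```

Calling `permutation_importance(toy_policy, sample_states(200))` assigns exactly zero importance to every feature the rule never reads, such as `temperature_c` or `hour`. This makes the hand-written rule a convenient sanity check before applying the same procedure to a trained network, whose dependencies are not known in advance.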