Conference Paper

A Centralised Soft Actor Critic Deep Reinforcement Learning Approach to District Demand Side Management through CityLearn

... This is why some studies have implemented their solutions in open environments that allow researchers to share, replicate, and compare their models. One example is the work of Kathirgamanathan et al. [58], where a centralized Soft Actor-Critic (SAC) algorithm was implemented to flatten and smooth the aggregated electrical demand curve of a district using the CityLearn environment [119]. The proposed multi-objective cost function consisted of the peak electricity demand, the average daily electricity peak demand, ramping, the load factor, and the net electricity consumption of the district over the evaluation period. ...
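As a rough illustration of how such district-level cost terms can be computed from a simulated hourly net-load series, consider the sketch below; the function name and normalisations are illustrative assumptions, not the exact formulation of [58] or of CityLearn.

```python
import numpy as np

def district_cost_terms(net_load: np.ndarray, steps_per_day: int = 24) -> dict:
    """Illustrative district-level cost terms for an hourly net electricity
    demand series; assumes len(net_load) is a multiple of steps_per_day."""
    daily = net_load.reshape(-1, steps_per_day)        # (days, hours)
    return {
        "peak_demand": float(net_load.max()),                    # peak over the period
        "avg_daily_peak": float(daily.max(axis=1).mean()),       # mean of daily peaks
        "ramping": float(np.abs(np.diff(net_load)).sum()),       # total ramping
        "load_factor": float(net_load.mean() / net_load.max()),  # closer to 1 = flatter
        "net_consumption": float(net_load.clip(min=0).sum()),    # energy drawn from grid
    }
```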
... In this case study, we used the CityLearn environment and replicated the methodology presented by Kathirgamanathan et al. [58]. However, instead of using a centralized SAC agent, we implemented a centralized DDPG agent. ...
... The hyperparameters used for training the DDPG algorithm can be seen in Table 6. The reward function was designed based on the peak-consumption penalization presented in [58], with an incentive for charging during the night. This reward function is defined by Eqs. 7 and 8. ...
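Equations 7 and 8 are not reproduced in this excerpt, so the snippet below is only one plausible shape for such a reward: a quadratic penalty on positive net consumption plus a small night-time charging incentive. The coefficients and the night window are hypothetical.

```python
def reward(net_consumption_kw: float, hour: int,
           alpha: float = 1.0, beta: float = 0.1) -> float:
    """Hypothetical district reward: quadratic penalty on positive net
    consumption (discourages peaks) plus a small incentive to draw power,
    i.e. charge storage, during an assumed night window."""
    penalty = -alpha * max(net_consumption_kw, 0.0) ** 2
    night_bonus = beta * max(net_consumption_kw, 0.0) if 0 <= hour < 6 else 0.0
    return penalty + night_bonus
```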
Article
Full-text available
This paper presents a review of up-to-date Machine Learning (ML) techniques applied to photovoltaic (PV) systems, with a special focus on deep learning. It examines the use of ML applied to control, islanding detection, management, fault detection and diagnosis, forecasting of irradiance and power generation, sizing, and site adaptation in PV systems. The contribution of this work is threefold: first, we review more than 100 research articles, most of them from the last five years, that applied state-of-the-art ML techniques in PV systems; second, we review resources where researchers can find open datasets, source code, and simulation environments that can be used to test ML algorithms; third, we provide a case study for each of the topics, with open-source code and data, to help researchers interested in these topics get started with implementations of up-to-date ML techniques applied to PV systems. We also provide some directions, insights, and possibilities for future development.
... A bi-level deep reinforcement learning method for appliance scheduling, which also included charge and discharge schedules for EVs and energy storage, was recently presented (Kathirgamanathan et al., 2020). The Dijkstra algorithm was used in (Dey et al., 2022) to solve a load scheduling problem. ...
Article
Low reliability of supply has resulted in an increasingly serious energy crisis and environmental problems, requiring extensive research on new clean renewable energy sources and on energy management technologies with high effectiveness, low cost, and environmental friendliness. Demand-side management systems are effective tools for managing renewable energy. Unfortunately, the intermittent nature of renewable energy is its principal drawback. This necessitates the development of intelligent energy management systems to increase system reliability and improve efficiency. Demand-side energy management systems are an excellent choice for several reasons: they enable consumers to actively monitor and control their energy usage, leading to significant cost savings through reduced consumption during peak hours and improved overall efficiency. Recent advancements in demand-side energy management represent a significant shift towards more intelligent, flexible, and sustainable energy management practices, empowering consumers and utilities alike to optimize energy usage and contribute to a more resilient and efficient energy system. This paper covers demand-side management challenges and demand response categories. It also introduces and analyzes the fundamental control strategies of demand response. Finally, it describes present difficulties and potential future developments in the creation of novel high-efficiency and multilevel control mechanisms.
... Glatt et al. [20] developed an energy-management system for controlling the energy storage of individual buildings in a microgrid using a decentralized actor-critic reinforcement learning algorithm with a centralized critic. Kathirgamanathan et al. [21] designed and tuned an RL algorithm to flatten and smooth the aggregated electricity demand curve of a building district. Both studies used CityLearn as the simulation environment. ...
Article
Full-text available
For the energy transition in the residential sector, heat pumps are a core technology for decarbonizing thermal energy production for space heating and domestic hot water. Electricity generation from on-site photovoltaic (PV) systems can also contribute to a carbon-neutral building stock. However, both will increase the stress on the electricity grid. This can be reduced by using appropriate control strategies to match electricity consumption and production. In recent years, artificial intelligence-based approaches such as reinforcement learning (RL) have become increasingly popular for energy-system management. However, the literature shows a lack of investigation of RL-based controllers for multi-family building energy systems that include an air-source heat pump, thermal storage, and a PV system, although this is a common system configuration. Therefore, in this study, a model of such an energy system and RL-based controllers were developed, simulated with physical models, and compared with conventional rule-based approaches. Four RL algorithms were investigated for two objectives, and the soft actor-critic algorithm was selected for the annual simulations. The first objective, to maintain only the required temperatures in the thermal storage, could be achieved by the developed RL agent. However, the second objective, to additionally improve PV self-consumption, was better achieved by the rule-based controller. Therefore, further research on the reward function, hyperparameters, and advanced methods, including long short-term memory layers, as well as training over periods longer than six days, is suggested.
... • AMPC [19]: An adaptive Model Predictive Control method. • SAC [21]: A Soft Actor-Critic method that operates all agents in a decentralized manner. • ES [22]: An Evolution Strategy method with an adaptive covariance matrix. ...
Preprint
Full-text available
Continuous greenhouse gas emissions are causing global warming and impacting the habitats of many animals. Researchers in the field of electric power are making efforts to mitigate this situation. Operating and maintaining the power grid in an economic, low-carbon, and stable manner is challenging. To address this issue, we propose a grid dispatching technique that combines prediction technology, reinforcement learning, and optimization technology. Prediction technology can forecast future power demand and solar power generation, while reinforcement learning and optimization technology can make charging and discharging decisions for energy storage devices based on current and future grid conditions. In the power system, the aggregation of distributed energy resources increases uncertainty, particularly due to the fluctuating generation of renewable energy. This requires the use of advanced predictive control techniques to ensure long-term economic and decarbonization goals. In this paper, we present a real-time dispatching framework that integrates deep learning-based prediction, reinforcement learning-based decision-making, and stochastic optimization techniques. The framework can rapidly adapt to target uncertainty caused by various factors in real-time data distribution and control processes. The proposed framework won first place in the NeurIPS Challenge 2022 competition and demonstrated its effectiveness in practical scenarios of intelligent building energy management.
... • AMPC [26]: An adaptive Model Predictive Control method. • SAC [28]: A Soft Actor-Critic method that operates all agents in a decentralized manner. • ES [29]: An Evolution Strategy method with an adaptive covariance matrix. ...
Article
Full-text available
The emission of greenhouse gases is a major contributor to global warming. Carbon emissions from the electricity industry account for over 40% of the total. Researchers in the field of electric power are making efforts to mitigate this situation. Operating and maintaining the power grid in an economic, low-carbon, and stable manner is challenging. To address this issue, we propose a grid dispatching technique that combines deep learning-based forecasting, reinforcement learning, and optimization technology. Deep learning-based forecasting can predict future power demand and solar power generation, while reinforcement learning and optimization technology can make charging and discharging decisions for energy storage devices based on current and future grid conditions. In the optimization method, we simplify the complex electricity environment to speed up the solution. The combination of the proposed deep learning-based forecasting and stochastic optimization with online data augmentation addresses the uncertainty of the dispatch system. A multi-agent reinforcement learning method is proposed to exploit a team reward among energy storage devices. Finally, we achieved the best results by combining reinforcement learning and optimization strategies. Comprehensive experiments demonstrate the effectiveness of our proposed framework.
... In particular, implicit MBRL, where the entire procedure (e.g., model learning and planning) is optimized for optimal policy computation (Moerland, Broekens, and Jonker 2020). However, unlike existing works that build models based on (recurrent/convolutional) neural networks (NNs) and are primarily restricted to discrete state and action spaces (e.g., Tamar et al. 2016; Karkus, Hsu, and Lee 2017; Racanière et al. 2017; Guez et al. 2018; Schrittwieser et al. 2020), our method learns how to plan by solving optimization and adapting its parameters; hence, it is amenable to a wide range of applications with continuous states and actions. The present work is closely related to (Ghadimi, Perkins, and Powell 2020; Agrawal et al. 2020), which also use convex optimization as a policy class to handle uncertainty. ...
Article
Modern power systems will have to face difficult challenges in the years to come: frequent blackouts in urban areas caused by high peaks of electricity demand, grid instability exacerbated by the intermittency of renewable generation, and climate change on a global scale amplified by increasing carbon emissions. While current practices are growingly inadequate, the pathway of artificial intelligence (AI)-based methods to widespread adoption is hindered by missing aspects of trustworthiness. The CityLearn Challenge is an exemplary opportunity for researchers from multi-disciplinary fields to investigate the potential of AI to tackle these pressing issues within the energy domain, collectively modeled as a reinforcement learning (RL) task. Multiple real-world challenges faced by contemporary RL techniques are embodied in the problem formulation. In this paper, we present a novel method using the solution function of optimization as policies to compute the actions for sequential decision-making, while notably adapting the parameters of the optimization model from online observations. Algorithmically, this is achieved by an evolutionary algorithm under a novel trajectory-based guidance scheme. Formally, the global convergence property is established. Our agent ranked first in the latest 2021 CityLearn Challenge, being able to achieve superior performance in almost all metrics while maintaining some key aspects of interpretability.
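To make the idea of "solution functions of optimization as policies" concrete, here is a toy sketch (not the authors' actual formulation): each control step solves a small convex storage-dispatch program with cvxpy and applies only the first planned action. The horizon, bounds, and the tunable peak weight `w` that an outer evolutionary loop could adapt online are all assumptions.

```python
import cvxpy as cp
import numpy as np

H = 12                                    # planning horizon in hours (assumed)
demand = cp.Parameter(H, nonneg=True)     # forecast district demand
w = cp.Parameter(nonneg=True)             # peak-penalty weight (tunable online)
charge = cp.Variable(H)                   # storage charge (+) / discharge (-)
net = demand + charge                     # resulting net grid load
soc = cp.cumsum(charge)                   # state of charge relative to start
problem = cp.Problem(
    cp.Minimize(w * cp.max(net) + 0.01 * cp.sum_squares(charge)),
    [charge >= -1.0, charge <= 1.0, soc >= 0.0, soc <= 5.0],
)

def policy(demand_forecast: np.ndarray, weight: float) -> float:
    """Return the first action of the re-solved dispatch plan."""
    demand.value = demand_forecast
    w.value = weight
    problem.solve()
    return float(charge.value[0])
```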
... The virtual community is created in CityLearn, an OpenAI Gym environment developed for benchmarking RBC, MPC, and RLC algorithms in DR studies [33]. CityLearn has been used extensively as a reference environment to demonstrate incentive-based DR [44], collaborative DR [45], coordinated energy management [46], benchmarking of DR algorithms [47], and voltage regulation [48]. ...
Article
Building and power generation decarbonization present new challenges in electric grid reliability as a result of renewable energy source intermittency and increase in grid load caused by end-use electrification. To restore reliability, grid-interactive efficient buildings can provide grid flexibility services through demand response. Reinforcement learning is well-suited for energy management in grid-interactive efficient buildings as it is able to adapt to unique building characteristics compared to rule-based control and model predictive control. Yet, factors hindering the adoption of reinforcement learning in real-world applications include its sample inefficiency during training, control security and generalizability. Here we address these challenges by proposing the MERLIN framework for the training, evaluation, deployment and transfer of control policies for distributed energy resources in grid-interactive communities for different levels of data availability. We utilize a real-world community smart meter dataset to show that while independently trained battery control policies can learn unique occupant behavior and provide up to 60% performance improvement at the district level, transfer learning provides comparable building and district level performance while reducing training costs.
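A minimal sketch of the policy-transfer idea (not MERLIN's actual code): train a battery-control policy on one building, then reuse its weights to initialise another building's agent rather than training from scratch. The network shape and names are hypothetical.

```python
import copy
import torch.nn as nn

# Hypothetical battery-control policy for building A.
policy_a = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                         nn.Linear(64, 1), nn.Tanh())
# ... train policy_a on building A's observations and rewards ...
policy_b = copy.deepcopy(policy_a)   # transfer: identical weights, new agent
# policy_b can be deployed as-is on building B, or briefly fine-tuned,
# trading a small performance gap for a large reduction in training cost.
```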
... Another study [2] used a single-agent centralized RL controller together with an incentive-based DR program to control the thermal storage of multiple buildings and optimize energy use. [10] also applied a single-agent centralized RL controller based on SAC to the CityLearn environment to optimize peak demands; however, it relies on manual reward shaping, which limits the generalization ability of the RL controller. ...
Chapter
Full-text available
The proliferation of distributed energy sources (DESs) in buildings offers a unique opportunity for demand response (DR) programs to reduce peak energy demands effectively. The DR program improves building energy demands by shifting peak demands to the period of peak renewable energy generation or by exploiting energy storage systems (ESSs) during peak demand. However, managing supply and demand with the penetration of DESs becomes a significant challenge. This chapter explores incorporating a DR program into a multi-agent reinforcement learning (MA-RL) controller to flatten the peak demands of a cluster of buildings containing DESs. An MA-RL controller based on SAC is employed to address the coordinated control of the buildings' DESs. We designed a novel reward function based on the DR program for the MA-RL controller. The performance of the proposed reward function is evaluated against the MARLISA RL baseline. Our results show that the proposed MA-RL controller with the novel reward function outperforms the baseline method. Keywords: Demand side management; Demand response; Building energy management system; CityLearn; Multi-agent reinforcement learning.
... The digital twin is created in CityLearn, an OpenAI Gym environment developed for benchmarking RL algorithms in demand response studies [34]. CityLearn has been used extensively as a reference environment to demonstrate incentive-based DR [44], collaborative DR [45], coordinated energy management [46,47,48], benchmarking of DR algorithms [49,50,51], and voltage regulation [52]. ...
Preprint
Full-text available
The decarbonization of buildings presents new challenges for the reliability of the electrical grid as a result of the intermittency of renewable energy sources and the increase in grid load brought about by end-use electrification. To restore reliability, grid-interactive efficient buildings can provide flexibility services to the grid through demand response. Residential demand response programs are hindered by the need for manual intervention by customers. To maximize the energy flexibility potential of residential buildings, an advanced control architecture is needed. Reinforcement learning is well-suited for the control of flexible resources as it is able to adapt to unique building characteristics compared to expert systems. Yet, factors hindering the adoption of RL in real-world applications include its large data requirements for training, control security and generalizability. Here we address these challenges by proposing the MERLIN framework and using a digital twin of a real-world 17-building grid-interactive residential community in CityLearn. We show that 1) independent RL controllers for batteries improve building- and district-level KPIs compared to a reference RBC by tailoring their policies to individual buildings, 2) despite unique occupant behaviours, transferring the RL policy of any one building to other buildings provides comparable performance while reducing the cost of training, and 3) training RL controllers on limited temporal data that does not capture full seasonality in occupant behaviour has little effect on performance. Although the zero-net-energy (ZNE) condition of the buildings could be maintained or worsened as a result of controlled batteries, KPIs that are typically improved by the ZNE condition (electricity price and carbon emissions) are further improved when the batteries are managed by an advanced controller.
... In particular, implicit MBRL, where the entire procedure (e.g., model learning and planning) is optimized for optimal policy computation (Moerland, Broekens, and Jonker 2020). However, unlike existing works that build models based on (recurrent/convolutional) neural networks (NNs) and are primarily restricted to discrete state and action spaces (e.g., Tamar et al. 2016; Karkus, Hsu, and Lee 2017; Racanière et al. 2017; Guez et al. 2018; Schrittwieser et al. 2020), our method learns how to plan by solving optimization and adapting its parameters; hence, it is amenable to a wide range of applications with continuous states and actions. The present work is closely related to (Ghadimi, Perkins, and Powell 2020; Agrawal et al. 2020), which also use convex optimization as a policy class to handle uncertainty. ...
Preprint
Full-text available
Modern power systems will have to face difficult challenges in the years to come: frequent blackouts in urban areas caused by high power demand peaks, grid instability exacerbated by intermittent renewable generation, and global climate change amplified by rising carbon emissions. While current practices are growingly inadequate, the path to widespread adoption of artificial intelligence (AI) methods is hindered by missing aspects of trustworthiness. The CityLearn Challenge is an exemplary opportunity for researchers from multiple disciplines to investigate the potential of AI to tackle these pressing issues in the energy domain, collectively modeled as a reinforcement learning (RL) task. Multiple real-world challenges faced by contemporary RL techniques are embodied in the problem formulation. In this paper, we present a novel method using the solution function of optimization as policies to compute actions for sequential decision-making, while notably adapting the parameters of the optimization model from online observations. Algorithmically, this is achieved by an evolutionary algorithm under a novel trajectory-based guidance scheme. Formally, the global convergence property is established. Our agent ranked first in the latest 2021 CityLearn Challenge, being able to achieve superior performance in almost all metrics while maintaining some key aspects of interpretability.
... With the rapid growth in this field, the research community is contributing to accelerating research and implementation efforts for large-scale BEMS integration with RL. CityLearn is a recently developed environment that has been notably leveraged by many of the reviewed studies [75,76,78,123,129]. CityLearn is a Python-based, open-source environment built on OpenAI Gym for conducting multi-agent RL-based BEMS simulations in cities. ...
Article
Full-text available
Owing to the high energy demand of buildings, which accounted for 36% of the global share in 2020, they are one of the core targets for energy-efficiency research and regulations. Hence, coupled with the increasing complexity of decentralized power grids and high renewable energy penetration, the inception of smart buildings is becoming increasingly urgent. Data-driven building energy management systems (BEMS) based on deep reinforcement learning (DRL) have attracted significant research interest, particularly in recent years, primarily owing to their ability to overcome many of the challenges faced by conventional control methods related to real-time building modelling, multi-objective optimization, and the generalization of BEMS for efficient wide deployment. A PRISMA-based systematic assessment of a large database of 470 papers was conducted to review recent advancements in DRL-based BEMS for different building types, their research directions, and knowledge gaps. Five building types were identified: residential, offices, educational, data centres, and other commercial buildings. Their comparative analysis was conducted based on the types of appliances and systems controlled by the BEMS, renewable energy integration, DR, and unique system objectives other than energy, such as cost and comfort. Moreover, it is worth noting that only approximately 11% of the recent research considers real system implementations.
... We desire to provide an efficient and effective solution that overcomes the drawbacks of manual trading by building a trading bot [5]. Our strategy employs three actor-critic models: Advantage Actor Critic (A2C) [6], [7], Twin Delayed DDPG (TD3) [8], [9], and Soft Actor Critic (SAC) [10]. This trading bot will be efficient enough to trade automatically on its own, based on different market conditions and the user's approach, over the whole time period, with continuous modifications to ensure the best trade return for the specified period. ...
Chapter
Full-text available
This paper proposes an automated trading strategy using reinforcement learning. The stock market has become one of the largest financial institutions. These institutions embrace machine learning solutions based on artificial intelligence for market monitoring, credit quality, fraud detection, and many other areas. We desire to provide an efficient and effective solution that overcomes the drawbacks of manual trading by building a trading bot. In this paper, we propose a stock trading strategy that uses reinforcement learning algorithms to maximize profit. The strategy employs three actor-critic models: Advantage Actor Critic (A2C), Twin Delayed DDPG (TD3), and Soft Actor Critic (SAC). Our strategy picks the most optimal model based on the current market situation. The performance of our trading bot is evaluated and compared with Markowitz portfolio theory.
... CityLearn has been used extensively as a reference environment to demonstrate incentive-based DR [6], collaborative DR [17], coordinated energy management [30,22], and benchmarking of RL algorithms [9,32]. ...
Article
Building upon prior research that highlighted the need for standardizing environments for building control research, and inspired by recently introduced challenges for real life reinforcement learning (RL) control, here we propose a non-exhaustive set of nine real world challenges for RL control in grid-interactive buildings (GIBs). We argue that research in this area should be expressed in this framework in addition to providing a standardized environment for repeatability. Advanced controllers such as model predictive control (MPC) and RL control have both advantages and disadvantages that prevent them from being implemented in real world problems. Comparisons between the two are rare, and often biased. By focusing on the challenges, we can investigate the performance of the controllers under a variety of situations and generate a fair comparison. As a demonstration, we implement the offline learning challenge in CityLearn, an OpenAI Gym environment for the easy implementation of RL agents in a demand response setting to reshape the aggregated curve of electricity demand by controlling the energy storage of a diverse set of buildings in a district, and study the impact of different levels of domain knowledge and complexity of RL algorithms. We show that the sequence of operations utilized in a rule based controller (RBC) used for offline training affects the performance of the RL agents when evaluated on a set of four energy flexibility metrics. Longer offline learning from an optimized RBC leads to improved performance in the long run. RL agents that learn from a simplified RBC risk poorer performance as the offline learning period increases. We also observe no impact on performance from information sharing amongst agents. We call for a more interdisciplinary effort of the research community to address the real world challenges, and unlock the potential of GIB controllers.
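For concreteness, a rule-based controller of the kind used for offline pre-training in CityLearn-style storage control might look like the sketch below; the hours and charge rates are assumptions, not the paper's actual sequence of operations.

```python
def rbc_action(hour: int) -> float:
    """Hypothetical rule-based storage controller: charge overnight,
    discharge over the evening peak. Returns the fraction of storage
    capacity to (dis)charge this hour."""
    if hour >= 22 or hour < 7:     # night: low-demand hours (assumed)
        return 0.1                 # charge
    if 15 <= hour < 21:            # evening peak (assumed)
        return -0.1                # discharge
    return 0.0                     # otherwise hold
```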
... SAC has recently been applied to multi-agent problems [33,34], but the advantages compared to other algorithms were not highlighted. ...
Article
Full-text available
Controlling heating, ventilation and air-conditioning (HVAC) systems is crucial to improving demand-side energy efficiency. At the same time, the thermodynamics of buildings and uncertainties regarding human activities make effective management challenging. While the concept of model-free reinforcement learning demonstrates various advantages over existing strategies, the literature relies heavily on value-based methods that can hardly handle complex HVAC systems. This paper conducts experiments to evaluate four actor-critic algorithms in a simulated data centre. The performance evaluation is based on their ability to maintain thermal stability while increasing energy efficiency and on their adaptability to weather dynamics. Because of the enormous significance of practical use, special attention is paid to data efficiency. Compared to the model-based controller implemented into EnergyPlus, all applied algorithms can reduce energy consumption by at least 10% by simultaneously keeping the hourly average temperature in the desired range. Robustness tests in terms of different reward functions and weather conditions verify these results. With increasing training, we also see a smaller trade-off between thermal stability and energy reduction. Thus, the Soft Actor Critic algorithm achieves a stable performance with ten times less data than on-policy methods. In this regard, we recommend using this algorithm in future experiments, due to both its interesting theoretical properties and its practical results.
... There have been a few recent examples where the SAC algorithm has been applied to the building energy management problem. Kathirgamanathan et al. [29] and Pinto et al. [30] used the algorithm to optimally control a cluster of buildings using the CityLearn environment [15]. The CityLearn environment is an OpenAI Gym environment that allows control of domestic hot water and chilled water storage in a district setting. ...
Preprint
Full-text available
This research is concerned with the novel application and investigation of 'Soft Actor Critic' (SAC) based Deep Reinforcement Learning (DRL) to control the cooling setpoint (and hence cooling loads) of a large commercial building to harness energy flexibility. The research is motivated by the challenge associated with the development and application of conventional model-based control approaches at scale to the wider building stock. SAC is a model-free DRL technique that is able to handle continuous action spaces and which has seen limited application to real-life or high-fidelity simulation implementations in the context of automated and intelligent control of building energy systems. Such control techniques are seen as one possible solution to supporting the operation of a smart, sustainable and future electrical grid. This research tests the suitability of the SAC DRL technique through training and deployment of the agent on an EnergyPlus based environment of the office building. The SAC DRL was found to learn an optimal control policy that was able to minimise energy costs by 9.7% compared to the default rule-based control (RBC) scheme and was able to improve or maintain thermal comfort limits over a test period of one week. The algorithm was shown to be robust to the different hyperparameters and this optimal control policy was learnt through the use of a minimal state space consisting of readily available variables. The robustness of the algorithm was tested through investigation of the speed of learning and ability to deploy to different seasons and climates. It was found that the SAC DRL requires minimal training sample points and outperforms the RBC after three months of operation and also without disruption to thermal comfort during this period. The agent is transferable to other climates and seasons although further retraining or hyperparameter tuning is recommended.
... Since its inception [10], CityLearn has been used by multiple researchers to control building microgrids. The authors of [24] implemented a centralized soft-actor critic deep RL approach for demand response in a micro-grid using CityLearn. In [25], the author proposed and implemented in CityLearn an energy pricing agent with the objective of shifting peak electricity demand towards periods of peak renewable energy generation. ...
Preprint
Full-text available
Rapid urbanization, increasing integration of distributed renewable energy resources, energy storage, and electric vehicles introduce new challenges for the power grid. In the US, buildings represent about 70% of the total electricity demand and demand response has the potential for reducing peaks of electricity by about 20%. Unlocking this potential requires control systems that operate on distributed systems, ideally data-driven and model-free. For this, reinforcement learning (RL) algorithms have gained increased interest in the past years. However, research in RL for demand response has been lacking the level of standardization that propelled the enormous progress in RL research in the computer science community. To remedy this, we created CityLearn, an OpenAI Gym Environment which allows researchers to implement, share, replicate, and compare their implementations of RL for demand response. Here, we discuss this environment and The CityLearn Challenge, a RL competition we organized to propel further progress in this field.
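CityLearn exposes the standard OpenAI Gym interface, so any controller interacts with it through the usual reset/step loop. The sketch below shows that generic loop; the environment construction is omitted because it varies across CityLearn versions, and the agent's `select_action` method is a placeholder for any RBC, MPC, or RL policy.

```python
import gym  # CityLearn implements the OpenAI Gym interface

def run_episode(env: gym.Env, agent) -> float:
    """Generic Gym control loop (older 4-tuple Gym API, as used by the
    original CityLearn releases; newer Gym versions return five values)."""
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = agent.select_action(obs)   # placeholder agent API
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```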
Article
This research is concerned with the novel application and investigation of ‘Soft Actor Critic’ based deep reinforcement learning to control the cooling setpoint (and hence cooling loads) of a large commercial building to harness energy flexibility. The research is motivated by the challenge associated with the development and application of conventional model-based control approaches at scale to the wider building stock. Soft Actor Critic is a model-free deep reinforcement learning technique that is able to handle continuous action spaces and which has seen limited application to real-life or high-fidelity simulation implementations in the context of automated and intelligent control of building energy systems. Such control techniques are seen as one possible solution to supporting the operation of a smart, sustainable and future electrical grid. This research tests the suitability of the technique through training and deployment of the agent on an EnergyPlus based environment of the office building. The agent was found to learn an optimal control policy that was able to minimise energy costs by 9.7% compared to the default rule-based control scheme and was able to improve or maintain thermal comfort limits over a test period of one week. The algorithm was shown to be robust to the different hyperparameters and this optimal control policy was learnt through the use of a minimal state space consisting of readily available variables. The robustness of the algorithm was tested through investigation of the speed of learning and ability to deploy to different seasons and climates. It was found that the agent requires minimal training sample points and outperforms the baseline after three months of operation and also without disruption to thermal comfort during this period. The agent is transferable to other climates and seasons although further retraining or hyperparameter tuning is recommended.
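As an illustration of applying off-the-shelf SAC to a continuous-action Gym environment (the cited work couples its own SAC agent to an EnergyPlus simulation, so the tooling here is an assumption, not a reproduction), a minimal stable-baselines3 sketch with a stand-in environment:

```python
import gym
from stable_baselines3 import SAC

# A building environment exposing the cooling setpoint as its continuous
# action would slot in where Pendulum-v1 is used here.
env = gym.make("Pendulum-v1")
model = SAC("MlpPolicy", env, learning_rate=3e-4)
model.learn(total_timesteps=10_000)   # training budget is illustrative
obs = env.reset()
action, _ = model.predict(obs, deterministic=True)
```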
Article
Building controls are becoming more important and complicated due to the dynamic and stochastic energy demand, on-site intermittent energy supply, as well as energy storage, making it difficult for them to be optimized by conventional control techniques. Reinforcement Learning (RL), as an emerging control technique, has attracted growing research interest and demonstrated its potential to enhance building performance while addressing some limitations of other advanced control techniques, such as model predictive control. This study conducted a comprehensive review of existing studies that applied RL for building controls. It provided a detailed breakdown of the existing RL studies that use a specific variation of each major component of the Reinforcement Learning: algorithm, state, action, reward, and environment. We found RL for building controls is still in the research stage with limited applications (11%) in real buildings. Three significant barriers prevent the adoption of RL controllers in actual building controls: (1) the training process is time consuming and data demanding, (2) the control security and robustness need to be enhanced, and (3) the generalization capabilities of RL controllers need to be improved using approaches such as transfer learning. Future research may focus on developing RL controllers that could be used in real buildings, addressing current RL challenges, such as accelerating training and enhancing control robustness, as well as developing an open-source testbed and dataset for performance benchmarking of RL controllers.
Conference Paper
Demand response has the potential of reducing peaks of electricity demand by about 20% in the US, where buildings represent roughly 70% of the total electricity demand. Buildings are dynamic systems in constant change (i.e. occupants' behavior, refurbishment measures), which are costly to model and difficult to coordinate with other urban energy systems. Reinforcement learning is an adaptive control algorithm that can control these urban energy systems relying on historical and real-time data instead of models. Plenty of research has been conducted in the use of reinforcement learning for demand response applications in the last few years. However, most experiments are difficult to replicate, and the lack of standardization makes the performance of different algorithms difficult, if not impossible, to compare. In this demo, we introduce a new framework, CityLearn, based on the OpenAI Gym Environment, which will allow researchers to implement, share, replicate, and compare their implementations of reinforcement learning for demand response applications more easily. The framework is open source and modular, which allows researchers to modify and customize it, e.g., by adding additional storage, generation, or energy-consuming systems.
Article
Buildings account for about 40% of the global energy consumption. Renewable energy resources are one possibility to mitigate the dependence of residential buildings on the electrical grid. However, their integration into the existing grid infrastructure must be done carefully to avoid instability, and guarantee availability and security of supply. Demand response, or demand-side management, improves grid stability by increasing demand flexibility, and shifts peak demand towards periods of peak renewable energy generation by providing consumers with economic incentives. This paper reviews the use of reinforcement learning, a machine learning algorithm, for demand response applications in the smart grid. Reinforcement learning has been utilized to control diverse energy systems such as electric vehicles, heating, ventilation and air conditioning (HVAC) systems, smart appliances, or batteries. The future of demand response greatly depends on its ability to prevent consumer discomfort and integrate human feedback into the control loop. Reinforcement learning is a potentially model-free algorithm that can adapt to its environment, as well as to human preferences by directly integrating user feedback into its control logic. Our review shows that, although many papers consider human comfort and satisfaction, most of them focus on single-agent systems with demand-independent electricity prices and a stationary environment. However, when electricity prices are modelled as demand-dependent variables, there is a risk of shifting the peak demand rather than shaving it. We identify a need to further explore reinforcement learning to coordinate multi-agent systems that can participate in demand response programs under demand-dependent electricity prices. Finally, we discuss directions for future research, e.g., quantifying how RL could adapt to changing urban conditions such as building refurbishment and urban or population growth.
Article
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
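The core update described above can be summarised in a few lines: regress Q(s, a) towards the bootstrapped target r + γ max_a' Q_target(s', a'). The sketch below uses PyTorch with toy dimensions (4-dimensional state, 2 actions); the original work additionally used convolutional networks over pixels, experience replay, and a periodically synchronised target network.

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())   # synchronise target network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_update(s, a, r, s_next, done):
    """One gradient step on a batch of transitions (tensors): pull
    Q(s, a) towards r + gamma * max_a' Q_target(s', a')."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```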