A high-level block diagram of the actor-critic reinforcement learning architecture is shown. It illustrates the general flow of state observations and reward signals between the algorithm and the environment, as well as the critic's update and its value estimate, which the policy uses in its policy-gradient updates.

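To connect the diagram to an update rule, the following is a minimal sketch of a one-step (TD(0)) actor-critic loop for a discrete action space, written in PyTorch. It is illustrative only: the network sizes, layer choices, and hyperparameters are assumptions, not details taken from the source publication.

```python
# Minimal one-step actor-critic sketch (illustrative; assumes a discrete
# action space and single-transition TD(0) targets).
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        # Categorical policy over the discrete actions.
        return torch.distributions.Categorical(logits=self.net(obs))

class Critic(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, obs):
        # State-value estimate V(s).
        return self.net(obs).squeeze(-1)

def actor_critic_step(actor, critic, opt_actor, opt_critic,
                      obs, action, reward, next_obs, done, gamma=0.99):
    """One TD(0) update for a single transition: the critic's value
    estimate supplies the advantage used in the policy-gradient step."""
    obs = torch.as_tensor(obs, dtype=torch.float32)
    next_obs = torch.as_tensor(next_obs, dtype=torch.float32)

    with torch.no_grad():
        td_target = reward + gamma * (1.0 - float(done)) * critic(next_obs)

    value = critic(obs)
    advantage = td_target - value

    # Critic update: regress V(s) toward the one-step TD target.
    opt_critic.zero_grad()
    advantage.pow(2).backward()
    opt_critic.step()

    # Actor update: policy gradient weighted by the detached advantage.
    log_prob = actor(obs).log_prob(torch.as_tensor(action))
    opt_actor.zero_grad()
    (-advantage.detach() * log_prob).backward()
    opt_actor.step()
```

The point mirrored from the diagram is that the critic's value estimate produces the advantage signal that weights the actor's policy-gradient update, while both networks are driven by the state observations and reward signals coming from the environment.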

Source publication
Article
Full-text available
Energy optimization in buildings by controlling the Heating Ventilation and Air Conditioning (HVAC) system is being researched extensively. In this paper, a model-free actor-critic Reinforcement Learning (RL) controller is designed using a variant of artificial recurrent neural networks called Long Short-Term Memory (LSTM) networks. Optimization of...

Context in source publication

Context 1
... control output is then interpreted as the room cooling temperature set point T_c, which is passed on to the simulation platform. Figure 8 shows the RL controller architecture setup. We see that the discrete control action output is computed by the actor, given the state observation. ...
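The excerpt above states that the actor's discrete action is interpreted as the cooling setpoint T_c before being passed to the simulation platform. A minimal sketch of such an interpretation step is shown below; the candidate setpoint values are hypothetical, since the paper's actual action discretization is not given in this excerpt.

```python
# Hypothetical mapping from a discrete actor action to a cooling setpoint T_c.
# The candidate setpoints are illustrative; the paper's actual action
# discretization is not specified in this excerpt.
SETPOINT_CANDIDATES_C = [22.0, 23.0, 24.0, 25.0, 26.0]  # degrees Celsius

def action_to_setpoint(action_index: int) -> float:
    """Interpret the actor's discrete action index as the room cooling
    temperature setpoint T_c sent to the simulation platform."""
    return SETPOINT_CANDIDATES_C[action_index]

# Example: action index 2 -> 24.0 C setpoint for the next control step.
t_c = action_to_setpoint(2)
```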

Similar publications

Article
Full-text available
An investigation was carried out to develop a road network plan for the Sylhet Agricultural University (SAU). The road system of a university plays a crucial role in campus stability. As the university is extending its academic and structural dimensions, this attempt was carried out to plan a commensurate road network through the campus to facil...
Preprint
Full-text available
Objective: To assess the effectiveness of aerosol filtration by portable air cleaning devices with high efficiency particulate air (HEPA) filters used in addition to standard building heating ventilation and air-conditioning (HVAC). Methods: Test rooms, including a hospital single-patient room, were filled with test aerosol to simulate aerosol move...
Preprint
Full-text available
Building loads consume roughly 40% of the energy produced in developed countries, a significant part of which is invested towards building temperature-control infrastructure. Therein, renewable resource-based microgrids offer a greener and cheaper alternative. This communication explores the possible co-design of microgrid power dispatch and buildi...
Article
Full-text available
Building operations represent a significant percentage of the total primary energy consumed in most countries due to the proliferation of Heating, Ventilation and Air-Conditioning (HVAC) installations in response to the growing demand for improved thermal comfort. Reducing the associated energy consumption while maintaining comfortable conditions i...
Article
Full-text available
Because building construction is irreversible, building energy efficiency design depends heavily on simulation technology. The Ministry of Housing and Urban-Rural Development of China (MOHURD) also stated that China's building energy consumption accounted for 27.5% of the total energy consumption in 2012. Energy consumption is simulated based on...

Citations

... Works that consider both temperature and humidity usually involve sophisticated measures of thermal comfort such as PMV [103] or PPD [104], which are calculated by the building energy simulator used as the environment. Qiu et al. [80] use wet bulb temperature, which is a function of temperature and humidity. ...
Article
Full-text available
Reinforcement learning has emerged as a potentially disruptive technology for control and optimization of HVAC systems. A reinforcement learning agent takes actions, which can be direct HVAC actuator commands or setpoints for control loops in building automation systems. The actions are taken to optimize one or more targets, such as indoor air quality, energy consumption and energy cost. The agent receives feedback from the HVAC systems to quantify how well these targets have been achieved. The feedback is captured by a reward function designed by the developer of the reinforcement learning agent. A few reviews have focused on the reward aspect of reinforcement learning applications for HVAC. However, there is a lack of reviews that assess how the actions of the reinforcement learning agent have been formulated, and how this impacts the possibilities to achieve various optimization targets in single zone or multi-zone buildings. The aim of this review is to identify the action formulations in the literature and to assess how the choice of formulation impacts the level of abstraction at which the HVAC systems are considered. Our methodology involves a search string in the Web of Science database and a list of selection criteria applied to each article in the search results. For each selected article, a three-tier categorization of the selected articles has been performed. Firstly, the applicability of the approach to buildings with one or more zones is considered. Secondly, the articles are categorized by the type of action taken by the agent, such as a binary, discrete or continuous action. Thirdly, the articles are categorized by the aspects of the indoor environment being controlled, namely temperature, humidity or air quality. The main result of the review is this three-tier categorization that reveals the community’s emphasis on specific HVAC applications, as well as the readiness to interface the reinforcement learning solutions to HVAC systems. The article concludes with a discussion of trends in the field as well as challenges that require further research.
... Visual object tracking has gained more attention in computer vision in the last decade than ever before. There have been numerous successful studies on various tracking benchmarks [18,19,26]. Classification-based trackers have also been proposed, which may be referred to as tracking-by-detection or tracking-by-classification [27][28][29][30][31]. ...
... In general, the classification model is trained offline using manually labeled pictures before being utilized for online or real-time tracking operations. Numerous neural network-based trackers utilize these concepts [11,16,32,33] throughout the development of their approach, and they provide effective outcomes when compared to classic trackers [12,13,34] and achieve state-of-the-art results [11,26]. The concept of using correlation filters to resolve the inadequate representation of convolutional and handcrafted features is retained [5,15,35]. ...
Article
Full-text available
The complexity of object tracking models among hardware applications has become a more in-demand task to accomplish with multifunctional algorithm skills in various indeterminable environment tracking conditions. Experimenting with the virtual realistic simulator brings new dependencies and requirements, which may cause problems while experimenting with runtime processing. The goal of this paper is to present an object tracking framework that differs from the most advanced tracking models by experimenting with virtual environment simulation (Aerial Informatics and Robotics Simulation—AirSim, City Environ) using one of the Deep Reinforcement Learning models, namely Deep Q-Learning. Our proposed network examines the environment using a deep reinforcement learning model to regulate activities in the virtual simulation environment and utilizes sequential pictures from the realistic VCE (Virtual City Environ) model as inputs. Subsequently, the deep reinforcement network model was pretrained using multiple sequential training image sets and fine-tuned for adaptability during runtime tracking. The experimental results were outstanding in terms of speed and accuracy. Moreover, we were unable to identify any results that could be compared to the state-of-the-art methods that use deep network-based trackers in runtime simulation platforms, since this testing experiment was conducted on the two public datasets VisDrone2019 and OTB-100, and achieved better performance than the compared conventional methods.
... The deep deterministic policy (DDP) algorithm was introduced by [35,36], in which the control scheme of the agent's action is utilized in reward shaping for optimal scheduling to improve the convergence rate. In [37], the energy consumption of HVAC and thermal comfort are optimized based on the actor-critic deep RL framework. Deep actor-critic control has been applied fairly successfully to various agent control policies [38,39]. ...
Article
The energy demand of heating, ventilating and air conditioning (HVAC) systems can be reduced by manipulating indoor conditions within the comfort range, which relates to control performance and, simultaneously, achieves peak load shifting toward off-peak hours. Reinforcement learning (RL) is considered a promising technique to solve this problem without an analytical approach, but it has been unable to overcome the awkwardness of an extremely large action space in the real world; it would be quite hard to converge to a set point. The core of the problem with RL is its state space and action space of multi-agent action for building and HVAC systems, which have an extremely large amount of training data sets. This makes it difficult to accurately create the weight layers of the black-box model. Despite the efforts of past works on deep RL, there are still drawback issues that have not been dealt with as part of the basic elements of large action space and the large-scale nonlinearity due to high thermal inertia. The hybrid deep clustering of multi-agent reinforcement learning (HDCMARL) has the ability to overcome these challenges, since the hybrid deep clustering approach has a higher capacity for learning the representation of large spaces and massive data. The framework of RL agents is greedily and iteratively trained and organized as a hybrid layer clustering structure to be able to deal with a non-convex, non-linear and non-separable objective function. The parameters of the hybrid layer are optimized by using the Quasi-Newton (QN) algorithm for fast response signals of agents. That is to say, the main motivation is that the state and action space of multi-agent actions for building HVAC controls are exploding, and the proposed method can overcome this challenge, achieving 32% better performance in energy savings and 21% better performance in thermal comfort than PID.
... In the context of building energy management, DRL control was applied to supply water temperature control for heating systems in office buildings [25,26], thermal storage charging and discharging [27], control of the operational parameters of a compression chiller [28], control of the temperature setpoint of a thermal storage unit [29], control of indoor temperature or humidity setpoint [30][31][32], regulation of heat fluxes provided to zones in commercial buildings [33], control of domestic water heaters [34]. ...
Article
Full-text available
This paper proposes a comparison between an online and offline Deep Reinforcement Learning (DRL) formulation with a Model Predictive Control (MPC) architecture for energy management of a cold-water buffer tank linking an office building and a chiller subject to time-varying energy prices, with the objective of minimizing operating costs. The intrinsic model-free approach of DRL is generally lost in common implementations for energy management, as they are usually pre-trained offline and require a surrogate model for this purpose. Simulation results showed that the online-trained DRL agent, while requiring an initial 4-week adjustment period with relatively poor performance (160% higher cost), converged to a control policy almost as effective as the model-based strategies (3.6% higher cost in the last month). This suggests that the DRL agent trained online may represent a promising solution to overcome the barrier represented by the modelling requirements of MPC and offline-trained DRL approaches.
... Owing to the above-mentioned attributes of the RL methodology, it has been employed in various process control and optimization problems ( Wang et al., 2017;Pandian and Noel, 2018;Ma et al., 2019;Lambert et al., 2019;Shafi et al., 2020;Pi et al., 2020;Zhu et al., 2020;Powell et al., 2020;Li et al., 2021;Bao et al., 2021;Dogru et al., 2021c ). Several works have reported the use of RL for PID tuning in simulated environments ( Brujeni et al., 2010;El Hakim et al., 2013;Kofinas and Dounis, 2019;Sun et al., 2019;Shipman and Coetzee, 2019;Sedighizadeh and Rezazadeh, 2008;Brujeni et al., 2010;Carlucho et al., 2017 ). ...
Article
Many industrial processes utilize proportional-integral-derivative (PID) controllers due to their practicality and often satisfactory performance. The proper controller parameters depend highly on the operational conditions and process uncertainties. This study combines the recent developments in computer sciences and control theory to address the tuning problem. It formulates the PID tuning problem as a reinforcement learning task with constraints. The proposed scheme identifies an initial approximate step-response model and lets the agent learn dynamics off-line from the model with minimal effort. After achieving a satisfactory training performance on the model, the agent is fine-tuned on-line on the actual process to adapt to the real dynamics, thereby minimizing the training time on the real process and avoiding unnecessary wear, which can be beneficial for industrial applications. This sample efficient method is tested and demonstrated through a pilot-scale multi-modal tank system. The performance of the method is verified through setpoint tracking and disturbance regulatory experiments.
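As a rough illustration of the formulation described in this abstract (PID tuning cast as a reinforcement learning task trained against an approximate step-response model), the sketch below treats the PID gains as the agent's action and scores them against a simple first-order process. The process parameters, actuator constraint, and cost weighting are assumptions for illustration, not the authors' actual setup.

```python
# Sketch of PID tuning framed as an RL task, using a first-order
# step-response model as the offline training environment.
import numpy as np

def simulate_episode(kp, ki, kd, setpoint=1.0, dt=0.1, steps=300,
                     gain=2.0, tau=5.0):
    """Run candidate PID gains against a first-order process
    dy/dt = (gain * u - y) / tau and return the tracking cost."""
    y, integral, prev_err, cost = 0.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        err = setpoint - y
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv
        u = float(np.clip(u, -10.0, 10.0))   # actuator constraint (assumed)
        y += dt * (gain * u - y) / tau        # process model step
        cost += abs(err) * dt                 # integrated absolute error
        prev_err = err
    return cost

def reward(action):
    """RL reward: the agent's action is the gain vector (kp, ki, kd);
    lower tracking cost means higher reward."""
    kp, ki, kd = action
    return -simulate_episode(kp, ki, kd)

# Offline phase: the agent explores gain vectors against the model,
# e.g. reward((1.5, 0.3, 0.05)), before fine-tuning on the real process.
```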
... When the number of weather input parameters is reduced, the developer can perform faster experiments at the cost of accuracy: the identification of reference days always introduces simplifications [95]. With the same occurrence as annual tests (34%), another approach is to test the hottest or coldest days recorded in the weather data, or both, as in [40,65,90,96]. Detailed information on how reference days are selected is absent. ...
... [92,101] investigate the application of a control to a data center; Du et al. [90] investigate its application to an airport; and Glorennec et al. [42] investigate its application to a highly glazed building. Moreover, as also highlighted in [29], the tested construction is, in multiple cases, a building affiliated with the authors [33,47,51,89,96]. Nevertheless, the adoption of specific buildings hinders the generalization of the results to different markets. ...
... In addition to reporting the performance metrics, it is common to include some plots showing the profile of the controlled quantities and some related parameters, such as heating or cooling power [33,35,36,44,72,86], storage state of charge [88], PMV [11,62,63,69], fuel tariff [58,87], or environmental variables [96]. Indoor air temperature profiles are given in all the analyzed studies and are used to demonstrate the adherence to comfort requirements, especially when a comfort metric is not computed [17,33,43,61,82,88,91]. ...
Article
Full-text available
In the last few decades, researchers have shown that advanced building controllers can reduce energy consumption without negatively impacting occupants’ wellbeing and help to manage building systems, which are becoming increasingly complex. Nevertheless, the lack of benefit awareness and demonstration projects undermines stakeholders’ trust, justifying the reluctance to approve new controls in the industry. Therefore, it is necessary to support the development of controls through solid arguments testifying to the performance gain that can be achieved. However, the absence of standardized and systematic testing methods limits the generalization of results and the ability to make fair cross-study comparisons. This study presents an overview of the different benchmarking approaches used to assess control performance. Our goal is to highlight trends, limitations, and controversies through analytics to support the definition of best practices, which remains a widely discussed topic in this research area. We aim to focus on simulation-based benchmarking, which is regarded as a promising solution to overcome the time and cost requirements related to field or hardware-in-the-loop testing. We identify and investigate four key steps relating to virtual benchmarking: defining the key performance indicators, specifying the reference control, characterizing the test scenarios, and post-processing the results. This work confirmed the expected heterogeneity, underlined recurrent features with the help of analytics, and recognized limits and open challenges.
... Given the scope of this work, in this section we focus on reviewing the growing body of research using RL-based algorithms to control buildings and DERs [13,15,25,26]. Researchers used DRL controllers to achieve different objectives: authors of [27][28][29][30][31][32][33][34][35][36] aimed at improving thermal comfort of building occupants while minimizing energy consumption, whereas others [37][38][39][40][41][42][43][44][45][46][47] aimed at improving energy efficiency and reducing cost. Some recent work also has looked at using DRL for better load management through peak reduction, load shifting, and better scheduling [18,48,49]. ...
... Some recent work also has looked at using DRL for better load management through peak reduction, load shifting, and better scheduling [18,48,49]. The actions taken by DRL controllers include generating supervisory control setpoints for the HVAC systems (e.g., variable air volume [VAV] boxes, chiller plants, room thermostats), such as temperature [27,28,[30][31][32][33][37][38][39]41,42,49], flow rate [29,31,42,49] and fan speed setpoints [45]. Supervisory control of electrical batteries by setting the charge/discharge rate [35,44,46,47] is also common. ...
... The reward functions used in the literature typically track the objective of the DRL algorithm, hence penalizing occupant discomfort and rewarding for energy savings (and corresponding cost savings) [18,[27][28][29][30][31][32][33][34][35][37][38][39][40][41][42][43][45][46][47][48][49]. Control of on-site energy storage often includes an additional penalty to discourage battery depreciation [18,44,47]. ...
Article
Behind-the-meter distributed energy resources (DERs), including building solar photovoltaic (PV) technology and electric battery storage, are increasingly being considered as solutions to support carbon reduction goals and increase grid reliability and resiliency. However, dynamic control of these resources in concert with traditional building loads, to effect efficiency and demand flexibility, is not yet commonplace in commercial control products. Traditional rule-based control algorithms do not offer integrated closed-loop control to optimize across systems, and most often, PV and battery systems are operated for energy arbitrage and demand charge management, and not for the provision of grid services. More advanced control approaches, such as MPC control have not been widely adopted in industry because they require significant expertise to develop and deploy. Recent advances in deep reinforcement learning (DRL) offer a promising option to optimize the operation of DER systems and building loads with reduced setup effort. However, there are limited studies that evaluate the efficacy of these methods to control multiple building subsystems simultaneously. Additionally, most of the research has been conducted in simulated environments as opposed to real buildings. This paper proposes a DRL approach that uses a deep deterministic policy gradient algorithm for integrated control of HVAC and electric battery storage systems in the presence of on-site PV generation. The DRL algorithm, trained on synthetic data, was deployed in a physical test building and evaluated against a baseline that uses the current best-in-class rule-based control strategies. Performance in delivering energy efficiency, load shift, and load shed was tested using price-based signals. The results showed that the DRL-based controller can produce cost savings of up to 39.6% as compared to the baseline controller, while maintaining similar thermal comfort in the building. The project team has also integrated the simulation components developed during this work as an OpenAIGym environment and made it publicly available so that prospective DRL researchers can leverage this environment to evaluate alternate DRL algorithms.
... Regarding the energy sector, Ruelens et al. [31] highlighted that typical environments in energy management are not fully observable and encoded past observations into an autoencoder. Wang et al. [36] used LSTM in an RL actor-critic algorithm, whereas Zhang et al. [39] used LSTM in a model-based RL algorithm to learn environmental dynamics. Sequence-to-sequence models [6][7][8] and Bayesian networks [16,28] were applied to make predictions in model predictive control. ...
Conference Paper
Automatic control of energy systems is affected by the uncertainties of multiple factors, including weather, prices and human activities. The literature relies on Markov-based control, taking only into account the current state. This impacts control performance, as previous states give additional context for decision making. We present two ways to learn non-Markovian policies, based on recurrent neural networks and variational inference. We evaluate the methods on a simulated data centre HVAC control task. The results show that the off-policy stochastic latent actor-critic algorithm can maintain the temperature in the predefined range within three months of training without prior knowledge while reducing energy consumption compared to Markovian policies by more than 5%.
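A minimal sketch of the non-Markovian (recurrent) policy idea described above: the action distribution is conditioned on a window of past observations through an LSTM rather than on the current state alone. Layer sizes and the discrete action head are illustrative assumptions; the cited work's stochastic latent actor-critic is considerably more involved.

```python
# Recurrent (non-Markovian) policy sketch: actions depend on a history
# window of observations rather than the current state only.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_history):
        # obs_history: (batch, time, obs_dim) window of past observations.
        _, (h_n, _) = self.lstm(obs_history)
        logits = self.head(h_n[-1])  # summary of the whole history
        return torch.distributions.Categorical(logits=logits)

# Example: sample an action from a 12-step observation window.
policy = RecurrentPolicy(obs_dim=8, n_actions=5)
window = torch.zeros(1, 12, 8)
action = policy(window).sample()
```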
... These studies have already proved the feasibility and applicability of RL approaches to the various problems in the contexts of DR and BEMS. For example, [24][25][26] developed RL-based controllers to control HVAC systems. On the other hand, water heater control still remains largely untapped, except for only a few studies [17]. ...
Article
Full-text available
Electric water heaters represent 14% of the electricity consumption in residential buildings. An average household in the United States (U.S.) spends about USD 400–600 (0.45 ¢/L–0.68 ¢/L) on water heating every year. In this context, water heaters are often considered as a valuable asset for Demand Response (DR) and building energy management system (BEMS) applications. To this end, this study proposes a model-free deep reinforcement learning (RL) approach that aims to minimize the electricity cost of a water heater under a time-of-use (TOU) electricity pricing policy by only using standard DR commands. In this approach, a set of RL agents, with different look ahead periods, were trained using the deep Q-networks (DQN) algorithm and their performance was tested on an unseen pair of price and hot water usage profiles. The testing results showed that the RL agents can help save electricity cost in the range of 19% to 35% compared to the baseline operation without causing any discomfort to end users. Additionally, the RL agents outperformed rule-based and model predictive control (MPC)-based controllers and achieved comparable performance to optimization-based control.
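To make the cost-versus-comfort trade-off concrete, the following is a hedged sketch of the kind of step reward a DQN water-heater agent could optimize under time-of-use pricing. The tariff values, comfort threshold, and weighting are hypothetical and not taken from the cited study.

```python
# Hypothetical step reward for a water-heater agent under a TOU tariff:
# negative electricity cost plus a penalty for dropping below a comfort
# temperature. All constants are illustrative assumptions.
ON_PEAK_PRICE = 0.30   # $/kWh, assumed on-peak price
OFF_PEAK_PRICE = 0.08  # $/kWh, assumed off-peak price
COMFORT_WEIGHT = 1.0
MIN_TANK_TEMP_C = 45.0

def tou_price(hour: int) -> float:
    """Assumed TOU schedule: on-peak from 14:00 to 20:00."""
    return ON_PEAK_PRICE if 14 <= hour < 20 else OFF_PEAK_PRICE

def step_reward(hour: int, energy_kwh: float, tank_temp_c: float) -> float:
    """Negative cost of the heating action, with a penalty whenever the
    tank temperature drops below the comfort threshold."""
    cost = tou_price(hour) * energy_kwh
    discomfort = max(0.0, MIN_TANK_TEMP_C - tank_temp_c)
    return -(cost + COMFORT_WEIGHT * discomfort)

# Example: heating 0.5 kWh during on-peak hours with an adequately hot tank.
r = step_reward(hour=15, energy_kwh=0.5, tank_temp_c=50.0)
```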
... After determining the optimal combination of α and c on datasets A and B, the proposed algorithm is compared with user-based collaborative filtering algorithm [20], item-based collaborative filtering algorithm [21], user-based K-nearest neighbor recommendation algorithm [22], and trust-based matrix decomposition recommendation algorithm [23,24], respectively. The comparison results of RMSE are shown in Figure 6. ...
Article
Full-text available
At present, there is a serious disconnect between online teaching and offline teaching in English MOOC large-scale hybrid teaching recommendation platform, which is mainly due to the problems of cold start and matrix sparsity in the recommendation algorithm, and it is difficult to fully tap the user's interest characteristics because it only considers the user's rating but neglects the user's personalized evaluation. In order to solve the above problems, this paper proposes to use reinforcement learning thought and user evaluation factors to realize the online and offline hybrid English teaching recommendation platform. First, the idea of value function estimation in reinforcement learning is introduced, and the difference between user state value functions is used to replace the previous similarity calculation method, thus alleviating the matrix sparsity problem. The learning rate is used to control the convergence speed of the weight vector in the user state value function to alleviate the cold start problem. Second, by adding the learning of the user evaluation vector to the value function estimation of the state value function, the state value function of the user can be estimated approximately and the discrimination degree of the target user can be reflected. Experimental results show that the proposed recommendation algorithm can effectively alleviate the cold start and matrix sparsity problems existing in the current collaborative filtering recommendation algorithm and can dig deep into the characteristics of users' interests and further improve the accuracy of scoring prediction.
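As a rough sketch of the mechanism described in this abstract, the code below replaces a rating-similarity computation with the difference between users' estimated state-value functions and shows a learning-rate-controlled update of the weight vector. The linear value model and all names are illustrative assumptions, not the paper's implementation.

```python
# Value-function-based affinity sketch: neighbours are ranked by the gap
# between users' estimated state values instead of rating similarity.
import numpy as np

def state_value(weights: np.ndarray, user_features: np.ndarray) -> float:
    """Linear approximation of a user's state value V(s) = w . phi(s)."""
    return float(weights @ user_features)

def value_based_affinity(w_a, phi_a, w_b, phi_b) -> float:
    """Smaller value-function difference -> higher affinity between users."""
    return -abs(state_value(w_a, phi_a) - state_value(w_b, phi_b))

def update_weights(weights, user_features, target, lr=0.05):
    """Gradient step on the value estimate; the learning rate lr controls
    how fast the weight vector converges (easing the cold-start problem)."""
    error = target - state_value(weights, user_features)
    return weights + lr * error * user_features
```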