Article

Online pricing of demand response based on long short-term memory and reinforcement learning

Abstract

Incentive-based demand response plays an increasingly important role in ensuring the safe operation of the power grid and reducing system costs, and advances in information and communications technology have made it possible to implement it online. However, in regions where incentive-based demand response has never been implemented, the response behavior of customers is unknown, so setting the incentive price quickly and accurately is a challenge for service providers. This paper proposes a pricing method that combines long short-term memory (LSTM) networks and reinforcement learning to solve the pricing problem of service providers when the customers’ response behavior is unknown. Taking the total profit over all response time slots in one day as the optimization goal, LSTM networks are used to learn the relationship between customers’ response behavior and incentive price, and reinforcement learning is used to explore and determine the optimal price. The results show that combining the two methods enables virtual exploration of the optimal price, which overcomes the drawback that reinforcement learning in a real setting can explore only through delayed rewards, and thereby speeds up the process of setting the optimal price. In addition, because the method considers how the combination of incentive prices across time slots affects the service provider’s profit, the negative effect of myopic optimization is avoided.
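
The point about price combinations across time slots can be illustrated with a small sketch. It is not the paper's method: a hand-written response function stands in for the trained LSTM, exhaustive search stands in for reinforcement learning, and every number (`VALUE`, `PRICES`, `SLOTS`, the dampening term in `predicted_response`) is a hypothetical choice. It only shows how a response model lets candidate daily price vectors be scored virtually, and why slot-by-slot (myopic) pricing can lose to planning the whole day.

```python
from itertools import product

VALUE = 6              # hypothetical revenue per unit of load reduction
PRICES = (1, 2, 3, 4)  # hypothetical candidate incentive prices
SLOTS = 3              # hypothetical number of response time slots per day


def predicted_response(price, prev_price):
    """Stand-in for a learned response model (an LSTM in the paper).

    Assumption: a high incentive in the previous slot dampens the
    response in the current slot.
    """
    return max(0, 2 * price - prev_price)


def slot_profit(price, prev_price):
    """Profit of one slot: margin per unit times predicted reduction."""
    return (VALUE - price) * predicted_response(price, prev_price)


def day_profit(plan):
    """Total profit of a full-day price plan, scored on the model only."""
    total, prev = 0, 0
    for price in plan:
        total += slot_profit(price, prev)
        prev = price
    return total


def myopic_plan():
    """Pick the best price for each slot in isolation."""
    plan, prev = [], 0
    for _ in range(SLOTS):
        best = max(PRICES, key=lambda p: slot_profit(p, prev))
        plan.append(best)
        prev = best
    return tuple(plan)


def joint_plan():
    """Search over whole-day price combinations (virtual exploration)."""
    return max(product(PRICES, repeat=SLOTS), key=day_profit)
```

With these made-up numbers, `myopic_plan()` returns `(3, 4, 4)` with a day profit of 36, while `joint_plan()` returns `(2, 3, 4)` with a day profit of 38, so the slot-by-slot choice is not optimal once slots interact.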

... In this paper, we focus on price-based incentives: a system operator broadcasts prices, users respond by adjusting their consumption to minimize their individual costs, the operator adjusts the prices based on the user responses, and so on. Ideally, this iterative interaction should converge to an optimal solution that balances user cost and system performance. The major obstacle is that the operator typically lacks access to users' cost functions, either due to privacy concerns or because users themselves rely on complex or black-box control strategies (e.g., reinforcement learning) [6]-[8]. This limits the effectiveness of many pricing schemes and makes theoretical analysis difficult. ...
... Based on (8), we now illustrate the incentive pricing mechanism in more detail. As shown in Fig. 1, we consider a two-timescale design of incentive pricing mechanism, where individual users solve (4) for $x_i^*(p_i)$ much faster than the system operator updates the price $\vec{p}$ via (8). This timescale separation allows users to consider the price signal $\vec{p}$ as static when solving for ...
... $\vec{x}^*(\vec{p})$ almost immediately by solving (5) and then the system operator updates the price $\vec{p}$ according to (8) in response to the current demand profile $\vec{x}^*(\vec{p})$. It should be intuitively clear that (8) provides users with incentives to align their own interests with social welfare, given that adjustments to $\vec{p}$ intend to reduce the difference between the marginal cost of the individual user quantified by $\vec{p}$ and the marginal cost of the system operator characterized by $\partial J(\vec{x}^*(\vec{p}))$. ...
Preprint
Incentive-based coordination mechanisms for distributed energy consumption have shown promise in aligning individual user objectives with social welfare, especially under privacy constraints. Our prior work proposed a two-timescale adaptive pricing framework, where users respond to prices by minimizing their local cost, and the system operator iteratively updates the prices based on aggregate user responses. A key assumption was that the system cost depends smoothly on the aggregate of the user demands. In this paper, we relax this assumption by considering a more realistic model in which the cost is determined by solving a DCOPF problem with constraints. We present a generalization of the pricing update rule that leverages the generalized gradients of the system cost function, which may be nonsmooth due to the structure of DCOPF. We prove that the resulting dynamic system converges to a unique equilibrium, which solves the social welfare optimization problem. Our theoretical results provide guarantees on convergence and stability using tools from nonsmooth analysis and Lyapunov theory. Numerical simulations on networked energy systems illustrate the effectiveness and robustness of the proposed scheme.
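
A minimal sketch of the two-timescale idea in the smooth, single-price case may help. The update rule, the quadratic user costs, and all constants (`A`, `D`, `B`, `step`) are assumptions made for illustration; the actual update rule and the nonsmooth DCOPF-based cost of the preprint are not reproduced here. Users respond instantly to a fixed price, and the operator then nudges the price toward its own marginal cost.

```python
# Hypothetical users: cost_i(x) = 0.5 * a_i * (x - d_i)**2 + price * x
A = [1.0, 2.0]   # curvature of each user's discomfort term
D = [4.0, 6.0]   # each user's preferred demand
B = 0.5          # assumed smooth system cost J(X) = 0.5 * B * X**2


def user_response(price):
    """Fast timescale: each user minimises its own cost at a fixed price."""
    return [d - price / a for a, d in zip(A, D)]


def marginal_system_cost(demands):
    """Derivative of J with respect to the aggregate demand."""
    return B * sum(demands)


def run_pricing(price=0.0, step=0.5, iterations=50):
    """Slow timescale: move the price toward the operator's marginal cost."""
    for _ in range(iterations):
        demands = user_response(price)
        price += step * (marginal_system_cost(demands) - price)
    return price, user_response(price)


price, demands = run_pricing()
```

For this toy setup the fixed point can be checked by hand: the price satisfies `price = B * (sum(D) - price * sum(1 / a))`, which gives `5 / 1.75`. At that point the price seen by users equals the operator's marginal cost, which is the alignment the excerpt describes.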
... Based on the technique, a nonschedulable load forecasting model was developed to enhance the effectiveness and performance of the energy management system [8]. The authors in [9] used long short-term memory networks and reinforcement learning to solve the pricing problem, which sped up the process of setting the optimal price. A fuzzy reinforcement encoder adversarial neural network was used to categorize the tracked energy data [10], whereas data mining methods, such as stationary data stream mining, could be used to improve situational awareness in the smart grid [11]. ...
... Forecasting the sequence of energy consumption [8]; learning the relationship between customers' behavior and incentive pricing [9]; evaluation of the operating state of smart meters [12]; meter operation error detection [24]. Clustering: a technique that groups items into different partitions according to their distance from the cluster centroids. This method is flexible in application. ...
... Determining the optimal price [9]; categorising the tracked data [10]. Stream mining algorithms: a technique for extracting knowledge structures from continuous and rapid data sets. It has strong real-time processing capabilities, but the algorithm's decision-making process is difficult to explain. ...
Article
The smart meter is a key component of the energy grid. To detect abnormal smart meters in a topological low-voltage energy system, this paper proposes orthogonal matching pursuit and Bollinger Bands followed by a recursive model. The model consists of two major components: a data filter and an error estimation module. The data filter employs orthogonal matching pursuit and Bollinger Bands to identify abnormal data in the meter reading data set. Data that differ significantly from the orthogonal matching pursuit recovery estimate and the Bollinger Bands are classified as abnormal and removed from the data set. A recursive model is then used to calculate the meter error, which is obtained by solving a linear equation constructed from the meter reading data. The experimental results present the performance under different scenarios. When the number of submeters is less than 100 and the line loss rate in the system is less than 8%, the accuracy of error estimation is higher than 90%. Overall, the proposed error estimation method provides a new way to check a set of smart meters by estimating the meter error. In the experimental results, the average absolute error and root mean square error obtained by our method are 1.30% and 1.09%, respectively, which are the lowest values among the compared classical methods. This suggests that our method has a distinct advantage in practicability and efficiency over traditional on-site inspection and machine learning techniques.
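
The Bollinger Band part of the data filter can be sketched as follows. This is only an illustration under assumptions: the window length, the band width `k`, and the sample readings are hypothetical, the band is computed from the preceding window, and the orthogonal matching pursuit step and the recursive error model are not shown.

```python
from statistics import mean, pstdev


def bollinger_outliers(readings, window=5, k=2.0):
    """Return indices of readings outside mean +/- k * std of the
    preceding `window` readings (a simple Bollinger Band test)."""
    flagged = []
    for i in range(window, len(readings)):
        past = readings[i - window:i]
        middle = mean(past)
        band = k * pstdev(past)
        if abs(readings[i] - middle) > band:
            flagged.append(i)
    return flagged


def drop_outliers(readings, window=5, k=2.0):
    """Remove the flagged readings from the data set."""
    bad = set(bollinger_outliers(readings, window, k))
    return [r for i, r in enumerate(readings) if i not in bad]


# Hypothetical meter readings with one abnormal spike at index 6.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.9, 10.2]
```

Here `bollinger_outliers(readings)` flags only index 6. One design caveat of this simple variant: once a spike enters the window it inflates the standard deviation, so the bands are temporarily wide for the next few readings.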
... However, the DSO and MGs are owned by different organizations, therefore the DSO usually has no authority to directly command the MGs. An effective way is to motivate the MGs through pricing signals [3], while the real-time prices could be applied to handle the high uncertainty of loads and renewable energy sources [4]. Moreover, the MGs usually require a reference price sequence for a long time horizon (e.g. ...
... In recent years, RL has been successfully applied to power systems [8]. Online pricing algorithms of demand response and for MGs based on RL are developed in [4], [5]. However, in this paper, the DSO agent decides the price sequence for a much longer time horizon at each time instant. ...
... To solve the pricing problem, an RL algorithm is applied to the dynamic pricing and energy consumption scheduling problem in [38]. An RL-based method for online pricing of demand response is developed in [4]. An RL-based game-theoretic approach is developed to solve the pricing problem for networked MGs in [39]. ...
Article
Coordinating the microgrids (MGs) in the distribution network is a critical task for the distribution system operator (DSO), which could be achieved by setting prices as incentive signals. The high uncertainty of loads and renewable resources motivates the DSO to adopt real-time prices. The MGs require reference price sequences for a long time horizon in advance to make generation plans. However, due to privacy concerns in practice, the MGs may not provide adequate information for the DSO to build a closed-form model. This poses challenges for the implementation of conventional model-based methods. In this paper, a framework for a coordination system based on real-time prices is proposed. In this bi-level framework, the DSO sets real-time reference price sequences as the incentive signals, based on which the MGs make their generation and charging plans. Model-free reinforcement learning (RL) is applied to optimize the pricing policy when the response behavior of the MGs is unknown to the DSO. To deal with the large action space of this problem, a reference policy is incorporated into the RL algorithm to improve efficiency. The numerical results show that the minimized cost obtained by the developed model-free RL algorithm is close to that of the model-based method while the private information is preserved.
... In recent years, a novel approach using artificial intelligence algorithms [13][14][15] has been introduced for DR potential assessment. Grounded in the long short-term memory neural network, a study [14] introduces a methodology for predicting the adjustable potential of DR, hinging on the elastic incentive price. In [15], a combination of reinforcement learning and the regional randomization method is employed to assess the potential of electric water heaters. ...
Article
Demand response (DR) can ensure electricity supply security by shifting or shedding loads, which plays an important role in a power system with a high proportion of renewable energy sources. Industrial loads are vital participants in DR, but it is difficult to assess their DR potential because of many complex factors. In this paper, a new method based on fuzzy control is given to assess the DR potential of industrial loads. A complete assessment framework including four steps is presented. Firstly, the industrial load data are preprocessed to mitigate the influence of noise and transmission losses, and then the K-means algorithm considering the optimal cluster number is used to calculate the baseline load of industrial loads. Subsequently, an open-loop fuzzy controller is designed to predict the response factor of different industrial loads. Three strongly correlated indicators, namely peak load rate, electricity intensity, and load flexibility, are selected as the inputs of the fuzzy controller, which represent response willingness. Finally, the baseline load of diverse clustering scenarios and the response factor are used to calculate the DR potential of different industrial loads. The proposed method takes into account both economic and technical factors comprehensively, and thus the results better represent the available DR potential in real-world situations. To demonstrate the effectiveness of the proposed method, the case of a medium-sized city in China is studied. The simulation focuses on the top eight industrial types, and the results show they can contribute about 189 MW of available DR potential.
... However, because of the complexity of the relationships between the impacting variables, the mathematical model is unable to fully capture these relationships. In the context of the data-driven assessment methodology, many scholars have studied advanced algorithms to predict user response potential, such as Shi R. et al. [25] constructing a support vector machine model, Kong X. et al. [26] constructing a mixed density network model, Shirsat A. et al. [27] and Kong, X. et al. [28] both constructing neural network algorithms, and Zhang Y. et al. [29] proposing a distributed modeling method based on the fully distributed alternating direction multiplier method. They have different choices of input variables for the model. ...
... Therefore, this article takes the PCPUO into account based on six high-frequency CVs and ultimately selects seven CVs as candidate CVs. According to the DR implementation plan released by China and the related literature, the majority of the literature takes into account the subsidy price factor [28,29,31]. As a result, this article will prioritize the consideration of subsidy price, while other influencing factors will be determined through the RF feature selection algorithm. ...
Article
In China, the inversion between peak periods of wind and photovoltaic (PV) power (WPVP) generation and peak periods of electricity demand leads to a mismatch between electricity demand and supply, resulting in a significant loss of WPVP. In this context, this article proposes an improved demand response (DR) strategy to enhance the consumption of WPVP. Firstly, we use feature selection methods to screen variables related to response quantity and, based on the results, establish a response potential prediction model using the random forest algorithm. Then, we design a subsidy price update formula and subsidy price constraints that consider user response characteristics, and predict the response potential of users under differentiated subsidy prices. Subsequently, after multiple iterations of the price update formula, the final subsidy and response potential of each user can be determined. Finally, we establish a user ranking sequence based on response potential. The case analysis shows that the differentiated price strategy and the response potential prediction model can address the shortcomings of existing DR strategies, enabling users to declare response quantities more reasonably and the grid to formulate subsidy prices more fairly. Through the improved DR strategy, the consumption rate of WPVP has increased by 12%.
... The accuracy of the electricity load forecasting has a significant role in the power system. Two main approaches in forecasting are statistical models and machine learning models [4][5][6][7][8][9][10]. Researchers are interested in providing comprehensive models to achieve better performance than the numerous existing models. ...
... RL solves decision-making problems, while LSTM is used for forecasting problems. These two models were combined by Kong et al. [7] to solve the online electricity pricing problem. The problem there is to adjust the customer's electricity usage. ...
Article
Electricity load forecasting is an essential operation of the power system, and deep learning is used to improve its accuracy. This study proposes combining long short-term memory (LSTM) and reinforcement learning to build on the advantages of each single approach for forecasting. Important input features, including the mutual feature of electricity load, are used to increase accuracy. First, LSTM can handle multiple time-series inputs, and adding features that support the load feature makes the model more efficient. Because the LSTM model is quite complex, choosing a good set of hyperparameters is difficult. Reinforcement learning is therefore used to optimize the hyperparameters of the LSTM model. The proposed model is applied to two electricity load data sets: real-life data from Vietnam Electricity and a public data set. In one-day-ahead forecasting, the proposed model achieves better performance than the benchmark.
... The application of RL in power systems is reviewed in [19], [20], and it has been used as a promising alternative to solve the decision-making optimization problem in electricity markets [14]. A conventional RL algorithm, Q-learning, has been employed to solve the optimal pricing or incentive rating problem [21]-[23]. Q-learning is realized by look-up tables and therefore suffers severely from the curse of dimensionality when solving large-scale optimization problems [24]. ...
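
To make the look-up-table remark concrete, here is a minimal tabular Q-learning sketch for a toy pricing task. Everything in it is hypothetical (the price levels, the per-slot values in `VALUE`, the deterministic reward, and the hyperparameters); it is not taken from any of the cited works. The state is the time slot and the action is a discrete price level, so the table has one entry per (slot, price) pair.

```python
import random

PRICES = [1, 2, 3, 4, 5]   # hypothetical discrete price levels
SLOTS = 3                  # hypothetical time slots per episode
VALUE = [6, 8, 4]          # hypothetical value of one unit of response per slot


def reward(slot, price):
    """Deterministic toy profit: customers reduce 2 * price units."""
    return (VALUE[slot] - price) * 2 * price


def q_learning(episodes=5000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The look-up table: one entry per (state, action) pair.
    q = [[0.0] * len(PRICES) for _ in range(SLOTS)]
    for _ in range(episodes):
        for slot in range(SLOTS):
            if rng.random() < epsilon:           # explore
                a = rng.randrange(len(PRICES))
            else:                                # exploit the current table
                a = max(range(len(PRICES)), key=lambda i: q[slot][i])
            # Bootstrap from the next slot; the last slot is terminal.
            future = max(q[slot + 1]) if slot + 1 < SLOTS else 0.0
            target = reward(slot, PRICES[a]) + gamma * future
            q[slot][a] += alpha * (target - q[slot][a])
    return q


def greedy_prices(q):
    """Price with the highest Q-value in each slot."""
    return [PRICES[max(range(len(PRICES)), key=lambda i: row[i])] for row in q]
```

The table here has only `SLOTS * len(PRICES) = 15` entries. If the action were instead the whole day's price vector, a single state would already need `len(PRICES) ** SLOTS = 125` columns, and the count grows exponentially with the number of slots, which is the dimensionality problem the excerpt refers to.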
... However, some existing works, e.g. [16], [17], [22], [23], and [26], ignore the time series information in the environmental state. Even though the authors in [13] introduced such time series information in the environmental state, they did not design an effective representation network to deal with this important information. ...
Article
The joint optimization problem of energy procurement and retail pricing for an electricity retailer is converted into separately determining the optimal procurement strategy and optimal pricing strategy, under the "price-taker" assumption. The aggregate energy consumption of end use customers (EUCs) is predicted to solve for the optimal procurement strategy via a long short-term memory (LSTM)-based supervised learning method. The optimal retail pricing problem is formulated as a Markov decision process (MDP), which can be solved by using deep reinforcement learning (DRL) algorithms. However, the performance of existing DRL approaches may deteriorate due to their insufficient ability to extract discriminative features from the time-series vectors in the environmental states. We propose a novel deep deterministic policy gradient (DDPG) network structure with a shared LSTM-based representation network that fully exploits the Actor's and Critic's losses. The designed shared representation network and the joint loss function can enhance the environment perception capability of the proposed approach and further improve the optimization performance, resulting in a more profitable pricing strategy. Numerical simulations demonstrate the effectiveness of the proposed approach.
... Strategic behavior on the consumer side is also studied by Kong et al. (2020), utilizing long short-term memory networks (LSTMs) to predict a customer's response in an RL market simulation framework. In their conclusion, they highlight that the blind exploration of RL is not sufficient to learn daily changes in their practical model of load reduction. ...
Article
Revenue management (RM) plays a vital role in optimizing sales processes in real-life applications under incomplete information. The prediction of consumer demand and the anticipation of competitors' price reactions have become key factors in RM for applying classical dynamic programming (DP) methods to expected long-term reward maximization. Modern model-free deep Reinforcement Learning (RL) approaches are able to derive optimized policies without explicit estimations of underlying model dynamics. However, RL algorithms typically require either vast amounts of training data or a suitable synthetic model to be trained on. As existing studies focus on one group of algorithms only, the relation between established DP approaches and new RL techniques is opaque. To address this issue, in this paper, we use a dynamic pricing framework for an airline ticket market to compare state-of-the-art RL algorithms with data-driven versions of classic DP methods in terms of (i) performance and (ii) required data. For the DP techniques, we use estimations of market dynamics to be able to compare their performance and data consumption against RL methods. The numerical results of our experiments, which include monopoly as well as duopoly markets, allow us to study how the different approaches’ performances relate to each other in exemplary settings. In both setups, we find that with little data (about 10 episodes) fitted DP methods were highly competitive; with medium amounts of data (about 100 episodes) DP methods were outperformed by RL, where PPO provided the best results. Given large amounts of training data (about 1000 episodes), the best RL algorithms, i.e., TD3, DDPG, PPO, and SAC, performed similarly, achieving about 90% and more of the optimal solution.
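
The data-driven DP side of this comparison can be sketched with backward induction over time and remaining inventory. The setup below is a hypothetical monopoly toy: `BUY_PROB` plays the role of an estimated demand model, one customer arrives per period, and the price grid is made up. It is not the market model used in the study.

```python
# Hypothetical estimated demand model: probability that the arriving
# customer buys in one period at each price.
BUY_PROB = {50: 0.8, 100: 0.5, 150: 0.2}


def solve_dp(periods, seats):
    """Backward induction for expected revenue with limited inventory.

    value[t][c] is the best expected revenue from period t onward with
    c seats left; policy[t][c] is the price that achieves it.
    """
    value = [[0.0] * (seats + 1) for _ in range(periods + 1)]
    policy = [[None] * (seats + 1) for _ in range(periods)]
    for t in range(periods - 1, -1, -1):
        for c in range(1, seats + 1):
            best_price, best_value = None, float("-inf")
            for price, q in BUY_PROB.items():
                # Sell with probability q, otherwise keep the seat.
                expected = (q * (price + value[t + 1][c - 1])
                            + (1 - q) * value[t + 1][c])
                if expected > best_value:
                    best_price, best_value = price, expected
            value[t][c] = best_value
            policy[t][c] = best_price
    return value, policy


value, policy = solve_dp(periods=3, seats=2)
```

In the last period with one seat left, the best choice is simply the price maximising `price * probability`, which is 100 under these assumed probabilities. The quality of such a policy depends entirely on how well the demand estimate matches the real market, which is where the data requirement of fitted DP methods comes from.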
... Simulation results demonstrated that this approach could use price to maximize charging facility utilization while ensuring an appropriate quality of service. Kong et al. (2020) combined long short-term memory (LSTM) networks with RL in a smart grid pricing problem. LSTM was able to learn the customer behaviour response, speeding up the price optimization and facilitating effective price exploration. ...
Article
We consider how to facilitate a dynamically-priced premium service option that enables customer parties to shorten their wait in a queue. Offering such a service requires that some of a business’s capacity be reserved continuously and kept ready for premium customers. In tandem with capacity reservation, pricing must be coordinated. Hence, a joint dynamic pricing and capacity allocation problem lies at the heart of this service. We propose a conceptual solution architecture and employ Proximal Policy Optimization (PPO) for dynamic pricing and capacity allocation to maximize total revenue. Simulation experiments over multiple scenarios compared PPO against a human-engineered policy and a baseline policy having no premium option. The human-engineered policy led to significantly greater revenues than the baseline policy in each scenario, illustrating the potential increase in revenues afforded by this concept. The PPO agent substantially improved upon the human-engineered policy advantage, with improvements ranging from 28% to 161%.
... Compared to the model-based methods, model-free deep reinforcement learning (DRL) can perceive high-dimensional states through deep neural networks (DNNs) and can make near-optimal decisions without relying on prior knowledge and under conditions of incomplete environmental information [13], [14]. In [15], the optimal price of an incentive-based DR (IBDR) program is solved by a DRL-based method with long short-term memory (LSTM) networks. In [16], the deep deterministic policy gradient (DDPG) algorithm is applied to solve the dynamic pricing strategy of an electric vehicle aggregator (EVA) in the DR market and spot market. ...
Article
The increasing penetration of renewable energy sources poses significant challenges for modern power systems, particularly in supply-demand balance and peak regulation. Load aggregators (LAs) play a crucial role by integrating small to medium-sized loads and coordinating demand response (DR). This study introduces a joint bidding and incentivizing model for a price-maker LA participating in a peak-regulation ancillary service market (PRM) and developing an incentive-based demand response (IBDR), where the LA’s objective is to maximize its long-term cumulative payoff. In order to solve this complex joint decision-making optimization problem more effectively and efficiently, a model-free multi-task multi-agent deep reinforcement learning-based (MTMA-DRL-based) method incorporating a shared, centralized prioritized experience replay buffer (PERB) is proposed. Case studies in real-world settings confirm that the proposed model effectively captures the interdependence between bidding price, bidding quantity, and incentive price decisions. The proposed MTMA-DRL-based method is also proven to outperform existing methods.
... Reference [18] considered uncertainties in load demand, spot prices, electricity user privacy, and line flow and proposed an optimization algorithm for residential user demand response using deep reinforcement learning. Reference [19] proposed an optimization solution that combines long short-term memory networks and reinforcement learning algorithms to design demand response incentive strategies for electricity retailers when user response behavior is unknown. Reference [20] introduced an incentive-based real-time compensation setting strategy using deep learning and reinforcement learning algorithms to improve the operational security and economics of the power system. ...
Article
An integrated zero-carbon power plant aggregates uncontrollable green energy, adjustable load, and storage energy resources into an entity in a grid-friendly manner. Integrated zero-carbon power plants have a strong demand response potential that needs further study. However, existing studies ignore the green value of renewable energy in power plants when participating in demand response programs. This paper proposed a mathematical model to optimize the operation of an integrated zero-carbon power plant considering the green value. A demand response mechanism is proposed for the independent system operator and the integrated zero-carbon power plants. The Stackelberg gaming process among these entities and an algorithm based on dichotomy are studied to find the demand response equilibrium. Case studies verify that the mechanism activates the potential of the integrated zero-carbon power plant to realize the load reduction target.
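
A dichotomy (bisection) search of the kind mentioned above can be sketched for the leader's side of such a game. The response curve `load_reduction`, the price bounds, and the target are hypothetical, and the abstract does not specify the actual game model, so this only shows the search step: find the incentive price at which the follower's load reduction meets a target, assuming the reduction increases monotonically with price.

```python
def load_reduction(price):
    """Hypothetical plant response: increasing and saturating in price."""
    return 20.0 * price / (1.0 + price)


def find_price(target, low=0.0, high=10.0, tol=1e-6):
    """Bisection on a monotone response curve."""
    if not load_reduction(low) <= target <= load_reduction(high):
        raise ValueError("target is outside the achievable range")
    while high - low > tol:
        mid = (low + high) / 2.0
        if load_reduction(mid) < target:
            low = mid      # reduction too small: raise the price
        else:
            high = mid     # reduction sufficient: try a lower price
    return (low + high) / 2.0


price = find_price(15.0)
```

For the assumed curve, a target of 15 is met at a price of 3, since `20 * 3 / (1 + 3) = 15`. Bisection needs only evaluations of the follower's response, not its closed form, which suits settings where the leader can query but not inspect the follower's optimisation.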
... Research suggests using price incentives and deep learning algorithms to improve load profiles and consumer response depths [13,14]. Techniques like recurrent neural networks and resilient distributed controllers mitigate risks like fake data injection attacks [15][16][17]. Price-based DSR, the participation of retailers with energy storage systems, and innovative schemes like coupon-based response have been explored as well [18][19][20]. Additionally, incorporating consumer psychology and blockchain technology offers promising avenues for advancing DSR programs, ensuring their effectiveness and sustainability [21,22]. ...
Article
With the energy structure transition and the development of the green power market, the role of electricity retailers in multi-market green power trading has become more and more important. Particularly in China, where aggressive green energy policies and rapid market transformations provide a distinct context for such studies, the challenges are pronounced. Under demand-side response, electricity retailers face the uncertainty of users’ electricity consumption and incentives, which complicates decision-making processes. The purpose of this paper is to explore the optimization decision-making problem of multi-market green power trading for electricity retailers under demand-side response, with a special focus on the Chinese market due to its leadership in implementing substantial green energy initiatives and its potential to set precedents for global practices. We first construct a two-party benefit optimization model, which comprehensively considers the profit objectives for electricity retailers and utility maximization for users. Then, the model is solved by the Lagrange multiplier method and distributed subgradient algorithm to obtain the optimal solution. Finally, the effectiveness of the incentive optimization strategy under the multi-market to promote green power consumption and improve the profit of electricity retailers is verified by arithmetic simulation. The results of this study show that the incentive optimization strategy under multi-market, particularly within the Chinese context, is expected to provide a reference for electricity retailers to develop more flexible and effective trading strategies in the green power market and to contribute to the process of promoting green power consumption globally.
... Multiple fields, such as energy markets, insurance industries, and sponsored search auctions, regard pricing problems as a series of decision-making problems [25]-[29]. The trading process can be modeled as MDPs or constrained MDPs [27], [28], and various algorithms such as Q-learning [25], deep deterministic policy gradient (DDPG) [27], and long short-term memory (LSTM) [26] have been introduced to support the construction of pricing mechanisms. Many numerical simulations and experimental validations demonstrate that RL-based mechanisms show high performance and robustness in solving complex pricing problems, achieving fairness and maximum revenue by setting proper objectives [25], [28]. ...
... RL has the advantage of requiring no prior knowledge of the system dynamics and can be adopted in a model-free manner allowing for easier implementation in a practical setting compared to conventional optimization approaches. Moreover, deep reinforcement learning (DRL), which combines the function approximation abilities of deep learning with RL, has been successfully employed in developing several demand response programs [16][17][18][19]. DRL can help realize multiple control objectives, which can be exploited to perform joint operations like maintaining thermal comfort in buildings while reducing power consumption with demand response [20]. ...
Preprint
Data trading has been hindered by privacy concerns associated with user-owned data and the infinite reproducibility of data, making it challenging for data owners to retain exclusive rights over their data once it has been disclosed. Traditional data pricing models relied on uniform pricing or subscription-based models. However, with the development of Privacy-Preserving Computing techniques, the market can now protect the privacy and complete transactions using progressively disclosed information, which creates a technical foundation for generating greater social welfare through data usage. In this study, we propose a novel approach to modeling multi-round data trading with progressively disclosed information using a matchmaking-based Markov Decision Process (MDP) and introduce a Social Welfare-optimized Data Pricing Mechanism (SWDPM) to find optimal pricing strategies. To the best of our knowledge, this is the first study to model multi-round data trading with progressively disclosed information. Numerical experiments demonstrate that the SWDPM can increase social welfare 3 times by up to 54% in trading feasibility, 43% in trading efficiency, and 25% in trading fairness by encouraging better matching of demand and price negotiation among traders.
... For instance, the authors in [37], [38] performed optimal energy management catering to cost and discomfort minimization in SPG. The authors of [39]-[41] developed deep learning models for online pricing of DR, smart home energy management, and commercial building optimization, respectively. Similarly, several studies simultaneously addressed energy cost and PADR by optimal power scheduling in SPG [42]-[44]. ...
Article
Full-text available
Demand response (DR) shaves peak energy consumption and drives energy conservation to ensure reliable operation of the power grid. With the emergence of the smart power grid (SPG), DR has become increasingly popular and contributes strongly to energy optimization. In this work, DR is adopted for scheduling home appliances to reduce utility bill payment, peak to average demand ratio (PADR), and discomfort. First, home appliances are classified into two categories according to time and power flexibility: time-flexible and power-flexible. Second, the demand-side users' power usage scheduling problem is modelled according to user priority and modes of operation, considering demand and supply. Finally, the energy consumption scheduler (ECS) is developed to adjust the time and power of both types of appliances under different operation modes to acquire the desired tradeoff between utility bill payment and discomfort, and between PADR and discomfort. Simulation results show that employing the proposed ECS benefits demand-side users by minimizing their utility bill payment and PADR, and by achieving the desired tradeoff between utility bill payment and discomfort, and between PADR and discomfort. Results illustrate that the developed ECS reduced utility bill payment and alleviated PADR by 28% and 21%, respectively, without compromising comfort, compared to the case without scheduling.
... Inspired by the MPC literature where forecasts are crucial, RNNs were used to provide predictions of various exogenous variables [26]- [29] as additional information to the RL controllers. Another idea consists in using an RNN as (or at least as part of) a world model, that is, a data-driven simulation environment in which policies can be trained using the usual RL algorithms [30]- [32]. ...
Article
Full-text available
With increasing electricity prices, cost savings through load shifting are becoming increasingly important for energy end users. While dynamic pricing encourages customers to shift demand to low price periods, the non-stationary and highly volatile nature of electricity prices poses a significant challenge to energy management systems. In this paper, we investigate the flexibility potential of data centres by optimising heating, ventilation, and air conditioning systems with a general model-free reinforcement learning approach. Since the soft actor-critic algorithm with feed-forward networks did not work satisfactorily in this scenario, we propose instead a parameterisation with a recurrent neural network architecture to successfully handle spot-market price data. The past is encoded into a hidden state, which provides a way to learn the temporal dependencies in the observations and highly volatile rewards. The proposed method is then evaluated in experiments on a simulated data centre. Considering real temperature and price signals over multiple years, the results show a cost reduction compared to a proportional, integral and derivative controller while maintaining the temperature of the data centre within the desired operating ranges. In this context, this work demonstrates an innovative and applicable reinforcement learning approach that incorporates complex economic objectives into agent decision-making. The proposed control method can be integrated into various Internet of things-based smart building solutions for energy management.
... We have deployed the following strategies: Dutch and Vickrey auctions. We ran simulations with 2, 4, 6, 8, 10, 12, 14, and 20 bidders. The number of auction rounds for Dutch was limited to ten, with each round having a one-minute timeout. ...
Article
Full-text available
In the smart grid, electricity price is a key element for all participants in the electric power industry. To meet the smart grid’s various goals, Demand-Response (DR) control aims to change the electricity consumption behavior of consumers based on dynamic pricing or financial benefits. DR methods are divided into centralized and distributed control based on the communication model. In centralized control, consumers communicate directly with the power company, without communicating among themselves. In distributed control, consumer interactions offer data to the power utility about overall consumption. Online auctions are distributed systems with several software agents working on behalf of human buyers and sellers. The coordination model chosen can have a substantial impact on the performance of these software agents. Based on the fair energy scheduling method, we examined Vickrey and Dutch auctions and coordination models in an electronic marketplace both analytically and empirically. The number of software agents and the number of messages exchanged between these agents were all essential indicators. For the simulation, GridSim was used, as it is an open-source software platform that includes capabilities for application composition, resource discovery information services, and interfaces for assigning applications to resources. We concluded that Dutch auctions are better than Vickrey auctions in a supply-driven world where there is an abundance of power. In terms of equity, Dutch auctions are more equitable than Vickrey auctions. This is because Dutch auctions allow all bidders to compete on an equal footing, with each bidder having the same opportunity to win the item at the lowest possible price. In contrast, Vickrey auctions can lead to outcomes that favor certain bidders over others, as bidders may submit bids that are higher than necessary to increase their chances of winning.
... Limitations in the RL and DR studies are highlighted here, including methods for comparing methodologies and categorizing algorithms and their benefits. In [57], the method is framed under the scenario in which the long-term response of consumers is unknown, thus, the author proposes an online pricing method, where long short-term memory (LSTM) networks are combined with a reinforcement learning approach to perform the virtual exploration. In addition, LSTM networks are used to predict the response of the consumer, and through reinforcement learning, the response of the consumer is framed to find the best prices to maximize total benefit and avoid the adverse effects of myopic optimization on RL. ...
Article
Full-text available
International agreements support the modernization of electricity networks and renewable energy resources (RES). However, these RES affect market prices due to resource variability (e.g., solar). Among the alternatives, Demand Response (DR) is presented as a tool to improve the balance between electricity supply and demand by adapting consumption to available production. In this sense, this work focuses on developing a DR model that combines price and incentive-based demand response models (P-B and I-B) to efficiently manage consumer demand with data from a real San Juan—Argentina distribution network. In addition, a price scheme is proposed in real time and by the time of use in relation to the consumers’ influence in the peak demand of the system. The proposed schemes increase load factor and improve demand displacement compared to a demand response reference model. In addition, the proposed reinforcement learning model improves short-term and long-term price search. Finally, a description and formulation of the market where the work was implemented is presented.
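The citing excerpt above describes [57] as pairing a learned response model with reinforcement learning so that prices can be explored virtually instead of on real customers. A minimal Python sketch of that idea follows, under loudly stated assumptions: `surrogate_response` is a hand-written stand-in for a trained LSTM, `VALUE_PER_KWH`, `PRICE_GRID`, and the epsilon-greedy bandit learner are all hypothetical choices made for illustration, and only a single time slot is priced, whereas the cited work optimizes the profit of a whole day.

```python
import math
import random

VALUE_PER_KWH = 6.0                         # assumed value of one curtailed kWh to the provider
PRICE_GRID = [0.5 * k for k in range(13)]   # assumed candidate incentive prices 0.0 .. 6.0


def surrogate_response(price):
    """Hypothetical stand-in for a trained response model (kWh curtailed at a price)."""
    return 10.0 * (1.0 - math.exp(-0.5 * price))


def virtual_profit(price):
    """Provider profit predicted by the surrogate; no real customer is queried."""
    return (VALUE_PER_KWH - price) * surrogate_response(price)


def explore_virtually(trials=2000, epsilon=0.2, seed=0):
    """Epsilon-greedy bandit over the price grid, rewarded only by the surrogate."""
    rng = random.Random(seed)
    q = [0.0] * len(PRICE_GRID)      # running mean reward per candidate price
    pulls = [0] * len(PRICE_GRID)
    for _ in range(trials):
        if rng.random() < epsilon:
            a = rng.randrange(len(PRICE_GRID))                   # explore
        else:
            a = max(range(len(PRICE_GRID)), key=q.__getitem__)   # exploit
        reward = virtual_profit(PRICE_GRID[a])
        pulls[a] += 1
        q[a] += (reward - q[a]) / pulls[a]                       # incremental mean
    best = max(range(len(PRICE_GRID)), key=q.__getitem__)
    return PRICE_GRID[best], q
```

Because every trial queries the surrogate, exploration costs no real incentive payments; in practice the surrogate would be retrained as real responses arrive, which this sketch leaves out.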
... The author also sheds light on the problems associated with DR implementations across United States of America, China and developed cities of Europe. Reference [4] implements a pricing mechanism that combines long short-term memory (LSTM) models and RL to eradicate the pricing problem of service providers when the consumers' response behavior is not known. In [5], an incentive based DR program with deep learning and RL is proposed, whereas in [6], an hour ahead DR algorithm for home energy management system (HEMS) is implemented. ...
Article
Full-text available
The demand for energy around the world continues to increase at a very high rate. To sufficiently supply this high demand, it is imperative to employ efficient methods so that the total costs for fulfilling such high demand in energy are minimized. To achieve this ambitious goal, this paper proposes a multi-agent reinforcement learning system for time of use pricing based combined demand response and voltage control. For this purpose, a long short term memory network is employed for day-ahead load forecasting in order to remove future uncertainties. The Q-learning algorithm is used which is a model free algorithm and hence, doesn’t require the agent(s) to have prior knowledge of the environment. The role of reinforcement learning in this work is very important since it allows the agent(s) to determine their respective optimal behavior(s) autonomously without explicit training by the end user. To allow effective cooperation among multiple agents, each household is controlled by its own agent, whereas all the household agents are directed by a master agent or service provider. Accordingly, the voltage control agent serves the purpose of checking voltage level violations in the system and removing them through optimal decision making. The proposed system yields very good results, whereby, not only is the overall cost of electricity reduced, but voltage level violations are also removed from the entire system. The implementation of this mechanism reduces the total average aggregated load demand from 5.23 kW to 3.86 kW, while reducing the total aggregated average cost from 94.01 Rs to 60.80 Rs, thanks to the proposed effective multi-agent based system.
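The abstract above relies on Q-learning being model-free: the agent improves its value estimates from observed rewards alone. The following is a minimal tabular sketch of that update rule, not the authors' multi-agent system. The toy environment (one deferrable load, three time slots, the `PRICES` values) and all hyperparameters are assumptions made for illustration.

```python
import random

PRICES = [5.0, 1.0, 3.0]  # assumed time-of-use prices for three slots


def step(slot, action):
    """Toy environment: action 1 runs the load now, action 0 defers it by one slot.
    The load is forced to run in the last slot. Returns (next_slot, reward, done)."""
    if action == 1 or slot == len(PRICES) - 1:
        return slot, -PRICES[slot], True
    return slot + 1, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in PRICES]          # q[slot][action]
    for _ in range(episodes):
        slot, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)      # explore
            else:
                action = 0 if q[slot][0] >= q[slot][1] else 1   # exploit
            nxt, reward, done = step(slot, action)
            # Bootstrapped target: no transition or reward model is needed.
            target = reward if done else reward + gamma * max(q[nxt])
            q[slot][action] += alpha * (target - q[slot][action])
            slot = nxt
    return q
```

In this toy setting the learned values should favour deferring in the expensive first slot and running in the cheap second slot, which is the behaviour a time-of-use tariff is meant to induce.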
Article
Both multi-time slot (day-ahead) pricing and single-time slot (real-time) pricing are vital parts of real-time pricing. The widespread utilization of renewable energy sources has increased the flexibility and uncertainty of the grid system. A single pricing strategy is unable to meet the demand of the grid. This paper designs a hybrid pricing strategy for smart grids that combines real-time and day-ahead pricing. This strategy considers multi-source energy generation on the supply side, as well as distributed energy generation and load transfer on the demand side. Within the framework of the Markov Decision Process, a bi-level stochastic model for real-time demand response is formulated to maximise the benefits of both the supply and demand sides. Subsequently, a deep deterministic policy gradient algorithm relying on prioritized experience replay is used to formulate a real-time price plan and user's power consumption. Through the information interaction between the upper and lower levels, the real-time prices are decided adaptively. Meanwhile, the optimal strategies of power supply and consumption are obtained. Our simulation results demonstrate that the proposed hybrid pricing strategy guarantees the benefits of both the supply and demand sides, while achieving the balance between supply and demand. Keywords: Hybrid pricing; Bi-level programming; Reinforcement learning; Per-DDPG; Demand response
Article
In the context of user-side demand response, flexible resources in buildings such as air conditioners and electric vehicles are characterized by small individual capacities, large aggregate scales, and geographically dispersed distributions, necessitating integration by intelligence buildings (IRs). However, the optimization scheduling of IR clusters often involves detailed energy consumption data, posing privacy issues such as revealing household routines. The traditional aggregator-IRs bi-level architecture typically employs centralized or game-theoretic strategies for optimization scheduling, which struggle to balance efficiency and privacy simultaneously. To address this issue, this paper proposes a bi-level optimization scheduling strategy that balances efficiency and privacy. First, deep reinforcement learning models are established for both the aggregator and the IRs to address efficiency. Then, the trained demand response models of the IRs are encapsulated into strategy black boxes and uploaded to the aggregator’s deep reinforcement learning model. Throughout this process, the aggregator remains unaware of the user-side data, thus protecting user privacy. Additionally, considering that training IR strategy black box models is a parallel and similar process, this paper introduces the paradigm of federated learning to reduce learning costs and improve training efficiency on the IRs side. Furthermore, an adaptive clustering federated deep reinforcement learning method is proposed to address the heterogeneity of the IRs. Finally, case studies demonstrate the feasibility and effectiveness of the proposed method.
Article
This paper investigates the load scheduling problem within a residential microgrid, where the microgrid operator (MO) is regarded as a trusted third-party that provides a limited information exchange for all residential customers. We consider both power exchange between the microgrid and the utility grid and local energy trading between customers, where a pricing model based on the inclining block rate (IBR) is applied to generate the local trading prices (LTPs). A privacy-preserving load scheduling method based on multi-agent deep reinforcement learning (MADRL) is proposed, which can simultaneously reduce the energy cost of the customers and the peak load of the microgrid. To preserve privacy, the proposed method adopts the centralized training with decentralized execution (CTDE) technique so that the trained agents of each customer can schedule their energy consumption only based on the local information. Finally, case studies based on real-world residential load data are presented, and the results show that the proposed method can efficiently reduce the electricity cost of customers and the difference between daily peak and valley load of the microgrid.
Article
Distributed energy resources, especially residential behind-the-meter photovoltaics (BTM PV), have been playing increasingly important roles in modern smart grids. Residential netload, which is closely tied with customers' gross load consumption and weather, is usually the only data available for the market operator in a local electricity market (LEM). This paper seeks to design customized prices for an LEM that consists of an agent, BTM PV, energy storage (ES), prosumers, and consumers. The LEM agent who owns a community-scale ES system is responsible for operating the market, determining the internal price, and facilitating the energy sharing within the community. A hierarchical energy trading infrastructure is considered, where the LEM agent acts as the mediator between the external utility grid and customers. A two-stage decision-making framework, including both look-ahead ES scheduling and real-time customized price design, is developed for the agent's profit maximization. Besides, the impacts of netload forecasting and BTM PV disaggregation are also investigated. The customer's consumption behavior is modeled as a utility maximization problem. Compared with the benchmark uniform price design, it is found that the customized pricing scheme could further improve the LEM agent's profit by 4% to 130%, depending on the weather conditions and seasonal load patterns.
Article
Integrated demand response (IDR) has been recognized as an effective approach to alleviate supply–demand imbalance in integrated energy systems (IESs). However, complex response characteristics in demand side, including substitute effect, time-varying characteristics, and uncertainties, coupled with incomplete information, have been main obstacles to predict consumer's response behavior accurately and map out effective incentive strategies. This paper proposes an improved incentive-based IDR model based on cross-elasticity theory, behavioral economics, and multi-stage rebound theory to deal with the complex characteristics, with a recursive moving window linear regression algorithm based on maximum likelihood estimation to cope with incomplete information. Our IDR model is mathematically expressed as a bi-level stochastic optimization problem, which is transformed into an equivalent nonlinear convex optimization problem to solve it efficiently. Simulation results verify merits of our model in enhancing effectiveness of incentive strategies, accuracy of consumer behavior estimation, and capability of risk management of multi-energy aggregators (MEAs), which is conducive to decreasing total cost as well as power deviation and increasing consumer's utility.
Article
Full-text available
The high level of uncertainty of renewable energy sources generation creates differences between electricity supply and demand, endangering the reliable operation of the power system. Demand response has gained significant attention as a means to cope with uncertainty of renewable energy sources. Demand response of residential and service sector consumers, when accumulated and managed by aggregators, can play a role in existing electricity markets. This paper addresses the question to what extent aggregator-mediated demand response can be used to deal with the impacts of the uncertainty of solar generation. Uncertain solar generation leads to imbalances of an aggregator. These imbalances can be reduced by shifting flexible loads, which is called demand response for internal balancing. The aim of this paper is to assess the impact of demand response from loads in residential and service sectors for internal balancing to reduce the imbalances of an aggregator, caused by uncertain solar generation. For this purpose, a Model Predictive Control model which minimizes the imbalances of the aggregator through load shifting is presented. The model is applied to a realistic case study in the Netherlands. The results show that demand response for internal balancing succeeds in reducing imbalances. Even though this is favorable from the power system's perspective, economic analysis shows that the aggregator is not financially incentivized to implement demand response for internal balancing.
Article
Full-text available
Increasing energy efficiency of thermostatically controlled loads has the potential to substantially reduce domestic energy demand. However, optimizing the efficiency of thermostatically controlled loads requires either an existing model or detailed data from sensors to learn it online. Often, neither is practical because of real-world constraints. In this paper, we demonstrate that this problem can benefit greatly from multi-agent learning and collaboration. Starting with no thermostatically controlled load specific information, the multi-agent modelling and control framework is evaluated over an entire year of operation in a large scale pilot in The Netherlands, constituting over 50 houses, resulting in energy savings of almost 200 kW h per household (or 20% of the energy required for hot water production). Theoretically, these savings can be even higher, a result also validated using simulations. In these experiments, model accuracy in the multi-agent frameworks scales linearly with the number of agents and provides compelling evidence for increased agency as an alternative to additional sensing, domain knowledge or data gathering time. In fact, multi-agent systems can accelerate learning of a thermostatically controlled load's behaviour by multiple orders of magnitude over single-agent systems, enabling active control faster. These findings hold even when learning is carried out in a distributed manner to address privacy issues arising from multi-agent cooperation.
Article
Full-text available
A new heuristic demand response technique for consumption scheduling of appliances in order to decrease peak to average ratio (PAR) of power demand is introduced. The proposed technique uses a hopping scheme to schedule the appliances with flexible schedule without the need to obtain individual consumption of appliances thereby providing a high level of consumers’ confidentiality. The proposed demand response is built on simple mathematical equations that significantly simplify advanced metering infrastructure (AMI) as well as communication requirements. In the stochastic programming, energy consumption scheduler (ECS) embedded in AMI defines appliances’ consumption vector based on the information vector that is provided in real time by network’s control center. To show the effectiveness of the proposed scheme, its performance in reducing the peak to average and energy retail price is evaluated numerically and compared to idealistic as well as practical benchmarks.
Article
Full-text available
In this paper, we consider the profit-maximizing demand response of an energy load in the real-time electricity market. An energy load has the flexibility of shifting its energy usage in time, and therefore is in a perfect position to exploit the volatile real-time market price through demand response. We show that the profit-maximizing demand response strategy can be obtained by solving a finite-horizon Markov decision process (MDP) problem, which requires extremely high computational complexity due to the continuous state and action spaces. To tackle the high computational complexity, we propose a dual approximate approach that transforms the MDP problem into a linear programming problem. Then, a row-generation based solution algorithm is proposed to solve the problem efficiently. We demonstrate through extensive simulations that the proposed method significantly reduces the computational complexity of the optimal MDP problem, while incurring marginal performance loss. More interestingly, the proposed demand response strategy also alleviates the supply-demand imbalance in the power grid, and even reduces the bills of other market participants. On average, the proposed quadratic approximation and improved row generation algorithm (QARG) increases the energy customer's profit by 55.9% and saves the bills of other utilities by 80.2% compared with the benchmark algorithms.
Article
Full-text available
This study proposes a cooperative multi-agent system for managing the energy of a stand-alone microgrid. The multi-agent system learns to control the components of the microgrid so that it achieves its purposes and operates effectively, by means of a distributed, collaborative reinforcement learning method in a continuous action-state space. Stand-alone microgrids present challenges regarding guaranteeing electricity supply and increasing the reliability of the system under the uncertainties introduced by the renewable power sources and the stochastic demand of the consumers. In this article we consider a microgrid that consists of power production, power consumption and power storage units: the power production group includes a photovoltaic source, a fuel cell and a diesel generator; the power consumption group includes an electrolyzer unit, a desalination plant and a variable electrical load that represents the power consumption of a building; the power storage group includes only the battery bank. We conjecture that a distributed multi-agent system presents specific advantages for controlling the microgrid components, which operate in a continuous state and action space. For this purpose we propose the use of fuzzy Q-Learning methods for agents representing microgrid components to act as independent learners, while sharing state variables to coordinate their behavior. Experimental results highlight both the effectiveness of individual agents in controlling system components and the effectiveness of the multi-agent system in guaranteeing electricity supply and increasing the reliability of the microgrid.
Article
Full-text available
Within a framework of assessment of demand response as an efficient flexibility resource for electric power systems, the main objective of this paper is to present an empirical methodology to obtain a full characterization of residential consumers’ flexibility in response to economic incentives. The aim of the proposed methodology is to assist a hypothetical demand response provider in the task of quantifying flexibility of a real population of consumers during a supposed trial that would precede a large-scale implementation of a demand response program. For this purpose, mere average values of predictable responsiveness do not provide meaningful information about the uncertainties associated to human behavior so a probabilistic characterization of this flexibility based on Quantile Regression (QR) is suggested. The proposed usage of QR to model individual observed flexibility provides a concise parametric representation of consumers that allows a straightforward application of classification methods to partition the sample of consumers into categories of similar flexibility. The modelling approach presented here also depicts a full picture of uncertainty and variability of the expected flexibility and enables the definition of two specific risk measures for the context of demand response that have been denominated flexibility at risk (FaR) and conditional flexibility at risk (CFaR). The application of the methodology to a case study based on a real demand response experience in Spain illustrates the potential of the method to capture the complexity and variability of consumer responsiveness. The particular case study presented here shows non-intuitive shapes in the individual conditional distribution functions of flexibility and a potential high variability between different individual flexibility profiles. 
It also demonstrates the possible decisive influence that interaction effects between socio-economic factors, such as the number of occupants, the business as usual electricity consumption and the education level of consumers, may have on demand responsiveness.
Article
Full-text available
Coordinated operation of distributed thermostatic loads such as heat pumps and air conditioners can reduce energy costs and prevent grid congestion, while maintaining room temperatures in the comfort range set by consumers. This paper furthers efforts towards enabling thermostatically controlled loads (TCLs) to participate in real-time retail electricity markets under a transactive control paradigm. An agent-based approach is used to develop an effective and low complexity demand response control scheme for TCLs. The proposed scheme adjusts aggregated thermostatic loads according to real-time grid conditions under both heating and cooling modes. A case study is presented showing the method reduces consumer electricity costs by over 10% compared to uncoordinated operation.
Keywords: Market-based control, resource allocation, smart grid, thermostatically controlled loads
Highlights:
• Market-based control for flexible loads based on transactive paradigm
• Thermostatically controlled loads with comfort range instead of set-point temperature
• Demand response with fast reaction to market price fluctuations
Article
Full-text available
This paper presents a predictive energy management strategy for a parallel hybrid electric vehicle (HEV) based on velocity prediction and reinforcement learning (RL). The design procedure starts with modeling the parallel HEV as a systematic control-oriented model and defining a cost function. Fuzzy encoding and nearest neighbor approaches are proposed to achieve velocity prediction, and a finite-state Markov chain (MC) is exploited to learn transition probabilities of power demand. To determine the optimal control behaviors and power distribution between two energy sources, a novel RL-based energy management strategy is introduced. For comparison purposes, the two velocity prediction processes are examined by RL using the same realistic driving cycle. The look-ahead energy management strategy is contrasted with shortsighted and dynamic programming (DP)-based counterparts, and further validated by hardware-in-the-loop (HIL) test. The results demonstrate that the RL-optimized control is able to significantly reduce fuel consumption and computational time.
Article
Full-text available
We study a demand response problem from operator's perspective with realistic settings, in which the operator faces uncertainty and limited communication. Specifically, the operator does not know the cost function of consumers and cannot have multiple rounds of information exchange with consumers. We formulate an optimization problem for the operator to minimize its operational cost considering time-varying demand response targets and responses of consumers. We develop a joint online learning and pricing algorithm. In each time slot, the operator sends out a price signal to all consumers and estimates the cost functions of consumers based on their noisy responses. We measure the performance of our algorithm using regret analysis and show that our online algorithm achieves logarithmic regret with respect to the operating horizon. In addition, our algorithm employs linear regression to estimate the aggregate response of consumers, making it easy to implement in practice. Simulation experiments validate the theoretic results and show that the performance gap between our algorithm and the offline optimality decays quickly.
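The abstract above has the operator post a price in each time slot and estimate the aggregate response by linear regression on noisy observations. The sketch below illustrates that loop in plain Python. It is a simple certainty-equivalent scheme, not the authors' algorithm, and carries no regret guarantee. The linear response model, the parameters `true_a` and `true_b`, the noise level, and the price clipping bounds are all assumptions.

```python
import random


def fit_line(prices, responses):
    """Ordinary least squares for response ~ a + b * price."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_d = sum(responses) / n
    sxx = sum((p - mean_p) ** 2 for p in prices)
    sxy = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, responses))
    b = sxy / sxx
    return mean_d - b * mean_p, b


def run_online_pricing(targets, true_a=0.5, true_b=2.0, noise_sd=0.1, seed=0):
    """Post one price per slot, observe a noisy aggregate response, refit, repeat."""
    rng = random.Random(seed)
    prices, responses = [], []
    for t, target in enumerate(targets):
        if t < 2:
            price = 1.0 + t                      # two distinct exploratory prices
        else:
            a_hat, b_hat = fit_line(prices, responses)
            b_safe = b_hat if abs(b_hat) > 1e-6 else 1e-6   # avoid division by zero
            # Price that would hit the target if the current estimate were exact.
            price = min(max((target - a_hat) / b_safe, 0.1), 10.0)
        # Simulated consumers (assumed linear response plus Gaussian noise).
        response = true_a + true_b * price + rng.gauss(0.0, noise_sd)
        prices.append(price)
        responses.append(response)
    return fit_line(prices, responses)
```

Calling `run_online_pricing([2.0, 4.0, 6.0] * 20)` cycles through three assumed targets; the time-varying targets keep the posted prices spread out, which is what keeps the regression well conditioned in this toy setup.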
Article
Full-text available
Without demand-side management, an increase in the number of electric vehicles (EVs) could result in overloads on distribution feeders. Aggregators could optimally manage the charging/discharging of the EVs, to not only maximize the consumers' welfare in response to real-time prices and accommodate their needs for transportation, but also to keep the distribution network within its operating limits. This paper proposes a decentralized framework in which the aggregator seeks to maximize its profits while the consumers minimize their costs in response to time-varying prices and additional incentives provided to mitigate potential overloads in the distribution system. Test results show that a large penetration of EVs can then be managed without violating the capacity of the distribution network.
Article
Full-text available
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
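The constant error carousel and the multiplicative gates mentioned above can be illustrated with a single scalar memory cell. This is a simplified sketch: the gate weights are fixed by hand instead of learned, the squashing functions are plain `tanh`, recurrent connections from the cell output are left out, and the forget gate used in many later variants is omitted. The function names and the `marker` input are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def memory_cell_step(c_prev, signal, marker):
    """One step of a single memory cell with an input gate and an output gate.
    Gate weights are fixed by hand for illustration (assumed, not learned)."""
    in_gate = sigmoid(20.0 * marker - 10.0)   # opens only when marker == 1
    out_gate = sigmoid(10.0)                  # kept open in this sketch
    candidate = math.tanh(signal)
    # Constant error carousel: the self-recurrent weight is fixed at 1.0,
    # so the stored state is neither amplified nor decayed over time.
    c = 1.0 * c_prev + in_gate * candidate
    h = out_gate * math.tanh(c)
    return c, h


def store_then_distract(value=0.8, lag=1000, seed=0):
    """Write one value, then feed `lag` random distractor inputs with the gate closed."""
    rng = random.Random(seed)
    c, h = memory_cell_step(0.0, value, marker=1.0)
    c_stored = c
    for _ in range(lag):
        c, h = memory_cell_step(c, rng.uniform(-1.0, 1.0), marker=0.0)
    return c_stored, c, h
```

With the input gate nearly closed, the distractors barely move the cell state, so the value written at the first step is still readable after a lag of 1000 steps.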
Article
Owing to countless advancements in information technology, all residential consumers, in any part of the world, are empowered to contribute to demand response (DR) programs, to manage their electricity usage and to cut associated expenses using a suitable energy management system. Solutions based on consumer participation in meeting the growing electricity demand are carried out through different programs, and incentive-based demand response (IBDR) programs play an important role in these circumstances. However, introducing such programs to any new market needs consumers' willingness along with a good policy support. This study assesses the willingness and interest of consumers to participate in different IBDR programs and the associated need for developing policies, based on the consumers’ feedback, in a subsidized electricity market such as that of Kuwait. A survey was conducted to get feedback from consumers on three different IBDR programs and four incentive schemes. After establishing the association between incentivization and load reduction, and identifying consumers’ choice on the most preferred IBDR programs and incentive schemes, the results were used to assess the need for different policy strategies for a typical subsidized market. The results of this study can be taken as a reference for formulating policies and programs for similar markets. The analysis on the impact of the programs indicates that by implementing IBDR programs, in addition to the financial benefit to both consumers and implementer, Kuwait can maintain its reserve capacity without any further addition of power plants.
Article
In this paper, an intelligent multi-microgrid (MMG) energy management method is proposed based on deep neural network (DNN) and model-free reinforcement learning techniques. In the studied problem, multiple microgrids are connected to a main distribution system and they purchase power from the distribution system to maintain local consumption. From the perspective of the distribution system operator (DSO), the target is to decrease the demand-side peak-to-average ratio (PAR), and to maximize the profit from selling energy. To protect user privacy, DSO learns the multi-microgrid response by implementing a deep neural network (DNN) without direct access to user’s information. Further, the DSO selects its retail pricing strategy via a Monte Carlo method from reinforcement learning, which optimizes the decision based on prediction. The simulation results from the proposed data-driven deep learning method, as well as comparisons with conventional model-based methods, substantiate the effectiveness of the proposed approach in solving power system problems with partial or uncertain information.
Article
Short-term load forecasting (STLF) plays an important role in the planning and operation of power systems. However, the wide use of distributed generations (DGs) and smart devices in the smart grid environment brings new requirements for the accuracy, speed, and intelligence of STLF. To address this problem, a novel short-term load forecasting method based on attention mechanism (AM), rolling update (RU) and bi-directional long short-term memory (Bi-LSTM) neural network is proposed. Firstly, RU is utilized to update the data in real time, making the input data of the model more effective. Secondly, influence weights are assigned through AM to highlight the effective characteristics of the input variables. Thirdly, a Bi-LSTM is used for model training, and the predicted load values are obtained through the linear transformation layer and softmax layer. Finally, the actual data sets from New South Wales (NSW) and Victoria (VIC) in Australia are employed to verify the validity of the method. The results show that the introduction of AM and RU into the forecasting model can improve the prediction accuracy. Compared with the traditional Bi-LSTM model, both the mean absolute percentage error (MAPE) and the root mean square error (RMSE) of the Bi-LSTM model with AM and RU decline in the load forecasting for the two data sets. This indicates that the proposed method has higher accuracy, less computation time and better generalization ability.
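The two error metrics named in this abstract have standard definitions, sketched below. The load values are made up for illustration and are not taken from the NSW or VIC data sets.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the inputs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up load values (MW), for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
mape_value = mape(actual, predicted)
rmse_value = rmse(actual, predicted)
```

MAPE normalizes each error by the actual load, so it is comparable across data sets of different scale, while RMSE penalizes large individual errors more heavily.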
Article
The practice of disclosing price of electricity before consumption (dynamic pricing) is essential to promote aggregator-based demand response in smart and connected communities. However, both practitioners and researchers have expressed fear that wild fluctuations in demand response resulting from dynamic pricing may adversely affect the stability of both the network and the market. This paper presents a comprehensive methodology guided by a data-driven learning model to develop stable and coordinated strategies for both dynamic pricing as well as demand response. The methodology is designed to learn offline without interfering with network operations. Application of the methodology is demonstrated using simulation results from a sample 5-bus PJM network. Results show that it is possible to arrive at stable dynamic pricing and demand response strategies that can reduce cost to the consumers as well as improve network load balance.
Article
Highlights
• The overall model for the hybrid electric tracked vehicle is built in detail.
• Fast Q-learning algorithm is applied to derive energy management strategy.
• An efficient online energy management strategy update framework is constructed.
• Hardware-in-loop simulation experiment is conducted to validate the performance.
• The strategy improves fuel economy and has potential for real-time applications.
Abstract
The energy management approach of hybrid electric vehicles has the potential to overcome the increasing energy crisis and environmental pollution by reducing the fuel consumption. This paper proposes an online updating energy management strategy to improve the fuel economy of hybrid electric tracked vehicles. As the basis of the research, the overall model for the hybrid electric tracked vehicle is built in detail and validated through a field experiment. To accelerate the convergence rate of the control policy calculation, a novel reinforcement learning algorithm called fast Q-learning is applied, which improves the computational speed by 16%. Cloud computation is used to carry the main computational burden and realize the online updating energy management strategy on the hardware-in-loop simulation bench. A Kullback-Leibler divergence rate criterion that triggers the update of the control strategy is designed and realized on the hardware-in-loop simulation bench. The simulation results show that the fuel consumption of the fast Q-learning based online updating strategy is 4.6% lower than that of the stationary strategy, and is close to that of the dynamic programming strategy. Besides, the computation time of the proposed method is only 1.35 s, which is much shorter than that of the dynamic programming based method. The results indicate that the proposed energy management strategy can greatly improve the fuel economy and has the potential to be applied in real-time applications. Moreover, the adaptability of the online energy management strategy is validated in three realistic driving schedules.
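The idea of using a divergence measure to decide when to recompute a control strategy can be sketched as follows. This is a simplified illustration that compares two discrete probability distributions with the plain Kullback-Leibler divergence; the paper's divergence rate is not reproduced here, and the function names and the threshold of 0.05 are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def should_update(old_dist, new_dist, threshold=0.05):
    """Trigger a strategy update when the new statistics diverge enough.

    The threshold is a hypothetical tuning parameter.
    """
    return kl_divergence(new_dist, old_dist) > threshold


# A small drift leaves the strategy alone, a large shift triggers an update.
small_drift = should_update([0.5, 0.5], [0.52, 0.48])
large_shift = should_update([0.5, 0.5], [0.9, 0.1])
```

Gating the update on a divergence threshold avoids recomputing the policy at every step when the observed driving statistics have barely changed.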
Article
Given the increasing prevalence of smart grids, the introduction of demand-side participation and distributed energy resources (DERs) has great potential for eliminating peak loads, if incorporated within a single framework such as a virtual power plant (VPP). In this paper, we develop a data mining-driven incentive-based demand response (DM-IDR) scheme to model electricity trading between a VPP and its participants, which induces load curtailment of consumers by offering them incentives and also makes maximum utilization of DERs. As different consumers exhibit different attitudes toward incentives, it is both essential and practical to provide flexible incentive rate strategies (IRSs) for consumers, thus respecting their unique requirements. To this end, our DM-IDR scheme first employs data mining techniques (e.g., clustering and classification) to divide consumers into different categories by their bid-offers. Next, from the perspective of VPP, the proposed scheme is formulated as an optimization problem to minimize VPP operation costs as well as guarantee consumer's interests. The experimental results demonstrate that through offering different IRSs to categorized consumers, the DM-IDR scheme induces more load reductions; this mitigates critical load, further decreases VPP operation costs and improves consumer profits.
Article
This paper investigates the energy management problem in the field of energy Internet (EI) with inter-disciplinary techniques. The concept of EI has been proposed for a while. However, there still exist many fundamental and technical issues that have not been fully investigated. In this paper, a new energy regulation issue is considered based on the operational principles of EI. Multiple targets are considered along with constraints. Then, the practical energy management problem is formulated as a constrained optimal control problem. Notably, no explicit mathematical model for power of renewable power generation devices and loads is utilized. Due to the complexity of this problem, conventional methods appear to be inapplicable. To obtain the desired control scheme, a model-free deep reinforcement learning algorithm is applied. A practical solution is obtained, and the feasibility as well as the performance of the proposed method are evaluated with numerical simulations.
Article
Balancing electricity generation and consumption is essential for smoothing the power grids. Any mismatch between energy supply and demand would increase costs to both the service provider and customers and may even cripple the entire grid. This paper proposes a novel real-time incentive-based demand response algorithm for smart grid systems with reinforcement learning and deep neural network, aiming to help the service provider to purchase energy resources from its subscribed customers to balance energy fluctuations and enhance grid reliability. In particular, to overcome the future uncertainties, a deep neural network is used to predict the unknown prices and energy demands. After that, reinforcement learning is adopted to obtain the optimal incentive rates for different customers considering the profits of both service provider and customers. Simulation results show that this proposed incentive-based demand response algorithm induces demand side participation, improves the profitability of both the service provider and customers, and improves system reliability by balancing energy resources, which can be regarded as a win-win strategy for both service provider and customers.
Article
Buildings account for about 40% of the global energy consumption. Renewable energy resources are one possibility to mitigate the dependence of residential buildings on the electrical grid. However, their integration into the existing grid infrastructure must be done carefully to avoid instability, and guarantee availability and security of supply. Demand response, or demand-side management, improves grid stability by increasing demand flexibility, and shifts peak demand towards periods of peak renewable energy generation by providing consumers with economic incentives. This paper reviews the use of reinforcement learning, a machine learning algorithm, for demand response applications in the smart grid. Reinforcement learning has been utilized to control diverse energy systems such as electric vehicles, heating ventilation and air conditioning (HVAC) systems, smart appliances, or batteries. The future of demand response greatly depends on its ability to prevent consumer discomfort and integrate human feedback into the control loop. Reinforcement learning is a potentially model-free algorithm that can adapt to its environment, as well as to human preferences by directly integrating user feedback into its control logic. Our review shows that, although many papers consider human comfort and satisfaction, most of them focus on single-agent systems with demand-independent electricity prices and a stationary environment. However, when electricity prices are modelled as demand-dependent variables, there is a risk of shifting the peak demand rather than shaving it. We identify a need to further explore reinforcement learning to coordinate multi-agent systems that can participate in demand response programs under demand-dependent electricity prices. Finally, we discuss directions for future research, e.g., quantifying how RL could adapt to changing urban conditions such as building refurbishment and urban or population growth.
Article
In this study, a customer reward scheme is proposed to build an effective demand response program for improving demand elasticity. First, an objective function has been formulated based on the market operation and an optimal incentive price has been derived from this objective function. Second, the incentive price is employed as a part of a reward scheme to encourage customers to reduce their electricity demand to a certain level during peak hours. Two typical customer response scenarios are studied to investigate the impact of customer response sensitivity on the loss of utilities' and customers' profits. Finally, a dataset for the state of New South Wales, Australia is employed as a case study to examine the effectiveness of the proposed scheme. The obtained results show that the proposed scheme can help improve the elasticity of demand significantly thereby reducing the associated financial risk greatly. Moreover, the proposed scheme allows customers to get involved voluntarily and maximise their profits with minimum sacrifice of their comfort levels.
Article
With the advancement and popularity of smart grid technologies, there is strong interest in utilizing different strategies for participating in demand response (DR) programs in electricity markets. DR programs can be classified into two main categories, namely incentive-based programs (IBPs) and time-based rate programs (TBRPs). In this paper, an improved incentive-based DR (IBDR) model is proposed. In our proposed IBP, the concept of elasticity is improved so that it depends not only on the electricity price, but is also a function of consumption hour and customer type. In this program, the incentive value paid to the participating consumers is not a fixed value and relates to the peak intensity of each hour. The proposed IBP can participate in both the day-ahead and intra-day electricity markets. Considering the intra-day market enables consumers to provide maximum DR if possible. The proposed model is implemented on the peak load curve of the Spanish electricity market and a 200-unit residential complex. Different scenarios are considered to show the effectiveness of the proposed DR model from various aspects, including peak shaving as well as economic indices.
Article
With the modern advanced information and communication technologies in smart grid systems, demand response (DR) has become an effective method for improving grid reliability and reducing energy costs due to the ability to react quickly to supply-demand mismatches by adjusting flexible loads on the demand side. This paper proposes a dynamic pricing DR algorithm for energy management in a hierarchical electricity market that considers both service provider's (SP) profit and customers’ (CUs) costs. Reinforcement learning (RL) is used to illustrate the hierarchical decision-making framework, in which the dynamic pricing problem is formulated as a discrete finite Markov decision process (MDP), and Q-learning is adopted to solve this decision-making problem. Using RL, the SP can adaptively decide the retail electricity price during the on-line learning process where the uncertainty of CUs’ load demand profiles and the flexibility of wholesale electricity prices are addressed. Simulation results show that this proposed DR algorithm can promote SP profitability, reduce energy costs for CUs, balance energy supply and demand in the electricity market, and improve the reliability of electric power systems, which can be regarded as a win-win strategy for both SP and CUs.
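The formulation described in this abstract, a discrete finite MDP solved by Q-learning, can be sketched with a small tabular example. Everything specific here is an assumption made for illustration: time slots act as states, a handful of retail price levels act as actions, and the linear demand model, wholesale prices, and hyperparameters are invented rather than taken from the paper.

```python
import random

PRICES = [1.0, 2.0, 3.0, 4.0]      # discrete retail price levels (actions)
WHOLESALE = [0.5, 1.0, 1.5]        # wholesale price per time slot (states)
BASE_DEMAND = [10.0, 7.0, 12.0]    # hypothetical demand intercept per slot
SLOPE = 2.0                        # hypothetical demand sensitivity to price


def profit(slot, price):
    """Toy SP profit: margin times a linear, non-negative customer demand."""
    demand = max(BASE_DEMAND[slot] - SLOPE * price, 0.0)
    return (price - WHOLESALE[slot]) * demand


def q_learning(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over one-day episodes."""
    rng = random.Random(seed)
    n_slots, n_actions = len(WHOLESALE), len(PRICES)
    q = [[0.0] * n_actions for _ in range(n_slots)]
    for _ in range(episodes):
        for slot in range(n_slots):
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)          # explore
            else:
                action = max(range(n_actions), key=lambda a: q[slot][a])  # exploit
            reward = profit(slot, PRICES[action])
            future = max(q[slot + 1]) if slot + 1 < n_slots else 0.0
            q[slot][action] += alpha * (reward + gamma * future - q[slot][action])
    return q


q_table = q_learning()
# Greedy price per time slot after learning.
policy = [PRICES[max(range(len(PRICES)), key=lambda a: row[a])] for row in q_table]
```

In this toy environment the next state does not depend on the chosen price, so the learned policy coincides with the best price in each slot taken separately. A setting in which load shifts between slots would need a state that carries that coupling.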
Article
A key component to understanding demand response programs design is elasticity, which reflects customer reaction to economic offers. In this work, customer elasticity for Incentive Based Demand Response (IBDR) programs is estimated using data from two nationwide surveys and integrated with a detailed residential load model. In addition, incentive based elasticity is calculated at the individual appliance level since this is more effective for operations than an aggregate value for a feeder. The concept of appliance-based elasticity is derived from the various contributions of each appliance to the aggregate load signal and the necessity of use for the customer. Results show that the needed customer incentive for certain loads, such as lighting and washing, is less than for HVAC, but since the HVAC energy share in total load is generally much higher, it has greater elasticity. Considering the important role of HVAC in the aggregate load signal, the elasticity is studied in more detail using estimates of different thermostat settings. Analysis shows that elasticity of HVAC decreases while average power increases. To disaggregate the load signal for each appliance, a constrained non-negative matrix factorization (CNMF) method is proposed. In addition, this method is used to decompose the HVAC signal to identify different thermostat settings.
Article
The integration of different energy resources at the customers’ premises in recent years, advances in Information and Communication Technologies (ICTs), and Advanced Metering Infrastructure (AMI) systems are becoming attractive tools for developing new real time demand response at the supplier side and the management of energy resources at the customers’ side. The management programs can be classified as smart grid management from the supplier side and intelligent energy (“i-Energy”) management from the consumer side. There are two types of programs: (i) time based programs such as real time pricing and (ii) incentive based demand response. A combination of these two programs is proposed in this paper, with the merged program being real time with incentive demand response. The time based demand response programs can be improved by using smart metering infrastructure and different resources. The incentive demand comes from the feasibility of providing a new concept known as “i-Energy” at the customers’ side. To achieve this, Smart Meters (SMs) and different resources at the customers’ premises are applied using this concept. Integrating different resources at the customers’ premises using the i-Energy concept can change the limitations of the time based program. The first developed program, at the supplier side, depends on purchasing MW from the customers who participate in the program. The second contribution is the “i-Energy” management technique at the customers’ side, which is based on congestion and potential games through a strategy of load control using different resources. Revenue for different participants in the program from the commercial and industrial sectors, at different levels of reduction and different usage of different resources, is discussed.
Article
Enduring expansions and advancement in the knowledge driven energy-economy era promise a power system that is cost-effectively competent, environmentally responsive, flawless and operationally acquiescent. This imminent power system will depend on latest communication technologies, computational easiness, and monitoring and control strategies attributable to the consumers or end users. Amid the numerous advances related to this development, a strategic and significant component is demand side management (DSM). Demand response can play an important and pertinent role in the milieu of DSM, and incentive-based demand response program (IBDRP) is getting customer focus in many parts of the world. This study tries to bring out major aspects to be considered while introducing IBDRP in retail electricity markets. Document analysis, a well-known qualitative research methodology, is used for analyzing various IBDRP schemes practiced globally, with the help of the ATLAS.ti software and utilizing the snowball sampling technique. For better understanding, the results of the axial coding are arranged in three stages: pre-implementation stage, implementation stage, and post-implementation stage. Based on the importance and their frequency of appearance, ten most prominent codes were identified, and the strength of their interrelationships is presented on a multilevel scale with three different ranks. One of the major and important outcomes of this research is that, with the help of the proposed conceptual framework, a proper implementation framework can be built, which can be used for developing most appropriate IBDRP for any retail electricity market.
Article
Many current studies on smart grid in electricity market are indicative of the key role it plays in electricity generation, distribution, retailing and end-user management. Demand response programs (DRPs) can be used to lower high energy prices in wholesale electricity markets, and ensure the security of power systems when at risk. The concern of most researchers in this field is to further unearth the potential of smart grid in the direction of demand response (DR) through enhanced demand side management (DSM) centering on the behavior of the end-user. Our model proposes a more effective way of using an incentive based demand response program to help residential customers derive more benefits from smart grid. The constrained non-linear programming (CNLP) model optimizes residential consumption of electricity by shaving load at peak times and increasing load at off-peak times, to help generators reduce production cost at peak times and increase revenue at off-peak times. The model uses a credit function to regulate consumption, rewarding end-users for load shedding and load shifting at peaks and for increasing load at off-peaks. The simulated results show that high consumption appliances are best used at dawn, midday, and at night if consumers want to cut down cost. The corresponding effect is that generating and distribution companies derive the right revenue from their investments by not producing beyond their capacity during peaks at high cost, and maintain constant power supply in an environmentally friendly manner.
Article
A novel two-level scheduling method was proposed in this paper, which helps an aggregator optimally schedule its flexible thermostatically controlled loads with renewable energy to arbitrage in the intraday electricity market. The proposed method maximizes the economic benefits of all the prosumers in the aggregation, and naturally helps balance intra-hour differences between supply and demand of the bulk power systems because the prices of the intraday electricity market reflect the need of the bulk power systems. In the proposed two-level scheduling, the upper level is a model predictive control optimization, of which the objective function is to minimize the sum of energy and capacity cost of imbalances and the constraints are thermal constraints based on a proposed energy-balanced model, while the lower level adopts the typical temperature priority list (TPL) control. Simulation results verified the validity of the proposed method and evaluated the effects of important influencing factors. In the base case, 41.64% imbalance cost was saved compared to the reference TPL-based control. Moreover, three further conclusions were drawn: (a) the proposed method mainly saves the imbalance cost by reducing imbalance peak, thus being suitable for places with high capacity price for imbalances; (b) parameter heterogeneity affects the performance of the proposed method, and average value method performs well only with low heterogeneity; (c) the performance of the proposed method worsens with the increase of forecast uncertainty, but keeps better than that of typical TPL-based control unless the forecast uncertainty gets very strong.
Article
We consider the setting in which an electric power utility seeks to curtail its peak electricity demand by offering a fixed group of customers a uniform price for reductions in consumption relative to their predetermined baselines. The underlying demand curve, which describes the aggregate reduction in consumption in response to the offered price, is assumed to be affine and subject to unobservable random shocks. Assuming that both the parameters of the demand curve and the distribution of the random shocks are initially unknown to the utility, we investigate the extent to which the utility might dynamically adjust its offered prices to maximize its cumulative risk-sensitive payoff over a finite number of T days. In order to do so effectively, the utility must design its pricing policy to balance the tradeoff between the need to learn the unknown demand model (exploration) and maximize its payoff (exploitation) over time. In this paper, we propose such a pricing policy, which is shown to exhibit an expected payoff loss over T days that is at most O(\sqrt{T}), relative to an oracle pricing policy that knows the underlying demand model. Moreover, the proposed pricing policy is shown to yield a sequence of prices that converge to the oracle optimal prices in the mean square sense.
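The learning half of this setting can be illustrated with a minimal sketch: fit the affine demand curve by ordinary least squares from observed price and reduction pairs, then compute the price that would be optimal if the estimate were exact. Several simplifications are assumptions made here and do not come from the paper: the payoff is risk-neutral, each unit of reduction is worth a fixed `unit_value` to the utility, the toy data are noise-free, and no exploration step is included.

```python
def fit_affine(prices, reductions):
    """Ordinary least squares fit of reduction = a * price + b."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_d = sum(reductions) / n
    cov = sum((p - mean_p) * (d - mean_d) for p, d in zip(prices, reductions))
    var = sum((p - mean_p) ** 2 for p in prices)
    a = cov / var
    return a, mean_d - a * mean_p


def best_price(a, b, unit_value):
    """Maximizer of the assumed payoff (unit_value - p) * (a * p + b), for a > 0."""
    return 0.5 * (unit_value - b / a)


# Noise-free toy observations generated from a = 2.0, b = 1.0.
prices = [1.0, 2.0, 3.0, 4.0]
reductions = [3.0, 5.0, 7.0, 9.0]
a_hat, b_hat = fit_affine(prices, reductions)
p_star = best_price(a_hat, b_hat, unit_value=10.0)
```

A policy that always plays the current estimate can stop learning when its prices cluster too tightly, which is why the abstract stresses the balance between exploration and exploitation.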
Article
This paper proposes a novel demand response method that aims at reducing the long term cost of charging the battery of an individual plug-in electric vehicle (PEV). The problem is cast as a daily decision making problem for choosing the amount of energy to be charged in the PEV battery within a day. We model the problem as a Markov decision process with unknown transition probabilities. A batch reinforcement learning algorithm is proposed for learning an optimum cost reducing charging policy from a batch of transition samples and making cost reducing charging decisions in new situations. In order to capture the day-to-day differences of electricity charging costs the method makes use of actual electricity prices for the current day and predicted electricity prices for the following day. A Bayesian neural network is employed for predicting the electricity prices. For constructing the reinforcement learning training data set we use historical prices. A linear programming based method is developed for creating a data-set of optimal charging decisions. Different charging scenarios are simulated for each day of the historical time frame using the set of past electricity prices. Simulation results using real world pricing data demonstrate cost savings of 10%-50% for the PEV owner when using the proposed charging method.
Article
Driven by recent advances in batch Reinforcement Learning (RL), this paper contributes to the application of batch RL to demand response. In contrast to conventional model-based approaches, batch RL techniques do not require a system identification step, making them more suitable for a large-scale implementation. This paper extends fitted Q-iteration, a standard batch RL technique, to the situation when a forecast of the exogenous data is provided. In general, batch RL techniques do not rely on expert knowledge about the system dynamics or the solution. However, if some expert knowledge is provided, it can be incorporated by using the proposed policy adjustment method. Finally, we tackle the challenge of finding an open-loop schedule required to participate in the day-ahead market. We propose a model-free Monte Carlo method that uses a metric based on the state-action value function or Q-function and we illustrate this method by finding the day-ahead schedule of a heat-pump thermostat. Our experiments show that batch RL techniques provide a valuable alternative to model-based controllers and that they can be used to construct both closed-loop and open-loop policies.
Article
Electric water heaters have the ability to store energy in their water buffer without impacting the comfort of the end user. This feature makes them a prime candidate for residential demand response. However, the stochastic and nonlinear dynamics of electric water heaters make it challenging to harness their flexibility. Driven by this challenge, this paper formulates the underlying sequential decision-making problem as a Markov decision process and uses techniques from reinforcement learning. Specifically, we apply an auto-encoder network to find a compact feature representation of the sensor measurements, which helps to mitigate the curse of dimensionality. A well-known batch reinforcement learning technique, fitted Q-iteration, is used to find a control policy, given this feature representation. In a simulation-based experiment using an electric water heater with 50 temperature sensors, the proposed method was able to achieve good policies much faster than when using the full state information. In a lab experiment, we apply fitted Q-iteration to an electric water heater with eight temperature sensors. Further reducing the state vector did not improve the results of fitted Q-iteration. The results of the lab experiment, spanning 40 days, indicate that compared to a thermostat controller, the presented approach was able to reduce the total cost of energy consumption of the electric water heater by 15%.
Article
In this paper, we study a dynamic pricing and energy consumption scheduling problem in the microgrid where the service provider acts as a broker between the utility company and customers by purchasing electric energy from the utility company and selling it to the customers. For the service provider, even though dynamic pricing is an efficient tool to manage the microgrid, the implementation of dynamic pricing is highly challenging due to the lack of the customer-side information and the various types of uncertainties in the microgrid. Similarly, the customers also face challenges in scheduling their energy consumption due to the uncertainty of the retail electricity price. In order to overcome the challenges of implementing dynamic pricing and energy consumption scheduling, we develop reinforcement learning algorithms that allow each of the service provider and the customers to learn its strategy without a priori information about the microgrid. Through numerical results, we show that the proposed reinforcement learning-based dynamic pricing algorithm can effectively work without a priori information about the system dynamics and the proposed energy consumption scheduling algorithm further reduces the system cost thanks to the learning capability of each customer.
Article
This paper addresses the problem of defining a day-ahead consumption plan for charging a fleet of electric vehicles (EVs), and following this plan during operation. A challenge herein is the beforehand unknown charging flexibility of EVs, which depends on numerous details about each EV (e.g., plug-in times, power limitations, battery size, power curve, etc.). To cope with this challenge, EV charging is controlled during operation by a heuristic scheme, and the resulting charging behavior of the EV fleet is learned by using batch mode reinforcement learning. Based on this learned behavior, a cost-effective day-ahead consumption plan can be defined. In simulation experiments, our approach is benchmarked against a multistage stochastic programming solution, which uses an exact model of each EV's charging flexibility. Results show that our approach is able to find a day-ahead consumption plan with comparable quality to the benchmark solution, without requiring an exact day-ahead model of each EV's charging flexibility.
Article
Interruptible Load (IL) programs have long been an accepted measure to intelligently and reliably shed demand in case of contingencies in the power grid. However, the emerging market for Electric Vehicles (EV) and the notion of providing non-emergency ancillary services through the demand side have sparked new interest in designing direct load scheduling programs that manage the consumption of appliances on a day-to-day basis. In this paper, we define a mechanism for a Load Serving Entity (LSE) to strategically compensate customers that allow the LSE to directly schedule their consumption, every time they want to use an eligible appliance. We study how the LSE can compute such incentives by forecasting its profits from shifting the load of recruited appliances to hours when electricity is cheap, or by providing ancillary services, such as regulation and load following. To make the problem scalable and tractable we use a novel clustering approach to describe appliance load and laxity. In our model, customers choose to participate in this program strategically, in response to incentives posted by the LSE in publicly available menus. Since 1) appliances have different levels of demand flexibility; and 2) demand flexibility has a time-varying value to the LSE due to changing wholesale prices, we allow the incentives to vary dynamically with time and appliance cluster. We study the economic effects of the implementation of such program on a population of EVs, using real-world data for vehicle arrival and charge patterns.
Article
Demand response (DR) for residential and small commercial buildings is estimated to account for as much as 65% of the total energy savings potential of DR, and previous work shows that a fully automated Energy Management System (EMS) is a necessary prerequisite to DR in these areas. In this paper, we propose a novel EMS formulation for DR problems in these sectors. Specifically, we formulate a fully automated EMS's rescheduling problem as a reinforcement learning (RL) problem (referred to as the device based RL problem), and show that this RL problem decomposes over devices under reasonable assumptions. Compared with existing formulations, our new formulation (1) does not require explicitly modeling the user's dissatisfaction on job rescheduling, (2) enables the EMS to self-initiate jobs, (3) allows the user to initiate more flexible requests and (4) has a computational complexity linear in the number of devices. We also propose several new performance metrics for RL algorithms applied to the device based RL problem, and demonstrate the simulation results of applying Q-learning, one of the most popular and classical RL algorithms, to a representative example.
Article
In this paper, a weighted combination of different demand vs. price functions, referred to as the Composite Demand Function (CDF), is introduced to represent the demand model of consuming sectors that comprise different clusters of customers with divergent load profiles and energy use habits. Derived from the mathematical representations of demand, dynamic price elasticities are proposed to capture the customers’ demand sensitivity with respect to the hourly price. Based on the proposed CDF and dynamic elasticities, a comprehensive demand response (CDR) model is developed to represent customer response to time-based and incentive-based demand response (DR) programs. This model helps a Retail Energy Provider (REP) agent in an agent-based retail environment to offer day-ahead real time prices to its customers. The most beneficial real time prices are determined in an economically optimal manner through the REP agent’s learning capability, which is based on the Q-learning method and incorporates aspects of the problem such as price caps and customer response to real time pricing, a time-based demand response program represented by the CDR model. Numerical studies are conducted based on New England day-ahead market data to investigate the performance of the proposed model.
Highlights
► Composite Demand Function represents demand vs. price function for customer groups.
► Dynamic self elasticities are extracted by differentiating demand functions.
► Comprehensive DR model is developed based on CDF and dynamic price elasticities.
► CDR model represents customers’ behavior and predicts their response to DR programs.
► Day-ahead real time pricing is conducted based on the principles of Q-learning.
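As a rough illustration of how a price-dependent self elasticity can be obtained by differentiating a weighted combination of demand functions, the sketch below combines one linear and one exponential demand curve and evaluates e(p) = (dD/dp) · p / D(p) numerically. The functional forms, the weights and every parameter value are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def linear_demand(p, a=100.0, b=2.0):
    """Hypothetical linear demand curve D(p) = a - b*p."""
    return a - b * p

def exponential_demand(p, k=120.0, c=0.05):
    """Hypothetical exponential demand curve D(p) = k*exp(-c*p)."""
    return k * math.exp(-c * p)

def composite_demand(p, weights=(0.6, 0.4)):
    """Weighted combination of the two demand curves (weights are assumed)."""
    w_lin, w_exp = weights
    return w_lin * linear_demand(p) + w_exp * exponential_demand(p)

def self_elasticity(p, h=1e-4):
    """Price elasticity e(p) = D'(p) * p / D(p), using a central difference for D'(p)."""
    slope = (composite_demand(p + h) - composite_demand(p - h)) / (2.0 * h)
    return slope * p / composite_demand(p)

if __name__ == "__main__":
    for price in (5.0, 10.0, 20.0):
        print(price, self_elasticity(price))
```

Because the elasticity is evaluated at the current price rather than fixed in advance, it changes along the demand curve, which is the sense in which it is "dynamic" in this sketch.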
Article
The sharp increase in energy prices and the growing concern for environmental issues, among other things, are behind the renewed interest in energy demand estimation. However, there is very little academic literature that takes account of the usual situation of energy suppliers: high quality but incomplete data. In this paper, we propose a useful and rather simple instrument for estimating electricity demand with the incomplete and/or imperfect data available to suppliers. In particular, using real data on electricity expenditure and consumption, we employ a random-effects panel data model to estimate residential and industrial electricity demand in Spain.