June 2020
·
51 Reads
·
30 Citations
IEEE Control Systems Letters
Residential loads have great potential to enhance the efficiency and reliability of electricity systems via demand response (DR) programs. One major challenge in residential DR is how to learn and handle unknown and uncertain customer behaviors. In this paper, we consider the residential DR problem where the load service entity (LSE) aims to select an optimal subset of customers to optimize some DR performance, such as maximizing the expected load reduction with a financial budget or minimizing the expected squared deviation from a target reduction level. To learn the uncertain customer behaviors influenced by various time-varying environmental factors, we formulate the residential DR as a contextual multi-armed bandit (MAB) problem, and develop an online learning and selection (OLS) algorithm based on Thompson sampling to solve it. This algorithm takes the contextual information into consideration and is applicable to complicated DR settings. Numerical simulations are performed to demonstrate the learning effectiveness of the proposed algorithm.