Article

Semi-parametric models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting


Abstract

We sum up the methodology of the team Tololo for the electric load and electricity price forecasting tracks of the Global Energy Forecasting Competition 2014. During the competition, we used and tested many statistical and machine learning methods, such as random forests, gradient boosting machines, and generalized additive models. In this paper, we present only the methods that showed the best performance. For electric load forecasting, our strategy consisted of first producing a probabilistic forecast of the temperature and then plugging the obtained temperature scenarios into a model of the load to produce a probabilistic forecast of the load. Both steps are performed by fitting a quantile generalized additive model (quantGAM). Concerning electricity price forecasting, we investigate three methods that we used during the competition. The first follows the spirit of the one used for electric load. The second is based on combining a set of individual predictors, and the last one fits a sparse linear regression on a large set of covariates. We chose to present these three methods because they all exhibit good performance and offer promising potential for improvement in future research.
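To make the two-step load strategy concrete, a generic formulation (illustrative notation, not the authors' exact equations) is: first estimate conditional quantiles of the temperature with an additive quantile model, then feed each resulting temperature scenario into an additive quantile model of the load,

    \hat q^{\,T}_{\tau}(t) = \sum_j f_{j,\tau}\big(x_{j,t}\big), \qquad
    \hat q^{\,L}_{\tau'}(t) = \sum_k g_{k,\tau'}\big(\hat q^{\,T}_{\tau}(t),\, z_{k,t}\big),

where the x_{j,t} and z_{k,t} are calendar and meteorological covariates and each smooth term f_{j,\tau}, g_{k,\tau'} is estimated by minimising the pinball (quantile) loss at level \tau or \tau'.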


... Both approaches have their proponents. For instance, Cuaresma et al. (2004), Misiorek et al. (2006), Zhou et al. (2006), Garcia-Martos et al. (2007), Karakatsani and Bunn (2008), Lisi and Nan (2014), Alonso et al. (2016), Gaillard et al. (2016), Hagfors et al. (2016), Nowotarski and Weron (2016), Uniejewski et al. (2016), and Ziel (2016a), among others, advocate the use of sets of 24 (48 or more) models estimated independently for each load period, typically using Ordinary Least Squares (OLS). In the neural network literature, Amjady and Keynia (2009a), Marcjasz et al. (2018) and Panapakidis and Dagoumas (2016), among others, use a separate network (i.e., a different parameter set) for each hour of the next day. ...
... Still, by increasing the set of dependent explanatory variables, such interrelationships can be added. For instance, Gaillard et al. (2016), Uniejewski et al. (2016) and Ziel (2016a) consider the previous day's price for midnight, i.e., P_{d-1,24}, as an explanatory variable in each of the 24 single models. Formally such a set of 24 interrelated models can be written as: ...
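The snippet stops before the citing paper's display; a standard way of writing such a set of 24 interrelated single models (the exact regressor set varies across the studies cited) is, for each delivery hour h = 1, ..., 24,

    P_{d,h} = \beta_{h,1} P_{d-1,h} + \beta_{h,2} P_{d-2,h} + \beta_{h,3} P_{d-7,h} + \beta_{h,4} P_{d-1,24} + \varepsilon_{d,h},

where the common regressor P_{d-1,24} (the previous day's price for midnight) links the 24 otherwise separately estimated equations.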
... This class of models is based on a parsimonious autoregressive structure originally proposed by Misiorek et al. (2006) and later used in a number of EPF studies (Weron and Misiorek, 2008; Serinaldi, 2011; Kristiansen, 2012; Nowotarski et al., 2014; Gaillard et al., 2016; Marcjasz et al., 2018; Nowotarski and Weron, 2018; Uniejewski et al., 2016; Ziel, 2016a). Since these models are built on some prior knowledge of experts, following Uniejewski et al. (2016) and Ziel (2016a), we refer to them as expert models. ...
Preprint
We conduct an extensive empirical study on short-term electricity price forecasting (EPF) to address the long-standing question of whether the optimal model structure for EPF is univariate or multivariate. We provide evidence that despite a minor edge in predictive performance overall, the multivariate modeling framework does not uniformly outperform the univariate one across all 12 considered datasets, seasons of the year or hours of the day, and at times is outperformed by the latter. This is an indication that combining advanced structures or the corresponding forecasts from both modeling approaches can bring a further improvement in forecasting accuracy. We show that this indeed can be the case, even for a simple averaging scheme involving only two models. Finally, we also analyze variable selection for the best performing high-dimensional lasso-type models, thus providing guidelines for structuring better-performing forecasting model designs.
... To be usable in practical forecasting, additive quantile regression methods must have several properties: 1) the range of model structures available for modelling quantiles must be comparable to that available under conventional GAMs, otherwise the benefits of modelling quantiles may be offset by insufficient model flexibility; 2) smoothing and other tuning parameters must be selected automatically, otherwise the modelling process becomes too labour intensive and subjective for operational use; 3) uncertainty estimation has to be part of model estimation, since knowing forecast uncertainty is essential for operational use and 4) methods must be sufficiently numerically efficient and robust for routine deployment. The work reported here started when two of the authors (YG and RN) were participating in the GEFCom2014 forecasting competition, and found that existing additive quantile regression method implementations failed to meet these requirements, forcing them to develop the ad hoc procedure described in Gaillard et al. (2016). ...
... Given that load consumption is strongly dependent on the time of the day, it is common practice (e.g. Gaillard et al., 2016) to fit a different model for each half-hour. To limit the computational burden, here we consider only the period between 11:30 and 12:00. ...
... Similarly, we test each method on the last 24 and 12 months of, respectively, the UK and the French data set. Gaillard et al. (2016) proposed a quantile regression method which ranked 1st on both the load and the price forecasting tracks of GEFCom2014. This is a two-step procedure, which was partially motivated by the lack of reliable software for fitting additive quantile models. ...
Preprint
We propose a novel framework for fitting additive quantile regression models, which provides well calibrated inference about the conditional quantiles and fast automatic estimation of the smoothing parameters, for model structures as diverse as those usable with distributional GAMs, while maintaining equivalent numerical efficiency and stability. The proposed methods are at once statistically rigorous and computationally efficient, because they are based on the general belief updating framework of Bissiri et al. (2016) to loss based inference, but compute by adapting the stable fitting methods of Wood et al. (2016). We show how the pinball loss is statistically suboptimal relative to a novel smooth generalisation, which also gives access to fast estimation methods. Further, we provide a novel calibration method for efficiently selecting the 'learning rate' balancing the loss with the smoothing priors during inference, thereby obtaining reliable quantile uncertainty estimates. Our work was motivated by a probabilistic electricity load forecasting application, used here to demonstrate the proposed approach. The methods described here are implemented by the qgam R package, available on the Comprehensive R Archive Network (CRAN).
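As a minimal illustration of the qgam package named in this abstract, the sketch below fits additive quantile models for a few probability levels; the data frame train and the variables load, temp and tod are hypothetical placeholders, not objects from the paper.

    # Hedged sketch: additive quantile regression with the qgam package.
    library(qgam)

    quantiles <- c(0.05, 0.5, 0.95)

    # One additive quantile model per probability level; smoothing
    # parameters and the learning rate are selected automatically.
    fits <- lapply(quantiles, function(q)
      qgam(load ~ s(temp) + s(tod, bs = "cc"), data = train, qu = q))

    # Conditional quantile predictions on new data.
    preds <- sapply(fits, predict, newdata = test)
    colnames(preds) <- paste0("q", quantiles)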
... Furthermore, hybrid models combining diverse forecasting methods have emerged lately, as summarized by Petropoulos et al. (2022) and Hong et al. (2016). Similarly, GAMs have consistently demonstrated outstanding performance in previous forecasting competitions, such as achieving top-three results in GEFCom 2012 (Davis et al., 2014), see Nedellec et al. (2014), and securing the top two spots in GEFCom 2014 (Hong et al., 2016), see Gaillard et al. (2016) and Dordonnat et al. (2016). Motivated by the success of GAMs in load forecasting competitions alongside the effectiveness of early GAM-based methods in load forecasting, see e.g. ...
... Similar deficiencies in forecasting accuracy of Vanilla benchmarks were observed by e.g. Browell & Fasiolo (2021), Ziel & Liu (2016) and Gaillard et al. (2016) in GEFCom2014, see Hong et al. (2016), and Ziel (2019) in GEFCom2017, see Hong et al. (2019). However, including actual temperatures and recency temperature effects, even when only forecasted by seasonality, improves Vanilla forecasting accuracy. ...
... Consequently, in further studies, multivariate models for high- and low-demand hours, or in their most finely resolved form, for 24 hours, as advocated in GAM load models by e.g. Pierrot & Goude (2011); Nedellec et al. (2014); Dordonnat et al. (2016); Gaillard et al. (2016), should be analyzed. ...
Preprint
Accurate mid-term (weeks to one year) hourly electricity load forecasts are essential for strategic decision-making in power plant operation, ensuring supply security and grid stability, and energy trading. While numerous models effectively predict short-term (hours to a few days) hourly load, mid-term forecasting solutions remain scarce. In mid-term load forecasting, besides daily, weekly, and annual seasonal and autoregressive effects, capturing weather and holiday effects, as well as socio-economic non-stationarities in the data, poses significant modeling challenges. To address these challenges, we propose a novel forecasting method using Generalized Additive Models (GAMs) built from interpretable P-splines and enhanced with autoregressive post-processing. This model uses smoothed temperatures, Error-Trend-Seasonal (ETS) modeled non-stationary states, a nuanced representation of holiday effects with weekday variations, and seasonal information as input. The proposed model is evaluated on load data from 24 European countries. This analysis demonstrates that the model not only has significantly enhanced forecasting accuracy compared to state-of-the-art methods but also offers valuable insights into the influence of individual components on predicted load, given its full interpretability. Achieving performance akin to day-ahead TSO forecasts in fast computation times of a few seconds for several years of hourly data underscores the model's potential for practical application in the power system industry.
... The study used the quantile generalized additive model (quantGAM) based on the work of [22] and extended by [23], defined as: ...
... Lastly, Online Prediction by ExpeRt Aggregation (OPERA) was one of the methods used for forecast combination and was developed by [22]. Given a particular variable Y (here GHI), observed as a sequence of values y_1, y_2, ..., y_n, predictions of these values are produced by combining individual forecasts. ...
... For a given time step t = 1, 2, ..., n, there are predictions x_{k,t} built from the independent (weather) variables, where k = 1, ..., K indexes a finite set of forecasting methods; OPERA combines these online-learning algorithms and their forecasts. The OPERA forecasts are given by equation (22). ...
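Equation (22) of the citing paper is not reproduced in the snippet; the generic forecast produced by such expert-aggregation rules is a weighted combination of the K individual forecasts,

    \hat y_t = \sum_{k=1}^{K} p_{k,t}\, x_{k,t}, \qquad p_{k,t} \ge 0, \quad \sum_{k=1}^{K} p_{k,t} = 1,

with the weights p_{k,t} updated online from the past losses of the experts.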
Article
Full-text available
The increasing demand for electricity and the need for clean energy sources have increased solar energy use. Accurate forecasts of solar energy are required for easy management of the grid. This paper compares the accuracy of two Gaussian Process Regression (GPR) models combined with Additive Quantile Regression (AQR) and Bayesian Structural Time Series (BSTS) models in the 2-day ahead forecasting of global horizontal irradiance using data from the University of Pretoria from July 2020 to August 2021. Four methods were adopted for variable selection: Lasso, Elasticnet, Boruta, and GBR (Gradient Boosting Regression). The variables selected using GBR were used because they produced the lowest MAE (Mean Absolute Error) value. A comparison of seven models was made: GPR (Gaussian Process Regression), Two-layer DGPR (Two-layer Deep Gaussian Process Regression), bstslong (Bayesian Structural Time Series long), AQRA (Additive Quantile Regression Averaging), QRNN (Quantile Regression Neural Network), PLAQR (Partial Linear Additive Quantile Regression), and Opera (Online Prediction by ExpeRt Aggregation). The evaluation metrics used to select the best model were the MAE (Mean Absolute Error) and RMSE (Root Mean Square Error). Further evaluations were done using proper scoring rules and Murphy diagrams. The best individual model was found to be the GPR. The best forecast combination was AQRA (AQR Averaging) based on MAE. However, based on RMSE, GPNN was the best forecast combination method. Companies such as Eskom could use the methods adopted in this study to control and manage the power grid. The results will promote economic development and sustainability of energy resources.
... Application to short-term electricity demand forecasting attests the relevance of the approach and highlights how it is crucial to optimize both the formula of the GAM and the adaptation parameter simultaneously. In practice, it is possible to greatly improve electricity demand forecasts by using mixtures of models (see for example Gaillard et al. (2016)). The idea is to have several good predictors, called "experts", who make quite different errors from one another, and to make time-evolving weighted averages of their predictions. ...
... In contrast, in EA(f, Q), we do not use the iterative grid search and the evolutionary algorithm optimizes the formula and the hyperparameter of the adaptive version both at the same time. (Children models are created from models (f_1, Q_1) and (f_2, Q_2) ∈ Ω using the mutation and crossing-over operators described in Section 5.) We refer, among others, to Goude et al., 2013 and Gaillard et al., 2016 for an exhaustive presentation of the generalized additive models used to forecast power demand. Our state-of-the-art handcrafted model takes into account some meteorological variables at an hourly time step: the temperature T_t and the smoothed temperature T̄_t, the cloud cover C_t, and the wind speed W_t; and some calendar variables: the day of the week D_t and the hour of the day H_t ∈ {1, ..., 24} ...
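An additive specification consistent with the variables listed above (illustrative only, not the citing paper's exact formula) would read

    \mathrm{Load}_t = f_1(T_t) + f_2(\bar T_t) + f_3(C_t) + f_4(W_t) + f_5(H_t) + \gamma_{D_t} + \varepsilon_t,

where each f_j is a smooth function estimated by penalised regression splines and \gamma_{D_t} is a day-of-week effect.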
Preprint
Electricity demand forecasting is key to ensuring that supply meets demand lest the grid black out. Reliable short-term forecasts may be obtained by combining a Generalized Additive Model (GAM) with a State-Space model (Obst et al., 2021), leading to an adaptive (or online) model. A GAM is an over-parameterized linear model defined by a formula, and a state-space model involves hyperparameters. Both the formula and adaptation parameters have to be fixed before model training and have a huge impact on the model's predictive performance. We propose optimizing them using the DRAGON package of Keisler (2025), originally designed for neural architecture search. This work generalizes it for automated online generalized additive model selection by defining an efficient modeling of the search space (namely, the space of the GAM formulae and adaptation parameters). Its application to short-term French electricity demand forecasting demonstrates the relevance of the approach.
... The out-of-sample test period spans 5 years (2019-2023), covering the Covid-19 pandemic and the 2021-2022 energy crisis with skyrocketing prices of electricity. The day-ahead electricity price forecasts are obtained either using a parsimonious autoregressive expert model with exogenous variables (ARX; Billé et al., 2023; Gaillard et al., 2016; Maciejowska et al., 2021; Ziel and Weron, 2018) or a parameter-rich, LASSO-estimated regression model (LEAR; Lago et al., 2021; Wagner et al., 2022). We assess the significance of differences in predictive performance using the multivariate variant of the Diebold-Mariano (DM) test, as introduced by Ziel and Weron (2018). ...
... The day-ahead electricity price forecasts are obtained using two model classes. The first is a parsimonious autoregressive expert model with exogenous variables (ARX), originally proposed by Misiorek et al. (2006), later modified and compared in a number of EPF studies under different names and acronyms (Billé et al., 2023; Gaillard et al., 2016; Maciejowska et al., 2021; Maciejowska and Nowotarski, 2016; Taylor, 2021; Ziel, 2016; Ziel and Weron, 2018). In the ARX model the electricity price for day d and hour h is given by the following formula: ...
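The snippet ends before the display; a typical ARX expert specification in this literature (following Misiorek et al., 2006; the regressors differ slightly across the papers cited above) is

    P_{d,h} = \beta_{h,1} P_{d-1,h} + \beta_{h,2} P_{d-2,h} + \beta_{h,3} P_{d-7,h}
              + \beta_{h,4} P^{\min}_{d-1} + \beta_{h,5} X_{d,h}
              + \beta_{h,6} D^{Sat}_{d} + \beta_{h,7} D^{Sun}_{d} + \beta_{h,8} D^{Mon}_{d} + \varepsilon_{d,h},

where P^{\min}_{d-1} is the minimum price of the previous day, X_{d,h} an exogenous variable such as the day-ahead load forecast, and the D terms are weekday dummies.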
Preprint
Full-text available
Recent studies provide evidence that decomposing the electricity price into the long-term seasonal component (LTSC) and the remaining part, predicting both separately, and then combining their forecasts can bring significant accuracy gains in day-ahead electricity price forecasting. However, not much attention has been paid to predicting the LTSC, and the last 24 hourly values of the estimated pattern are typically copied for the target day. To address this gap, we introduce a novel approach which extracts the trend-seasonal pattern from a price series extrapolated using price forecasts for the next 24 hours. We assess it using two 5-year long test periods from the German and Spanish power markets, covering the Covid-19 pandemic, the 2021/2022 energy crisis, and the war in Ukraine. Considering parsimonious autoregressive and LASSO-estimated models, we find that improvements in predictive accuracy range from 3% to 15% in terms of the root mean squared error and exceed 1% in terms of profits from a realistic trading strategy involving day-ahead bidding and battery storage.
... A methodology for WSS, based on genetic algorithms, has been proposed in [23] for later use in electrical load forecasting. In this work, the authors conduct an extensive literature review on works related to the WSS theme for load forecasting, highlighting the following studies: [24] defines an equivalent temperature based on the average values found from selected substations. To achieve this, [24] employs generalized additive models and generalized cross-validation (GCV) methods, a technique also observed in [25]. ...
... In this work, the authors conduct an extensive literature review on works related to the WSS theme for load forecasting, highlighting the following studies: [24] defines an equivalent temperature based on the average values found from selected substations. To achieve this, [24] employs generalized additive models and generalized cross-validation (GCV) methods, a technique also observed in [25]. The latter, however, differs by utilizing a weighted average for the stations with the exponential weighted average algorithm. ...
Article
Full-text available
In the face of growing challenges in the electrical sector, such as demand variability and climate change, understanding and forecasting electrical variables become critical for distribution companies. This work presents a set of methodologies for calculating equivalent temperatures in large geographic areas, which can be used in forecasting models to understand the behavior of electrical variables such as demand and load, thus assisting energy distributors. For this purpose, six calculation methodologies were developed, with emphasis on the one based on linear regression. With this parameter, the electric company improves communication about consumption variations with stakeholders, in addition to avoiding the curse of dimensionality in the development of consumption forecasting models. The case study, related to a distribution company in Brazil, involves forecasting both own load and total load. The study shows that using this temperature data can improve the forecast’s performance, whether using statistical or machine learning models. The best results indicate a MAPE of 2.0% for the ARIMA model and 2.4% for the Random Forest Regressor. Finally, it is essential to mention that the developed methodologies are applicable to other weather variables, such as precipitation, solar radiation, air humidity, and wind speed.
... In GEFCom2014, participants were tasked with forecasting probabilistic loads for a load time series and were provided with temperature data from 25 weather stations. Gaillard et al. [25] fitted each weather station's data to a generalized additive model and selected the average temperature of four weather stations based on their performance using the generalized cross-validation (GCV) technique. Dordonnat et al. [26] initially applied GCV to select six weather stations but did not achieve satisfactory results, leading them to use an exponentially weighted average algorithm, ultimately selecting three stations with equal weights. ...
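For reference, the generalized cross-validation criterion used in such station-selection procedures is the standard one for a penalised additive fit,

    \mathrm{GCV} = \frac{n \sum_{i=1}^{n} (y_i - \hat y_i)^2}{\big(n - \mathrm{tr}(\mathbf{A})\big)^2},

where \mathbf{A} is the influence (hat) matrix of the fitted model; the station, or average of stations, giving the lowest GCV is retained.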
... For instance, if the method proposed in [29] is applied for WSS in the GEFCom2012 and GEFCom2014 datasets, only a negligible portion of the solution space would be evaluated, amounting to 1.03% and 0.00015%, respectively. • Constraints on selected weather stations: Many studies, e.g., [21]-[25], impose strict constraints on the number of weather stations, often fixing it to a predetermined value. While this may simplify the model, it can lead to suboptimal performance when the most effective combination of stations is not aligned with the imposed constraints. ...
Article
Full-text available
Temperature is a key factor in modeling electricity demand, making the selection of optimal weather stations essential for accurate predictions. However, current methods for selecting weather stations often rely on heuristic approaches that explore only a limited subset of potential combinations, potentially missing better solutions. In this paper, we propose an innovative approach that integrates the Simulated Annealing (SA) algorithm with local search techniques to improve forecast accuracy and reduce implementation time. Our method demonstrates superior performance in both quality and efficiency compared to existing approaches, as validated across three datasets, including data from a major distribution company in Iran and the Global Energy Forecasting Competitions of 2012 and 2014. Our results show that incorporating local search techniques reduces the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) by 3.48% and 2.82%, respectively. Furthermore, the average implementation time of our SA algorithm is 36.52% lower than that of the existing metaheuristic algorithm.
... In contrast, RMSE is more sensitive to large errors because it squares the differences between predicted and actual values before averaging, giving greater weight to large outliers or significant deviations. Furthermore, to evaluate the LSTM model's predictive performance, Pinball loss [24] is used as an additional metric. Pinball loss measures how closely a model's predictions align with a specified percentile of the target distribution. ...
... Percentiles are useful where models predict a specific percentile value of the target variable rather than the mean. This is particularly helpful when it is essential to understand different parts of the distribution, such as the upper or lower extremes that occur in wind velocity [24]. The model evaluation is conducted with respect to the 90th percentile, meaning the model is assessed based on how accurately it predicts values that fall at or below the 90th percentile of the target distribution. ...
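For completeness, the pinball (quantile) loss referred to in these passages has the standard form, for a quantile level \tau (here \tau = 0.9), forecast \hat q and observation y,

    L_{\tau}(\hat q, y) = \begin{cases} \tau\,(y - \hat q), & y \ge \hat q,\\ (1 - \tau)\,(\hat q - y), & y < \hat q, \end{cases}

averaged over all forecast instances.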
Conference Paper
Currently, wind power is the fastest-growing sustainable energy resource, and accurately predicting wind velocity for individual turbines is essential. Wind behavior is turbulent, with variable patterns, characteristics, and unpredictable extreme events. Moreover, the wind reaching each turbine is impacted by wake loss effects from surrounding turbines, which underscores the need for a precise local prediction model for each turbine to enhance accuracy in smart wind power production. This study proposes a novel approach using local measurements at fixed points, adopting the Eulerian perspective in fluid dynamics to forecast wind turbulence for the coming hour using a deep learning model. A long short-term memory (LSTM) forecasting model is tuned and applied. Laboratory data from wind tunnel tests are utilized to account for velocity fluctuations and wake loss effects, aiming to establish similarity to an affected single wind turbine. Model training and inference run on the high-performance computing (HPC) DEEP-DAM systems at the Jülich Supercomputing Centre. Three train-test datasets are employed to evaluate the model. The measured metrics indicate that the mean absolute error is 0.048, the root mean square error is approximately 0.065, and the Pinball loss for the 90th percentile is 0.0042. The model demonstrates a robust predictive capability, supporting the development of advanced instruments for individual turbines and facilitating smart, clean, and sustainable energy production.
... Not only does our approach perform competitively in the considered track of the GEFCom2014, but several factors undercut its true performance. All top-placing contestants in the competition perform specialised operations to improve forecasting performance, i.e. peak pre-processing [20], filtering methods to weight certain days higher [40], or tailored training data periods to improve performance [14]. In contrast, we consider all available data for training, refrain from complex pre-processing steps, and only use the default hyperparameters for the base forecasters. ...
... This ranking is determined by how each individual model would have placed in the GEFCom2014 price forecasting challenge in 2014 and therefore assumes that the other models introduced in this article are not included. ...
Article
Full-text available
In various applications, probabilistic forecasts are required to quantify the inherent uncertainty associated with the forecast. However, many existing forecasting methods still only generate point forecasts. Although methods exist to generate probabilistic forecasts from these point forecasts, these are often limited to prediction intervals or must be trained together with a specific point forecast. Therefore, the present article proposes a novel approach for generating probabilistic forecasts from arbitrary point forecasts. In order to implement this approach, we apply a conditional Invertible Neural Network (cINN) to learn the underlying distribution of the data and then combine the uncertainty from this distribution with an arbitrary point forecast to generate probabilistic forecasts. We evaluate our approach by generating probabilistic forecasts from multiple point forecasts and comparing these forecasts to six probabilistic benchmarks on four data sets. We show that our approach generally outperforms all benchmarks with regard to CRPS and Winkler scores and generates probabilistic forecasts with the narrowest prediction intervals whilst remaining reasonably calibrated. Furthermore, our approach enables simple point forecasting methods to rank highly in the Global Energy Forecasting Competition 2014.
... This shows that additive non-parametric QR is a very powerful modelling framework when forecasting the whole response distribution, and cyclical and seasonal variations in SI. A quantile generalised additive model (QGAM) is a new approach that was introduced by [16], where smooth effects estimated by a generalised additive model (GAM) are taken as inputs to a QR model. That is, performing QR on smooth function outputs from a GAM. ...
... The AQR model proposed by [14], with the algorithm further developed by [16], gives flexibility when modelling nonlinear effects beyond the conditional mean. The non-parametric components are composed of low-dimensional additive quantile pieces. ...
Article
Full-text available
Modelling of solar irradiation is paramount to renewable energy management. This warrants the inclusion of additive effects to predict solar irradiation. Modelling of additive effects to solar irradiation can improve the forecasting accuracy of prediction frameworks. To help develop the frameworks, this current study modelled the additive effects using non-parametric quantile regression (QR). The approach applies quantile splines to approximate non-parametric components when finding the best relationships between covariates and the response variable. However, some additive effects are perceived as linear. Thus, the study included the partial linearly additive quantile regression model (PLAQR) in the quest to find how best the additive effects can be modelled. As a result, a comparative investigation on the forecasting performances of the PLAQR, an additive quantile regression (AQR) model and the new quantile generalised additive model (QGAM) using out-of-sample and probabilistic forecasting metric evaluations was done. Forecasted density plots, Murphy diagrams and results from the Diebold–Mariano (DM) hypothesis test were also analysed. The density plot, the curves on the Murphy diagram and most metric scores computed for the QGAM were slightly better than for the PLAQR and AQR models. That is, even though the DM test indicates that the PLAQR and AQR models are less accurate than the QGAM, we could not conclude an outright greater forecasting performance of the QGAM than the PLAQR or AQR models. However, in situations of probabilistic forecasting metric preferences, each model can be prioritised to be applied to the metric where it performed slightly the best. The three models performed differently in different locations, but the location was not a significant factor in their performances. In contrast, forecasting horizon and sample size influenced model performance differently in the three additive models. The performance variations also depended on the metric being evaluated. Therefore, the study has established the best forecasting horizons and sample sizes for the different metrics. It was finally concluded that a 20% forecasting horizon and a minimum sample size of 10000 data points are ideal when modelling additive effects of solar irradiation using non-parametric QR.
... According to Weron [17], the trend in the number of journal articles on EPF, as shown in Fig. 1, increased for probabilistic forecasting papers from 2016 onward. Gaillard et al. [18], who won the GEFCom 2014 competition, employed a method based on quantile regression (QR). Interestingly, three of the top four teams primarily used QR to obtain predictive distributions. ...
Article
Full-text available
The volatility and uncertainty of electricity prices due to renewable energy sources create challenges for electricity trading, necessitating reliable probabilistic electricity-price forecasting (EPF) methods. This study introduces an EPF approach using quantile regression (QR) with general predictors, focusing on the UK market. Unlike market-specific models, this method ensures adaptability and reduces complexity. Using 1,132 days of training data, including electricity prices, demand forecasts, and generation forecasts obtained from UK electricity companies, results show that the proposed model achieved a mean absolute error of 18.27 [£/MWh] for predicting volatile short-term spot market prices. The QR model achieved high predictive accuracy and stability, with only a 4–25% average pinball loss increase when the previous day's prices (P_{t−1}) were excluded due to bidding deadlines. These findings demonstrate the model's robustness and its potential to enhance market efficiency by providing reliable and simplified probabilistic forecasts, aiding stakeholders in mitigating risks and optimizing strategies.
... [24] selected stations based on the cubic regression's in-sample fits, and the best two stations were aggregated by averaging. [25] utilized the generalized cross-validation criterion to select stations and created a virtual station by averaging the lowest validation loss across multiple stations. [26] proposed a rank-based method with three main procedures, including ranking stations based on Hongtao's vanilla benchmark [20], creating virtual stations by averaging the top weather stations, and selecting the best virtual stations based on the performance of the validation dataset. ...
Preprint
Full-text available
Meteorological factors (MF) are crucial in day-ahead load forecasting as they significantly influence the electricity consumption behaviors of consumers. Numerous studies have incorporated MF into the load forecasting model to achieve higher accuracy. Selecting MF from one representative location or the averaged MF as the inputs of the forecasting model is a common practice. However, the difference in MF collected in various locations within a region may be significant, which poses a challenge in selecting the appropriate MF from numerous locations. A representation learning framework is proposed to extract geo-distributed MF while considering their spatial relationships. In addition, this paper employs the Shapley value in the graph-based model to reveal connections between MF collected in different locations and loads. To reduce the computational complexity of calculating the Shapley value, an acceleration method is adopted based on Monte Carlo sampling and weighted linear regression. Experiments on two real-world datasets demonstrate that the proposed method improves the day-ahead forecasting accuracy, especially in extreme scenarios such as the "accumulation temperature effect" in summer and "sudden temperature change" in winter. We also find a significant correlation between the importance of MF in different locations and the corresponding area's GDP and mainstay industry.
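The Shapley value mentioned in this abstract, which the authors approximate by Monte Carlo sampling and weighted linear regression, is defined for a feature i, feature set N and value function v as

    \phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \big( v(S \cup \{i\}) - v(S) \big),

i.e. the average marginal contribution of feature i over all orderings of the features.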
... Another QR-based SI modelling framework is to look at the additive effects of its covariates. The framework developed by [49] takes smooth effects estimated by a GAM as inputs to a non-parametric QR. Ref. [50] explained that a GAM represents a method of fitting a smooth relationship between variables that cannot easily be fitted by standard linear or non-linear models. ...
Article
Full-text available
The renewable energy industry requires accurate forecasts of intermittent solar irradiance (SI) to effectively manage solar power generation and supply. Introducing the random forests (RFs) model and its hybridisation with quantile regression modelling, the quantile regression random forest (QRRF), can help improve the forecasts’ accuracy. This paper assesses the RFs and QRRF models against the quantile generalised additive model (QGAM) by evaluating their forecast performances. A simulation study of multivariate data-generating processes was carried out to compare the forecasting accuracy of the models when predicting global horizontal solar irradiance. The QRRF and QGAM are completely new forecasting frameworks for SI studies, to the best of our knowledge. Simulation results suggested that the introduced QRRF compared well with the QGAM when predicting the forecast distribution. However, the evaluations of the pinball loss scores and mean absolute scaled errors demonstrated a clear superiority of the QGAM. Similar results were obtained in an application to real-life data. Therefore, we recommend that the QGAM be preferred ahead of decision tree-based models when predicting solar irradiance. However, the QRRF model can be used alternatively to predict the forecast distribution. Both the QGAM and QRRF modelling frameworks went beyond representing forecast uncertainty of SI as probability distributions around a prediction interval to give complete information through the estimation of quantiles. Most SI studies conducted are residual and/or non-parametric modelling that are limited to represent information about the conditional mean distribution. Extensions of the QRRF and QGAM frameworks can be made to model other renewable sources of energy that have meteorological characteristics similar to solar irradiance.
... These off-the-shelf models can be directly applied to generate probabilistic load forecasts [22]. In GEFCom2014, a winning team developed a quantile generalized additive model (quantGAM), which is a hybrid of quantile regression and generalized additive models [64]. Probabilistic load forecasting has also been conducted on individual load profiles. ...
Preprint
Full-text available
The widespread popularity of smart meters enables an immense amount of fine-grained electricity consumption data to be collected. Meanwhile, the deregulation of the power industry, particularly on the delivery side, has continuously been moving forward worldwide. How to employ massive smart meter data to promote and enhance the efficiency and sustainability of the power grid is a pressing issue. To date, substantial works have been conducted on smart meter data analytics. To provide a comprehensive overview of the current research and to identify challenges for future research, this paper conducts an application-oriented review of smart meter data analytics. Following the three stages of analytics, namely, descriptive, predictive and prescriptive analytics, we identify the key application areas as load analysis, load forecasting, and load management. We also review the techniques and methodologies adopted or developed to address each application. In addition, we also discuss some research trends, such as big data issues, novel machine learning technologies, new business models, the transition of energy systems, and data privacy and security.
... Producing quantile forecasts was the theme of the Global Energy Forecasting Competition 2014 (GEFCom2014) [3]. The winning team took the top spot of GEFCom2014 using a quantile generalized additive model (quantGAM), a hybrid of quantile regression and generalized additive models [4]. The second top team, Dordonnat et al., first developed a point forecasting model based on semiparametric regression, and then fed the model with different temperature scenarios to generate probabilistic forecasts [5]. ...
Preprint
Full-text available
Probabilistic load forecasts provide comprehensive information about future load uncertainties. In recent years, many methodologies and techniques have been proposed for probabilistic load forecasting. Forecast combination, a widely recognized best practice in point forecasting literature, has never been formally adopted to combine probabilistic load forecasts. This paper proposes a constrained quantile regression averaging (CQRA) method to create an improved ensemble from several individual probabilistic forecasts. We formulate the CQRA parameter estimation problem as a linear program with the objective of minimizing the pinball loss, with the constraints that the parameters are nonnegative and summing up to one. We demonstrate the effectiveness of the proposed method using two publicly available datasets, the ISO New England data and Irish smart meter data. Comparing with the best individual probabilistic forecast, the ensemble can reduce the pinball score by 4.39% on average. The proposed ensemble also demonstrates superior performance over nine other benchmark ensembles.
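In the notation of the abstract above, the CQRA weights can be written schematically as the solution of a pinball-loss minimisation under simplex constraints (the actual paper may estimate weights per quantile level),

    \min_{w_1, \dots, w_K} \; \sum_{t} \sum_{\tau} L_{\tau}\Big( \sum_{k=1}^{K} w_k\, \hat q_{k,t}(\tau),\; y_t \Big)
    \quad \text{s.t.} \quad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1,

where \hat q_{k,t}(\tau) is the \tau-quantile forecast of the k-th individual method and L_{\tau} the pinball loss, which makes the problem expressible as a linear program.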
... A basic introduction for this topic can be found in Nowotarski and Weron (2014a), among others. Extensions include for instance the Factor Quantile Regression Averaging of Maciejowska et al. (2015) or lasso-based approaches as done by Gaillard et al. (2015). ...
Preprint
The liberalization of electricity markets and the development of renewable energy sources have led to new challenges for decision makers. These challenges are accompanied by an increasing uncertainty about future electricity price movements. The increasing number of papers that aim to model and predict electricity prices over short horizons has provided new opportunities for market participants. However, the electricity price literature seems to be very scarce on the issue of medium- to long-term price forecasting, which is mandatory for investment and political decisions. Our paper closes this gap by introducing a new approach to simulate electricity prices with hourly resolution for several months up to three years. Considering the uncertainty of future events, we are able to provide probabilistic forecasts that detect probabilities of price spikes even in the long run. As the market, we use the EPEX day-ahead electricity market for Germany and Austria. Our model extends the X-Model, which mainly utilizes the sale and purchase curve for electricity day-ahead auctions. By applying our procedure, we are able to give probabilities for the practically relevant event (due to the EEG) of six consecutive hours of negative prices. We find that using the supply-and-demand-curve-based model in the long run yields realistic patterns for the time series of electricity prices and leads to promising results considering common error measures.
... Formally introduced by Nowotarski and Weron (2015), and successfully used in the GEFCom2014 competition (Maciejowska and Nowotarski, 2016; Gaillard et al., 2016) and later energy forecasting applications (Liu et al., 2017; Wang et al., 2019; Kath and Ziel, 2021; Uniejewski and Weron, 2021; Nitka and Weron, 2023; Yang et al., 2023; Cornell et al., 2024), the method estimates conditional quantiles of the target variable as a linear combination of point predictions in a quantile regression setting: ...
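The display the snippet leads into is the standard QRA regression of the conditional quantile on the vector X_t of individual point forecasts,

    \hat Q_{y_t}(\tau \mid X_t) = X_t^{\top} \beta_{\tau},

with a separate coefficient vector \beta_{\tau} estimated for each quantile level \tau by minimising the pinball loss.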
... LASSO is a regularization method that has been widely used in EPF but only in the past decade. Some of the earliest examples of research that employed LASSO in the context of EPF include [7, 15, 31, 35-37]. ...
Article
The most commonly used form of regularization typically involves defining the penalty function as an ℓ1 or ℓ2 norm. However, numerous alternative approaches remain untested in practical applications. In this study, we apply ten different penalty functions to predict electricity prices and evaluate their performance under two different model structures and in two distinct electricity markets. The study reveals that LQ and elastic net consistently produce more accurate forecasts compared to other regularization types. In particular, they were the only types of penalty functions that consistently produced more accurate forecasts than the most commonly used LASSO. Furthermore, the results suggest that cross-validation outperforms Bayesian information criteria for parameter optimization, and performs as well as models with ex-post parameter selection.
... The additive quantile regression (AQR) model is formed by combining the generalised additive model (GAM) and the quantile regression model (QRM); hence it is called a hybrid model. Gaillard et al. [23] were the first to use the AQR models in short-term load forecasting. The model was further extended by Fasiolo et al. [24]. ...
Article
Full-text available
Forecasting is important in any scientific field, including COVID-19 epidemiology. Daily confirmed COVID-19 cases are in different phases characterised by peaks, making it difficult for most mathematical models to handle. In such cases, extreme value theory plays a critical role because values of interest are usually far away from the mean. In this paper, we develop mathematical models using extreme values to capture uncertainties of forecasts associated with the COVID-19 pandemic using real-time data. A three-stage approach to probabilistic forecasting is used in this study. The stochastic gradient boosting, generalised additive model, additive quantile regression, and the nonlinear quantile regression are used to predict extremely high quantiles, i.e. 0.95-, 0.99- and 0.995-quantiles. The second stage combines each model's predicted extremely high quantiles using the weighted mean and median methods. The pinball loss and coverage probabilities are used to evaluate the accuracy of the predictions in the third stage. For all the extreme quantiles, i.e. the 0.95-, 0.99- and 0.995-quantiles, the cubic spline regression method gives the best predictions based on the lowest pinball losses, which are 171.41, 563.49 and 115.28, respectively. The weighted mean average model dominated by the mean is the second best based on the pinball losses but the best based on the coverage probability. This study provides insights into the strengths and weaknesses of different models for short-term extreme quantile prediction of COVID-19. Estimating extreme quantiles of daily COVID-19 using models with high predictive capabilities, such as the weighted mean-median model dominated by the mean, is important to public health officials and policymakers for planning and preparing for potential surges in COVID-19 cases and similar pandemics in the future.
... GAMs are a class of semi-parametric regression models that were developed in [19,40] and are now widely used in electricity consumption forecasting [13]. Indeed, GAMs are interesting in practice, since their additive aspect makes them highly explainable, but this also means that the choice of variables must be meticulous. ...
Preprint
Full-text available
Accurate electricity demand forecasting is essential for several reasons, especially as the integration of renewable energy sources and the transition to a decentralized network paradigm introduce greater complexity and uncertainty. The proposed methodology leverages graph-based representations to effectively capture the spatial distribution and relational intricacies inherent in this decentralized network structure. This research work offers a novel approach that extends beyond the conventional Generalized Additive Model framework by considering models like Graph Convolutional Networks or Graph SAGE. These graph-based models enable the incorporation of various levels of interconnectedness and information sharing among nodes, where each node corresponds to the combined load (i.e. consumption) of a subset of consumers (e.g. the regions of a country). More specifically, we introduce a range of methods for inferring graphs tailored to consumption forecasting, along with a framework for evaluating the developed models in terms of both performance and explainability. We conduct experiments on electricity forecasting, in both a synthetic and a real framework considering the French mainland regions, and the performance and merits of our approach are discussed.
... Gaillard et al. [26] developed a method that applies quantile regression (QR) using a generalised additive model, referred to as generalised additive quantile regression (GAQR). This modelling approach was extended by [27]. ...
Article
Full-text available
The main source of electricity worldwide stems from fossil fuels, contributing to air pollution, global warming, and associated adverse effects. This study explores wind energy as a potential alternative. Nevertheless, the variable nature of wind introduces uncertainty in its reliability. Thus, it is necessary to identify an appropriate machine learning model capable of reliably forecasting wind speed under various environmental conditions. This research compares the effectiveness of Dynamic Architecture for Artificial Neural Networks (DAN2), convolutional neural networks (CNN), random forest and XGBOOST in predicting wind speed across three locations in South Africa, characterised by different weather patterns. The forecasts from the four models were then combined using quantile regression averaging models, generalised additive quantile regression (GAQR) and quantile regression neural networks (QRNN). Empirical results show that CNN outperforms DAN2 in accurately forecasting wind speed under different weather conditions. This superiority is likely due to the inherent architectural attributes of CNNs, including feature extraction capabilities, spatial hierarchy learning, and resilience to spatial variability. The results from the combined forecasts were comparable with those from the QRNN, which was slightly better than those from the GAQR model. However, the combined forecasts were more accurate than the individual models. These results could be useful to decision-makers in the energy sector.
... For instance, winning methods in the IEEE DataPort Competition by De Vilmarest & Goude (2022) employed GAMs within model ensembles. Similarly, GAMs consistently ranked highly in previous forecasting competitions, securing top positions in GEFCom 2014 (Hong et al., 2016), see Gaillard et al. (2016) and Dordonnat et al. (2016). By applying linear model structures to non-linear functions along with discretization techniques, GAMs remain interpretable and efficient in estimation while capturing non-linear relationships (Lepore et al., 2022; Wood, 2006). ...
Preprint
Accurate forecasts of the impact of spatial weather and pan-European socio-economic and political risks on hourly electricity demand for the mid-term horizon are crucial for strategic decision-making amidst the inherent uncertainty. Most importantly, these forecasts are essential for the operational management of power plants, ensuring supply security and grid stability, and in guiding energy trading and investment decisions. The primary challenge for this forecasting task lies in disentangling the multifaceted drivers of load, which include national deterministic (daily, weekly, annual, and holiday patterns) and national stochastic weather and autoregressive effects. Additionally, transnational stochastic socio-economic and political effects add further complexity, in particular, due to their non-stationarity. To address this challenge, we present an interpretable probabilistic mid-term forecasting model for the hourly load that captures, besides all deterministic effects, the various uncertainties in load. This model recognizes transnational dependencies across 24 European countries, with multivariate modeled socio-economic and political states and cross-country dependent forecasting. Built from interpretable Generalized Additive Models (GAMs), the model enables an analysis of the transmission of each incorporated effect to the hour-specific load. Our findings highlight the vulnerability of countries reliant on electric heating under extreme weather scenarios. This emphasizes the need for high-resolution forecasting of weather effects on pan-European electricity consumption especially in anticipation of widespread electric heating adoption.
... In the load forecasting task of the GEFCom 2012, the top three teams used multiple linear regression [11], gradient boosting and Gaussian processes [12], and splines and ensembles of these models [13]. In the load forecasting task of GEFCom 2014, the top three teams used robust additive models [14], semiparametric regression models [15], and a combination of multiple different forecasting models [16]. Finally, for GEFCom 2017, the top 3 teams used different quantile regression and generalised additive models [17], quantile gradient boosting regression trees, and an ensemble consisting of tree-based methods and neural networks [18]. ...
Article
Full-text available
Measures for balancing the electrical grid, such as peak shaving, require accurate peak forecasts for lower aggregation levels of electrical loads. Thus, the Big Data Energy Analytics Laboratory (BigDEAL) challenge—organised by the BigDEAL—focused on forecasting three different daily peak characteristics in low aggregated load time series. In particular, participants of the challenge were asked to provide long‐term forecasts with horizons of up to 1 year in the qualification. The authors present the approach of the KIT‐IAI team from the Institute for Automation and Applied Informatics at the Karlsruhe Institute of Technology. The approach to the challenge is based on a hybrid generative model. In particular, the authors use a conditional Invertible Neural Network (cINN). The cINN gets the forecast of a sliding mean as representative of the trend, different weather features, and calendar information as conditioning input. By this, the proposed hybrid method achieved second place overall and won two out of three tracks of the BigDEAL challenge.
... It may not be practical to assume that all covariates are non-linear. Such a model was introduced by Hoshino (2014) [14], which has a non-parametric component and an additive linear parametric component. ...
Preprint
Full-text available
Modelling of solar irradiation is paramount to renewable energy management. This warrants the inclusion of additive effects to solar irradiation. To help develop the frameworks, this current study modelled solar irradiation using non-parametric quantile regression (QR). The approach applies quantile splines when finding the best relationships between covariates and the response variable. As a result, the method was very suitable because relationship structures between covariates and solar irradiation are unknown. However, some additive effects are perceived as linear. Thus, the study included the partially linear additive quantile regression model (PLAQR) in our quest to find how best the additive effects can be modelled. The PLAQR model compared very well on reliability analysis, but it was outperformed by the new quantile generalised additive model (QGAM) we proposed, on all other metrics. Modelling of solar irradiation using QGAM is new to renewable energy studies. The Winkler score is another metric where QGAM was inferior, however, an additive quantile regression model was the best. Even though the models had also approximately the same continuous rank probability score in all cases, the QGAM was the best in most metric evaluations. The three models performed differently in different locations, but the location was not a significant factor in their performances. In contrast, forecasting horizon and sample size influenced model performance differently in the three additive models. The performance variations also depended on the metric being evaluated. Therefore, the study has established the best forecasting horizons and sample sizes for the different metrics. It was finally concluded that a 20% forecasting horizon and a minimum sample size of 10000 data points are ideal when modelling additive effects of solar irradiation using non-parametric QR.
... A detailed overview of their model has been published [9]. GAMs have been very successful in energy forecasting competitions [10] so their success is perhaps unsurprising in this challenge. Model combination is also something which has been very successful and has been demonstrated frequently in both GEFCOM and the M-Competitions. ...
... The improved performance of QRA against its constituent models was verified with several follow-up papers (Maciejowska, Nowotarski & Weron, 2016; Nowotarski & Weron, 2018; Kostrzewski & Kostrzewska, 2019; Uniejewski & Weron, 2021; Kath & Ziel, 2021). However, its most significant success came from the 2014 GEFCom forecasting competition, where the top two winning teams for the price track used some variant of QRA (Gaillard, Goude & Nedellec, 2016). Mathematically, the model is identical to traditional quantile regression. ...
... In [74], the authors developed a model based on multiple linear regression also powered by different temperature scenarios. The authors, in [75], applied a model with quantile regression and generalized additive models for a probabilistic load forecast. In [11], the authors propose a practical methodology to generate probabilistic load forecasts by performing quantile regression averaging on a set of sister point forecasts. ...
Article
Full-text available
The advent of smart grid technologies has facilitated the integration of new and intermittent renewable forms of electricity generation in power systems. Advancements are driving transformations in the context of energy planning and operations in many countries around the world, particularly impacting short-term horizons. Therefore, one of the primary challenges in this environment is to accurately provide forecasting of the short-term load demand. This is a critical task for creating supply strategies, system reliability decisions, and price formation in electricity power markets. In this context, nonlinear models, such as Neural Networks and Support Vector Machines, have gained popularity over the years due to advancements in mathematical techniques as well as improved computational capacity. The academic literature highlights various approaches to improve the accuracy of these machine learning models, including data segmentation by similar patterns, input variable selection, forecasting from hierarchical data, and net load forecasts. In Brazil, the national independent system operator improved the operation planning in the short term through the DESSEM model, which uses short-term load forecast models for planning the day-ahead operation of the system. Consequently, this study provides a comprehensive review of various methods used for short-term load forecasting, with a particular focus on those based on machine learning strategies, and discusses the Brazilian Experience.
... The latter can be the best fixed convex combination of all the experts or the best fixed expert (constant over time). The algorithm used in the present work is the ML-Poly algorithm [35], successfully applied to electricity load forecasting [36] and implemented in the R package opera [37]. This algorithm tracks the best expert or the best convex combination of experts by giving more weight to an expert that will generate a low regret. ...
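A minimal, hedged sketch of the aggregation step with the opera package mentioned here (the matrix experts of individual forecasts and the vector y of observed loads are assumed placeholders, not objects from the paper):

    # Hedged sketch: ML-Poly expert aggregation with the opera package.
    library(opera)

    # experts: T x K matrix of individual forecasts; y: length-T observations.
    mix <- mixture(Y = y, experts = experts, model = "MLpol", loss.type = "square")

    head(mix$weights)     # time-varying weights assigned to each expert
    head(mix$prediction)  # aggregated forecasts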
Article
Full-text available
This paper focuses on day-ahead electricity load forecasting for substations of the distribution network in France; the corresponding problem therefore lies between the instability of a single consumption and the stability of a countrywide total demand. Moreover, this problem requires forecasting the loads of over one thousand substations; consequently, it belongs to the field of multiple time series forecasting. To that end, the paper applies an adaptive methodology that provided excellent results at a national scale; the idea is to combine generalized additive models with state-space representations. However, extending this methodology to the prediction of over a thousand time series raises a computational issue. It is solved by developing a frugal variant that reduces the number of estimated parameters: forecasting models are estimated only for a few time series and transfer learning is achieved by relying on aggregation of experts. This approach reduces computational needs and the associated emissions. Several variants are built, corresponding to different levels of parameter transfer, to find the best trade-off between accuracy and frugality. The selected method achieves competitive results compared to individual models. Finally, the paper highlights the interpretability of the models, which is important for operational applications.
Preprint
Electricity price forecasting is a critical tool for the efficient operation of power systems and for supporting informed decision-making by market participants. This paper explores a novel methodology aimed at improving the accuracy of electricity price forecasts by incorporating probabilistic inputs of fundamental variables. Traditional approaches often rely on point forecasts of exogenous variables such as load, solar, and wind generation. Our method proposes the integration of quantile forecasts of these fundamental variables, providing a new set of exogenous variables that account for a more comprehensive representation of uncertainty. We conducted empirical tests on the German electricity market using recent data to evaluate the effectiveness of this approach. The findings indicate that incorporating probabilistic forecasts of load and renewable energy source generation significantly improves the accuracy of point forecasts of electricity prices. Furthermore, the results clearly show that the highest improvement in forecast accuracy can be achieved with full probabilistic forecast information. This highlights the importance of probabilistic forecasting in research and practice, and in particular that the current state of the art in reporting load, wind and solar forecasts is insufficient.
Article
This paper studies the forecast accuracy and explainability of a battery of day-ahead (Henry Hub and Title Transfer Facility (TTF)) natural gas price and volatility models. The results demonstrate the dominance of non-linear, non-parametric models with deep structure relative to various competing model specifications. By employing the explainable artificial intelligence (XAI) approach, we document that the price of natural gas is formed strategically based on crude oil and electricity prices. While the conditional volatility of natural gas returns is driven by long-memory dynamics and crude oil volatility, the informativeness of the electricity predictor has improved over the most recent volatile time period. Although we reveal that predictive non-linear relationships are inherently complex and time-varying, our findings in general support the notion that natural gas, crude oil and electricity are interconnected. Focusing on the periods when markets experienced sharp structural breaks and extreme volatility (e.g., the COVID-19 pandemic and the Russia-Ukraine conflict), we show that deep learning models provide better adaptability and lead to significantly more accurate forecast performance.
Preprint
Full-text available
In this article, a multiple split method is proposed that enables construction of multidimensional probabilistic forecasts of a selected set of variables. The method uses repeated resampling to estimate the uncertainty of simultaneous multivariate predictions. This nonparametric approach bridges the gap between point and probabilistic predictions and can be combined with different point forecasting methods. The performance of the method is evaluated with data describing the German short-term electricity market. The results show that the proposed approach provides highly accurate predictions. The gains from multidimensional forecasting are the largest when functions of variables, such as the price spread or the residual load, are considered. Finally, the method is used to support the decision process of a medium-sized generation utility that produces electricity from wind energy and sells it on either the day-ahead or the intraday market. The company makes decisions under high uncertainty because it knows neither the future production level nor the prices. We show that joint forecasting of both market prices and fundamentals can be used to predict the distribution of the profit, and hence helps to design a strategy that balances the level of income against the trading risk.
Article
Ensemble methods, such as Bagging, Boosting, or Random Forests, often enhance the prediction performance of single learners on both classification and regression tasks. In the context of regression, we propose a gradient boosting‐based algorithm incorporating a diversity term with the aim of constructing different learners that enrich the ensemble while achieving a trade‐off of some individual optimality for global enhancement. Verifying the hypotheses of Biau and Cadre's theorem (2021, Advances in contemporary statistics and econometrics—Festschrift in honour of Christine Thomas‐Agnan , Springer), we present a convergence result ensuring that the associated optimization strategy reaches the global optimum. In the experiments, we consider a variety of different base learners with increasing complexity: stumps, regression trees, Purely Random Forests, and Breiman's Random Forests. Finally, we consider simulated and benchmark datasets and a real‐world electricity demand dataset to show, by means of numerical experiments, the suitability of our procedure by examining the behavior not only of the final or the aggregated predictor but also of the whole generated sequence.
Chapter
We describe our experience in developing a predictive model that achieved a high position in the BigDEAL Challenge 2022, an energy competition on load and peak forecasting. We present a novel procedure for feature engineering and feature selection, based on cluster permutation of temperatures and calendar variables. We adopted gradient boosting of trees and enhanced its capabilities with trend modeling and distributional forecasts. We also included an approach to forecast combination known as temporal hierarchies, which further improves the accuracy.
Article
In this paper, we present an algorithm designed to automatically merge predictions from a collection of individual prediction methods coded in R. The algorithm employs varying weights and decision rules to ascertain the optimal amalgamation of these methods, with the aim of forecasting historical time series data while minimizing human intervention. The algorithm serves as an automated component within the artificial intelligence toolkit.
Article
Electricity load forecasting is a necessary capability for power system operators and electricity market participants. Both demand and supply characteristics evolve over time. On the demand side, unexpected events as well as longer-term changes in consumption habits affect demand patterns. On the production side, the increasing penetration of intermittent power generation significantly changes the forecasting needs. We address this challenge in two ways. First, our setting is adaptive; our models take into account the most recent observations available in order to respond automatically to changes in the underlying process. Second, we consider probabilistic rather than point forecasting; indeed, uncertainty quantification is required to operate electricity systems efficiently and reliably. Our methodology relies on the Kalman filter, previously used successfully for adaptive point load forecasting. The probabilistic forecasts are obtained by quantile regressions on the residuals of the point forecasting model. We achieve adaptive quantile regressions using online gradient descent; we avoid choosing the gradient step size by considering multiple learning rates and aggregating experts. We apply the method to two data sets: the regional net load in Great Britain and the demand of seven large cities in the United States. Adaptive procedures improve forecast performance substantially in both use cases, for both point and probabilistic forecasting.
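As a rough sketch of the kind of online quantile update described above (not the authors' exact procedure), the following R function adapts a single quantile of point-forecast residuals by gradient steps on the pinball loss; the residual series and the step size eta are illustrative.

# adapt a single quantile estimate of forecast residuals online
online_quantile <- function(residuals, tau = 0.9, eta = 0.05) {
  q <- 0                                   # initial quantile estimate
  path <- numeric(length(residuals))
  for (t in seq_along(residuals)) {
    # subgradient of the pinball loss rho_tau(r - q) with respect to q
    grad <- if (residuals[t] >= q) -tau else (1 - tau)
    q <- q - eta * grad                    # online gradient step
    path[t] <- q
  }
  path
}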
Article
Full-text available
Throughout its history, Operational Research has evolved to include a variety of methods, models and algorithms that have been applied to a diverse and wide range of contexts. This encyclopedic article consists of two main sections: methods and applications. The first aims to summarise the up-to-date knowledge and provide an overview of the state-of-the-art methods and key developments in the various subdomains of the field. The second offers a wide-ranging list of areas where Operational Research has been applied. The article is meant to be read in a nonlinear fashion. It should be used as a point of reference or first-port-of-call for a diverse pool of readers: academics, researchers, students, and practitioners. The entries within the methods and applications sections are presented in alphabetical order. The authors dedicate this paper to the 2023 Turkey/Syria earthquake victims. We sincerely hope that advances in OR will play a role towards minimising the pain and suffering caused by this and future catastrophes.
Article
Full-text available
We examine possible accuracy gains from forecast averaging in the context of interval forecasts of electricity spot prices. First, we test whether constructing empirical prediction intervals (PI) from combined electricity spot price forecasts leads to better forecasts than those obtained from individual methods. Next, we propose a new method for constructing PI—Quantile Regression Averaging (QRA)—which utilizes the concept of quantile regression and a pool of point forecasts of individual (i.e. not combined) models. While the empirical PI from combined forecasts do not provide significant gains, the QRA-based PI are found to be more accurate than those of the best individual model—the smoothed nonparametric autoregressive model.
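A minimal QRA-style sketch in R is shown below, with quantile regression fitted on a pool of point forecasts; the data frame df and the forecast columns f1, f2, f3 are assumed names.

library(quantreg)

# quantile regression with individual point forecasts as regressors
qra_fit <- rq(price ~ f1 + f2 + f3, tau = c(0.05, 0.5, 0.95), data = df)

# lower bound, median and upper bound for each observation
head(predict(qra_fit, newdata = df))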
Article
Full-text available
Short-term electricity forecasting has been studied for years at EDF and different forecasting models were developed from various fields of statistics or machine learning (functional data analysis, time series, non-parametric regression, boosting, bagging). We are interested in the forecasting of France’s daily electricity load consumption based on these different approaches. We investigate in this empirical study how to use them to improve prediction accuracy. First, we show how combining members of the original set of forecasts can lead to a significant improvement. Second, we explore how to build various and heterogeneous forecasts from these models and analyze how we can aggregate them to get even better predictions.
Article
Full-text available
We consider the setting of sequential prediction of arbitrary sequences based on specialized experts. We first provide a review of the relevant literature and present two theoretical contributions: a general analysis of the specialist aggregation rule of Freund et al. (Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing (STOC), pp. 334–343, 1997) and an adaptation of fixed-share rules of Herbster and Warmuth (Mach. Learn. 32:151–178, 1998) in this setting. We then apply these rules to the sequential short-term (one-day-ahead) forecasting of electricity consumption; to do so, we consider two data sets, a Slovakian one and a French one, respectively concerned with hourly and half-hourly predictions. We follow a general methodology to perform the stated empirical studies and detail in particular tuning issues of the learning parameters. The introduced aggregation rules demonstrate an improved accuracy on the data sets at hand; the improvements lie in a reduced mean squared error but also in a more robust behavior with respect to large occasional errors.
Article
Full-text available
Electricity load forecasting faces rising challenges due to the advent of innovative technologies such as smart grids, electric cars and renewable energy production. For distribution network managers, a good knowledge of future electricity consumption is central to the reliability of the network and to investment strategies. In this paper, we suggest a semi-parametric approach based on generalized additive model theory to model the electrical load of more than 2200 substations of the French distribution network, at both short- and middle-term horizons. These generalized additive models estimate the relationship between load and the explanatory variables: temperatures, calendar variables, etc. This methodology has been applied with good results on the French grid. In addition, we highlight the fact that the estimated functions describing the relations between demand and the driving variables are easily interpretable, and that a good temperature prediction is important.
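A hedged sketch of such a semi-parametric load model with mgcv is given below; the data frame load_df and its columns (load, temperature, toy for time of year, dow for day of week) are illustrative.

library(mgcv)

# smooth temperature and annual-cycle effects plus a day-of-week factor
gam_fit <- gam(load ~ s(temperature) + s(toy, bs = "cc") + as.factor(dow),
               data = load_df, method = "REML")

plot(gam_fit, pages = 1)   # inspect the estimated smooth effects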
Article
Full-text available
We study online aggregation of the predictions of experts, and first show new second-order regret bounds in the standard setting, which are obtained via a version of the Prod algorithm (and also a version of the polynomially weighted average algorithm) with multiple learning rates. These bounds are expressed in terms of excess losses, the differences between the instantaneous losses suffered by the algorithm and those of a given expert. We then demonstrate the interest of these bounds in the context of experts that report their confidences as a number in the interval [0,1], using a generic reduction to the standard setting. We conclude with two other applications in the standard setting, which improve the known bounds in the case of small excess losses and show a bounded regret against i.i.d. sequences of losses.
Article
Full-text available
Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current “pseudo”-residuals by least squares at each iteration. The pseudo-residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used in place of the full sample to fit the base learner and compute the model update for the current iteration. This randomized approach also increases robustness against overcapacity of the base learner.
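For illustration, a stochastic gradient boosting fit with the gbm package is sketched below; the subsampling described above corresponds to bag.fraction < 1. The data frame train and its response y are assumed names.

library(gbm)

# each boosting iteration fits the base learner on a random subsample
boost <- gbm(y ~ ., data = train,
             distribution = "gaussian",
             n.trees = 2000,
             interaction.depth = 3,
             shrinkage = 0.05,
             bag.fraction = 0.5)   # fraction of the training set drawn at each iteration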
Article
Full-text available
We consider median regression and, more generally, a possibly infinite collection of quantile regressions in high-dimensional sparse models. In these models, the number of regressors p is very large, possibly larger than the sample size n, but only at most s regressors have a nonzero impact on each conditional quantile of the response variable, where s grows more slowly than n. Since ordinary quantile regression is not consistent in this case, we consider ℓ1-penalized quantile regression (ℓ1-QR), which penalizes the ℓ1-norm of regression coefficients, as well as the post-penalized QR estimator (post-ℓ1-QR), which applies ordinary QR to the model selected by ℓ1-QR. First, we show that under general conditions ℓ1-QR is consistent at the near-oracle rate $\sqrt{s/n}\sqrt{\log(p\vee n)}$, uniformly in the compact set $\mathcal{U}\subset(0,1)$ of quantile indices. In deriving this result, we propose a partly pivotal, data-driven choice of the penalty level and show that it satisfies the requirements for achieving this rate. Second, we show that under similar conditions post-ℓ1-QR is consistent at the near-oracle rate $\sqrt{s/n}\sqrt{\log(p\vee n)}$, uniformly over $\mathcal{U}$, even if the ℓ1-QR-selected models miss some components of the true models, and the rate could be even closer to the oracle rate otherwise. Third, we characterize conditions under which ℓ1-QR contains the true model as a submodel, and derive bounds on the dimension of the selected model, uniformly over $\mathcal{U}$; we also provide conditions under which hard-thresholding selects the minimal true model, uniformly over $\mathcal{U}$.
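An illustrative ℓ1-penalized median regression in R is sketched below, using the lasso-penalized fitter in quantreg; the data frame df, its response y and the penalty level are placeholders, and the penalty is not tuned here.

library(quantreg)

# l1-penalized quantile regression at the median
l1_qr <- rq(y ~ ., tau = 0.5, data = df,
            method = "lasso", lambda = 2)   # penalty level chosen only for illustration

coef(l1_qr)   # many coefficients are shrunk exactly to zero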
Article
Full-text available
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
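A minimal glmnet sketch is given below, covering the lasso and the elastic net along a regularization path; the covariate matrix x and response y are assumed.

library(glmnet)

cv_fit <- cv.glmnet(x, y, alpha = 1)   # lasso path with lambda chosen by cross-validation
coef(cv_fit, s = "lambda.min")         # sparse coefficient vector at the selected lambda

enet <- glmnet(x, y, alpha = 0.5)      # elastic net: mixture of l1 and l2 penalties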
Article
Full-text available
This article reviews flexible statistical methods that are useful for characterizing the effect of potential prognostic factors on disease endpoints. Applications to survival models and binary outcome models are illustrated.
Article
We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree‐based models are briefly described.
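In its standard penalized (Lagrangian) form, the lasso estimate described above solves

$$\hat\beta^{\mathrm{lasso}} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert,$$

which is equivalent to minimizing the residual sum of squares subject to the constraint $\sum_{j}\lvert\beta_j\rvert \le t$.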
Article
The energy industry has been going through a significant modernization process over the last decade. Its infrastructure is being upgraded rapidly. The supply, demand and prices are becoming more volatile and less predictable than ever before. Even its business model is being challenged fundamentally. In this competitive and dynamic environment, many decision-making processes rely on probabilistic forecasts to quantify the uncertain future. Although most of the papers in the energy forecasting literature focus on point or single-valued forecasts, the research interest in probabilistic energy forecasting research has taken off rapidly in recent years. In this paper, we summarize the recent research progress on probabilistic energy forecasting. A major portion of the paper is devoted to introducing the Global Energy Forecasting Competition 2014 (GEFCom2014), a probabilistic energy forecasting competition with four tracks on load, price, wind and solar forecasting, which attracted 581 participants from 61 countries. We conclude the paper with 12 predictions for the next decade of energy forecasting.
Book
The first edition of this book has established itself as one of the leading references on generalized additive models (GAMs), and the only book on the topic to be introductory in nature with a wealth of practical examples and software implementation. It is self-contained, providing the necessary background in linear models, linear mixed models, and generalized linear models (GLMs), before presenting a balanced treatment of the theory and applications of GAMs and related models. The author bases his approach on a framework of penalized regression splines, and while firmly focused on the practical aspects of GAMs, discussions include fairly full explanations of the theory underlying the methods. Use of R software helps explain the theory and illustrates the practical application of the methodology. Each chapter contains an extensive set of exercises, with solutions in an appendix or in the book’s R data package gamair, to enable use as a course text or for self-study.
Article
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
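A rough bagging sketch for regression in R is given below, with bootstrap replicates of the learning set and one tree per replicate; train, test and the response column y are illustrative names.

library(rpart)

B <- 100
preds <- replicate(B, {
  idx  <- sample(nrow(train), replace = TRUE)     # bootstrap replicate of the learning set
  tree <- rpart(y ~ ., data = train[idx, ])       # unstable base predictor
  predict(tree, newdata = test)
})
bagged <- rowMeans(preds)                         # aggregated (averaged) prediction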
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
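A minimal random forest sketch with the randomForest package follows; mtry controls the random feature selection at each split, and train and y are assumed names.

library(randomForest)

rf <- randomForest(y ~ ., data = train,
                   ntree = 500,
                   mtry = 3,            # number of features tried at each split
                   importance = TRUE)

importance(rf)   # internal variable-importance estimates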
Article
A simple minimization problem yielding the ordinary sample quantiles in the location model is shown to generalize naturally to the linear model generating a new class of statistics we term "regression quantiles." The estimator which minimizes the sum of absolute residuals is an important special case. Some equivariance properties and the joint aymptotic distribution of regression quantiles are established. These results permit a natural generalization to the linear model of certain well-known robust estimators of location. Estimators are suggested, which have comparable efficiency to least squares for Gaussian linear models while substantially out-performing the least-squares estimator over a wide class of non-Gaussian error distributions.
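The minimization problem behind regression quantiles can be written, for a quantile level $\tau \in (0,1)$, as

$$\hat\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^{p}} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right), \qquad \rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right),$$

with $\tau = 1/2$ recovering the estimator that minimizes the sum of absolute residuals.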
Article
In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
Article
Recent work by Reiss and Ogden provides a theoretical basis for sometimes preferring restricted maximum likelihood (REML) to generalized cross-validation (GCV) for smoothing parameter selection in semiparametric regression. However, existing REML or marginal likelihood (ML) based methods for semiparametric generalized linear models (GLMs) use iterative REML or ML estimation of the smoothing parameters of working linear approximations to the GLM. Such indirect schemes need not converge and fail to do so in a non-negligible proportion of practical analyses. By contrast, very reliable prediction error criteria smoothing parameter selection methods are available, based on direct optimization of GCV, or related criteria, for the GLM itself. Since such methods directly optimize properly defined functions of the smoothing parameters, they have much more reliable convergence properties. The paper develops the first such method for REML or ML estimation of smoothing parameters. A Laplace approximation is used to obtain an approximate REML or ML for any GLM, which is suitable for efficient direct optimization. This REML or ML criterion requires that Newton-Raphson iteration, rather than Fisher scoring, be used for GLM fitting, and a computationally stable approach to this is proposed. The REML or ML criterion itself is optimized by a Newton method, with the derivatives required obtained by a mixture of implicit differentiation and direct methods. The method will cope with numerical rank deficiency in the fitted model and in fact provides a slight improvement in numerical robustness on the earlier method of Wood for prediction error criteria based smoothness selection. Simulation results suggest that the new REML and ML methods offer some improvement in mean-square error performance relative to GCV or Akaike's information criterion in most cases, without the small number of severe undersmoothing failures to which Akaike's information criterion and GCV are prone. This is achieved at the same computational cost as GCV or Akaike's information criterion. The new approach also eliminates the convergence failures of previous REML-or ML-based approaches for penalized GLMs and usually has lower computational cost than these alternatives. Example applications are presented in adaptive smoothing, scalar on function regression and generalized additive model selection.
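In mgcv, the choice between the criteria discussed above reduces to the method argument of gam; a short hedged comparison is sketched below with an illustrative formula and data frame dat.

library(mgcv)

fit_reml <- gam(y ~ s(x0) + s(x1), data = dat, method = "REML")     # REML-based smoothness selection
fit_gcv  <- gam(y ~ s(x0) + s(x1), data = dat, method = "GCV.Cp")   # prediction-error criterion

fit_reml$sp   # smoothing parameters selected by REML
fit_gcv$sp    # smoothing parameters selected by GCV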
Article
This empirical paper compares the accuracy of 12 time series methods for short-term (day-ahead) spot price forecasting in auction-type electricity markets. The methods considered include standard autoregression (AR) models and their extensions -- spike preprocessed, threshold and semiparametric autoregressions (i.e., AR models with nonparametric innovations) -- as well as mean-reverting jump diffusions. The methods are compared using a time series of hourly spot prices and system-wide loads for California, and a series of hourly spot prices and air temperatures for the Nordic market. We find evidence that (i) models with system load as the exogenous variable generally perform better than pure price models, but that this is not necessarily the case when air temperature is considered as the exogenous variable; and (ii) semiparametric models generally lead to better point and interval forecasts than their competitors, and more importantly, they have the potential to perform well under diverse market conditions.
Article
Representation of generalized additive models (GAM's) using penalized regression splines allows GAM's to be employed in a straightforward manner using penalized regression methods. Not only is inference facilitated by this approach, but it is also possible to integrate model selection in the form of smoothing parameter selection into model fitting in a computationally efficient manner using well founded criteria such as generalized cross-validation. The current fitting and smoothing parameter selection methods for such models are usually effective, but do not provide the level of numerical stability to which users of linear regression packages, for example, are accustomed. In particular the existing methods cannot deal adequately with numerical rank deficiency of the GAM fitting problem, and it is not straightforward to produce methods that can do so, given that the degree of rank deficiency can be smoothing parameter dependent. In addition, models with the potential flexibility of GAM's can also present practical fitting difficulties as a result of indeterminacy in the model likelihood: Data with many zeros fitted by a model with a log link are a good example. In this article it is proposed that GAM's with a ridge penalty provide a practical solution in such circumstances, and a multiple smoothing parameter selection method suitable for use in the presence of such a penalty is developed. The method is based on the pivoted QR decomposition and the singular value decomposition, so that with or without a ridge penalty it has good error propagation properties and is capable of detecting and coping elegantly with numerical rank deficiency. The method also allows mixtures of user specified and estimated smoothing parameters and the setting of lower bounds on smoothing parameters. In terms of computational efficiency, the method compares well with existing methods. A simulation study compares the method to existing methods, including treating GAM's as mixed models.
Article
We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG(+/-). They both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG(+/-) algorithm uses the components of the gradient in the exponents of factors that are used in updating the weight vector multiplicatively. We present worst-case loss bounds for EG(+/-) and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the algorithms are in general incomparable, but EG(+/-) has a much smaller loss if only few components of the input are relevant for the predictions. We have performed experiments which show that our worst-case upper bounds are quite tight already on simple artificial data.
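A toy sketch of the two update rules compared above is given below (positive weights only, for simplicity); w, x, y and eta are illustrative.

# gradient descent: additive update from the gradient of the squared error
update_gd <- function(w, x, y, eta) {
  grad <- 2 * (sum(w * x) - y) * x
  w - eta * grad
}

# exponentiated gradient: multiplicative update, weights kept on the simplex
update_eg <- function(w, x, y, eta) {
  grad <- 2 * (sum(w * x) - y) * x
  w_new <- w * exp(-eta * grad)
  w_new / sum(w_new)
}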
Koenker, R., 2013. quantreg: Quantile Regression. R package version 5.05. URL http://CRAN.R-project.org/package=quantreg
Koenker, R. W., Bassett, G. W., 1978. Regression quantiles. Econometrica 46 (1), 33-50.
Gaillard, P., Stoltz, G., van Erven, T., 2014. A second-order bound with excess losses. In: Proceedings of COLT.
Friedman, J., Hastie, T., Tibshirani, R., 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 (1), 1-22.