Article

Online learning with the CRPS for ensemble forecasting

Wiley
Quarterly Journal of the Royal Meteorological Society

Abstract

Ensemble forecasting uses multiple individual forecasts to produce a discrete probability distribution which accurately represents the uncertainties. Before every forecast, a weighted empirical distribution function is derived from the ensemble so as to minimize the Continuous Ranked Probability Score (CRPS). We apply online learning techniques, which have previously been used for deterministic forecasting, and we adapt them to the minimization of the CRPS. The proposed method theoretically guarantees that the aggregated forecast competes, in terms of CRPS, against the best weighted empirical distribution function with weights constant in time. This is illustrated on synthetic data. In addition, our study improves the knowledge of the CRPS expectation for model mixtures. We generalize results on the bias of the CRPS computed with ensemble forecasts, and propose a new scheme to achieve fair CRPS minimization, without any assumption on the distributions.
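As an illustration only (a minimal sketch, not the authors' implementation), the CRPS of a weighted empirical distribution function with members x_1, …, x_M and weights u_m summing to one can be computed directly from its energy form CRPS(F_u, y) = Σ_m u_m |x_m − y| − ½ Σ_{m,n} u_m u_n |x_m − x_n|:

```python
import numpy as np

def crps_weighted_ensemble(x, u, y):
    """CRPS of the weighted empirical distribution with members x (length M)
    and nonnegative weights u summing to one, against observation y.

    Energy form: sum_m u_m |x_m - y| - 0.5 * sum_{m,n} u_m u_n |x_m - x_n|.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    obs_term = np.sum(u * np.abs(x - y))
    spread_term = 0.5 * np.sum(np.outer(u, u) * np.abs(x[:, None] - x[None, :]))
    return obs_term - spread_term

# Example: three members with equal weights
print(crps_weighted_ensemble([1.0, 2.0, 4.0], [1/3, 1/3, 1/3], y=2.5))
```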


... To provide probabilistic forecasts, we propose an innovative approach producing a mixture of distributions [TMB16], in Chapter 3. The originality of our technique comes from the use of weight-update rules stemming from sequential aggregation to minimize the CRPS of the weighted empirical distribution. Thanks to the use of sequential aggregation, our forecasts come with theoretical guarantees of robustness. ...
... Thanks to the use of sequential aggregation, our forecasts come with theoretical guarantees of robustness. Sequential aggregation had already been applied successfully to quantile forecasts [GGN16; BP11], but not to directly provide a distributional forecast, until the work of this thesis [TMB16] and that of Baudin [Bau15] and Zamo [Zam16]. The relations between quantile forecasts and distributional forecasts are detailed in Section 1.3. ...
... To provide probabilistic forecasts, we propose an innovative approach by combining multiple forecasts in a linear opinion pool [TMB16], in Chapter 3. The originality of our technique is to use combination rules deriving from online learning techniques in order to minimize the CRPS of the weighted empirical distribution function. Because we use online learning techniques, our forecasts come with theoretical guarantees of robustness. ...
Thesis
Our main objective is to improve the quality of photovoltaic power forecasts derived from weather forecasts. Such forecasts are imperfect due to meteorological uncertainties and to inaccuracies in the statistical modeling that converts weather forecasts into power forecasts. First, we gather several weather forecasts; second, we generate multiple photovoltaic power forecasts; and finally, we build linear combinations of the power forecasts. Minimizing the Continuous Ranked Probability Score (CRPS) statistically calibrates the combination of these forecasts and provides probabilistic forecasts in the form of a weighted empirical distribution function. We investigate the CRPS bias in this context and several properties of scoring rules that can be seen as a sum of quantile-weighted losses or a sum of threshold-weighted losses. The minimization procedure is achieved with online learning techniques. Such techniques come with theoretical guarantees of robustness on the predictive power of the combination of the forecasts; essentially no assumptions are needed for these guarantees to hold. The proposed methods are applied to the forecasting of solar radiation using satellite data and to the forecasting of photovoltaic power based on high-resolution weather forecasts and standard ensemble forecasts.
... Online learning techniques have been tested for several applications: electricity consumption, ozone concentration, wind and geopotential fields, and solar irradiance (Stoltz, 2010; Mallet et al., 2009; Mallet, 2010; Devaine et al., 2013; Baudin, 2015; Thorey et al., 2015). This paper presents application results with our innovative approach (Thorey et al., 2016), whose purpose is to combine multiple forecasters in a linear opinion pool (Genest and McConway, 1990; Geweke and Amisano, 2011). The originality of our technique is to use combination rules deriving from online learning techniques in order to minimize the CRPS of the weighted empirical distribution function. ...
... The last two terms are identical for all forecasters and appear due to the terms 1 − ∑_{m=1}^{M} u_m hidden in the expression of the CRPS; see Appendix B of Thorey et al. (2016). The loss gradient has two main terms: the distance of x_{m,t} to y_t and the weighted distance of x_{m,t} to the ensemble members. ...
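Under the simplifying assumption that the weights sum to one, the two gradient terms described above can be computed as follows (a sketch based on the energy form of the CRPS, not the authors' code):

```python
import numpy as np

def crps_gradient(x, u, y):
    """Gradient of the weighted-ensemble CRPS with respect to the weights,
    assuming the weights sum to one:
        g_m = |x_m - y| - sum_n u_n |x_m - x_n|,
    i.e. the distance of member m to the observation minus its weighted
    distance to the ensemble members.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    dist_to_obs = np.abs(x - y)
    dist_to_members = np.abs(x[:, None] - x[None, :]) @ u
    return dist_to_obs - dist_to_members
```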
Article
We provide probabilistic forecasts of photovoltaic (PV) production, for several PV plants located in France up to 6 days of lead time, with a 30-min timestep. First, we derive multiple forecasts from numerical weather predictions (ECMWF and Météo France), including ensemble forecasts. Second, our parameter-free online learning technique generates a weighted combination of the production forecasts for each PV plant. The weights are computed sequentially before each forecast using only past information. Our strategy is to minimize the Continuous Ranked Probability Score (CRPS). We show that our technique provides forecast improvements for both deterministic and probabilistic evaluation tools.
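The abstract does not spell out the update rule; as a rough, hedged illustration, a simple exponentiated-gradient step on the simplex (a stand-in for the parameter-free rule actually used, with a hypothetical learning rate) would look like:

```python
import numpy as np

def exponentiated_gradient_step(u, grad, eta=0.1):
    """One exponentiated-gradient update of the combination weights.

    u: current weights on the simplex; grad: gradient of the CRPS (or any
    convex loss) with respect to the weights at the latest forecast time;
    eta: learning rate (hypothetical value). Before each new forecast, the
    weights are updated using only past forecasts and observations.
    """
    w = u * np.exp(-eta * grad)
    return w / w.sum()
```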
Article
This paper describes a new seamless blended multi-model ensemble configuration of an existing probabilistic medium- to extended-range weather pattern forecasting tool (called Decider) run operationally at the Met Office. In its initial configuration, the tool calculated and presented probabilistic weather pattern forecast information for five individual ensemble forecasting systems, which varied in terms of their number of ensemble members, horizontal resolution, update frequencies and forecast lead time. This resulted in multiple forecasts for the same validity time whose skill varied with the lead time in question. This presented challenges for end-users (e.g., operational meteorologists) in knowing which forecast output is best to use at which lead time, as well as what to do when forecasts differed between ensembles. To address these challenges, a new seamless blended multi-model ensemble configuration has been implemented operationally, comprising output from five separate ensembles and providing a single best forecast from day 1 out to day 45. Objective verification for a set of eight weather pattern groups, covering forecasts initialized over a 6-year period (2017-2022), shows that the seamless blended multi-model ensemble forecasts are at least as good as, if not better than, the best-performing individual model.
Article
Full-text available
Data assimilation (DA) aims to optimally combine model forecasts and observations that are both partial and noisy. Multi‐model DA generalizes the variational or Bayesian formulation of the Kalman filter, and we prove that it is also the minimum variance linear unbiased estimator. Here, we formulate and implement a multi‐model ensemble Kalman filter (MM‐EnKF) based on this framework. The MM‐EnKF can combine multiple model ensembles for both DA and forecasting in a flow‐dependent manner; it uses adaptive model error estimation to provide matrix‐valued weights for the separate models and the observations. We apply this methodology to various situations using the Lorenz96 model for illustration purposes. Our numerical experiments include multiple models with parametric error, different resolved scales, and different fidelities. The MM‐EnKF results in significant error reductions compared to the best model, as well as to an unweighted multi‐model ensemble, with respect to both probabilistic and deterministic error metrics.
Article
The need for solar power uncertainty quantification in the power system has inspired probabilistic solar power forecasting. This paper proposes a novel multi-step parametric method for intra-day probabilistic solar power forecasting. First, a statistical analysis of the solar power distribution is carried out with four forecasting methods on real-world data. Fat tails are clearly found in the solar power distribution, which cannot be modelled by the widely used normal distribution. In light of this finding, two fat-tailed distributions, i.e., the Laplace and two-sided power distributions, along with their generalized variants, are proposed to better model the conditional distribution of solar power output. Second, a recently proposed DeepAR model for probabilistic time series forecasting based on a deep recurrent neural network is used to map various predictors to the parameters of the fat-tailed distribution. Moreover, a novel loss function based on the continuous ranked probability score is proposed, and its closed-form formula for the proposed fat-tailed distributions is derived for efficient model training. Numerical results on public real-world data show that the method is very effective and that the proposed model provides intra-day probabilistic solar power forecasts with high quality and reliability.
Article
In this paper, we consider the problem of online probabilistic time series forecasting. The difference between a probabilistic prediction (distribution function) and a numerical outcome is measured using a loss function (scoring rule). In practical statistics, the Continuous Ranked Probability Score (CRPS) is often used to estimate the discrepancy between probabilistic predictions and quantitative outcomes. Here, we consider the case where several competing methods (experts) give their predictions in the form of distribution functions, each provided with a confidence level. We propose an algorithm for the online aggregation of these distribution functions that takes the confidence levels of the expert forecasts into account. The discounted error of the proposed algorithm, allowing for the confidence levels, is bounded by comparing the cumulative losses of the algorithm with the losses of the experts. A methodology for constructing predictive expert algorithms and aggregating their probabilistic predictions is developed for the example problem of predicting electricity consumption one or more hours ahead. Results of numerical experiments on real data are presented.
Article
Full-text available
An analog-based ensemble model output statistics (EMOS) is proposed to improve EMOS for the calibration of ensemble forecasts. Given a set of analog predictors and corresponding weights, which are optimized with a brute-force continuous ranked probability score (CRPS) minimization, forecasts similar to a current ensemble forecast (i.e., analogs) are searched. The best analogs and the corresponding observations form the training data set for estimating the EMOS coefficients. To test the new approach for renewable energy applications, wind speed measurements at 100-m height from six measurement towers and wind ensemble forecasts at 100-m height from the European Centre for Medium-Range Weather Forecasts (ECMWF) Ensemble Prediction System (EPS) are used. The analog-based EMOS is compared against EMOS, an adaptive and recursive wind vector calibration (AUV), and an analog ensemble applied to ECMWF EPS. It is shown that the analog-based EMOS outperforms EMOS, AUV, and the analog ensemble at all measurement sites in terms of CRPS and Brier score for common and rare events. The CRPS improvements relative to EMOS reach up to 11% and are statistically significant at almost all sites. The reliability of the analog-based EMOS ensemble for rare events is better compared to EMOS and AUV and similar compared to the analog ensemble.
Article
Full-text available
We consider the setting of sequential prediction of arbitrary sequences based on specialized experts. We first provide a review of the relevant literature and present two theoretical contributions: a general analysis of the specialist aggregation rule of Freund et al. (Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing (STOC), pp. 334–343, 1997) and an adaptation of fixed-share rules of Herbster and Warmuth (Mach. Learn. 32:151–178, 1998) in this setting. We then apply these rules to the sequential short-term (one-day-ahead) forecasting of electricity consumption; to do so, we consider two data sets, a Slovakian one and a French one, respectively concerned with hourly and half-hourly predictions. We follow a general methodology to perform the stated empirical studies and detail in particular tuning issues of the learning parameters. The introduced aggregation rules demonstrate an improved accuracy on the data sets at hand; the improvements lie in a reduced mean squared error but also in a more robust behavior with respect to large occasional errors.
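As a hedged sketch of the fixed-share idea mentioned above (hypothetical parameter values, not the exact rules analyzed in the paper), an exponential-weight update followed by a share step looks like:

```python
import numpy as np

def fixed_share_step(w, losses, eta=0.5, alpha=0.05):
    """One fixed-share update in the style of Herbster and Warmuth (1998).

    First an exponential weight update with learning rate eta, then a 'share'
    step that redistributes a fraction alpha of the mass uniformly, so that
    the aggregation can track a best expert that changes over time.
    """
    v = w * np.exp(-eta * np.asarray(losses, dtype=float))
    v = v / v.sum()
    return (1.0 - alpha) * v + alpha / len(v)
```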
Article
Full-text available
Online learning algorithms are fast, memory-efficient, easy to implement, and applicable to many prediction problems, including classification, regression, and ranking. Several online algorithms were proposed in the past few decades, some based on additive updates, like the Perceptron, and some on multiplicative updates, like Winnow. A unified viewpoint on the design and the analysis of online algorithms is provided by online mirror descent, a general prediction strategy from which most first-order algorithms can be obtained as special cases. We generalize online mirror descent to sequences of time-varying regularizers with generic updates. Unlike standard mirror descent, our more general formulation also captures second order algorithms, algorithms for composite losses, and algorithms for adaptive filtering. Moreover, we recover, and sometimes improve, known regret bounds by instantiating our analysis on specific regularizers. Finally, we show the power of our approach by deriving a new second order algorithm with a regret bound invariant with respect to arbitrary rescalings of individual features.
Article
Full-text available
Questions remain regarding how the skill of operational probabilistic forecasts is most usefully evaluated or compared, even though probability forecasts have been a long-standing aim in meteorological forecasting. This paper explains the importance of employing proper scores when selecting between the various measures of forecast skill. It is demonstrated that only proper scores provide internally consistent evaluations of probability forecasts, justifying the focus on proper scores independent of any attempt to influence the behavior of a forecaster. Another property of scores (i.e., locality) is discussed. Several scores are examined in this light. There is, effectively, only one proper, local score for probability forecasts of a continuous variable. It is also noted that operational needs of weather forecasts suggest that the current concept of a score may be too narrow; a possible generalization is motivated and discussed in the context of propriety and locality.
Book
Full-text available
This important new text and reference for researchers and students in machine learning, game theory, statistics and information theory offers the first comprehensive treatment of the problem of predicting individual sequences. Unlike standard statistical approaches to forecasting, prediction of individual sequences does not impose any probabilistic assumption on the data-generating mechanism. Yet, prediction algorithms can be constructed that work well for all possible sequences, in the sense that their performance is always nearly as good as the best forecasting strategy in a given reference class. The central theme is the model of prediction using expert advice, a general framework within which many related problems can be cast and discussed. Repeated game playing, adaptive data compression, sequential investment in the stock market, sequential pattern analysis, and several other problems are viewed as instances of the experts' framework and analyzed from a common nonstochastic standpoint that often reveals new and intriguing connections. Old and new forecasting methods are described in a mathematically precise way in order to characterize their theoretical limitations and possibilities.
Article
Full-text available
In this technical report I show that the Brier game of prediction is perfectly mixable and find the optimal learning rate and substitution function for it. These results are straightforward, but the computations are surprisingly messy. A game of prediction consists of three components: the observation space Ω, the decision space Γ, and the loss function λ: Ω × Γ → ℝ. In this note we are interested in the following Brier game [1]: Ω is a finite, non-empty set, Γ := P(Ω) is the set of all probability measures on Ω, and λ(ω, γ) = ∑_{o ∈ Ω} (γ{o} − δ_ω{o})², where δ_ω ∈ P(Ω) is the probability measure concentrated at ω: δ_ω{ω} = 1 and δ_ω{o} = 0 for o ≠ ω. The Brier game is played repeatedly by a learner having access to decisions made by a pool of experts, which leads to a protocol of prediction with expert advice for the Brier game.
Article
This paper deals first with the relationship between the theory of probability and the theory of rational behaviour. A method is then suggested for encouraging people to make accurate probability estimates, a connection with the theory of information being mentioned. Finally, Wald's theory of statistical decision functions is summarised and generalised, and its relation to the theory of rational behaviour is discussed.
Article
Since a meteorologist's predictions are subjective, a framework for the evaluation of meteorological probability assessors must be consistent with the theory of subjective probability. Such a framework is described in this paper. First, two standards of “goodness,” one based upon normative considerations and one based upon substantive considerations, are proposed. Specific properties which a meteorologist's assessments should possess are identified for each standard. Then, several measures of “goodness,” or scoring rules, which indicate the extent to which such assessments possess certain properties, are described. Finally, several important uses of these scoring rules are considered.
Article
A probabilistic forecast takes the form of a predictive probability distribution over future quantities or events of interest. Probabilistic forecasting aims to maximize the sharpness of the predictive distributions, subject to calibration, on the basis of the available information set. We formalize and study notions of calibration in a prediction space setting. In practice, probabilistic calibration can be checked by examining probability integral transform (PIT) histograms. Proper scoring rules such as the logarithmic score and the continuous ranked probability score serve to assess calibration and sharpness simultaneously. As a special case, consistent scoring functions provide decision-theoretically coherent tools for evaluating point forecasts. We emphasize methodological links to parametric and nonparametric distributional regression techniques, which attempt to model and to estimate conditional distribution functions; we use the context of statistically postprocessed ensemble forecasts in numerical weather prediction as an example. Throughout, we illustrate concepts and methodologies in data examples.
Article
The continuous ranked probability score (CRPS) is a frequently used scoring rule. In contrast with many other scoring rules, the CRPS evaluates cumulative distribution functions. An ensemble of forecasts can easily be converted into a piecewise constant cumulative distribution function with steps at the ensemble members. This renders the CRPS a convenient scoring rule for the evaluation of ‘raw’ ensembles, obviating the need for sophisticated ensemble model output statistics or dressing methods prior to evaluation. In this article, a relation between the CRPS score and the quantile score is established. The evaluation of ‘raw’ ensembles using the CRPS is discussed in this light. It is shown that latent in this evaluation is an interpretation of the ensemble as quantiles but with non-uniform levels. This needs to be taken into account if the ensemble is evaluated further, for example with rank histograms. Copyright © 2012 Royal Meteorological Society
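In standard notation (a sketch of the relation discussed above, assuming the forecast F has a quantile function F^{-1}), the CRPS can be written both as an integral of Brier-type scores over thresholds and as an integral of quantile (pinball) scores over levels:

$$
\mathrm{CRPS}(F, y) \;=\; \int_{-\infty}^{+\infty} \bigl(F(z) - \mathbf{1}\{y \le z\}\bigr)^2 \, \mathrm{d}z
\;=\; 2 \int_{0}^{1} \bigl(\mathbf{1}\{y < F^{-1}(\tau)\} - \tau\bigr)\bigl(F^{-1}(\tau) - y\bigr)\, \mathrm{d}\tau .
$$

Evaluating a raw M-member ensemble with the CRPS therefore implicitly treats each member as a quantile at a level determined by its rank, which is the non-uniform-levels interpretation referred to in the article.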
Article
The notion of fair scores for ensemble forecasts was introduced recently to reward ensembles with members that behave as though they and the verifying observation are sampled from the same distribution. In the case of forecasting binary outcomes, a characterization is given of a general class of fair scores for ensembles that are interpreted as random samples. This is also used to construct classes of fair scores for ensembles that forecast multicategory and continuous outcomes. The usual Brier, ranked probability and continuous ranked probability scores for ensemble forecasts are shown to be unfair, while adjusted versions of these scores are shown to be fair. A definition of fairness is also proposed for ensembles with members that are interpreted as being dependent and it is shown that fair scores exist only for some forms of dependence.
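As an illustration (a sketch using the standard energy-form estimators, not code from the article), the usual and the adjusted ('fair') CRPS estimates for an M-member ensemble differ only in the normalization of the member-spread term:

```python
import numpy as np

def crps_ensemble(x, y, fair=True):
    """CRPS estimate for an M-member ensemble x against observation y.

    fair=False: the usual estimator, with 1 / (2 M^2) on the spread term.
    fair=True:  the adjusted ('fair') estimator, with 1 / (2 M (M - 1)),
    which rewards ensembles whose members behave like samples from the same
    distribution as the observation.
    """
    x = np.asarray(x, dtype=float)
    m = len(x)
    obs_term = np.mean(np.abs(x - y))
    pairwise = np.abs(x[:, None] - x[None, :]).sum()  # includes zero diagonal
    denom = 2.0 * m * (m - 1) if fair else 2.0 * m * m
    return obs_term - pairwise / denom
```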
Article
Evaluation is important for improving climate prediction systems and establishing the credibility of their predictions of the future. This paper shows how the choices that must be made about how to evaluate predictions affect the outcome and ultimately our view of the prediction system's quality. The aim of evaluation is to measure selected attributes of the predictions, but some attributes are susceptible to having their apparent performance artificially inflated by the presence of climate trends, thus rendering past performance an unreliable indicator of future performance. We describe a class of performance measures that are immune to such spurious skill. The way in which an ensemble prediction is interpreted also has strong implications for the apparent performance, so we give recommendations about how evaluation should be tailored to different interpretations. Finally, we explore the role of the timescale of the predictand in evaluation and suggest ways to describe the relationship between timescale and performance. The ideas in this paper are illustrated using decadal temperature hindcasts from the CMIP5 archive. (c) 2013 The Authors. Meteorological Applications published by John Wiley & Sons Ltd on behalf of the Royal Meteorological Society.
Article
Bayesian model averaging (BMA) is a statistical postprocessing technique that generates calibrated and sharp predictive probability density functions (PDFs) from forecast ensembles. It represents the predictive PDF as a weighted average of PDFs centered on the bias-corrected ensemble members, where the weights reflect the relative skill of the individual members over a training period. This work adapts the BMA approach to situations that arise frequently in practice; namely, when one or more of the member forecasts are exchangeable, and when there are missing ensemble members. Exchangeable members differ in random perturbations only, such as the members of bred ensembles, singular vector ensembles, or ensemble Kalman filter systems. Accounting for exchangeability simplifies the BMA approach, in that the BMA weights and the parameters of the component PDFs can be assumed to be equal within each exchangeable group. With these adaptations, BMA can be applied to postprocess multimodel ensembles of any composition. In experiments with surface temperature and quantitative precipitation forecasts from the University of Washington mesoscale ensemble and ensemble Kalman filter systems over the Pacific Northwest, the proposed extensions yield good results. The BMA method is robust to exchangeability assumptions, and the BMA postprocessed combined ensemble shows better verification results than any of the individual, raw, or BMA postprocessed ensemble systems. These results suggest that statistically postprocessed multimodel ensembles can outperform individual ensemble systems, even in cases in which one of the constituent systems is superior to the others.
Article
The generation of a probabilistic view of dynamical weather prediction is traced back to the early 1950s, to that point in time when deterministic short-range numerical weather prediction (NWP) achieved its earliest success. Eric Eady was the first meteorologist to voice concern over strict determinism—that is, a future determined by the initial state without account for uncertainties in that state. By the end of the decade, Philip Thompson and Edward Lorenz explored the predictability limits of deterministic forecasting and set the stage for an alternate view—a stochastic–dynamic view that was enunciated by Edward Epstein. The steps in both operational short-range NWP and extended-range forecasting that justified a coupling between probability and dynamical law are followed. A discussion of the bridge from theory to practice follows, and the study ends with a genealogy of ensemble forecasting as an outgrowth of traditions in the history of science.
Article
Some time ago, the continuous ranked probability score (CRPS) was proposed as a new verification tool for (probabilistic) forecast systems. Its focus is on the entire permissible range of a certain (weather) parameter. The CRPS can be seen as a ranked probability score with an infinite number of classes, each of zero width. Alternatively, it can be interpreted as the integral of the Brier score over all possible threshold values for the parameter under consideration. For a deterministic forecast system the CRPS reduces to the mean absolute error. In this paper it is shown that for an ensemble prediction system the CRPS can be decomposed into a reliability part and a resolution/uncertainty part, in a way that is similar to the decomposition of the Brier score. The reliability part of the CRPS is closely connected to the rank histogram of the ensemble, while the resolution/uncertainty part can be related to the average spread within the ensemble and the behavior of its outliers. The usefulness of such a decomposition is illustrated for the ensemble prediction system running at the European Centre for Medium-Range Weather Forecasts. The evaluation of the CRPS and its decomposition proposed in this paper can be extended to systems issuing continuous probability forecasts, by realizing that these can be interpreted as the limit of ensemble forecasts with an infinite number of members.
Article
Four recent papers have investigated the effects of ensemble size on the Brier score (BS) and discrete ranked probability score (RPS) attained by ensemble-based probabilistic forecasts. The connections between these papers are described and their results are generalized. In particular, expressions, explanations and estimators for the expected effect of ensemble size on the RPS and continuous ranked probability score (CRPS) are obtained. Copyright © 2008 Royal Meteorological Society
Article
Ensemble prediction systems typically show positive spread-error correlation, but they are subject to forecast bias and dispersion errors, and are therefore uncalibrated. This work proposes the use of ensemble model output statistics (EMOS), an easy-to-implement postprocessing technique that addresses both forecast bias and underdispersion and takes into account the spread-skill relationship. The technique is based on multiple linear regression and is akin to the superensemble approach that has traditionally been used for deterministic-style forecasts. The EMOS technique yields probabilistic forecasts that take the form of Gaussian predictive probability density functions (PDFs) for continuous weather variables and can be applied to gridded model output. The EMOS predictive mean is a bias-corrected weighted average of the ensemble member forecasts, with coefficients that can be interpreted in terms of the relative contributions of the member models to the ensemble, and provides a highly competitive deterministic-style forecast. The EMOS predictive variance is a linear function of the ensemble variance. For fitting the EMOS coefficients, the method of minimum continuous ranked probability score (CRPS) estimation is introduced. This technique finds the coefficient values that optimize the CRPS for the training data. The EMOS technique was applied to 48-h forecasts of sea level pressure and surface temperature over the North American Pacific Northwest in spring 2000, using the University of Washington mesoscale ensemble. When compared to the bias-corrected ensemble, deterministic-style EMOS forecasts of sea level pressure had root-mean-square error 9% less and mean absolute error 7% less. The EMOS predictive PDFs were sharp, and much better calibrated than the raw ensemble or the bias-corrected ensemble.
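Minimum CRPS estimation relies on the closed-form CRPS of a Gaussian predictive distribution; a minimal sketch using the standard formula (not the paper's code):

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a Gaussian predictive distribution N(mu, sigma^2):
        CRPS = sigma * [ z * (2 * Phi(z) - 1) + 2 * phi(z) - 1 / sqrt(pi) ],
    with z = (y - mu) / sigma. Averaged over a training set, this is the
    objective minimized to fit the EMOS coefficients.
    """
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))
```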
Article
Sequential aggregation is an ensemble forecasting approach that weights each ensemble member based on past observations and past forecasts. This approach has several limitations: the weights are computed only at the locations and for the variables that are observed, and the observational errors are typically not accounted for. This paper introduces a way to address these limitations by coupling sequential aggregation and data assimilation. The leading idea of the proposed approach is to have the aggregation procedure forecast the forthcoming analyses, produced by a data assimilation method, instead of forecasting the observations. The approach is therefore referred to as ensemble forecasting of analyses. The analyses, which are supposed to be the best a posteriori knowledge of the model's state, adequately take into account the observational errors and they are naturally multivariable and distributed in space. The aggregation algorithm theoretically guarantees that, in the long run and for any component of the model's state, the ensemble forecasts approximate the analyses at least as well as the best constant (in time) linear combination of the ensemble members. In this sense, the ensemble forecasts of the analyses optimally exploit the information contained in the ensemble. The method is tested for ground-level ozone forecasting, over Europe during the full year 2001, with a twenty-member ensemble. In this application, the method proves to perform well with 28% reduction in RMSE compared to a reference simulation, to be robust in time and space, and to reproduce many spatial patterns found in the analyses only.
Article
Ensembles used for probabilistic weather forecasting often exhibit a spread-error correlation, but they tend to be underdispersive. This paper proposes a statistical method for postprocessing ensembles based on Bayesian model averaging (BMA), which is a standard method for combining predictive distributions from different sources. The BMA predictive probability density function (PDF) of any quantity of interest is a weighted average of PDFs centered on the individual bias-corrected forecasts, where the weights are equal to posterior probabilities of the models generating the forecasts and reflect the models' relative contributions to predictive skill over the training period. The BMA weights can be used to assess the usefulness of ensemble members, and this can be used as a basis for selecting ensemble members; this can be useful given the cost of running large ensembles. The BMA PDF can be represented as an unweighted ensemble of any desired size, by simulating from the BMA predictive distribution. The BMA predictive variance can be decomposed into two components, one corresponding to the between-forecast variability, and the second to the within-forecast variability. Predictive PDFs or intervals based solely on the ensemble spread incorporate the first component but not the second. Thus BMA provides a theoretical explanation of the tendency of ensembles to exhibit a spread-error correlation but yet be underdispersive. The method was applied to 48-h forecasts of surface temperature in the Pacific Northwest in January–June 2000 using the University of Washington fifth-generation Pennsylvania State University–NCAR Mesoscale Model (MM5) ensemble. The predictive PDFs were much better calibrated than the raw ensemble, and the BMA forecasts were sharp in that 90% BMA prediction intervals were 66% shorter on average than those produced by sample climatology. As a by-product, BMA yields a deterministic point forecast, and this had root-mean-square errors 7% lower than the best of the ensemble members and 8% lower than the ensemble mean. Similar results were obtained for forecasts of sea level pressure. Simulation experiments show that BMA performs reasonably well when the underlying ensemble is calibrated, or even overdispersed.
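A minimal sketch of a BMA-style predictive density for a continuous variable (Gaussian kernels are assumed here for illustration; weights, bias coefficients and spread are placeholders that would be fitted on a training period, e.g. by EM):

```python
import numpy as np
from scipy.stats import norm

def bma_predictive_density(y, member_forecasts, weights, sigma, a=0.0, b=1.0):
    """BMA predictive density at value y: a weighted mixture of Gaussian
    kernels centred on bias-corrected member forecasts a + b * x_m with a
    common spread sigma. The weights reflect the members' relative skill
    over the training period and sum to one.
    """
    centres = a + b * np.asarray(member_forecasts, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * norm.pdf(y, loc=centres, scale=sigma)))
```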
Article
A systematic study is performed of a number of scores that can be used for objective validation of probabilistic prediction of scalar variables: rank histograms, and discrete and continuous ranked probability scores (DRPS and CRPS, respectively). The reliability-resolution-uncertainty decomposition, defined by Murphy (1972a and 1972b) for the DRPS, and extended here to the CRPS, is studied in detail. The decomposition is applied to the results of the ensemble prediction systems of ECMWF and NCEP. Comparison is made with the decomposition of the CRPS defined by Hersbach (2000). The possibility of determining an accurate reliability-resolution decomposition of the RPSs is severely limited by the unavoidably (relatively) small number of available realizations of the prediction system. The Hersbach decomposition may be an appropriate compromise between the competing needs for accuracy and practical computability.
Article
Scoring rules are an important tool for evaluating the performance of probabilistic forecasting schemes. A scoring rule is called strictly proper if its expectation is optimal if and only if the forecast probability represents the true distribution of the target. In the binary case, strictly proper scoring rules allow for a decomposition into terms related to the resolution and the reliability of a forecast. This fact is particularly well known for the Brier Score. In this article, this result is extended to forecasts for finite-valued targets. Both resolution and reliability are shown to have a positive effect on the score. It is demonstrated that resolution and reliability are directly related to forecast attributes that are desirable on grounds independent of the notion of scores. This finding can be considered an epistemological justification of measuring forecast quality by proper scoring rules. A link is provided to the original work of DeGroot and Fienberg, extending their concepts of sufficiency and refinement. The relation to the conjectured sharpness principle of Gneiting, et al., is elucidated. Copyright © 2009 Royal Meteorological Society
Article
An analogue of the linear continuous ranked probability score is introduced that applies to probabilistic forecasts of circular quantities, such as wind direction. This scoring rule is proper and thereby discourages hedging. The circular continuous ranked probability score reduces to angular distance when the forecast is deterministic, just as the linear continuous ranked probability score generalizes the absolute error. Furthermore, the circular continuous ranked probability score provides a direct way of comparing deterministic forecasts, discrete forecast ensembles, and post‐processed forecast ensembles that can take the form of circular probability density functions. The circular continuous ranked probability score is used in this study to compare predictions of 10 m wind direction for 361 cases of mesoscale, short‐range ensemble forecasts over the North American Pacific Northwest. Simple, calibrated probability forecasts based on the ensemble mean and its forecast error history over the period outperform probability forecasts constructed directly from the ensemble sample statistics. These results suggest that short‐term forecast uncertainty is not yet well predicted at mesoscale resolutions near the surface, despite the inclusion of multi‐scheme physics diversity and surface boundary parameter perturbations in the mesoscale ensemble design. Copyright © 2006 Royal Meteorological Society
Article
Motivated by a broad range of potential applications, we address the quantile prediction problem of real-valued time series. We present a sequential quantile forecasting model based on the combination of a set of elementary nearest neighbor-type predictors called “experts” and show its consistency under a minimum of conditions. Our approach builds on the methodology developed in recent years for prediction of individual sequences and exploits the quantile structure as a minimizer of the so-called pinball loss function. We perform an in-depth analysis of real-world data sets and show that this nonparametric strategy generally outperforms standard quantile prediction methods.
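The pinball loss underlying this quantile framework has a simple closed form; a minimal sketch:

```python
import numpy as np

def pinball_loss(q, y, tau):
    """Pinball (quantile) loss at level tau, minimized in expectation by the
    tau-quantile: (1 - tau) * (q - y) if y < q, and tau * (y - q) otherwise.
    """
    diff = y - q
    return np.maximum(tau * diff, (tau - 1.0) * diff)
```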
Article
Numerical weather prediction models as well as the atmosphere itself can be viewed as nonlinear dynamical systems in which the evolution depends sensitively on the initial conditions. The fact that estimates of the current state are inaccurate and that numerical models have inadequacies, leads to forecast errors that grow with increasing forecast lead time. The growth of errors depends on the flow itself. Ensemble forecasting aims at quantifying this flow-dependent forecast uncertainty. The sources of uncertainty in weather forecasting are discussed. Then, an overview is given on evaluating probabilistic forecasts and their usefulness compared with single forecasts. Thereafter, the representation of uncertainties in ensemble forecasts is reviewed with an emphasis on the initial condition perturbations. The review is complemented by a detailed description of the methodology to generate initial condition perturbations of the Ensemble Prediction System (EPS) of the European Centre for Medium-Range Weather Forecasts (ECMWF). These perturbations are based on the leading part of the singular value decomposition of the operator describing the linearised dynamics over a finite time interval. The perturbations are flow-dependent as the linearisation is performed with respect to a solution of the nonlinear forecast model. The extent to which the current ECMWF ensemble prediction system is capable of predicting flow-dependent variations in uncertainty is assessed for the large-scale flow in mid-latitudes.
Article
A standard approach to the combination of probabilistic opinions involves taking a weighted linear average of the individuals' distributions. This paper reviews some of the possible interpretations that have been proposed for these weights in the literature on expert use. Several paradigms for selecting weights are also considered. Special attention is devoted to the Bayesian mechanism used for updating expert weights in the face of new information. An asymptotic result is proved which highlights the importance of choosing the initial weights carefully.
Article
We apply machine learning algorithms to perform sequential aggregation of ozone forecasts. The latter rely on a multimodel ensemble built for ozone forecasting with the modeling system Polyphemus. The ensemble simulations are obtained by changes in the physical parameterizations, the numerical schemes, and the input data to the models. The simulations are carried out for summer 2001 over western Europe in order to forecast ozone daily peaks and ozone hourly concentrations. On the basis of past observations and past model forecasts, the learning algorithms produce a weight for each model. A convex or linear combination of the model forecasts is then formed with these weights. This process is repeated for each round of forecasting and is therefore called sequential aggregation. The aggregated forecasts demonstrate good results; for instance, they always show better performance than the best model in the ensemble and they even compete against the best constant linear combination. In addition, the machine learning algorithms come with theoretical guarantees with respect to their performance, that hold for all possible sequences of observations, even nonstochastic ones. Our study also demonstrates the robustness of the methods. We therefore conclude that these aggregation methods are very relevant for operational forecasts.
Article
Using Bayesian Model Averaging, we examine whether inflation's effects on economic growth are robust to model uncertainty across numerous specifications. Cross-sectional data provide little evidence of a robust inflation-growth relationship, even after allowing for non-linear effects. Panel data with fixed effects suggest inflation is one of the more robust variables affecting growth, and non-linear results suggest that high inflation observations drive the results. However, this robustness is lost when estimation is carried out with instrumental variables.
Article
We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG(+/-). They both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG(+/-) algorithm uses the components of the gradient in the exponents of factors that are used in updating the weight vector multiplicatively. We present worst-case loss bounds for EG(+/-) and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the algorithms are in general incomparable, but EG(+/-) has a much smaller loss if only few components of the input are relevant for the predictions. We have performed experiments which show that our worst-case upper bounds are quite tight already on simple artificial data. (C) 1997 Academic Press.
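A minimal sketch contrasting the two update rules for a linear model under squared loss (the learning rate and the scaling constant U are hypothetical; notation simplified with respect to the paper):

```python
import numpy as np

def gd_step(w, x, y, eta=0.01):
    """Gradient-descent update: subtract the gradient of the squared error."""
    grad = 2.0 * (w @ x - y) * x
    return w - eta * grad

def eg_pm_step(w_plus, w_minus, x, y, eta=0.01, U=1.0):
    """EG(+/-) update: two nonnegative weight vectors give the prediction
    U * (w_plus - w_minus) . x; each component is updated multiplicatively,
    with the corresponding gradient component in the exponent, and the pair
    is renormalized to total mass one.
    """
    grad = 2.0 * (U * (w_plus - w_minus) @ x - y) * x
    wp = w_plus * np.exp(-eta * U * grad)
    wm = w_minus * np.exp(+eta * U * grad)
    z = wp.sum() + wm.sum()
    return wp / z, wm / z
```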
Article
We develop minimax optimal risk bounds for the general learning task of predicting as well as the best function in a reference set G, up to the smallest possible additive term, called the convergence rate. When the reference set is finite and n denotes the size of the training data, we provide minimax convergence rates of the form C([log |G|]/n)^v, with a tight evaluation of the positive constant C and with exact v in (0, 1], the latter value depending on the convexity of the loss function and on the level of noise in the output distribution. The risk upper bounds are based on a sequential randomized algorithm, which at each step concentrates on functions having both low risk and low variance with respect to the previous step's prediction function. Our analysis puts forward the links between the probabilistic and worst-case viewpoints, and allows us to obtain risk bounds unachievable with the standard statistical learning approach. One of the key ideas of this work is to use probabilistic inequalities with respect to appropriate (Gibbs) distributions on the prediction function space, instead of using them with respect to the distribution generating the data. The risk lower bounds are based on refinements of Assouad's lemma that take into account, in particular, the properties of the loss function. Our key example to illustrate the upper and lower bounds is the L_q-regression setting, for which an exhaustive analysis of the convergence rates is given as q ranges over [1, +∞).
Verification of forecasts expressed in terms of probability
  • Brier
Combining probability distributions from experts in risk analysis
  • Clemen
Combining probability forecasts
  • Ranjan
Elicitation of personal probabilities and expectations
  • Savage