Validity of Climate Change Forecasting for Public Policy Decision Making
Kesten C. Green
Business and Economic Forecasting, Monash University, Vic 3800, Australia.
Contact: PO Box 10800, Wellington 6143, New Zealand.
kesten@kestencgreen.com; T +64 4 976 3245; F +64 4 976 3250
J. Scott Armstrong
The Wharton School, University of Pennsylvania
747 Huntsman, Philadelphia, PA 19104
armstrong@wharton.upenn.edu; jscottarmstrong.com; T +1 610 622 6480
Willie Soon
Harvard-Smithsonian Center for Astrophysics, Cambridge MA 02138
wsoon@cfa.harvard.edu; T +1 617 495 7488
February 24, 2009
ABSTRACT
Policymakers need to know whether prediction is possible and, if so, whether any proposed
forecasting method will provide forecasts that are substantively more accurate than those from the
relevant benchmark method. Inspection of global temperature data suggests that temperature is subject to
irregular variations on all relevant time scales, and that variations during the late 1900s were not
unusual. In such a situation, a “no change” extrapolation is an appropriate benchmark forecasting
method. We used the U.K. Met Office Hadley Centre’s annual average thermometer data from
1850 through 2007 to examine the performance of the benchmark method. The accuracy of
forecasts from the benchmark is such that even perfect forecasts would be unlikely to help
policymakers. For example, mean absolute errors for 20- and 50-year horizons were 0.18°C and
0.24°C respectively. We nevertheless demonstrate the use of benchmarking with the example of the
Intergovernmental Panel on Climate Change’s 1992 linear projection of long-term warming at a
rate of 0.03°C per year. The small sample of errors from ex ante projections at 0.03°C per year
for 1992 through 2008 was practically indistinguishable from the benchmark errors. Validation
for long-term forecasting, however, requires a much longer horizon. Again using the IPCC
warming rate for our demonstration, we projected the rate successively over a period analogous to
that envisaged in their scenario of exponential CO2 growth—the years 1851 to 1975. The errors
from the projections were more than seven times greater than the errors from the benchmark
method. Relative errors were larger for longer forecast horizons. Our validation exercise
illustrates the importance of determining whether it is possible to obtain forecasts that are more
useful than those from a simple benchmark before making expensive policy decisions.
Key words: climate model, ex ante forecasts, out-of-sample errors, predictability, public policy,
relative absolute errors, unconditional forecasts.
Introduction
We examine procedures that should be used to evaluate forecasts of global mean temperatures
over the policy-relevant long term. A necessary condition for using forecasts to inform public
policy decisions is evidence that the proposed forecasting procedure can provide ex ante forecasts
that are substantively more accurate than those from a simple benchmark model. By ex ante
forecasts, we mean forecasts for periods that were not taken into account when the forecasting
model was developed.¹
Benchmark errors provide a standard by which to determine whether alternative scientifically
based forecasting methods can provide useful forecasts. When benchmark errors are large, it is
possible that alternative methods would provide useful forecasts. When benchmark errors are
small, it is less likely that other methods would provide improvements in accuracy that would be
useful to decision makers.
An appropriate benchmark model
Exhibit 1 displays Antarctic temperature data from the ice-core record for the 800,000 years up to
1950. The temperatures are relative to the average of the last one thousand years of the record
(950 to 1950 AD) and are in degrees Celsius. The data show large irregular variations and no obvious
trend. For such data the no-change forecasting model is an appropriate benchmark.
INSERT EXHIBIT 1 ABOUT HERE
800,000-year Record of Antarctic Temperature Change

1Theabilityofamodeltofittimeseriesdatabearslittlerelationshiptoitsabilitytoforecast;a
findingthathasoftenpuzzledresearchers(Armstrong2001,pp.460‐462).
Performance of the benchmark model
We used the Hadley (HadCRUT3) “best estimate” annual average temperature differences from
1850 to 2007 from the U.K. Met Office Hadley Centre² to examine the benchmark errors for
global mean temperatures (Exhibit 2³) over policy-relevant forecasting horizons.
INSERT EXHIBIT 2
Errors from the benchmark model
We used each year’s mean global temperature as a forecast of each subsequent year in the future
and calculated the errors relative to the measurements for those years. For example, the year 1850
temperature measurement from Hadley was our forecast of the average temperature for each year
from 1851 through 1950. We calculated the differences between this benchmark forecast and the
Hadley measurement for each year of this 100-year forecast horizon. In this way we obtained
from the Hadley data 157 error estimates for one-year-ahead forecasts, 156 for two-year-ahead
forecasts, and so on, up to 58 error estimates for 100-year-ahead forecasts: a total of 10,750
forecasts across all horizons.
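In outline, this rolling evaluation can be expressed as in the following minimal sketch (in Python; the variable names are illustrative and not from our analysis), which computes the absolute errors of the no-change benchmark for every forecast origin and horizon, assuming temps holds the annual Hadley anomalies in year order:

    # A minimal sketch of the rolling no-change benchmark evaluation described
    # above. `temps` is assumed to be a list of annual temperature anomalies in
    # year order (e.g., 1850 through 2007); all names are illustrative.
    def benchmark_errors(temps, max_horizon=100):
        """Return {horizon: list of absolute errors} for no-change forecasts."""
        errors = {h: [] for h in range(1, max_horizon + 1)}
        for base in range(len(temps)):            # each year is a forecast origin
            for h in range(1, max_horizon + 1):
                target = base + h
                if target >= len(temps):
                    break
                # The no-change forecast for every horizon is the base-year value.
                errors[h].append(abs(temps[target] - temps[base]))
        return errors

    # Example: mean absolute error by horizon (given `temps`):
    # mae = {h: sum(e) / len(e) for h, e in benchmark_errors(temps).items() if e}

With 158 annual observations this yields 157 one-year-ahead errors and 58 one-hundred-year-ahead errors, 10,750 errors in total, matching the counts above.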
Exhibit 3 shows that mean absolute errors from our benchmark model increased from less than
0.1°C for one-year-ahead forecasts to less than 0.4°C for 100-year-ahead forecasts. Maximum
absolute errors increased from slightly more than 0.3°C for one-year-ahead forecasts to less than
1.0°C for 100-year-ahead forecasts.

² Obtained from http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual on 9 October 2008.
³ Exhibit 2 has been updated to include the 2008 figure.
Overwhelmingly, errors were no more than 0.5°C, as shown in Exhibit 4. For horizons of less than
65 years, fewer than one in eight of our ex ante forecasts differed from the Hadley measurement
by more than 0.5°C. All forecasts for horizons up to 80 years, and more than 95% of forecasts
for horizons from 81 to 100 years ahead, were within 1°C of the Hadley figure. The overall
maximum error from all 10,750 forecasts for all horizons was 1.08°C (from an 87-year-ahead
forecast for 1998).
INSERT EXHIBIT 3
INSERT EXHIBIT 4

Performance of Intergovernmental Panel on Climate Change projections
Since the benchmark model performs so well, it is hard to determine what additional benefits
public policymakers would get from a better forecasting model. Governments did, however, via
the United Nations, establish the IPCC to search for a better model. The IPCC projections provide
an opportunity to illustrate the use of the benchmark. Our intent in this paper is not to assess what
might be the true state of the world; rather it is to illustrate proper validation by testing the IPCC
projections against the benchmark model.
We used the IPCC’s 1992 projection, which was an update of their 1990 projection, for our
demonstration. The 1992 projection was for a linear increase of 0.03°C per year (IPCC 1990,
p. xi; IPCC 1992, p. 17).
The IPCC 1992 projections were based on the judgments of the IPCC report’s authors and the
process they used was not specified in such a way that it would be replicable. We nevertheless
used the IPCC projection because it has had a major influence on policymakers, coming out as it
did in time for the Rio Earth Summit, which produced inter alia Agenda 21 and the United
Nations Framework Convention on Climate Change. According to the United Nations webpage
on the Summit⁴, “The Earth Summit influenced all subsequent UN conferences…”.
To test any forecasting method, it is necessary to exclude data that were used to develop the
model; that is, the testing must be done using out-of-sample data. The most obvious out-of-
sample data are the observations that occurred after the forecast was made. By using the IPCC’s
1992 projection, we were able to conduct a longer ex ante forecasting test than if we had used
projections from later IPCC reports.
Evaluation method
We followed the procedure that we had used for our benchmark model and calculated absolute
errors as the unsigned difference between the IPCC 1992 projection and the Hadley figure for the
same year. We then compared these IPCC projection errors with forecast errors from the
benchmark model using the cumulative relative absolute error, or CumRAE (Armstrong and
Collopy 1992).
The CumRAE is the sum across all forecast horizons of the errors (ignoring signs) from the
method being evaluated divided by the equivalent sum of benchmark errors. For example, a
CumRAE of 1.0 would indicate that the evaluated-method errors and benchmark errors came to
the same total, while a figure of 0.8 would indicate that the sum of evaluated-method errors was
20% lower than the sum of benchmark errors.
We are concerned with forecast accuracy by forecast horizon, and so calculated error scores
for each horizon and then averaged across the horizons. Thus, the CumRAEs we report are the
cumulated sum of the mean absolute errors across horizons divided by the equivalent sum of
benchmark errors.
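For concreteness, the CumRAE we use can be computed as in the following sketch (Python; it assumes the per-horizon mean absolute errors have already been computed, as in the benchmark sketch above, and the names are illustrative):

    # A minimal sketch of the CumRAE as defined above. `method_mae` and
    # `benchmark_mae` are assumed to map each forecast horizon to its mean
    # absolute error; names are illustrative.
    def cum_rae(method_mae, benchmark_mae):
        """Sum of evaluated-method MAEs across horizons divided by the
        equivalent sum of benchmark MAEs."""
        num = sum(method_mae[h] for h in method_mae)
        den = sum(benchmark_mae[h] for h in method_mae)
        return num / den

    # cum_rae(...) == 1.0: the two methods' errors came to the same total;
    # cum_rae(...) == 0.8: the evaluated method's errors totaled 20% less.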
⁴ http://www.un.org/geninfo/bp/enviro.html
Forecasts from 1992 through 2008 using 1992 IPCC projected warming rate
We created an IPCC projection series from 1992 to 2008 by starting with the 1991 Hadley figure
and adding 0.03°C per year. It was also possible to test the IPCC projected warming rate against
the University of Alabama at Huntsville’s (UAH) data on global near surface temperature
measured from satellites using microwave sounding units. These data are available from 1979. To
do that, we created another projection series by starting with the 1991 UAH figure.
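Constructing such a projection series is straightforward, as the following sketch shows (Python; base_value would be the 1991 Hadley or UAH figure, and all names are ours for illustration):

    # A minimal sketch of the projection series used above: start from the 1991
    # anomaly and add the 1992 IPCC linear rate each year. Names illustrative.
    IPCC_RATE = 0.03  # degrees C per year (IPCC 1992)

    def ipcc_projection(base_value, start_year=1992, end_year=2008):
        """Linear warming projection anchored on the year before start_year."""
        return {year: base_value + IPCC_RATE * (year - start_year + 1)
                for year in range(start_year, end_year + 1)}

    def no_change_projection(base_value, start_year=1992, end_year=2008):
        """The matching benchmark: the base value for every year."""
        return {year: base_value for year in range(start_year, end_year + 1)}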
Benchmark forecasts for the two series were based on the 1991 Hadley and UAH temperatures,
respectively, for all years. This process, by including estimates for 2008 from both sources, gave
us two small samples of 17 years of out-of-sample forecasts. When tested against Hadley
measures, IPCC errors were essentially the same as those from our benchmark forecasts
(CumRAE 0.98); they were nearly twice as large (CumRAE 1.82) when tested against the UAH
satellite measures.
We also employed successive forecasting by using each year of the Hadley data from 1991 to
2007 in turn as the base from which to forecast from one to 17 years ahead. We obtained a total
of 136 forecasts from each of the 1992 IPCC projected warming rate and the benchmark model
over horizons from one to 17 years. Averaged across all 17 forecast horizons, the errors from the
1992 IPCC projected warming rate for the period 1992 to 2008 were 16% smaller than the
forecast errors from our benchmark (CumRAE 0.84).
We repeated the successive forecasting test using UAH data. The 1992 IPCC projected warming
rate errors for the period 1992 to 2008 were 5% smaller than forecast errors from our benchmark
(CumRAE 0.95).
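The successive-forecasting test can be sketched in the same style (Python; temps is assumed to map years to annual anomalies, and the resulting per-horizon error lists can be averaged and passed to the CumRAE function above; names are illustrative):

    # A minimal sketch of the successive (rolling-origin) test described above.
    # `temps` is assumed to map years to anomalies; names are illustrative.
    def successive_errors(temps, origins, max_horizon, rate):
        """Per-horizon absolute errors for a linear-rate projection and for
        the no-change benchmark, using each origin year in turn as the base."""
        proj_err = {h: [] for h in range(1, max_horizon + 1)}
        bench_err = {h: [] for h in range(1, max_horizon + 1)}
        for base in origins:
            for h in range(1, max_horizon + 1):
                year = base + h
                if year not in temps:
                    break
                proj_err[h].append(abs(temps[year] - (temps[base] + rate * h)))
                bench_err[h].append(abs(temps[year] - temps[base]))
        return proj_err, bench_err

    # e.g., proj, bench = successive_errors(temps, range(1991, 2008), 17, 0.03)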
Assessed against the UAH data, the average of the mean errors for all 17 horizons was 0.215°C
for rolling forecasts from the benchmark model and 0.203°C for the IPCC projected warming
rate. The IPCC projections thus provided an error reduction of 0.012°C for this small sample of
short-horizon forecasts. The difference of 0.012°C is too small to be of any practical interest.
The concern of policymakers is with long-term climate forecasting, and the ex ante analysis we
have described was limited to a small sample of short-horizon projections. To address these
limitations, we calculated rolling projections from 1851 to illustrate a proper validation
procedure.
Forecasts from 1851 through 1975 using 1992 IPCC projected warming rate
Dangerous manmade global warming became an issue of public concern after NASA scientist
James Hansen testified on the subject to the U.S. Congress on June 23, 1988 (McKibben 2007)
after a 13-year period from 1975 over which annual global temperature estimates went up more
often than they went down. The IPCC (2007) authors explained, however, that “Global atmospheric
concentrations of carbon dioxide, methane and nitrous oxide have increased markedly as a result
of human activities since 1750” (p. 2). There have even been claims that human activity has been
causing global warming for at least 5,000 years (Bergquist 2008).
It is not unreasonable, then, to suppose for the purposes of our validation illustration that
scientists in 1850 had noticed that the increasing industrialization of the world was resulting in
exponential growth in “greenhouse gases” and to project that this would lead to global warming
of 0.03°C per year.
We used the Hadley data from the beginning of the series in 1850 through to 1975 to illustrate the
testing procedure. The period is not strictly out-of-sample, however, in that the IPCC authors
knew in retrospect that there had been a broadly upward trend in the Hadley temperature series.
From 1850 to 1974 there were 66 years in which the temperature increased from the previous
year and 59 in which it declined. There is thus some positive trend, so the benchmark is
disadvantaged over the period under consideration. As Exhibit 1 shows, however, the variations in
the longer temperature series suggest that there is no assurance that the irregular trend observed in
retrospect will continue in the future.
We first created a single forecast series by adding the 1992 IPCC projected warming rate of
0.03°C to the previous year’s figure, starting with the 1850 Hadley figure, and repeating the
process for each year through to 1975. Our benchmark forecast was equal to the 1850 Hadley
figure for all years. This process provided forecast data for each of the 125 years. The warming-
rate projection errors totaled more than ten times the benchmark errors (CumRAE 10.1).
We then successively used each year from 1850 to 1974 as the base from which to forecast from
one up to 100 years ahead using the 1992 IPCC projected warming rate and the benchmark
model. This yielded a total of 7,550 forecasts covering the period 1851 to 1975. Across all
horizons, the projection errors for the period were more than seven times greater than errors from
our benchmark (CumRAE 7.67). The relative errors increased rapidly with the horizon. For
example, for horizons one through ten the CumRAE was 1.45, while for horizons 41 through 50 it
was 6.77 and for horizons 91 through 100 it was 12.6.
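The horizon-band figures reported above amount to restricting the CumRAE sums to the horizons of interest; a sketch, reusing the per-horizon error lists from the successive-forecasting sketch above (names illustrative):

    # A minimal sketch of a horizon-band CumRAE, restricting the sums to
    # horizons lo through hi. Names are illustrative.
    def band_cum_rae(proj_err, bench_err, lo, hi):
        mae = lambda errs: sum(errs) / len(errs)   # mean absolute error
        num = sum(mae(proj_err[h]) for h in range(lo, hi + 1))
        den = sum(mae(bench_err[h]) for h in range(lo, hi + 1))
        return num / den

    # e.g., band_cum_rae(proj, bench, 1, 10), band_cum_rae(proj, bench, 41, 50)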
Discussion
We have illustrated how to validate a forecast. There are other reasonable validation tests for
global mean temperatures. For example, one reviewer argued that the relevant forecasts for
climate change are for decades or longer periods. For decadal forecasts, the appropriate
benchmark forecast is that the decades ahead will be the same as the decade just gone. The mean
absolute error of a rolling one-decade-ahead benchmark forecast, calculated using the entire
Hadley series from 1850 to 2007, was 0.104°C. The mean absolute error (MAE) for five
decades ahead was 0.198°C, and for 10 decades ahead it was 0.345°C. The decadal benchmark
errors are smaller than the annual errors.
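The decadal benchmark can be evaluated in the same rolling fashion, with decade means in place of annual values; a minimal sketch, again assuming temps is the annual series in year order and the names are illustrative:

    # A minimal sketch of the rolling decadal no-change benchmark described
    # above. `temps` is assumed to be the annual series in year order.
    def decadal_benchmark_mae(temps, decades_ahead=1):
        """MAE of forecasting a future decade's mean temperature as the mean
        of the decade immediately before the forecast origin."""
        errors = []
        last_origin = len(temps) - 10 * decades_ahead
        for origin in range(10, last_origin + 1):
            base = sum(temps[origin - 10:origin]) / 10     # the decade just gone
            start = origin + 10 * (decades_ahead - 1)
            target = sum(temps[start:start + 10]) / 10     # the forecast decade
            errors.append(abs(target - base))
        return sum(errors) / len(errors)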
Validation tests should properly be conducted on forecasts from evidence-based forecasting
procedures. The models should be clearly specified, fully disclosed, and replicable. The
conditions under which the forecasts apply should be described.
Speculation is not sufficient for forecasting. The belief that “things have changed” and the future
cannot be judged by the past is common, but invalid. The 1980 bet between Julian Simon and
Paul Ehrlich on the 1990 price of resources was a high-profile example. Ehrlich espoused the
Malthusian view that the human population’s demands had outstripped, or soon would outstrip,
the resources of the Earth. Simon’s position was that real resource prices had fallen over human
history and that there were good reasons why this was so, the fundamental reason being ingenuity. It was
therefore a mistake, Simon maintained, to extrapolate recent price increases. Ehrlich dictated the
terms of the bet: a ten-year period and the five commodity metals copper, chromium, nickel, tin,
and tungsten. The metals were selected with the help of energy and resource experts John Harte
and John P. Holdren. All five commodities fell in price over the ten-year period, and Simon won
the bet (Tierney 1990).
To base public policy decisions on forecasts of global mean temperature one would have to show
that changes are forecastable over policy-relevant horizons and that a valid evidence-based
forecasting procedure would provide usefully more accurate forecasts than those from the “no
change” benchmark model.
We did not address the issue of forecasting the net benefit or cost of any climate change that
might be predicted. Here again one would need to establish a benchmark forecast, presumably a
model assuming that changes in either direction would have no net effects. Researchers who have
examined this issue do not agree on what the optimum temperature is.
Finally, success in forecasting climate change and the effects of climate change must then be
followed by valid forecasts of the effects of alternative policies. And, again, one would need
benchmark forecasts, presumably based on an assumption of taking no action, as that is typically
the least costly option.
The problem is a complex one. A failure at any one of the three stages of forecasting—
temperature change, impacts of changes, and impacts of alternative policies—would imply that
climate change policies have no scientific basis.
Conclusions
Global mean temperatures were found to be remarkably stable over policy-relevant horizons. The
benchmark forecast is that the global mean temperature for each year for the rest of this century
will be within 0.5°C of the 2008 figure.
There is little room for improving the accuracy of forecasts from our benchmark model. In fact, it
is questionable whether practical benefits could be gained by obtaining perfect forecasts. While
the Hadley temperature series shown in Exhibit 2 shows an upward drift over the last century or
so, the longer series in Exhibit 1 shows that such trends can occur naturally over long periods
before reversing. Moreover, there is some concern that the upward trend observed over the last
century and a half might be, at least in part, an artifact of measurement errors rather than genuine
global warming (McKitrick and Michaels 2007). Even if one puts these reservations aside, our
analysis shows that errors from the benchmark forecasts would have been so small that they
would not have been of concern to decision makers who relied on them.
Acknowledgements
We thank the nine people who reviewed the paper for us at different stages of its development
and the two anonymous reviewers for their many helpful comments and suggestions. We also
thank Michael Guth for his useful suggestions on the writing.
REFERENCES
Armstrong, J. S. (2001). “Evaluating forecasting models,” in Principles of Forecasting. Kluwer Academic
Publishers: Boston.
Armstrong, J. S., & Collopy, F. (1992). Error measures for generalizing about forecasting methods:
Empirical comparisons. International Journal of Forecasting, 8, 69-80.
Bergquist, L. (2008). Humans started causing global warming 5,000 years ago, UW study says. Journal
Sentinel, posted 17 December, http://www.jsonline.com/news/education/36279759.html
Green, K.C., & Armstrong, J.S. (2007). Global warming: Forecasts by scientists versus scientific forecasts,
Energy & Environment, 18, 997-1022.
IPCC (1990). Climate Change: The IPCC Scientific Assessment. Edited by J.T. Houghton, G.J. Jenkins,
and J.J. Ephraums. Cambridge University Press: Cambridge, United Kingdom.
IPCC (1992). Climate Change 1992: The Supplementary Report to the IPCC Scientific Assessment. Edited
by J.T. Houghton, B.A. Callander, and S.K. Varney. Cambridge University Press: Cambridge, United
Kingdom.
IPCC (2007). Summary for Policymakers, in Climate Change 2007: The Physical Science Basis.
Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on
Climate Change [Solomon, S., D. Qin, M. Manning, Z. Chen, M. Marquis, K.B. Averyt, M.Tignor and
H.L. Miller (eds.)]. Cambridge University Press, Cambridge, U.K. and New York, NY, USA.
McKibben, W. (2007). Warning on warming. New York Review of Books, 54, 15 March.
McKitrick, R., & Michaels, P. J. (2007). Quantifying the influence of anthropogenic surface processes and
inhomogeneities on gridded global climate data. Journal of Geophysical Research, 112,
doi:10.1029/2007JD008465.
Tierney, J. (1990). Betting the planet. New York Times, December 2.