
Predictive validity and reliability of causal temperature models - Effect of variable choice


Abstract

Slides used for a talk to the Climate Science and Economics Group on May 8, 2024. The talk described findings from research to assess the predictive validity and reliability of solar and anthropogenic models of Northern Hemisphere (NH) surface temperatures, relative to a no-change benchmark model. The relationship between the models’ statistical fits and their predictive validity was also examined.
Predictive validity and reliability of causal temperature models: Effect of variable choice
Kesten C. Green
A talk to the Climate Science & Economics Group, 8th of May 2024
[Image: cover of The Scientific Method: A Guide to Finding Useful Knowledge, by J. Scott Armstrong & Kesten C. Green; foreword by Vernon Smith, afterword by Terence Kealey]
Background
The IPCC rejects a substantive contribution of the Sun to increasing temperatures since the 1950s, based on “attribution studies”.
Connolly, Soon, Connolly, et al.’s (2023) “Challenges in the Detection and Attribution of Northern Hemisphere Surface Temperature Trends Since 1850”:
Compared the statistical fits (adjusted-R²s) of causal models of 5 measures of NH surface temperature (ST)
275 models estimated using putative causal variables (W m⁻²):
one of 28 Solar measures (including the IPCC’s), or none
IPCC’s Volcanic variable, or not
IPCC’s All Anthropogenic variable, or not
Statistical fit figures were consistent with temperatures being “mostly human-caused, mostly natural, or some combination of both”
But do any of the models provide useful forecasts?
Predictive validity:
Which, if any, models provide forecasts that are more accurate than those from a simple benchmark (i.e., have smaller errors)?
Forecasts in this study are derived using “actual,” rather than forecast, values of causal variables
Statistical fit (adjusted-R²) does NOT answer that question
Reliability:
Which, if any, models provide forecasts that are independent of the subset of data used for estimation?
What would one expect of the accuracy (errors) of the forecasts of a valid model as more data are added to the estimation sample?
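The evaluation described on this slide (fit a causal model on an early estimation sample, then forecast later years from the actual values of the causal variables) can be sketched in Python. This is a minimal illustration under assumptions of ours, not the authors’ code: each causal model is taken to be an ordinary least-squares regression with an intercept, the helpers `solve`, `ols_fit`, `predict` and `holdout_abs_errors` are hypothetical names, and the demo data are made up.

```python
from typing import Sequence

# Assumption (not from the slides): each causal model is a linear OLS regression
# of temperature anomaly on its causal variables, with an intercept.


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(rows: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Coefficients [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], rows: Sequence[Sequence[float]]) -> list[float]:
    return [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in rows]


def holdout_abs_errors(years, causes, temps, last_est_year: int) -> list[float]:
    """Fit on years <= last_est_year, then forecast every later year ex post."""
    est = [i for i, yr in enumerate(years) if yr <= last_est_year]
    hold = [i for i, yr in enumerate(years) if yr > last_est_year]
    coef = ols_fit([causes[i] for i in est], [temps[i] for i in est])
    # Ex post: the holdout forecasts use the ACTUAL causal-variable values, so the
    # errors reflect the model itself, not errors in forecasting its inputs.
    preds = predict(coef, [causes[i] for i in hold])
    return [abs(p - temps[i]) for p, i in zip(preds, hold)]


# Demo with made-up data: two "causal variables" and an exactly linear response
years = list(range(1850, 1870))
causes = [((yr - 1850) * 0.1, ((yr * 7) % 5) * 0.2) for yr in years]
temps = [0.5 + 2.0 * a - 1.0 * v for a, v in causes]
errs = holdout_abs_errors(years, causes, temps, last_est_year=1859)
```

Because the demo temperatures are an exact linear function of the made-up causal variables, the holdout errors sit at floating-point noise level. With real data, these errors would then be compared with the benchmark’s errors for the same years.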
Why Northern Hemisphere?
68 percent of Earth’s land surface is in the NH
More weather stations, with longer histories
https://svs.gsfc.nasa.gov/4961
IPCC’s Anthropogenic & Volcanic variables
Anthro:
Putative human impact on Earth’s temperature
Composite of 11 proposed anthropogenic influences
Mainly CO₂ emissions into the atmosphere
Volcanic:
Effect of eruptions via the atmosphere
Total solar irradiance (TSI) variables
IPCC Solar:
IPCC’s AR6 uses the Matthes et al. (2017) TSI estimate
A low-variability estimate
Two high-variability estimates were chosen for this study from Connolly et al.’s (2023) 27 alternatives to IPCC Solar:
Solar B2000 (Bard et al., 2000)
11th largest range
Smallest correlation with IPCC Solar (0.39)
Solar H1993 (Hoyt & Schatten, 1993)
2nd largest range
4th smallest correlation with IPCC Solar (0.62)
Endorsed in IPCC’s AR4, but dropped for AR6
Eight models tested in this study (a, b)

Model   Causal variables               Forecast variable
AVL     Anthro, Volcanic               NH All Land Annual Average Temperature Anomaly
AVSL    Anthro, Volcanic, Solar IPCC   NH All Land Annual Average Temperature Anomaly
SBVL    Solar B2000, Volcanic          NH All Land Annual Average Temperature Anomaly
SHVL    Solar H1993, Volcanic          NH All Land Annual Average Temperature Anomaly
AVR     Anthro, Volcanic               NH Rural Land Annual Average Temperature Anomaly
AVSR    Anthro, Volcanic, Solar IPCC   NH Rural Land Annual Average Temperature Anomaly
SBVR    Solar B2000, Volcanic          NH Rural Land Annual Average Temperature Anomaly
SHVR    Solar H1993, Volcanic          NH Rural Land Annual Average Temperature Anomaly

(a) With Willie Soon.
(b) Models that include the Anthro variable use the IPCC’s preferred data and formulations. Those that do not include the Anthro variable are herein described as “Independent”.
Measuring errors relative to a benchmark
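$$\mathrm{CumRAE} \text{ or } \mathrm{RelMAE} = \frac{\sum_{i=1}^{n} |e_i|}{\sum_{i=1}^{n} |e_i^*|}$$
$$\mathrm{UMBRAE} = \frac{\mathrm{MBRAE}}{1 - \mathrm{MBRAE}}, \quad \text{where } \mathrm{MBRAE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|e_i|}{|e_i| + |e_i^*|}$$
where $e_i$ is the $i$th error from the model being evaluated and $e_i^*$ is the $i$th error from a benchmark model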
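The two relative error measures can be computed directly from paired model and benchmark errors. A minimal sketch, assuming the errors for one model and for the benchmark are held in equal-length lists aligned by forecast year; the function names are hypothetical. Values below 1 indicate a model that is more accurate than the benchmark.

```python
from typing import Sequence


def _check(model_errors: Sequence[float], benchmark_errors: Sequence[float]) -> None:
    if len(model_errors) != len(benchmark_errors) or len(model_errors) == 0:
        raise ValueError("need equal-length, non-empty error series")


def cum_rae(model_errors: Sequence[float], benchmark_errors: Sequence[float]) -> float:
    """CumRAE (RelMAE): sum of |model errors| divided by sum of |benchmark errors|."""
    _check(model_errors, benchmark_errors)
    return sum(abs(e) for e in model_errors) / sum(abs(b) for b in benchmark_errors)


def mbrae(model_errors: Sequence[float], benchmark_errors: Sequence[float]) -> float:
    """Mean bounded relative absolute error: average of |e_i| / (|e_i| + |e_i*|).

    Assumes no year in which both errors are exactly zero (that term would be 0/0).
    """
    _check(model_errors, benchmark_errors)
    terms = [abs(e) / (abs(e) + abs(b)) for e, b in zip(model_errors, benchmark_errors)]
    return sum(terms) / len(terms)


def umbrae(model_errors: Sequence[float], benchmark_errors: Sequence[float]) -> float:
    """Unscaled MBRAE = MBRAE / (1 - MBRAE); below 1 means more accurate than the benchmark.

    Assumes MBRAE < 1, i.e. the benchmark error is not zero in every year.
    """
    m = mbrae(model_errors, benchmark_errors)
    return m / (1.0 - m)


# Made-up absolute errors for one model and the benchmark over the same four years
model = [0.10, 0.05, 0.20, 0.15]
bench = [0.20, 0.10, 0.10, 0.30]
print(cum_rae(model, bench), umbrae(model, bench))
```

Both measures equal 1 when the model’s errors are identical to the benchmark’s. Because each MBRAE term is bounded between 0 and 1, a single year in which the benchmark happens to be almost exactly right cannot dominate the UMBRAE, as it can with a ratio of individual errors.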
An appropriate benchmark model of temperatures
Why?
Simplicity
Green & Armstrong (2015)
Conservatism
Armstrong, Green, & Graefe (2015)
Realism
Apparent trends in Earth’s temperatures reverse on all time scales
Causal variables are difficult, if not impossible, to forecast accurately
Prior evidence
Green, Armstrong, & Soon (2009) benchmark of mean historical
No change, or no trend = median of estimation-sample temperatures
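The benchmark needs no causal variables at all: every forecast year receives the median of the estimation-sample temperatures. A minimal sketch, assuming annual anomaly series held in plain lists; the function names are hypothetical. The absolute errors it returns are the benchmark errors that the relative error measures divide by.

```python
import statistics
from typing import Sequence


def median_benchmark(estimation_temps: Sequence[float], horizon: int) -> list[float]:
    """No-change / no-trend benchmark: every forecast equals the estimation-sample median."""
    level = statistics.median(estimation_temps)
    return [level] * horizon


def benchmark_abs_errors(estimation_temps: Sequence[float],
                         actual_temps: Sequence[float]) -> list[float]:
    """Absolute benchmark errors over the forecast (holdout) years."""
    forecasts = median_benchmark(estimation_temps, len(actual_temps))
    return [abs(f - a) for f, a in zip(forecasts, actual_temps)]


# Made-up anomalies: an estimation sample, then two holdout years
print(median_benchmark([0.1, -0.2, 0.3, 0.0], horizon=3))
print(benchmark_abs_errors([0.0, 1.0, 2.0], [1.0, 3.0]))
```

Using the median rather than the mean keeps the benchmark level less sensitive to a few unusually warm or cold years in the estimation sample.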
[Figure: Absolute errors of NH temperature forecasts to 2018, forecast years from 1900. Panels: All land; Rural land. Series: Anthro, Volcanic; Anthro, Volcanic, IPCC Solar; Median historical temperature; B2000 Solar, Volcanic; H1993 Solar, Volcanic]
[Figure: Absolute errors of NH temperature forecasts to 2018. Estimation period: 1850 to 1949; forecast years 1950 to 2018. Panels: All land; Rural land]
[Figure: Absolute errors of NH temperature forecasts to 2018. Estimation period: 1850 to 1969; forecast years 1970 to 2018. Panels: All land; Rural land]
[Figure: Absolute errors of NH temperature forecasts to 2018. Estimation period: 1850 to 1999; forecast years 2000 to 2018. Panels: All land; Rural land]
Predictive validity vs statistical fit of models
Average correlation: -0.26
Sign-reversed Pearson’s r of UMBRAE vs adjusted-R²
i.e., r > 0 means $\bar{R}^2 \uparrow \Rightarrow \text{Accuracy} \uparrow$
Large and negative r for 6 of the 8 combinations of estimation period and temperature series (All Land; Rural Land)
Only for All Land models estimated with the largest sample (1850-1999) (i.e., 1 of 8) was r large and positive
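Both statistics on this slide are straightforward to compute. A minimal sketch, assuming one out-of-sample UMBRAE and one in-sample adjusted-R² per model for a given estimation period and temperature series; the function names are hypothetical, and Pearson’s r is written out rather than taken from a library.

```python
import math
import statistics
from typing import Sequence


def adjusted_r2(y: Sequence[float], fitted: Sequence[float], n_predictors: int) -> float:
    """In-sample adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), k predictors plus intercept."""
    n = len(y)
    mean_y = statistics.fmean(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sign_reversed_r(umbrae_values: Sequence[float], adj_r2_values: Sequence[float]) -> float:
    """-1 x Pearson's r between out-of-sample UMBRAE and in-sample adjusted R^2.

    Lower UMBRAE means more accurate forecasts, so flipping the sign makes r > 0
    mean that better in-sample fit went with more accurate forecasts.
    """
    return -pearson_r(umbrae_values, adj_r2_values)
```

A negative sign-reversed r, such as the average of -0.26 reported above, means the better-fitting models tended to produce the less accurate forecasts.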
Reliability of models
[Figure: Median absolute errors of NH temperature forecasts for 2000 to 2018, plotted against the number of observations used for estimating the models (50 to 150; year of last observation added to the estimation sample, 1899 to 1999). Panels: All Land temperature models; Rural Land temperature models. Series: Median historical temperature; Anthro, Volcanic; Anthro, Volcanic, IPCC Solar; B2000 Solar, Volcanic; H1993 Solar, Volcanic]
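The reliability check (re-estimate each model on ever-longer samples that start in 1850, and score the same 2000 to 2018 forecasts each time) can be sketched as follows. Assumptions: each model is supplied as a callable that maps estimation-sample indices to forecasts for the test indices, every estimation sample ends before the test window starts, and all names are hypothetical. The demo applies the median benchmark to a made-up, steadily rising series.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical interface: (estimation indices, test indices) -> forecasts for the test indices
Forecaster = Callable[[list[int], list[int]], list[float]]


def mdae_by_sample_end(
    years: Sequence[int],
    temps: Sequence[float],
    forecaster: Forecaster,
    last_years: Sequence[int],
    test_start: int = 2000,
    test_end: int = 2018,
) -> dict[int, float]:
    """Median absolute error over a fixed test window as the estimation sample grows.

    Every estimation sample starts at the first year of the record and ends at one
    of `last_years`, so the same test years are scored each time.
    """
    test = [i for i, yr in enumerate(years) if test_start <= yr <= test_end]
    result: dict[int, float] = {}
    for last in last_years:
        if last >= test_start:
            raise ValueError("estimation sample must end before the test window")
        est = [i for i, yr in enumerate(years) if yr <= last]
        preds = forecaster(est, test)
        result[last] = statistics.median([abs(p - temps[i]) for p, i in zip(preds, test)])
    return result


# Demo with made-up data: a steadily rising series and the median benchmark
years = list(range(1850, 2019))
temps = [0.01 * (yr - 1850) for yr in years]


def median_forecaster(est: list[int], test: list[int]) -> list[float]:
    level = statistics.median([temps[i] for i in est])
    return [level] * len(test)


curve = mdae_by_sample_end(years, temps, median_forecaster, last_years=range(1899, 2000, 10))
```

Plotting the returned values against the last estimation year gives a curve of the kind summarised on this slide; passing a causal model wrapped as a callable in place of `median_forecaster` gives the corresponding curve for that model.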
Reliability of models 2: Effect of sampling strategy
Forecast errors and coefficients of models estimated with 1850-1933 vs odd-years data
Switching from estimating models with 84 consecutive years of data to using 84 odd years of data:
Reductions in MdAE (IQR)
IPCC: 57%+ (75%+).
Independent: Little or none.
Changes in model coefficients*
IPCC: IPCC Solar small -ve to small +ve; Anthro large, halved.
Independent: H1993 and B2000 Solar nearly doubled, >4× IPCC Solar.
*Estimated using standardised data.
IPCC models of All Land; Independent models of Rural Land
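The two sampling strategies compared here, and coefficients estimated on standardised data, can be sketched as follows. Assumptions: the record runs from 1850 to 2018, each variable is standardised (converted to z-scores) within whichever sample is used, and the model is a linear regression fitted by least squares; all function names are hypothetical. Standardising puts the coefficients of different causal variables on a common scale, which is what makes a comparison such as “more than four times IPCC Solar” meaningful.

```python
import statistics
from typing import Sequence


def consecutive_sample(years: Sequence[int], first: int = 1850, last: int = 1933) -> list[int]:
    """Indices of an unbroken run of years (1850-1933 by default)."""
    return [i for i, yr in enumerate(years) if first <= yr <= last]


def odd_year_sample(years: Sequence[int]) -> list[int]:
    """Indices of the odd-numbered years, spread across the whole record."""
    return [i for i, yr in enumerate(years) if yr % 2 == 1]


def standardise(values: Sequence[float]) -> list[float]:
    """z-scores using the sample mean and sample standard deviation."""
    mu, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardised_coefficients(rows: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Least-squares slopes after standardising every variable within the sample.

    No intercept is fitted because every standardised variable has mean zero.
    """
    cols = [standardise(col) for col in zip(*rows)]
    zy = standardise(y)
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], zy)) for i in range(k)]
    return solve(xtx, xty)


def coefficients_for(sample: list[int], causes, temps) -> list[float]:
    """Fit on the selected years only, e.g. consecutive_sample(...) or odd_year_sample(...)."""
    return standardised_coefficients([causes[i] for i in sample], [temps[i] for i in sample])
```

With `years = list(range(1850, 2019))`, both `consecutive_sample(years)` and `odd_year_sample(years)` select 84 years, matching the sample sizes on this slide; the odd-year sample, however, spans the whole record rather than only its first 84 years.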
Reliability of models 3: Effect of sampling strategy
Parameters of models estimated with standardised data 1850-1933 vs odd years
Your conclusions on:
1. Predictive validity of IPCC vs Independent vs Benchmark models?
2. Relationship between statistical fit (in-sample) and predictive validity (out of sample)?
3. Reliability of IPCC vs Independent vs Benchmark models?
4. Practical usefulness of IPCC vs Independent vs Benchmark models?
References
Armstrong, J.S., Green, K.C. & Graefe, A. (2015). Golden rule of forecasting: Be conservative.
Journal of Business Research, 68, 1717-1731. https://doi.org/10.1016/j.jbusres.2015.03.031
Bard, E., Raisbeck, G., Yiou, F. & Jouzel, J. (2000). Solar irradiance during the last 1200 years
based on cosmogenic nuclides. Tellus B, 52, 985-992. https://doi.org/10.1034/j.1600-0889.2000.d01-7.x
Connolly, R., Soon, W., Connolly, M., Baliunas, S., Berglund, J., Butler, C.J., Cionco, R.G., Elias,
A.G., Fedorov, V.M., Harde, H., Henry, G.W., Hoyt, D.V., Humlum, O., Legates, D.R.,
Scafetta, N., Solheim, J.-E., Szarka, L., Velasco Herrera, V.M., Yan, H. & Zhang, W. (2023).
Challenges in the Detection and Attribution of Northern Hemisphere Surface Temperature
Trends Since 1850. Research in Astronomy and Astrophysics, 23, 105015.
https://iopscience.iop.org/article/10.1088/1674-4527/acf18e
Green, K.C. & Armstrong, J.S. (2015). Simple versus complex forecasting: The evidence. Journal
of Business Research, 68, 1678-1685. https://doi.org/10.1016/j.jbusres.2015.03.026
Green, K.C., Armstrong, J.S. & Soon, W. (2009). Validity of climate change forecasting for public
policy decision making. International Journal of Forecasting, 25, 826-832.
https://doi.org/10.1016/j.ijforecast.2009.05.011
Hoyt, D.V. & Schatten, K.H. (1993). A discussion of plausible solar irradiance variations, 1700-1992.
Journal of Geophysical Research, 98, 18895-18906.
https://doi.org/10.1029/93JA01944
Matthes, K. et al. (2017). Solar forcing for CMIP6 (v3.2). Geoscientific Model Development, 10,
2247-2302. https://doi.org/10.5194/gmd-10-2247-2017