Content uploaded by Kesten Green
Author content
All content in this area was uploaded by Kesten Green on Feb 03, 2022
Content may be subject to copyright.
Forecasts of doctor visits for flu: Simple conservative methods beat
Google’s big data machine learning model*
Kesten C Green, 3 February 2022 a
*Previously titled “Comparison of forecasts of weekly weighted average US percentage of
doctor visits associated with flu symptoms”.
Purpose
Katsikopoulos et al. (2021) found that the simple and easily understood recency heuristic—which uses
a single historical observation to forecast the week-ahead percentage of doctor visits associated with
influenza symptoms—reduced forecast errors by nearly one-half compared to Google Flu Trends' (GFT's)
complex and opaque machine learning model, which uses "big data".
This research note examines whether the accuracy of forecasts can be further improved by using
another simple forecasting method (Green & Armstrong, 2015) that takes account of the observation that
infection rates can trend, and does so in a conservative way (Armstrong, Green, and Graefe, 2015) by
damping recent trends toward zero.
Methods
(1) Katsikopoulos et al.’s (2021) Table 1 findings on the accuracy of the recency heuristic, linear
regression, Google Flu Trends, and a predict zero benchmark model over the 440 weeks from Week 11
2007 to Week 32 2015 of the CDC’s U.S. national weighted average data available from
https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html were replicated. The recency heuristic simply
forecasts that next week's figure will be the same as last week's—it is also known as the no-change or
no-trend model—and the linear regression (n=2) model forecasts that next week's figure will be the sum of
last week's figure plus the trend, that is, the difference between last week's figure and the previous week's.
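For concreteness, the two benchmark methods can be sketched as one-line functions (an illustrative Python sketch; the function names are mine, not from the original study):

```python
def recency_forecast(last):
    """Recency heuristic (no-change): next week equals last week."""
    return last

def linreg2_forecast(previous, last):
    """Linear regression (n=2): extrapolate the latest one-week trend."""
    return last + (last - previous)

# If visits were 2.0% two weeks ago and 2.4% last week:
print(recency_forecast(2.4))        # → 2.4
print(linreg2_forecast(2.0, 2.4))   # → 2.8
```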
(2) A damped trend model was then estimated using the first 54 weeks of continuous national data—
Week 40 2002 to Week 41 2003—and employed to forecast the same period as Katsikopoulos et al.’s
(2021) as an extension. The model forecasts that next week's figure will be the sum of last week's figure
plus a proportion of the trend (the difference between last week's figure and the previous week's), where
the proportion, or damping factor, is the figure between 0.0 and 1.0 that minimised the total absolute
forecast error over the 52-week estimation period. The damping factor was estimated by trial and error
using a spreadsheet.
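The spreadsheet trial-and-error search described above can be sketched as a simple grid search (hypothetical Python, assuming `series` is a list of weekly percentages in time order):

```python
def damped_trend_forecast(previous, last, phi):
    """Damp the latest one-week trend by factor phi (0 = no-change, 1 = full trend)."""
    return last + phi * (last - previous)

def estimate_damping_factor(series, step=0.01):
    """Grid search for the phi in [0, 1] that minimises the total absolute
    one-week-ahead forecast error over the estimation series."""
    best_phi, best_err = 0.0, float("inf")
    phi = 0.0
    while phi <= 1.0:
        err = sum(abs(series[t] - damped_trend_forecast(series[t - 2], series[t - 1], phi))
                  for t in range(2, len(series)))
        if err < best_err:
            best_phi, best_err = phi, err
        phi = round(phi + step, 10)  # avoid floating-point drift in the grid
    return best_phi
```

On the national estimation data this procedure returned a damping factor of 0.39, per the research note; the grid step is an assumption on my part.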
(3) At the time of writing, CDC data were available up to Week 5 of 2021, and continuous data were
available from Week 42 2003 after allowing for 54 weeks prior to estimate the damping factor. The
second extension conducted for this research note compares the accuracy of forecasts from damped
trend with those from the recency heuristic and from linear regression (n=2).
(4) The third extension addresses the question of whether, in the absence of historical data on the time
series being forecast but with a priori evidence that the series is likely to experience short-term trends,
a model that damps the latest trend by one-half provides any advantage in forecast accuracy during the
first year for which data were available and comparison between the methods is possible (Week 42 2002 to
Week 41 2003). Halving the recent trend is a conservative approach for situations where there is uncertainty about
the persistence of apparent trends. The approach is also supported by findings that combining forecasts
from different methods improves forecast accuracy: halving the recent trend is equivalent to averaging
the recency heuristic and the linear regression (n=2) forecasts. Damping the recent trend by one-half is
included in the other comparisons for completeness.
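The equivalence noted above (halving the recent trend is the same as averaging the recency and linear regression (n=2) forecasts) can be checked with illustrative numbers:

```python
previous, last = 2.0, 2.5  # illustrative weekly percentages

recency = last                              # no-change forecast
linreg2 = last + (last - previous)          # full-trend forecast
halved = last + 0.5 * (last - previous)     # damped trend, factor 0.5

# Averaging the two benchmark forecasts reproduces the half-damped forecast:
assert halved == (recency + linreg2) / 2
print(halved)  # → 2.75
```

Algebraically, last + d/2 = (last + (last + d)) / 2 where d is the latest trend, so the identity holds for any pair of observations.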
(5) Finally, forecasts for the continuous data from Week 42 2011 to Week 5 2021 from the methods in
(3) were obtained for three populous and contrasting U.S. states—California, Texas, and New York. For
each state, two variations of the damped trend model were used. One used the damping factor that was
estimated for the national-level data, and the other involved estimating a state-specific damping factor
from Week 40 2010 to Week 41 2011 of the state data using the approach described in (2).
Findings
Damped trend models reduced absolute forecast errors by 13% relative to forecasts from the recency
heuristic, by nearly 12% relative to forecasts from linear regression models estimated from the two most
recent observations, and by roughly 54% relative to Google Flu Trends forecasts for the 440-week period
examined by Katsikopoulos et al. (2021). See Table 1.
Table 1
Replication and extension of Katsikopoulos et al. (2021)
(Forecasts from Week 11 2007 to Week 32 2015 – as per Katsikopoulos et al. (2021) – n = 440)

Method                      MAE    MAPE   MdAE   MdAPE   CumRAE
Damped trend (x .39)        0.17    8.7   0.09    6.5    0.097
Damped trend (x .50)        0.17    8.7   0.09    6.4    0.096
Recency heuristic           0.20    9.4   0.10    7.3    0.110
Linear regression (n=2)     0.19   10.4   0.11    7.8    0.108
Google Flu Trends (GFT)     0.38   20     –       –      0.211
Predict zero benchmark      1.80  100     1.37    –      1.000
Figures in green shading are new findings, while those without shading in Table 1 (above) are a
replication of Katsikopoulos et al.'s (2021) Table 1. The blue figure differs from the 0.20 figure in
Katsikopoulos et al. (2021).
Error measures are as follows: MAE is the mean absolute error; MAPE is the mean absolute
percentage error; MdAE is the median absolute error; MdAPE is the median absolute percentage error;
and CumRAE is the sum of the absolute errors of forecasts from the method relative to the sum of the
absolute errors of forecasts from the benchmark.
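The error measures defined above can be sketched as follows (illustrative Python; it assumes aligned sequences of actuals and forecasts, with non-zero actuals for the percentage measures):

```python
from statistics import mean, median

def error_measures(actuals, forecasts, benchmark_forecasts):
    """Compute the five error measures used in the tables."""
    errors = [abs(a - f) for a, f in zip(actuals, forecasts)]
    pct_errors = [100 * abs(a - f) / a for a, f in zip(actuals, forecasts)]  # requires a != 0
    bench_errors = [abs(a - b) for a, b in zip(actuals, benchmark_forecasts)]
    return {
        "MAE": mean(errors),            # mean absolute error
        "MAPE": mean(pct_errors),       # mean absolute percentage error
        "MdAE": median(errors),         # median absolute error
        "MdAPE": median(pct_errors),    # median absolute percentage error
        "CumRAE": sum(errors) / sum(bench_errors),  # cumulative relative absolute error
    }
```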
Forecasts from the damped trend model estimated from 52 weekly observations (damping factor =
0.39) reduced absolute errors by 15% relative to forecasts from the recency heuristic benchmark, and by
9% relative to forecasts from linear regression (n=2) when the models were applied to the whole
out-of-sample U.S. national data series of 904 non-zero observations. See Table 2.
The estimated damped trend model (0.39 damping factor) reduced bias and variance relative to the
recency heuristic and to linear regression: the mean signed errors were .00003 versus .00007 and .00018,
and the standard deviations of the absolute errors were .255 versus .292 and .295, respectively.
Table 2
U.S. national doctor visits for influenza symptoms
(Forecasts from Week 42 2003 to Week 5 2021 – Full period less estimation data – n = 904)

Method                         MAE    MAPE   MdAE   MdAPE   CumRAE
Damped trend (x .39)           0.18    8.6   0.09    6.6    0.851
Damped trend (x .50)           0.18    8.6   0.09    6.7    0.835
Recency heuristic benchmark    0.21    9.8   0.10    7.8    1.000
Linear regression (n=2)        0.20   10.2   0.11    7.9    0.934
Forecasting the first year (52 weeks) of the time series of national doctor visits for flu symptoms using
a damped trend model that did not require estimation (by using a 0.5 damping factor) resulted in forecast
errors that were little different from the recency heuristic benchmark and reduced absolute errors by
12% relative to linear regression. See Table 3.
A comparison of the same model with the others over the longer time series (Table 2) found that
halving the trend reduced absolute errors by an additional 2% compared to forecasts from the estimated
damped trend model (damping factor of 0.39), and by nearly 17% relative to the recency heuristic
benchmark.
Damped trend (0.5 damping factor) reduced bias relative to the recency heuristic and was little different
in terms of variance over the estimation period covered in Table 3. The mean signed errors were .00262 and
.00588, and the standard deviations of the absolute errors were .173 and .172. For the longer period
covered by Table 2, the magnitude of the mean signed damped trend (0.5) forecast error was similar to that
of the recency heuristic benchmark (.00006 versus .00007) and the standard deviation of the absolute error
was little different to that of the estimated damped trend model forecasts (.254 versus .255).
Table 3
U.S. national doctor visits for influenza symptoms over the "estimation period"
(Forecasts from Week 42 2002 to Week 41 2003 – Estimation period – n = 52)

Method                         MAE    MAPE   MdAE   MdAPE   CumRAE
Damped trend (x .50)           0.17   14.7   0.11   11.0    1.000
Recency heuristic benchmark    0.17   13.8   0.13   11.2    1.000
Linear regression (n=2)        0.22   18.8   0.18   13.9    1.279
Damped trend and recency heuristic forecasts were substantially more accurate than linear regression
(n=2) forecasts for the states of California, New York, and Texas whether assessed using MAE or MdAE.
On average, damped trend and recency models reduced errors by 20% relative to linear regression.
Forecasts from damped trend models—whether estimated from the state-level data or from the national
data, or using a damping factor of 0.50—were generally more accurate than, or comparable in accuracy
to, forecasts from the recency heuristic. See Table 4. Percentage errors are not calculated because
observations of zero occur in the state-level data.
Table 4
California, New York, and Texas – Out-of-sample one-week-ahead forecast errors
(Forecasts from Week 42 2011 to Week 5 2021 – n = 486)

                                  California     New York       Texas
Method                            MAE    MdAE   MAE    MdAE   MAE    MdAE
Damped trend (x .1; .025; .42)    .240   .160   .350   .163   .497   .258
Damped trend (x .39)              .245   .157   .356   .185   .494   .257
Damped trend (x .50)              .253   .158   .364   .195   .507   .266
Recency heuristic                 .243   .163   .351   .163   .504   .255
Linear regression (n=2)           .311   .204   .441   .230   .621   .347
Limitations
This research note did not test rolling estimation of damping factors, multiple-step-ahead forecasts, or
forecasts of time series with many zero observations, such as those for smaller U.S. states.
Implications
The findings in this research note provide further support for the superiority of simple and
conservative no-change and damped-trend (or combined) models for making time-series forecasts about
complex, uncertain situations.
Unlike machine learning models—which require "big data", sophisticated software, and technical
expertise—a simple no-change or damped-trend model needs only two historical observations from the
time series of interest, and anyone can implement it.
Resources can be saved by avoiding the cost of machine learning models and instead developing
simple conservative models that are consistent with what is known about the situation and about
forecasting methods and that can be understood by decision makers. And resources could be better
employed as a result of the improved decisions and policies that are possible with more accurate and
transparent forecasts.
References
Armstrong, J. S., Green, K. C., & Graefe, A. (2015). Golden Rule of Forecasting: Be conservative.
Journal of Business Research, 68(8), 1717–1731.
Green, K. C., & Armstrong, J. S. (2015). Simple versus complex forecasting: The evidence. Journal of
Business Research, 68(8), 1678–1685.
Katsikopoulos, K. V., Simsek, Ö., Buckmann, M., & Gigerenzer, G. (2021). Transparent modelling of
influenza incidence: Big data or a single data point from psychological theory? International
Journal of Forecasting, in press.
Katsikopoulos, K. V., Simsek, Ö., Buckmann, M., & Gigerenzer, G. (2022). Reply to commentaries on
"Transparent modelling of influenza incidence": Recency heuristics and psychological AI.
International Journal of Forecasting, in press.
a
Changes in this revision from the original 23 January 2021 version are the addition of the word "not" in the
"Limitations" section on 25 January, the note on the title of the original version of this research note that is
referred to in Katsikopoulos et al.'s (2022) reply, and a correction to the title of Katsikopoulos et al. (2021) in
the References section.