
unpublished note When Prediction is not Forecasting

James R. Knaub, Jr … April 2015

Page1

When Prediction is Not Time Series Forecasting:

Note on Forecasting v Prediction in Samples for Continuous Data

James R. Knaub, Jr.

April 25, 2015

Abstract:

Statistical terminology is often confusing or even misleading. Consider "ignorable nonresponse," which is not "ignorable" at all. Perhaps the worst offender is so-called "significance." Here we discuss the problematic word "prediction." It has been discussed on ResearchGate, and here we concentrate on the incompatibility of time series forecasting with prediction when estimating totals for continuous data from sampling (in surveys, econometrics applications, experiments, etc.), where regressor data on the population are available from one or more other sources. Unfortunately, "prediction," as used in model-based survey estimation, is often subsumed under the term "forecasting," but here we show why it is important not to confuse these two terms.

Difference between

Prediction for Surveys/Econometrics/Experiments/etc.,

and Forecasting, particularly from a Time Series

Often, with continuous data, especially in official statistics, we may have repeated establishment census surveys and, more frequently, repeated establishment sample surveys. Here we may have regressor data on the population that can be related to a sample survey collected with or without randomization. (See Knaub(2015).) The regression used often involves just one regressor, regression through the origin (a ratio model) - see Brewer(2002) - and under perhaps quite reasonable conditions, the classical ratio estimator (CRE) can be "...hard to beat..." (Cochran(1977), page 160).
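As a minimal sketch of the mechanics (with hypothetical numbers, not data from any actual survey), the CRE fits the ratio model through the origin on the sample and "predicts" for the cases not observed:

```python
# Sketch of the model-based classical ratio estimator (CRE) for a finite
# population total, assuming the ratio model y = b*x through the origin.
# All numbers below are hypothetical, for illustration only.
import numpy as np

def cre_total(y_sample, x_sample, x_nonsample):
    """Estimate the population total: observed y's plus predictions
    b*x for the cases not in the sample, where b = sum(y)/sum(x)."""
    b = y_sample.sum() / x_sample.sum()      # ratio estimate of the slope
    predictions = b * x_nonsample            # 'predicted' values, not forecasts
    return y_sample.sum() + predictions.sum(), b

y_s = np.array([10.0, 21.0, 29.0])           # current sample observations
x_s = np.array([5.0, 10.0, 15.0])            # regressor data (e.g., a prior census)
x_r = np.array([8.0, 12.0])                  # regressor data for non-sampled cases
total, b = cre_total(y_s, x_s, x_r)
print(b, total)    # b = 60/30 = 2.0, total = 60 + 2*20 = 100.0
```

Note that the estimated total is simply the sum of the observed values plus predictions for the unobserved cases; nothing temporal is involved.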

When a regression is used to impute for missing data, whether due to nonresponse or mass

imputation for cases not in the sample, we call each estimate in place of an observed value a

"prediction." An idea of the error for these "predicted" values may be obtained through the

variance of the prediction error, noted, for example, in Maddala(2001), the square root of which

is generated, for example, as STDI in SAS PROC REG. The first several pages of Knaub(1999) show the difference between the estimated variance of the prediction error for an individual case and that for estimating finite population totals.
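As a sketch of what that quantity is, the standard OLS formula underlying STDI can be reproduced in a few lines (the data here are hypothetical; this illustrates the formula, not SAS itself):

```python
# Standard error of the prediction error for an individual new case
# (the quantity SAS PROC REG reports as STDI), via ordinary least squares.
# Hypothetical data, for illustration only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
n, p = X.shape
s2 = resid @ resid / (n - p)                 # residual variance estimate

x0 = np.array([1.0, 6.0])                    # a new individual case, x = 6
XtX_inv = np.linalg.inv(X.T @ X)
var_mean = s2 * (x0 @ XtX_inv @ x0)          # variance of the fitted mean at x0
stdi = np.sqrt(s2 + var_mean)                # adds the individual error term
print(stdi)
```

The individual prediction error variance exceeds the variance of the fitted mean by the residual variance, which is why STDI is always larger than the corresponding STDP.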

For regression modeling, variance is noted, but bias comes from model misspecification. See

Shmueli(2010). Note also that Shmueli seems to use the term "prediction" in place of forecasting,

rather than note work in survey prediction and econometrics where many authors (see

Brewer(1999), for example) use "prediction" as stated here. As in forecasting, the simpler the model, the more general it is, and the less susceptible to overfitting a given set of test data.


Both prediction (from a regression model-based sample survey estimator) and time series

forecasting involve regression, but forecast modeling is typically more complex (sometimes

overly so). A prediction regression model looks at a given sample and estimates/'predicts' for

data not observed in the sample. A forecasting regression model looks at a time series of

responses by a given respondent, and from the trend (sometimes accounting for phenomena

such as seasonality), forecasts the next response or series of responses, for that given

respondent, to that given question.

Thus, a prediction for a member of a current population that is not in the currently observed sample is based on the regression relationship between all members of the sample and some given set of regressor data - or more sets if there are more regressors. It is therefore important

to stratify these data into groups for which a single simple model fits well. These groupings may

not technically be strata if stratification is defined only to enhance a more aggregate level

estimate, as each group might be published separately, and a small area estimation scheme could

be involved (see Knaub(1999)).

At any rate, subpopulation groups are formed, within each of which only one model is to be used per question, and one is interested in predicting for missing data based on regressor data, perhaps

from a previous census of the same data elements. This may happen in official statistics when

there is an annual census and monthly samples, or a monthly census and weekly samples.

However, when forecasting, one is often only looking at a given single response from a given

respondent (though it could be at a more aggregate level), and at each point in a time series, the

respondent has a value that contributes to that series - or a missing value there as well - and a

trend is approximated, such that one might forecast a coming response. This, of course, means

that any unanticipated change that may impact the series will contribute unknown error, which

is most substantial if those unanticipated changes are of primary importance. Thus the further forward a forecast reaches, the worse it might become, not only due to variance, but also because changes that could greatly impact the model may occur at any time.
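A minimal sketch of simple exponential smoothing (hypothetical series and smoothing constant) makes this concrete: the forecast depends on history alone, so every future horizon receives the same smoothed level, whatever actually happens after the series ends:

```python
# Simple exponential smoothing, sketched to show why a forecast is blind to
# breaks: the h-step-ahead forecast is flat at the last smoothed level.
# Series and alpha are hypothetical, for illustration only.
def ses_forecast(series, alpha=0.3):
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level   # update the smoothed level
    return level                                  # forecast for every future step

history = [100, 102, 101, 103, 102]
print(ses_forecast(history))   # same value at horizon 1, 2, 3, ...
```

If a break occurs just after the last observation, nothing in this mechanism can react to it; the forecast error then grows with the size of the break.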

Forecasting may help in planning, and if not too long term, may often be fairly accurate, as long

as a change point in the time series has not occurred. But if you are looking at a current sample

and want to estimate for missing data, prediction is what is needed, especially if you are

interested in a change, say a market change in an economic application, that is currently

occurring. A forecast cannot know about such a change.

To summarize, a prediction based on a relationship between a sample and regressor data is used

to basically impute for missing data, whatever the cause. Grouping data for modeling purposes

is very important, so the concept of stratification is important. However, a forecast for a missing

value in a sample is based only on the past history of that respondent, for that question. It is only possible if there is a history (time series) for that respondent, so it can only be used for imputation for nonresponse or edit-failed data. If there are missing values in the time series for that

respondent, that increases uncertainty associated with such an imputed value, and if there is a


break in the time series due to any current event, that may greatly decrease accuracy. This is not

a problem with prediction as described. It uses the current data in the regression.

So, although the prediction and forecasting described above both use regression, it is applied

very differently. Do you want to know a forecast, or are you looking at what is currently

occurring? Using the term "prediction" may be confusing for the latter, especially as it may be

confused with time series forecasting, but these are the terms with which we are 'stuck.'

Consider: "Prediction" does not mean the same thing in Brewer(1999), as in Shmueli(2010),

though from their titles, “Design-based or prediction-based inference?” and “To Explain or to

Predict?” respectively, one might expect that they did. But as will be seen below, we have to

distinguish between prediction as typically used in survey statistics for current data, and time

series forecasting. The error structures are very different.

Relevant online literature: Prediction v Forecasting –

OECD Definition, and Prediction Examples from Michigan State University and Onlinestatbook

So, “prediction” can mean “forecasting,” but in statistics, it can have another meaning, as noted

in OECD(2005): It can be the value found for y in a regression equation, based on the regressor

data x, whether or not there is a “temporal element.” (Note

https://stats.oecd.org/glossary/detail.asp?ID=3792 in the references.)

In model-based estimation for survey sampling, one finds that it is the latter definition that is

used, and we should not confuse this survey definition of “prediction” with time series

“forecasting” for two reasons: (1) they involve very different types of regression application and

theory, and (2) the estimates for y and variances involve very different, incompatible

structures/mechanisms, with errors based on different manners of variation, estimated for

different purposes. Further, a time series forecast will ignore current changes, as it does not use the current data for the variables of interest.

In an Internet search, one may also find examples showing how to interpret a prediction that is not a time series forecast. One such example (for illustration, but not showing the usual heteroscedasticity, which would generally be present in practice) is found at Stocks(1999). There you will find a nice explanation and illustration of the "regression (prediction) line." (See https://www.msu.edu/user/sw/statrev/strv203.htm.)

Among other discussions and examples available on the Internet, one that may be of interest is Onlinestatbook(2015). This resource shows an interesting reversal of the usual


heteroscedasticity, comparing GPAs between high school and college (near the end of http://onlinestatbook.com/2/regression/intro.html). One might think of this example as a kind

of forecast since you can use the high school data as regressor data and college data for y, and in

this case the regressor data occur first and might 'forecast' the college GPA, but it is definitely

not a time series forecast. Also, in other examples, both x regressor data and y sample data may

exist in the same time frame, as when electric plant capacity is used to predict for missing

generation data in a given, single time frame. But often, a previous census is used as regressor

data to be able to “predict” for missing data from a current sample, as in the electric sales data

illustrated in a scatterplot on the last page of Knaub(2013). This is prediction as found in survey

statistics, not a time series forecast. Here heteroscedasticity means an increase in the variance

of the prediction error for y, as x becomes larger. However, in the scatterplot on the last page of

Knaub(2014), the errors are small, and heteroscedasticity may be visually imperceptible, but still

mathematically present in that establishment survey example with real data. Note that stratification or other grouping of data is important, so that one model (regression) applies per group. The scatterplot shown at the end of Knaub(2013) is for one such data group. Usually

for these cases, we have one very good regressor, often the same data element in a previous,

less frequently collected census, as stated, but not always. Often, for electric generation, the

same data element/variable from a previous census is the best regressor, but for wind-powered

generation, nameplate capacity can do much better. (Sometimes, multiple regression might

help, say for fuel switching: Knaub(2003).) Often the model-based classical ratio estimator (CRE)

is very robust for these data. (See Knaub(2005), Knaub(2013), Brewer(2002), and consider page

160 in Cochran(1977).)
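As a sketch, assuming the error standard deviation is proportional to x raised to the coefficient of heteroscedasticity gamma, weighted least squares through the origin uses weights proportional to x^(-2*gamma), and gamma = 0.5 recovers the CRE slope, the ratio of totals. The numbers are hypothetical:

```python
# Ratio model fit with regression weights driven by a coefficient of
# heteroscedasticity gamma, assuming error std dev proportional to x**gamma,
# so WLS weights are w = x**(-2*gamma).  gamma = 0.5 gives the classical
# ratio estimator slope b = sum(y)/sum(x).  Hypothetical data.
import numpy as np

def ratio_slope(y, x, gamma=0.5):
    w = x ** (-2.0 * gamma)                       # weights from the variance model
    return (w * x * y).sum() / (w * x * x).sum()  # WLS through the origin

x = np.array([5.0, 10.0, 15.0])
y = np.array([10.0, 21.0, 29.0])
print(ratio_slope(y, x, gamma=0.5))   # sum(y)/sum(x) = 2.0
```

Setting gamma = 0 instead gives ordinary least squares through the origin, which weights the larger establishments more heavily than the variance structure of such data usually warrants.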

So, to review, although using a high school GPA as a predictor for college GPA may sound like forecasting something you will discover later, it is not time series forecasting. Here, and in the

survey sampling context, "prediction" is the correct term, even if you never obtain those

observations - and unless you are studying test data, you generally never will obtain observations

to compare to these “predicted” values. (However, as in design-based sampling and estimation,

testing is very important.)

Aside: The GPA example illustrates very well the idea of prediction that is not a time series

forecast. However, it also has some interesting peculiarities: Both x and y are limited in range

from 0 to 4. Generally, as x becomes larger, the standard error of the prediction error for y

becomes larger, so the coefficient of heteroscedasticity (Knaub(2011a)) is greater than 0, even

though the ratio of standard error of prediction error of y to y may generally become smaller

with larger x. Here, that coefficient of heteroscedasticity would be a negative number. (?) But

this may make sense, because the high school GPA values are from a special subpopulation of

high school students: those motivated and able to attend college, or pressured and able to attend

college. No high school GPAs from those who did not go to college can be used. (Thus one should

not expect prediction of college GPAs for those who did not attend to be very accurate! This

subgroup has no data.) The peculiar heteroscedasticity may be because those more motivated

in high school who go to college tend to be more motivated there also, and variance of the

prediction error for y is reduced at higher x, rather than the usual increase. This situation also

produces a peculiar intercept, if using linear regression with an intercept term, as done in the


online example. That intercept is very large here, and may be interpreted as saying that if one

had a GPA of zero in high school, but somehow went to college (?), her/his predicted college GPA

would be about 1.1. This is a bit nonsensical. The problem appears to be that logically, this

regression should go through the origin. It may be somewhat nonlinear, but with such large

residuals nearer to the origin that this cannot be readily shown. A ratio estimate through the

origin, with negative coefficient of heteroscedasticity (alternative ratio estimators are discussed

in Särndal, Swensson, and Wretman(1992)), may really be best here. Comparing the estimated

variances of the prediction errors between such a heteroscedastic regression model through the

origin, and the simple regression model that was used, would be interesting. Also, in the model,

as used online, it might be instructive to know how the estimated standard error of the intercept

compared to the magnitude of the estimated intercept.
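The comparison suggested above can be sketched with invented GPA-like numbers (purely hypothetical, not the data from the online example): fit the line with an intercept, inspect the intercept against its standard error, and compare with the through-origin ratio slope:

```python
# Hypothetical sketch: regress college GPA on high school GPA with an
# intercept, report the intercept and its standard error, then compare
# with a through-origin ratio fit.  Data are invented for illustration.
import numpy as np

hs  = np.array([2.0, 2.5, 3.0, 3.2, 3.5, 3.8, 4.0])   # high school GPA (x)
col = np.array([1.8, 2.6, 2.7, 3.1, 3.2, 3.6, 3.7])   # college GPA (y)

X = np.column_stack([np.ones_like(hs), hs])
beta, *_ = np.linalg.lstsq(X, col, rcond=None)
resid = col - X @ beta
s2 = resid @ resid / (len(hs) - 2)                     # residual variance
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())   # standard errors

print(beta[0], se[0])         # intercept and its standard error
print(col.sum() / hs.sum())   # slope of the ratio (through-origin) fit
```

If the estimated intercept is not large relative to its standard error, that supports fitting through the origin, as argued above.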

Example from US Energy Information Administration Weekly Petroleum Sample Surveys

Weekly sampling of petroleum information occurs at the US Energy Information Administration

(EIA) in various surveys which basically represent strata for overall categories that could have

been part of a larger survey. Single regressor data are available for the same data elements (same

variables) from previous monthly census surveys. Thus a (robust) classical ratio estimator (CRE)

is used to relate monthly census and weekly sample data. However, referring to chapter 4 of

Lohr(2010), this has been documented at the EIA in the design-based CRE format, but applied in

the model-based CRE format, and the EIA also currently fails to supply relative standard error

(RSE) estimates, as could easily be done. (See Knaub(2011b).) The EIA also uses exponential

smoothing time series forecasts to impute for nonresponse, or replace edit rejected data. Such

forecasts are often very close to the predicted values for each case, but the forecasts must of necessity (i.e., by definition) have generally inferior performance when the market changes in a way that impacts the variable of interest, say the stock level for a given petroleum product.

Forecasting is completely incapable of discerning a sudden change in the current market - it does not use

the data relevant to that - so that in such cases, though forecasts and predictions may often have

nearly identical results, and time series forecasts can sometimes even perform better, this is not

what will happen, in general, when it most matters.
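A toy simulation (all values hypothetical) illustrates the point: when the current market jumps, a ratio prediction built from current respondent data tracks the jump, while an exponential-smoothing forecast built only from each nonrespondent's flat history cannot see it:

```python
# Toy illustration: a 30% market jump in the current period.  The ratio
# prediction from current respondent data follows the jump; an exponential-
# smoothing forecast from each unit's own history does not.  Hypothetical.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(10, 100, size=20)        # prior-census values per establishment
history = np.outer(x, np.ones(6))        # flat 6-period history at the old level
current = 1.3 * x                        # current period: a 30% market jump

respondents = np.arange(15)              # 15 of 20 report this period
missing = np.arange(15, 20)              # 5 nonrespondents to impute for

# Ratio prediction from current respondent data ('prediction', not forecasting)
b = current[respondents].sum() / x[respondents].sum()
pred = b * x[missing]

# Exponential-smoothing forecast from each nonrespondent's own history
level = history[missing, 0].copy()
for t in range(1, history.shape[1]):
    level = 0.3 * history[missing, t] + 0.7 * level

print(np.abs(pred - current[missing]).max())    # ~0: prediction sees the jump
print(np.abs(level - current[missing]).max())   # large: forecast misses it
```

In quiet periods the two imputations would nearly coincide; the divergence appears exactly when detecting the change is the point of the survey.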

Put another way, it is not logical to mix regression-model/prediction-based estimation for missing

current survey data (whether missing because it was not in the sample, or because of

nonresponse), with time series forecasting imputation for nonresponse. They have different

goals. The former has sampling error variance due to missing data from the current sample. The

latter has variance due to time series trends, and at crucial times would be substantially biased.

The variances are not compatible, the time series cases assume a trend change cannot happen,

and any attempt to bootstrap an overall variance could therefore be greatly misleading.


If you have currently collected data for a current sample or partial census, you want imputed

data - whether for nonresponse or out-of-sample cases - to be compatible with the other current

data, not a forecast which makes no use of current data. To confuse these is a classic case of

"mixing apples and oranges."

Conclusions regarding difference between time series forecast and prediction

for missing data in a current survey:

A time series forecast is based on past trends for a single response data series. Autocorrelation

is an important consideration. Seasonality may be taken into account. Exponential smoothing can

be done to give more emphasis to more recent responses. The first forecast (after the end of the

response series to date) will be the best estimate, with forecasts deteriorating as you look further

into the future. The biggest problem will be break points in the coming series that are unknown

to the forecasting mechanism. A forecast is completely blind to that. Thus, for example, if a new

influence comes into play, say a new invention or a major merger within a market, a time series,

which can only be based on past history, cannot know this. Therefore, if one were to use a time

series forecast of any kind to impute for a nonresponse in a current sample survey or partial

census survey, it may often be acceptable, but not when it really matters. That is, a current

survey is used to learn what is happening ... well ... currently! If one is trying to be alert to a

current change in a market, collecting current data, one would wish to substitute predictions, for

all missing data, that are based on the current data collected as modeled by one or more sets of

regressor data. A time series forecast that looks at only the history of a nonrespondent will not

do this. Further, any variance estimate will be for forecasting, knowing that breaks in the series

are ignored, and such variance estimates are not compatible with variance estimates for the

current data. When you are collecting current data, and some are missing, you want to ‘predict’

(estimate really) for those missing data by taking advantage of the other current data that you

have, and how they relate to regressor data. If you are only interested in a forecast, you have

data for that before starting to collect the data of interest.

A further, unrelated problem with forecasting for imputation for nonresponse and edit-failed data is that these are generally the worst time series data available. Those that nonrespond or provide low-quality data are not likely to do so only once. If they nonrespond more than one cycle in a row, you probably will not even get a good forecast.

A prediction for missing data in a current survey is based on a group relationship - so good

grouping or stratification is important - between the sample data (or partial census), and one or

more regressors. This relationship, unlike a time series regression for individual respondent

history, is based instead on the current data that have been collected. (Better than PPS sampling,


you can customize size measures in estimation by question on a survey, not having to rely on one

size measure for all.)

So, a current sample survey is not compatible with a forecast. But it is compatible with prediction

as defined by model-based survey sampling inference. Further, because the model-based

classical ratio estimator (CRE) is quite robust, and good groupings/stratification can greatly

reduce any nonignorability in the predictions (i.e., we use one unique mechanism/regression model per group), one may successfully use prediction/model-based estimation for all missing

data. Use of forecasting for nonrespondents is not compatible, ignores the goal of the survey to

determine the current circumstances, and is thus a needless, counterproductive

complication. Further, Knaub(2013) can be used to find effective sample sizes by independent

group, knowing that one may need to experiment iteratively with its application to obtain a good

estimate of overall sample size needs. For establishment surveys, a stratified cutoff sample using

the CRE may often be useful. See Karmel and Jain(1987), on stratification by size, and

Knaub(2014) regarding stratification by category (such as type of oil/gas well, with depth of well

as a proxy measure for grouping, when considering production, say from shale versus non-shale

[traditional] wells).

Summary:

A time series forecast is for an individual series over time, not including current data, with

possible autocorrelation, including possibly other regressors and complexities, and a generally

more complex error structure. However, a prediction of y by one or more regressors that is not

‘temporal’ in nature is based on observations by group/strata. The error structure is generally

less complex, but is virtually never homoscedastic. (See Knaub(2007).) This form of prediction is

found in survey statistics, econometrics, and other fields when we need to estimate for missing

data from a cross sectional data set, for any case where there are missing data from among a set

of respondents or experimental subjects. Many often refer to prediction and forecasting interchangeably, but it is best to maintain a distinction, as shown in this paper. These two types

of regression are not compatible, and it is not logical to mix them.

Consider survey statistics, for example: A (time series) forecast estimates what will happen if a

pattern remains the same. But what if the point is to detect when a change has just now

occurred? Then a ‘non-temporal’ prediction is needed, such as the CRE, not a forecast, such as

exponential smoothing.


References

Brewer, K.R.W. (1999). Design-based or prediction-based inference? stratified random vs

stratified balanced sampling. Int. Statist. Rev., 67(1), 35-47.

Brewer, KRW (2002), Combined survey sampling inference: Weighing Basu's elephants, Arnold:

London and Oxford University Press.

Cochran, W.G.(1977), Sampling Techniques, 3rd ed., John Wiley & Sons.

Karmel, T.S., and Jain, M. (1987), "Comparison of Purposive and Random Sampling Schemes for

Estimating Capital Expenditure," Journal of the American Statistical Association, Vol.82, pages 52-

57.

Knaub, J.R., Jr. (1999), "Using Prediction-Oriented Software for Survey Estimation," InterStat, August 1999, http://interstat.statjournals.net/YEAR/1999/abstracts/9908001.php?Name=908001 - Short version: "Using Prediction-Oriented Software for Model-Based and Small Area Estimation," Proceedings of the Survey Research Methods Section, American Statistical Association, http://www.amstat.org/sections/srms/proceedings/papers/1999_115.pdf, https://www.researchgate.net/publication/261586154_Using_Prediction-Oriented_Software_for_Survey_Estimation

Knaub, J.R., Jr. (2003), "Applied Multiple Regression for Surveys with Regressors of Changing Relevance: Fuel Switching by Electric Power Producers," InterStat, May 2003, http://interstat.statjournals.net/YEAR/2003/abstracts/0305002.php?Name=305002

Knaub, J.R., Jr. (2005), "'Classical Ratio Estimator' (Model-Based)," InterStat, October 2005, http://interstat.statjournals.net/YEAR/2005/abstracts/0510004.php?Name=510004, https://www.researchgate.net/publication/261474011_The_Classical_Ratio_Estimator_%28Model-Based%29


Knaub, J.R., Jr. (2007), "Heteroscedasticity and Homoscedasticity," in Encyclopedia of Measurement and Statistics, Editor: Neil J. Salkind, Sage Publications, Vol. 2, pp. 431-432, https://www.researchgate.net/publication/262972023_HETEROSCEDASTICITY_AND_HOMOSCEDASTICITY

Knaub, J.R., Jr. (2011a), "Ken Brewer and the Coefficient of Heteroscedasticity as Used in Sample Survey Inference," Pakistan Journal of Statistics, Vol. 27(4), 2011, 397-406, invited article for special edition in honor of Ken Brewer's 80th birthday, http://www.pakjs.com/journals/27(4)/27(4)6.pdf, https://www.researchgate.net/publication/261596397_KEN_BREWER_AND_THE_COEFFICIENT_OF_HETEROSCEDASTICITY_AS_USED_IN_SAMPLE_SURVEY_INFERENCE

Knaub, J.R., Jr. (2011b), "Some Proposed Optional Estimators for Totals and their Relative Standard Errors for a set of Weekly Cutoff Sample Establishment Surveys," InterStat, July 2011, http://interstat.statjournals.net/YEAR/2011/abstracts/1107004.php?Name=107004, https://www.researchgate.net/publication/261474159_Some_Proposed_Optional_Estimators_for_Totals_and_their_Relative_Standard_Errors_for_a_set_of_Weekly_Quasi-Cutoff_Sample_Establishment_Surveys

Knaub, J.R., Jr. (2013), "Projected Variance for the Model-Based Classical Ratio Estimator: Estimating Sample Size Requirements," to be published in the Proceedings of the Survey Research Methods Section, American Statistical Association, https://www.amstat.org/sections/SRMS/Proceedings/y2013/Files/309176_82260.pdf, https://www.researchgate.net/publication/261947825_Projected_Variance_for_the_Model-based_Classical_Ratio_Estimator_Estimating_Sample_Size_Requirements

Knaub, J.R., Jr. (2014), "Efficacy of Quasi-Cutoff Sampling and Model-Based Estimation For Establishment Surveys - and Related Considerations," InterStat, January 2014, http://interstat.statjournals.net/YEAR/2014/abstracts/1401001.php, https://www.researchgate.net/publication/261472614_Efficacy_of_Quasi-Cutoff_Sampling_and_Model-Based_Estimation_For_Establishment_Surveys_and_Related_Considerations


Knaub, J.R., Jr. (2015), "Short Note on Various Uses of Models to Assist Probability Sampling and Estimation," unpublished note on ResearchGate, https://www.researchgate.net/publication/274704886_Short_Note_on_Various_Uses_of_Models_to_Assist_Probability_Sampling_and_Estimation

Lohr, S.L.(2010), Sampling: Design and Analysis, 2nd ed., Brooks/Cole.

Maddala, G.S. (2001), Introduction to Econometrics, 3rd ed., Wiley.

OECD(2005), Organization for Economic Cooperation and Development (OECD) Glossary of

Statistical Terms, https://stats.oecd.org/glossary/detail.asp?ID=3792, last updated August 11,

2005, citing this source: “A Dictionary of Statistical Terms, 5th edition, prepared for the

International Statistical Institute by F.H.C. Marriott. Published for the International Statistical

Institute by Longman Scientific and Technical.”

Onlinestatbook(2015), “Introduction to Linear Regression,” Online Statistics Education: An

Interactive Multimedia Course of Study, Downloaded April 24, 2015, Developed by Rice University

(Lead Developer), University of Houston Clear Lake, and Tufts University.

http://onlinestatbook.com/2/regression/intro.html,

home page: http://onlinestatbook.com/2/index.html

Särndal C.-E, Swensson B., and Wretman, J.(1992), Model Assisted Survey Sampling, Springer.

Shmueli, G.(2010), “To Explain or to Predict?” Statistical Science, Vol. 25, No. 3 (August 2010), pp.

289-310. Published by: Institute of Mathematical Statistics. Article Stable URL:

http://www.jstor.org/stable/41058949

Stocks, J.T.(1999), “Correlation and Regression: The Regression (Prediction) Line,” Basic Stats

Review, Michigan State University, https://www.msu.edu/user/sw/statrev/strv203.htm,

Home page: https://www.msu.edu/user/sw/statrev/strev.htm

Please note that there have been question and answer threads on ResearchGate regarding the

definitions of "forecasting," and "prediction" that may be of interest to the reader.