
unpublished note When Prediction is not Forecasting

James R. Knaub, Jr … April 2015

Page1

When Prediction is Not Time Series Forecasting:

Note on Forecasting v Prediction in Samples for Continuous Data

James R. Knaub, Jr.

April 25, 2015

Abstract:

Statistical terminology is often confusing or even misleading. Consider "ignorable nonresponse," which is not "ignorable" at all. Perhaps the worst offender is so-called "significance." Here we discuss the problematic word "prediction." It has been discussed on ResearchGate, and here we concentrate on the incompatibility of time series forecasting with prediction when estimating totals for continuous data from sampling (in surveys, econometrics applications, experiments, etc.), where regressor data on the population are available from one or more other sources. Unfortunately, "prediction," as used in model-based survey estimation, is often subsumed under the term "forecasting," but here we show why it is important not to confuse these two terms.

Difference between

Prediction for Surveys/Econometrics/Experiments/etc.,

and Forecasting, particularly from a Time Series

Often, with continuous data, especially in official statistics, we may have repeated establishment census surveys and, more frequently, repeated establishment sample surveys. Here we may have regressor data on the population that can be related to a sample survey collected with or without randomization. (See Knaub(2015).) The regression used often involves just one regressor, regression through the origin (a ratio model) - see Brewer(2002) - and under perhaps quite reasonable conditions, the classical ratio estimator (CRE) can be "...hard to beat..." (Cochran(1977), page 160).
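As a minimal sketch of the mechanics (with hypothetical numbers, not data from any actual survey), the CRE fits the ratio model through the origin on the sample and "predicts" for the cases not observed:

```python
# Sketch of the model-based classical ratio estimator (CRE) for a finite
# population total, assuming the ratio model y = b*x through the origin.
# All numbers below are hypothetical, for illustration only.
import numpy as np

def cre_total(y_sample, x_sample, x_nonsample):
    """Estimate the population total: observed y's plus predictions
    b*x for the cases not in the sample, where b = sum(y)/sum(x)."""
    b = y_sample.sum() / x_sample.sum()      # ratio estimate of the slope
    predictions = b * x_nonsample            # 'predicted' values, not forecasts
    return y_sample.sum() + predictions.sum(), b

y_s = np.array([10.0, 21.0, 29.0])           # current sample observations
x_s = np.array([5.0, 10.0, 15.0])            # regressor data (e.g., a prior census)
x_r = np.array([8.0, 12.0])                  # regressor data for non-sampled cases
total, b = cre_total(y_s, x_s, x_r)
print(b, total)    # b = 60/30 = 2.0, total = 60 + 2*20 = 100.0
```

Note that the estimated total is simply the sum of the observed values plus predictions for the unobserved cases; nothing temporal is involved.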

When a regression is used to impute for missing data, whether due to nonresponse or mass

imputation for cases not in the sample, we call each estimate in place of an observed value a

"prediction." An idea of the error for these "predicted" values may be obtained through the

variance of the prediction error, noted, for example, in Maddala(2001), the square root of which

is generated, for example, as STDI in SAS PROC REG. The first several pages of Knaub(1999) show the difference between the estimated variance of the prediction error for an individual case and that for estimating finite population totals.
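As a sketch of what that quantity is, the standard OLS formula underlying STDI can be reproduced in a few lines (the data here are hypothetical; this illustrates the formula, not SAS itself):

```python
# Standard error of the prediction error for an individual new case
# (the quantity SAS PROC REG reports as STDI), via ordinary least squares.
# Hypothetical data, for illustration only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
n, p = X.shape
s2 = resid @ resid / (n - p)                 # residual variance estimate

x0 = np.array([1.0, 6.0])                    # a new individual case, x = 6
XtX_inv = np.linalg.inv(X.T @ X)
var_mean = s2 * (x0 @ XtX_inv @ x0)          # variance of the fitted mean at x0
stdi = np.sqrt(s2 + var_mean)                # adds the individual error term
print(stdi)
```

The individual prediction error variance exceeds the variance of the fitted mean by the residual variance, which is why STDI is always larger than the corresponding STDP.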

For regression modeling, variance is noted, but bias comes from model misspecification. See

Shmueli(2010). Note also that Shmueli seems to use the term "prediction" in place of forecasting,

rather than note work in survey prediction and econometrics where many authors (see

Brewer(1999), for example) use "prediction" as stated here. As in forecasting, the simpler the model, the more general it is, and the less susceptible to overfitting a given set of test data.


Both prediction (from a regression model-based sample survey estimator) and time series

forecasting involve regression, but forecast modeling is typically more complex (sometimes

overly so). A prediction regression model looks at a given sample and estimates/'predicts' for

data not observed in the sample. A forecasting regression model looks at a time series of

responses by a given respondent, and from the trend (sometimes accounting for phenomena

such as seasonality), forecasts the next response or series of responses, for that given

respondent, to that given question.

Thus, a prediction for a member of a current population that is not in the currently observed sample is based on the regression relationship between all members of the sample and some given set of regressor data - or more sets if there are more regressors. It is therefore important

to stratify these data into groups for which a single simple model fits well. These groupings may

not technically be strata if stratification is defined only to enhance a more aggregate level

estimate, as each group might be published separately, and a small area estimation scheme could

be involved (see Knaub(1999)).

At any rate, subpopulation groups are formed, within each of which only one model is to be used per question, and one is interested in predicting for missing data based on regressor data, perhaps

from a previous census of the same data elements. This may happen in official statistics when

there is an annual census and monthly samples, or a monthly census and weekly samples.

However, when forecasting, one is often only looking at a given single response from a given

respondent (though it could be at a more aggregate level), and at each point in a time series, the

respondent has a value that contributes to that series - or a missing value there as well - and a

trend is approximated, such that one might forecast a coming response. This, of course, means

that any unanticipated change that may impact the series will contribute unknown error, which

is most substantial if those unanticipated changes are of primary importance. Thus the further forward a forecast reaches, the worse it might become, not only due to variance, but also because changes that could greatly impact the model may occur at any time.
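A minimal sketch of simple exponential smoothing (hypothetical series and smoothing constant) makes this concrete: the forecast depends on history alone, so every future horizon receives the same smoothed level, whatever actually happens after the series ends:

```python
# Simple exponential smoothing, sketched to show why a forecast is blind to
# breaks: the h-step-ahead forecast is flat at the last smoothed level.
# Series and alpha are hypothetical, for illustration only.
def ses_forecast(series, alpha=0.3):
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level   # update the smoothed level
    return level                                  # forecast for every future step

history = [100, 102, 101, 103, 102]
print(ses_forecast(history))   # same value at horizon 1, 2, 3, ...
```

If a break occurs just after the last observation, nothing in this mechanism can react to it; the forecast error then grows with the size of the break.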

Forecasting may help in planning, and if not too long term, may often be fairly accurate, as long

as a change point in the time series has not occurred. But if you are looking at a current sample

and want to estimate for missing data, prediction is what is needed, especially if you are

interested in a change, say a market change in an economic application, that is currently

occurring. A forecast cannot know about such a change.

To summarize, a prediction based on a relationship between a sample and regressor data is used

to basically impute for missing data, whatever the cause. Grouping data for modeling purposes

is very important, so the concept of stratification is important. However, a forecast for a missing

value in a sample is based only on the past history of that respondent, for that question. It is only possible if there is a history (time series) for that respondent, so it can only be used for imputation for nonresponse or edit-failed data. If there are missing values in the time series for that

respondent, that increases uncertainty associated with such an imputed value, and if there is a


break in the time series due to any current event, that may greatly decrease accuracy. This is not

a problem with prediction as described. It uses the current data in the regression.

So, although the prediction and forecasting described above both use regression, it is applied

very differently. Do you want to know a forecast, or are you looking at what is currently

occurring? Using the term "prediction" may be confusing for the latter, especially as it may be

confused with time series forecasting, but these are the terms with which we are 'stuck.'

Consider: "Prediction" does not mean the same thing in Brewer(1999), as in Shmueli(2010),

though from their titles, “Design-based or prediction-based inference?” and “To Explain or to

Predict?” respectively, one might expect that they did. But as will be seen below, we have to

distinguish between prediction as typically used in survey statistics for current data, and time

series forecasting. The error structures are very different.

Relevant online literature: Prediction v Forecasting –

OECD Definition, and Prediction Examples from Michigan State University and Onlinestatbook

So, “prediction” can mean “forecasting,” but in statistics, it can have another meaning, as noted

in OECD(2005): It can be the value found for y in a regression equation, based on the regressor

data x, whether or not there is a “temporal element.” (Note

https://stats.oecd.org/glossary/detail.asp?ID=3792 in the references.)

In model-based estimation for survey sampling, one finds that it is the latter definition that is

used, and we should not confuse this survey definition of “prediction” with time series

“forecasting” for two reasons: (1) they involve very different types of regression application and

theory, and (2) the estimates for y and variances involve very different, incompatible

structures/mechanisms, with errors based on different manners of variation, estimated for

different purposes. Further, a time series forecast will ignore current changes, as it does not use the current data for the variables of interest.

In an Internet search, one may also find examples showing how to interpret a prediction that is not a time series forecast. One such example (for illustration, but not showing the usual heteroscedasticity, which would generally be present in practice) is found at Stocks(1999). There you will find a nice explanation and illustration of the "regression (prediction) line." (See https://www.msu.edu/user/sw/statrev/strv203.htm.)

Among other discussions and examples available on the Internet, one that may be of interest is Onlinestatbook(2015). This resource shows an interesting reversal of the usual


heteroscedasticity, comparing GPAs between high school and college (near the end of http://onlinestatbook.com/2/regression/intro.html). One might think of this example as a kind

of forecast since you can use the high school data as regressor data and college data for y, and in

this case the regressor data occur first and might 'forecast' the college GPA, but it is definitely

not a time series forecast. Also, in other examples, both x regressor data and y sample data may

exist in the same time frame, as when electric plant capacity is used to predict for missing

generation data in a given, single time frame. But often, a previous census is used as regressor

data to be able to “predict” for missing data from a current sample, as in the electric sales data

illustrated in a scatterplot on the last page of Knaub(2013). This is prediction as found in survey

statistics, not a time series forecast. Here heteroscedasticity means an increase in the variance

of the prediction error for y, as x becomes larger. However, in the scatterplot on the last page of

Knaub(2014), the errors are small, and heteroscedasticity may be visually imperceptible, but still

mathematically present in that establishment survey example with real data. Note that stratification or other grouping of data is important, so that one model (regression) applies per group. The scatterplot shown at the end of Knaub(2013) is for one such data group. Usually

for these cases, we have one very good regressor, often the same data element in a previous,

less frequently collected census, as stated, but not always. Often, for electric generation, the

same data element/variable from a previous census is the best regressor, but for wind-powered

generation, nameplate capacity can do much better. (Sometimes, multiple regression might

help, say for fuel switching: Knaub(2003).) Often the model-based classical ratio estimator (CRE)

is very robust for these data. (See Knaub(2005), Knaub(2013), Brewer(2002), and consider page

160 in Cochran(1977).)
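As a sketch, assuming the error standard deviation is proportional to x raised to the coefficient of heteroscedasticity gamma, weighted least squares through the origin uses weights proportional to x^(-2*gamma), and gamma = 0.5 recovers the CRE slope, the ratio of totals. The numbers are hypothetical:

```python
# Ratio model fit with regression weights driven by a coefficient of
# heteroscedasticity gamma, assuming error std dev proportional to x**gamma,
# so WLS weights are w = x**(-2*gamma).  gamma = 0.5 gives the classical
# ratio estimator slope b = sum(y)/sum(x).  Hypothetical data.
import numpy as np

def ratio_slope(y, x, gamma=0.5):
    w = x ** (-2.0 * gamma)                       # weights from the variance model
    return (w * x * y).sum() / (w * x * x).sum()  # WLS through the origin

x = np.array([5.0, 10.0, 15.0])
y = np.array([10.0, 21.0, 29.0])
print(ratio_slope(y, x, gamma=0.5))   # sum(y)/sum(x) = 2.0
```

Setting gamma = 0 instead gives ordinary least squares through the origin, which weights the larger establishments more heavily than the variance structure of such data usually warrants.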

So, to review, although using a high school GPA as a predictor for college GPA may sound like forecasting something you will discover later, it is not time series forecasting. Here, and in the

survey sampling context, "prediction" is the correct term, even if you never obtain those

observations - and unless you are studying test data, you generally never will obtain observations

to compare to these “predicted” values. (However, as in design-based sampling and estimation,

testing is very important.)

Aside: The GPA example illustrates very well the idea of prediction that is not a time series

forecast. However, it also has some interesting peculiarities: Both x and y are limited in range

from 0 to 4. Generally, as x becomes larger, the standard error of the prediction error for y

becomes larger, so the coefficient of heteroscedasticity (Knaub(2011a)) is greater than 0, even

though the ratio of standard error of prediction error of y to y may generally become smaller

with larger x. Here, that coefficient of heteroscedasticity would be a negative number. (?) But

this may make sense, because the high school GPA values are from a special subpopulation of

high school students: those motivated and able to attend college, or pressured and able to attend

college. No high school GPAs from those who did not go to college can be used. (Thus one should

not expect prediction of college GPAs for those who did not attend to be very accurate! This

subgroup has no data.) The peculiar heteroscedasticity may be because those more motivated

in high school who go to college tend to be more motivated there also, and variance of the

prediction error for y is reduced at higher x, rather than the usual increase. This situation also

produces a peculiar intercept, if using linear regression with an intercept term, as done in the


online example. That intercept is very large here, and may be interpreted as saying that if one

had a GPA of zero in high school, but somehow went to college (?), her/his predicted college GPA

would be about 1.1. This is a bit nonsensical. The problem appears to be that logically, this

regression should go through the origin. It may be somewhat nonlinear, but with such large

residuals nearer to the origin that this cannot be readily shown. A ratio estimate through the

origin, with negative coefficient of heteroscedasticity (alternative ratio estimators are discussed

in Särndal, Swensson, and Wretman(1992)), may really be best here. Comparing the estimated

variances of the prediction errors between such a heteroscedastic regression model through the

origin, and the simple regression model that was used, would be interesting. Also, in the model,

as used online, it might be instructive to know how the estimated standard error of the intercept

compared to the magnitude of the estimated intercept.
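The comparison suggested above can be sketched with invented GPA-like numbers (purely hypothetical, not the data from the online example): fit the line with an intercept, inspect the intercept against its standard error, and compare with the through-origin ratio slope:

```python
# Hypothetical sketch: regress college GPA on high school GPA with an
# intercept, report the intercept and its standard error, then compare
# with a through-origin ratio fit.  Data are invented for illustration.
import numpy as np

hs  = np.array([2.0, 2.5, 3.0, 3.2, 3.5, 3.8, 4.0])   # high school GPA (x)
col = np.array([1.8, 2.6, 2.7, 3.1, 3.2, 3.6, 3.7])   # college GPA (y)

X = np.column_stack([np.ones_like(hs), hs])
beta, *_ = np.linalg.lstsq(X, col, rcond=None)
resid = col - X @ beta
s2 = resid @ resid / (len(hs) - 2)                     # residual variance
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())   # standard errors

print(beta[0], se[0])         # intercept and its standard error
print(col.sum() / hs.sum())   # slope of the ratio (through-origin) fit
```

If the estimated intercept is not large relative to its standard error, that supports fitting through the origin, as argued above.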

Example from US Energy Information Administration Weekly Petroleum Sample Surveys

Weekly sampling of petroleum information occurs at the US Energy Information Administration

(EIA) in various surveys which basically represent strata for overall categories that could have

been part of a larger survey. Single regressor data are available for the same data elements (same

variables) from previous monthly census surveys. Thus a (robust) classical ratio estimator (CRE)

is used to relate monthly census and weekly sample data. However, referring to chapter 4 of

Lohr(2010), this has been documented at the EIA in the design-based CRE format, but applied in

the model-based CRE format, and the EIA also currently fails to supply relative standard error

(RSE) estimates, as could easily be done. (See Knaub(2011b).) The EIA also uses exponential

smoothing time series forecasts to impute for nonresponse, or replace edit rejected data. Such

forecasts are often very close to the predicted values for each case, but the forecasts must of necessity (i.e., by definition) have generally inferior performance when the market changes in a way that impacts the variable of interest, say the stock level for a given petroleum product.

Forecasting is completely incapable of discerning a sudden change in the current market - it does not use

the data relevant to that - so that in such cases, though forecasts and predictions may often have

nearly identical results, and time series forecasts can sometimes even perform better, this is not

what will happen, in general, when it most matters.
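A toy simulation (all values hypothetical) illustrates the point: when the current market jumps, a ratio prediction built from current respondent data tracks the jump, while an exponential-smoothing forecast built only from each nonrespondent's flat history cannot see it:

```python
# Toy illustration: a 30% market jump in the current period.  The ratio
# prediction from current respondent data follows the jump; an exponential-
# smoothing forecast from each unit's own history does not.  Hypothetical.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(10, 100, size=20)        # prior-census values per establishment
history = np.outer(x, np.ones(6))        # flat 6-period history at the old level
current = 1.3 * x                        # current period: a 30% market jump

respondents = np.arange(15)              # 15 of 20 report this period
missing = np.arange(15, 20)              # 5 nonrespondents to impute for

# Ratio prediction from current respondent data ('prediction', not forecasting)
b = current[respondents].sum() / x[respondents].sum()
pred = b * x[missing]

# Exponential-smoothing forecast from each nonrespondent's own history
level = history[missing, 0].copy()
for t in range(1, history.shape[1]):
    level = 0.3 * history[missing, t] + 0.7 * level

print(np.abs(pred - current[missing]).max())    # ~0: prediction sees the jump
print(np.abs(level - current[missing]).max())   # large: forecast misses it
```

In quiet periods the two imputations would nearly coincide; the divergence appears exactly when detecting the change is the point of the survey.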

Put another way, it is not logical to mix regression-model/prediction-based estimation for missing

current survey data (whether missing because it was not in the sample, or because of

nonresponse), with time series forecasting imputation for nonresponse. They have different

goals. The former has sampling error variance due to missing data from the current sample. The

latter has variance due to time series trends, and at crucial times would be substantially biased.

The variances are not compatible, the time series cases assume a trend change cannot happen,

and any attempt to bootstrap an overall variance could therefore be greatly misleading.


If you have currently collected data for a current sample or partial census, you want imputed

data - whether for nonresponse or out-of-sample cases - to be compatible with the other current

data, not a forecast which makes no use of current data. To confuse these is a classic case of

"mixing apples and oranges."

Conclusions regarding difference between time series forecast and prediction

for missing data in a current survey:

A time series forecast is based on past trends for a single response data series. Autocorrelation

is an important consideration. Seasonality may be taken into account. Exponential smoothing can

be done to give more emphasis to more recent responses. The first forecast (after the end of the

response series to date) will be the best estimate, with forecasts deteriorating as you look further

into the future. The biggest problem will be break points in the coming series that are unknown

to the forecasting mechanism. A forecast is completely blind to that. Thus, for example, if a new

influence comes into play, say a new invention or a major merger within a market, a time series,

which can only be based on past history, cannot know this. Therefore, if one were to use a time

series forecast of any kind to impute for a nonresponse in a current sample survey or partial

census survey, it may often be acceptable, but not when it really matters. That is, a current

survey is used to learn what is happening ... well ... currently! If one is trying to be alert to a

current change in a market, collecting current data, one would wish to substitute predictions, for

all missing data, that are based on the current data collected as modeled by one or more sets of

regressor data. A time series forecast that looks at only the history of a nonrespondent will not

do this. Further, any variance estimate will be for forecasting, knowing that breaks in the series

are ignored, and such variance estimates are not compatible with variance estimates for the

current data. When you are collecting current data, and some are missing, you want to ‘predict’

(estimate really) for those missing data by taking advantage of the other current data that you

have, and how they relate to regressor data. If you are only interested in a forecast, you have

data for that before starting to collect the data of interest.

A further, unrelated problem with forecasting for imputation for nonresponse and edit-failed data is that these are generally the worst time series data available. Those that nonrespond or provide low-quality data are not likely to do so only once. If they nonrespond more than one cycle in a row, you probably will not even get a good forecast.

A prediction for missing data in a current survey is based on a group relationship - so good

grouping or stratification is important - between the sample data (or partial census), and one or

more regressors. This relationship, unlike a time series regression for individual respondent

history, is based instead on the current data that have been collected. (Better than PPS sampling,


you can customize size measures in estimation by question on a survey, not having to rely on one

size measure for all.)

So, a current sample survey is not compatible with a forecast. But it is compatible with prediction

as defined by model-based survey sampling inference. Further, because the model-based

classical ratio estimator (CRE) is quite robust, and good groupings/stratification can greatly

reduce any nonignorability in the predictions (i.e., we use one unique mechanism/regression model per group), one may successfully use prediction/model-based estimation for all missing

data. Use of forecasting for nonrespondents is not compatible, ignores the goal of the survey to

determine the current circumstances, and is thus a needless, counterproductive

complication. Further, Knaub(2013) can be used to find effective sample sizes by independent

group, knowing that one may need to experiment iteratively with its application to obtain a good

estimate of overall sample size needs. For establishment surveys, a stratified cutoff sample using

the CRE may often be useful. See Karmel and Jain(1987), on stratification by size, and

Knaub(2014) regarding stratification by category (such as type of oil/gas well, with depth of well

as a proxy measure for grouping, when considering production, say from shale versus non-shale

[traditional] wells).

Summary:

A time series forecast is for an individual series over time, not including current data, with

possible autocorrelation, including possibly other regressors and complexities, and a generally

more complex error structure. However, a prediction of y by one or more regressors that is not

‘temporal’ in nature is based on observations by group/strata. The error structure is generally

less complex, but is virtually never homoscedastic. (See Knaub(2007).) This form of prediction is

found in survey statistics, econometrics, and other fields when we need to estimate for missing

data from a cross sectional data set, for any case where there are missing data from among a set

of respondents or experimental subjects. Many often refer to prediction and forecasting interchangeably, but it is best to maintain a distinction, as shown in this paper. These two types

of regression are not compatible, and it is not logical to mix them.

Consider survey statistics, for example: A (time series) forecast estimates what will happen if a

pattern remains the same. But what if the point is to detect when a change has just now

occurred? Then a ‘non-temporal’ prediction is needed, such as the CRE, not a forecast, such as

exponential smoothing.


References

Brewer, K.R.W. (1999). Design-based or prediction-based inference? stratified random vs

stratified balanced sampling. Int. Statist. Rev., 67(1), 35-47.

Brewer, KRW (2002), Combined survey sampling inference: Weighing Basu's elephants, Arnold:

London and Oxford University Press.

Cochran, W.G.(1977), Sampling Techniques, 3rd ed., John Wiley & Sons.

Karmel, T.S., and Jain, M. (1987), "Comparison of Purposive and Random Sampling Schemes for

Estimating Capital Expenditure," Journal of the American Statistical Association, Vol.82, pages 52-

57.

Knaub, J.R., Jr. (1999), "Using Prediction-Oriented Software for Survey Estimation," InterStat, August 1999, http://interstat.statjournals.net/YEAR/1999/abstracts/9908001.php?Name=908001 - Short version: "Using Prediction-Oriented Software for Model-Based and Small Area Estimation," Proceedings of the Survey Research Methods Section, American Statistical Association, http://www.amstat.org/sections/srms/proceedings/papers/1999_115.pdf, https://www.researchgate.net/publication/261586154_Using_Prediction-Oriented_Software_for_Survey_Estimation

Knaub, J.R., Jr. (2003), "Applied Multiple Regression for Surveys with Regressors of Changing Relevance: Fuel Switching by Electric Power Producers," InterStat, May 2003, http://interstat.statjournals.net/YEAR/2003/abstracts/0305002.php?Name=305002

Knaub, J.R., Jr. (2005), "'Classical Ratio Estimator' (Model-Based)," InterStat, October 2005, http://interstat.statjournals.net/YEAR/2005/abstracts/0510004.php?Name=510004, https://www.researchgate.net/publication/261474011_The_Classical_Ratio_Estimator_%28Model-Based%29


Knaub, J.R., Jr. (2007), "Heteroscedasticity and Homoscedasticity," in Encyclopedia of Measurement and Statistics, Editor: Neil J. Salkind, Sage Publications, Vol. 2, pp. 431-432, https://www.researchgate.net/publication/262972023_HETEROSCEDASTICITY_AND_HOMOSCEDASTICITY

Knaub, J.R., Jr. (2011a), "Ken Brewer and the Coefficient of Heteroscedasticity as Used in Sample Survey Inference," Pakistan Journal of Statistics, Vol. 27(4), 2011, 397-406, invited article for special edition in honor of Ken Brewer's 80th birthday, http://www.pakjs.com/journals/27(4)/27(4)6.pdf, https://www.researchgate.net/publication/261596397_KEN_BREWER_AND_THE_COEFFICIENT_OF_HETEROSCEDASTICITY_AS_USED_IN_SAMPLE_SURVEY_INFERENCE

Knaub, J.R., Jr. (2011b), "Some Proposed Optional Estimators for Totals and their Relative Standard Errors for a set of Weekly Cutoff Sample Establishment Surveys," InterStat, July 2011, http://interstat.statjournals.net/YEAR/2011/abstracts/1107004.php?Name=107004, https://www.researchgate.net/publication/261474159_Some_Proposed_Optional_Estimators_for_Totals_and_their_Relative_Standard_Errors_for_a_set_of_Weekly_Quasi-Cutoff_Sample_Establishment_Surveys

Knaub, J.R., Jr. (2013), "Projected Variance for the Model-Based Classical Ratio Estimator: Estimating Sample Size Requirements," to be published in the Proceedings of the Survey Research Methods Section, American Statistical Association, https://www.amstat.org/sections/SRMS/Proceedings/y2013/Files/309176_82260.pdf, https://www.researchgate.net/publication/261947825_Projected_Variance_for_the_Model-based_Classical_Ratio_Estimator_Estimating_Sample_Size_Requirements

Knaub, J.R., Jr. (2014), "Efficacy of Quasi-Cutoff Sampling and Model-Based Estimation For Establishment Surveys - and Related Considerations," InterStat, January 2014, http://interstat.statjournals.net/YEAR/2014/abstracts/1401001.php, https://www.researchgate.net/publication/261472614_Efficacy_of_Quasi-Cutoff_Sampling_and_Model-Based_Estimation_For_Establishment_Surveys_and_Related_Considerations


Knaub, J.R., Jr. (2015), "Short Note on Various Uses of Models to Assist Probability Sampling and Estimation," unpublished note on ResearchGate, https://www.researchgate.net/publication/274704886_Short_Note_on_Various_Uses_of_Models_to_Assist_Probability_Sampling_and_Estimation

Lohr, S.L.(2010), Sampling: Design and Analysis, 2nd ed., Brooks/Cole.

Maddala, G.S. (2001), Introduction to Econometrics, 3rd ed., Wiley.

OECD(2005), Organization for Economic Cooperation and Development (OECD) Glossary of

Statistical Terms, https://stats.oecd.org/glossary/detail.asp?ID=3792, last updated August 11,

2005, citing this source: “A Dictionary of Statistical Terms, 5th edition, prepared for the

International Statistical Institute by F.H.C. Marriott. Published for the International Statistical

Institute by Longman Scientific and Technical.”

Onlinestatbook(2015), “Introduction to Linear Regression,” Online Statistics Education: An

Interactive Multimedia Course of Study, Downloaded April 24, 2015, Developed by Rice University

(Lead Developer), University of Houston Clear Lake, and Tufts University.

http://onlinestatbook.com/2/regression/intro.html,

home page: http://onlinestatbook.com/2/index.html

Särndal C.-E, Swensson B., and Wretman, J.(1992), Model Assisted Survey Sampling, Springer.

Shmueli, G.(2010), “To Explain or to Predict?” Statistical Science, Vol. 25, No. 3 (August 2010), pp.

289-310. Published by: Institute of Mathematical Statistics. Article Stable URL:

http://www.jstor.org/stable/41058949

Stocks, J.T.(1999), “Correlation and Regression: The Regression (Prediction) Line,” Basic Stats

Review, Michigan State University, https://www.msu.edu/user/sw/statrev/strv203.htm,

Home page: https://www.msu.edu/user/sw/statrev/strev.htm

Please note that there have been question and answer threads on ResearchGate regarding the

definitions of "forecasting," and "prediction" that may be of interest to the reader.