
Principles of Forecasting: A Handbook for Researchers and Practitioners,

J. Scott Armstrong (ed.): Norwell, MA: Kluwer Academic Publishers, 2001.

The Forecasting Dictionary

Updated: October 23, 2000

J. Scott Armstrong

The Wharton School, University of Pennsylvania, Philadelphia PA 19104

"But ‘glory’ doesn't mean ‘a nice knock-down argument,’" Alice objected.

"When I use a word," Humpty Dumpty said, in a rather scornful tone, "it

means just what I choose it to mean—neither more nor less."

"The question is," said Alice, "whether you can make words mean so many

different things."

"The question is," said Humpty Dumpty, "which is to be master—that's all."

Through the Looking Glass

Lewis Carroll

This dictionary defines terms as they are commonly used in forecasting. The aims, not always met, are to:

- provide an accurate and understandable definition of each term,

- describe the history of the term,

- demonstrate how the term is used in forecasting,

- point out how the term is sometimes misused, and

- provide research findings on the value of the term in forecasting.

Acknowledgments

Geoff Allen and Robert Fildes inspired me to develop a comprehensive forecasting dictionary, and they provided

much advice along the way. The Oxford English Dictionary was helpful in developing this dictionary, partly for

definitions but, more important, for ideas as to what a dictionary can be. Most of the authors of Principles of

Forecasting provided definitions. Definitions were also borrowed from the glossaries in Armstrong (1985) and

Makridakis, Wheelwright and Hyndman (1998). Stephen A. DeLurgio added terms and revised definitions

throughout. Eric Bradlow reviewed all statistical terms, and Fred Collopy reviewed the complete dictionary. Sandy

D. Balkin, Robert G. Brown, Christopher Chatfield, Philip A. Klein, and Anne B. Koehler also made good

suggestions, and many others provided excellent help. The Forecasting Dictionary was posted in full text on the

Forecasting Principles website in October 1999 and e-mail lists were notified in an effort to obtain further peer

review; many suggestions were received as a result. Mary Haight, Ling Qiu, and Mariam Rafi provided editorial

assistance.


Abbreviations and Acronyms

Following are commonly used symbols. I give preference to Latin letters rather than Greek.

Symbol: Description

A: Actual value of a forecasted event

α, β, γ: alpha, beta, and gamma, the smoothing factors in exponential smoothing for average, trend, and seasonality, respectively; they represent the weights placed on the latest value

APE: Absolute Percentage Error

Adjusted MAPE: Adjusted Mean Absolute Percentage Error, in which the denominator is the average of the forecasted and actual values; also called the Symmetric MAPE

ARMA: AutoRegressive Moving Average

ARIMA: AutoRegressive Integrated Moving Average

b: measure of the impact of variable x on the dependent variable Y in regression analysis

e: error

F: Forecast value

G: Growth or trend (it can be negative)

GMRAE: Geometric Mean of the Relative Absolute Error

h: forecast horizon

j: period of the year

MAD: Mean Absolute Deviation

MAE: Mean Absolute Error

MAPE: Mean Absolute Percentage Error

MdRAE: Median Relative Absolute Error

MSE: Mean Square Error

n: sample size (the number of observations, that is, the number of decision units or the number of years in a time series)

OLS: Ordinary Least Squares

PI: Prediction Interval

p: probability

r: correlation coefficient

R²: coefficient of determination

RAE: Relative Absolute Error

RMSE: Root Mean Square Error

S: Seasonal factor

t: time; also a measure of statistical significance

v: number of variables

w: weighting factor

X: explanatory or causal variable

Y: dependent variable (the variable to be forecasted)


Terms

Underlined terms are defined elsewhere in the dictionary.

Terms are linked to relevant pages in Principles of Forecasting using PoF xxx.

Acceleration. A change in the trend, including a negative change (deceleration). Although there have been

attempts to develop quantitative models of acceleration for forecasting in the social and management sciences, these

have not been successful. Of course, if one has good knowledge about its cause and its timing, acceleration can be a

critical part of a forecast. Consider this when skydiving, when you must predict the right moment to open the parachute. PoF xxx

Accuracy. See forecast accuracy.

ACF. See autocorrelation function.

Actuarial prediction. A prediction based on empirical relationships among variables. See econometric model.

Adaptive Conjoint Analysis (ACA). A method conceived by Rich Johnson (of Sawtooth Software, Inc.) in which

self-explicated data are combined with paired-comparison preferences to estimate respondents’ utility functions.

ACA is a computer-interactive method in which the self-explicated data collected from a respondent influence the

characteristics of the paired objects shown to the respondent. PoFxxx

Adaptive parameters. A procedure that reestimates the parameters of a model when new observations become

available.

Adaptive response rate. A rule that instructs the forecasting model (such as exponential smoothing) to adapt more

quickly when it senses that a change in pattern has occurred. In many time-series forecasting methods, a trade-off

can be made between smoothing randomness and reacting quickly to changes in the pattern. Judging from 12

empirical studies (Armstrong 1985, p. 171), this strategy has not been shown to contribute much to accuracy,

perhaps because it does not use domain knowledge. PoFxxx

Adaptive smoothing. A form of exponential smoothing in which the smoothing constants are automatically

adjusted as a function of forecast errors. (See adaptive response rate.) PoFxxx
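One way to implement this idea is a tracking-signal scheme in the spirit of Trigg and Leach's adaptive response rate. The sketch below is illustrative only; the function name, the initialization, and the particular tracking-signal form are assumptions, not the text's prescription:

```python
def adaptive_smoothing(series, beta=0.2):
    """Exponential smoothing in which the smoothing constant alpha is
    recomputed each period from a tracking signal (smoothed error over
    smoothed absolute error), in the spirit of Trigg and Leach's
    adaptive response rate. A sketch, not a production implementation."""
    forecast = series[0]              # initialize the level at the first value
    smoothed_err, smoothed_abs = 0.0, 1e-9
    forecasts = [forecast]
    for actual in series[1:]:
        err = actual - forecast
        smoothed_err = beta * err + (1 - beta) * smoothed_err
        smoothed_abs = beta * abs(err) + (1 - beta) * smoothed_abs
        alpha = abs(smoothed_err) / smoothed_abs   # adapts within [0, 1]
        forecast = forecast + alpha * err
        forecasts.append(forecast)
    return forecasts
```

Because alpha stays in [0, 1], each new forecast lies between the previous forecast and the latest actual, so the scheme reacts quickly after a run of same-signed errors and damps down when errors alternate.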

Additive model. A model in which terms are added. See also multiplicative model.

Adjusted Mean Absolute Percentage Error (Adjusted MAPE). The absolute error is divided by the average of the forecast

and actual values. This has also been referred to as the Unbiased Absolute Percentage Error (UAPE) and as the

symmetric MAPE (sMAPE).
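The definition translates directly into code; a minimal sketch (the function name is assumed):

```python
def adjusted_mape(actuals, forecasts):
    """Adjusted (symmetric) MAPE: each absolute error is divided by the
    average of the forecast and actual values, then the results are
    averaged and expressed as a percentage."""
    terms = [abs(a - f) / ((a + f) / 2) for a, f in zip(actuals, forecasts)]
    return 100 * sum(terms) / len(terms)
```

Unlike the ordinary MAPE, the measure is unchanged if a forecast and actual are swapped, which is the sense in which it is symmetric.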

Adjusted R². (See also R².) R² adjusted for the loss in degrees of freedom: R² is penalized by adjusting for the

number of parameters in the model compared to the number of observations. At least three methods have been

proposed for calculating adjusted R²: Wherry's formula [1 - (1 - R²)·(n - 1)/(n - v)], McNemar's formula

[1 - (1 - R²)·(n - 1)/(n - v - 1)], and Lord's formula [1 - (1 - R²)·(n + v - 1)/(n - v - 1)]. Uhl and Eisenberg (1970)

concluded that Lord's formula is the most effective of these for estimating shrinkage. The adjusted R² is always

preferred to R² when calibration data are being examined because of the need to protect against spurious relationships.

According to Uhl and Eisenberg, some analysts recommend that the adjustment include all variables considered in the

analysis. Thus, if an analyst used ten explanatory variables but kept only three, R² should be adjusted for ten variables.

This might encourage analysts to do a priori analysis. PoFxxx
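The three formulas can be compared directly; a sketch (the function name and interface are assumed):

```python
def adjusted_r2(r2, n, v, formula="lord"):
    """The three shrinkage adjustments named above, where n is the number
    of observations and v the number of explanatory variables. Whether v
    should count only the variables kept or all those considered is the
    judgment call discussed in the entry."""
    if formula == "wherry":
        return 1 - (1 - r2) * (n - 1) / (n - v)
    if formula == "mcnemar":
        return 1 - (1 - r2) * (n - 1) / (n - v - 1)
    if formula == "lord":
        return 1 - (1 - r2) * (n + v - 1) / (n - v - 1)
    raise ValueError(formula)
```

For any given R², n, and v, Lord's formula gives the smallest (most conservative) adjusted value and Wherry's the largest.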

Adjustment. A change made to a forecast after it has been produced. Adjustments are usually based on judgment,

but they can also be mechanical revisions (such as to adjust the level at the origin by half of the most recent forecast

error).


AIC (Akaike Information Criterion). A goodness of fit measure that penalizes model complexity (based on the

number of parameters). The method with the lowest AIC is thought to represent the best balance of accuracy and

complexity. Also see BIC, the Bayesian Information Criterion, which imposes a stronger penalty for complexity.
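For least-squares models, both criteria can be computed from the sum of squared errors. The sketch below uses one common variant of the formulas (the exact form and the function name are assumptions; only differences between candidate models matter when comparing):

```python
import math

def aic_bic(sse, n, k):
    """AIC and BIC in a common least-squares form, where sse is the
    model's sum of squared errors, n the number of observations, and k
    the number of fitted parameters. BIC's ln(n) multiplier exceeds
    AIC's 2 once n > 7, so it penalizes complexity more heavily."""
    aic = n * math.log(sse / n) + 2 * k
    bic = n * math.log(sse / n) + k * math.log(n)
    return aic, bic
```

The model with the lowest value of the chosen criterion is preferred; adding a parameter must reduce the error term by enough to offset the complexity penalty.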

AID (Automatic Interaction Detector). A procedure that makes successive two-way splits in the data to find

homogeneous segments that differ from one another. Also called tree analysis. Predictions can be made by

forecasting the size and typical behavior for each segment. As its name implies, this procedure is useful for

analyzing situations in which interactions are important. On the negative side, it requires much data so that each

segment (cell size) is large enough (certainly greater than ten, judging from Einhorn’s [1972] results). The evidence

for its utility in forecasting is favorable but limited. Armstrong and Andress (1970) analyzed data from 2,717 gas

stations using AID and regression. To keep knowledge constant, exploratory procedures (e.g., stepwise regression)

were used. Predictions were then made for 3,000 stations in a holdout sample. The MAPE was much lower for AID

than for regression (41% vs. 58%). Also, Stuckert (1958) found trees to be more accurate than regression in

forecasting the academic success of about one thousand entering college freshmen. See also segmentation. PoFxxx

Akaike Information Criterion. See AIC.

Algorithm. A systematic set of rules for solving a particular problem. A program, function, or formula for analyzing

data. Algorithms are often used when applying quantitative forecasting methods.

Amalgamated forecast. A seldom-used term that means combined forecast. See combining forecasts.

Analogous time series. Time-series data that are expected to be related and are conceptually similar. Such series

are expected to be affected by similar factors. For example, an analyst could group series with similar causal forces.

Although such series are typically correlated, correlation is not sufficient for series to be analogous. Statistical

procedures (such as factor analysis) for grouping analogous series have not led to gains in forecast accuracy. See

Duncan, Gorr and Szczyula (2001). PoFxxx

Analogy. A resemblance between situations as assessed by domain experts. A forecaster can think of how similar

situations turned out when making a forecast for a given situation (see also analogous time series). PoFxxx

Analytic process. A series of steps for processing information according to rules. An analytic process is explicit,

sequential, and replicable.

Anchoring. The tendency of judges’ estimates (or forecasts) to be influenced when they start with a “convenient”

estimate in making their forecasts. This initial estimate (or anchor) can be based on tradition, previous history, or

available data. In one study that demonstrates anchoring, Tversky and Kahneman (1974) asked subjects to predict

the percentage of nations in the United Nations that were African. They selected an initial value by spinning a wheel

of fortune in the subject’s presence. The subject was asked to revise this number upward or downward to obtain an

answer. The information-free initial value had a strong influence on the estimate. Those starting with 10% made

predictions averaging 25%. In contrast, those starting with 65% made predictions averaging 45%. PoFxxx

Anticipations. See expectations.

A posteriori analysis. Analysis of the performance of a model that uses actual data from the forecast horizon. Such

an analysis can help to determine sources of forecast errors and to assess whether the effects of explanatory

variables were correctly forecasted. PoFxxx

A priori analysis. A researcher's analysis of a situation before receiving any data from the forecast horizon. A priori

analysis might rely on domain knowledge for a specific situation obtained by interviewing experts or information

from previously published studies. In marketing, for example, analysts can use meta-analyses to find estimates of

price elasticity (for example, see Tellis 1988) or advertising elasticity (Sethuraman and Tellis 1991). To obtain

information about prior research, one can search the Social Science Citation Index (SSCI) or A Bibliography of

Business and Economic Forecasting (Fildes, Dews and Howell 1981). The latter contains references to more than

4,000 studies taken from 40 journals published from 1971 to 1978. A revised edition was published in 1984 by the


Manchester Business School, Manchester, England. It can guide you to older sources that are difficult to locate

using electronic searches. Armstrong (1985) describes the use of a priori analysis for econometric models. PoFxxx

AR model. See AutoRegressive model.

ARCH model. (Autoregressive conditionally heteroscedastic model.) A model that relates the current error variance

to previous values of the variable of interest through an autoregressive relationship. ARCH is a time-series model in

which the variance of the error term may change. Various formulations exist, of which the most popular is GARCH.

ARIMA. (AutoRegressive Integrated Moving Average model.) A broad class of time-series models that, when

stationarity has been achieved by differencing, follows an ARMA model. See stationary series.

ARMA model. (AutoRegressive Moving Average.) A type of time-series forecasting model that can be

autoregressive (AR), moving average (MA), or a combination of the two (ARMA). In an ARMA model, the series

to be forecast is expressed as a function of previous values of the series (autoregressive terms) and previous error

terms (the moving average terms). PoFxxx

Assessment center tests. A battery of tests to predict how well an individual will perform in an organization. Such

tests are useful when one lacks evidence on how a candidate has performed on similar tasks. The procedure is

analogous to combining forecasts. Hinrichs (1978) conducted a long-term follow-up of the predictive validity of

assessment centers. PoFxxx

Asymmetric errors. Errors that are not distributed symmetrically about the mean. This is common when trends are

expressed in units (not percentages) and when there are large changes in the variable of interest. The forecaster

might formulate the model with original data for a variety of reasons such as the presence of large measurement

errors. As a result, forecast errors would tend to be skewed, such that they would be larger for cases when the actual

(for the dependent variable) exceeded the forecasts. To deal with this, transform the forecasted and actual values to

logs and use the resulting errors to construct prediction intervals (which are more likely to be symmetric), and then

report the prediction intervals in original units (which will be asymmetric). However, this will not solve the

asymmetry problem for contrary series. For details, see Armstrong and Collopy (2000). PoFxxx
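A sketch of the log-based procedure described above, assuming past errors are already measured in logs (the function name and the 95% z-value are illustrative assumptions):

```python
import math

def log_prediction_interval(forecast, log_errors, z=1.96):
    """Build a symmetric interval around the log of the forecast using
    the standard deviation of past errors measured in logs, then report
    the bounds in original units, where they become asymmetric."""
    mean = sum(log_errors) / len(log_errors)
    sd = (sum((e - mean) ** 2 for e in log_errors)
          / (len(log_errors) - 1)) ** 0.5
    log_f = math.log(forecast)
    return math.exp(log_f - z * sd), math.exp(log_f + z * sd)
```

The upper bound sits farther from the forecast than the lower bound, which matches the skew expected when errors grow with the level of the series.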

Asymptotically unbiased estimator. An estimator whose bias approaches zero as the sample size increases. See

biased estimator.

Attraction market-share model. A model that determines market share for a brand by dividing a measure of the

focal brand’s marketing attractiveness by the sum of the attractiveness scores for all brands assumed to be in the

competitive set. It is sometimes referred to as the US/(US + THEM) formulation. PoFxxx
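A sketch of the US/(US + THEM) formulation (the function name is assumed):

```python
def attraction_shares(attractiveness):
    """Each brand's market share is its attractiveness score divided by
    the sum of the scores over the assumed competitive set."""
    total = sum(attractiveness.values())
    return {brand: a / total for brand, a in attractiveness.items()}
```

For example, attraction_shares({"A": 2, "B": 1, "C": 1}) gives brand A a 0.5 share, and the shares always sum to one by construction.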

Attributional bias. A bias that arises when making predictions about the behavior of a person (or organization)

based upon the person’s (or organization’s) traits, even when the situation is the primary cause of behavior. (See

Plous, 1993, Chapter 16.)

Autocorrelation. The correlation between values in a time series at time t and time t-k for a fixed lag k. Frequently,

autocorrelation refers to correlations among adjacent time periods (lag 1 autocorrelation). There may be an

autocorrelation for a time lag of one period, another autocorrelation for a time lag of two, and so on. The residuals

serve as surrogate values for the error terms. There are several tests for autocorrelated errors. The Box-Pierce test

and the Ljung-Box test check whether a sequence of autocorrelations is significantly different from a sequence of

zeros; the Durbin-Watson statistic checks for first-order autocorrelations. PoFxxx

Autocorrelation function (ACF). The series of autocorrelations for a time series at lags 1, 2, ... . A plot of the ACF

against the lag is known as the correlogram. The ACF can be used for several purposes, such as to identify the presence

and length of seasonality in a given time series, to identify time-series models for specific situations, and to

determine whether the data are stationary. See stationary series.
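The sample autocorrelation function can be computed directly from its definition; a sketch (the function name is assumed):

```python
def acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag, each computed about
    the overall mean of the series (the usual sample definition)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    out = []
    for k in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean)
                  for t in range(k, n))
        out.append(num / denom)
    return out
```

A strongly trending series shows autocorrelations near one that decay slowly with the lag, which is one of the signs of nonstationarity mentioned above.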


Automatic forecasting program. A program that, without user instructions, selects a forecasting method for each

time series under study. Also see batch forecasting. The method-selection rules differ across programs but are

frequently based on comparisons of the fitting or forecasting accuracy of a number of specified methods. Tashman

and Leach (1991) evaluate these procedures. PoFxxx

Automatic Interaction Detector. See AID. PoFxxx

AutoRegressive (AR) model. A form of regression analysis in which the dependent variable is related to past

values of itself at varying time lags. An autoregressive model would express the forecast as a function of previous

values of that time series (e.g., Yt = a + bYt-1 + et, where a and b are parameters and et is an error term). PoFxxx
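The parameters of such an AR(1) model can be estimated by ordinary least squares, regressing each value on its predecessor; a sketch (the function name is assumed):

```python
def fit_ar1(series):
    """Least-squares estimates of a and b in Y_t = a + b*Y_{t-1} + e_t,
    obtained by regressing each observation on the one before it."""
    x = series[:-1]                      # lagged values Y_{t-1}
    y = series[1:]                       # current values Y_t
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b
```

On a noise-free series generated with known a and b, the estimates recover those values exactly.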

AutoRegressive Conditionally Heteroscedastic model. See ARCH.

Availability heuristic. A rule of thumb whereby people assess the probability of an event by the ease with which

they can bring occurrences to mind. For example, which is more likely – to be killed by a falling airplane part or by

a shark? Shark attacks receive more publicity, so most people think they are more likely. In fact, the chance of

getting killed by falling airplane parts is 30 times higher. Plous (1993, Chapter 11) discusses the availability

heuristic. This heuristic can produce poor judgmental forecasts. It can be useful, however, in developing plausible

scenarios. PoFxxx

Backcasting. Predicting what occurred in a time period prior to the period used in the analysis. Sometimes called

postdiction, that is, predicting backward in time. It can be used to test predictive validity. Also, backcasting can be

used to establish starting values for extrapolation by applying the forecasting method to the series starting from the

latest period of the calibration data and going to the beginning of these data. See Armstrong (2001d) and PoFxxx

Backward shift operator. A notation aid where the letter B denotes a backward shift of one period. Thus, B

operating on Xt (noted as BXt) yields, by definition, Xt-1. Similarly BB or B2 is the same as shifting back by two

periods. A first difference (Xt-Xt-1) for a time series can be denoted (1 – B) Xt. A second-order difference is denoted

(1 – B)2 Xt . See differencing.
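Differencing can be sketched as repeated application of (1 - B) to the series (the function name is assumed):

```python
def difference(series, order=1):
    """Apply (1 - B) to a series `order` times: each pass replaces the
    series by X_t - X_{t-1}, shortening it by one observation."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series
```

For example, difference([1, 4, 9, 16]) gives [3, 5, 7], and the second difference of the same series is constant at 2, which is why second differencing removes a quadratic trend.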

Bafflegab. Professional jargon that confuses more than it clarifies. Writing that sounds impressive while saying

nothing. The term bafflegab was coined in 1952 by Milton A. Smith, assistant general counsel for the American

Chamber of Commerce. He won a prize for the word and its definition: “multiloquence characterized by a

consummate interfusion of circumlocution and other familiar manifestations of abstruse expatiation commonly

utilized for promulgations implementing procrustean determinations by governmental bodies.” Consultants and

academics also use bafflegab. Armstrong (1980a) showed that academics regard journals that are difficult to read as

more prestigious than those that are easy to read. The paper also provided evidence that academics rated authors as

more competent when their papers were rewritten to make them harder to understand. Researchers in forecasting are

not immune to this affliction. PoFxxx

Base period. See calibration data.

Base rate. The typical or average behavior for a population. For example, to predict the expected box-office

revenues for a movie, use those for a typical movie. PoFxxx

Basic research. Research for which the researcher has no idea of its potential use and is not motivated by any

specific application. This is sometimes called pure research. One assumption is that eventually someone will find

out how to use the research. Another assumption is that if enough researchers do enough research, eventually

someone will discover something that is useful. PoFxxx

Basic trend. The long-term change in a time series. The basic trend can be measured by a regression analysis

against time. Also called secular trend. PoFxxx

Batch forecasting. Forecasting in which a prespecified set of instructions is used in forecasting individual time

series that are part of a larger group of time series. The forecasting method may be predesignated by the user or may


rely on automatic forecasting. If the group has a hierarchical structure (see product hierarchy), the batch-processing

program may allow reconciliation of item and group-level forecasts. For details and relevant software programs, see

Tashman and Hoover (2001). PoFxxx

Bayesian analysis. A procedure whereby new information is used to update previous information. PoFxxx

Bayesian Information Criterion. See BIC.

Bayesian methods. A recursive estimation procedure based on Bayes' theorem that revises the parameters of a

model as new data become available.

Bayesian pooling. A method that improves estimation efficiency or speed of adapting time-varying parameters

models by using data from analogous time series. PoFxxx

Bayesian Vector AutoRegressive (BVAR) model. A multivariate model whose parameters are based on

observations over time and a cross-section of observational units that uses a set of lagged variables and Bayesian

methods.

Benchmark forecasts. Forecasts used as a basis for comparison. Benchmarks are most useful if based on the

specific situation, such as forecasts produced by the current method. For general purposes, Mentzer and Cox (1984)

examined forecasts errors for various levels in the product hierarchy and for different horizons as shown here:

Typical Errors For Sales Forecasts (Entries are MAPEs)

                               Forecast Horizon
Level            Under 3 Months   3 Months to 2 Years   Over 2 Years
Industry                8                 11                 15
Corporate               7                 11                 18
Product group          10                 15                 20
Product line           11                 16                 20
Product                16                 21                 26

Source: Mentzer and Cox's (1984) survey results from 160 corporations are crude because most firms do

not keep systematic records. Further, the study was ambiguous in its definitions of the time

interval. "Under 3 months" probably refers to "monthly" in most cases, but the length of time is

not apparent for "Over 2 years."

BFE (Bold Freehand Extrapolation). The process of extending an historical time series by judgment. See

judgmental extrapolation.

Bias. A systematic error; that is, deviations from the true value that tend to be in one direction. Bias can occur in any

type of forecasting method, but it is especially common in judgmental forecasting. Researchers have identified many

biases in judgmental forecasting. Bias is sometimes a major source of error. For example, Tull (1967) and Tyebjee

(1987) reported a strong optimistic bias for new product forecasting. Some procedures have been found to reduce

biases (Fischhoff and MacGregor 1982). Perhaps the most important way to control for biases is to use structured

judgment.

Biased estimator. An estimate in which the statistic differs from the population parameter. See asymptotically

unbiased estimator.

BIC (Bayesian Information Criterion). Also called the Schwarz criterion. Like the AIC, the BIC is a criterion

used to select the order of time-series models. Proposed by Schwarz (1978), it sometimes leads to less complex

models than the AIC. Several studies have found the BIC to be a better model selection criterion than the AIC.

BJ methods. See Box-Jenkins methods.


Bold Freehand Extrapolation. See BFE.

Bootstrapping. In forecasting, bootstrapping typically refers to judgmental bootstrapping. Bootstrapping is also a

term used by statisticians to describe estimation methods that reuse a sample of data. It calls for taking random

samples from the data with replacement, such that the resampled data have similar properties to the original sample.

Applying these ideas to time-series data is difficult because of the natural ordering of the data. Statistical

bootstrapping methods are computationally intensive and are used when theoretical results are not available. To

date, statistical bootstrapping has been of little use to forecasters, although it might help in assessing prediction

intervals for cross-sectional data. PoFxxx

Bottom-up. A procedure whereby the lowest-level disaggregate forecasts in a hierarchy are added to produce a

higher-level forecast of the aggregate. (See also segmentation.) PoFxxx

Bounded values. Values that are limited. For example, many series can include only non-negative values. Some

have lower and upper limits. (Percentages are limited between zero and one hundred.) When the values are bounded

between zero and one, consider using a transformation such as the logit. If a transformation is not used, ensure that

the forecasts do not go beyond the limits. PoFxxx
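A sketch of the logit transformation mentioned above (the function names are assumed):

```python
import math

def logit(p):
    """Map a proportion in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Map a real-valued forecast back into (0, 1). The result cannot
    escape the bounds, which is the point of the transformation."""
    return 1 / (1 + math.exp(-x))
```

Forecasting is done on the unbounded logit scale, and the forecasts are then mapped back, so no forecast can fall outside the limits.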

Box-Jenkins (BJ) methods. The application of autoregressive-integrated-moving average (ARIMA) models to

time-series forecasting problems. Originally developed in the 1930s, the approach was not widely known until Box

and Jenkins (1970) published a detailed description. It is the most widely cited method in extrapolation, and it has

been used by many firms. Mentzer (1995) found that analysts in 38% of the 205 firms surveyed were familiar with

BJ, it was used in about one-quarter of these firms, and about 44% of those familiar with it were satisfied. This

satisfaction level can be compared with 72% satisfaction with exponential smoothing in the same survey. Contrary

to early expectations, empirical studies have shown that it has not improved forecast accuracy of extrapolation

methods. PoFxxx

Box-Pierce test. A test for autocorrelated errors. The Box-Pierce Q statistic is computed as the weighted sum of

squares of a sequence of autocorrelations. If the errors of the model are white noise, then the Box-Pierce statistic is

distributed approximately as a chi-square distribution with h – v degrees of freedom, where h is the number of lags

used in the statistic and v is the number of fitted parameters other than a constant term. It is sometimes known as a

portmanteau test. Another portmanteau test is the Ljung-Box test, which is a version of the Box-Pierce test.
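The Q statistic can be computed directly from the residual autocorrelations; a sketch (the function name is assumed):

```python
def box_pierce_q(residuals, h):
    """Box-Pierce portmanteau statistic: n times the sum of the squared
    residual autocorrelations at lags 1..h, to be compared against a
    chi-square with h - v degrees of freedom as described above. (The
    Ljung-Box version replaces the factor n with n(n + 2)/(n - k)
    inside the sum.)"""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                  for t in range(k, n)) / denom
        q += r_k ** 2
    return n * q
```

White-noise residuals give a small Q, while strongly autocorrelated residuals (for example, a perfectly alternating series) give a Q close to n.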

Brainstorming. A structured procedure for helping a group to generate ideas. The basic rules are to suspend

evaluation and to keep the session short (say ten minutes). To use brainstorming effectively, one should first gain the

group’s agreement to use brainstorming. Then, select a facilitator who

- encourages quantity of ideas,

- encourages wild or potentially unpopular ideas,

- reminds the group not to evaluate (either favorably or unfavorably),

- does not introduce his or her own ideas, and

- records all ideas.

When people follow the above procedures carefully, brainstorming greatly increases the number of creative ideas

they suggest in comparison with traditional group meetings. This is because it removes some (but not all) of the

negative effects of the group process. Brainwriting (individual idea generation) is even more effective than

brainstorming, assuming that people will work by themselves. One way to do this is to call a meeting and then

allocate, say, ten minutes for brainwriting. Brainwriting is particularly effective because everyone can generate

ideas (i.e., no facilitator is needed). The sources of the ideas are not identified. Brainstorming or brainwriting can be

used with econometric models to create a list of explanatory variables and to find alternative ways of measuring

variables. It can also be used to create a list of possible decisions or outcomes that might occur in the future, which

could be useful for role-playing and expert opinions. Brainstorming is often confused with “talking a lot,” which is

one of the deplorable traits of unstructured or leaderless group meetings.

Brier score. A measure of the accuracy of a set of probability assessments. Proposed by Brier (1950), it is the

average squared deviation between predicted probabilities for a set of events and their outcomes, so a lower score represents


higher accuracy. In practice, the Brier score is often calculated according to Murphy’s (1972) partition into three

additive components. Murphy’s partition is applied to a set of probability assessments for independent-event

forecasts when a single probability is assigned to each event:

B = c(1 - c) + (1/N) Σt nt (pt - ct)² - (1/N) Σt nt (ct - c)²,  with the sums running over the T probability categories,

where c is the overall proportion correct, ct is the proportion correct in category t, pt is the probability assessed for

category t, nt is the number of assessments in category t, and N is the total number of assessments. The first term

reflects the base rate of the phenomenon for which probabilities are assessed (e.g., overall proportion of correct

forecasts), the second is a measure of the calibration of the probability assessments, and the third is a measure of the

resolution. Lichtenstein, Fischhoff and Phillips (1982) provide a more complete discussion of the Brier score for the

evaluation of probability assessments.
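A sketch of Murphy's partition as described above, with each assessed probability category supplying its count and proportion correct (the function name and the input layout are assumptions):

```python
def brier_murphy(categories):
    """Murphy's three-part partition of the Brier score for
    independent-event forecasts grouped by assessed probability.
    `categories` maps each assessed probability p_t to a pair
    (n_t, c_t): the number of assessments in that category and the
    proportion correct within it. Returns the score and its base-rate,
    calibration, and resolution components."""
    N = sum(n for n, _ in categories.values())
    c = sum(n * ct for n, ct in categories.values()) / N  # overall proportion
    base_rate = c * (1 - c)
    calibration = sum(n * (p - ct) ** 2
                      for p, (n, ct) in categories.items()) / N
    resolution = sum(n * (ct - c) ** 2
                     for n, ct in categories.values()) / N
    return base_rate + calibration - resolution, base_rate, calibration, resolution
```

A perfectly calibrated assessor has a zero calibration term, and a lower overall score is better.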

Brunswick lens model. (See lens model.)

Business cycle. Periods of economic expansion followed by periods of economic contraction. Economic cycles tend

to vary in length and magnitude and are thought of as a separate component of the basic pattern contained in a time

series. Despite their popularity, the use of business cycles has not been shown to lead to more accurate forecasting.

PoFxxx

BVAR model. See Bayesian Vector AutoRegression model.

Calibrate. (1) To estimate relationships (and constant terms) for use in a forecasting model. (See also fit.) Some

software programs erroneously use the term forecast to mean calibrate. (2) To assess the extent to which estimated

probabilities agree with actual probabilities. In that case, calibration curves plot the predicted probability on the x-

axis and the actual probability on the y-axis. A probability assessor is perfectly calibrated when the events or

forecasts assigned a probability of X occur X percent of the time for all categories of probabilities assessed.

Calibration data. The data used in developing a forecasting model. (See also fit.) PoFxxx

Canonical correlations. A regression model that uses more than one dependent variable and more than one

explanatory variable. The canonical weights provide an index for the dependent variables but without a theory.

Despite a number of attempts, it seems to have no value for forecasting (e.g., Fralicx and Raju, 1982, tried but

failed).

Case-based reasoning. Reasoning based on memories of past experiences. Making inferences about new situations

by looking at what happened in similar cases in the past. (See analogy.)

Causal chain. A sequence of linked effects; for example, A causes B which then causes C. The potential for error

grows at each stage, thus reducing predictive ability. However, causal chains lead judgmental forecasters to think the

outcomes are more likely because each step seems plausible. Causal chains are useful in developing scenarios that

seem plausible. PoFxxx

Causal force. The net directional effect domain experts expect for a time series over the forecast horizon.

Armstrong and Collopy (1993) classified them as growth, decay, opposing, regressing, or supporting forces. The

typical assumption behind extrapolation is supporting, but such series are rare. Armstrong, Adya and Collopy (2001)

discuss evidence related to the use of causal forces. PoFxxx

Causal model. A model in which the variable of interest (the dependent variable) is related to various explanatory

variables (or causal variables) based on a specified theory.

PRINCIPLES OF FORECASTING

10

Causal relationship. A relationship whereby one variable, X, produces a change in another variable, Y, when

changes in X are either necessary or sufficient to bring about a change in Y, and when the change in X occurs before

the change in Y. Einhorn and Hogarth (1982) discuss causal thinking in forecasting.

Causal variable. A variable, X, that produces changes in another variable, Y, when changes in X affect the

probability of Y occurring, and a theory offers an explanation for why this relationship might hold.

Census Program X-12. A computer program developed by the U.S. Bureau of the Census. (See X-12 ARIMA

decomposition.) The program is available at no charge; details can be found at hops.wharton.upenn.edu/forecast.

Census II. A refinement of the classical method that decomposes time series into seasonal, trend, cycle, and random

components that can be analyzed separately. The Census II method, X-11 decomposition, has been superseded by the

X-12-ARIMA decomposition method. The programs contain excellent procedures for seasonal adjustments of

historical data. However, the developers did not seem to be concerned about how these factors should be used in

forecasting.

Central limit theorem. The sampling distribution of the mean of n independent sample values will approach the

normal distribution as the sample size increases regardless of the shape of the population distribution. This applies

when the sample size is large enough for the situation; a sample of 30 is often suggested as adequate in typical cases.
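A small simulation illustrates the theorem; the exponential population, the sample size of 30, and the trial count are our choices for illustration, not part of the entry:

```python
import random
import statistics

# Illustrative simulation: sample means from a strongly skewed population
# still cluster near the population mean, roughly normally, for n = 30.
random.seed(42)

def sample_means(draw, n, trials=2000):
    """Draw `trials` independent samples of size n; return their means."""
    return [statistics.fmean(draw() for _ in range(n)) for _ in range(trials)]

means_n30 = sample_means(lambda: random.expovariate(1.0), n=30)

# The means center on the population mean (1.0), with spread near
# sigma / sqrt(n) = 1 / sqrt(30), roughly 0.18, despite the skewed population.
print(round(statistics.fmean(means_n30), 2), round(statistics.stdev(means_n30), 2))
```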

Chow test. A test that evaluates whether a subsample of data, excluded from the model when it was estimated, can

be regarded as indistinguishable from the data used for estimation. That is, it measures whether two samples of data

are drawn from the same population. If so, the coefficient estimates in each sample are considered to be identical.

For details, see an econometric textbook. An alternative viewpoint, which some favor, would be to use a priori

analysis to decide whether to combine estimates from different sets of data.

Classical decomposition method. A division of a time series into seasonal, trend, and error components. These

components can then be analyzed individually. See also Census II. PoFxxx

Classification method. (See segmentation.)

Clinical judgment. (See expert opinions.)

Coefficient. An estimate of a relationship in an econometric model.

Coefficient of determination. See R2.

Coefficient of inequality. See Theil’s U.

Coefficient of variation. The standard deviation divided by the mean. It is a measure of relative variation and is

sometimes used to make comparisons across variables expressed in different units. It is useful in the analysis of

relationships in econometric or judgmental bootstrapping models. Without variation in the data, one may falsely

conclude that a variable in a regression analysis is unimportant for forecasting. Check the coefficients of variation to

see whether the dependent and explanatory variables have fluctuated substantially. If they have not, seek other ways

of estimating the relationships. For example, one might use other time-series, cross-sectional, longitudinal or

simulated data. Alternatively, one could use a priori estimates as relationships, basing these on prior research or on

domain knowledge.
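The computation itself is simple. The two series below are invented; they sit on similar scales, but their relative variation differs sharply:

```python
import statistics

# Coefficient of variation: sample standard deviation divided by the mean.
def coefficient_of_variation(xs):
    return statistics.stdev(xs) / statistics.mean(xs)

prices = [98, 101, 99, 102, 100]   # fluctuates very little around its mean
sales = [80, 120, 90, 140, 70]     # fluctuates substantially around its mean

print(round(coefficient_of_variation(prices), 3))  # → 0.016
print(round(coefficient_of_variation(sales), 3))   # → 0.292
```

In the sense discussed above, a series like the first varies too little to support reliable estimates of relationships.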

Cognitive dissonance. An uncomfortable feeling that arises when an individual has conflicting attitudes about an

event or object. The person can allay this feeling by rejecting dissonant information. For example, a forecast with

dire consequences might cause dissonance, so the person might decide to ignore the forecast. Another dissonance-

reduction strategy is to fire the forecaster.

Cognitive feedback. A form of feedback that includes information about the types of errors in previous forecasts

and reasons for these errors. PoFxxx


Coherence. The condition when judgmental inputs to a decision-making or forecasting process are internally

consistent with one another. For example, to be coherent, the probabilities for a set of mutually exclusive and

exhaustive events should sum to unity.

Cohort model. A model that uses data grouped into segments (e.g., age 6 to 8, or first year at college, or start-up

companies) whose behavior is tracked over time. Predictions are made for the cohorts as they age. Cohort models

are commonly used in demographic forecasting. For example, an analyst could forecast the number of students

entering high school in six years by determining the number of students currently in the third-grade cohort in that

region (assuming no deaths or net migration). PoFxxx
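The school example can be sketched as follows; the head count and the optional net-retention rate are hypothetical:

```python
# The entry's school example, in code. The head count and the optional
# net-retention rate (the combined effect of deaths and migration) are
# hypothetical values chosen for illustration.
def project_cohort(current_count, years, annual_retention=1.0):
    """Age a cohort forward, applying a net annual retention rate."""
    count = current_count
    for _ in range(years):
        count *= annual_retention
    return count

third_graders = 1200   # students now in third grade in the region
# Ninth-grade entrants six years out, assuming no deaths or net migration:
print(project_cohort(third_graders, years=6))          # → 1200.0
# The same forecast with a hypothetical 1% annual net loss:
print(round(project_cohort(third_graders, 6, 0.99)))   # → 1130
```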

Cointegration. The co-movement of two or more non-stationary variables over time. If two variables are

cointegrated, regression of one variable on the other results in a set of residuals that is stationary. Existence of this

long-run equilibrium relationship allows one to impose parameter restrictions on a Vector AutoRegressive model

(VAR). The restricted VAR can be expressed in various ways, one of which is the error correction model. With

more than two non-stationary variables, it is possible to have more than one long-run equilibrium relationship

among them.

Combining forecasts. The process of using different forecasts to produce another forecast. Typically, the term

refers to cases where the combining is based on an explicit, systematic, and replicable scheme, such as the use of

equal weights. If subjective procedures are used for averaging, they should be fully disclosed and replicable.

Combining forecasts should not be confused with combining forecasting methods. Combining is inexpensive and

almost always improves forecast accuracy in comparison with the typical component. It also helps to protect against

large errors. See Armstrong (2001e) and PoFxxx
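A minimal equal-weights combination, with invented forecasts and an invented outcome, shows how combining protects against the typical component's error:

```python
# Equal-weights combination of component forecasts (all values invented).
def combine(forecasts):
    return sum(forecasts) / len(forecasts)

components = [110.0, 95.0, 104.0]   # three component forecasts for one period
combined = combine(components)      # → 103.0

actual = 100.0
typical_error = sum(abs(f - actual) for f in components) / len(components)
combined_error = abs(combined - actual)
print(combined, round(typical_error, 2), combined_error)  # → 103.0 6.33 3.0
```

Here the combined forecast errs by 3.0 against a typical component error of about 6.3, because the component errors partly cancel.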

Commensurate measure. An explicit measure that is common to all elements in a category. If the category is a set

of candidates for a job and the task is to select the best candidate, a commensurate measure would be one that all

candidates have in common, such as their grade-point average in college. When trying to predict which candidate

will be most successful, selectors tend to put too much weight on commensurate measures, even if the measures are

irrelevant, thus reducing forecast accuracy (Slovic and McPhillamy 1974). PoFxxx

Comparison group. A benchmark group used for comparison to a treatment group when predicting the effects of a

treatment. See control group.

Compensatory model. A model that combines variables (cues) to form a prediction. It is compensatory because

high values for some cues can compensate for low values in other cues. Adding and averaging are compensatory

models.

Composite forecast. A combined forecast. (See combining forecasts.)

Composite index. A group of indicators that are combined to permit analysts to monitor economic activity. In

business-cycle analysis, composite indexes of leading, coincident, and lagging indicators have similar timing and are

designed to predict turning points in business cycles. See cyclical data.

Conditional forecast. A forecast that incorporates knowledge (or assumptions) about the values of the explanatory

variables over the forecast horizon. Also called an ex post forecast.

Confidence interval. An expression of uncertainty. The likelihood that the true value will be contained within a given

interval. The 95% confidence level is conventional but arbitrary; ideally, one would choose a limit that balances

costs and benefits, but that is seldom easy to do. In forecasting, the term confidence interval refers to the uncertainty

associated with the estimate of the parameter of a model, while the term prediction interval refers to the uncertainty

of a forecast. Confidence intervals play a role in judgmental bootstrapping and econometric models by allowing one

to assess the uncertainty for an estimated relationship (such as price elasticity). This, in turn, might indicate the need

for more information or for the development of contingency plans.


Conjoint analysis. A methodology that quantifies how respondents trade off conflicting object characteristics

against each other in a compensatory model. For example, alternative products could be presented to subjects with

the features varied by experimental design. Subjects would be asked to state their preferences (through ratings,

rankings, intentions, or choices). The importance of each feature is assessed by statistical analysis. Software

packages are available to aid the process. See Wittink and Bergestuen (2001) and PoFxxx

Conjunction fallacy. The erroneous belief that the co-occurrence of two events is more likely than the occurrence of either

event alone. When people are asked to predict the outcomes of events, the added detail, especially when

representative of the situation, leads them to increase their estimate of the likelihood of their joint occurrence. For

example, in one study, people thought that President Reagan was more likely to provide more federal support for

unwed mothers and cut federal support to local governments than he was to simply provide more federal support for

unwed mothers (Tversky and Kahneman 1983). See representativeness.

Conjunctive model. A nonlinear model that combines variables (cues) to ensure that scores on all variables must be

high before the forecast generated by the model will be high.

Consensus. Agreement of opinions; the collective unanimous opinion of a number of persons. A feeling that the

group’s conclusion represents a fair summary of the conclusions reached by the individual members.

Consensus seeking. A structured process for achieving consensus. Consensus seeking can be useful in deciding how

to use a forecast. It can help groups to process information and to resolve conflicts. In practice, complete unanimity

is rare. However, each individual should be able to accept the group's conclusion. Consensus seeking requires the

use of a facilitator who helps the group to follow these guidelines:

- Avoid arguing for your own viewpoint. Present your position logically, then listen to the other members.

- Do not assume that someone must win when the discussion reaches a stalemate. Instead, restate the

problem or generate new alternatives.

- Do not change your mind simply to avoid conflict. Be suspicious when agreement seems to come too

quickly. Explore the reasons, and be sure that everyone accepts the solution.

- Avoid conflict-reducing techniques, such as majority vote, averages, coin flips, and bargaining. When a

dissenting member finally agrees, do not think the group must give way to their views on some later point.

- Differences of opinion are natural and expected. Seek them out and involve everyone in a discussion of

them. A wide range of opinions increases the chance that the group will find a better solution.

Alternatively, consensus has been used to assess the level of agreement among a set of forecasts. Higher consensus

often implies higher accuracy, especially when the forecasts are made independently. Ashton (1985) examined two

different forecast situations: forecasts of annual advertising sales for Time magazine by 13 Time, Inc. executives

given forecast horizons for one, two, and three quarters, and covering 14 years; and forecasts by 25 auditors of 40

firms’ problems, such as bankruptcy. Using two criteria, correlations and mean absolute deviation, she compared the

actual degree of agreement (between forecasts by different judges) against the accuracy of these judges. She also

compared each judge’s degree of agreement with all other judges and related this to that judge’s accuracy.

Agreement among judges did imply greater accuracy and this relationship was strong and statistically significant.

This adds evidence for using consensus as a proxy for confidence. PoFxxx

Conservatism. The assumption that things will proceed much as they have in the past. Originally a political term

that involved resistance to change. Conservatism is useful when forecasts involve high uncertainty. Given

uncertainty, judgmental forecasters should be conservative and they typically are. Some quantitative procedures,

such as regression analysis, provide conservative estimates. PoFxxx

Consistent trends. A condition that occurs when the basic trend and the recent trend extrapolations are in the same

direction. The basic trend is long term, such as that obtained by a regression against time. The recent trend is short

term, such as that obtained with an exponential smoothing model with a heavy weight on the most recent data.

Extrapolations of trends are expected to be more accurate when trends are consistent, as discussed under inconsistent

trends. PoFxxx


Construct validity (or conceptual validity or convergent validity). Evidence that an operational measure

represents the concept. Typically assessed by examining the correspondence among different operational measures

of a concept. PoFxxx

Consumer heterogeneity. Differences among people, either in terms of observable characteristics, such as

demographics or behavior, or in terms of unobservable characteristics, such as preferences or purchase intentions.

In some forecasting settings, it may be helpful to capture these types of differences as well as the factors that affect

the future behavior of individuals.

Contextual information. Information about explanatory variables that could affect a time-series forecast. The

contextual information that the forecaster has is called domain knowledge. PoFxxx

Contrary series. A series in which the historical trend extrapolation is opposite in direction to prespecified

expectations of domain experts. For example, domain experts might think that the causal forces should drive the

series up, but the historical trend is headed down. Contrary series can lead to large errors. Evidence to date suggests

that statistical trend estimates should be ignored for contrary series (Armstrong and Collopy 1993). In addition,

contrary series are expected to have asymmetric errors, even when expressed in logs (Armstrong and Collopy 2000).

See Armstrong, Adya and Collopy (2001). PoFxxx

Contrast group. See comparison group.

Control group. A group of randomly assigned people (or organizations) that did not receive a treatment. If random

assignment of treatments is not possible, look for a comparison group.

Convenience sample. A sample selected because of its low cost or because of time pressures. Convenience samples

are useful for pretesting intentions surveys or expert opinion studies. However, it is important to use probability

samples, not convenience samples, in conducting intentions studies.

Correlation (r). A standardized measure of the linear association between two variables. Its values range from –1,

indicating a perfect negative linear relationship, through zero, which shows no linear relationship, to +1, indicating a

perfect positive linear relationship. The correlation coefficient is the covariance between a pair of standardized variables. Curtis

and Alf (1969) and Ozer (1985) argue that r is a better measure of predictive ability than R2 (but neither is very

useful for time-series data). A strong correlation does not imply a causal relationship.

Correlation matrix. A set of correlation coefficients presented in the form of a matrix. Most computer programs

that perform multiple regression analysis show the correlation coefficients for each pair of variables. They can be

useful for assessing multicollinearity.

Correlogram. Graphical representation of the autocorrelation function.

Covariance. A measure of how two variables, say X and Y, vary together. The range of covariance values is

unrestricted. However, if the X and Y variables are first standardized, then covariance is the same as correlation and

the range of covariance (correlation) values is from –1 to +1.

Criterion variable. See dependent variable.

Cross-correlation. A standardized measure of association between values in one time series and those of another

time series. This statistic has the characteristics of a regular correlation coefficient.

Cross-sectional data. Data on a number of different units (e.g., people, countries, firms) for a single time period.

Cross-sectional data can be used to estimate relationships for a forecasting model. For example, using cross-

sectional data from different countries, one could assess how prices affect liquor sales. PoFxxx


Cross-validation. A test of validity that consists of splitting the data using probability sampling, estimating the

model using one subsample, and testing it on the remaining subsample. More elaborate approaches such as double

cross-validation and the jackknife are discussed in Armstrong (2001d).
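A sketch of the split-sample step, with invented data; the "model" here is just the calibration subsample's mean, standing in for whatever model is being validated:

```python
import random

# Split-sample validation on invented cross-sectional data. The "model" is a
# deliberate stand-in: it forecasts the calibration subsample's mean for
# every holdout case.
random.seed(1)
data = [52, 48, 61, 55, 49, 58, 60, 47, 53, 57]

shuffled = data[:]            # split by probability sampling: a random shuffle
random.shuffle(shuffled)
calibration, holdout = shuffled[:5], shuffled[5:]

forecast = sum(calibration) / len(calibration)                 # estimate the model
mae = sum(abs(y - forecast) for y in holdout) / len(holdout)   # test it on the holdout
print(round(forecast, 1), round(mae, 1))
```

Double cross-validation would repeat the estimate-and-test step with the roles of the two subsamples reversed.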

Croston’s method. See intermittent series.

Cue. A variable. In judgmental forecasting, a cue refers to a variable perceived by an expert.

Cumulative error. The total of all forecast errors (both positive and negative) over the forecast horizon. For

example, for forecasts for the next five years, the analyst would sum the errors (with signs) for the five forecasts.

This will approach zero if the forecast is unbiased.
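For a five-period horizon, the computation is a signed sum (forecasts and actuals below are invented):

```python
# Signed forecast errors over a five-period horizon (all values invented).
forecasts = [100, 105, 110, 115, 120]
actuals = [98, 107, 111, 113, 121]

errors = [f - a for f, a in zip(forecasts, actuals)]
cumulative_error = sum(errors)
print(errors, cumulative_error)  # → [2, -2, -1, 2, -1] 0  (errors cancel: no bias)
```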

Cumulative forecasting. The total value of a variable over several horizon periods. For example, one might forecast

total sales over the next year, rather than forecast sales for each of the 12 months.

Current status. The level at the origin of the forecast horizon.

Curve fitting. To fit historical time-series data to a functional form such as a straight line or a polynomial.

Cusum. Cumulative sum of forecast errors. The cusum is used in tracking signals.

Cyclical data. Time-series data that tend to go through recurring increases and decreases. See also business cycle.

This term is generally not used for seasonal variations within a year. Although it is difficult to forecast cycles,

knowledge that a time series is subject to cycles may be useful for selecting a forecasting method and for assessing

uncertainty. (See also long waves.) See Armstrong (2001c), Armstrong, Adya and Collopy (2001), and PoFxxx

Cyclical index. A number, usually standardized to have a mean of 100, that can help to identify repetitive patterns.

It is typically applied to annual time-series data, but can also be used for shorter periods, such as hours within a day.

(See also seasonal index.)

Damp. To reduce the size of an effect, as in “to damp the trend” (as contrasted to dampening, which would imply

some type of moisturizing and thus be senseless, or worse, for forecasters). Damped estimates are useful in the

presence of uncertainty. Thus, in making extrapolations over long horizons, one should damp. Seasonal factors can

also be damped if there is uncertainty. In addition, the effects in an econometric model can be damped in light of

uncertainty about the forecasts of the explanatory variables. See mitigation and Armstrong (2001c). PoFxxx
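One common way to damp an extrapolated trend, as in damped-trend exponential smoothing, is to multiply the per-period trend by a damping parameter raised to the horizon. The level, trend, and damping parameter below are illustrative:

```python
# Damping a trend over the horizon: the per-period trend is multiplied by
# phi**h (0 < phi < 1), so growth levels off. All parameter values invented.
def damped_forecasts(level, trend, phi, horizon):
    forecasts, cumulative_trend = [], 0.0
    for h in range(1, horizon + 1):
        cumulative_trend += trend * phi ** h   # trend contribution shrinks each period
        forecasts.append(level + cumulative_trend)
    return forecasts

print([round(f, 1) for f in damped_forecasts(100, 5, 0.8, 5)])
# → [104.0, 107.2, 109.8, 111.8, 113.4]  (versus 105, 110, ..., 125 undamped)
```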

Damped trend. See damp.

Data Generating Process (DGP). A model of the system under investigation that is assumed to represent the

system and to be responsible for the observed values of the dependent variable. It is important to remember that the

model is based on assumptions; for real-world data in the social sciences, one can only guess at the DGP.

Decay forces. Forces that tend to drive a series down. For example, the costs for such technical products as

computers might fluctuate over time, but as long as the underlying forces are downward, they are classified as

decay. See Armstrong, Adya and Collopy (2001). PoFxxx

Deceleration. A decrease in the trend. See acceleration.

Decomposition. The process of breaking a problem into subproblems, solving them, and then combining the

solutions to get an overall solution. MacGregor (2001) provides evidence on the value of this procedure for

judgmental forecasting. Typically, decomposition refers to multiplicative breakdowns, but sometimes it applies to

additive breakdowns. Additive breakdowns, however, are usually called disaggregate forecasting or segmentation.

Time series are often decomposed by level, trend, cycle, seasonality, and error. PoFxxx


Degrees of freedom. The number of observations minus the number of parameters in a regression analysis. It is

sensible to include all variables considered for use in the model, not just those in the final version. The larger the

number of coefficients estimated, the larger the number of constraints imposed in the sample and the smaller the

number of observations left to provide precise estimates of the regression coefficients. A greater number of degrees

of freedom is often thought to provide more reliable estimates, but the relationship to reliability is weak. (See

adjusted R2.)

Delphi technique. A method for obtaining independent forecasts from an expert panel over two or more rounds,

with summaries of the anonymous forecasts (and perhaps reasons for them) provided after each round. Delphi has

been widely used in business. By applying well-researched principles, Delphi provides more accurate forecasts than

unstructured groups (Rowe and Wright 1999). The process can be adapted for use in face-to-face group meetings,

and is then called mini-Delphi or Estimate-Talk-Estimate (ETE). Rowe and Wright (2001) provide principles for the

use of the Delphi technique. PoFxxx

Demand. The need for a particular product or component. Demand can come from a number of sources (e.g.,

customer order or producer’s good). Demand can be forecast for each level in a supply chain. At the finished-goods

level, demand data often differ from sales data because demand does not necessarily result in sales (e.g., if there is

no stock there may be unfulfilled demand).

Dependent variable. The variable that is to be forecast; that is, the variable of interest to the researcher. In

regression analysis, it is the variable on the left side of the equation.

Deseasonalized data. See seasonal adjustment.

Detrend. To remove an upward or downward trend from a time series. Frequently, this is done by regressing a

series against time, then using the trend coefficient to remove the trend from the observations. Detrending data can

reveal patterns in the data. Detrending should be done prior to making seasonal adjustments. PoFxxx
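The regression-against-time approach can be sketched as follows (invented series; ordinary least squares on a time index):

```python
# Detrending by regressing the series on time and keeping the residuals.
# The series values are invented for illustration.
def detrend(xs):
    n = len(xs)
    t = range(n)
    t_bar, x_bar = sum(t) / n, sum(xs) / n
    slope = (sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, xs))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = x_bar - slope * t_bar
    return [xi - (intercept + slope * ti) for ti, xi in zip(t, xs)]

print(detrend([10, 12, 14, 16, 18]))   # a pure linear trend: residuals are all 0.0
print(detrend([10, 13, 14, 15, 18]))   # deviations around the fitted trend line
```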

Devil's advocate. A procedure whereby one person in a group is assigned the task of trying to find everything that

might be wrong in a forecast (or a plan), while the rest of the group defends it. This should be done as a structured

approach, perhaps with this role rotating among group members. (Someone adopting this role without permission

from the group can become unpopular.) Use the devil’s advocate procedure only for short time periods, say 20

minutes or less if done in a meeting. Cosier’s (1978) experiment showed that groups that used the devil’s advocate

procedure obtained more accurate predictions than those who solely argued in favor of a forecast. One would also

expect the devil’s advocate procedure to improve the calibration of prediction intervals. According to Cosier (1978)

and Schwenk and Cosier (1980), the “attack” is best presented in written form and in an objective manner; the use of

strong emotionally laden criticism should be avoided. This research is consistent with findings that peer review leads

to improvements in research papers. PoFxxx

DGP. See Data Generating Process.

Diagnostic checking. A step in time-series model building where the estimated errors of a model are examined for

independence, zero mean, constant variance, and other assumptions.

Dickey-Fuller test. A test to determine whether a time series is stationary or, specifically, whether the null

hypothesis of a unit root can be rejected. A time series can be nonstationary because of a deterministic trend (a

stationary trend or TS series) or a stochastic trend (a difference stationary or DS series) or both. Unit root tests are

intended to detect stochastic trend, although they are not powerful at doing so, and they can give misleading

inferences if a deterministic trend is present but is not allowed for. The augmented Dickey-Fuller test, which adds

lagged dependent variables to the test equation, is often used. Adding the lagged variables (usually at the rate

corresponding to n/3, where n is the sample size) removes distortions to the level of statistical significance but

lowers the power of the test to detect a unit root when one is present. There is a difference between forecasting with

trend-stationary (TS) and difference-stationary (DS) models (though probably little difference in point forecasts and

intervals for short horizons, h = 1 or 2). The point forecasts of a TS series change by a constant amount (other things

being equal) as the forecast horizon is incremented. Their prediction intervals are almost constant. The point


forecasts of a DS series are constant as the horizon is increased (like naive no-change forecasts), other things being

equal, while the prediction intervals widen rapidly. There is a vast literature on unit roots. The expression "unit root

test$" ($ indicates a wildcard) generated 281 hits in the EconLit database of OVID (as of mid-December, 1999),

although when it was combined with “forecast$,” the number fell to 12. Despite this literature, we can say little

about the usefulness of a unit-root test, such as the Dickey-Fuller test, as part of a testing strategy to improve

forecasting accuracy. Meese and Geweke (1984) examined 150 quarterly and monthly macroeconomic series and

found that forecasts from detrended data (i.e., assuming TS) were more accurate than forecasts from differenced

data. Campbell and Perron (1991) conducted a Monte Carlo simulation with an ARMA (1,1) Data Generating

Process and samples of 100. When there was an autoregressive unit root or near unit root (.95 or higher), an

autoregressive model in differences forecasted better at h = 1 and h = 20 horizons. When there was an autoregressive

unit root and the moving average parameter was 0.9 or less, the model in differences was also better. Otherwise the

AR model in levels with a trend variable was better. Since most economic series appear to contain a unit root, the

Campbell and Perron study seems to call for using a DS model, exactly the opposite of the strategy indicated by

Meese and Geweke. But what if the parameter values are unknown? Campbell and Perron also considered a mixed

strategy: Use a levels model if the augmented Dickey-Fuller test and the Phillips-Perron test for a unit root were

both rejected at the five percent level of significance; otherwise use a model in differences. Such a strategy gave

almost as good results as using the better model given knowledge of the parameter values. This slender evidence

provides some support for using a unit-root test to select a forecasting model. Maddala and Kim (1998) provide a

helpful summary. PoFxxx

Differencing. Transforming a time series into its successive differences (Xt – Xt-1). When a time series is non-stationary, it can often be

made into a stationary series by taking first differences of the series. If first differences do not convert the series to

stationary form, then one can create first differences of first differences. This is called second-order differencing. A

distinction is made between a second-order difference and a second difference (Xt – Xt-2 ). See backward shift

operator.
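A short invented series makes the distinctions concrete:

```python
# First differences, second-order differences, and (distinct) second
# differences of a short invented series.
def difference(xs, lag=1):
    """Return the differenced series (Xt - Xt-lag)."""
    return [xs[i] - xs[i - lag] for i in range(lag, len(xs))]

series = [10, 12, 15, 19, 24]            # trending, hence non-stationary
print(difference(series))                # first differences → [2, 3, 4, 5]
print(difference(difference(series)))    # second-order differences → [1, 1, 1]
print(difference(series, lag=2))         # second differences (Xt - Xt-2) → [5, 7, 9]
```

Here the first differences still trend, while the second-order differences are constant (stationary).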

Diffusion. The spreading of an idea or an innovation through a population. Typically, an innovation such as

television is initially used by a small number of people. The number of new users per year increases rapidly, then,

after stabilizing, decreases as unsatisfied demand for the innovation dies away. Meade and Islam (2001) examine the

use of diffusion models for time-series extrapolation. Rogers (1995), based on an extensive review of the literature,

updated his conclusions that the speed of diffusion depends on: (1) the relative advantage of the product over

existing products, (2) compatibility with existing solutions, (3) divisibility (the user can try part of the idea), (4)

communicability, (5) complexity, (6) product risks (will it actually provide the benefits?), and (7) psychological

risks (e.g., will people laugh at me if I adopt this new product or idea?).

Diffusion index. The percentage of components in a selected collection of time-series indicators that are increasing.

Given one hundred components of the same size, the index would be 40 percent when 40 were expanding, and zero

when none were increasing.
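With invented component values, the computation for a single period looks like this:

```python
# Diffusion index for one period: the share of component indicators that
# increased since the prior period (component values are invented).
def diffusion_index(previous, current):
    rising = sum(1 for p, c in zip(previous, current) if c > p)
    return 100.0 * rising / len(previous)

prev = [100, 90, 80, 70, 60]
curr = [102, 89, 81, 70, 63]   # three of the five components rose
print(diffusion_index(prev, curr))  # → 60.0
```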

Disaggregation. See segmentation.

Disconfirming evidence. Evidence that refutes one’s beliefs or forecasts. Substantial evidence shows that people

do not use disconfirming evidence effectively, especially if received on a case-by-case basis. Tetlock (1999), in a

long-term study of political, economic, and military forecasts, shows how people use a variety of belief-system

defenses, which makes learning from history a slow process. PoFxxx

Discontinuity. A large shift in a time series that is expected to persist. The effect is usually a change in level but can

also be a change in trend. Trend discontinuities are difficult to estimate, so it might be best to assume that the change

occurred only in the level, although this is speculative. Discontinuities play havoc with quantitative approaches to

extrapolation (Armstrong and Collopy 1992). PoFxxx

Discrete event. A one-time event that causes outliers or changes in time-series patterns. Examples of such events

are a factory closing, a hurricane, or a change in the products offered.

Discriminant analysis. A variation of regression analysis used to predict group membership. The dependent

variable is based on categorical data. The simplest variation is a dependent variable with two categories (e.g.,


“accepted bribe” vs. “did not accept bribe,” “bid accepted” vs. “bid rejected,” or “survived medical operation” vs.

“died”). PoFxxx

Disjunctive model. A nonlinear judgment model that combines variables (cues) to ensure, say, that at least one cue

must take on a high value before the forecast generated by the model will be high.

Domain expert. A person who knows a lot about the situation being forecast, such as an expert in automotive

marketing, restaurant management, or the weather in a given region.

Domain knowledge. An expert’s knowledge about the situation being forecast, such as knowledge about a brand and its market. This

knowledge is a subset of the contextual information for a situation. PoFxxx

Double cross-validation. A procedure used to test predictive validity, typically with longitudinal or cross-sectional

data. The data to be analyzed are split into two roughly equal subsets. A model is estimated on one subset and its

ability to forecast is tested on the other half. The model is then estimated for the other subset, which is then used to

forecast for the first subset. This procedure requires a large sample size. (Also see jackknife.)

Double moving average. A moving average of a series of data that already represents a moving average. It provides

additional smoothing (the removal of more randomness than an equal-length single moving average).
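A sketch with an invented series and a three-period window:

```python
# A moving average applied to a series that is itself a moving average.
# The window length and the data are invented for illustration.
def moving_average(xs, window):
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

series = [3, 5, 4, 8, 6, 9, 7]
single = moving_average(series, 3)    # removes some of the randomness
double = moving_average(single, 3)    # smoother still
print(single)
print(double)
```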

Dummy variable. An explanatory variable that assumes only two values, 0 or 1. In a regression analysis, the

coefficient of a dummy variable shows the average effect on the level of the dependent variable when the dummy

variable assumes the value of 1. For example, a dummy variable might represent the presence or absence of capital

punishment in a geographical region, and its regression coefficient could show the effect of capital punishment on

the level of violent crime. More than two categories can be handled by using additional dummy variables; for

example, to represent three political affiliations (e.g., Republican, Democrat, or Other) in a model to predict election

outcomes, one could use two dummy variables ("Republican or not?" and "Democrat or not?"). In general, one needs

c - 1 dummy variables to represent c categories. PoFxxx
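
The political-affiliation example can be sketched in Python (a minimal illustration, not from the handbook; the variable names are invented):

```python
import numpy as np

# Party affiliation for five hypothetical voters.
parties = ["Republican", "Democrat", "Other", "Democrat", "Republican"]

# Three categories need only two dummy variables; "Other" is the omitted
# base category, represented by zeros in both columns.
republican = np.array([p == "Republican" for p in parties], dtype=int)
democrat = np.array([p == "Democrat" for p in parties], dtype=int)
```

These two columns would enter a regression alongside the other explanatory variables; each coefficient then measures the average difference from the omitted "Other" category.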

Durbin-Watson statistic. A measure that tests for autocorrelation between error terms at time t and those at t + 1.

Values of this statistic range from 0 to 4. If no autocorrelation is present, the expected value is 2. Small values (less

than 2, approaching 0) indicate positive autocorrelation; larger values (greater than 2, approaching 4) indicate

negative autocorrelation. Is autocorrelation important to forecasting? It can tell you when to be suspicious of tests of

statistical significance, and this is important when dealing with small samples. However, it is difficult to find

empirical evidence showing that knowledge of the Durbin-Watson statistic leads to accurate forecasts or to well-

calibrated prediction intervals. Forecasters are fond of reporting the D-W statistic, perhaps because it is provided by

the software package. Do not use it for cross-sectional data as they have no natural order.
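
The statistic itself is easy to compute from a model's residuals (an illustrative sketch, not from the handbook):

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares.
    Near 2: no autocorrelation; toward 0: positive; toward 4: negative."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
```

For example, residuals that alternate in sign (strong negative autocorrelation) yield a value close to 4, while residuals that stay on one side of zero yield a value close to 0.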

Dynamic regression model. A regression model that includes lagged values of the explanatory variable(s) or of the

dependent variable or both. The relationship between the forecast variable and the explanatory variable is modeled

using a transfer function. A dynamic regression model can predict what will happen if the explanatory variable

changes.

Eclectic research. A set of research studies having the same objective but using procedures that differ substantially

from one another. This has also been called the multi-trait multi-method approach, convergent validation, and

methodological triangulation. By varying the approach, one hopes to identify and compensate for mistakes and

biases. Eclectic research can be used to estimate parameters for econometric models and to assess their construct

validity. Armstrong (1985, pp. 205-214) provides examples and evidence on its value. PoFxxx

Econometric method. Originally, the application of mathematics to economic data. More specifically, the statement

of theory followed by the use of objective measurement methods, usually regression analysis. The econometric

method might be viewed as the thinking-man's regression analysis. It consists of one or more regression equations.

The method can be used in economics, in other social sciences (where some people refer to these as “linear

models”), and in the physical sciences. It can be applied to time series, longitudinal, or cross-sectional data. For a

detailed description of econometric methods, see Allen and Fildes (2001). PoFxxx

Econometric model. One or more regression equations used to capture the relationship between the dependent

variable and explanatory variables. The analyst should use a priori analysis to specify a model (or a set of feasible

models) and then calibrate the model parameters by minimizing the sum of the squared errors in the calibration data.

The parameters can also be estimated by minimizing the sum of the absolute errors (least absolute deviations).

Economic indicator. A time series that has a reasonably stable statistical relationship to the whole economy or to

time series of particular interest. Coincident indicators are often used to identify turning points in aggregate

economic activity and leading indicators to forecast such turning points.

Efficient. The characteristic of a forecast or estimate that cannot be improved by further analysis of the calibration

data.

Elasticity. A measure of the relationship between two variables. Elasticity expresses the percentage change in the

variable of interest that is caused by a 1% change in another variable. For example, an income elasticity of +1.3 for

unit automobile sales means that a 1% increase in income will lead to an increase of 1.3% in the unit sales of

automobiles. It is typically easier to think about elasticities than about marginal propensities (which show the unit

change in the dependent variable Y when X is changed by one unit).
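
The income-elasticity example can be checked with a midpoint (arc) elasticity calculation (an illustrative sketch; the function name is invented):

```python
def arc_elasticity(x0, x1, y0, y1):
    """Percentage change in y per 1% change in x, using midpoint percentages
    so the result does not depend on the direction of the change."""
    pct_x = (x1 - x0) / ((x0 + x1) / 2)
    pct_y = (y1 - y0) / ((y0 + y1) / 2)
    return pct_y / pct_x

# Income rises 1% (50 to 50.5) while unit sales rise 1.3% (100 to 101.3):
# the elasticity is about +1.3.
```

In a log-log econometric model, the estimated coefficient of an explanatory variable is itself the elasticity, which is one reason that functional form is popular.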

Encompassing model. A model whose forecast errors explain the errors produced by a second model.

Endogenous variable. A variable whose value is determined within the system. For example, in an econometric

model, the market price of a product may be determined within the model, thus making it an endogenous variable.

(See also exogenous variable.)

Ensemble. The average of a set of forecasts. This term is used in weather forecasting. See combining forecasts.

Environment. Conditions surrounding the situation. The environment includes information about the ranges and

distributions of cues, the correlations among them, and the relations between the cues and the event being judged. In

judgmental forecasting, the environment includes constraints on information available to the judge and on actions

the judge may take, as well as time pressures, requirements for documentation, and anything else that might affect

cognitive processes. Alternatively, environment refers to the general situation when using an econometric model.

Equilibrium correction model. See error correction model.

Error term. The difference between the actual values and the forecasted values. The error term is a random variable

at time t whose probability distribution is assumed to have a mean of zero and is usually assumed to have a constant

variance at all time periods and a normal distribution.

Error correction model. A model that explains changes in the dependent variable in terms of changes in the

explanatory variables as well as deviations from the long-run relationship between the dependent variable and its

determinants. Do error correction models lead to more accurate forecasts? The jury is still out. PoFxxx

Error cost function. The economic loss related to the size of errors. It is difficult to generalize about this. The

suggested procedure is to leave this aspect of the problem to the planners and decision makers.

Error distribution. The theoretical probability distribution of forecast errors. It is often assumed to be normal. In

the social sciences, this assumption is generally reasonable for short-interval time-series data (say, monthly or less),

but not for annual data.

Error ratio. The error of a selected forecasting method divided by that for a benchmark forecast. The term is

commonly used in judgmental forecasting. It is also used in quantitative forecasting. See Theil’s U and Relative

Absolute Error.

Estimate-Talk-Estimate (E-T-E). A structured procedure calling for independent and anonymous judgments,

followed by a group discussion, and another round of individual judgments. It is also called mini-Delphi. See Delphi

technique.

Estimation sample. See calibration data.

Estimation. Finding appropriate values for the parameters of an equation based on a criterion. The most commonly

used criterion is minimizing the Mean Squared Error. Sometimes an iterative procedure is needed to determine

parameter values that minimize this criterion for the calibration data.

E-T-E. See Estimate-Talk-Estimate.

Event modeling. A feature of some exponential smoothing programs that allows the user to specify the time of one

or more special events, such as irregular promotions and natural disasters, in the calibration data. For each type of

special event, the effect is estimated and the data adjusted so that the events do not distort the trend and seasonal

patterns of the time series. Some programs use a procedure called intervention analysis to model events.

Ex ante forecast. A forecast that uses only information that would have been available at the forecast origin; it does

not use actual values of variables from later periods. This term, often used interchangeably with unconditional

forecast, is what we normally think of as a forecast. It can refer to holdout data (assuming the values to be unknown)

or to a situation in which the event has not yet occurred (pure ex ante). See Armstrong 2001d.

Exogenous variable. A variable whose value is determined outside of the model. For example, in an econometric

model, the gross national product might be an exogenous variable.

Expectations surveys. Surveys of how people or organizations expect that they will behave in given situations. See

also intentions surveys. PoFxxx

Experimental data. Data from situations in which a researcher has systematically changed certain variables. These

data could come from laboratory experiments, in which the researcher controls most of the relevant environment, or

field experiments, in which the researcher controls only part of the relevant environment. (See quasi-experimental

data.)

Experiments. Changes in key variables that are introduced in a systematic way to allow for an examination of the

effects that one variable has on another. For example, a firm could charge different prices in different geographical

regions to assess price elasticity. In a sense, it involves doing something wrong (not charging the apparently best

price) to learn. In addition to helping analysts develop forecasting models, experiments are useful in persuading

decision makers to accept new forecasting methods. Whereas people are often willing to reject a new idea, they are

less likely to reject a request to do an experiment. Armstrong (1982b) conducted an experiment in which subjects

were asked to describe how they would gain acceptance of a model to predict the outcome of medical treatment for

patients. Only one of the 16 subjects said that he would try an experiment. Armstrong then presented the situation as

a role-playing case to 15 groups of health-care executives; only one group proposed an experiment, and this group

was successful at implementing change while all other groups failed. Finally, Armstrong gave 14 groups instructions

on how to propose experiments in this situation; of these, 12 were successful at gaining acceptance in role-playing

exercises. PoFxxx

Expertise. Knowledge or skill in a particular task. In forecasting, this might be assessed by the extent to which

experts’ forecasts are more accurate than those by nonexperts. See also seer-sucker theory.

Expert opinions. Predictions of how others will behave in a particular situation, made by persons with knowledge of

the situation. Rowe and Wright (2001) discuss principles for the use of expert opinions. Most important forecasts

rely on unaided expert opinions. Research has led to many principles to improve forecasting with expert opinions.

For example, forecasters should obtain independent forecasts from 5 to 20 experts (based on research findings by

Ashton 1986; Hogarth 1978; and Libby and Blashfield 1978). PoFxxx

Expert system. A model designed to represent procedures that experts use in making decisions or forecasts. Often,

these procedures are supplemented by other information, such as estimates from econometric models. The term has

also been applied to procedures for selecting forecasting methods. Armstrong, Adya and Collopy (2001) discuss

principles for developing expert systems for forecasting. PoFxxx

Explanation effect. The increase in the perceived likelihood of an event’s occurrence that results from explaining

why the event might occur. This effect is relevant to conjoint analysis and to expert opinions (Arkes 2001). On the

positive side, it can cause decision makers to pay attention to a possible outcome; as a result, it can contribute to

scenarios. PoFxxx

Explanatory variable. A variable included in an econometric model to explain fluctuations in the dependent

variable. (See also causal variable.)

Exploratory research. Research carried out without hypotheses. The data are allowed to speak for themselves.

Exploratory research can be a worthless or even dangerous practice for forecasters. On the other hand, it might

provide ideas that can subsequently be tested. It is most useful in the early stages of a project when one knows little

about the problem.

Exponential smoothing. An extrapolation procedure used for forecasting. It is a weighted moving average in which

the weights are decreased exponentially as data becomes older. For most situations (but not all), it is more accurate

than moving averages (Armstrong 2001c). In the past, exponential smoothing was less expensive than a moving

average because it used only a few values to summarize the prior data (whereas an n-period moving average had to

retain all n values). The low cost of computer storage has reduced this advantage. When seasonal factors are difficult

to measure, moving averages might be preferred to exponential smoothing. For example, a 12-month moving

average might be useful in situations with much seasonal variation and less than four years of data. A

comprehensive treatment of exponential smoothing is provided in Gardner (1985). See also Holt-Winters

exponential smoothing method and state-space model. PoFxxx
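
Single exponential smoothing reduces to one line of arithmetic per observation (an illustrative sketch, not from the handbook):

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level. The recursion gives exponentially
    declining weights to older data, and only the current level needs
    to be stored rather than the whole history."""
    level = series[0]
    for obs in series[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level  # the forecast for every future horizon
```

With alpha = 1 the method reproduces the last observation (a naive forecast); with alpha near 0 it approaches a long-run average.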

Ex post forecast. A forecast that uses information from the situation being forecast. The actual values of the causal

variables are used, not the forecasted values; however, the parameters are not updated. This term is used

interchangeably with conditional forecast. It can help in assessing predictions of the effects of change in explanatory

variables.

Extrapolation. A forecast based only on earlier values of a time series or on observations taken from a similar set

of cross-sectional data. Principles for extrapolation are described in Armstrong (2001c). PoFxxx

Face validity. Expert opinion that a procedure represents what it purports to represent. To obtain a judgment on face

validity, ask a few experts what they expect. For example, you might ask them to specify variables and relationships

for an econometric model. Agreement among experts is evidence of face validity.

Facilitator. A group member whose only role is to help the group to function more effectively by following a

structured procedure. One of the dominant conclusions about judgmental forecasting is that structure contributes to

forecast accuracy.

Factor analysis. A statistical procedure for obtaining indices from variables by combining those that have high

correlations with one another. Factor analysis has been used to develop predictive indices, but this has not been

successful; Armstrong (1985, p. 223) reports on eight studies, all failures in this regard.

Feature identification. The identification of the conditions (features) of a set of data. Features can help select an

extrapolation method, as described in Armstrong, Adya and Collopy (2001).

Features. Operational measures of the characteristics of time-series or cross-sectional data. Examples include basic

trend, coefficient of variation, and discontinuity. PoFxxx

Feedback. Information that experts receive about the accuracy of their forecasts and the reasons for the errors.

Accurate, well-summarized feedback is probably the primary basis experts have for improving their judgmental

forecasts. The manner in which feedback is provided is critical because people tend to see what they want to see or

what they expect. When feedback is well-summarized, frequent, and when it contains explanations for the events,

judgmental forecasters can become well-calibrated. Weather forecasters receive this kind of feedback, and they are

almost perfectly calibrated: it rains on 80% of the days on which they predict an 80% chance of rain (Murphy and

Winkler 1984). Well-structured feedback is especially important when it involves disconfirming evidence. PoFxxx

File. A collection of data.

Filter. A process developed in engineering for eliminating random variations (high or low frequencies) in an attempt

to ensure that only the true pattern remains. For example, a filter might adjust outliers to be within two or three

sigmas (standard deviations) of forecasted or fitted values.
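
The outlier-adjustment example can be sketched as follows (illustrative only; the function name and default are invented):

```python
import numpy as np

def filter_outliers(actual, fitted, n_sigmas=3.0):
    """Adjust observations so each lies within n_sigmas standard deviations
    of the fitted values; only extreme points are changed."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    resid = actual - fitted
    bound = n_sigmas * resid.std()
    return fitted + np.clip(resid, -bound, bound)
```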

First differences. See differencing.

Fisher exact test. A nonparametric test used to assess relationships among variables in a 2 × 2 table when samples

are small. Siegel and Castellan (1988) provide details on calculating this and other nonparametric statistics.
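
The test can be computed directly from the hypergeometric distribution (an illustrative implementation, not the one in Siegel and Castellan; the two-sided convention shown here sums all tables no more likely than the observed one):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):
        # Hypergeometric probability that cell (1,1) equals x, given margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum probabilities of all feasible tables at most as likely as observed.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))
```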

Fit. The degree to which a model explains (statistically speaking) variations in the calibration data. Fit is likely to be

misleading as a criterion for selecting and developing forecasting models, because it typically has only a weak

relationship to ex ante forecast accuracy (Armstrong 2001d). Fit tends to favor complex models, and these models

often do not hold up in forecasting, especially when using time-series data. Nevertheless, Pant and Starbuck (1990)

found a modest relationship between fit (when using the MAPE) and short-term forecast accuracy for 13 extrapolation methods.

It is more relevant when working with cross-sectional data. PoFxxx

Focus group. A group convened to generate ideas, where a facilitator uses nondirective interviewing to stimulate

discussion. Fern (1982) found that such groups are most useful when, in the real situation, people’s responses

depend to some extent on their peers’ beliefs. This could include responses to visible products, such as clothing or

automobiles. Focus groups might be used to generate ideas about variables for judgmental bootstrapping or conjoint

analysis when the forecasting problem involves visible products. In general, however, there are better (and less

expensive) ways to obtain information, such as personal interviews. Focus groups should not be used to make

forecasts. (Alas, in the real world, they are used to make poor but convincing forecasts.) PoFxxx

Forecast. A prediction or estimate of an actual value in a future time period (for time series) or for another situation

(for cross-sectional data). Forecast, prediction, and prognosis are typically used interchangeably.

Forecast accuracy. The optimist’s term for forecast errors.

Forecast competition. A competition in which forecasters are provided with the same calibration data, and they

independently make forecasts for a set of holdout data. Ideally, prior to the competition, competitors should state

hypotheses on the conditions under which their methods will be most accurate. Then they submit forecasts to an

administrator who calculates the forecast errors. There have been a number of competitions for extrapolation

methods (for example, see the M-Competition).

Forecast criteria. Factors used to evaluate and compare different forecasting techniques. Forecast accuracy is

generally considered the most important criterion, but Yokum and Armstrong (1995) showed that others, such as

ease of interpretation and cost savings, may be as important when the forecasting situation or the forecaster’s role is

considered.

Forecast error. The difference between the forecasted value (F) and the actual value (A). By convention, the error

is generally reported as F minus A. Forecast errors serve three important functions: (1) The development of

prediction intervals. Ideally, the errors should be obtained from a test that closely resembles the actual forecasting

situation. (2) The selection (or weighting) of forecasting methods. Thus, one can analyze a large set of forecasts and

then select based on which method produced the more accurate forecasts. In such evaluations, the error term should

be immune to the way the series is scaled (e.g., multiplying one of the series by 1,000 should not affect the accuracy

rankings of various forecasting methods). Generally, the error measure should also be adjusted for the degree of

difficulty in forecasting. Finally, the measure should not be overly influenced by outliers. The Mean Squared Error,

which has been popular for years, should not be used for forecast comparisons because it is not independent of scale

and it is unreliable compared to alternative measures. More appropriate measures include the APE (and the

MdAPE when summarizing across series) and the Relative Absolute Error (and the MdRAE when summarizing

across series). (3) Refining forecasting models, where the error measures should be sensitive to changes in the

models being tested. Here, medians are less useful; the APE can be summarized by its mean (MAPE) and the RAE

by its geometric mean (GmRAE). Armstrong and Collopy (1992a) provide empirical evidence to support these

guidelines, and the measures are discussed in Armstrong (2001d).
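
The scale-independent measures mentioned above are simple to compute (an illustrative sketch; see the cited sources for the full definitions):

```python
import numpy as np

def ape(forecast, actual):
    """Absolute percentage error; immune to the scale of the series."""
    return 100.0 * abs(forecast - actual) / abs(actual)

def rae(forecast, actual, benchmark):
    """Relative absolute error: the method's absolute error divided by a
    benchmark forecast's absolute error for the same period."""
    return abs(forecast - actual) / abs(benchmark - actual)

def gmrae(raes):
    """Geometric mean of RAEs, suited to refining models."""
    r = np.asarray(raes, dtype=float)
    return float(np.exp(np.log(r).mean()))
```

Taking medians of the APEs or RAEs across series gives the MdAPE and MdRAE, which resist distortion by outliers.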

Forecast horizon. The number of periods from the forecast origin to the end of the time period being forecast.

Forecast interval. See prediction interval.

Forecast validity. See predictive validity.

Forecast variable. The variable of interest. A variable that is predicted by some other variable or variables; it is also

called the dependent variable or response variable.

Forecasting. Estimating in unknown situations. Predicting is a more general term and connotes estimating for any

time series, cross-sectional, or longitudinal data. Forecasting is commonly used when discussing time series.

Forecasting competition. See forecast competition.

Forecasting engine. The module of a forecasting system containing the procedures for the estimation and

validation of forecasting models.

Forecasting model. A model developed to produce forecasts. It should be distinguished from a measurement model.

A forecasting model may draw upon a variety of measurement models for estimates of key parameters. A forecaster

might rely on different models for different parts of the forecasting problem, for example, using one model to

estimate the level in a time-series forecast and another to forecast change.

Forecasting support system. A set of procedures (typically computer based) that supports forecasting. It allows the

analyst to easily access, organize, and analyze a variety of information. It might also enable the analyst to

incorporate judgment and monitor forecast accuracy.

Framing. The way a question is asked. Framing can have an important effect upon subjects’ responses, so it is

important to ensure that questions are worded properly. The first influential treatment of this issue was by Payne

(1951). Much useful work followed, summarized by Sudman and Bradburn (1982). Knowledge of this work is

important in conducting intentions studies, eliciting expert opinions, and using methods that incorporate judgmental

inputs. Consider the effect of the wording in the following example provided by Norman R. F. Maier: “A man

bought a horse for $60 and sold it for $70. Then he bought it back again for $80 and sold it for $90. How much

money did he make in the horse trading business?” Almost half of the respondents answered incorrectly. Now

consider this question: “A man bought a horse for $60 and sold it for $70. Then he bought a pig for $80 and sold it

for $90. How much money does he make in the animal trading business?” Almost all respondents get the correct

answer to this version of the question ($20). Tversky and Kahneman (1981) demonstrated biases in peoples’

responses to the way that questions are framed. For example, they asked subjects to consider a hypothetical situation

in which a new disease is threatening to kill 600 people. In Program A, 200 people will be saved, while in Program

B, there is a one-third chance of saving all 600 people, but a two-thirds chance of saving none of them. In this case,

most respondents chose Program A (which is positively framed in terms of saving lives). However, when the

question was reframed with Program A leading to 400 deaths, and Program B as having a one-third chance that

nobody would die and a two-thirds chance that all would die, then the majority of respondents chose Program B

(this alternative is negatively framed in terms of losing lives). This negative way of framing the question caused

people to respond differently, even though the two problems are identical. This example implies that framing could

play a role in writing scenarios. The discovery of biases due to framing seems to outpace research on how to avoid

them. Unfortunately, telling people about bias usually does little to prevent its occurrence. Beach, Barnes and

Christensen-Szalanski (1986) concluded that observed biases may arise partly because subjects answer questions

other than those the experimenter intended. Sudman and Bradburn (1982) provide a number of solutions. Two

procedures are especially useful: (1) pretest questions to ensure they are understood, and (2) ask questions in

alternative ways and compare the responses. Plous (1993, chapter 6) provides additional suggestions on framing

questions.

F-test. A test for statistical significance that relies on a comparison of the ratio of two mean square errors. For

example, one can use the ratio of "mean square due to the regression" to "mean square due to error" to test the

overall statistical significance of a regression model. F = t² (see t-test).
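
The identity F = t² for a regression with one explanatory variable can be verified numerically (an illustrative sketch; the function name is invented):

```python
import numpy as np

def f_and_t_simple_regression(x, y):
    """For one explanatory variable, the overall F statistic equals the
    square of the slope coefficient's t statistic."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    sse = np.sum((y - fitted) ** 2)          # sum of squares due to error
    ssr = np.sum((fitted - y.mean()) ** 2)   # sum of squares due to regression
    F = (ssr / 1) / (sse / (n - 2))
    se_slope = np.sqrt(sse / (n - 2) / np.sum((x - x.mean()) ** 2))
    t = beta[1] / se_slope
    return float(F), float(t)
```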

Function. A formal statement of the relationship between variables. Quantitative forecasting methods rely on

functional relationships between the item to be forecast and previous values of that item, previous error values, or

explanatory variables.

Functional form. A mathematical statement of the relationship between an explanatory variable (or time) and the

dependent variable.

Gambler’s fallacy. The notion that an unusual run of events, say a coin coming up heads five times in a row,

indicates a likelihood of a change on the next event to conform with the expected average (e.g., that tails is more

likely than heads on the next toss). The reason, gamblers say, is the law of averages. They are wrong. The gambler’s

fallacy was examined by Jarvik (1951).

Game theory. A formal analysis of the relationships between competing parties who are subject to certain rules.

The Prisoner's Dilemma is one of the more popular games that has been studied. Game theory seems to provide

insight into complex situations involving conflict and cooperation. Brandenburger and Nalebuff (1996) describe

such situations. Although game theory has been the subject of enormous research, no evidence exists that it is

helpful in forecasting. To be useful, the rules of the game must match the real world, and this is typically difficult to

do. In contrast, role playing provides a way to represent the actual situation, and it has been shown to produce

accurate predictions in such cases (Armstrong 2001a). PoFxxx

GARCH. A Generalized AutoRegressive Conditionally Heteroscedastic model contains an equation for changing

variance. GARCH models are primarily used in the assessment of uncertainty. A GARCH equation of order (p, q)

assumes that the local variance of the error terms at time t is linearly dependent on the squares of the last q values of

the error terms and the last p values of the local variances. When p is zero, the model reduces to an ARCH model.

Generalized least squares (GLS). A method for estimating a forecasting model’s parameters that drops the

assumption of independence of errors and uses an estimate of the errors’ interrelationships. In the Ordinary-Least-

Squares (OLS) estimation of a forecasting model, it is assumed that errors are independent of each other and do not

suffer from heteroscedasticity. Whether GLS is useful to forecasters has not been established. OLS generally

provides sufficient accuracy.

Genetic algorithm. A class of computational heuristics that simulate evolutionary processes using insights from

population dynamics to perform well on an objective function. Some analysts speculate that competition among

forecasting rules will help to develop a useful forecasting model, but it is difficult to find empirical support for that

viewpoint.

Global assessment. An overall estimate (in contrast to an explicit estimate of parts of a problem). An expert

forecast made without an explicit analysis. (See also intuition.)

Goodness of fit. A measure of how well a model explains historical variations in calibration data. PoFxxx

Growth cycle. See deviation cycle.

Growth forces. Forces that tend to drive a series up. For example, actively marketing a product and participating in

a developing market are growth forces. Growth forces could be found for products such as computers since the

1960s. PoFxxx

Heteroscedasticity. Nonconstant variances in a series (e.g., differing variability in the error terms over the range of

data). Often found when small values of the error terms correspond to small values of the original time series and

large error terms correspond to large values. This makes it difficult to obtain good estimates of parameters in

econometric models. It also creates problems for tests of statistical significance. Log-log models generally help to

reduce heteroscedasticity in economic data.

Heuristic. From the Greek word, meaning to discover or find. Heuristics are trial-and-error procedures for solving

problems. They are simple mental operations that conserve effort. Heuristics can be used in representing expert

systems.

Hierarchical model. A model made up of submodels of a system. For example, a hierarchical model of a market

like automobiles could contain models of various submarkets, like types of automobiles, then brands.

Hierarchy of effects. A series of psychological processes through which a person becomes aware of a new product

or service and ultimately chooses to adopt or reject it. Hierarchy of effects models can be used to forecast behavioral

changes, such as programs to reduce smoking. These processes consist of sequential stages, including awareness,

knowledge, liking, preference, and choice. Forecasting models can be developed for each of these stages by

including policy variables critical to that stage (e.g., promotions for awareness, informational advertising for

knowledge, and comparative advertising for liking).

Hindsight bias. A tendency to exaggerate in hindsight how accurately one predicted or would have been able to

predict by foresight. Sometimes referred to as the “I knew it all along” effect. Forecasters usually "remember" that

the forecasts were more accurate than they actually were. Because of hindsight bias, experts may be overconfident about later forecasts. To

reduce hindsight bias, ask forecasters to explicitly consider how past events might have turned out differently. Much

research on hindsight bias was apparently stimulated by Fischhoff (1975), which was cited by about 400 academic

studies as of the end of 1999. A meta-analysis was published by Christensen-Szalanski (1991). For a discussion of

principles relating hindsight bias to forecasting, see Fischhoff (2001). PoFxxx

Hit rate. The percentage of forecasts of events that are correct. For example, in conjoint analysis, the hit rate is the

proportion of correct choices among alternative objects in a holdout task.

Holdout data. Data withheld from a series that are not used in estimating parameters. These holdout data can then

be used to compare alternative models. See post-sample evaluation and ex ante forecast. For a discussion of the

types of holdout data, see Armstrong (2001d).

Holdout task. In conjoint analysis, respondents use holdout data to make choices from sets of alternative objects

described on the same attributes (Wittink and Bergestuen 2001). Ideally, holdout choice sets have characteristics

that resemble actual choices respondents will face in the future.

Holt's exponential smoothing method. An extension of single exponential smoothing that allows for trends in the

data. It uses two smoothing parameters, one for the level and one for the trend. (See discussion in Armstrong 2001c.)
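
A compact sketch of the two-parameter recursion (illustrative only; the initialization shown is one common choice, not the only one):

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear method: one smoothing parameter (alpha) for the level
    and one (beta) for the trend; forecasts extrapolate the final level
    and trend h periods ahead."""
    level, trend = series[0], series[1] - series[0]
    for obs in series[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]
```

On a perfectly linear series the method locks onto the trend exactly, whatever the smoothing parameters.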

Holt-Winters' exponential smoothing method. An extension of Holt's exponential smoothing method that includes

seasonality (Winters 1960). This form of exponential smoothing can be used for less-than-annual periods (e.g., for

monthly series). It uses smoothing parameters to estimate the level, trend, and seasonality. An alternative approach

is to deseasonalize the data (e.g., via Census Program X-12), and then use exponential smoothing. There is little

evidence on which seasonality procedure is most accurate. See state-space model.

Homoscedasticity. Variability of error that is fairly constant over the range of the data.

Horizon. See forecast horizon.

THE FORECASTING DICTIONARY 25

Identification. A step in building a time-series model for ARMA and ARIMA in which one uses summary

statistics, such as autocorrelation functions or partial autocorrelation functions, to select appropriate models for the

data. The term is also used for econometric models.

Illusion of control. An erroneous belief that one can control events. People who have no control over events often

think they can control them. As Mark Twain said in describing a fight: “Thrusting my nose firmly between his teeth,

I threw him heavily to the ground on top of me.” Even gamblers have an illusion of control (Langer and Roth 1975).

Inconsistent trends. A condition for time series when the basic (long-term) trend and the recent (short-term) trend

are forecasted to be in opposite directions. When it occurs, trend extrapolation is risky. One strategy is to blend the

two trends as one moves from the short to the long term. A more conservative strategy is to forecast no trend. For

evidence on how inconsistent trends affect forecast errors, see Armstrong, Adya and Collopy (2001). See also

consistent trends. PoFxxx

Independent variable. A variable on the right-hand side of a regression. It can be used as a predictor. It includes

time, prior values of the dependent variable, and causal variables. See explanatory variable.

Index numbers. Numbers that summarize the level of economic activity. For example, the Federal Reserve Board

Index of Industrial Production summarizes a number of variables that indicate the overall level of industrial

production activity. Index numbers can control for scale in forecasting.

Index of Predictive Efficiency (IPE). IPE = (E1 – E2)/E1, where E1 is the error for the benchmark forecast, which

might be based, say, on the method currently used, and E2 is the error for the proposed method. The measure was proposed by the sociologists Ohlin and

Duncan (1949), for cross-sectional data. The comparison to a benchmark is also used in Theil’s U and in the

Relative Absolute Error.
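The IPE calculation can be sketched as follows (using mean absolute error as the error summary is an assumption for illustration; any common error measure could play the role of E1 and E2):

```python
def ipe(benchmark_errors, model_errors):
    """Index of Predictive Efficiency: (E1 - E2) / E1, where E1 is the
    benchmark method's error and E2 the proposed method's error.
    Mean absolute error summarizes each set of forecast errors here."""
    e1 = sum(abs(e) for e in benchmark_errors) / len(benchmark_errors)
    e2 = sum(abs(e) for e in model_errors) / len(model_errors)
    return (e1 - e2) / e1
```

A positive IPE means the proposed method reduced the benchmark's error; halving the error gives an IPE of 0.5.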

Inductive technique. A technique that searches through data to infer statistical patterns and relationships. For

example, judgmental bootstrapping induces rules based on forecasts by an expert.

Initializing. The process of selecting or estimating starting values when analyzing calibration data.

Innovation. In general, something new. Forecasters use the term to refer to the disturbance term in a regression or to

an event that causes change in a time series. (Also see diffusion.)

Input-output analysis. An examination of the flow of goods among industries in an economy or among branches of

an organization. An input-output matrix is used to show interindustry or interdepartmental flows of goods or

services in the economy, or in a company and its markets. The matrix can be used to forecast the effects of a change

in one industry on other industries (e.g., the effects of a change in oil prices on demand for cars, then steel sales,

then iron ore, and then limestone.) Although input-output analysis led to one Nobel prize (Wassily Leontief’s in

1964), its predictive validity has not been well-tested. However, Bezdek (1974), in his review of 16 input-output

forecasts in seven countries made between 1951 and 1972, concluded that input-output forecasts were more accurate

than those from alternative techniques.

Instabilities. Changes resulting from unidentified causes in the pattern of a time series, such as a discontinuity or a

change in the level, trend, or seasonal pattern.

Integrated. A characteristic of time-series models (the I in ARIMA models) in which one or more of the

differences of the time-series data are included in the model. The term integrated is used because the original series

may be recreated from a differenced series by summation.

Intentions survey. A survey of how people say they will act in a given situation. See also expectations surveys and

Juster scale. Especially useful for new products, but also used to supplement behavioral data (such as sales) as

shown in Armstrong, Morwitz and Kumar (2000). See Morwitz (2001). PoFxxx


Interaction. A relationship between a predictor variable (X1) and the dependent variable (Y) that depends upon the

level of another predictor variable (X2). (There may be main effects as well.) To address problems containing

interaction, consider a program such as AID. It is difficult to find evidence that interaction terms in regression

analysis contribute to forecast accuracy.

Intercept. The constant term in regression analysis. The regression’s intersection with the Y-axis. If the explanatory

variable X is 0, then the value of the forecast variable, Y, will be the intercept value. The intercept has no meaning in

the traditional log-log model; it is simply a scaling factor.

Interdependence. A characteristic of two or more variables that are mutually dependent. Thus, a change in the

value of one of the variables would correlate with a change in the value of the other variable. However, correlation

does not imply interdependence.

Intermittent demand. See intermittent series.

Intermittent series. A term used to denote a time series of non-negative integer values where some values are zero.

For example, shipments to a store may be zero in some periods because a store’s inventory is too large. In this case,

the demand is not zero, but it would appear to be so from the data. Croston’s method (Croston 1972) was proposed

for this situation. It contains an error that was corrected by Rao (1973). Willemain et al. (1994) provide evidence

favorable to Croston’s method. Other procedures such as aggregating over time can also be used to solve the

problem. See Armstrong (2001c). PoFxxx
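A minimal sketch of the basic Croston procedure (the smoothing parameter and initialization are illustrative assumptions; the Rao correction is not included):

```python
def croston(series, alpha=0.1):
    """Croston's method for intermittent demand (illustrative sketch).

    Nonzero demand sizes and the intervals between nonzero demands are
    smoothed separately; the per-period demand forecast is their ratio.
    Assumes the series contains at least one nonzero value.
    """
    size = interval = None
    periods_since = 1
    for y in series:
        if y > 0:
            if size is None:  # initialize on the first nonzero demand
                size, interval = float(y), float(periods_since)
            else:
                size = alpha * y + (1 - alpha) * size
                interval = alpha * periods_since + (1 - alpha) * interval
            periods_since = 1
        else:
            periods_since += 1
    return size / interval
```

For a series of 4 units every second period, the forecast converges to 2 units per period, where simple exponential smoothing would oscillate.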

Interpolation. The process of using some observations to estimate missing values in a series.

Interrater reliability. The amount of agreement between two or more raters who follow the same procedure. This is

important for judgmental forecasting or for assessing conditions in a forecasting problem or when using judgmental

inputs for an econometric model.

Interrupted series. See intermittent series.

Interval scale. A measurement scale where the intervals are meaningful, but the zero point of the scale is not

meaningful (e.g., the Fahrenheit scale for temperature).

Intervention analysis. A procedure to assess the effects on the forecast variable of large changes such as a new

advertising campaign, strike, or reduced tax. Intervention models can use dummy variables to represent

interventions.

Intuition. A person’s immediate apprehension of an object without the use of any reasoning process. An

unstructured judgmental impression. Intuitions may be influenced by subconscious cues. When one has much

experience and there are many familiar cues, intuition can lead to accurate forecasts. However, it is difficult to find

published studies in which intuition is superior to structured judgment.

Ipsative scores. An individual’s rating of the relative importance of an item compared with other items. Ipsative

scores do not allow for comparisons among people; e.g., Lloyd likes football better than basketball, while Bonnie

likes basketball better than football. Does Bonnie like basketball better than Lloyd likes basketball? You do not have

enough information to answer that question. Hence, when using intentions or preferences to forecast, ipsative scores

can be misleading and difficult to interpret. Guard against this problem by finding other ways for framing questions.

Irregular demand. See intermittent series.

Jackknife. A procedure for testing predictive validity with cross-sectional data or longitudinal data. Use N-1

observations to calibrate the forecasting model, then make a forecast for the remaining observation. Replace that

observation and draw a new observation. Repeat the process until predictions have been made for all observations.

Thus, with a sample of 57 observations, you can make an out-of-sample forecast for each of the 57 observations.

This procedure is also called N-way cross validation.
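The procedure can be sketched as follows (the mean is used as a stand-in forecasting model, an assumption for illustration; any fit function returning a forecast could be substituted):

```python
def jackknife_errors(observations, fit=lambda data: sum(data) / len(data)):
    """N-way cross-validation: fit the model on N-1 observations and
    forecast the held-out one, repeating for every observation.
    The default 'model' is simply the mean of the calibration data."""
    errors = []
    for i, actual in enumerate(observations):
        rest = observations[:i] + observations[i + 1:]  # drop observation i
        errors.append(actual - fit(rest))               # out-of-sample error
    return errors
```

With observations [1, 2, 3] this yields the three out-of-sample errors −1.5, 0.0, and 1.5, one per held-out observation.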


Judgmental adjustment. A subjective change that a forecaster makes to a forecast produced by a model. Making

such changes is controversial. In psychology, extensive research on cross-sectional data led to the conclusion that

one should not subjectively adjust forecasts from a quantitative model. Meehl (1954) summarized a long stream of

research on personnel selection and concluded that employers should not meet job candidates because that would

lead them to improperly adjust a model’s prediction as to their success. In contrast, studies on economic time series

show that judgmental adjustments sometimes help, although mechanical adjustments seem to do as well. Armstrong

(1985, pp. 235-238) summarizes seven studies on this issue. The key is to identify the conditions under which to

make adjustments. Adjustments seem to improve accuracy when the expert has knowledge about the level.

Judgmental adjustments are common. According to Sanders and Manrodt’s (1990) survey of forecasters at 96 US

corporations, about 45% of the respondents claimed that they always made judgmental adjustments to statistical

forecasts, while only 9% said that they never did. The main reasons the respondents gave for revising quantitative

forecasts were to incorporate “knowledge of the environment” (39%), “product knowledge” (30%), and “past

experience” (26%). While these reasons seem sensible, such adjustments are often made by biased experts. In a

survey of members of the International Institute of Forecasters, 269 respondents were asked whether they agreed

with the following statement: “Too often, company forecasts are modified because of political considerations.” On a

scale from 1 = “disagree strongly” to 7 = “agree strongly,” the mean response was 5.4. (Details on the survey are

provided in Yokum and Armstrong 1995.) In Fildes and Hastings’ (1994) survey of 45 managers in a large

conglomerate, 64% of them responded “forecasts are frequently politically motivated.” For a discussion on

principles for making subjective adjustments of extrapolations, see Sanders and Ritzman (2001). PoFxxx

Judgmental bootstrapping. An inductive method of assessing how a person makes a judgmental decision or

forecast. The model is inferred statistically by regressing the factors used by an expert against the expert’s forecasts.

The procedure can also be used for forecasts by a group. See Armstrong (2001b). PoFxxx

Judgmental extrapolation. A subjective extension of time-series data. A time series extended by freehand, also

known as bold free hand extrapolation (BFE). This can be done by domain experts, who can use their knowledge as

well as the historical data. Most research to date, however, has been done with subjects having no domain

knowledge. Interestingly, naive extrapolations have often proven to be as accurate as quantitative extrapolations,

perhaps because subjects see patterns that are missed by the quantitative methods. This finding is difficult to believe.

In fact, the first paper reporting this finding was soundly rejected by the referees and was published only because the

editor, Spyros Makridakis, overrode the referees. The paper (Lawrence, Edmundson and O’Connor 1985) went on to

become one of the more highly cited papers in the IJF and it stimulated much useful research on the topic.

Judgmental extrapolations can sometimes be misleading. In a series of studies, Wagenaar (1978) showed that people

can misperceive exponential growth. For a simple example, ask people to watch as you fold a piece of paper a few

times. Then ask them to guess how thick it will be if you fold it another 40 times. They will usually reply that it will

be a few inches, some say a few feet, and occasionally someone will say a few miles. But if they calculated it, they

would find that it would extend past the moon. Despite the above findings, when the forecaster has substantial

domain knowledge, judgmental extrapolation may be advantageous, especially when large changes are involved. For

a discussion of principles related to judgmental extrapolation, see Webby, O’Connor and Lawrence (2001).
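The paper-folding arithmetic is easy to check (the 0.1 mm sheet thickness and the four initial folds are illustrative assumptions):

```python
# Thickness doubles with every fold.
thickness_mm = 0.1 * 2 ** 4  # a 0.1 mm sheet after 4 initial folds
# 40 more folds, converted from millimeters to kilometers.
thickness_km = thickness_mm * 2 ** 40 / 1_000_000
# Roughly 1.8 million km -- more than four times the distance to the moon
# (about 384,400 km), not the few inches most people guess.
```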

Judgmental forecasting. A subjective integration of information to produce a forecast. Such methods can vary from

unstructured to highly structured.

Judgmental revision. See judgmental adjustment.

Jury of executive opinion. Expert opinions produced by executives in the organization.

Juster scale. An 11-point scale for use in expectations surveys and intentions surveys. The scale was proposed by

Juster (1964, 1966), who compared an 11-point scale with a 3-point scale (definite, probable, maybe) in measuring

intentions to purchase automobiles. Data were obtained from 800 randomly selected respondents, the long scale

being administered to them a few days after the short scale. Subsequent purchasing behavior of these respondents

indicated that the longer probability scale was able to explain about twice as much of the variance among the

subsequent behavior of the judges as was the shorter scale. In addition, the mean value of the probability distribution

for the 800 respondents on the 11-point scale provided a better estimate of the purchase rate for this group than the

short scale. Day et al. (1991) concluded that Juster’s 11-point purchase probability scale provides substantially better

predictions of purchase behavior than intention scales. They based their conclusion on the evidence from their two


New Zealand studies and prior research by Juster (1966), Byrnes (1964), Stapel (1968), and Gabor and Granger

(1972). PoFxxx

Kalman filter. An estimation method (for fitting the calibration data) based on feedback of forecast errors that

allows model parameters to vary over time. (See state space model.)

Kendall rank correlation. A nonparametric measure of the association between two sets of rankings. It is an

alternative to the Spearman rank correlation. Siegel and Castellan (1988) describe this measure and its power. This

statistic is useful for comparing methods when the number of forecasts is small, the distribution of the errors is

unknown, or outliers exist, such as with financial data. (See statistical significance.)
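Without ties, the statistic reduces to a count of concordant and discordant pairs of rankings; a minimal sketch (tau-a, with no tie correction, an assumption for illustration):

```python
def kendall_tau(x, y):
    """Kendall's tau-a for two rankings without ties:
    (concordant pairs - discordant pairs) / total pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1    # the pair is ordered the same way
            elif s < 0:
                discordant += 1    # the pair is ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)
```

The statistic runs from +1 (identical rankings) to −1 (exactly reversed rankings).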

Lag. A difference in time between an observation and a previous observation. Thus, Yt-k lags Yt by k periods. See

also lead.

Lagged values. See lag.

Lagging index. A lagging index is a summary measure of aggregate economic activity. The last measured

indication of a business cycle turning point is sometimes an indication of the next business cycle turn. Some people

speculate that the lagging index, when inverted, might anticipate the next business cycle turn.

Lead. A difference in time between an observation and a future observation. Thus, Yt+k leads Yt by k periods. See

also lag.

Lead time. The time between two related events. For example, in inventory and order entry systems, the lead time is

the interval between the time an order is placed and the time it is delivered (also called delivery time).

Leading indicator. An economic indicator whose peaks and troughs in the business cycle are thought to lead

subsequent turning points in the general economy or some other economic series. But do they really? Here is what

William J. Bennett, former U.S. Secretary for Education, said about the U.S. Census Bureau’s Index of Leading

Economic Indicators in the Wall Street Journal on 15 March 1993: "These 11 measurements, taken together,

represent the best means we now have of . . . predicting future economic trends." This appears to be a common

viewpoint on leading economic indicators. Research on leading economic indicators began in the late 1930s. In

1950, an index of eight leading indicators was developed using data from as far back as 1870. Use of the method

spread to at least 22 countries by the end of the century. By the time the U.S. Commerce Department turned the

indicators over to the Conference Board in the early 1990s, there had been seven revisions to improve the data.

There has long been criticism of leading indicators. Koopmans (1947), in his review of Burns and Mitchell’s early

work, decried the lack of theory. Few validation studies have been conducted. Auerbach (1982), in a small-scale test

involving three-month-ahead ex-ante forecasts of unemployment, found that the use of leading indicators reduced

the RMSE slightly in tests covering about 24 years. Diebold and Rudebusch (1991) examined whether the addition

of information from the Composite Leading Index (CLI) can improve upon extrapolations of industrial production.

They first based the extrapolations on regressions against prior observations of industrial production and developed

four models. Using monthly data from 1950 through 1988, they then prepared ex ante forecasts for one, four, eight,

and twelve periods ahead using successive updating. The extrapolations yielded a total of 231 forecasts for each

model for each forecast horizon. The results confirmed prior research showing that ex post forecasts are improved

by use of the CLI. However, inclusion of CLI information reduced ex ante forecast accuracy, especially for short-

term forecasts (one to four months ahead). Their findings are weak as they come from a single series. In general

then, while leading indicators are useful for showing where things are now, we have only weak evidence to support

their use as a forecasting tool. For more on leading indicators, see Lahiri and Moore (1991). PoFxxx

Least absolute values. Regression models are usually estimated using Ordinary Least Squares (OLS). An

alternative method is to minimize the sum of absolute errors between the actual observation and its “predicted”

(fitted) value for calibration data, a procedure known as least absolute value estimation (LAV). According to

Dielman (1986), the LAV method as a criterion for best fit was introduced in 1757. About half a century later, in

1805, least squares was developed. Using Monte Carlo simulation studies, Dielman concluded that, in cases in


which outliers are expected, LAV provides better forecasts than does least squares and is nearly as accurate as least

squares for data that have normally distributed errors.

Least squares estimation. The standard approach for estimating parameters in a regression analysis, based on

minimizing the sum of the squared deviations between the actual and fitted values of the criterion (dependent)

variable in the calibration data. (See Ordinary Least Squares.)

Lens model. A conceptual model, proposed by Brunswik (1955), that shows how an expert receives feedback in a

situation. The model is related to judgmental bootstrapping and econometric methods, as shown here.

[Figure: The Brunswik Lens Model of Feedback. Causal variables X1, X2, X3, and X4 are linked to the actual results (A) through relationships b1, …, b4 (the econometric model) and to the judge’s forecasts (F) through relationships b̂1, …, b̂4 (the judgmental bootstrapping model); a dashed line feeds the error (A – F) back to the judge.]

The X’s are causal variables. The solid lines represent relationships. The b’s represent estimated relationships

according to the actual data, while the b̂’s represent relationships as seen by the judge. The dashed line represents

feedback on the accuracy of the judge’s predictions. The judgmental bootstrapping model can provide feedback to

the judge on how she is making forecasts. The econometric model provides information on the actual relationships.

Actual outcomes and a record of forecasts are needed to assess accuracy. Given that the econometric model provides

better estimates of relationships, one would expect that such feedback would be the most effective way to improve

the accuracy of an expert’s forecasts. Newton (1965), in a study involving the prediction of grade-point averages for

53 students, found that feedback from the econometric model was more effective in improving accuracy than was

feedback about accuracy or information from the bootstrapping model. For a further discussion on the use of the lens

model in forecasting, see Stewart (2001).

Level. The value of a time series at the origin of the forecast horizon (i.e., at time t0). The current situation.

Lewin’s change process. Efforts to implement change should address three phases: Unfreezing, change, and

refreezing. In discussing this process, Lewin (1952) used an analogy to ice; it is difficult to change the shape of ice

unless you first unfreeze it, then change it and refreeze it. Similarly, when trying to introduce a new forecasting

procedure, first ask the clients what they are willing to change (unfreezing). To change, propose experiments.

Refreezing involves rewarding new behavior (e.g., showing that the new forecasting procedure continues to be

useful). For the change to succeed, the clients should have control over the three stages (for example, they would

define how to determine whether the new forecasting method was successful). A number of studies show that

change efforts in organizations are more successful when they address the three phases explicitly (e.g., see review of

studies provided in Armstrong 1982b). This process can also be used when seeking changes as a result of a forecast.

PoFxxx

Linear model. A term used (especially by psychologists) to denote a regression model. The linear model is typically

based on causal relationships that are linear in the parameters. In other words, the variables might be transformed in

various ways, but these transformed variables are related to each other in a linear fashion, such as Y = a + b1x1 +

b2x2. See econometric model.


Ljung-Box test. A version of the Box-Pierce test for autocorrelated errors.

Local trend. See recent trend.

Logarithmic transformation. By taking logs of the dependent and explanatory variables, one might be able to

remove heteroscedasticity and to model exponential growth in a series. In such a model, the coefficients represent

elasticities that are constant over the forecast range; this is a standard assumption in economics.

Logistic. A special case of diffusion in which the probability of a population member adopting an innovation is

proportional to the number of current adopters within the population. It is a mathematical representation of

“keeping up with the Joneses.” If the number of adopters is Yt and a is the saturation level, then the equation

Yt = a / (1 + c e^(–bt))

describes the growth of the number of adopters of the innovation over time (b and c are constants controlling the rate

of growth). For a discussion of the logistic and related diffusion curves for forecasting, see Meade and Islam (2001).
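The curve is easily computed; a sketch (the parameter values in the usage note below are arbitrary illustrations):

```python
import math

def logistic(t, a, b, c):
    """Logistic diffusion curve Yt = a / (1 + c * exp(-b * t)), where a
    is the saturation level and b, c are constants controlling the rate
    of growth."""
    return a / (1 + c * math.exp(-b * t))
```

For example, with a = 1000, b = 0.5, and c = 20, adoption starts near 48, passes the halfway point around t = ln(c)/b ≈ 6, and then flattens out as it approaches the saturation level of 1,000.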

Logit. A transformation used when the values for the dependent variable are bounded by zero and one, but are not

equal to zero or one. (The log of zero is minus infinity and it cannot be computed.) Thus, it is appropriate for series

based on percentages, such as market-share predictions. Transform the dependent variable as follows:

logit (Y) = log ( p / (1 – p) )
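The transform and its inverse (for converting logit-scale forecasts back to proportions) can be sketched as:

```python
import math

def logit(p):
    """Logit transform of a proportion p strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inverse_logit(x):
    """Back-transform a logit-scale value to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-x))
```

Forecasting on the logit scale and back-transforming guarantees that predictions, such as market shares, stay between 0 and 1.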

Log-log model. A model that takes the logs (to the base e or base 10) of the Y and X variables. (See logarithmic

transformation.) Econometric models are often specified as log-log under the assumption that elasticities are

constant. This is done to better represent behavioral relationships, to make the results easier to interpret, and to

permit a priori analysis.

Longitudinal data. Data that represent a collection of values recorded between at least two times for a number of

decision units. (See panel data.) For example, one might examine data on 30 countries in 1950 and on the same

countries in 2001 in order to determine whether changes in economic well-being are related to reported happiness

levels.

Long range. The period of time over which large changes are expected. Long range for the bread industry might be

20 years, while long range for the internet industry might be one year.

Long-run effect. The full effect that a change in a causal variable has on the dependent variable. In a regression

model, where Y = a + bX, a shift in X has an instantaneous effect (of b) on Y. In dynamic regression, there are lags

in either X or Y in the model. A shift in X also has a long-run effect, which may either amplify or damp the short-run

effect. When using causal variables in a forecasting model, one is typically concerned with long-run effects. Thus, it

is inadvisable to formulate a model on first differences.

Long-run relationship. An effect of a predictor (X) on the dependent variable (Y) that is expected to hold over a

long forecast horizon. (See long-run effect.)

Long waves. Very long-term business cycles. A Russian economist, Nikolai D. Kondratieff, introduced the term in a

series of papers in the 1920s arguing that “on the basis of the available data, the existence of long waves of cyclical

character is very probable.” Kondratieff (1935) presented no theory as to why cycles of 40 to 60 years should be

characteristic of capitalist countries, but he did associate various “empirical characteristics” with phases of his long

waves, which he professed to find in France, England, the United States, Germany, and the “whole world.”

According to his predictions, a long decline would have begun in the 1970s and continue until the first decade of the

21st century. People actually paid attention to such strange ideas.


Loss function. An expression that represents the relationship between the size of the forecast error and the

economic loss incurred because of that error. PoFxxx

MAD (Mean Absolute Deviation). An estimate of variation. It is an alternative to the standard deviation of the

error. The ratio of standard deviation to MAD is 1.25 for normal distributions, and it ranges from 1.0 to 1.5 in

practice. See Mean Absolute Error.

Market potential. The maximum total sales that might be obtained for a given product. (Also see saturation level.)

Markov chains. A method of analyzing the pattern of decision-making units in moving from one behavior state to

another. Construct a transition matrix to show the proportion of times that the behavior in one trial will change

(move to another state) in the next trial. If the transition process remains stable and if the sample of actors is

representative of the entire population, the matrix can be used to forecast changes. However, there is a problem.

Forecasts are most useful when changes occur. But given the assumption of stability, Markov chains are risky for

predicting behavior when organizations make efforts to change behavior and thus to change the transition matrix.

Markov chains have been recommended for predictions in marketing when people are assumed to go through

various states in using a product (e.g., trial, repeat purchase, and adoption) and for cases in which consumers

purchase different brands. Early published applications of Markov chains covered problems such as predicting

changes in the occupational status of workers, identifying bank loans that will go into default, and forecasting sales

in the home-heating market. Despite many research publications on Markov chains, I have been unable to find

accounts of research that supports their predictive validity. Armstrong and Farley (1969) compared Markov chains

with simple extrapolations in forecasting store visits; the Markov chains produced no gains in accuracy. PoFxxx
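A transition-matrix forecast can be sketched as follows (the two-state matrix in the usage note is hypothetical):

```python
def markov_forecast(state_probs, transition, periods=1):
    """Forecast the distribution over behavior states after `periods`
    steps by repeatedly applying a row-stochastic transition matrix.
    Assumes the matrix stays stable over time -- the key weakness
    noted above."""
    n_states = len(transition[0])
    for _ in range(periods):
        state_probs = [
            sum(state_probs[i] * transition[i][j]
                for i in range(len(state_probs)))
            for j in range(n_states)
        ]
    return state_probs
```

For example, with states (buys brand A, buys another brand) and transition rows [0.9, 0.1] and [0.5, 0.5], a customer currently buying brand A has forecast probabilities of about [0.86, 0.14] two periods ahead.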

Martingale. A sequence of random variables for which the expected value of the series in the next time period is

equal to the actual value in the current time period. A martingale allows for non-constant variance; a random walk

does not.

Maximum likelihood estimation. A method of estimating the parameters in an equation by maximizing the

likelihood of the model given the data. For regression analysis with normally distributed errors, maximum likelihood

estimation is equivalent to Ordinary Least Squares estimation.

M-Competition. The term used for the series of three comparative studies of extrapolation methods organized by

Spyros Makridakis, starting with the 1,001 time-series competition in Makridakis et al. (1982) and including

Makridakis et al. (1993) and Makridakis and Hibon (2000). In each study, a number of different experts prepared

extrapolations for holdout data. The accuracies of the various methods were then compared by the study’s lead

author. Raw data and information about these competitions can be found at hops.wharton.upenn.edu/forecast.

PoFxxx

Mean Absolute Deviation. See MAD and mean absolute error.

Mean Absolute Error (MAE). The average error when ignoring signs. This can be useful in assessing the cost of

errors, such as for inventory control (also called MAD).

Mean Absolute Percentage Error (MAPE). The average of the sum of all the percentage errors for a data set,

taken without regard to sign. (That is, the absolute values of the percentage errors are summed and the average is

computed.)

Mean Percentage Error (MPE). The average of all of the percentage errors for a given data set. The signs are

retained, so it serves as a measure of bias in a forecasting method.

Mean Squared Error (MSE). The sum of the squared forecast errors for each of the observations divided by the

number of observations. It is an alternative to the mean absolute deviation, except that more weight is placed on

larger errors. (See also Root Mean Square Error.) While MSE is popular among statisticians, it is unreliable and

difficult to interpret. Armstrong and Fildes (1995) found no empirical support for the use of the MSE or RMSE in

forecasting. Fortunately, better measures are available as discussed in Armstrong (2001d).
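The error measures defined above (MAE, MAPE, MPE, MSE) can be computed together; a sketch (MAPE and MPE assume no actual value is zero):

```python
def error_measures(actual, forecast):
    """MAE (= MAD), MAPE, MPE, and MSE for paired actual and forecast
    values. MPE retains signs, so it indicates bias; the others do not."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(e / a) for e, a in zip(errors, actual)) * 100 / n,
        "MPE": sum(e / a for e, a in zip(errors, actual)) * 100 / n,
        "MSE": sum(e * e for e in errors) / n,
    }
```

Note how offsetting over- and under-forecasts drive MPE toward zero while MAE, MAPE, and MSE remain positive, and how MSE weights the larger error more heavily.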


Measurement error. Failures, mistakes, or shortcomings in the way a concept is measured.

Measurement model. A model used to obtain estimates of parameters from data. For example, an estimate of price

elasticity for a product from household survey data. The measurement model is not the same as the forecasting

model.

Median. The value of the middle item in a series of items arranged in order of magnitude. For an even number of

items, it is the average of the two in the middle. Medians are often useful in forecasting when the historical data or

the errors contain outliers.

Meta-analysis. A systematic and quantitative study of studies. In meta-analysis, an “observation” is a finding from a

study. Meta-analysis was applied in 1904 by Karl Pearson, who combined data from British military tests and

concluded that the then-current practice of vaccination against intestinal fever was ineffective (Mann 1994).

Although meta-analysis had also been used for decades in personnel psychology, Glass (1976) introduced the term.

In meta-analysis, one uses documented procedures to (1) search for studies, (2) screen for relevant studies, (3) code

results (a survey of the authors of the studies can be used to help ensure that their findings have been properly

coded), and (4) provide a quantitative summary of the findings. The primary advantages of meta-analysis are that it

helps to obtain all relevant studies and that it uses information in an objective and efficient manner. Cooper and

Rosenthal (1980) found that meta-analysis was more effective than traditional (unstructured) literature reviews.

Meta-analyses are useful in making generalizations, such as which forecasting method is best in a given situation.

Meta-analyses are also useful when estimating relationships for an econometric model (see a priori analysis). When

aggregating results across studies with small sample sizes, it may be useful to follow the procedures for assessing

statistical significance described by Rosenthal (1978). Since 1980, meta-analysis has been popular in many fields.

Mini-Delphi. See Estimate-Talk-Estimate.

Misspecification test. A test that indicates whether the data supporting the building of the model violate

assumptions. When an econometric model is estimated, for example, it is generally assumed that the error term is

independent of other errors (lack of autocorrelation) and of the explanatory variables, and that its distribution has a

constant variance (homoscedasticity).

Mitigation. The reduction of the effects of a factor on a forecast. It is useful to mitigate the forecast of changes

when one faces uncertainty in the forecast. In econometric models, this can be done by reducing the magnitude of a

relationship or by reducing the amount of change that is forecast in the explanatory variable. It is difficult to find

studies on mitigation. However, in Armstrong (1985, pp. 238-242), mitigation produced large and statistically

significant error reductions for predictions of camera sales in 17 countries over a six-year horizon. The concept has

been valuable in extrapolation, where it is called damping. This term is similar to the term shrinking, and it avoids

confusion with the term shrinkage.
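A minimal sketch of damping in an extrapolation, assuming a simple additive trend and a geometric damping factor (both the data and the factor of 0.9 are illustrative assumptions):

```python
def damped_trend_forecast(level, trend, horizon, phi=0.9):
    """Damped (mitigated) trend: each step ahead adds phi**h times the
    estimated trend, so the forecast change is progressively reduced."""
    forecasts = []
    f = level
    for h in range(1, horizon + 1):
        f += (phi ** h) * trend
        forecasts.append(f)
    return forecasts

# With phi = 1 this is an ordinary linear trend; smaller phi mitigates more.
print(damped_trend_forecast(100.0, 5.0, 3))  # [104.5, 108.55, 112.195]
```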

Model. A representation of the real world. In forecasting, a model is a formal statement about variables and

relationships among variables.

Monte Carlo simulation. A procedure for simulating real-world events. First, the problem is decomposed; then a

distribution (rather than a point estimate) is obtained for each of the decomposed parts. A trial is created by drawing

randomly from each of the distributions. The procedure is repeated for many trials to build up a distribution of

outcomes. Monte Carlo simulation can be used to estimate prediction intervals.
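A minimal sketch, assuming a hypothetical profit forecast decomposed into unit sales, price, and unit cost (the distributions are illustrative assumptions, not recommendations):

```python
import random

random.seed(1)  # for reproducibility

def simulate_profit():
    # Decomposed parts; each distribution here is a hypothetical assumption
    units = random.gauss(1000, 100)    # unit sales
    price = random.uniform(9.0, 11.0)  # selling price
    cost = random.gauss(7.0, 0.5)      # unit cost
    return units * (price - cost)

# Many trials build up a distribution of outcomes
trials = sorted(simulate_profit() for _ in range(10_000))
lower = trials[int(0.05 * len(trials))]  # 5th percentile
upper = trials[int(0.95 * len(trials))]  # 95th percentile
print(f"90% prediction interval: [{lower:,.0f}, {upper:,.0f}]")
```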

Months for Cyclical Dominance (MCD). The number of months, on average, before the cyclical change dominates

the irregular movement in a time series. The MCD is designed to offset the volatility in a time series so that cyclical

phases can be seen (Shiskin 1957).

Moving average. An average of the values in the last n time periods. As each new observation is added, the oldest

one is dropped. A smoothed estimate of the level can be used to forecast future levels. Trends can be estimated by

averaging changes in the most recent n' periods (n' and n generally differ). This trend can then be incorporated in the

forecast. The value of n reflects responsiveness versus stability in the same way that the choice of smoothing

constant does in exponential smoothing. For periods of less than a year, if the data are subject to seasonal variations,

n should be large enough to contain full cycles of seasonal factors. Thus, for monthly data, one could use 12, 24, or

36 months, and so on. Differential weights can be applied, as is done by exponential smoothing. PoFxxx
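A minimal sketch of a level forecast from a moving average (data hypothetical):

```python
def moving_average_forecast(series, n):
    """Forecast the next period as the mean of the last n observations.
    As each new observation arrives, the oldest one drops out."""
    if len(series) < n:
        raise ValueError("need at least n observations")
    return sum(series[-n:]) / n

sales = [120, 132, 118, 125, 140, 135]  # hypothetical monthly sales
print(moving_average_forecast(sales, 3))  # (125 + 140 + 135) / 3
```

A larger n gives a more stable but less responsive estimate, just as a smaller smoothing constant does in exponential smoothing.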

Moving origin. See successive updating.

Multicollinearity. A measure of the degree of correlation among explanatory variables in a regression analysis. This

commonly occurs for nonexperimental data. Parameter estimates will lack reliability if there is a high degree of

covariation between explanatory variables, and in an extreme case, it will be impossible to obtain estimates for the

parameters. Multicollinearity is especially troublesome when there are few observations and small variations in the

variables. PoFxxx
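One common diagnostic is the variance inflation factor (VIF). A minimal sketch for the two-explanatory-variable case (data hypothetical); with more variables, the R2 comes from regressing each explanatory variable on all the others:

```python
def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_vars(x1, x2):
    """Variance inflation factor for x1 given one other explanatory
    variable x2: VIF = 1 / (1 - R^2). Values above about 10 are often
    read as a sign of troublesome multicollinearity."""
    r2 = corr(x1, x2) ** 2
    return 1.0 / (1.0 - r2)

price = [10, 11, 12, 13, 14, 15]          # hypothetical explanatory variables
ad_spend = [5.1, 5.4, 6.2, 6.1, 7.0, 7.3]
print(vif_two_vars(price, ad_spend))      # large: the two move together
```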

Multiple correlation coefficient. Often designated as R, this coefficient represents a standardized (unit free) relationship between Ŷ and Y (Ŷ is the result when Y is regressed against explanatory variables X1, X2, . . ., Xk). It is customary to deal with this coefficient in squared form (i.e., R2). See R2 and adjusted R2.

Multiple hypotheses. The strategy whereby a study compares two or more reasonable hypotheses or methods.

Although it goes back to a paper published by T. C. Chamberlin in 1890 (reprinted in Chamberlin 1965), it is used occasionally in the social sciences. Results are often not meaningful in absolute terms, so the value of an approach

(or theory) should be judged relative to current practice or to the next best method (or theory). PoFxxx

Multiple regression. An extension of simple regression analysis that allows for more than one explanatory variable

to be included in predicting the value of a forecast variable. For forecasting purposes, multiple regression analysis is

often used to develop a causal or explanatory model. (See econometric method.)

Multiplicative model. A model in which some terms are multiplied together. An alternative is an additive model.

Multi-state Kalman Filter. A univariate time-series model designed to react quickly to pattern changes. It

combines models using Bayesian estimation.

Multivariate ARMA model. ARMA models that forecast several mutually dependent time series. Each series is

forecast using a function of its own past, the past of each of the other series, and past errors. See dynamic regression

model.

Naive model. A model that assumes things will behave as they have in the past. In time series, the naive model

extends the latest observation (see random walk model). For cross-sectional data, the base rate can serve as a naive

model.
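A minimal sketch of the random-walk version for time series (data hypothetical):

```python
def naive_forecast(series, horizon):
    """No-change (random walk) model: every future value equals
    the latest observation."""
    return [series[-1]] * horizon

demand = [230, 245, 240, 252]  # hypothetical series
print(naive_forecast(demand, 3))  # [252, 252, 252]
```

Despite its simplicity, the naive model is a standard benchmark: a more elaborate method that cannot beat it is of little value.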

Neftci probability approach. A technique for forecasting business-cycle turning points developed by Neftci (1982).

It signals cyclical turning points by calculating the likelihood that the economic environment has changed. A

turning-point probability signal occurs when the estimated probability reaches some preset level of statistical

confidence (say 90% or 95%). The likelihoods are based on (1) the probability that the latest observation comes

from a recession (or a recovery) sample, (2) the chance of recession (or recovery) given the length of the current

cyclical phase in comparison to the historical average, and (3) the comparison of (1) and (2) with the previous month's

probability estimate.

Neural networks. Information paradigms inspired by the way the human brain processes information. They can

approximate almost any function on a closed and bounded range and are thus known as universal function

approximators. Neural networks are black-box forecasting techniques, and practitioners must rely on ad hoc

methods in selecting models. As a result, it is difficult to understand relationships among the variables in the model.

Franses and Van Dijk (2000) describe how to compute elasticities from neural nets. See Remus and O’Connor

(2001). PoFxxx

NGT. See Nominal Group Technique.

Noise. The random, irregular, or unexplained component in a measurement process. Noise can be found in cross-

sectional data as well as in time-series data.

Nominal dollars. Current values of dollars. To properly examine relationships for time-series data, dollar values

should be expressed in real (constant) dollars; that is, they should be adjusted for inflation. A complicating factor for

adjusting is that the U.S. government has overstated inflation by about one percent per year.

Nominal Group Technique (NGT). A group of people who do not communicate with one another as they make

decisions or forecasts. Such groups are used in the Delphi technique, as described by Rowe and Wright (2001).

Nominal scale. Measurement that classifies objects (e.g., yes or no; red, white, or blue; guilty or innocent).

Noncompensatory model. A model that employs a nonlinear relationship combining cues to make a forecast. It is

noncompensatory because low (high) values for some cues cannot be offset in their contribution by high (low)

values in other cues. Conjunctive and disjunctive models are two noncompensatory models.

Nondirective interviewing. A style of interviewing in which the interviewer asks only general questions and

encourages the interviewee to discuss what he considers important. The interviewer probes for additional details and

does not introduce ideas or evaluate what is said. This approach is useful in determining what factors enter into a

person’s decision making. Thus, it could help in identifying variables for judgmental bootstrapping, conjoint

analysis, or econometric models. It can also be useful in developing a structured questionnaire, such as might be

used for intentions surveys. Here are some guidelines for the interview.

Start by explaining what you would like to learn – e.g., “what factors cause changes in the sales of your

primary product?” If a general opener does not draw a response, try something more specific – e.g.,

“perhaps you could describe how product x did last year?”

During the interview:

- Do not evaluate what the interviewee says. If he feels that he is being judged, he is likely to

reveal less.

- Let the interviewee know that you’re interested in what he says and that you understand. To find

out more about a particular subject that is mentioned by the interviewee, ask for elaboration –

e.g., “that’s interesting, tell me more.” Or you may use a reflection of the interviewee’s

comments – “You seem to think that . . .” often picking up the last few words used by the

interviewee.

- Do not interrupt. Let the interviewee carry the conversation once he gets going.

- Do not bring in your own ideas during the interview.

- Do not worry about pauses in the conversation. People may get uncomfortable during pauses,

but do not be in a hurry to talk if it is likely that the interviewee is thinking.

Nonexperimental data. Data obtained with no systematic manipulation of key variables. Regression analysis is

particularly useful in handling such data as it assesses the partial effects of each variable by statistically controlling

for other variables in the equation. If the variables do not vary or the explanatory variables are highly correlated with

one another, nonexperimental data cannot be used to estimate relationships.

Nonlinear estimation. Estimation procedures that are not linear in the parameters. Nonlinear techniques exist for

minimizing the sum of squared residuals. Nonlinear estimation is an iterative procedure, and there is no guarantee

that the final solution is the best for the calibration data. What does this have to do with forecasting in the social

sciences? Little research exists to suggest that nonlinear estimation will contribute to forecast accuracy, while

Occam’s razor suggests that it is a poor strategy.

Nonlinearity. A characteristic exhibited by data that shows substantial inflection points or large changes in trends.

Nonparametric test. A test of statistical significance that makes few assumptions about the distribution of the data.

A nonparametric test is useful for comparing data when some observations (or some forecast errors) are outliers and

when the error distributions depart substantially from normal distributions.

Nonresponse bias. A systematic error introduced into survey research, for example, in intentions surveys, because

some people in the sample do not respond to the survey (or to items in a questionnaire). Because those interested in

the topic are more likely to respond, it is risky to assume that nonresponders would be similar to responders in

reporting about their intentions. To avoid this bias, obtain high response rates. By following the advice in Dillman

(2000), one should be able to achieve well over a 50% response rate for mail surveys, and often as much as 80%. To

estimate nonresponse bias, try to get responses from a subsample of nonrespondents. Armstrong and Overton (1977)

provide evidence showing that an extrapolation of trends across waves in responses to key questions, such as “How

likely are you to purchase . . .?” will help to correct for nonresponse error.
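A minimal sketch of extrapolating across response waves, using a simple average-difference trend (the wave proportions are hypothetical, and Armstrong and Overton describe more refined procedures):

```python
def nonrespondent_estimate(wave_means):
    """Extrapolate the trend in mean responses across successive
    response waves one step beyond the last wave, as a rough
    stand-in for what nonrespondents would have reported."""
    diffs = [b - a for a, b in zip(wave_means, wave_means[1:])]
    avg_step = sum(diffs) / len(diffs)
    return wave_means[-1] + avg_step

# Hypothetical proportions answering "likely to purchase," by wave:
# early respondents are more interested, so the trend declines.
waves = [0.62, 0.55, 0.50]
print(nonrespondent_estimate(waves))  # about 0.44
```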

Nonstationarity. See stationary series.

Nowcasting. Applying a forecasting procedure to obtain an estimate of the current situation or level at the origin.

Nowcasting is especially important when data are subject to much error and when short-term forecasts are needed. It

is also useful when a model may provide a poor estimate of the level; for example, regression analysis often

provides poor estimates of the level at t0 for time-series data. Combined estimates can improve the estimate of the

current level. These can draw upon extrapolation, judgment, and econometric models. Such a procedure can help to

reduce forecast error, as shown in Armstrong (1970). PoFxxx

Null hypothesis. A proposition that is assumed to be true. One examines outcomes (e.g., from an experiment) to

see if they are consistent with the null hypothesis. Unfortunately, the null hypothesis is often selected for its

convenience rather than for its truth. The rejection of an unreasonable null hypothesis (or nil hypothesis) does not

advance knowledge. For example, testing against the null hypothesis that income is unrelated to the sales of

automobiles would be foolish at best and might even be misleading (see statistical significance). Unfortunately, null

hypotheses are frequently misused in science (Hubbard and Armstrong 1992).

Number-of-attribute-levels effect. An artificial result in decompositional conjoint analysis that arises from

increasing the number of (intermediate) levels for an attribute in a conjoint study while holding other attribute levels

constant; this increases the estimated impact of the attribute on preferences. See Wittink and Bergestuen (2001).

N-way cross validation. See jackknife.

Observation. A measurement of a characteristic for a given unit (e.g., person, country, firm) for a given period of

time.

Occam's Razor. The rule that one should not introduce complexities unless absolutely necessary. “It is vain to do with more what can be done with less,” according to William of Occam (or Ockham) of England in the early 1300s.

Occam’s razor applies to theories about phenomena and methods.

OLS. See Ordinary Least Squares.

Omitted variable. An explanatory variable that should be part of a model but has been excluded. Its exclusion can

lead to biased and inefficient estimates of the remaining parameters in the model. Omitting it causes no problem in

the estimation of the included variables if it is constant for the calibration data, or if its variations are uncorrelated

with the included variables. Its exclusion can lead to inaccurate forecasts if it changes over the forecast horizon.

Operational measure. A description of the steps involved in assigning numbers to a variable. It should be specific

enough so others can carry out the same procedure. Ideally, operational procedures are representative of the concept

that is being measured. Even seemingly simple concepts might be difficult to operationalize, such as estimating the

price of computers year by year.

Opposing forces. Forces that are expected to move against the direction of the historical trend. An example is

inventory levels relative to sales: When inventories get too large, holding costs lead managers to reduce their levels,

thus opposing the trend. When inventories are too small, service suffers, prompting decisions to hold larger

inventories, again, opposing the trend. See Armstrong, Adya and Collopy (2001). PoFxxx

Optimism. A state of mind that causes a respondent to forecast that favorable events are more likely to occur than is

justified by the facts. Also known as wishful thinking. This has long been recognized. For example, Hayes (1936)

surveyed people two weeks before the 1932 U.S. presidential election. Of male factory workers who intended to

vote for Hoover, 84% predicted he would win. Of those who intended to vote for Roosevelt, only 6% thought

Hoover would win. Many of us are susceptible to this bias. We think we are more likely to experience positive than

negative events (Plous 1993, pp. 134-135). Warnings about the optimism bias (e.g., “People tend to be too optimistic

when making such estimates”) help only to a minor extent. Analogies may help to avoid optimism. PoFxxx

Ordinal scale. A method of measuring data that allows only for ranking. The intervals between observations are not

meaningful.

Ordinary Least Squares (OLS). The standard approach to regression analysis wherein the goal is to minimize the

sum of squares of the deviations between actual and predicted values in the calibration data. Because of its

statistical properties, it has become the predominant method for regression analysis. However, it has not been shown

to produce more accurate forecasts than least absolute values.
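For the simple one-variable case, the OLS estimates have a closed form. A minimal sketch (data hypothetical):

```python
def ols_fit(x, y):
    """Ordinary least squares for the simple model y = a + b*x,
    minimizing the sum of squared deviations in the calibration data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

x = [1, 2, 3, 4, 5]             # hypothetical calibration data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = ols_fit(x, y)
print(f"intercept={a:.2f}, slope={b:.2f}")
```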

Origin. The beginning of the forecast horizon. (Also, see level.)

Outcome feedback. Information about an outcome corresponding to a forecast. For example, how often does it rain

when the weather forecaster says the likelihood is 60%? (See also lens model.)

Outlier. An observation that differs substantially from the expected value given a model of the situation. An outlier

can be identified j