The structured-analogies method is likely to be useful for forecasting whenever experts know about similar situations from the past or when databases of situations that are more-or-less analogous to the target are available. Structured analogies was developed to forecast the decisions that people will make in conflict situations such as buyer-seller negotiations, employer-union disputes, commercial competition, hostile takeover bids, civil unrest, international trade negotiations, counter-terrorism, and warfare. Decisions in conflict situations are difficult to forecast. For example, when experts use their unaided judgment to make predictions about such situations, their forecasts are no better than guessing. The structured-analogies method makes better use of experts by eliciting in a formal way i) their knowledge about situations that were similar to the target situation, and ii) their judgments of the similarity of these situations to the target. An administrator then analyzes the information the experts provide to derive forecasts. Research to date suggests that these forecasts are likely to be more accurate than forecasts from experts' unaided judgment. The materials in this course mostly relate to the problem of conflict forecasting. For other applications, such as predicting software costs or demand forecasting, the tasks of formally describing the target situation and identifying and rating analogies will be more straightforward because the structures of these situations are likely to be relatively homogeneous.
Simulated interaction was developed to forecast the decisions that people will make in conflict situations such as buyer-seller negotiations, employer-union disputes, commercial competition, hostile takeover bids, civil unrest, international trade negotiations, counter-terrorism, and warfare. These situations can be characterized as conflicts involving a small number of parties that are interacting with each other, perhaps indirectly. There is often a great deal of money at stake in such situations and, in the cases of civil unrest, terrorism, and warfare, lives. And yet predictions are typically made using unaided judgment. Research has shown that simulated interaction provides forecasts that are more accurate than those from unaided experts.
One hundred and fifty-one subjects were randomly divided into two groups of roughly equal size. One group was asked to respond to a decomposed version of a problem and the other group was presented with the direct form of the problem. The results provided support for the hypotheses that people can make better judgments when they use the principle of decomposition, and that decomposition is especially valuable for those problems where the subject knows little. The results suggest that accuracy may be higher when the subject provides the data and the computer analyzes it than when both steps are done implicitly by the subject.
Direct assessment and protocol analysis were used to examine the processes that experts employ to make forecasts. The sessions with the experts yielded rules about when various extrapolation methods are likely to be most useful in obtaining accurate forecasts. The use of a computer-aided protocol analysis resulted in a reduction in the total time required to code an expert's knowledge. The implications for overcoming the "knowledge acquisition bottleneck" are considered.
Problems in the use of factor analysis for deriving theory are illustrated by means of an example in which the underlying factors are known. The actual underlying model is simple and it provides a perfect explanation of the data. While the factor analysis 'explains' a large proportion of the total variance, it fails to identify the known factors in the model. The illustration is used to emphasize that factor analysis, by itself, may be misleading as far as the development of theory is concerned. The use of a comprehensive and explicit a priori analysis is proposed so that there will be independent criteria for the evaluation of the factor analytic results.
This paper describes a five-step procedure for meta-analysis. Especially important was the contacting of authors of prior papers. This was done primarily to improve the accuracy of the coding; it also helped to identify unpublished research and to supply missing information. Application of the five-step procedure to the issue of return postage in mail surveys yielded significantly more papers and produced more definitive conclusions than those derived from traditional reviews. This meta-analysis indicated that business reply postage is seldom cost-effective because first class postage yields an additional 9% return. Business reply rates were lower than those for other first class postage in each of the 20 comparisons.
The demand for research on forecasting is strong. This conclusion is based on the high number of citations to papers published about research on forecasting, and on the number of subscriptions to journals devoted to forecasting. The supply of research papers is also large, following a rapid growth in the 1960s and 1970s. This research has produced important findings. Despite this, a comparison of published research against the needs expressed in two surveys of academics and practitioners showed that numerous gaps still exist. A review of the literature also supported the conclusion that the research being produced does not match up well against the research desired. Suggestions are made as to what research is needed and how it should be conducted.
This paper identifies and analyses previously published studies on annual earnings forecasts. Comparisons of forecasts produced by management, analysts, and extrapolative techniques indicated that: (1) management forecasts were superior to professional analyst forecasts (the mean absolute percentage errors were 15.9 and 17.7, respectively, based on five studies using data from 1967–1974) and (2) judgemental forecasts (both management and analysts) were superior to extrapolation forecasts on 14 of 17 comparisons from 13 studies using data from 1964–1979 (the mean absolute percentage errors were 21.0 and 28.4 for judgement and extrapolation, respectively). These conclusions, based on recent research, differ from those reported in previous reviews, which commented on less than half of the studies identified here.
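The accuracy measure behind these comparisons, the mean absolute percentage error (MAPE), is straightforward to compute. A minimal sketch with hypothetical earnings figures (not data from the studies reviewed):

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    errors = [abs(f - a) / abs(a) * 100 for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)

# Hypothetical annual earnings (millions) and two sets of forecasts
actual = [100, 120, 90]
judgemental = [110, 110, 95]
extrapolation = [130, 100, 70]

print(round(mape(actual, judgemental), 1))    # 8.0
print(round(mape(actual, extrapolation), 1))  # 23.0
```

As in the studies, the method with the smaller MAPE would be judged the more accurate.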
When we first began publication of the Journal of Forecasting, we reviewed policies that were used by other journals and also examined the research on scientific publishing. Our findings were translated into a referee's rating form that was published in the journal [Armstrong (1982a)]. These guidelines were favorably received. Most referees used the Referee's Rating Sheet (Exhibit 1 provides an updated version) and some of them wrote to tell us that they found it helpful in communicating the aims and criteria of the journal.
Eighteen empirical studies from fourteen different researchers provide evidence that prepaid monetary incentives have a strong positive impact on the response rate in mail surveys. One of these studies is described here and an attempt is made to generalize from all eighteen about the relationship between size of incentives and reduction in nonresponse. These generalizations should be of value for the design of mail survey studies.
In 1982, the Journal of Forecasting published the results of a forecasting competition organized by Spyros Makridakis (Makridakis et al., 1982). In this, the ex ante forecast errors of 21 methods were compared for forecasts of a variety of economic time series, generally using 1001 time series. Only extrapolative methods were used, as no data were available on causal variables. The accuracies of methods were compared using a variety of accuracy measures for different types of data and for varying forecast horizons. The original paper did not contain much interpretation or discussion. Partly this was by design, to be unbiased in the presentation. A more important factor, however, was the difficulty in gaining consensus on interpretation and presentation among the diverse group of authors, many of whom have a vested interest in certain methods. In the belief that this study was of major importance, we decided to obtain a more complete discussion of the results. We do not believe that the data speak for themselves.
Twenty-five years ago, the International Institute of Forecasters was established “to bridge the gap between theory and practice”. Its primary vehicle was initially the Journal of Forecasting and is now the International Journal of Forecasting. The Institute emphasizes empirical comparisons of reasonable forecasting approaches. Such studies can be used to identify the best forecasting procedures to use under given conditions, a process we call evidence-based forecasting. Unfortunately, evidence-based forecasting meets resistance from academics and practitioners when the findings differ from currently accepted beliefs. As a consequence, although much progress has been made in developing improved forecasting methods, the diffusion of useful forecasting methods has been disappointing. To bridge the gap between theory and practice, we recommend a stronger emphasis on the method of multiple hypotheses and on invited replications of important research. It is then necessary to translate the findings into principles that are easy to understand and apply. The Internet and software provide important opportunities for making the latest findings available to researchers and practitioners. Because researchers and practitioners believe that their areas are unique, we should organise findings so that they are relevant to each area and make them easily available when people search for information about forecasting in their area. Finally, progress depends on our ability to overcome organizational barriers.
We hypothesized that multiplicative decomposition would improve accuracy only in certain conditions. In particular, we expected it to help for problems involving extreme and uncertain values. We first reanalyzed results from two published studies. Decomposition improved accuracy for nine problems that involved extreme and uncertain values, but for six problems with target values that were not extreme and uncertain, decomposition was not more accurate. Next, we conducted experiments involving 10 problems with 280 subjects making 1078 estimates. As hypothesized, decomposition improved accuracy when the problem involved the estimation of extreme and uncertain values. Otherwise, decomposition often produced less accurate predictions.
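The multiplicative decomposition principle tested here can be illustrated with the classic estimation exercise below. All quantities are hypothetical assumptions chosen for illustration; the point is that each component is easier to judge than the extreme, uncertain target value, which is then recovered by multiplication:

```python
# Direct question: "How many piano tuners work in a city?"
# Multiplicative decomposition: estimate easier components, then combine.
households = 1_000_000              # households in the city (assumed)
pianos_per_household = 0.02         # share of households with a piano (assumed)
tunings_per_piano_per_year = 1      # assumed
tunings_per_tuner_per_year = 1_000  # roughly 4 per working day (assumed)

pianos = households * pianos_per_household
tunings_needed = pianos * tunings_per_piano_per_year
tuners = tunings_needed / tunings_per_tuner_per_year
print(tuners)  # 20.0
```

The subject supplies the component estimates; the arithmetic, done explicitly rather than in the head, produces the final judgment.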
An econometric model for the U.S. lodging market was developed from time series data. Estimates from this model were then combined with preliminary sales estimates. The resulting combination greatly improved the estimates of the "final sales" figures and also reduced the error in two forecasting tests. These improvements were achieved at a low cost.
When causal forces are specified, the expected direction of the trend can be compared with the trend based on extrapolation. Series in which the expected trend conflicts with the extrapolated trend are called contrary series. We hypothesized that contrary series would have asymmetric forecast errors, with larger errors in the direction of the expected trend. Using annual series that contained minimal information about causality, we examined 671 contrary forecasts. As expected, most (81%) of the errors were in the direction of the causal forces. Also as expected, the asymmetries were more likely for longer forecast horizons; for six-year-ahead forecasts, 89% of the forecasts were in the expected direction. The asymmetries were often substantial. Contrary series should be flagged and treated separately when prediction intervals are estimated, perhaps by shifting the interval in the direction of the causal forces.
This paper reviews the empirical research on forecasting in marketing. In addition, it presents results from some small scale surveys. We offer a framework for discussing forecasts in the area of marketing, and then review the literature in light of that framework. Particular emphasis is given to a pragmatic interpretation of the literature and findings. Suggestions are made on what research is needed.
Research on forecasting is extensive and includes many studies that have tested alternative methods in order to determine which ones are most effective. We review this evidence in order to provide guidelines for forecasting for marketing. The coverage includes intentions, Delphi, role playing, conjoint analysis, judgmental bootstrapping, analogies, extrapolation, rule-based forecasting, expert systems, and econometric methods. We discuss research about which methods are most appropriate to forecast market size, actions of decision makers, market share, sales, and financial outcomes. In general, there is a need for statistical methods that incorporate the manager's domain knowledge. This includes rule-based forecasting, expert systems, and econometric methods. We describe how to choose a forecasting method and provide guidelines for the effective use of forecasts including such procedures as scenarios.
Those making environmental decisions must not only characterize the present, they must also forecast the future. They must do so for at least two reasons. First, if a no-action alternative is pursued, they must consider whether current trends will be favorable or unfavorable in the future. Second, if an intervention is pursued instead, they must evaluate both its probable success given future trends and its impacts on the human and natural environment. Forecasting, by which I mean explicit processes for determining what is likely to happen in the future, can help address each of these areas.
Earlier versions of this Tree appear in various publications and presentations. Note that the presence of a method in the Tree does not signal its predictive validity. For evidence on which methods are valid, see "Forecasting Methods and Principles: Evidence-based Checklists."
Much has been learned in the past half century about producing useful forecasts. Those new to the area may be interested in answers to some commonly asked questions.
A. Forecasting, the field
B. Types of forecasting problem
C. Common sense and forecasting
D. Choosing the best method
E. Assessing strengths and weaknesses of procedures
F. Accuracy of forecasts
G. Examining alternative policies
H. New products
I. Behavior in conflicts, such as negotiations and wars
J. Effect of changing technology
K. Stocks and commodities
L. Gaining acceptance
M. Keeping up-to-date
N. Reading to learn more
O. Help on forecasting
P. References on forecasting
When financial columnist James Surowiecki wrote The Wisdom of Crowds, he wished to explain the successes and failures of markets (an example of a "crowd") and to understand why the average opinion of a crowd is frequently more accurate than the opinions of most of its individual members. In this expanded review of the book, Scott Armstrong asks a question of immediate relevance to forecasters: Are traditional face-to-face meetings an effective way to elicit forecasts from crowds (i.e., teams)? Armstrong doesn't believe so. Quite the contrary, he explains why he considers face-to-face meetings a detriment to good forecasting practice, and he proposes several alternatives that have been tried successfully.
The systems approach uses two basic ideas. First, one should examine objectives before considering ways of solving the problem; and second, one should begin by describing the system in general terms before proceeding to the specific. From J. Scott Armstrong. Long Range Forecasting: From Crystal Ball to Computer; 2nd Edition. New York: Wiley, pp. 13-22.
Problem: How to help practitioners, academics, and decision makers use experimental research findings to substantially reduce forecast errors for all types of forecasting problems. Methods: Findings from our review of forecasting experiments were used to identify methods and principles that lead to accurate forecasts. Cited authors were contacted to verify that summaries of their research were correct. Checklists were developed to help forecasters and their clients undertake and commission studies that adhere to principles and use valid methods. Leading researchers were asked to identify errors of omission or commission in the analyses and summaries of research findings. Findings: Forecast accuracy can be improved by using one of 15 relatively simple evidence-based forecasting methods. One of those methods, knowledge models, provides substantial improvements in accuracy when causal knowledge is good. On the other hand, data models – developed using multiple regression, data mining, neural nets, and “big data analytics” – are unsuited for forecasting. Originality: Three new checklists for choosing validated methods, developing knowledge models, and assessing uncertainty are presented. A fourth checklist, based on the Golden Rule of Forecasting, was improved. Usefulness: Combining forecasts within individual methods and across different methods can reduce forecast errors by as much as 50%. Forecast errors from currently used methods can be reduced by increasing their compliance with the principles of conservatism (Golden Rule of Forecasting) and simplicity (Occam’s Razor). Clients and other interested parties can use the checklists to determine whether forecasts were derived using evidence-based procedures and can, therefore, be trusted for making decisions. Scientists can use the checklists to devise tests of the predictive validity of their findings.
Katsikopoulos et al. (2021) found that the simple and easily understood recency heuristic, which uses a single historical observation to forecast the week-ahead percentage of doctor visits associated with influenza symptoms, reduced forecast errors by nearly one-half compared to Google Flu Trends' (GFT's) complex and opaque machine learning model, which uses "big data". This research note examines whether the accuracy of forecasts can be further improved by using another simple forecasting method (Green & Armstrong, 2015) that takes account of the observation that infection rates can trend, and does so in a conservative way (Armstrong, Green, & Graefe, 2015) by damping recent trends toward zero.
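A minimal sketch of the kind of conservative trend-damping the note describes. The 50% damping factor and the data are illustrative assumptions, not the authors' exact specification:

```python
def damped_trend_forecast(series, damping=0.5):
    """Forecast the next value as the last observation plus a damped recent trend.

    damping=0 reduces to the recency (no-change) heuristic;
    damping=1 extrapolates the last observed change in full.
    """
    last, prev = series[-1], series[-2]
    return last + damping * (last - prev)

weekly_flu_share = [1.8, 2.0, 2.4]  # hypothetical % of doctor visits
print(round(damped_trend_forecast(weekly_flu_share), 2))   # 2.6
print(damped_trend_forecast(weekly_flu_share, damping=0))  # recency heuristic: 2.4
```

Damping pulls the extrapolated trend toward zero, which is the conservative adjustment when knowledge about the persistence of the trend is uncertain.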
1. Policy decisions require scientific long-term forecasts of temperature, of the impacts of temperature changes, and of the effects of policies – no scientific forecasts exist.
2. Climate data and knowledge are uncertain, and climate is complex – the situation calls for simple methods and conservative forecasts.
3. The no-change benchmark performs well – errors of IPCC projections are 12 times larger for long-term forecasts.
4. Causal policy models with CO2 have low credibility and poor validation.
5. AGW alarm is analogous to many failed predictions.
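The no-change benchmark in point 3 simply forecasts that every future value will equal the last observed value. A minimal sketch; the anomaly figures below are made up for illustration, not data from the talk:

```python
def no_change_forecasts(last_observation, horizon):
    """Naive benchmark: the forecast for every horizon is the last observation."""
    return [last_observation] * horizon

def mean_absolute_error(actuals, forecasts):
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)

# Hypothetical temperature anomalies (deg C) for a 5-year hold-out period
actual = [0.31, 0.28, 0.35, 0.30, 0.33]
benchmark = no_change_forecasts(0.30, horizon=5)
print(round(mean_absolute_error(actual, benchmark), 3))  # 0.022
```

Candidate forecasting models can then be judged by whether their errors are smaller than this benchmark error.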
The precautionary principle is a political principle, not a scientific one. The principle is used to urge the cessation or avoidance of a human activity in situations of uncertainty, just in case that activity might cause harm to human health or the natural environment. In practice, the precautionary principle is invoked when an interest group identifies an issue that can help it to achieve its objectives. If the interest group is successful in its efforts to raise fears about the issue, the application of the scientific method is rejected and a new orthodoxy is imposed.
"Do we face dangerous global warming?" Scott Armstrong gave a talk at Lehigh University on June 7, 2019 about research with Kesten Green on scientific climate forecasting. The talk was invited by the graduating Class of 1959, of which he is a member, for their 60th Reunion.
Purpose: Commentary on the M4-Competition findings to assess the contribution of data models, such as those from machine learning methods, to improving forecast accuracy. Methods: (1) Use prior knowledge on the relative accuracy of forecasts from validated forecasting methods to assess the M4 findings. (2) Use prior knowledge on forecasting principles and the scientific method to assess whether data models can be expected to improve accuracy relative to forecasts from previously validated methods under any conditions. Findings: Prior knowledge from experimental research is supported by the M4 findings that simple validated methods provided forecasts that are: (1) typically more accurate than those from complex and costly methods; (2) considerably more accurate than those from data models. Limitations: Conclusions were limited by incomplete hypotheses from prior knowledge such as would have permitted experimental tests of which methods, and which individual models, would be most accurate under which conditions. Implications: Data models should not be used for forecasting under any conditions. Forecasters interested in situations where much relevant data are available should use knowledge models.
In the mid-1900s, there were two streams of thought about forecasting methods. One stream, led by econometricians, was concerned with developing causal models by using prior knowledge and evidence from experiments. The other was led by statisticians, who were concerned with identifying idealized "data generating processes" and with developing models from statistical relationships in data, both in the expectation that the resulting models would provide accurate forecasts. At that time, regression analysis was a costly process. In more recent times, regression analysis and related techniques have become simple and inexpensive to use. That development led to automated procedures such as stepwise regression, which selects "predictor variables" on the basis of statistical significance. An early response to the development was titled "Alchemy in the behavioral sciences" (Einhorn, 1972). We refer to the product of data-driven approaches to forecasting as "data models." The M4-Competition (Makridakis, Spiliotis, & Assimakopoulos, 2018) has provided extensive tests of whether data models, which they refer to as "ML methods," can provide accurate extrapolation forecasts of time series. The Competition findings revealed that data models failed to beat naïve models and established simple methods with sufficient reliability to be of any practical interest to forecasters. In particular, the authors concluded from their analysis, "The six pure ML methods that were submitted in the M4 all performed poorly, with none of them being more accurate than Comb and only one being more accurate than Naïve2" (p. 803). Over the past half-century, much has been learned about how to improve forecasting by conducting experiments to compare the performance of reasonable alternative methods. On the other hand, despite billions of dollars of expenditure, the various data modeling methods have not contributed to improving forecast accuracy. Nor can they do so, as we explain below.
Commentary of the findings of the M4-Competition presented by Scott Armstrong at the M4 Conference on Monday, December 10, 2018 in New York City.
Problem Do conservative econometric models that comply with the Golden Rule of Forecasting provide more accurate forecasts? Methods To test the effects on forecast accuracy, we applied three evidence-based guidelines to 19 published regression models used for forecasting 154 elections in Australia, Canada, Italy, Japan, Netherlands, Portugal, Spain, Turkey, the U.K., and the U.S. The guidelines direct forecasters using causal models to be conservative to account for uncertainty by (I) modifying effect estimates to reflect uncertainty, either by damping coefficients towards no effect or by equalizing coefficients, (II) combining forecasts from diverse models, and (III) incorporating more knowledge by including more variables with known important effects. Findings Modifying the econometric models to make them more conservative reduced forecast errors compared to forecasts from the original models: (I) Damping coefficients by 10% reduced error by 2% on average, although further damping generally harmed accuracy; equalizing coefficients consistently reduced errors, with average error reductions between 2% and 8% depending on the level of equalizing. Averaging the original regression model forecast with an equal-weights model forecast reduced error by 7%. (II) Combining forecasts from two Australian models and from eight U.S. models reduced error by 14% and 36%, respectively. (III) Using more knowledge by including all six unique variables from the Australian models and all 24 unique variables from the U.S. models in equal-weight “knowledge models” reduced error by 10% and 43%, respectively. Originality This paper provides the first test of applying guidelines for conservative forecasting to established election forecasting models. Usefulness Election forecasters can substantially improve the accuracy of forecasts from econometric models by following simple guidelines for conservative forecasting.
Decision-makers can make better decisions when they are provided with models that are more realistic and fore- casts that are more accurate.
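The conservatism guidelines tested above can be sketched as simple transformations of a regression model's coefficients and forecasts. This is one plausible reading of the guidelines, not the paper's exact procedure; the equalizing step assumes standardized predictors, and all numbers are illustrative:

```python
def damp(coefs, factor=0.10):
    """Guideline I(a): shrink each coefficient toward no effect (zero)."""
    return [c * (1 - factor) for c in coefs]

def equalize(coefs, weight=0.5):
    """Guideline I(b): move coefficients toward an equal-weights model.

    Assumes standardized predictors, so the equal-weights model gives every
    coefficient the same magnitude (the mean absolute coefficient), with signs kept.
    """
    mean_mag = sum(abs(c) for c in coefs) / len(coefs)
    equal = [mean_mag if c >= 0 else -mean_mag for c in coefs]
    return [(1 - weight) * c + weight * e for c, e in zip(coefs, equal)]

def combine(forecasts):
    """Guideline II: average forecasts from diverse models."""
    return sum(forecasts) / len(forecasts)

coefs = [0.8, 0.3, -0.1]  # hypothetical standardized coefficients
print([round(c, 2) for c in damp(coefs)])      # [0.72, 0.27, -0.09]
print([round(c, 2) for c in equalize(coefs)])  # [0.6, 0.35, -0.25]
print(round(combine([52.1, 48.7]), 1))         # e.g., two vote-share forecasts
```

Each transformation pulls estimates toward a more cautious position, which is the sense in which the Golden Rule calls for conservatism.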
Problem: Multiple regression analysis (MRA) is commonly used to develop forecasting models that inform policy and decision making, but the technique does not appear to have been validated for that purpose. Methods: The predictive validity of published least squares MRA models is tested against naive benchmarks, alternative methods that are either plausible or commonly used, and evidence-based forecasting methods. The out-of-sample errors of forecasts from the MRA models are compared with the errors of forecasts from models developed from the same data on the basis of cumulative relative absolute error (CumRAE), and the unscaled mean bounded relative absolute error (UMBRAE). Findings: Results from tests using ten models for diverse problems found that while the MRA models performed well against most of the alternatives tested for most problems, out-of-sample (n-1) forecasts from models estimated using least absolute deviation were mostly more accurate. Originality: This paper presents the first stage of a project to comprehensively test the predictive validity of MRA relative to models derived using diverse alternative methods. Usefulness: The findings of this research will be useful whether they turn out to support or reject the use of MRA models for important policy and decision-making tasks. Validation of MRA for forecasting would provide a stronger argument for the use of the method than is currently available, while the opposite finding would identify opportunities to improve forecast accuracy and hence decisions.
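The CumRAE measure used in these tests compares a model's cumulated absolute forecast error with that of a benchmark, so values below 1 favor the model. A minimal sketch with hypothetical data (the more involved UMBRAE measure is omitted here):

```python
def cum_rae(actuals, forecasts, benchmark_forecasts):
    """Cumulative Relative Absolute Error: total absolute error of the
    candidate model divided by that of a benchmark; < 1 favors the model."""
    model_err = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    bench_err = sum(abs(a - b) for a, b in zip(actuals, benchmark_forecasts))
    return model_err / bench_err

actual = [10.0, 12.0, 11.0]
mra_model = [10.5, 11.0, 12.0]  # hypothetical regression forecasts
naive = [9.0, 9.0, 9.0]         # no-change benchmark from the last observation
print(round(cum_rae(actual, mra_model, naive), 2))  # 0.42
```

Because the measure is relative to a benchmark, it allows accuracy comparisons to be pooled across series with very different scales.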
This article proposes a unifying theory, or the Golden Rule, of forecasting. The Golden Rule of Forecasting is to be conservative. A conservative forecast is consistent with cumulative knowledge about the present and the past. To be conservative, forecasters must seek out and use all knowledge relevant to the problem, including knowledge of methods validated for the situation. Twenty-eight guidelines are logically deduced from the Golden Rule. A review of evidence identified 105 papers with experimental comparisons; 102 support the guidelines. Ignoring a single guideline increased forecast error by more than two-fifths on average. Ignoring the Golden Rule is likely to harm accuracy most when the situation is uncertain and complex, and when bias is likely. Non-experts who use the Golden Rule can identify dubious forecasts quickly and inexpensively. To date, ignorance of research findings, bias, sophisticated statistical procedures, and the proliferation of big data have led forecasters to violate the Golden Rule. As a result, despite major advances in evidence-based forecasting methods, forecasting practice in many fields has failed to improve over the past half-century.
The Golden Rule of Forecasting is a general rule that applies to all forecasting problems. The Rule was developed using logic and was tested against evidence from previously published comparison studies. The evidence suggests that a single violation of the Golden Rule is likely to increase forecast error by 44%. Some commentators argue that the Rule is not generally applicable, but do not challenge the logic or evidence provided. While further research might provide useful findings, available evidence justifies adopting the Rule now. People with no prior training in forecasting can obtain the substantial benefits of following the Golden Rule by using the Checklist to identify biased and unscientific forecasts at little cost.