Article

Humans vs. large language models: Judgmental forecasting in an era of advanced AI

... Despite these advancements, significant gaps remain in leveraging AI to comprehensively quantify and reduce the carbon footprint across the fashion supply chain. Existing models tend to overlook the cumulative environmental impact across stages, and many are limited to single supply chain areas without considering an integrated optimization strategy (Abolghasemi et al., 2024) . This study aims to assess the impact of comprehensive AI-driven models in achieving sustainability targets within the fashion industry, thus addressing these research gaps and providing actionable insights. ...
Article
Full-text available
This study explores how AI-driven optimization can help lower the carbon footprint of the fashion industry's supply chain. By using advanced AI models, the research delves into sustainable strategies across core areas—sourcing, production, logistics, and retail—focusing on reducing emissions, energy use, water consumption, and waste. Techniques like linear programming, genetic algorithms, and reinforcement learning showcase how AI can bring about real, measurable environmental benefits. The findings reveal that these optimized processes boost energy efficiency, save water, and cut down waste, making them valuable tools for sustainable supply chain management. The study wraps up with practical recommendations for adopting AI-based models to help the fashion industry embrace sustainability more effectively.
Article
Human forecasting accuracy improves through the “wisdom of the crowd” effect, in which aggregated predictions tend to outperform individual ones. Past research suggests that individual large language models (LLMs) tend to underperform compared to human crowd aggregates. We simulate a wisdom of the crowd effect with LLMs. Specifically, we use an ensemble of 12 LLMs to make probabilistic predictions about 31 binary questions, comparing them with those made by 925 human forecasters in a 3-month tournament. We show that the LLM crowd outperforms a no-information benchmark and is statistically indistinguishable from the human crowd. We also observe human-like biases, such as the acquiescence bias. In another study, we find that LLM predictions (of GPT-4 and Claude 2) improve when exposed to the median human prediction, increasing accuracy by 17 to 28%. However, simply averaging human and machine forecasts yields more accurate results. Our findings suggest that LLM predictions can rival the human crowd’s forecasting accuracy through simple aggregation.
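The aggregation step this abstract describes is simple enough to sketch. The code below is an illustrative reconstruction, not the authors' implementation: the median pooling rule, the equal 0.5 blending weight, and the example forecasts are all assumptions.

```python
from statistics import median

def crowd_forecast(probs):
    """Pool individual probabilistic forecasts (each in 0..1) with the
    median -- a common robust 'wisdom of the crowd' aggregate."""
    return median(probs)

def human_machine_average(llm_probs, human_median, weight=0.5):
    """Average the LLM crowd forecast with the human crowd median.
    The abstract reports that this simple averaging beats exposing the
    LLMs to the human prediction; the 0.5 weight is illustrative."""
    return weight * crowd_forecast(llm_probs) + (1 - weight) * human_median

# Hypothetical forecasts from an ensemble of 12 models for one binary question
llm_probs = [0.62, 0.55, 0.70, 0.58, 0.65, 0.60,
             0.50, 0.72, 0.66, 0.59, 0.61, 0.63]
print(crowd_forecast(llm_probs))                            # 0.615
print(human_machine_average(llm_probs, human_median=0.45))  # 0.5325
```

The median is preferred over the mean here because it is insensitive to a single model producing an extreme probability.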
Preprint
Full-text available
Advances in deep learning systems have allowed large models to match or surpass human accuracy on a number of skills such as image classification, basic programming, and standardized test taking. As the performance of the most capable models begins to saturate on tasks where humans already achieve high accuracy, it becomes necessary to benchmark models on increasingly complex abilities. One such task is forecasting the future outcome of events. In this work we describe experiments using a novel dataset of real-world events and associated human predictions, an evaluation metric to measure forecasting ability, and the accuracy of a number of different LLM-based forecasting designs on the provided dataset. Additionally, we analyze the performance of the LLM forecasters against human predictions and find that models still struggle to make accurate predictions about the future. Our follow-up experiments indicate this is likely due to models' tendency to guess that most events are unlikely to occur (which tends to be true for many prediction datasets, but does not reflect actual forecasting abilities). We reflect on next steps for developing a systematic and reliable approach to studying LLM forecasting.
Article
Full-text available
ChatGPT, a state-of-the-art large language model (LLM), is revolutionizing the AI field by exhibiting humanlike skills in a range of tasks that include understanding and answering natural language questions, translating languages, writing code, passing professional exams, and even composing poetry, among its other abilities. ChatGPT has gained immense popularity since its launch, amassing 100 million active monthly users in just two months, thereby establishing itself as the fastest-growing consumer application to date. This paper discusses the reasons for its success as well as the future prospects of similar large language models (LLMs), with an emphasis on their potential impact on forecasting, a specialized and domain-specific field. This is achieved by first comparing the answers of the standard ChatGPT and a custom version trained on published papers from a subfield of forecasting in which the answers to the questions asked are known, allowing us to assess the correctness of the two ChatGPT versions. Then, we also compare the responses of the two versions on how judgmental adjustments to the statistical/ML forecasts should be applied by firms to improve their accuracy. The paper concludes by considering the future of LLMs and their impact on all aspects of our life and work, as well as on the field of forecasting specifically. Finally, the conclusion section is generated by ChatGPT, which was provided with a condensed version of this paper and asked to write a four-paragraph conclusion.
Article
Full-text available
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
Article
Full-text available
In marketing analytics applications in OR, the modeler often faces the problem of selecting key variables from a large number of possibilities. For example, SKU-level retail store sales are affected by inter- and intra-category effects which potentially need to be considered when deciding on promotional strategy and producing operational forecasts. But no research has yet put this well-accepted concept into forecasting practice: an obvious obstacle is the ultra-high dimensionality of the variable space. This paper develops a four-step methodological framework to overcome the problem. It is illustrated by investigating the value of both intra- and inter-category SKU-level promotional information in improving forecast accuracy. The method consists of the identification of potentially influential categories, the building of the explanatory variable space, variable selection and model estimation by a multistage LASSO regression, and the use of a rolling scheme to generate forecasts. The success of this new method for dealing with high dimensionality is demonstrated by improvements in forecasting accuracy compared to alternative methods of simplifying the variable space. The empirical results show that models integrating more information perform significantly better than the baseline model when using the proposed framework. In general, we can improve forecasting accuracy by 12.6 percent over the model using only the SKU's own predictors; of that improvement, 95 percent comes from intra-category information and only 5 percent from inter-category information. The substantive marketing results also have implications for promotional category management. © 2015 Elsevier B.V. and Association of European Operational Research Societies (EURO) within the International Federation of Operational Research Societies (IFORS).
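The variable-selection stage of a framework like this can be sketched with a LASSO: coefficients of uninformative promotional predictors shrink exactly to zero and drop out. This is a hedged sketch of the selection idea only (the paper's framework has several further steps), using a minimal coordinate-descent LASSO and a synthetic dataset; all names and numbers are assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """Coordinate-descent LASSO (assumes roughly standardized X).
    Predictors whose coefficients are soft-thresholded to exactly
    zero are excluded from the forecasting model."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid = y - X @ beta + X[:, j] * beta[j]        # partial residual
            rho = X[:, j] @ resid / n
            zj = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / zj  # soft threshold
    return beta

rng = np.random.default_rng(0)
# 200 weeks of hypothetical data, 10 candidate promotional predictors;
# only the first two (say, own promotion and one intra-category
# competitor promotion) truly drive SKU sales
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)

beta = lasso_cd(X, y, lam=0.3)
print(np.nonzero(np.abs(beta) > 1e-8)[0])  # indices the LASSO retains
```

In a multistage variant, the surviving predictors would be re-estimated (e.g., with a smaller penalty) to reduce the shrinkage bias visible in the retained coefficients.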
Article
Full-text available
Demand forecasting is central to decision making and operations in organisations. As the volume of forecasts increases, for example due to increased product customisation that leads to more SKUs being traded, or a reduction in the length of the forecasting cycle, there is a pressing need for reliable automated forecasting. Conventionally, companies rely on a statistical baseline forecast that captures only past demand patterns, which is subsequently adjusted by human experts to incorporate additional information such as promotions. Although there is evidence that such a process adds value to forecasting, it is questionable how much it can scale up, due to the human element. Instead, in the literature it has been proposed to enhance the baseline forecasts with external well-structured information, such as the promotional plan of the company, and let experts focus on the less structured information, thus reducing their workload and allowing them to focus where they can add most value. This change in forecasting support systems requires reliable multivariate forecasting models that can be automated, accurate and robust. This paper proposes an extension of the recently introduced Multiple Aggregation Prediction Algorithm (MAPA), which uses temporal aggregation to improve upon the established exponential smoothing family of methods. MAPA is attractive as it has been found to increase both the accuracy and robustness of exponential smoothing. The extended multivariate MAPA is evaluated against established benchmarks in modelling a number of heavily promoted products and is found to perform well in terms of forecast bias and accuracy. Furthermore, we demonstrate that modelling time series using multiple temporal aggregation levels makes the final forecast robust to model misspecification.
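The temporal-aggregation idea behind MAPA can be illustrated in a few lines. This is a deliberately simplified sketch, not the published algorithm: real MAPA fits full exponential smoothing models at each aggregation level and combines their components, whereas this version uses simple exponential smoothing and a plain average; the aggregation levels and smoothing parameter are assumptions.

```python
def temporal_aggregate(series, k):
    """Non-overlapping aggregation to level k (e.g., k=3 turns a monthly
    series into a quarterly one), kept on the original scale via the mean."""
    n = len(series) // k
    return [sum(series[i * k:(i + 1) * k]) / k for i in range(n)]

def ses_level(series, alpha=0.3):
    """Simple exponential smoothing: the final level is the one-step forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def mapa_sketch(series, levels=(1, 2, 4)):
    """Forecast at several temporal aggregation levels and combine by a
    plain average. Aggregation filters high-frequency noise, so each
    level emphasises a different component of the demand signal."""
    forecasts = [ses_level(temporal_aggregate(series, k)) for k in levels]
    return sum(forecasts) / len(forecasts)

# Hypothetical weekly demand history
print(mapa_sketch([100, 120, 90, 110, 105, 115, 95, 108]))
```

Combining across levels is what gives the method its robustness to misspecification: an inappropriate model at one aggregation level is moderated by the others.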
Article
Full-text available
This article introduces this JBR Special Issue on simple versus complex methods in forecasting. Simplicity in forecasting requires that (1) method, (2) representation of cumulative knowledge, (3) relationships in models, and (4) relationships among models, forecasts, and decisions are all sufficiently uncomplicated as to be easily understood by decision-makers. Our review of studies comparing simple and complex methods - including those in this special issue - found 97 comparisons in 32 papers. None of the papers provide a balance of evidence that complexity improves forecast accuracy. Complexity increases forecast error by 27 percent on average in the 25 papers with quantitative comparisons. The finding is consistent with prior research to identify valid forecasting methods: all 22 previously identified evidence-based forecasting procedures are simple. Nevertheless, complexity remains popular among researchers, forecasters, and clients. Some evidence suggests that the popularity of complexity may be due to incentives: (1) researchers are rewarded for publishing in highly ranked journals, which favor complexity; (2) forecasters can use complex methods to provide forecasts that support decision-makers’ plans; and (3) forecasters’ clients may be reassured by incomprehensibility. Clients who prefer accuracy should accept forecasts only from simple evidence-based procedures. They can rate the simplicity of forecasters’ procedures using the questionnaire at simple-forecasting.com.
Article
Full-text available
The widespread prevalence and persistence of misinformation in contemporary societies, such as the false belief that there is a link between childhood vaccinations and autism, is a matter of public concern. For example, the myths surrounding vaccinations, which prompted some parents to withhold immunization from their children, have led to a marked increase in vaccine-preventable disease, as well as unnecessary public expenditure on research and public-information campaigns aimed at rectifying the situation. We first examine the mechanisms by which such misinformation is disseminated in society, both inadvertently and purposely. Misinformation can originate from rumors but also from works of fiction, governments and politicians, and vested interests. Moreover, changes in the media landscape, including the arrival of the Internet, have fundamentally influenced the ways in which information is communicated and misinformation is spread. We next move to misinformation at the level of the individual, and review the cognitive factors that often render misinformation resistant to correction. We consider how people assess the truth of statements and what makes people believe certain things but not others. We look at people’s memory for misinformation and answer the questions of why retractions of misinformation are so ineffective in memory updating and why efforts to retract misinformation can even backfire and, ironically, increase misbelief. Though ideology and personal worldviews can be major obstacles for debiasing, there nonetheless are a number of effective techniques for reducing the impact of misinformation, and we pay special attention to these factors that aid in debiasing. We conclude by providing specific recommendations for the debunking of misinformation. These recommendations pertain to the ways in which corrections should be designed, structured, and applied in order to maximize their impact. 
Grounded in cognitive psychological theory, these recommendations may help practitioners—including journalists, health professionals, educators, and science communicators—design effective misinformation retractions, educational tools, and public-information campaigns.
Article
Full-text available
Sales forecasting is increasingly complex due to a range of factors, such as the shortening of product life cycles, increasingly competitive markets, and aggressive marketing. Often, forecasts are produced using a Forecasting Support System that integrates univariate statistical forecasts with judgment from experts in the organization. Managers then add information to the forecast, such as future promotions, potentially improving the accuracy. Despite the importance of judgment and promotions, papers devoted to studying their relationship with forecasting performance are scarce. We analyze the accuracy of managerial adjustments in periods of promotions, based on weekly data from a manufacturing company. Intervention analysis is used to establish whether judgmental adjustments can be replaced by multivariate statistical models when responding to promotional information. We show that judgmental adjustments can enhance baseline forecasts during promotions, but not systematically. Transfer function models based on past promotions information achieved lower overall forecasting errors. Finally, a hybrid model illustrates the fact that human experts still added value to the transfer function models.
Article
Full-text available
Interest in using artificial neural networks (ANNs) for forecasting has led to a tremendous surge in research activities in the past decade. While ANNs provide a great deal of promise, they also embody much uncertainty. Researchers to date are still not certain about the effect of key factors on forecasting performance of ANNs. This paper presents a state-of-the-art survey of ANN applications in forecasting. Our purpose is to provide (1) a synthesis of published research in this area, (2) insights on ANN modeling issues, and (3) the future research directions.
Article
Full-text available
Quantitative forecasting techniques are not much used in organizations. Instead, organizations rely on the judgement of managers working close to the product market. Increasingly however, developments at the interface between marketing and operations require more accurate forecasting. Quantitative marketing models have that potential. Drawing on theories from the 'diffusion of innovation' literature and results on 'the barriers to effective implementation', this paper first considers those factors that should be included in any complete evaluation of market forecasting. Using this framework and based on detailed survey work in a multi-divisional organization, the paper then describes how this company produces its market forecasts, and the perceptions of its managers as to inadequacies in the procedures. Reasons are proposed as to why quantitative forecasting techniques are not effectively used. The paper concludes with a discussion of the causes behind the organization's mismanagement of their forecasting activity and how these activities might best be improved.
Article
Full-text available
Negative information tends to influence evaluations more strongly than comparably extreme positive information. To test whether this negativity bias operates at the evaluative categorization stage, the authors recorded event-related brain potentials (ERPs), which are more sensitive to the evaluative categorization than the response output stage, as participants viewed positive, negative, and neutral pictures. Results revealed larger amplitude late positive brain potentials during the evaluative categorization of (a) positive and negative stimuli as compared with neutral stimuli and (b) negative as compared with positive stimuli, even though both were equally probable, evaluatively extreme, and arousing. These results provide support for the hypothesis that the negativity bias in affective processing occurs as early as the initial categorization into valence classes.
Article
Full-text available
The analysis of repeated-measures data presents challenges to investigators and is a topic for ongoing discussion in the Archives of General Psychiatry. Traditional methods of statistical analysis (end-point analysis and univariate and multivariate repeated-measures analysis of variance [rANOVA and rMANOVA, respectively]) have known disadvantages. More sophisticated mixed-effects models provide flexibility, and recently developed software makes them available to researchers. To review methods for repeated-measures analysis and discuss advantages and potential misuses of mixed-effects models. Also, to assess the extent of the shift from traditional to mixed-effects approaches in published reports in the Archives of General Psychiatry. The Archives of General Psychiatry from 1989 through 2001, and the Department of Veterans Affairs Cooperative Study 425. Studies with a repeated-measures design, at least 2 groups, and a continuous response variable. The first author ranked the studies according to the most advanced statistical method used in the following order: mixed-effects model, rMANOVA, rANOVA, and end-point analysis. The use of mixed-effects models has substantially increased during the last 10 years. In 2001, 30% of clinical trials reported in the Archives of General Psychiatry used mixed-effects analysis. Repeated-measures ANOVAs continue to be used widely for the analysis of repeated-measures data, despite risks to interpretation. Mixed-effects models use all available data, can properly account for correlation between repeated measurements on the same subject, have greater flexibility to model time effects, and can handle missing data more appropriately. Their flexibility makes them the preferred choice for the analysis of repeated-measures data.
Chapter
Despite advances in predictive analytics there is much evidence that algorithm-based forecasts are often subject to judgmental adjustments or overrides. This chapter explores the role of scenarios in supporting the role of judgment when algorithmic (or model-based) forecasts are available. Scenarios provide powerful narratives in envisioning alternative futures and play an important role in both planning for uncertainties and challenging managerial thinking. Through offering structured storylines of plausible futures, scenarios may also enhance forecasting agility and offer collaborative pathways for information sharing. Even though the potential value of using scenarios to complement judgmental forecasts has been recognized, the empirical work remains scarce. A review of the relevant research suggests the merit of supplying scenarios to judgmental forecasters is mixed and can result in an underestimation of the extent of uncertainty associated with forecasts, but a greater acceptance of model-based point predictions. These findings are generally supported by the results of a behavioral experiment that we report. This study was used to examine the effects of scenario tone and extremity on individual and group-based judgmental predictions when a model-based forecast was available. The implications of our findings are discussed with respect to (i) eliciting judgmental forecasts using different predictive formats, (ii) sharing scenarios with varying levels of optimism and pessimism, and (iii) incorporating scenario approaches to address forecast uncertainty.
Keywords: Scenario, Judgment, Forecast, Uncertainty
Preprint
Our research examines how to integrate human judgment and statistical algorithms for demand planning in an increasingly data-driven and automated environment. We use a laboratory experiment combined with a field study to compare existing integration methods with a novel approach: Human-Guided Learning. This new method allows the algorithm to use human judgment to train a model using an iterative linear weighting of human judgment and model predictions. Human-Guided Learning is more accurate than the established integration methods of Judgmental Adjustment, Quantitative Correction of Human Judgment, Forecast Combination, and Judgment as a Model Input. Human-Guided Learning performs similarly to Integrative Judgment Learning, but under certain circumstances it can be more accurate. Our studies demonstrate that the benefit of human judgment for demand planning processes depends on the integration method.
Keywords: behavioral experiment, demand planning, digitization, field study, forecasting, human judgment, machine learning
Highlights
• Human judgment is an essential element of demand forecasting but requires integration into the forecasting process.
• Giving people too much influence may introduce more noise than signal.
• Our research examines different ways of integrating human judgment into a forecasting process and shows that an effortless way of doing so (letting people indicate that a special event is affecting a forecast, while an algorithm estimates the impact of that event) performs remarkably well in comparison to other methods.
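The linear-weighting idea at the heart of an approach like this can be sketched as follows. This is a one-shot, simplified stand-in for the paper's iterative Human-Guided Learning, not the authors' method: the closed-form least-squares weight, the clipping to [0, 1], and the example history are all assumptions.

```python
def fit_judgment_weight(judgment, model, actual):
    """Least-squares weight w for the combined forecast
    w*judgment + (1-w)*model, fit on past periods where the outcome
    is known. A larger w means the expert's adjustments carried signal."""
    num = sum((j - m) * (y - m) for j, m, y in zip(judgment, model, actual))
    den = sum((j - m) ** 2 for j, m in zip(judgment, model))
    w = num / den if den else 0.0
    return min(1.0, max(0.0, w))  # clip so the weight stays interpretable

def combined_forecast(w, judgment, model):
    return w * judgment + (1 - w) * model

# Hypothetical history: the expert systematically over-adjusts upward,
# so the fitted weight discounts (but does not discard) the judgment
judgment = [12.0, 11.0, 13.0]
model    = [10.0, 10.0, 10.0]
actual   = [11.0, 10.5, 11.5]
w = fit_judgment_weight(judgment, model, actual)
print(w)                                 # 0.5
print(combined_forecast(w, 14.0, 10.0))  # 12.0
```

Re-fitting the weight as each new actual arrives gives an iterative variant in the spirit of the method the abstract describes.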
Article
Time series are often presented graphically, and forecasters often judgmentally extrapolate graphically presented data. However, graphs come in many different formats: here, we examine the effect of format when non-experts make forecasts from data presented as bar charts, line graphs, and point graphs. In four web-based experiments with over 4000 participants, we elicited judgmental forecasts for eight points that followed a trended time series containing 50 points. Forecasts were lower for bar charts relative to either line or point graphs. Factors potentially affecting these format effects were investigated: We found that the intensity of shading had no effect on forecasts and that using horizontal stepped lines led to higher forecasts than bars. We also found that participants added more noise to their forecasts for bars than for points, leading to worse performance overall. These findings suggest that format significantly influences judgmental time series forecasts.
Article
Despite improvements in statistical forecasting, human judgment remains fundamental to business forecasting and demand planning. Typically, forecasters do not rely solely on statistical forecasts; they also adjust forecasts according to their knowledge, experience, and information that is not available to statistical models. However, we have limited understanding of the adjustment mechanisms employed, particularly how people use additional information (e.g., special events and promotions, weather, holidays) and under which conditions this is beneficial. Using a multi-method approach, we first analyse a UK retailer case study exploring its operations and the forecasting process. The case study provides a contextual setting for the laboratory experiments that simulate a typical supply chain forecasting process. In the experimental study, we provide past sales, statistical forecasts (using baseline and promotional models) and qualitative information about past and future promotional periods. We include contextual information, with and without predictive value, that allows us to investigate whether forecasters can filter such information correctly. We find that when adjusting, forecasters tend to focus on model-based anchors, such as the last promotional uplift and the current statistical forecast, ignoring past baseline promotional values and additional information about previous promotions. The impact of contextual statements for the forecasting period depends on the type of statistical predictions provided: when a promotional forecasting model is presented, people tend to misinterpret the provided information and over-adjust, harming accuracy.
Article
Hierarchical forecasting is needed in many situations in the supply chain to support decision making. Top-down, bottom-up, and optimal linear combination methods are common in hierarchical forecasting. There is no universally optimal solution for hierarchical forecasting, and each method has some advantages and disadvantages. While top-down and bottom-up methods use only the information at the top and bottom levels, respectively, linear combinations use the individual sales forecasts from all series and levels and combine them linearly, often outperforming the conventional top-down and bottom-up methods. These methods do not directly utilise the explanatory information such as price and promotion status that may be available across different levels in the hierarchy, and their performance may be impacted by these external factors. We propose to use a multi-output regression model that utilises the explanatory variables from across hierarchical levels to simultaneously generate forecasts for all the series at the bottom level. We perform an in-depth analysis of 55 sets of fast-moving consumer goods time series and 3049 products of the M5 forecasting competition data. Our results show that our proposed algorithm effectively utilises explanatory variables from across the hierarchy to generate reliable forecasts for different hierarchical levels, especially in the presence of deep promotional discounts.
Article
In contrast to the conventional view that analysts forecast optimistically, we provide evidence of a negativity bias. Analysts show negative forecast bias associated with their relative local income growth, whether that growth is positive or negative; the bias is stronger for negative growth than for positive growth. The negative bias also directly affects the bias of subsequent analysts forecasting the same and peer firms' earnings. Our results suggest non-fundamental factors at work.
Article
Reliable demand forecasts are critical for effective supply chain management. Several endogenous and exogenous variables can influence the dynamics of demand, and hence a single statistical model that only consists of historical sales data is often insufficient to produce accurate forecasts. In practice, the forecasts generated by baseline statistical models are often judgmentally adjusted by forecasters to incorporate factors and information that are not incorporated in the baseline models. There are however systematic events whose effect can be quantified and modeled to help minimize human intervention in adjusting the baseline forecasts. In this paper, we develop and test a novel regime-switching approach to quantify systematic information/events and objectively incorporate them into the baseline statistical model. Our simple yet practical and effective model can help limit forecast adjustments to only focus on the impact of less systematic events such as sudden climate change or dynamic market activities. The model is validated empirically using sales and promotional data from two Australian companies. The model is also benchmarked against commonly employed statistical and machine learning forecasting models. Discussions focus on thorough analysis of promotions impact and benchmarking results. We show that the proposed model can successfully improve forecast accuracy and avoid poor forecasts when compared to the current industry practice which heavily relies on human judgment to factor in all types of information/events. The proposed model also outperforms sophisticated machine learning methods by mitigating the generation of extremely poor forecasts that drastically differ from actual sales due to changes in demand states.
Article
The demand for a particular product or service is typically associated with different uncertainties that can make it volatile and challenging to predict. Demand unpredictability is one of managers' concerns in the supply chain, as it can cause large forecasting errors and issues in the upstream supply chain, and impose unnecessary costs. We investigate 843 real demand time series with different values of the coefficient of variation (CoV), where promotions cause volatility over the entire demand series. In such a case, forecasting demand series with different CoV values requires different models to capture the underlying behavior, and poses significant challenges due to very diverse demand patterns. We decompose demand into baseline and promotional demand and propose a hybrid model to forecast demand. Our results indicate that the proposed hybrid model generates robust, accurate forecasts and robust inventory performance across series with different levels of volatility. We stress the necessity of decomposition for volatile demand series. We also model the demand series with a number of well-known statistical and machine learning (ML) models to investigate their forecast accuracy and inventory performance empirically. We find that ARIMA with covariates (ARIMAX) works well for forecasting volatile demand series, but exponential smoothing with covariates (ETSX) performs poorly. Support vector regression (SVR) and dynamic linear regression (DLR) models generate robust forecasts across categories of demand with different CoV values. In terms of inventory performance, ARIMAX and combination models are superior to the other models presented. The hybrid algorithm also shows robust performance across series with different CoVs and has low inventory costs.
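The decomposition step can be illustrated with a minimal sketch: split the history into baseline and promotional periods, estimate a multiplicative uplift, and scale the baseline only when a promotion is planned. This is an assumption-laden simplification of the paper's hybrid model (which forecasts each component with richer models); the sales figures and the single multiplicative uplift are illustrative.

```python
def decompose_demand(sales, on_promo):
    """Split history into baseline and promotional demand, and estimate
    a single multiplicative promotional uplift from past promotions."""
    base  = [s for s, p in zip(sales, on_promo) if not p]
    promo = [s for s, p in zip(sales, on_promo) if p]
    baseline = sum(base) / len(base)
    uplift = (sum(promo) / len(promo)) / baseline if promo else 1.0
    return baseline, uplift

def forecast_next(baseline, uplift, promo_planned):
    """Baseline forecast, scaled up only when a promotion is planned."""
    return baseline * (uplift if promo_planned else 1.0)

# Hypothetical weekly sales where promotions roughly double demand
sales    = [100, 102, 210, 98, 195, 100]
on_promo = [0,   0,   1,   0,  1,   0]
baseline, uplift = decompose_demand(sales, on_promo)
print(forecast_next(baseline, uplift, promo_planned=True))
print(forecast_next(baseline, uplift, promo_planned=False))
```

Separating the two components is what keeps promotional spikes from contaminating the baseline estimate, which is the core argument the abstract makes for decomposition on volatile series.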
Article
This paper reviews the research literature on forecasting retail demand. We begin by introducing the forecasting problems that retailers face, from the strategic to the operational, as sales are aggregated over products to stores and to the company overall. Aggregated forecasting supports strategic decisions on location. Product-level forecasts usually relate to operational decisions at the store level. The factors that influence demand, and in particular promotional information, add considerable complexity, so that forecasters potentially face the dimensionality problem of too many variables and too little data. The paper goes on to evaluate evidence on the comparative forecasting accuracy. Although causal models outperform simple benchmarks, adequate evidence on machine learning methods has not yet accumulated. Methods for forecasting new products are examined separately, with little evidence being found on the effectiveness of the various approaches. The paper concludes by describing company forecasting practices, offering conclusions as to both research gaps and the barriers to improved practice.
Article
Demand forecasts are the lifeblood of supply chains. Academic literature and common industry practice indicate that demand forecasts are often subject to human interventions. Judgmental forecasting or judgmental forecast adjustments can have both positive and negative repercussions for the rest of the supply chain. This paper provides the first systematic literature review of judgmental forecasting and adjustments focusing on key features that impact various decisions in supply chains. A carefully assembled and shortlisted literature pool is analyzed for systematic mapping of the published works using bibliometric tools. The primary sub-streams of research within the broader scope of the field are synthesized from a rigorous keyword cluster analysis, and a thorough discussion is presented. Our review concludes by encapsulating the key learnings from four decades of academic research in judgmental forecasting and suggests future research avenues to expand our understanding of the role of humans in demand forecasting and supply chain decision-making.
Article
Product forecasts are a critical input into sourcing, procurement, production, inventory, logistics, finance and marketing decisions. Numerous quantitative models have been developed and applied to generate and improve product forecasts. The use of human judgement, either solely or in conjunction with quantitative models, has been well researched in the academic literature and is a popular forecasting approach in industry practice. In the context of judgemental forecasting, methods that integrate an expert's judgement into quantitative forecasting models are commonly referred to as “integrating forecasting” methods. This paper presents a systematic review of the literature of judgemental demand forecasting with a focus placed on integrating methods. We explore the role of expert opinion and contextual information and discuss the application of behaviourally informed support systems. We also provide important directions for further research in these areas.
Article
Demand forecasting is critical to sales and operations planning (S&OP), but the effects of sales promotions can be difficult to forecast. Typically, a baseline statistical forecast is judgmentally adjusted on receipt of information from different departments. However, much of this information either has no predictive value or its value is unknown. Research into base rate discounting has suggested that such information may distract forecasters from the average uplift and reduce accuracy. This has been investigated in situations in which forecasters were able to adjust the statistical forecasts for promotions via a forecasting support system (FSS). In two ecologically valid experiments, forecasters were provided with the mean level of promotion uplift, a baseline statistical forecast, and quantitative and qualitative information. However, the forecasters were distracted from the base rate and misinterpreted the information available to them. These findings have important implications for the design of organizational S&OP processes, and for the implementation of FSSs.
Article
How effective are different approaches for the provision of forecasting support? Forecasts may be either unaided or made with the help of statistical forecasts. In practice, the latter are often crude forecasts that do not take sporadic perturbations into account. Most research considers forecasts based on series that have been cleansed of perturbation effects. This paper considers an experiment in which people made forecasts from time series that were disturbed by promotions. In all conditions, under-forecasting occurred during promotional periods and over-forecasting during normal ones. The relative sizes of these effects depended on the proportions of periods in the data series that contained promotions. The statistical forecasts improved the forecasting accuracy, not because they reduced these biases, but because they decreased the random error (scatter). The performance improvement did not depend on whether the forecasts were based on cleansed series. Thus, the effort invested in producing cleansed time series from which to forecast may not be warranted: companies may benefit from giving their forecasters even crude statistical forecasts. In a second experiment, forecasters received optimal statistical forecasts that took the effects of promotions into account fully. This increased the accuracy because the biases were almost eliminated and the random error was reduced by 20%. Thus, the additional effort required to produce forecasts that take promotional effects into account is worthwhile.
Article
Biased forecasts, particularly the inadequate adjustment from current values and excessive clustering, are increasingly explained as resulting from anchoring. However, experiments presented in support of this interpretation lack economic conditions, particularly monetary incentives, feedback for learning effects and an optimal strategy of unbiased predictions. In a novel forecasting experiment, we find monetary incentives to reduce anchoring for simple forecasting tasks only, while higher task complexity and risk increase the bias in spite of incentives for accuracy. Anchors ubiquitously reduce the forecasts’ variance, while individual cognitive abilities and learning effects show debiasing effects only in some conditions. Our results emphasize that biased forecasts and their specific variance can result from anchoring.
Article
Judgment: Its Role and Value for Strategy (S. Makridakis & A. Gaba). Scenario Planning: Scaffolding Disorganized Ideas about the Future (K. van der Heijden). Judgmental Forecasting and the Use of Available Information (M. O'Connor & M. Lawrence). Enhancing Judgmental Sales Forecasting: The Role of Laboratory Research (P. Goodwin). Heuristics and Biases in Judgmental Forecasting (F. Bolger & N. Harvey). Financial Forecasting with Judgment (D. Onkal-Atay). Reasoning with Category Knowledge in Probability Forecasting: Typicality and Perceived Variability Effects (G. Browne & S. Curley). The Use of Structured Groups to Improve Judgmental Forecasting (G. Rowe). How Bad Is Human Judgment? (P. Ayton). Integration of Statistical Methods and Judgment for Time Series Forecasting: Principles from Empirical Research (J. Armstrong & F. Collopy). Index.
Article
Previous work has shown that people use anchor-and-adjust heuristics to forecast future data points from previous ones in the same series. We report three experiments that show that they use different versions of this heuristic for different types of series. To forecast an untrended series, our subjects always took a weighted average of the long-term mean of the series and the last data point. In contrast, the way that they forecast a trended series depended on the serial dependences in it. When these were low, people forecast by adding a proportion of the last difference in the series to the last data point. When stronger serial dependences made this difference less similar to the next one, they used a version of the averaging heuristic that they employed for untrended series. This could take serial dependences into account and included a separate component for trend. These results suggest that people use a form of the heuristic that is well adapted to the nature of the series that they are forecasting. However, we also found that the size of their adjustments tended to be suboptimal. They overestimated the degree of serial dependence in the data but underestimated trends. This biased their forecasts.
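The two heuristic forms the abstract describes can be written down directly. A small sketch under that description follows; the weight `w` and damping factor `d` are illustrative parameters, not values from the paper:

```python
def forecast_untrended(series, w=0.6):
    """Weighted average of the long-term mean and the last observation."""
    long_term_mean = sum(series) / len(series)
    return w * series[-1] + (1 - w) * long_term_mean

def forecast_trended(series, d=0.8):
    """Add a damped proportion of the last difference to the last point."""
    last_diff = series[-1] - series[-2]
    return series[-1] + d * last_diff

flat = [10, 12, 9, 11, 10, 30]    # untrended series whose last point spikes
ramp = [10, 12, 14, 16, 18, 20]   # steadily trended series

print(forecast_untrended(flat))   # pulled back toward the series mean
print(forecast_trended(ramp))     # extrapolates part of the last step
```

The first heuristic anchors on the last point and adjusts toward the mean; the second anchors on the last point and adjusts by part of the recent trend — with `d < 1` reproducing the trend underestimation the experiments report.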
Article
The paper considers the use of information by a panel of expert industry forecasters, focusing on their information-processing biases. The panel forecasts construction output by sector up to three years ahead. It is found that the biases observed in laboratory experiments, particularly ‘anchoring’, are observable. The expectations are formed by adjusting the previous forecast to take new information into account. By analysing forecast errors it is concluded that the panel overweight recently released information and do not understand the dynamics of the industry. However, their forecasts, both short and long term, are better than an alternative econometric model, and combining the two sources of forecasts leads to a deterioration in forecast accuracy. The expert forecasts can be ‘de-biased’, and this leads to the conclusion that it is better to optimally process information sources than to combine (optimally) alternative forecasts.
Article
Managers are often required to integrate their own forecasts with statistical forecasts. The studies reported in this paper examine the efficacy of allowing people to adjust their own forecasts in the light of statistical forecasts that are provided to them. Three experimental studies varied the reliability of the statistical forecasts and examined the performance of people over time. Issues of the form of the feedback and the use of decision support were also examined. The results unequivocally suggest that the effectiveness of judgemental adjustment depended on the statistical model's reliability and the seasonality of the time series. However, people had considerable difficulty placing less weight on their own forecasts (compared to the statistical forecasts) and this behaviour became more pronounced over time. Even the provision of decision support did not improve performance at the task.
Article
Demand forecasting is a crucial aspect of the planning process in supply-chain companies. The most common approach to forecasting demand in these companies involves the use of a computerized forecasting system to produce initial forecasts and the subsequent judgmental adjustment of these forecasts by the company’s demand planners, ostensibly to take into account exceptional circumstances expected over the planning horizon. Making these adjustments can involve considerable management effort and time, but do they improve accuracy, and are some types of adjustment more effective than others? To investigate this, we collected data on more than 60,000 forecasts and outcomes from four supply-chain companies. In three of the companies, on average, judgmental adjustments increased accuracy. However, a detailed analysis revealed that, while the relatively larger adjustments tended to lead to greater average improvements in accuracy, the smaller adjustments often damaged accuracy. In addition, positive adjustments, which involved adjusting the forecast upwards, were much less likely to improve accuracy than negative adjustments. They were also made in the wrong direction more frequently, suggesting a general bias towards optimism. Models were then developed to eradicate such biases. Based on both this statistical analysis and organisational observation, the paper goes on to analyse strategies designed to enhance the effectiveness of judgmental adjustments directly.
Article
A number of recent studies have shown how important non-time series information (especially event information) is to forecast accuracy. This study examines the way people adjust time series for this additional information and how they cope with increasing amounts of it. It also examines the contribution of forecasting support systems (FSS) to help manage the information integration process. Results indicate that people benefit from the use of the decomposition-based decision aid in the task, but, unexpectedly, there was no greater benefit when information load was greatest. The characteristics of judgemental adjustments are discussed in relation to the trend of the series and the juxtaposition of information with random fluctuations.
Article
The past 25 years have seen phenomenal growth of interest in judgemental approaches to forecasting and a significant change of attitude on the part of researchers to the role of judgement. While previously judgement was thought to be the enemy of accuracy, today judgement is recognised as an indispensable component of forecasting and much research attention has been directed at understanding and improving its use. Human judgement can be demonstrated to provide a significant benefit to forecasting accuracy but it can also be subject to many biases. Much of the research has been directed at understanding and managing these strengths and weaknesses. An indication of the explosion of research interest in this area can be gauged by the fact that over 200 studies are referenced in this review.
Article
Judgmental adjustments of statistical forecasts are widely used for improving forecast accuracy. Despite the overall effectiveness of this method, it may allow forecasters to introduce biases in statistical forecasts when they judgmentally adjust them. This paper considers three types of bias: (1) optimism bias, (2) anchoring bias, and (3) overreaction bias. We explore the effects of particular individual differences, specifically personality, motivational orientation, and work locus of control, on forecasting biases. The results indicate that a forecaster’s personality and motivational orientation have significant effects on forecasting biases, whereas work locus of control has no effect on forecasting biases. Our analysis further indicates that experience, work locus of control and motivational orientation drive a forecaster’s willingness to judgmentally adjust a statistical forecast.
Article
In this paper we review methods that aim to aid the anticipation of rare, high-impact events. We evaluate these methods according to their ability to yield well-calibrated probabilities or point forecasts for such events. We first identify six factors that can lead to poor calibration and then examine how successful the methods are in mitigating these factors. We demonstrate that all the extant forecasting methods — including the use of expert judgment, statistical forecasting, Delphi and prediction markets — contain fundamental weaknesses. We contrast these methods with a non-forecasting method that is intended to aid planning for the future — scenario planning. We conclude that all the methods are problematic for aiding the anticipation of rare events and that the only remedies are either (i) to provide protection for the organization against the occurrence of negatively-valenced events whilst allowing the organization to benefit from the occurrence of positively-valenced events, or (ii) to provide conditions that challenge one's own thinking — and hence improve anticipation. We outline how components of devil's advocacy and dialectical inquiry can be combined with Delphi and scenario planning to enhance the anticipation of rare events.
Article
We report two experiments designed to study the effect of data presentation format on the accuracy of judgemental forecasts. In the first one, people studied 44 different 20-point time series and forecast the 21st and 22nd points of each one. Half the series were presented graphically and half were in tabular form. Root mean square error (RMSE) in forecasts was decomposed into constant error (to measure bias) and variable error (to measure inconsistency). For untrended data, RMSE was somewhat higher with graphical presentation: inconsistency and an overforecasting bias were both greater with this format. For trended data, RMSE was higher with tabular presentation. This was because underestimation of trends with this format was so much greater than with graphical presentation that it overwhelmed the smaller but opposing effects that were observed with untrended series. In the second experiment, series were more variable but very similar results were obtained.
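The error decomposition used in the abstract — splitting root mean square error into a constant error (bias) and a variable error (inconsistency) — follows from the identity RMSE² = ME² + SD(e)², where ME is the mean error and SD(e) the standard deviation of the errors. A quick numerical check on made-up forecasts:

```python
import numpy as np

# Illustrative forecasts of a constant actual value of 100
forecasts = np.array([102.0, 98.0, 105.0, 110.0, 97.0])
actuals = np.array([100.0, 100.0, 100.0, 100.0, 100.0])

errors = forecasts - actuals
rmse = np.sqrt(np.mean(errors ** 2))
constant_error = errors.mean()   # bias component
variable_error = errors.std()    # inconsistency component (population SD)

# The decomposition: RMSE^2 = CE^2 + VE^2
print(rmse ** 2, constant_error ** 2 + variable_error ** 2)
```

The two printed values agree, which is why the experiments can attribute a change in RMSE separately to bias (e.g. overforecasting with graphs) and to scatter.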
Article
Accurate forecasts are crucial to successful organizational planning. In 2001, 40 international experts published a set of principles to guide best practices in forecasting. Some of these principles relate to the use of management judgment. Most organizations use judgment at some stage in their forecasting process, but do they do so effectively? Although judgment can lead to significant improvements in forecasting accuracy, it can also be biased and inconsistent. The principles show how forecasters should use judgment and assess its effectiveness. We conducted a survey of 149 forecasters to examine the use of judgment based on these established principles and to investigate whether their forecasting procedures were consistent with the principles. In addition, we conducted four in-depth case studies. Although we found examples of good practice, we also discovered that many organizations would improve forecast accuracy if they followed basic principles such as limiting judgmental adjustments of quantitative forecasts, requiring managers to justify their adjustments in writing, and assessing the results of judgmental interventions.
Article
The general linear mixed model provides a useful approach for analysing a wide variety of data structures which practising statisticians often encounter. Two such data structures which can be problematic to analyse are unbalanced repeated measures data and longitudinal data. Owing to recent advances in methods and software, the mixed model analysis is now readily available to data analysts. The model is similar in many respects to ordinary multiple regression, but because it allows correlation between the observations, it requires additional work to specify models and to assess goodness-of-fit. The extra complexity involved is compensated for by the additional flexibility it provides in model fitting. The purpose of this tutorial is to provide readers with a sufficient introduction to the theory to understand the method and a more extensive discussion of model fitting and checking in order to provide guidelines for its use. We provide two detailed case studies, one a clinical trial with repeated measures and dropouts, and one an epidemiological survey with longitudinal follow-up.
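Mixed model fitting of the kind the tutorial describes is now routine in standard software. A minimal sketch with statsmodels, on synthetic longitudinal data with a random intercept per subject (variable names and the true parameter values are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
subjects, times = 30, 5
sub = np.repeat(np.arange(subjects), times)        # subject id for each row
time = np.tile(np.arange(times), subjects)         # repeated measures 0..4
u = rng.normal(0, 2, subjects)                     # random intercept per subject
y = 5 + 1.5 * time + u[sub] + rng.normal(0, 1, sub.size)

df = pd.DataFrame({"y": y, "time": time, "subject": sub})

# Random-intercept model: fixed effect of time, correlation within subject
model = smf.mixedlm("y ~ time", df, groups=df["subject"]).fit()
print(model.params["time"])   # fixed-effect slope, close to the true 1.5
```

Allowing correlation between a subject's repeated observations via the random intercept is the extra specification step, beyond ordinary regression, that the tutorial's two case studies work through.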
Article
I study a budget-constrained, private-valuation, sealed-bid sequential auction with two incompletely-informed, risk-neutral bidders in which the valuations and income may be non-monotonic functions of a bidder's type. Multiple equilibrium symmetric bidding functions may exist that differ in allocation, efficiency and revenue. The sequence of sale affects the competition for a good and therefore also affects revenue and the prices of each good in a systematic way that depends on the relationship among the valuations and incomes of bidders. The sequence of sale may affect prices and revenue even when the number of bidders is large relative to the number of goods. If a particular good, say α, is allocated to a strong bidder independent of the sequence of sale, then auction revenue and the price of good α are higher when good α is sold first.
Article
Anchoring is a pervasive judgment bias in which decision makers are systematically influenced by random and uninformative starting points. While anchors have been shown to affect a broad range of judgments including answers to knowledge questions, monetary evaluations, and social judgments, the underlying causes of anchoring have been explored only recently. We suggest that anchors affect judgments by increasing the availability and construction of features that the anchor and target hold in common and reducing the availability of features of the target that differ from the anchor. We test this notion of anchoring as activation in five experiments that examine the effects of several experimental manipulations on judgments of value and belief as well as on measures of cognitive processes. Our results indicate that prompting subjects to consider features of the item that are different from the anchor reduces anchoring, while increasing consideration of similar features has no effect. The anchoring-as-activation approach provides a mechanism for debiasing anchoring and also points to a common mechanism underlying anchoring and a number of other judgment phenomena.
Article
Subjects made a series of six forecasts for each of four 58-point time-series. The stimulus series were noisy sinusoids of various frequencies. Individual subjects' forecasts were regressed on to the time period for which they were made. The residual variance in these regressions was greater when the stimulus series were noisier. This effect cannot have arisen from memory overload because simultaneous presentation of all points in the stimulus series meant that there were no memory demands. It cannot have been a side effect of a learning process because presenting each series just once excluded opportunities for learning. It cannot have been caused by deterministic rule switching consequent on feedback about a noisy process because no feedback was presented. Instead it appears that subjects tried to simulate the noise as well as the pattern in the series when making their forecasts. In support of this, similar results were obtained in a second experiment in which subjects were asked to simulate rather than to forecast the series. Two possible reasons for subjects' simulation of series noise are discussed.
Language models are few-shot learners
  • Brown