Eric Stone’s research while affiliated with University of Massachusetts Boston and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (8)


Figures and tables:
  • Figure 1. Mean Brier score for independent and team-based prediction polls and prediction markets across 114 questions. Error bars denote one standard error of the difference between each polling method and the prediction market.
  • Table 1. Participants across elicitation methods.
  • Figure 2. Aggregate performance for independent and team-based prediction polls, varying temporal decay (2a) and recalibration (2b) parameters. All other parameters remain at their optimized levels.
  • Table 2. Relative performance as judged by four scoring rules.
  • Figure 3. Performance of prediction markets for varying temporal decay (3a) and recalibration (3b) parameters. All other parameters remain at their optimized levels.


Distilling the Wisdom of Crowds: Prediction Markets vs. Prediction Polls
  • Article
  • Full-text available

March 2017 · 6,790 Reads · 20 Citations · Management Science
Phillip Rescober · Eric Stone · [...]

We report the results of the first large-scale, long-term, experimental test between two crowdsourcing methods: prediction markets and prediction polls. More than 2,400 participants made forecasts on 261 events over two seasons of a geopolitical prediction tournament. Forecasters were randomly assigned to either prediction markets (continuous double auction markets) in which they were ranked based on earnings, or prediction polls in which they submitted probability judgments, independently or in teams, and were ranked based on Brier scores. In both seasons of the tournament, prices from the prediction market were more accurate than the simple mean of forecasts from prediction polls. However, team prediction polls outperformed prediction markets when forecasts were statistically aggregated using temporal decay, differential weighting based on past performance, and recalibration. The biggest advantage of prediction polls was at the beginning of long-duration questions. Results suggest that prediction polls with proper scoring feedback, collaboration features, and statistical aggregation are an attractive alternative to prediction markets for distilling the wisdom of crowds. This paper was accepted by Uri Gneezy, behavioral economics.
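The statistical aggregation the abstract credits with the polls' win (temporal decay, differential weighting by past performance, recalibration) can be sketched in a few lines. This is a minimal illustration assuming exponential decay, inverse-Brier skill weights, and a power-law recalibration; the function name, functional forms, and parameter values are ours, not the paper's fitted specification:

```python
import numpy as np

def aggregate_forecasts(probs, ages_days, past_brier,
                        decay=0.05, gamma=2.0, a=1.5):
    """Pool individual probability forecasts for one binary question.

    probs      : forecasters' probabilities for the event, in (0, 1)
    ages_days  : how many days old each forecast is
    past_brier : each forecaster's historical mean Brier score (lower = better)
    decay, gamma, a : illustrative tuning parameters, not the paper's values
    """
    probs = np.clip(np.asarray(probs, float), 1e-4, 1 - 1e-4)
    w_time = np.exp(-decay * np.asarray(ages_days, float))    # temporal decay
    w_skill = (1.0 / np.asarray(past_brier, float)) ** gamma  # performance weighting
    w = w_time * w_skill
    p_bar = float(np.sum(w * probs) / np.sum(w))              # weighted mean
    return p_bar**a / (p_bar**a + (1 - p_bar)**a)             # recalibration (a > 1 extremizes)

# Three forecasts: the newest comes from the historically best forecaster.
print(aggregate_forecasts([0.6, 0.7, 0.9], [10, 3, 1], [0.30, 0.20, 0.15]))
```

The recalibration step multiplies the aggregate's log odds by a, pushing the pooled forecast away from 0.5 to offset the moderating effect of averaging.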


Distilling the wisdom of crowds: Prediction markets vs. prediction polls

March 2017 · 220 Reads · 173 Citations · Management Science

We report the results of the first large-scale, long-term, experimental test between two crowdsourcing methods: prediction markets and prediction polls. More than 2,400 participants made forecasts on 261 events over two seasons of a geopolitical prediction tournament. Forecasters were randomly assigned to either prediction markets (continuous double auction markets) in which they were ranked based on earnings, or prediction polls in which they submitted probability judgments, independently or in teams, and were ranked based on Brier scores. In both seasons of the tournament, prices from the prediction market were more accurate than the simple mean of forecasts from prediction polls. However, team prediction polls outperformed prediction markets when forecasts were statistically aggregated using temporal decay, differential weighting based on past performance, and recalibration. The biggest advantage of prediction polls was at the beginning of long-duration questions. Results suggest that prediction polls with proper scoring feedback, collaboration features, and statistical aggregation are an attractive alternative to prediction markets for distilling the wisdom of crowds.
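Both records rank poll forecasters by Brier score. For reference, here is a minimal implementation of the common one-sided form for a binary event; the tournament's exact convention (e.g., summing squared errors across all answer options, which gives a 0-2 range) may differ:

```python
def brier(prob, outcome):
    """Squared error between a probability forecast and the 0/1 outcome.
    Lower is better; a constant, uninformative 0.5 forecast scores 0.25."""
    return (prob - outcome) ** 2

print(brier(0.9, 1))  # ~0.01: confident and right
print(brier(0.9, 0))  # 0.81: confident and wrong
```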



Identifying and Cultivating Superforecasters as a Method of Improving Probabilistic Predictions

May 2015 · 6,139 Reads · 202 Citations · Perspectives on Psychological Science

Across a wide range of tasks, research has shown that people make poor probabilistic predictions of future events. Recently, the U.S. Intelligence Community sponsored a series of forecasting tournaments designed to explore the best strategies for generating accurate subjective probability estimates of geopolitical events. In this article, we describe the winning strategy: culling off top performers each year and assigning them into elite teams of superforecasters. Defying expectations of regression toward the mean 2 years in a row, superforecasters maintained high accuracy across hundreds of questions and a wide array of topics. We find support for four mutually reinforcing explanations of superforecaster performance: (a) cognitive abilities and styles, (b) task-specific skills, (c) motivation and commitment, and (d) enriched environments. These findings suggest that superforecasters are partly discovered and partly created, and that the high-performance incentives of tournaments highlight aspects of human judgment that would not come to light in laboratory paradigms focused on typical performance.
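The force of "defying expectations of regression toward the mean" is easiest to see against a baseline simulation. Under a simple skill-plus-noise model (all numbers below are illustrative assumptions, not tournament data), the top 2% selected on Year 1 performance should give back roughly half of their advantage in Year 2; superforecasters' sustained accuracy is what that baseline would not predict:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
skill = rng.normal(0.0, 1.0, n)          # stable ability (higher = better here)
year1 = skill + rng.normal(0.0, 1.0, n)  # observed Year 1 score = skill + luck
year2 = skill + rng.normal(0.0, 1.0, n)  # Year 2 redraws the luck

top = year1 >= np.quantile(year1, 0.98)  # "cull" the top 2% on Year 1 scores
print(round(year1[top].mean(), 2))       # ~3.4: partly skill, partly luck
print(round(year2[top].mean(), 2))       # ~1.7: the luck washes out and scores regress
```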


Figures and tables:
  • Figure 1. Distribution of Brier scores over forecasters plotted against category bins of size .049 (the category labeled .15 covers Brier scores between .10 and .149).
  • Table 1. Descriptive statistics (mean, SD, min, max, alpha).
  • Table 2. Correlations among dispositional, situational, and behavioral variables (Std BS, Ravens, CRT, ExCRT, Numeracy, AOMT, Nfclo, Foxhed, PKY1, PKY2, Train, Teams, Npredq, Nquest).
  • Table 3. Indirect and total contributions in mediation analyses (independent variable, mediator, dependent variable, indirect effect and p value, total effect and p value).
  • Figure 4. Structural equation model with standardized coefficients.
The Psychology of Intelligence Analysis: Drivers of Prediction Accuracy in World Politics

January 2015 · 1,713 Reads · 190 Citations · Journal of Experimental Psychology: Applied

This article extends psychological methods and concepts into a domain that is as profoundly consequential as it is poorly understood: intelligence analysis. We report findings from a geopolitical forecasting tournament that assessed the accuracy of more than 150,000 forecasts of 743 participants on 199 events occurring over 2 years. Participants were above average in intelligence and political knowledge relative to the general population. Individual differences in performance emerged, and forecasting skills were surprisingly consistent over time. Key predictors were (a) dispositional variables of cognitive ability, political knowledge, and open-mindedness; (b) situational variables of training in probabilistic reasoning and participation in collaborative teams that shared information and discussed rationales (Mellers, Ungar, et al., 2014); and (c) behavioral variables of deliberation time and frequency of belief updating. We developed a profile of the best forecasters; they were better at inductive reasoning, pattern detection, cognitive flexibility, and open-mindedness. They had greater understanding of geopolitics, training in probabilistic reasoning, and opportunities to succeed in cognitively enriched team environments. Last but not least, they viewed forecasting as a skill that required deliberate practice, sustained effort, and constant monitoring of current affairs.


Two Reasons to Make Aggregated Probability Forecasts More Extreme

June 2014 · 187 Reads · 119 Citations · Decision Analysis

When aggregating the probability estimates of many individuals to form a consensus probability estimate of an uncertain future event, it is common to combine them using a simple weighted average. Such aggregated probabilities correspond more closely to the real world if they are transformed by pushing them closer to 0 or 1. We explain the need for such transformations in terms of two distorting factors: The first factor is the compression of the probability scale at the two ends, so that random error tends to push the average probability toward 0.5. This effect does not occur for the median forecast, or, arguably, for the mean of the log odds of individual forecasts. The second factor, which affects the mean, the median, and the mean of log odds alike, is the result of forecasters taking into account their individual ignorance of the total body of information available. Individual confidence in the direction of a probability judgment (high/low) thus fails to take into account the wisdom of crowds that results from combining different evidence available to different judges. We show that the same transformation function can approximately eliminate both distorting effects with different parameters for the mean and the median. And we show how, in principle, use of the median can help distinguish the two effects.
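Both distortions, and the proposed remedy, are easy to reproduce numerically. In this minimal sketch (assumed noise level and illustrative parameters, not the paper's data), forecasters share a true log-odds signal plus independent error; the mean is pulled toward 0.5 by the compression of the probability scale, the median and the mean of log odds are not, and a standard extremizing transformation undoes the compression:

```python
import numpy as np

rng = np.random.default_rng(1)
true_logodds = 2.0   # true probability = 1/(1 + e^-2) ~ 0.88
p = 1 / (1 + np.exp(-(true_logodds + rng.normal(0, 1.5, 100_000))))

print(round(p.mean(), 3))                    # ~0.81: mean pulled toward 0.5
print(round(float(np.median(p)), 3))         # ~0.88: median unaffected
m = np.log(p / (1 - p)).mean()               # mean of log odds
print(round(1 / (1 + np.exp(-m)), 3))        # ~0.88: also unaffected

def extremize(pbar, a):
    """Push an aggregate probability toward 0 or 1 (a > 1 extremizes);
    equivalent to multiplying the log odds by a."""
    return pbar**a / (pbar**a + (1 - pbar)**a)

print(round(extremize(p.mean(), a=1.5), 3))  # ~0.90: compression largely undone
```

Consistent with the abstract, one transformation function serves both purposes; only the parameter a changes depending on whether the mean or the median is being corrected.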


Figures and tables:
  • Table 1. Average Brier score accuracy by time period in Years 1 and 2.
  • Fig. 1. Effects of training, teaming, and tracking on average Brier scores in Year 1 (Y1) and Year 2 (Y2). The bars at the left show results for the no-training ("None"), probability-training ("Prob"), and scenario-training ("Scen") conditions; the bars at the right show results for independent forecasters ("Inds"), crowd-belief forecasters ("CBs"), team forecasters ("Teams"), and superforecasters ("SFs"). Error bars represent ±2 SEs.
  • Fig. 4. Average resolution score as a function of group influence ("Inds" = independent forecasters; "CBs" = crowd-belief forecasters) in Year 1 (Y1) and as a function of group influence and tracking ("SFs" = superforecasters) in Year 2 (Y2). Error bars represent ±2 SEs.
Psychological Strategies for Winning a Geopolitical Forecasting Tournament

March 2014 · 1,845 Reads · 272 Citations · Psychological Science

Five university-based research groups competed to recruit forecasters, elicit their predictions, and aggregate those predictions to assign the most accurate probabilities to events in a 2-year geopolitical forecasting tournament. Our group tested and found support for three psychological drivers of accuracy: training, teaming, and tracking. Probability training corrected cognitive biases, encouraged forecasters to use reference classes, and provided forecasters with heuristics, such as averaging when multiple estimates were available. Teaming allowed forecasters to share information and discuss the rationales behind their beliefs. Tracking placed the highest performers (top 2% from Year 1) in elite teams that worked together. Results showed that probability training, team collaboration, and tracking improved both calibration and resolution. Forecasting is often viewed as a statistical problem, but forecasts can be improved with behavioral interventions. Training, teaming, and tracking are psychological interventions that dramatically increased the accuracy of forecasts. Statistical algorithms (reported elsewhere) improved the accuracy of the aggregation. Putting both statistics and psychology to work produced the best forecasts 2 years in a row.
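"Calibration" and "resolution" here are the two forecast-quality terms in the standard Murphy decomposition of the mean Brier score (reliability - resolution + uncertainty). Below is a minimal binned implementation, offered as an illustration of that vocabulary rather than the paper's exact computation:

```python
import numpy as np

def murphy_decomposition(probs, outcomes, bins=10):
    """Split binary forecasts into reliability (calibration penalty, lower is
    better), resolution (higher is better), and base-rate uncertainty, so that
    mean Brier ~ reliability - resolution + uncertainty (exact when forecasts
    within a bin are equal)."""
    probs = np.asarray(probs, float)
    outcomes = np.asarray(outcomes, float)
    base = outcomes.mean()                   # overall event frequency
    idx = np.clip(np.digitize(probs, np.linspace(0, 1, bins + 1)) - 1, 0, bins - 1)
    reliability = resolution = 0.0
    for b in range(bins):
        mask = idx == b
        if mask.any():
            share = mask.mean()              # fraction of forecasts in this bin
            f_bar = probs[mask].mean()       # mean forecast in the bin
            o_bar = outcomes[mask].mean()    # observed frequency in the bin
            reliability += share * (f_bar - o_bar) ** 2
            resolution += share * (o_bar - base) ** 2
    return reliability, resolution, base * (1 - base)
```

In this vocabulary, the paper's finding is that training, teaming, and tracking both shrank the reliability term and enlarged the resolution term.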


Figures and tables:
  • Fig. 1. Brier scores per IFP for ULinOp, the core prediction market, and marketcast.
  • Table 1. Possible combinations of elicitation and aggregation methods.
The Marketcast Method for Aggregating Prediction Market Forecasts

April 2013 · 288 Reads · 3 Citations · Lecture Notes in Computer Science

We describe a hybrid forecasting method called marketcast. Marketcasts are based on bid and ask orders from prediction markets, aggregated using techniques associated with survey methods, rather than market matching algorithms. We discuss the process of conversion from market orders to probability estimates, and simple aggregation methods. The performance of marketcasts is compared to a traditional prediction market and a traditional opinion poll. Overall, marketcasts perform approximately as well as prediction markets and opinion poll methods on most questions, and performance is stable across model specifications.
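A minimal reading of the idea in code: take an implied probability from each trader's limit order, then pool those numbers with a survey-style statistic instead of running them through a matching engine. The conversion rule and the median pooling below are our illustrative assumptions, not the authors' exact method:

```python
import numpy as np

def marketcast(orders):
    """orders: (side, price) limit orders on a contract paying 100 if the
    event occurs, with price in [0, 100]. A bid at price p signals the trader
    thinks P(event) is at least p/100; an ask signals at most p/100. This
    sketch simply reads price/100 as each trader's implied probability."""
    probs = [price / 100.0 for _side, price in orders]
    return float(np.median(probs))  # survey-style pooling, no order matching

print(marketcast([("bid", 58), ("ask", 70), ("bid", 62), ("ask", 66)]))  # 0.64
```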

Citations (7)


... Forecast aggregation, by which we can access the "wisdom of crowds" [Surowiecki, 2005], is well developed for static probability forecasts. With no information beyond the raw distribution, one can use simple measures of central tendency such as the mean or median, more subtle measures such as the extremized mean [Atanasov et al., 2017], or more exotic statistics of the distribution [Powell et al., 2022]. If it is possible to measure information heterogeneity or forecaster quality, much more can be done, typically by constructing various forms of weighted pool [Ranjan and Gneiting, 2010; Clements and Harvey, 2011; Satopää et al., 2014; Budescu and Chen, 2015]. ...

Reference:

Kairosis: A method for dynamical probability forecast aggregation informed by Bayesian change point detection
Distilling the wisdom of crowds: Prediction markets vs. prediction polls
  • Citing Article
  • March 2017

Management Science

... A popular method for consolidating these forecasts is the Delphi method (Armstrong, 2008; Delphi, 1975; Hyndman & Athanasopoulos, 2018). However, several researchers have discovered that these collective predictions, also referred to as "wisdom of the crowds" or "collective intelligence", are susceptible to inaccuracies and low precision, particularly when the individuals surveyed are pundits or uninformed laypersons (Atanasov et al., 2015; Modis, 1999). ...

Distilling the Wisdom of Crowds: Prediction Markets vs. Prediction Polls

Management Science

... The insights of Nassim Nicholas Taleb into rare and unpredictable events, or "Black Swans," have highlighted the limits of traditional models (2007). John Coates has studied the physiological effects of trading, linking stress to decision-making (Coates IV, 2012), and Barbara Mellers has explored judgment under uncertainty (Mellers et al., 2015). Related work includes Hirshleifer (1998, 2001, 2012, 2015), Hersh Shefrin (1988, 1994, 2001, 2008), John W. Payne (1982, 1988), Andrei Shleifer (1990, 2004), and Meir Statman (1999, 2000, 2009, 2017, 2019). Finally, Daniel Kahneman's exploration of happiness and its impact on financial behavior (2006) and Michael Lewis's portrayal of market anomalies in "The Big Short" (2010) underscore the ongoing relevance of behavioral finance in understanding market inefficiencies and investor behavior. ...

Identifying and Cultivating Superforecasters as a Method of Improving Probabilistic Predictions

Perspectives on Psychological Science

... Forecast aggregation is widely studied. Many studies explore various aggregation methodologies theoretically and empirically, such as Clemen and Winkler (1986); Stock and Watson (2004); Jose and Winkler (2008); Baron et al. (2014); Satopää et al. (2014). Our work focuses on prior-free forecast aggregation, where an ignorant aggregator without access to the exact information structure is required to integrate predictions provided by multiple experts. ...

Two Reasons to Make Aggregated Probability Forecasts More Extreme
  • Citing Article
  • June 2014

Decision Analysis

... Cognitive tests are advantageous because they can be quickly administered and scored. Past research has shown that the tests are correlated with forecasting accuracy (Himmelstein, Atanasov, & Budescu, 2021; Mellers et al., 2015), so that we can quickly obtain information about which forecasters will be accurate and which will be inaccurate. But there are many potential cognitive tests that we could use, without much information about which tests we should use. ...

The Psychology of Intelligence Analysis: Drivers of Prediction Accuracy in World Politics

Journal of Experimental Psychology: Applied

... Prediction-polled superforecasters already outperform prediction markets, the best collectively intelligent mechanism to date, so the fact that shifting from a statistical aggregation rule to an algorithm leads to even better predictive power is remarkable. Prior research in both prediction markets and prediction polling has identified that the choice of an aggregation function materially changes the value of the probability estimates of the system (Atanasov et al., 2013; Atanasov et al., 2016). ...

The Marketcast Method for Aggregating Prediction Market Forecasts

Lecture Notes in Computer Science

... The obvious challenge is that predictions about the future can only be evaluated when the future arrives, and so establishing cause and effect is difficult over decadal or centennial time scales. While feedback can be slow at long time scales (e.g., end of century), shorter term foresighting (e.g., years) can provide more rapid feedback on our ability to think about the future (Mellers et al. 2014). ...

Psychological Strategies for Winning a Geopolitical Forecasting Tournament

Psychological Science