
Pavel D. Atanasov
IE University
PhD, Psychology & Decision Science, UPenn
Assistant Professor at IE Business School, Co-PI of Human Forest project
About
60 Publications
30,248 Reads
1,331 Citations
Introduction
Human Forest, clinical trial forecasting, crowdsourcing, identifying predictive skill, belief updating, forecast aggregation, human-machine hybrids, prediction markets.
Additional affiliations
Pytho LLC
Position
- Co-founder
July 2012 - July 2015
Publications (60)
We report the results of the first large-scale, long-term, experimental test between two crowdsourcing methods: prediction markets and prediction polls. More than 2,400 participants made forecasts on 261 events over two seasons of a geopolitical prediction tournament. Forecasters were randomly assigned to either prediction markets (continuous doubl...
Proper scoring rules can be used to incentivize a forecaster to truthfully report her private beliefs about the probabilities of future events and to evaluate the relative accuracy of forecasters. While standard scoring rules can score forecasts only once the associated events have been resolved, many applications would benefit from instant access...
Laboratory research has shown that both underreaction and overreaction to new information pose threats to forecasting accuracy. This article explores how real-world forecasters who vary in skill attempt to balance these threats. We distinguish among three aspects of updating: frequency, magnitude, and confirmation propensity. Drawing on data from a...
How do we effectively combine historical data and human insights to predict complex outcomes? How well do human crowds compete with predictive algorithms? We provide the first description of the Human Forest method, which enables forecasters to define custom reference classes, query a historical database and review base rates specific to their sele...
What systems should we use to elicit and aggregate judgmental forecasts? Who should be asked to make such forecasts? We address these questions by assessing two widely-used crowd prediction systems: prediction markets and prediction polls. Our main test compares a prediction market against team-based prediction polls, using data from a large, multi...
High-stakes debates often pivot on clashing estimates of outcomes that one side sees as so improbable as not to deserve policy prioritization. These debates are especially intractable when they focus on rare events ranging from disasters (e.g., existential risks from Artificial Intelligence, nuclear war, or bioengineered pandemics) to surprising su...
Sound decision-making relies on accurate prediction for tangible outcomes ranging from military conflict to disease outbreaks. To improve crowdsourced forecasting accuracy, we developed SAGE, a hybrid forecasting system that combines human and machine generated forecasts. The system provides a platform where users can interact with machine models a...
Gender discrimination is present across various fields, but identifying the underlying mechanism is challenging. We demonstrate own-gender favouritism in a field setting that allows for clean identification of tastes versus beliefs: the One Bid game on the TV show The Price Is Right. Players must guess an item’s value without exceeding it, leaving...
Who is good at prediction? Addressing this question is key to recruiting and cultivating accurate crowds and effectively aggregating their judgments. Recent research on superforecasting has demonstrated the importance of individual, persistent skill in crowd prediction. This chapter takes stock of skill identification measures in probability estima...
Psychologists typically measure beliefs and preferences using self-reports, whereas economists are much more likely to infer them from behavior. Prediction markets appear to be a victory for the economic approach, having yielded more accurate probability estimates than opinion polls or experts for a wide variety of events, all without ever asking f...
Little is known about the extent to which medical expert communities can anticipate the outcomes of clinical trials. In this study, we collected 33 expert probability distribution forecasts for an ongoing precision medicine cancer trial (NSABP-B47 or NCT01275677) on the primary outcome (incidence of disease free survival) in study and comparator ar...
Forecasting tournaments are misaligned with the goal of producing actionable forecasts of existential risk, an extreme-stakes domain with slow accuracy feedback and elusive proxies for long-run outcomes. We show how to improve alignment by measuring facets of human judgment that play central roles in policy debates but have long been dismissed as un...
Supplementary analyses mentioned in Atanasov et al. (2020).
A growing body of research indicates that forecasting skill is a unique and stable trait: forecasters with a track record of high accuracy tend to maintain this record. But how does one identify skilled forecasters effectively? We address this question using data collected during two seasons of a longitudinal geopolitical forecasting tournament. Ou...
Objective
To explore the accuracy of combined neurology expert forecasts in predicting primary endpoints for trials.
Methods
We identified one major randomized trial each in stroke, multiple sclerosis (MS), and amyotrophic lateral sclerosis (ALS) that was closing within 6 months. After recruiting a sample of neurology experts for each disease, we...
Forecasting the future is a notoriously difficult task. To overcome this challenge, state-of-the-art forecasting platforms are "hybridized": they gather forecasts from a crowd of humans as well as from one or more machine models. However, an open challenge remains in how to optimally combine forecasts from these pools into a single forecast. We propose...
Forecasting of geopolitical events is a notoriously difficult task, with experts failing to significantly outperform a random baseline across many types of forecasting events. One successful way to increase the performance of forecasting tasks is to turn to crowdsourcing: leveraging many forecasts from non-expert users. Simultaneously, advances in...
Preparatory document for working paper and conference presentation on IARPA HFC RCT-A training concept, development, expected effects and protocol results.
Preparatory document for working paper and conference presentation on IARPA HFC RCT-A training protocol results, causality implications and possible selection biases at work.
Accountability pressures are a ubiquitous feature of social systems: virtually everyone must answer to someone for something. Behavioral research has, however, warned that accountability, specifically a focus on being responsible for outcomes, tends to produce suboptimal judgments. We qualify this view by demonstrating the long-term adaptive benefi...
e21011
Background: First line (1L) systemic combination (combo) therapies for treatment of metastatic melanoma (MM) include targeted combo therapies such as dabrafenib+trametinib (D+T) or vemurafenib+cobimetinib for patients (pts) with BRAF mutation (BRAF+), or immunotherapy combo ipilimumab+nivolumab (I+N) for pts irrespective of BRAF status. The...
e21003
Background: Guidelines for metastatic melanoma (MM) recommend targeted therapy combination (combo) for patients (pts) with BRAF mutation (BRAF+) and immunotherapy for pts irrespective of BRAF status. The study objective was to describe real world characteristics and treatment patterns among MM pts treated with either dabrafenib+trametinib (D...
Individuals often make decisions that affect groups, yet the propensities of group representatives are not as well understood as those of independent decision makers or deliberating groups. We ask how responsibility for group payoffs − in the absence of group deliberation − affects the choice. The experiment utilizes the Interdependent Security D...
This article extends psychological methods and concepts into a domain that is as profoundly consequential as it is poorly understood: intelligence analysis. We report findings from a geopolitical forecasting tournament that assessed the accuracy of more than 150,000 forecasts of 743 participants on 199 events occurring over 2 years. Participants we...
We introduce a new method for converting individual probability estimates (obtained through surveys) into market orders for use in a Continuous Double Auction prediction market. Our Survey-Powered Market Agent (SPMA) algorithm is based on actual forecaster behavior, and offers notable advantages over existing market agent algorithms such as Zero In...
What are the barriers to voluntary take-up of high-deductible plans? We address this question using a large-scale employer survey conducted after an open-enrollment period in which a new high-deductible plan was first introduced. Only 3% of the employees chose this plan, despite the respondents’ recognition of its financial advantages. Employees wh...
The CAD triad hypothesis (Rozin, Lowery, Imada, & Haidt, 1999) stipulates that, cross-culturally, people feel anger for violations of autonomy, contempt for violations of community, and disgust for violations of divinity. Although the disgust-divinity link has received some measure of empirical support, the results have been difficult to interpret...
Five university-based research groups competed to recruit forecasters, elicit their predictions, and aggregate those predictions to assign the most accurate probabilities to events in a 2-year geopolitical forecasting tournament. Our group tested and found support for three psychological drivers of accuracy: training, teaming, and tracking. Probabi...
Hypothetical choice studies suggest that physicians often take more risk for themselves than on their patient's behalf.
To examine if physicians recommend more screening tests than they personally undergo in the real-world context of breast cancer screening.
Within-subjects survey.
A national sample of female obstetricians and gynecologists (N = 13...
We describe a hybrid forecasting method called marketcast. Marketcasts are based on bid and ask orders from prediction markets, aggregated using techniques associated with survey methods, rather than market matching algorithms. We discuss the process of conversion from market orders to probability estimates, and simple aggregation methods. The perf...
Claims of taste-based discrimination are common but difficult to prove in the field. Furthermore, much of the research on discrimination focuses on evaluation. However, discriminatory patterns of competition, such as use of aggressive strategies based on opponents' gender, may produce similar discriminatory outcomes. We report evidence for discrimi...
The hypothesis that psychometric instruments incorporating local idioms of distress predict functional impairment in a non-Western, war-affected population above and beyond translations of already established instruments was tested. Exploratory factor analysis was conducted on the War-Related Psychological and Behavioral Problems section of the Pen...
Are we more inclined to take risks for ourselves than on someone else's behalf? Risk taking for self and for others was compared across four studies. Study 1 was a meta-analysis of 28 effects from 18 studies. Overall, choices for others were significantly more risk-averse than choices for self. Two features of the choice environment moderated these ef...
We examined the effects of framing and perceived vulnerability on dishonest behavior in competitive environments. Participants were randomly matched into pairs and took a short multiple-choice test, the relative score of which determined their merit-based payoffs. After learning about the test scores, participants were asked to report them, thus af...
The majority of research in conflict management focuses on conflict resolution: the process of reaching a mutually beneficial solution for the negotiating parties. However, some negotiations impart substantial negative externalities onto third parties, so "conflict resolution" is socially suboptimal. Bribery is one such example: potential bribe-giv...
Despite the variance in methods and results in the literature, the current review finds reliable evidence that choices for others tend to be more risk-averse than choices for the self. I term the tendency to avoid risks for others more than for one’s self double risk aversion. The term correctly implies that there is an increase in aversion to risk...
To compare total costs and risk of hypoglycemia in patients with type 2 diabetes (T2D) initiated on NPH insulin versus glargine in a real-world setting.
This study used claims data (10/2001 to 06/2005) from a privately insured U.S. population of adult T2D patients who were initiated on NPH or glargine following a 6-month insulin-free period. A samp...
Compare treatment patterns for patients with schizophrenia treated with olanzapine versus quetiapine in the Pennsylvania Medicaid population.
Patients (18-64 years) with a diagnosis of schizophrenia (ICD-9-CM: 295.xx) and treated with olanzapine or quetiapine were identified from the Pennsylvania Medicaid claims database (1999-2003). Patients were...
Compare annual health-care costs and resource utilization associated with olanzapine versus quetiapine for treating schizophrenia in a Medicaid population.
Adult schizophrenia patients were selected from deidentified Pennsylvania Medicaid claims database (1999–2003). Included patients were continuously enrolled and initiated with olanzapine or quet...
Objective:
To determine and compare the cost utilities of the tumour necrosis factor (TNF) antagonists adalimumab and infliximab as maintenance therapies for patients in the US with moderately to severely active Crohn's disease.
Methods:
Maintenance regimens of adalimumab (40 mg every other week) and infliximab (5 mg/kg) were compared using prim...