Article

Wise teamwork: Collective confidence calibration predicts the effectiveness of group discussion


Abstract

‘Crowd wisdom’ refers to the surprising accuracy that can be attained by averaging judgments from independent individuals. However, independence is unusual; people often discuss and collaborate in groups. When does group interaction improve vs. degrade judgment accuracy relative to averaging the group's initial, independent answers? Two large laboratory studies explored the effects of 969 face-to-face discussions on the judgment accuracy of 211 teams facing a range of numeric estimation problems from geographic distances to historical dates to stock prices. Although participants nearly always expected discussions to make their answers more accurate, the actual effects of group interaction on judgment accuracy were decidedly mixed. Importantly, a novel, group-level measure of collective confidence calibration robustly predicted when discussion helped or hurt accuracy relative to the group's initial independent estimates. When groups were collectively calibrated prior to discussion, with more accurate members being more confident in their own judgment and less accurate members less confident, subsequent group interactions were likelier to yield increased accuracy. We argue that collective calibration predicts improvement because groups typically listen to their most confident members. When confidence and knowledge are positively associated across group members, the group's most knowledgeable members are more likely to influence the group's answers.
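The group-level calibration idea can be made concrete with a small sketch. One simple operationalisation (an assumption here, not necessarily the authors' exact metric) correlates members' pre-discussion confidence with their accuracy, i.e. their negative absolute error:

```python
# Sketch: a group is "collectively calibrated" when, across its members,
# confidence correlates positively with accuracy. Data are hypothetical.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def collective_calibration(confidences, errors):
    # Correlate confidence with accuracy, taken as negative absolute error.
    return pearson(confidences, [-abs(e) for e in errors])

# A well-calibrated group: the most confident member is also the most accurate.
conf = [0.9, 0.6, 0.3]   # self-reported confidence per member
err = [1.0, 5.0, 12.0]   # absolute estimation error per member
print(round(collective_calibration(conf, err), 2))
```

On this reading, a positive value before discussion predicts that the group's most influential (confident) voices are also its most knowledgeable ones.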


... In small groups, decision typically follows discussion, and during discussion individual confidence translates into influence. Communication thus threatens group accuracy, unless confidence correlates positively with individual sensitivity (Sorkin et al., 2001; Bahrami et al., 2012; Silver et al., 2021). Although it seems a natural next step, we are not aware of similar studies that include the possibility of delegation. ...
... Experts' accuracy is higher than non-experts': average accuracy per bloc is 63% for experts (vs. 58% for non-experts) in treatments with 0.05 coherence, and 59% (vs. 55.5%) in treatments with 0.03 coherence. The frequency of blocs with accuracy below 50% is non-negligible (18% for coherence of 0.05, just above 23% for coherence of 0.03) and, surprisingly, persists when we aggregate over a larger number of tasks. Averaging at the subject level over all 120 tasks, 9% of subjects have accuracy below randomness with coherence 0.05, and 12% with coherence 0.03. ...
... In Experiment 2, accuracy is at best a very weak predictor of participation in voting, never significant at conventional levels and with the wrong sign in the probit estimation, confirming the high uncertainty in subjects' evaluation of their own accuracy. Recall that experts are selected on the basis of accuracy in the previous two blocs. Average accuracy of the top 20% of respondents in each bloc is higher, but past performance, the criterion we used to define experts, seems a more accurate criterion for receiving delegated votes than unobservable current performance. ...
... Averaging at the subject level over all 120 tasks, 9% of subjects have accuracy below randomness with coherence 0.05, and 12% with coherence 0.03. If we want to study voting and information aggregation when information may be faulty, perceptual tasks can provide a very useful tool. ...
... Average accuracy of the top 20% of respondents in each bloc is higher, but past performance, the criterion we used to define experts, seems a more accurate criterion for receiving delegated votes than unobservable current performance. Individual subjects' accuracies show high variability across blocs, evidence of random noise in perceiving and recording the stimulus in the brain, as formalized in psychophysics research. The figure is almost identical if frequencies are calculated over the full sample. ...
Preprint
Full-text available
Liquid Democracy is a voting system touted as the golden medium between representative and direct democracy: decisions are taken by referendum, but voters can delegate their votes as they wish. The outcome can be superior to simple majority voting, but even when experts are correctly identified, delegation must be used sparingly. We ran two very different experiments: one follows a tightly controlled lab design; the second is a perceptual task run online where the precision of information is ambiguous. In both experiments, delegation rates are high, and Liquid Democracy underperforms both universal voting and the simpler option of allowing abstention.
... The Delphi method, in which a panel of experts engage in structured communication to repeatedly revise their opinions from mutual feedback until a consensus emerges, is a popular example 62. Although available evidence remains equivocal about when and how social interaction can yield a 'bonus gain' in collective accuracy over statistical aggregation 39,63-71, research suggests that well-crafted behavioural interventions may generate accurate and robust group forecasts beyond the wisdom of crowds effect that hinges on statistical cancellation of individual informational errors via aggregation. For instance, in a forecasting tournament, groups of researchers competed to find new ways to provide the most accurate forecasts about complicated geopolitical events around the globe. ...
... Such a self-organization process within the superforecaster teams seems to imply that a high-level cognitive division of labour emerged between the expert group members through tacit or explicit coordination 87. In other words, confidence-weighted aggregation was not algebraically executed within the mind of an individual or an arbitrator during consensus decision-making 50,51,76,77, but was behaviourally achieved on the basis of proper shared metacognition regarding who should contribute and who should just listen to information and when 69. Such a mutually agreed-upon cognitive division of labour 24,87, along with proper tracking of the division's efficiency over time, is likely to be the key to achieving accurate and robust group consensus beyond the wisdom of crowds. ...
Article
Full-text available
In humans and other gregarious animals, collective decision-making is a robust behavioural feature of groups. Pooling individual information is also fundamental for modern societies, in which digital technologies have exponentially increased the interdependence of individual group members. In this Review, we selectively discuss the recent human and animal literature, focusing on cognitive and behavioural mechanisms that can yield collective intelligence beyond the wisdom of crowds. We distinguish between two group decision-making situations: consensus decision-making, in which a group consensus is required, and combined decision-making, in which a group consensus is not required. We show that in both group decision-making situations, cognitive and behavioural algorithms that capitalize on individual heterogeneity are the key for collective intelligence to emerge. These algorithms include accuracy or expertise-weighted aggregation of individual inputs and implicit or explicit coordination of cognition and behaviour towards division of labour. These mechanisms can be implemented either as ‘cognitive algebra’, executed mainly within the mind of an individual or by some arbitrating system, or as a dynamic behavioural aggregation through social interaction of individual group members. Finally, we discuss implications for collective decision-making in modern societies characterized by a fluid but auto-correlated flow of information and outline some future directions. Collective intelligence emerges in group decision-making, whether it requires a consensus or not. In this Review, Kameda et al. describe cognitive and behavioural algorithms that capitalize on individual heterogeneity to yield gains in decision-making accuracy beyond the wisdom of crowds. View-only file is available. https://rdcu.be/cL3QB
... For example, the post hoc analyses of a relationships comparison task for group size 30 revealed that the accuracy of group judgments in proportion "2:3:0" (.56) was no better than that in proportion "3:2:0" (.56). As a recent study pointed out [27], these findings imply that to achieve the wisdom of crowds, it is important to consider not only how confident individuals feel but also what they know (and do not know). ...
... In this situation, although the group judgment is A according to a simple majority rule, it is B according to a weighted-confidence majority rule, because the sum of confidence ratings for B (280) is larger than that for A (70). In addition, some previous studies have considered a confidence threshold for accepting individuals' judgments [27], [30], [33], [34]: if a person's confidence is above a certain threshold, their judgment is accepted; otherwise it is not. This procedure is repeated until all the group members' judgments are evaluated. ...
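The weighted-confidence rule and threshold procedure described above can be sketched as follows. This is a minimal illustration reusing the snippet's totals (280 for B vs. 70 for A); the cited procedures differ in their details:

```python
# Sketch of a weighted-confidence majority rule for a binary choice (A vs. B).
# Each member reports a judgment and a confidence rating; the group answer is
# the option with the larger summed confidence. Three low-confidence A votes
# lose to two high-confidence B votes, even though A holds a simple majority.
judgments = [
    ("A", 20), ("A", 25), ("A", 25),   # confidence sums to 70 for A
    ("B", 140), ("B", 140),            # confidence sums to 280 for B
]

def weighted_majority(judgments):
    totals = {}
    for option, confidence in judgments:
        totals[option] = totals.get(option, 0) + confidence
    return max(totals, key=totals.get)

def thresholded_majority(judgments, threshold):
    # Variant with a confidence threshold: only judgments at or above the
    # threshold are accepted; if none qualify, fall back to all votes.
    accepted = [j for j in judgments if j[1] >= threshold] or judgments
    counts = {}
    for option, _ in accepted:
        counts[option] = counts.get(option, 0) + 1
    return max(counts, key=counts.get)

print(weighted_majority(judgments))          # B wins on summed confidence
print(thresholded_majority(judgments, 100))  # only B's votes clear the bar
```

A plain majority count over the same five votes would return A, which is exactly the divergence the snippet describes.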
Preprint
Full-text available
In group judgments in a binary choice task, the judgments of individuals with low confidence (i.e., those who feel their judgment was not correct) may be regarded as unreliable. Previous studies have shown that aggregating individuals’ diverse judgments can lead to high accuracy in group judgments, a phenomenon known as the wisdom of crowds. Therefore, if low-confidence individuals’ judgments are diverse across individuals and their mean accuracy is above the chance level (.50), they will not necessarily decrease the accuracy of group judgments. To investigate this issue, the present study conducted behavioral experiments using binary choice inferential tasks, and computer simulations of group judgments that manipulated group sizes and individuals’ confidence levels. Results revealed that (I) judgment patterns were highly similar between individuals regardless of their confidence levels; (II) the low-confidence group could make judgments as accurate as the high-confidence group as the group size increased; and (III) even if there were low-confidence individuals in a group, they generally did not inhibit group judgment accuracy. The results suggest the usefulness of low-confidence individuals’ judgments in a group and provide practical implications for real-world group judgments.
... We re-analyse six published datasets 3,5,8,34,37,39 . These experiments all follow the same basic paradigm described in the literature review above: participants answer a numeric question, such as "how many candies are in this photograph" or "what is the budget of the US Department of Defense?" Participants then engage in some kind of communication process such as discussion or mediated numeric exchange, before providing a final post-communication answer. ...
Preprint
Prior research offers mixed evidence on whether and when communication improves belief accuracy for numeric estimates. Experiments on one-to-one advice suggest that communication between peers usually benefits accuracy, while group experiments indicate that communication networks produce highly variable outcomes. Notably, it is possible for a group's average estimate to become less accurate even as its individual group members -- on average -- become more accurate. However, the conditions under which communication improves group and/or individual outcomes remain poorly characterised. We analyse an empirically supported model of opinion formation to derive these conditions, formally explicating the relationship between group-level effects and individual outcomes. We re-analyse previously published experimental data, finding that empirical dynamics are consistent with theoretical expectations. We show that three measures completely describe asymptotic opinion dynamics: the initial crowd bias; the degree of influence centralisation; and the correlation between influence and initial biases. We find analytic expressions for the change in crowd and individual accuracy as a function of the product of these three measures, which we describe as the truth alignment. We show how truth alignment can be decomposed into calibration (influence/accuracy correlation) and herding (influence/averageness correlation), and how these measures relate to changes in accuracy. Overall, we find that individuals can and usually do improve even when groups get worse.
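The quantities this preprint names can be illustrated in a simple influence-weighted (DeGroot-style) consensus model, where the group's final answer is an influence-weighted average of initial estimates. The code below is an illustrative assumption, not the paper's notation: it shows how the bias of the weighted consensus decomposes into the initial crowd bias plus an influence/bias alignment term.

```python
# Sketch: consensus bias in an influence-weighted averaging model splits into
# the bias of the simple average plus a covariance-like term that captures
# how much influence is concentrated on the more biased members.
truth = 100.0
estimates = [80.0, 95.0, 130.0]   # initial individual estimates
influence = [0.2, 0.5, 0.3]       # long-run influence weights (sum to 1)

n = len(estimates)
biases = [e - truth for e in estimates]
crowd_bias = sum(biases) / n      # bias of the unweighted group average

# Alignment between influence and initial biases (zero when influence is
# uniform or uncorrelated with bias):
alignment = sum((w - 1 / n) * (b - crowd_bias)
                for w, b in zip(influence, biases))

# Bias of the influence-weighted consensus the group converges to.
consensus_bias = sum(w * b for w, b in zip(influence, biases))

# Identity (since the weights sum to 1):
#   consensus_bias == crowd_bias + alignment
print(round(crowd_bias, 4), round(alignment, 4), round(consensus_bias, 4))
```

Here the crowd starts nearly unbiased, but because influence tilts toward an overestimating member, the consensus ends up more biased than the simple average; a negative alignment would instead pull the consensus toward the truth.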
... Because team coaching directly benefits employees and gives them the chance to improve their behaviors, deepen their knowledge, and develop their abilities, it helps business organizations achieve better results [9]. Because of globalization and ongoing innovation, the business world is extremely competitive and concentrated today. ...
Article
Full-text available
It is impossible to overstate the necessity of a strategic and practical approach in the workplace in order to maximize productivity these days. Teamwork is one of the best ways to adapt to the changes that have occurred in today's environment throughout time. In every industry, the optimum performance arrangement for realizing visions, carrying out plans, and accomplishing objectives is teamwork. It is also one of the most crucial components of systems for continuous improvement since it makes information exchange, issue resolution, and the growth of employee accountability easier. Teams function as a grouping of people with complementary talents who work together rather than against one another. They are held accountable for their strategic methods and use them to achieve a shared objective. The Supervised Learning technique was used in this work to simulate team performance utilizing an intelligent coaching agent. Through the use of an automated performance assessment and weighted scores for each task, this study was able to create a system that will remove biases from performance evaluation. As soon as a worker does the task, they will obtain a score. The purpose of this study was to demonstrate an event-based performance approach by developing and utilizing an intelligent coaching agent in a supervised learning team training framework. The goal was successfully met, and the result shows positive impacts on the team's performance.
... Discussion is an effective teaching and learning method. Past studies have found that the quality of discussions may differ based on what educational activity comes before them [48]. The present results echo this: the quality of the discussions improved when watching the movie and self-reflection preceded them, owing to the positive effect active learning had on enhancing empathy. ...
Article
Full-text available
Background Medical students’ empathy toward patients with Alzheimer’s is rarely found in formal medical curricula. Based on Vygotsky’s theory, watching films and reflecting can be considered effective methods to improve empathy. The present study aimed to explore medical students’ perceptions of empathy toward patients with Alzheimer’s after participating in an educational program using interactive video based on Vygotsky’s theory. Methods This qualitative study was conducted at Tehran University of Medical Sciences in 2022. The population included all 40 medical students. Firstly, the movie Still Alice, which is about the feelings of a professor diagnosed with Alzheimer’s disease, was shown to the students. Secondly, the students reflected on their experiences of watching the movie. Thirdly, a session was held for group discussion on the subject of the movie, the patient’s feelings, the doctor’s attitude, the social environment surrounding the patient shown in the movie, and the necessity of empathy toward patients with Alzheimer’s disease. The reflection papers were analyzed using the conventional qualitative content analysis method. Results After analyzing 216 codes from 38 reflection papers, four categories were extracted: communication with a patient with Alzheimer’s, understanding the patient with Alzheimer’s as a whole, medical science development, and the student’s individual ideology. Conclusion Reflection and group discussion after watching a movie, by providing opportunities for social interaction about personal interpretations, play an active role in enhancing empathy. Based on their perceptions, the medical students gained a perspective that considers the patient as a whole and pays attention to establishing a proper relationship with the patient.
... We must distinguish what each one of us feels-knows about a proof's correctness from how we collectively ascertain "objectively" that a proof is correct; here, we only suggest that the "objectivity" of the correctness of a proof could be the result of a robust intersubjectivity: a strong feeling of correctness, shared by many, e.g., as the result of group discussions. There are, however, issues regarding the effectiveness of group discussions (see, e.g., Silver, Mellers, Tetlock 2021;Bang, Frith 2017); in this way, there is no simple answer. ...
Chapter
Full-text available
We present an approach in which ancient Greek mathematical proofs by Hippocrates of Chios and Euclid are addressed as a form of (guided) intentional reasoning. Schematically, in a proof, we start with a sentence that works as a premise; this sentence is followed by another, the conclusion of what we might take to be an inferential step. That goes on until the last conclusion is reached. Guided by the text, we go through small inferential steps; in each one, we go through an autonomous reasoning process linking the premise to the conclusion. The reasoning process is accompanied by a metareasoning process. Metareasoning gives rise to a feeling-knowing of correctness. In each step/cycle of the proof, we have a feeling-knowing of correctness. Overall, we reach a feeling of correctness for the whole proof. We suggest that this approach allows us to address the issues of how a proof functions, for us, as an enabler to ascertain the correctness of its argument and how we ascertain this correctness.
... Mutual trust, open communication, and shared commitment are all elements that influence teamwork (Letsoin & Ratnasari, 2020). Subsequent team interactions are more likely to become more accurate when the team is collectively calibrated before discussion, with more accurate members having greater confidence in their own judgments and less accurate members having lower confidence (Silver et al., 2021). In addition, teamwork was found to have a beneficial and statistically significant impact on organizational commitment, and the results show that organizational commitment increases significantly through employee training (Hanaysha, 2016). They appear to have honed the social and emotional skills required for productive teamwork over the course of their degree (Hastie & Barclay, 2021). Besides encouraging people to choose worthwhile things to do, motivation also influences how a job will be carried out. ...
Article
Full-text available
This study aims to determine the effect of leadership and the work environment on employee job satisfaction. The sample consisted of 45 respondents. The data were collected through questionnaires, processed and analysed using IBM Statistics 22 for Windows. Sampling used the proportionate stratified random sampling method, a technique applied when a population does not consist of appropriately homogeneous and stratified individuals or groups. Data quality was assessed with a validity test using Corrected Item-Total correlations and a reliability test using Cronbach’s Alpha. Hypotheses were tested with the t-test, the F-test, and the coefficient of determination. The results show a positive effect of the leadership variable on employee job satisfaction (t-count 593 > t-table 1.68, significance value 0.556 < 0.05), a positive effect of the work environment variable on employee job satisfaction (t-count .053 < t-table 1.68, significance value .958 < 0.05), and a significant joint effect of leadership and the work environment on employee job satisfaction (F-count (2.130) < F-table (3.20), significance level 132). Furthermore, the R Square value is 0.92, or 9%, representing the influence of leadership and the work environment on employee job satisfaction.
... Our results are also broadly compatible with those of the Good Judgement Project in geopolitical forecasting [52], which found that those who worked in teams, discussed and debated evidence and exchanged rationales were more accurate than those who worked alone. More recent studies provide further evidence that group interaction and discussion improve the accuracy of individuals [53], particularly when structured in small, independent groups [54], and when groups were already collectively well-calibrated (i.e. more accurate people were more confident, and less accurate people were less confident going into discussion) [55]. Under these conditions, the most knowledgeable (and confident) people were more likely to influence the answers of the less knowledgeable people in the group. ...
Article
Full-text available
This paper explores judgements about the replicability of social and behavioural sciences research and what drives those judgements. Using a mixed methods approach, it draws on qualitative and quantitative data elicited from groups using a structured approach called the IDEA protocol (‘investigate’, ‘discuss’, ‘estimate’ and ‘aggregate’). Five groups of five people with relevant domain expertise evaluated 25 research claims that were subject to at least one replication study. Participants assessed the probability that each of the 25 research claims would replicate (i.e. that a replication study would find a statistically significant result in the same direction as the original study) and described the reasoning behind those judgements. We quantitatively analysed possible correlates of predictive accuracy, including self-rated expertise and updating of judgements after feedback and discussion. We qualitatively analysed the reasoning data to explore the cues, heuristics and patterns of reasoning used by participants. Participants achieved 84% classification accuracy in predicting replicability. Those who engaged in a greater breadth of reasoning provided more accurate replicability judgements. Some reasons were more commonly invoked by more accurate participants, such as ‘effect size’ and ‘reputation’ (e.g. of the field of research). There was also some evidence of a relationship between statistical literacy and accuracy.
... Discussion offers the potential to improve group performance by resolving misunderstanding of the question, providing opportunities for people to introduce new information and learn from each other (Mojzisch and Schulz-Hardt, 2010), encouraging critical thinking (Postmes et al., 2001), and encouraging counterfactual reasoning (Galinsky and Kray, 2003). A study with 211 teams engaged in 969 face-to-face discussions (Silver et al., 2021) found that group interaction improved group accuracy, but only when groups were already collectively well calibrated (i.e. more accurate people were more confident, and less accurate people were less confident going into discussion). Under these conditions, the most knowledgeable (and confident) people were more likely to influence the answers of the less knowledgeable people in the group. ...
Chapter
Full-text available
There are severe problems with the decision-making processes currently widely used, leading to ineffective use of evidence, faulty decisions, wasting of resources and the erosion of public and political support. In this book an international team of experts provide solutions. The transformation suggested includes rethinking how evidence is assessed, combined, communicated and used in decision-making; using effective methods when asking experts to make judgements (i.e. avoiding just asking an expert or a group of experts!); using a structured process for making decisions that incorporate the evidence and having effective processes for learning from actions. In each case, the specific problem with decision making is described with a range of practical solutions. Adopting this approach to decision-making requires societal change so detailed suggestions are made for transforming organisations, governments, businesses, funders and philanthropists. The practical suggestions include twelve downloadable checklists. The vision of the authors is to transform conservation so it is more effective, more cost-efficient, learns from practice and is more attractive to funders. However, the lessons of this important book go well beyond conservation to decision-makers in any field.
... These results are very relevant for studies on forecasting, and researchers in forecasting have highlighted similar effects in forecasting tasks. Human forecasting improves when forecasters can benefit from each other's estimations, arguments, evidence, and signals of confidence [18][19][20][21]. More recently, research in collective intelligence has expanded its focus of analysis from teams to networks of problem-solvers [22][23][24][25]. ...
Article
Full-text available
As artificial intelligence becomes ubiquitous in our lives, so do the opportunities to combine machine and human intelligence to obtain more accurate and more resilient prediction models across a wide range of domains. Hybrid intelligence can be designed in many ways, depending on the role of the human and the algorithm in the hybrid system. This paper offers a brief taxonomy of hybrid intelligence, which describes possible relationships between human and machine intelligence for robust forecasting. In this taxonomy, biological intelligence represents one axis of variation, going from individual intelligence (one individual in isolation) to collective intelligence (several connected individuals). The second axis of variation represents increasingly sophisticated algorithms that can take into account more aspects of the forecasting system, from information to task to human problem-solvers. The novelty of the paper lies in the interpretation of recent studies in hybrid intelligence as precursors of a set of algorithms that are expected to be more prominent in the future. These algorithms promise to increase hybrid system’s resilience across a wide range of human errors and biases thanks to greater human-machine understanding. This work ends with a short overview for future research in this field.
Preprint
Full-text available
Understanding the ability to self-evaluate decisions is an active area of research. This research has primarily focused on the neural correlates of self-evaluation during visual tasks, and whether pre- or post-decisional neural correlates capture subjective confidence in that decision. This focus has been useful, yet also precludes an investigation of key everyday features of metacognitive self-evaluation: that decisions are rapid, must be evaluated without explicit feedback, and unfold in a multisensory world. These considerations lead us to hypothesise that an automatic domain-general metacognitive signal may be shared between sensory modalities, which we tested in the present study with multivariate decoding of electroencephalographic (EEG) data. Participants (N=21, 12 female) first performed a visual task with no request for self-evaluations of performance, prior to an auditory task that included rating decision confidence on each trial. A multivariate classifier trained to predict errors in the speeded visual task generalised to predict errors in the subsequent non-speeded auditory discrimination. This generalisation was unique to classifiers trained on the visual response-locked data, and further predicted subjective confidence on the subsequent auditory task. This evidence of overlapping neural activity across the two tasks provides evidence for automatic encoding of confidence independent of any explicit request for metacognitive reports, and a shared basis for metacognitive evaluations across sensory modalities.
Preprint
Full-text available
There is theoretical and practical interest in characterising the factors that affect the use of advice when making decisions. Here we investigated how the timing of advice affects its utilisation. We conducted three experiments to compare the integration of advice shown before vs. after participants had the chance to evaluate evidence relevant to a decision for themselves. We used a perceptual discrimination task in a judge-advisor system, allowing careful control over both participants' task performance and the task structure across conditions, except for the timing of advice. Across all experiments, we found that advice provided after stimulus presentation was agreed with more, and influenced participants' judgements to a greater extent, than advice provided beforehand. In Experiment 1, we observed this tendency to hold when advice varied in accuracy and, in Experiment 2, across variations in task difficulty. Experiment 2 also revealed participants' preference for post-stimulus advice when they were given a choice over when to receive advice. In Experiment 3, we found the greater influence of post-stimulus advice to hold both for binary decisions and continuous estimations. These results provide interesting implications for research on the mechanisms of advice integration.
Article
Full-text available
Combining experts’ subjective probability estimates is a fundamental task with broad applicability in domains ranging from finance to public health. However, it is still an open question how to combine such estimates optimally. Since the beta distribution is a common choice for modeling uncertainty about probabilities, here we propose a family of normative Bayesian models for aggregating probability estimates based on beta distributions. We systematically derive and compare different variants, including hierarchical and non-hierarchical as well as asymmetric and symmetric beta fusion models. Using these models, we show how the beta calibration function naturally arises in this normative framework and how it is related to the widely used Linear-in-Log-Odds calibration function. For evaluation, we provide the new Knowledge Test Confidence data set consisting of subjective probability estimates of 85 forecasters on 180 queries. On this and another data set, we show that the hierarchical symmetric beta fusion model performs best of all beta fusion models and outperforms related Bayesian fusion models in terms of mean absolute error.
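The Linear-in-Log-Odds connection mentioned above can be illustrated with a minimal baseline sketch: averaging probability estimates directly (a linear opinion pool) versus averaging them in logit space. This is not the authors' hierarchical beta fusion model, and the three forecaster estimates are made up:

```python
import math

# Three hypothetical forecasters' probability estimates for one event.
probs = [0.6, 0.7, 0.8]

# Linear opinion pool: plain average of the probabilities.
linear_pool = sum(probs) / len(probs)

# Log-odds pool: average in logit space, then map back to a probability.
logits = [math.log(p / (1 - p)) for p in probs]
log_odds_pool = 1 / (1 + math.exp(-sum(logits) / len(logits)))
```

For these estimates the log-odds pool is slightly more extreme than the linear pool, which is the behaviour that calibration functions in this family exploit.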
Article
Research on clinical versus statistical prediction has demonstrated that algorithms make more accurate predictions than humans in many domains. Geopolitical forecasting is an algorithm-unfriendly domain, with hard-to-quantify data and elusive reference classes that make predictive model-building difficult. Furthermore, the stakes can be high, with missed forecasts leading to mass-casualty consequences. For these reasons, geopolitical forecasting is typically done by humans, even though algorithms play important roles. They are essential as aggregators of crowd wisdom, as frameworks to partition human forecasting variance, and as inputs to hybrid forecasting models. Algorithms are extremely important in this domain. We doubt that humans will relinquish control to algorithms anytime soon, nor do we think they should. However, the accuracy of forecasts will greatly improve if humans are aided by algorithms.
Article
Identifying successful approaches for reducing the belief and spread of online misinformation is of great importance. Social media companies currently rely largely on professional fact-checking as their primary mechanism for identifying falsehoods. However, professional fact-checking has notable limitations regarding coverage and speed. In this article, we summarize research suggesting that the "wisdom of crowds" can be harnessed successfully to help identify misinformation at scale. Despite potential concerns about the abilities of laypeople to assess information quality, recent evidence demonstrates that aggregating judgments of groups of laypeople, or crowds, can effectively identify low-quality news sources and inaccurate news posts: Crowd ratings are strongly correlated with fact-checker ratings across a variety of studies using different designs, stimulus sets, and subject pools. We connect these experimental findings with recent attempts to deploy crowdsourced fact-checking in the field, and we close with recommendations and future directions for translating crowdsourced ratings into effective interventions.
Article
The influence of behavioral biases on aggregate outcomes depends in part on self-selection: whether rational people opt more strongly into aggregate interactions than biased individuals. In betting-market, auction, and committee experiments, we document that some errors are strongly reduced through self-selection, while others are not affected at all or even amplified. A large part of this variation is explained by differences in the relationship between confidence and performance. In some tasks, they are positively correlated, such that self-selection attenuates errors. In other tasks, rational and biased people are equally confident, such that self-selection has no effects on aggregate quantities. (JEL C91, D44, D91)
Article
Full-text available
When people have to solve many tasks, they can aggregate diverse individuals’ judgments using the majority rule, which often improves the accuracy of judgments (wisdom of crowds). When aggregating judgments, individuals’ subjective confidence is a useful cue for deciding which judgments to accept. However, can confidence in one task set predict performance not only in the same task set, but also in another? We examined this issue through computer simulations using behavioral data obtained from binary-choice experimental tasks. In our simulations, we developed a “training-test” approach: we split the questions used in the behavioral experiments into “training questions” (used to identify individuals’ confidence levels) and “test questions” (the questions to be solved), similar to the cross-validation method in machine learning. We found that (i) in analyses of behavioral data, confidence in a certain question could predict accuracy in the same question, but not always in another question; (ii) in a computer simulation of the accordance of two individuals’ judgments, individuals with high confidence in one training question tended to make less diverse judgments in other test questions; and (iii) in a computer simulation of group judgments, groups constructed from individuals with high confidence in the training question(s) generally performed well, but their performance sometimes dropped sharply in the test questions, especially when only one training question was available. These results suggest that when situations are highly uncertain, an effective strategy is to aggregate diverse individuals regardless of their confidence levels in the training questions, so as to avoid degrading group accuracy on the test questions. We believe that our simulations, which follow a “training-test” approach, provide practical implications for retaining groups’ ability to solve many tasks.
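The majority-rule baseline this abstract starts from can be sketched with a simple Condorcet-style simulation. The competence value (0.6) and group size (11) are illustrative assumptions, not figures from the study:

```python
import random

random.seed(0)

def majority_correct(p, n_agents, n_trials=2000):
    """Fraction of trials in which a majority of independent agents,
    each correct with probability p, picks the right binary answer."""
    wins = 0
    for _ in range(n_trials):
        correct_votes = sum(random.random() < p for _ in range(n_agents))
        wins += correct_votes > n_agents / 2
    return wins / n_trials

single = majority_correct(0.6, 1)   # one modestly competent agent
group = majority_correct(0.6, 11)   # majority vote of 11 such agents
```

With independent agents who are each right 60% of the time, the 11-member majority is correct noticeably more often than any single member, which is the effect the paper's confidence-based selection schemes try to preserve across task sets.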
Article
Full-text available
The relative susceptibility of individuals and groups to systematic judgmental biases is considered. An overview of the relevant empirical literature reveals no clear or general pattern. However, a theoretical analysis employing J. H. Davis’s (1973) social decision scheme (SDS) model reveals that the relative magnitude of individual and group bias depends upon several factors, including group size, initial individual judgment, the magnitude of bias among individuals, the type of bias, and most of all, the group-judgment process. It is concluded that there can be no simple answer to the question, “Which are more biased, individuals or groups?,” but the SDS model offers a framework for specifying some of the conditions under which individuals are both more and less biased than groups.
Article
Full-text available
The aggregation of many independent estimates can outperform the most accurate individual judgement. This centenarian finding, popularly known as the 'wisdom of crowds', has been applied to problems ranging from the diagnosis of cancer to financial forecasting. It is widely believed that social influence undermines collective wisdom by reducing the diversity of opinions within the crowd. Here, we show that if a large crowd is structured in small independent groups, deliberation and social influence within groups improve the crowd’s collective accuracy. We asked a live crowd (N = 5,180) to respond to general-knowledge questions (for example, "What is the height of the Eiffel Tower?"). Participants first answered individually, then deliberated and made consensus decisions in groups of five, and finally provided revised individual estimates. We found that averaging consensus decisions was substantially more accurate than aggregating the initial independent opinions. Remarkably, combining as few as four consensus choices outperformed the wisdom of thousands of individuals.
Article
Full-text available
Significance: Collective intelligence is considered to be one of the most promising approaches to improve decision making. However, up to now, little is known about the conditions underlying the emergence of collective intelligence in real-world contexts. Focusing on two key areas of medical diagnostics (breast and skin cancer detection), we here show that similarity in doctors’ accuracy is a key factor underlying the emergence of collective intelligence in these contexts. This result paves the way for innovative and more effective approaches to decision making in medical diagnostics and beyond, and to the scientific analyses of those approaches.
Article
Full-text available
This paper examines how consumers forecast their future spare money or “financial slack.” While consumers generally think that both their income and expenses will rise in the future, they underweight the extent to which their expected expenses will cut into their spare money, a phenomenon we term “expense neglect.” We test and rule out several possible explanations, and conclude that expense neglect is due in part to insufficient attention towards expectations about future expenses compared to future income. “Tightwad” consumers who are chronically attuned to expenses show less severe expense neglect than “spendthrifts” who are not. We further find that expectations regarding changes in income (and not changes in expenses) predict the Michigan Index of Consumer Sentiments—a leading macro-economic indicator. Finally, we conduct a meta-analysis of our entire file-drawer (27 studies, 8,418 participants) and find that, across studies, participants place 2.9 times the weight on income change as they do on expense change when forecasting changes in their financial slack, and that expense neglect is stronger for distant than near future forecasts.
Article
Full-text available
Errors in estimating and forecasting often result from the failure to collect and consider enough relevant information. We examine whether attributes associated with persistence in information acquisition can predict performance in an estimation task. We focus on actively open-minded thinking (AOT), need for cognition, grit, and the tendency to maximize or satisfice when making decisions. In three studies, participants made estimates and predictions of uncertain quantities, with varying levels of control over the amount of information they could collect before estimating. Only AOT predicted performance. This relationship was mediated by information acquisition: AOT predicted the tendency to collect information, and information acquisition predicted performance. To the extent that available information is predictive of future outcomes, actively open-minded thinkers are more likely than others to make accurate forecasts.
Article
Full-text available
Social psychologists have long recognized the power of statisticized groups. When individual judgments about some fact (e.g., the unemployment rate for next quarter) are averaged together, the average opinion is typically more accurate than most of the individual estimates, a pattern often referred to as the wisdom of crowds. The accuracy of averaging also often exceeds that of the individual perceived as most knowledgeable in the group. However, neither averaging nor relying on a single judge is a robust strategy; each performs well in some settings and poorly in others. As an alternative, we introduce the select-crowd strategy, which ranks judges based on a cue to ability (e.g., the accuracy of several recent judgments) and averages the opinions of the top judges, such as the top 5. Through both simulation and an analysis of 90 archival data sets, we show that select crowds of 5 knowledgeable judges yield very accurate judgments across a wide range of possible settings: the strategy is both accurate and robust. Following this, we examine how people prefer to use information from a crowd. Previous research suggests that people are distrustful of crowds and of mechanical processes such as averaging. We show in 3 experiments that, as expected, people are drawn to experts and dislike crowd averages but, critically, they view the select-crowd strategy favorably and are willing to use it. The select-crowd strategy is thus accurate, robust, and appealing as a mechanism for helping individuals tap collective wisdom.
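The select-crowd strategy described above can be sketched as follows. The judge skill levels, the five-judgment history, and the use of past absolute error as the ability cue are all illustrative assumptions, not the paper's archival data:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 100.0
n_judges, n_history = 50, 5

# Hypothetical per-judge noise levels: a lower SD means a more skilled judge.
skill_sd = rng.uniform(5, 40, n_judges)
past = truth + rng.normal(0.0, 1.0, (n_judges, n_history)) * skill_sd[:, None]
current = truth + rng.normal(0.0, 1.0, n_judges) * skill_sd

# Cue to ability: mean absolute error on the recent judgments.
cue = np.abs(past - truth).mean(axis=1)
top5 = np.argsort(cue)[:5]

select_crowd = current[top5].mean()  # average of the 5 best-ranked judges
whole_crowd = current.mean()         # plain wisdom-of-crowds average
```

The key design choice is that the cue ranks judges on observable past performance rather than on self-reported expertise, so the strategy degrades gracefully when the cue is noisy.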
Article
Full-text available
Numerous studies and anecdotes demonstrate the "wisdom of the crowd," the surprising accuracy of a group's aggregated judgments. Less is known, however, about the generality of crowd wisdom. For example, are crowds wise even if their members have systematic judgmental biases, or can influence each other before members render their judgments? If so, are there situations in which we can expect a crowd to be less accurate than skilled individuals? We provide a precise but general definition of crowd wisdom: A crowd is wise if a linear aggregate, for example a mean, of its members' judgments is closer to the target value than a randomly, but not necessarily uniformly, sampled member of the crowd. Building on this definition, we develop a theoretical framework for examining, a priori, when and to what degree a crowd will be wise. We systematically investigate the boundary conditions for crowd wisdom within this framework and determine conditions under which the accuracy advantage for crowds is maximized. Our results demonstrate that crowd wisdom is highly robust: Even if judgments are biased and correlated, one would need to nearly deterministically select only a highly skilled judge before an individual's judgment could be expected to be more accurate than a simple averaging of the crowd. Our results also provide an accuracy rationale behind the need for diversity of judgments among group members. Contrary to folk explanations of crowd wisdom which hold that judgments should ideally be independent so that errors cancel out, we find that crowd wisdom is maximized when judgments systematically differ as much as possible. We re-analyze data from two published studies that confirm our theoretical results.
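The paper's definition of a wise crowd, that a linear aggregate beats a randomly sampled member, can be checked numerically even for a biased crowd. The bias (+5) and noise level (SD 15) below are made-up parameters, not values from the re-analyzed studies:

```python
import numpy as np

rng = np.random.default_rng(1)
truth = 50.0
# 1,000 judges sharing a common +5 bias plus SD-15 idiosyncratic noise.
judgments = truth + 5.0 + rng.normal(0.0, 15.0, 1000)

crowd_error = abs(judgments.mean() - truth)          # error of the average
individual_error = np.abs(judgments - truth).mean()  # mean error of a random member

# The crowd is "wise" under the paper's definition if its aggregate
# beats a randomly sampled member in expectation.
crowd_is_wise = crowd_error < individual_error
```

Averaging cancels the idiosyncratic noise but not the shared bias, so the crowd's error approaches the bias while a random member also carries the full noise; this is why the definition tolerates biased, correlated judgments.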
Article
Full-text available
Five university-based research groups competed to recruit forecasters, elicit their predictions, and aggregate those predictions to assign the most accurate probabilities to events in a 2-year geopolitical forecasting tournament. Our group tested and found support for three psychological drivers of accuracy: training, teaming, and tracking. Probability training corrected cognitive biases, encouraged forecasters to use reference classes, and provided forecasters with heuristics, such as averaging when multiple estimates were available. Teaming allowed forecasters to share information and discuss the rationales behind their beliefs. Tracking placed the highest performers (top 2% from Year 1) in elite teams that worked together. Results showed that probability training, team collaboration, and tracking improved both calibration and resolution. Forecasting is often viewed as a statistical problem, but forecasts can be improved with behavioral interventions. Training, teaming, and tracking are psychological interventions that dramatically increased the accuracy of forecasts. Statistical algorithms (reported elsewhere) improved the accuracy of the aggregation. Putting both statistics and psychology to work produced the best forecasts 2 years in a row.
Article
Full-text available
The present experiments examined several strategies designed to reduce interval overconfidence in group judgments. Results consistently indicated that 3–4-person nominal groups (whose members made independent judgments and later combined the highest and lowest of these estimates into a single confidence interval) were better calibrated than individual judges and interactive groups. This pattern held even when participants were directly instructed to expand their interval estimates, or when interactive groups appointed a devil's advocate or explicitly considered reasons why their interval estimates might be too narrow. Interactive groups did not perform substantially better than individuals, although participants frequently had the impression that group judgments were far superior to individual judgments. This misperception resembles the “illusion of group effectivity” found in brainstorming research.
Article
Full-text available
Examined the quality of group judgment in situations in which groups have to express an opinion in quantitative form. To provide a measure for evaluating the quality of group performance (which is defined as the absolute value of the discrepancy between the judgment and the true value), 4 baseline models are considered. These models provide a standard for evaluating how well groups perform. The 4 models are: (a) randomly picking a single individual; (b) weighting the judgments of the individual group members equally (the group mean); (c) weighting the 'best' group member (i.e., the one closest to the true value) totally where the best is known, a priori, with certainty; (d) weighting the best member totally where there is a given probability of misidentifying the best and getting the 2nd, 3rd, etc., best member. These 4 models are examined under varying conditions of group size and "bias." Bias is defined as the degree to which the expectation of the population of individual judgments does not equal the true value (i.e., there is systematic bias in individual judgments). A method is then developed to evaluate the accuracy of group judgment in terms of the 4 models. The method uses a Bayesian approach by estimating the probability that the accuracy of actual group judgment could have come from distributions generated by the 4 models.
Article
Full-text available
Investigated the influence of counterfactual reasoning on accuracy when predicting the outcomes of future personal events. 260 graduate business students made predictions about the results of their job search efforts 9 months away (e.g., starting salary); all of the events involved positive outcomes, in which unrealistic optimism was expected. These events were constructed to vary in their underlying base rate of occurrence. Some Ss generated pro and/or con reasons concerning event occurrence before making their predictions. At low to moderate base rates, predictive accuracy increased when Ss generated a con reason. However, at high base rates (events that occurred for a majority of the Ss), con reason generation had no effect on accuracy; all Ss were more accurate in predicting these events. Generation of pro reasons had no effect on accuracy, suggesting that Ss may have automatically generated supportive reasons as a by-product of the question-answering process. A substantive analysis of the reasons indicated that Ss attributed pro reasons to internal factors and con reasons to external factors. Moreover, Ss who generated internal pro reasons were less accurate than Ss generating external pro or either type of con reason.
Article
Full-text available
When financial columnist James Surowiecki wrote The Wisdom of Crowds, he wished to explain the successes and failures of markets (an example of a "crowd") and to understand why the average opinion of a crowd is frequently more accurate than the opinions of most of its individual members. In this expanded review of the book, Scott Armstrong asks a question of immediate relevance to forecasters: Are the traditional face-to-face meetings an effective way to elicit forecasts from forecast crowds (i.e. teams)? Armstrong doesn't believe so. Quite the contrary, he explains why he considers face-to-face meetings a detriment to good forecasting practice, and he proposes several alternatives that have been tried successfully.
Article
Full-text available
Is it possible to increase one's influence simply by behaving more confidently? Prior research presents two competing hypotheses: (1) the confidence heuristic holds that more confidence increases credibility, and (2) the calibration hypothesis asserts that overconfidence will backfire when others find out. Study 1 reveals that, consistent with the calibration hypothesis, while accurate advisors benefit from displaying confidence, confident but inaccurate advisors received low credibility ratings. However, Study 2 shows that when feedback on advisor accuracy is unavailable or costly, confident advisors hold sway regardless of accuracy. People also made less effort to determine the accuracy of confident advisors; interest in buying advisor performance data decreased as the advisor’s confidence went up. These results add to our understanding of how advisor confidence, accuracy, and calibration influence others.
Article
Full-text available
This research tests the hypothesis of Yates et al. (1996) that people prefer judgment producers who make extreme confidence judgments. In each of three experiments, college students evaluated two fictional financial advisors who judged the likelihood that each of several stocks would increase in value. One of the advisors (the moderate advisor) was reasonably well calibrated and the other (the extreme advisor) was overconfident. In all three experiments, participants tended to prefer the extreme advisor. Experiments 2 and 3 showed that the advisors' confidence influenced participants' perception of their knowledge, and Experiment 3 showed that it influenced their perception of the number of categorically correct judgments they made. Both of these variables were, in turn, related to participants' preferences. Experiment 3 also suggested that need for cognition and right-wing authoritarianism are positively related to preference for the extreme advisor. A quantitative model is presented, which captures the basic pattern of results. This model includes the assumption that people use a confidence heuristic; they assume that a more confident advisor makes more categorically correct judgments and is more knowledgeable.
Article
Full-text available
Prior research has suggested that most people are seriously overconfident in their answers to general knowledge questions. We attempted to reduce overconfidence in each of two separate experiments. In Experiment 1, half of the subjects answered five practice questions which appeared to be difficult. The remaining subjects answered practice problems which appeared to be easy but were actually just as difficult as the other group's practice questions. Within each of these two groups, half of the subjects received feedback on the accuracy of their answers to the practice questions, while the other half received no feedback. All four groups then answered 30 additional questions and indicated their confidence in these answers. The group which had received five apparently "easy" practice questions and then had been given feedback on the accuracy of their answers was underconfident on the final 30 questions. In Experiment 2, subjects who anticipated a group discussion of their answers to general knowledge questions took longer to answer the questions and expressed less overconfidence in their answers than did a control group.
Article
Full-text available
Studied 4-member decision-making groups given information about 3 hypothetical candidates for student body president in unshared/consensus or shared or unshared/conflict conditions. 84 undergraduates participated in the unshared consensus condition, and 72 undergraduates participated in the other conditions. Results show that even though groups could have produced unbiased composites of the candidates through discussion, they decided in favor of the candidate initially preferred by a plurality rather than the most favorable candidate. Group members' pre- and postdiscussion recall of candidate attributes indicated that discussion tended to perpetuate, not to correct, members' distorted pictures of the candidates. It is suggested that unstructured discussion in the face of a consensus requirement may fail as a means of combining unique informational resources.
Article
Full-text available
Meeting of Minds: The performance of humans across a range of different kinds of cognitive tasks has been encapsulated as a common statistical factor called g, or general intelligence factor. What intelligence actually is remains unclear and hotly debated, yet there is a reproducible association of g with performance outcomes, such as income and academic achievement. Woolley et al. (p. 686, published online 30 September) report a psychometric methodology for quantifying a factor termed "collective intelligence" (c), which reflects how well groups perform on a similarly diverse set of group problem-solving tasks. The primary contributors to c appear to be the g factors of the group members, along with a propensity toward social sensitivity: in essence, how well individuals work with others.
Article
Full-text available
Herding is a form of convergent social behaviour that can be broadly defined as the alignment of the thoughts or behaviours of individuals in a group (herd) through local interaction and without centralized coordination. We suggest that herding has a broad application, from intellectual fashion to mob violence; and that understanding herding is particularly pertinent in an increasingly interconnected world. An integrated approach to herding is proposed, describing two key issues: mechanisms of transmission of thoughts or behaviour between agents, and patterns of connections between agents. We show how bringing together the diverse, often disconnected, theoretical and methodological approaches illuminates the applicability of herding to many domains of cognition and suggest that cognitive neuroscience offers a novel approach to its study.
Article
Full-text available
Consumer knowledge is seldom complete or errorless. Therefore, the self-assessed validity of knowledge and consequent knowledge calibration (i.e., the correspondence between self-assessed and actual validity) is an important issue for the study of consumer decision making. In this article we describe methods and models used in calibration research. We then review a wide variety of empirical results indicating that high levels of calibration are achieved rarely, moderate levels that include some degree of systematic bias are the norm, and confidence and accuracy are sometimes completely uncorrelated. Finally, we examine the explanations of miscalibration and offer suggestions for future research.
Article
Full-text available
When students answer an in-class conceptual question individually using clickers, discuss it with their neighbors, and then revote on the same question, the percentage of correct answers typically increases. This outcome could result from gains in understanding during discussion, or simply from peer influence of knowledgeable students on their neighbors. To distinguish between these alternatives in an undergraduate genetics course, we followed the above exercise with a second, similar (isomorphic) question on the same concept that students answered individually. Our results indicate that peer discussion enhances understanding, even when none of the students in a discussion group originally knows the correct answer.
Article
Full-text available
This article reviews the now extensive research literature addressing the impact of accountability on a wide range of social judgments and choices. It focuses on 4 issues: (a) What impact do various accountability ground rules have on thoughts, feelings, and action? (b) Under what conditions will accountability attenuate, have no effect on, or amplify cognitive biases? (c) Does accountability alter how people think or merely what people say they think? and (d) What goals do accountable decision makers seek to achieve? In addition, this review explores the broader implications of accountability research. It highlights the utility of treating thought as a process of internalized dialogue; the importance of documenting social and institutional boundary conditions on putative cognitive biases; and the potential to craft empirical answers to such applied problems as how to structure accountability relationships in organizations.
Article
Full-text available
The authors present a reconciliation of 3 distinct ways in which the research literature has defined overconfidence: (a) overestimation of one's actual performance, (b) overplacement of one's performance relative to others, and (c) excessive precision in one's beliefs. Experimental evidence shows that reversals of the first 2 (apparent underconfidence), when they occur, tend to be on different types of tasks. On difficult tasks, people overestimate their actual performances but also mistakenly believe that they are worse than others; on easy tasks, people underestimate their actual performances but mistakenly believe they are better than others. The authors offer a straightforward theory that can explain these inconsistencies. Overprecision appears to be more persistent than either of the other 2 types of overconfidence, but its presence reduces the magnitude of both overestimation and overplacement.
Chapter
People like to think well of themselves. Most endorse high levels of self-esteem (Greenwald 1980; Baumeister, Tice, and Hutton 1989). They believe they will experience more good outcomes and fewer bad outcomes than similar others (Weinstein 1980). They see themselves as more ethical, more productive, more charitable, and simply better on just about any socially desirable outcome (Alicke 1985). They often attribute their successes to strong skills rather than good luck and their failures to bad luck rather than weak skills (Cohen 1964; Weiner et al. 1971). Finally, they believe they have more influence over chance than reality dictates (Langer 1975; Langer and Roth 1975).
Article
The wisdom of the crowd refers to the finding that judgments aggregated over individuals are typically more accurate than the average individual’s judgment. Here, we examine the potential for improving crowd judgments by allowing individuals to choose which of a set of queries to respond to. If individuals’ metacognitive assessments of what they know is accurate, allowing individuals to opt in to questions of interest or expertise has the potential to create a more informed knowledge base over which to aggregate. This prediction was confirmed: crowds composed of volunteered judgments were more accurate than crowds composed of forced judgments. Overall, allowing individuals to use private metacognitive knowledge holds much promise in enhancing judgments, including those of the crowd.
Article
We evaluate the effect of discussion on the accuracy of collaborative judgments. In contrast to prior research, we show that discussion can either aid or impede accuracy relative to the averaging of collaborators’ independent judgments, as a systematic function of task type and interaction process. For estimation tasks with a wide range of potential estimates, discussion aided accuracy by helping participants prevent and eliminate egregious errors. For estimation tasks with a naturally bounded range, discussion following independent estimates performed on par with averaging. Importantly, if participants did not first make independent estimates, discussion greatly harmed accuracy by limiting the range of considered estimates, independent of task type. Our research shows that discussion can be a powerful tool for error reduction, but only when appropriately structured: Decision makers should form independent judgments to consider a wide range of possible answers, and then use discussion to eliminate extremely large errors. Data and the online appendix are available at https://doi.org/10.1287/mnsc.2017.2823 . This paper was accepted by Yuval Rottenstreich, judgment and decision making.
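The error-elimination mechanism described above can be sketched numerically (the task and all estimates below are hypothetical): compare a plain average of independent estimates with an average taken after discarding extremes, mimicking what a well-structured discussion does when it catches an egregious error.

```python
# Hypothetical wide-range estimation task: one group member's estimate is
# an egregious error, which dominates the plain average but not a trimmed one.

def mean(xs):
    return sum(xs) / len(xs)

def trimmed_mean(xs, k=1):
    """Drop the k lowest and k highest estimates before averaging."""
    return mean(sorted(xs)[k:-k])

truth = 3944                                   # hypothetical true answer
estimates = [3500, 4200, 3800, 4000, 25000]    # one egregious error

print(abs(mean(estimates) - truth))            # plain averaging: error 4156
print(abs(trimmed_mean(estimates) - truth))    # after trimming: error 56
```

With a naturally bounded range, no estimate can be wildly off, which is consistent with the finding that discussion there merely matches averaging.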
Article
Once considered provocative, the notion that the wisdom of the crowd is superior to any individual has become itself a piece of crowd wisdom, leading to speculation that online voting may soon put credentialed experts out of business. Recent applications include political and economic forecasting, evaluating nuclear safety, public policy, the quality of chemical probes, and possible responses to a restless volcano. Algorithms for extracting wisdom from the crowd are typically based on a democratic voting procedure. They are simple to apply and preserve the independence of personal judgment. However, democratic methods have serious limitations. They are biased for shallow, lowest common denominator information, at the expense of novel or specialized knowledge that is not widely shared. Adjustments based on measuring confidence do not solve this problem reliably. Here we propose the following alternative to a democratic vote: select the answer that is more popular than people predict. We show that this principle yields the best answer under reasonable assumptions about voter behaviour, while the standard ‘most popular’ or ‘most confident’ principles fail under exactly those same assumptions. Like traditional voting, the principle accepts unique problems, such as panel decisions about scientific or artistic merit, and legal or historical disputes. The potential application domain is thus broader than that covered by machine learning and psychometric methods, which require data across multiple questions.
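The selection principle described above can be sketched in a few lines (the data are a hypothetical version of the classic Philadelphia example, not from the paper): each respondent gives their own answer plus a prediction of how popular each answer will be, and the algorithm picks the answer whose actual vote share most exceeds its mean predicted share.

```python
# Sketch of the "surprisingly popular" principle on hypothetical data.
from collections import Counter

def surprisingly_popular(answers, predicted_shares):
    """Pick the answer whose actual vote share most exceeds its
    mean predicted vote share across respondents."""
    n = len(answers)
    actual = {a: c / n for a, c in Counter(answers).items()}
    options = set().union(*predicted_shares)
    mean_pred = {o: sum(p.get(o, 0.0) for p in predicted_shares) / len(predicted_shares)
                 for o in options}
    return max(actual, key=lambda a: actual[a] - mean_pred.get(a, 0.0))

# "Is Philadelphia the capital of Pennsylvania?" Most say yes, but the
# minority who know it's Harrisburg also predict that most others will say
# yes -- so "no" is more popular than predicted, and wins.
answers = ["yes"] * 6 + ["no"] * 4
predicted = [{"yes": 0.8, "no": 0.2}] * 6 + [{"yes": 0.9, "no": 0.1}] * 4
print(surprisingly_popular(answers, predicted))  # -> no
```

A majority vote would return "yes" here; the popularity gap (0.40 actual vs. 0.16 predicted for "no") recovers the specialized knowledge that voting discards.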
Article
A ubiquitous finding in research on human judgment is that people are overconfident about their true predictive abilities. The goal of this study was to understand why overconfidence arises and how it can be reduced to improve the accuracy of predictions about future personal events. Subjects made predictions about the results of their job search efforts 9 months away (e.g., starting salary); all of the events involved positive outcomes, where unrealistic optimism was expected. These events were constructed to vary in their underlying base rate of occurrence. Some subjects generated pro and/or con reasons concerning event occurrence before making their predictions. At low to moderate base rates, predictive accuracy increased when subjects generated a con reason. However, at high base rates (events that occurred for a majority of the subjects), con reason generation had no effect on accuracy: all subjects were more accurate in predicting these events. Generation of pro reasons had no effect on accuracy, suggesting that subjects may have automatically generated supportive reasons as a by-product of the question-answering process. A substantive analysis of the reasons indicated that subjects attributed pro reasons to internal factors and con reasons to external factors. Moreover, subjects who generated internal pro reasons were less accurate than subjects generating external pro or either type of con reason.
Article
Although researchers have documented many instances of crowd wisdom, it is important to know whether some kinds of judgments may lead the crowd astray, whether crowds’ judgments improve with feedback over time, and whether crowds’ judgments can be improved by changing the way judgments are elicited. We investigated these questions in a sports gambling context (predictions against point spreads) believed to elicit crowd wisdom. In a season-long experiment, fans wagered over $20,000 on NFL football predictions. Contrary to the wisdom-of-crowds hypothesis, faulty intuitions led the crowd to predict “favorites” more than “underdogs” against point spreads that disadvantaged favorites, even when bettors knew that the spreads disadvantaged favorites. Moreover, the bias increased over time, a result consistent with attributions for success and failure that rewarded intuitive choosing. However, when the crowd predicted game outcomes by estimating point differentials rather than by predicting against point spreads, its predictions were unbiased and wiser.
Article
Decision makers have a strong tendency to consider problems as unique. They isolate the current choice from future opportunities and neglect the statistics of the past in evaluating current plans. Overly cautious attitudes to risk result from a failure to appreciate the effects of statistical aggregation in mitigating relative risk. Overly optimistic forecasts result from the adoption of an inside view of the problem, which anchors predictions on plans and scenarios. The conflicting biases are documented in psychological research. Possible implications for decision making in organizations are examined.
Article
The subjective confidence of individuals in groups can be a valid predictor of accuracy in decision-making tasks.
Article
We present three studies of interactive decision making, where decision makers interact with others before making a final decision alone. Because the theories of lay observers and social psychologists emphasize the role of information collection in interaction, we developed a series of tests of information collection. Two studies with sports predictions show that interaction does not increase decision accuracy or meta-knowledge (calibration or resolution). The simplest test of information collection is responsiveness - that people should respond to information against their position by modifying their choices or at least lowering their confidence. Studies using traditional scenarios from the group polarization literature show little responsiveness, and even "deviants," who interact with others who unanimously disagree with their choice, frequently fail to respond to the information they collect. The most consistent finding is that interaction increases people's confidence in their decisions in both sports predictions and risky shift dilemmas. For predictions, confidence increases are not justified by increased accuracy. These results question theories of interaction which assume that people collect information during interaction (e.g., Persuasive Arguments Theory). They also question the labeling of previous results as "shifts" or "polarization." We suggest that interaction is better understood as rationale construction than as information collection - interaction forces people to explain their choices to others, and a variety of previous research in social psychology has shown that explanation generation leads to increased confidence. In Study 3, we provide a preliminary test of rationale construction by showing that people increase in confidence when they construct a case for their position individually, without interaction.
Article
Considerable literature has accumulated over the years regarding the combination of forecasts. The primary conclusion of this line of research is that forecast accuracy can be substantially improved through the combination of multiple individual forecasts. Furthermore, simple combination methods often work reasonably well relative to more complex combinations. This paper provides a review and annotated bibliography of that literature, including contributions from the forecasting, psychology, statistics, and management science literatures. The objectives are to provide a guide to the literature for students and researchers and to help researchers locate contributions in specific areas, both theoretical and applied. Suggestions for future research directions include (1) examination of simple combining approaches to determine reasons for their robustness, (2) development of alternative uses of multiple forecasts in order to make better use of the information they contain, (3) use of combined forecasts as benchmarks for forecast evaluation, and (4) study of subjective combination procedures. Finally, combining forecasts should become part of the mainstream of forecasting practice. In order to achieve this, practitioners should be encouraged to combine forecasts, and software to produce combined forecasts easily should be made available.
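The paper's primary conclusion, that simple combination methods work well, can be shown with toy numbers (the forecasts below are hypothetical): an unweighted average of several forecasts is more accurate than the average individual forecast because independent errors partly cancel.

```python
# Hypothetical forecasts of a quantity whose realized value is 100.
actual = 100.0
forecasts = [88.0, 112.0, 95.0, 109.0]

# Average error of the individual forecasters.
individual_errors = [abs(f - actual) for f in forecasts]
avg_individual_error = sum(individual_errors) / len(individual_errors)

# Error of the simple (unweighted) combined forecast.
combined = sum(forecasts) / len(forecasts)
combined_error = abs(combined - actual)

print(avg_individual_error)  # 9.5
print(combined_error)        # 1.0
```

The individual errors (-12, +12, -5, +9) are large but mixed in sign, so the combination nets them out; this is the statistical intuition behind the robustness of simple averaging that the review highlights.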
Article
This study investigates the relation between an individual's self-reported confidence and his or her influence within a freely interacting group. Each participant chose responses and provided confidence assessments for choice items of a variety of task types, first as an individual and a second time as a member of a pentad, a member of a dyad, or an individual. The influence of a particular faction within a group was greater if its members were more confident. A participant's response accuracy was related to both greater confidence and greater influence to the extent that the task fell on the intellective end of the intellective-judgmental continuum of task types. As a result, the extent to which group members' confidence predicted their influence was also greatest on intellective rather than judgmental tasks. Results further illustrate that adding group members to work on a problem may increase overconfidence on judgmental tasks but decrease overconfidence on intellective tasks.
Article
We introduce a general framework for modeling functionally diverse problem-solving agents. In this framework, problem-solving agents possess representations of problems and algorithms that they use to locate solutions. We use this framework to establish a result relevant to group composition. We find that when selecting a problem-solving team from a diverse population of intelligent agents, a team of randomly selected agents outperforms a team composed of the best-performing agents. This result relies on the intuition that, as the initial pool of problem solvers becomes large, the best-performing agents necessarily become similar in the space of problem solvers. Their relatively greater ability is more than offset by their lack of problem-solving diversity.
Article
Researchers often conduct mediation analysis in order to indirectly assess the effect of a proposed cause on some outcome through a proposed mediator. The utility of mediation analysis stems from its ability to go beyond the merely descriptive to a more functional understanding of the relationships among variables. A necessary component of mediation is a statistically and practically significant indirect effect. Although mediation hypotheses are frequently explored in psychological research, formal significance tests of indirect effects are rarely conducted. After a brief overview of mediation, we argue the importance of directly testing the significance of indirect effects and provide SPSS and SAS macros that facilitate estimation of the indirect effect with a normal theory approach and a bootstrap approach to obtaining confidence intervals, as well as the traditional approach advocated by Baron and Kenny (1986). We hope that this discussion and the macros will enhance the frequency of formal mediation tests in the psychology literature. Electronic copies of these macros may be downloaded from the Psychonomic Society's Web archive at www.psychonomic.org/archive/.
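The bootstrap approach the article advocates can be sketched in plain Python on synthetic data (the model X → M → Y and all coefficients below are illustrative, not the article's macros): estimate the indirect effect as the product of the X → M slope (a) and the M → Y slope controlling for X (b), then bootstrap a percentile confidence interval for a·b.

```python
# Percentile-bootstrap test of an indirect effect a*b on synthetic data.
import random

random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + random.gauss(0, 1) for xi in x]                         # true a = 0.5
y = [0.4 * mi + 0.2 * xi + random.gauss(0, 1) for xi, mi in zip(x, m)]  # true b = 0.4

def slope(xs, ys):
    """OLS slope of ys regressed on xs (single predictor)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)

def indirect(x, m, y):
    """Indirect effect a*b, with b the partial slope of y on m
    controlling for x, computed by residualizing both on x."""
    a = slope(x, m)
    rm = [mi - a * xi for xi, mi in zip(x, m)]             # m residualized on x
    ry = [yi - slope(x, y) * xi for xi, yi in zip(x, y)]   # y residualized on x
    return a * slope(rm, ry)

# Resample cases with replacement, recompute a*b, and take the 2.5th and
# 97.5th percentiles as an approximate 95% confidence interval.
boots = []
for _ in range(1000):
    s = [random.randrange(n) for _ in range(n)]
    boots.append(indirect([x[i] for i in s], [m[i] for i in s], [y[i] for i in s]))
boots.sort()
print(boots[24], boots[974])  # if the interval excludes 0, mediation is significant
```

This is the logic behind the article's SPSS/SAS macros; the bootstrap avoids the normality assumption that the product term a·b violates in small samples.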
Article
This paper introduces a three-item "Cognitive Reflection Test" (CRT) as a simple measure of one type of cognitive ability--the ability or disposition to reflect on a question and resist reporting the first response that comes to mind. The author will show that CRT scores are predictive of the types of choices that feature prominently in tests of decision-making theories, like expected utility theory and prospect theory. Indeed, the relation is sometimes so strong that the preferences themselves effectively function as expressions of cognitive ability--an empirical fact begging for a theoretical explanation. The author examines the relation between CRT scores and two important decision-making characteristics: time preference and risk preference. The CRT scores are then compared with other measures of cognitive ability or cognitive "style." The CRT scores exhibit considerable difference between men and women and the article explores how this relates to sex differences in time and risk preferences. The final section addresses the interpretation of correlations between cognitive abilities and decision-making characteristics.
Article
This paper explores the consequences of cognitive dissonance, coupled with time-inconsistent preferences, in an intertemporal decision problem with two distinct goals: acting decisively on early information (vision) and adjusting flexibly to late information (flexibility). The decision maker considered here is capable of manipulating information to serve her self-interests, but a tradeoff between distorted beliefs and distorted actions constrains the extent of information manipulation. Building on this tradeoff, the present model provides a unified framework to account for the conformity bias (excessive reliance on precedents) and the confirmatory bias (excessive attachment to initial perceptions).
Article
Interacting groups fail to make judgments as accurate as those of their most capable members due to problems associated with both interaction processes and cognitive processing. Group process techniques and decision analytic tools have been used with groups to combat these problems. While such techniques and tools do improve the quality of group judgment, they have not enabled groups to make judgments more accurate than those of their most capable members on tasks that evoke a great deal of systematic bias. A new intervention procedure that integrates group facilitation, social judgment analysis, and information technology was developed to overcome more fully the problems typically associated with interaction processes and cognitive processing. The intervention was evaluated by testing the hypothesis that groups using this new procedure can establish judgment policies for cognitive conflict tasks that are more accurate than the ones produced by any of their members. An experiment involving 16 four- and five-member groups was conducted to compare the accuracy of group judgments with the accuracy of the judgments of the most capable group member. A total of 96 participants (48 males and 48 females) completed the individual part of the task; 71 of these participants worked in groups. Results indicated that the process intervention enabled small, interacting groups to perform significantly better than their most capable members on two cognitive conflict tasks (p < .05). The findings suggest that Group Decision Support Systems that integrate facilitation, social judgment analysis, and information technology should be used to improve the accuracy of group judgment.
Self-serving beliefs and the pleasure of outcomes. The Psychology of Economic Decisions
  • B Mellers
  • A P McGraw
Mellers, B., & McGraw, A. P. (2004). Self-serving beliefs and the pleasure of outcomes. The Psychology of Economic Decisions, 2, 31.
Cheap talk and credibility: The consequences of confidence and accuracy on advisor credibility and persuasiveness
  • Sah
Intuitive biases in choice versus estimation: Implications for the wisdom of crowds
  • Simmons