CULTURAL CONSENSUS THEORY FOR LOCATION JUDGMENTS 1
Cultural Consensus Theory for Two-Dimensional Data: Expertise-Weighted
Aggregation of Location Judgments
Maren Mayer¹,² & Daniel W. Heck³
¹University of Mannheim
²Heidelberg Academy of Sciences and Humanities
³University of Marburg
Author Note
Maren Mayer, Department of Psychology, School of Social Sciences, University of
Mannheim, Germany. https://orcid.org/0000-0002-6830-7768
Daniel W. Heck, Department of Psychology, University of Marburg, Germany.
https://orcid.org/0000-0002-6302-9252
Data and R scripts for the analyses are available at the Open Science Framework
(https://osf.io/jbzk7/).
The present work was presented at the SJDM Annual Meeting 2020 (Virtual
Conference) and at the 15th Conference of the Section ‘Methods and Evaluation’ (2021)
of the German Psychological Society (DGPs). The present manuscript has not yet been
peer reviewed. A preprint was uploaded to PsyArXiv and ResearchGate for timely
dissemination (version: May 7, 2022).
This work was funded by the WIN programme of the Heidelberg Academy of
Sciences and Humanities, financed by the Ministry of Science, Research and the Arts of
the State of Baden-Württemberg and also supported by the Research Training Group
“Statistical Modeling in Psychology” funded by the German Research Foundation
(DFG grant GRK 2277).
The authors made the following contributions. Maren Mayer: Conceptualization,
Investigation, Methodology, Writing - Original Draft, Writing - Review & Editing;
Daniel W. Heck: Conceptualization, Methodology, Writing - Review & Editing.
Abstract
Cultural consensus theory is a model-based approach for analyzing responses of
informants when correct answers are unknown. The model provides aggregate estimates
of the latent consensus knowledge at the group level while accounting for heterogeneity
both with respect to informants’ competence and items’ difficulty. We develop a specific
version of cultural consensus theory for two-dimensional continuous judgments as
obtained when asking informants to locate a set of unknown sites on a geographic map.
The new model is fitted using hierarchical Bayesian modeling, with a simulation study
indicating satisfactory parameter recovery. We also assess the accuracy of the aggregate
location estimates by comparing the new model against simply computing the
unweighted average of the informants’ judgments. A simulation study shows that, due
to weighting judgments by the inferred competence of the informants, cultural
consensus theory provides more accurate location estimates than unweighted averaging.
This result is also supported in an empirical study in which individuals judged the
location of European cities on maps.
Keywords: wisdom of crowds, group decision making, Bayesian modeling, test
theory, psychometrics
Cultural Consensus Theory for Two-Dimensional Data: Expertise-Weighted
Aggregation of Location Judgments
1 Introduction
In many domains in the social sciences and particularly in psychological
research, participants often provide responses to questions for which correct answers are
not known. For instance, researchers may ask whether one agrees or disagrees with a set
of statements about a certain topic such as beliefs about AIDS (Trotter et al., 1999).
Cultural consensus theory (CCT, Romney et al., 1986) is a method for analyzing
responses from several informants when correct answers are unknown. The model infers
the latent cultural consensus of a group while considering variance both in the
competence of informants and in the difficulty of items. Hence, CCT has also been
described as “test theory without an answer key” (Batchelder & Romney, 1988).
The fact that true answers are unknown complicates the aggregation of
informants’ responses because it is not clear which of the informants are most
competent in the sense that they provide judgments close to the unknown cultural
truth. As a remedy, CCT allows researchers to identify the latent cultural truth while
simultaneously estimating the cultural competence of each informant. The main
principle of CCT is that informants with more cultural knowledge, and thus, higher
competence regarding the latent consensus, are likely to show similar answer patterns
across the set of questions asked (Romney et al., 1986). Based on the correlation of
answer patterns, the method jointly estimates the cultural truth at the group level and
the informants’ competence at the individual level. This requires that multiple
informants provide judgments to a set of items from the same knowledge domain
(Weller, 2007).
1.1 Applications and Extensions of Cultural Consensus Theory
CCT was first developed in anthropological research for questionnaires about
cultural topics with a dichotomous response format (Batchelder & Romney, 1988;
Romney et al., 1986). For instance, one of the first applications investigated the
intracultural variability of beliefs about whether illnesses are contagious (Romney et al.,
1986). The method has since been applied in various contexts such as aggregating
eyewitness reports (Waubert de Puiseau et al., 2017; Waubert de Puiseau et al., 2012),
obtaining forecasts for various events (Anders et al., 2014; Merkle et al., 2020), or
estimating social networks where individuals provide information about social relations
among different people (Batchelder et al., 1997; Batchelder, 2009).
The original version of CCT was applicable only to dichotomous data with one
latent cultural truth to which all informants belong. As it may be possible that not all
informants share a single, common consensus, Anders and Batchelder (2012) extended
CCT to multiple cultural truths (see also Aßfalg & Klauer, 2020). Essentially, such
extended models assume that informants belong to separate latent classes which differ
with respect to the assumed cultural truth. For instance, medical professionals and lay
people may differ in their medical beliefs, resulting in different latent cultural
truths when group membership is not known.
CCT was also extended to response formats other than binary answers.
Extensions have been developed for continuous data (Anders et al., 2014; Batchelder &
Anders, 2012), ordinal responses (Anders & Batchelder, 2015), and mixed response
formats (Aßfalg, 2018), and have been used to aggregate ratings about the grammatical
acceptability of English phrases as well as judgments about the importance of various
health behaviors. Statistical inference for such extended CCT models has often relied
on hierarchical Bayesian modeling in which parameter estimates are obtained via
Markov chain Monte Carlo sampling (Anders et al., 2014; Anders & Batchelder, 2012;
Aßfalg & Klauer, 2020). Overall, all these extensions have enabled researchers to adapt
the CCT approach to various types of data while assuming a certain structure of
cultural truths underlying informants’ answers.
CCT is also applicable to scenarios in which correct answers are not known
during the time of data collection, but may become available later. Such applications
are especially interesting because the accuracy of different aggregation methods can then
be evaluated and compared directly. In fact, prior research in judgment and
decision making showed that aggregating independent individual judgments with an
unweighted average of all judgments results in highly accurate group estimates for
various tasks and contexts (Hueffer et al., 2013; Larrick & Soll, 2006; Steyvers et al.,
2009; Surowiecki, 2005). This is surprising because all judgments are weighted equally
without considering or estimating informants’ competence with respect to the
corresponding domain. In contrast, the aggregation of judgments in CCT is weighted by
the estimated competence of informants, thereby assigning more weight to informants
closer to the cultural truth. Merkle et al. (2020) recently showed that a CCT-inspired
aggregation mechanism indeed outperforms unweighted averaging. Similarly, the
accuracy of aggregated eyewitness testimonies increases when accounting for the
witnesses’ competence levels (Waubert de Puiseau et al., 2017). This illustrates that
CCT is a useful tool for aggregating judgments when the ground truth becomes
available only at a later time.
While CCT has been adapted to several types of response formats and
applications, an extension to two- or higher-dimensional continuous judgments has not
been developed yet. Such an extension is especially useful for the aggregation of
geographical judgments about the unknown location of several sites on a map. Possible
applications for such an extension are, for instance, two-dimensional location judgments
in research on geographic knowledge and representation (Friedman, Brown, et al., 2002;
Friedman et al., 2012, 2005; Friedman, Kerkman, et al., 2002; Thorndyke &
Hayes-Roth, 1982), location judgments for objects hidden by obstacles (Yarbrough et
al., 2002), or the search for optimal locations for public facilities (e.g., park-and-ride
facilities, Faghri et al., 2002). Especially when comparing the geographical knowledge of
different cultural groups with respect to location judgments on maps (Friedman, Brown,
et al., 2002; Friedman et al., 2005), a two-dimensional extension of CCT allows
researchers to aggregate individual judgments while identifying individuals’ competence
as a possible source of variance in judgments. Furthermore, a two-dimensional extension
of CCT may be useful for locating unknown sites based on expert judgments in
scenarios such as finding a lost submarine (Surowiecki, 2005), ancient archaeological
sites (Casana, 2014), natural resources (e.g., water harvesting sites, Al-shabeeb, 2016),
or suitable areas for ecotourism (Mahdavi et al., 2015).
In the following, we thus extend CCT to two-dimensional location judgments
based on the CCT model for one-dimensional continuous responses by Anders et al. (2014). We
check the validity and performance of the proposed CCT model and its Bayesian
implementation in JAGS (Plummer, 2003) by investigating parameter convergence and
recovery in a Monte Carlo simulation. Moreover, we use simulations to examine under
which conditions CCT’s weighting of judgments by individuals’ competence improves
the accuracy of location estimates at the group level. Empirically, we apply the new
model to reanalyze location judgments of European cities on maps (Mayer & Heck,
2021) and compare the accuracy of the aggregate location estimates to those obtained
with unweighted averaging. Overall, the results of our simulation studies and the
empirical reanalysis show that CCT’s weighting of individual location judgments by
informants’ competence improves the estimation accuracy compared to weighting all
judgments equally.
2 Model extension for two-dimensional continuous responses
2.1 Data structure
We extend the CCT model for one-dimensional continuous responses by Anders
et al. (2014) to two-dimensional continuous judgments. As in all CCT models, the
model requires that multiple informants provide judgments for a set of items from the
same competence domain (Weller, 2007). For instance, as illustrated in Figure 1A,
several informants could be asked to locate different European cities such as London on
geographic maps (Mayer & Heck, 2021). Locations can be measured in different units
depending on the application. For instance, one may use pixels of the presented image
as in our empirical study below or geographical coordinates such as longitude and
latitude, but other two-dimensional judgments are also feasible.
Figure 1
Data structure and CCT parameters for location judgments of London.
[Figure: (A) Cultural Truth & Judgments: actual location of London, cultural truth $\mathbf{T} = (T_1, T_2)^\top$, location judgments $\mathbf{Y} = (Y_1, Y_2)^\top$, and correlation of errors $\rho$. (B) Person & Item Parameters: cultural competence (one parameter per person; an expert with $E_1$ and a novice with $E_2$) and item difficulty (separate parameters for x- and y-direction per item; an easy item with $\lambda_{11}, \lambda_{12}$ and a difficult item with $\lambda_{21}, \lambda_{22}$).]
Regarding the notation, we assume that $i = 1, \ldots, N$ informants answer
$k = 1, \ldots, M$ items by providing continuous, two-dimensional location judgments
$$\mathbf{Y}_{ik} = \begin{pmatrix} Y_{ik1} \\ Y_{ik2} \end{pmatrix}. \qquad (1)$$
This means that each location judgment contains two components, with $Y_{ik1}$ referring to
the first dimension (e.g., the x-axis or longitude on a map) and $Y_{ik2}$ referring to the
second dimension (e.g., the y-axis or latitude).
2.2 Model specification
The CCT model for two-dimensional judgments (CCT-2D) assumes that all
respondents share a single latent cultural truth $\mathbf{T}_k$ for each item $k$. In our example, the
latent-truth parameters refer to the group’s consensus knowledge about the location of
London and other European cities on a map. Note that our example concerns a case
where the true locations are in principle available, but of course, the model also applies
to scenarios in which this is not the case.
As displayed in Figure 1A, we assume that the observed judgments $\mathbf{Y}_{ik}$ can be
modeled by two additive components, the shared cultural truth and an unsystematic
judgment error,
$$\mathbf{Y}_{ik} = \mathbf{T}_k + \boldsymbol{\varepsilon}_{ik}. \qquad (2)$$
This additive structure of a true score and an error term is not only common for CCT
models (Anders et al., 2014; Anders & Batchelder, 2012), but also at the core of
classical test theory (Lord et al., 1968). Similar to other CCT models (Anders et al.,
2014) and item response theory in general (Embretson & Reise, 2000), we assume that
the errors $\boldsymbol{\varepsilon}_{ik}$ are conditionally independent given the person competence $E_i$ and the
item difficulty $\boldsymbol{\lambda}_k$. Moreover, since judgments are continuous, we assume a bivariate
normal distribution of errors,
$$(\boldsymbol{\varepsilon}_{ik} \mid E_i, \boldsymbol{\lambda}_k) \overset{\text{iid}}{\sim} \text{MV-Normal}(\mathbf{0}, \boldsymbol{\Sigma}_{ik}). \qquad (3)$$
The covariance matrix $\boldsymbol{\Sigma}_{ik}$ of judgment errors is modeled as a function of the
informant’s competence and the item’s difficulty. The error variances in the x- and
y-direction (i.e., the diagonal elements of $\boldsymbol{\Sigma}_{ik}$) are assumed to be smaller for persons
with higher cultural competence and for items that are easier, meaning that in such
cases the observed judgments are closer to the cultural truth. For instance, when asked
to locate cities in the United Kingdom, informants with high competence will position
these cities close to the shared cultural knowledge about the location. Formally, this
idea is implemented by defining the person competence $E_i$ and the item difficulty $\lambda_{kd}$ as
multiplicative factors which jointly determine the standard deviation of informants’
judgments around the cultural truth in the $d$-th dimension,
$$\sigma_{ikd} = E_i \lambda_{kd}. \qquad (4)$$
Since cultural competence is modeled as a multiplicative factor affecting the standard
deviation, the parameter $E_i$ is restricted to be positive ($E_i > 0$). Figure 1B illustrates
how the parameter $E_i$ affects the variance of the distribution of errors. Essentially,
smaller values of $E_i$ reflect a higher competence since judgments are closer to the
cultural truth.
Recent versions of CCT (e.g., Anders et al., 2014) also assume that items vary in
difficulty such that more difficult items result in a larger variance of judgments around
the cultural truth. For the present case of location judgments, we define a vector-valued
item-difficulty parameter $\boldsymbol{\lambda}_k$ for each item with two components $\lambda_{k1} > 0$ and $\lambda_{k2} > 0$ for
the x- and y-dimension, respectively. We model the difficulty of each item with two
instead of only one value because the x- and y-dimension may differ in difficulty.
2.3 Model assumptions specific to location judgments
Two-dimensional location judgments have some unique features which require
special consideration in model development. Imagine that informants are asked to
locate London, Birmingham, Glasgow, Liverpool, and Dublin on a map of the United
Kingdom and Ireland similar to Figure 1A. The CCT-2D model outlined above
accounts for such two-dimensional continuous responses by assuming that all informants
answer according to the same underlying cultural truth. Here, the latent truth $\mathbf{T}_k$ refers
to the group’s shared knowledge about the position of city $k$ on the map. The model
assumes that the location judgments of an informant are closer to or further away from the
shared consensus knowledge depending on their competence level. Importantly, the
parameter $E_i$ refers to the general competence of an informant irrespective of the x- or
y-direction. This implies, for instance, that an informant who knows that London is located in the south of the
United Kingdom is also likely to know whether it is located more to the west
or to the east. This restriction simplifies the interpretation of the competence
parameter $E_i$ as a one-dimensional trait or construct.
Whereas competence is modeled as a one-dimensional parameter, the model
assumes that each city has separate and possibly different difficulties $\lambda_{k1}$ and $\lambda_{k2}$ in the
x- and y-direction, respectively. Due to geographical features of a map such as borders,
lakes, coasts, or other anchor points, informants may be naturally restricted in the
positioning of a location in the vertical direction but not in the horizontal direction or
vice versa. For instance, when positioning Liverpool and Dublin, informants are limited
by the coastline to the west and the east, respectively, which may in turn result in a
reduced variance of judgments in the x-direction (longitude) compared to the
y-direction (latitude).
More generally, certain features of geographic maps such as coastlines may also
lead to spatially correlated errors of location judgments. For instance, a positive
correlation may emerge when positioning cities on a map which are located close to a
“diagonal” coastline (e.g., Aberdeen, which is located close to a coast running from
southwest to northeast). In other cases, however, informants are not restricted by
nearby coasts (e.g., Birmingham), meaning that judgment errors in the x- and y-direction
may be uncorrelated. Overall, these considerations lead us to allow for a stochastic
dependence of the judgment errors $\varepsilon_{ik1}$ and $\varepsilon_{ik2}$ in the x- and y-direction, respectively.
We thus assume that, for each item $k$, the normally-distributed errors may correlate
between the two dimensions with correlation $\rho_k$ (as illustrated by the tilted red ellipses
in Figure 1A). This results in the following covariance matrix of the two-dimensional
judgment errors in Equation 3:
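$$\boldsymbol{\Sigma}_{ik} = \begin{pmatrix} (E_i \lambda_{k1})^2 & \rho_k E_i^2 \lambda_{k1} \lambda_{k2} \\ \rho_k E_i^2 \lambda_{k1} \lambda_{k2} & (E_i \lambda_{k2})^2 \end{pmatrix}. \qquad (5)$$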
Hence, the errors may be correlated between the two dimensions within each item for
each informant, which does not, however, imply that the errors are correlated across
items or informants. The CCT-2D model thus still satisfies the
conditional-independence assumption with respect to the two-dimensional vector of
errors $\boldsymbol{\varepsilon}_{ik}$.
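The generative model in Equations 2 to 5 can be sketched in a few lines of Python. This is an illustrative sketch using only the standard library, not the authors' JAGS code; the function names `error_covariance` and `simulate_judgment` are hypothetical, and the 2×2 Cholesky factor is written out by hand under the assumption $|\rho_k| < 1$.

```python
import math
import random


def error_covariance(E_i, lam_k, rho_k):
    """Covariance matrix of (eps_ik1, eps_ik2) as in Eqs. 4-5.

    E_i   : competence parameter of informant i (> 0; smaller = more competent)
    lam_k : (lambda_k1, lambda_k2), item difficulty in x- and y-direction (> 0)
    rho_k : correlation of the two error components for item k, |rho_k| < 1
    """
    sd1 = E_i * lam_k[0]      # sigma_ik1 = E_i * lambda_k1 (Eq. 4)
    sd2 = E_i * lam_k[1]      # sigma_ik2 = E_i * lambda_k2
    cov = rho_k * sd1 * sd2   # = rho_k * E_i^2 * lambda_k1 * lambda_k2
    return [[sd1 ** 2, cov], [cov, sd2 ** 2]]


def simulate_judgment(T_k, E_i, lam_k, rho_k, rng=random):
    """Draw Y_ik = T_k + eps_ik with eps_ik ~ MV-Normal(0, Sigma_ik) (Eqs. 2-3)."""
    sigma = error_covariance(E_i, lam_k, rho_k)
    # Lower-triangular Cholesky factor L of the 2x2 matrix, so that L L^T = Sigma
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    eps1 = l11 * z1
    eps2 = l21 * z1 + l22 * z2
    return (T_k[0] + eps1, T_k[1] + eps2)


# Hypothetical example: one judgment of an item located at (0, 0)
rng = random.Random(1)
print(simulate_judgment((0.0, 0.0), 0.5, (1.0, 2.0), 0.3, rng))
```

Because $E_i$ enters both standard deviations, a competent informant (small $E_i$) shrinks the whole error ellipse, while $\lambda_{k1}$, $\lambda_{k2}$, and $\rho_k$ only change its shape and tilt.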
2.4 Model simplifications
Compared to the CCT model for one-dimensional continuous data developed by
Anders et al. (2014), we simplified the CCT-2D model for two-dimensional judgments
with respect to several aspects. First, we do not assume multiple cultural truths. In our
example of positioning cities on a map of the United Kingdom and Ireland, multiple
cultural truths would imply that there are two or more latent classes of informants with
each group having a different consensus of where the cities are located (Anders &
Batchelder, 2012). When inferring the position of unknown locations such as natural
resources, missing victims, or ancient archaeological sites, we assume that informants
often use similar information and background knowledge to form their judgment. Thus,
a multimodal distribution of distinct patterns of location judgments is possible but
rather unlikely. In other scenarios such as the city-location task, a single correct position
on the map does exist but is not available to the informants. In such cases, CCT is
most useful when it provides a single, competence-weighted group-level estimate for
each item which can then be compared to the accuracy of other aggregation approaches
such as unweighted averaging (Merkle et al., 2020; Waubert de Puiseau et al., 2017).
Second, we do not assume a systematic response bias of location judgments. A
bias for one-dimensional responses means that informants generally shift all their
answers up or down to a certain degree as reflected by an additive component for each
informant (Anders et al., 2014). When positioning cities on a map of the United
Kingdom, a response bias would imply that informants shift all their location judgments
in a certain direction by a fixed distance (e.g., horizontally, vertically, or diagonally).
However, such a general shift of location judgments for all items seems to be unlikely
given that certain cues provided by the map (e.g., the borders, coasts, or other
geographic features) constrain the possible responses for each item in different ways. For
instance, when positioning cities on a map of the United Kingdom and Ireland, a bias to
the east would result in slightly biased judgments for some cities (e.g., London,
Birmingham, and Manchester) but in judgments located in the ocean for others (e.g.,
Glasgow and Dublin). Hence, the CCT-2D model does not assume a response bias.
Lastly, the CCT-2D model does not include a scaling-bias parameter. For
one-dimensional continuous data, a scaling bias refers to a multiplicative bias (i.e., a
“stretching factor”) for each informant which is assumed to affect the judgments of all
items (Anders et al., 2014). When giving location judgments, a scaling bias would mean
that informants’ judgments on each axis and for all items are scaled by a multiplicative
component resulting in location judgments that are, for instance, positioned at about
half of the correct latitude. Since informants do not give their judgments numerically
but geographically, a scaling bias would depend on where the origin of the coordinate
system is located, which is usually unknown to the informants. Moreover, a possible
bias should not depend on the underlying coordinate system. We thus did not
implement a scaling bias in the CCT-2D model.
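The dependence of a multiplicative bias on the origin can be seen in a small numerical sketch. The coordinates below are hypothetical map units and `scale_about` is an illustrative helper: the same judgment, stretched by the same factor, lands at different points depending on which origin the stretching is taken relative to.

```python
def scale_about(point, origin, factor):
    """Stretch a 2D point by `factor` relative to a chosen coordinate origin."""
    return tuple(o + factor * (p - o) for p, o in zip(point, origin))


judgment = (4.0, 2.0)  # hypothetical location judgment in map units

# Same stretching factor, two hypothetical choices of origin
print(scale_about(judgment, (0.0, 0.0), 0.5))  # (2.0, 1.0)
print(scale_about(judgment, (4.0, 0.0), 0.5))  # (4.0, 1.0)
```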
2.5 Hierarchical Bayesian modeling
To fit the CCT-2D model to data and estimate its parameters, we adopt the
hierarchical Bayesian modeling approach by Anders et al. (2014). Hierarchical modeling
allows researchers to specify a population distribution for a set of model parameters
such as person abilities or item difficulties (Lee & Wagenmakers, 2014). This has
several benefits, such as partial pooling of information between the individual and
the group level, which in turn results in shrinkage of the estimates (e.g., Heck, 2019;
Singmann & Kellen, 2019). In our case, we assume separate population distributions of
the competence parameters $E_i$ across informants and of the item difficulty parameters
$\boldsymbol{\lambda}_k$ across questions.
Besides specifying hierarchical distributions, the Bayesian framework also
requires the definition of prior distributions. In the following, we adopt the notation
for distributions used by the software JAGS (Plummer, 2003), which is used to fit the CCT-2D
model below. The normal distribution is thus not parameterized by the mean $\mu$ and the
standard deviation $\sigma$, but rather by the mean $\mu$ and the precision parameter $\tau = 1/\sigma^2$
(i.e., the inverse of the variance). Similarly, for the $t$ distribution, the second parameter
refers to the precision and not to the scale parameter.
Often, normal distributions are assumed as hierarchical group-level distributions.
Concerning the latent truth for each item $k$, we assume that the cultural truth
coordinates $T_{kd}$ (with dimension index $d = 1, 2$) are located on the real line and are
normally distributed across items,
$$T_{kd} \sim \text{Normal}(\mu_T, \tau_T). \qquad (6)$$
In contrast, the parameters $E_i$ and $\lambda_{kd}$ are constrained to be positive. As a remedy, we
first apply a log transformation to obtain parameters on the real line for which we can
assume unbounded normal distributions (Anders et al., 2014). Taking the
dimensionality of the parameters into account, the CCT-2D model assumes a
one-dimensional hierarchical distribution of the informants’ competence,
$$\log E_i \sim \text{Normal}(\mu_{\log E}, \tau_{\log E}), \qquad (7)$$
and a two-dimensional distribution (with dimensions $d \in \{1, 2\}$) of the items’ difficulty,
$$\log \boldsymbol{\lambda}_k \sim \text{MV-Normal}(\boldsymbol{\mu}_{\log \lambda}, \boldsymbol{\Sigma}^{-1}_{\log \lambda}). \qquad (8)$$
For Bayesian inference, it is necessary to specify prior distributions for the
hyperparameters of the hierarchical group-level distributions (e.g., for $\mu_{\log E}$ and $\boldsymbol{\mu}_{\log \lambda}$).
Our main goal is to estimate the parameters reflecting cultural truth, competence, and
item difficulty. Since we are not interested in testing hypotheses with theoretically
informed prior distributions (e.g., via Bayes factors, Heck et al., 2022), we rely on prior
distributions that are only weakly informative. Moreover, some hyperparameters are
fixed to constants to ensure the identifiability of the resulting model, similar to item
response theory (Embretson & Reise, 2000). For the correlation of judgment errors in
the x- and y-direction for item $k$, we assume the following prior:
$$\rho_k \sim \text{Uniform}(-1, 1). \qquad (9)$$
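For the mean and precision of the latent truth coordinates, we assume
$$\mu_T \sim \text{Normal}(0, 0.25) \qquad (10)$$
$$\tau_T \sim \text{Half-}t_{df=1}(0, 1). \qquad (11)$$
For the mean and standard deviation of the (log) competence, the prior is
$$\mu_{\log E} = 0 \qquad (12)$$
$$\sigma_{\log E} \sim \text{Half-}t_{df=1}(0, 1). \qquad (13)$$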
For the mean and standard deviation of the (log) difficulty parameters, we assume
$$\mu_{\log \lambda, d} = 0 \qquad (14)$$
$$\sigma_{\log \lambda, d} \sim \text{Half-}t_{df=1}(0, 3). \qquad (15)$$
Finally, the prior for the correlation of the (log) difficulty in x- and y-direction across
items is
$$\rho_{\log \lambda} \sim \text{Uniform}(-1, 1). \qquad (16)$$
A positive correlation $\rho_{\log \lambda}$ means that if positioning a city is difficult with respect to
one axis, it is also difficult with respect to the other axis.
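Because JAGS parameterizes the normal and $t$ distributions by precision, the second argument in Equations 10 to 15 is an inverse variance rather than a scale. A minimal conversion sketch (hypothetical helper name) shows what the priors imply on the more familiar scale: the prior on $\mu_T$ has a standard deviation of 2, and the half-$t$ prior with $df = 1$ (a half-Cauchy distribution) on $\sigma_{\log \lambda, d}$ has a scale of $1/\sqrt{3} \approx 0.58$.

```python
import math


def precision_to_sd(tau):
    """Convert a JAGS-style precision tau = 1 / sigma^2 to a standard deviation (scale)."""
    return 1.0 / math.sqrt(tau)


# mu_T ~ Normal(0, 0.25) in JAGS notation (Eq. 10): precision 0.25
print(precision_to_sd(0.25))             # 2.0
# sigma_log_lambda,d ~ Half-t_{df=1}(0, 3) (Eq. 15): precision 3
print(round(precision_to_sd(3.0), 3))    # 0.577
```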
3 Simulation study
We performed a simulation study to examine general properties of the CCT-2D
model. First, we assess how well the model recovers the
data-generating parameters in various realistic scenarios. Second, we compare the
accuracy of location estimates obtained with the CCT model for two-dimensional
continuous data to location estimates obtained with the unweighted aggregation of
judgments. Simulated data and R scripts are available at https://osf.io/jbzk7/.
3.1 Method
In the simulation study, the following factors were varied in a fully crossed
design using 100 replications per cell:
Number of informants: $N = 10, 20, 50, 100$
Number of items: $M = 5, 10, 25, 50$
Standard deviation of log informants’ competence: $\sigma_{\log E} = 0, 0.25, 0.5, 1$
Standard deviation of log item difficulty: $\sigma_{\log \lambda} = 0, 0.25, 0.5, 1$
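The size of this fully crossed design can be checked with a short enumeration sketch (variable names are illustrative). It yields 4 × 4 × 4 × 4 = 256 cells and, with 100 replications each, 25,600 simulated data sets.

```python
from itertools import product

N_values = [10, 20, 50, 100]               # number of informants
M_values = [5, 10, 25, 50]                 # number of items
sd_log_E_values = [0.0, 0.25, 0.5, 1.0]    # SD of log competence
sd_log_lam_values = [0.0, 0.25, 0.5, 1.0]  # SD of log item difficulty
replications = 100

cells = list(product(N_values, M_values, sd_log_E_values, sd_log_lam_values))
n_datasets = len(cells) * replications

# Cells in which log E_i or log lambda_kd has no variance across persons or items
zero_variance_cells = sum(1 for _, _, sd_E, sd_lam in cells if sd_E == 0 or sd_lam == 0)

print(len(cells))           # 256
print(n_datasets)           # 25600
print(zero_variance_cells)  # 112
```

Removing the 56 non-converged simulations reported in the text from the 25,600 data sets gives the 25,544 replications summarized in Figure 3; the 112 zero-variance cells correspond to 11,200 data sets before excluding non-converged runs.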
We chose a wide range for the sample size $N$ to illustrate the effect of having few
or many informants on parameter recovery and on the relative performance of CCT-2D
compared to unweighted averaging. However, informants’ competence can only be
estimated precisely if the number of items is sufficiently large. Hence, we also varied the
number of items $M$ over a wide range. Overall, these settings reflect the fact that CCT is
useful for a wide range of scenarios with both smaller and larger numbers of informants
who answer more or fewer questions (e.g., Waubert de Puiseau et al., 2012).
Furthermore, we varied the standard deviation of the logarithm of informants’
competence ($\sigma_{\log E}$) and the standard deviation of the logarithm of item difficulty
($\sigma_{\log \lambda}$) over a wide range, including conditions with no variance at all. The standard
deviations refer to the logarithm of these parameters since informants’ competence and
item difficulty must be positive, which also reflects the model’s assumption that the
log-transformed parameters follow unbounded normal distributions. While both
variances can be expected to affect the recovery of the respective parameters,
$\sigma_{\log E}$ is especially relevant for comparing the accuracy of estimates obtained
with CCT-2D and unweighted averaging. Without any variance in informants’
competence, CCT and unweighted averaging are expected to perform approximately
equally well because equal weighting of judgments is then optimal
(Davis-Stober et al., 2014). However, if the variance in informants’ location judgments
partially emerges due to differences in informants’ competence, CCT-2D is expected to
result in more accurate estimates than unweighted averaging because it assigns larger
weights to competent informants (Merkle et al., 2020).
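Why competence weighting helps can be illustrated under the simplifying assumption that all CCT-2D parameters are known. By Equation 5, $\boldsymbol{\Sigma}_{ik} = E_i^2 \boldsymbol{\Sigma}_k$ with a matrix $\boldsymbol{\Sigma}_k$ that does not depend on the informant, so the generalized least-squares estimate of $\mathbf{T}_k$ (ignoring the prior in Equation 6) reduces to a weighted average of the judgments with weights proportional to $1/E_i^2$. The sketch below is not the Bayesian CCT-2D estimator, which also accounts for uncertainty in $E_i$; the helper names and the setting ($\mathbf{T}_k = (0, 0)$, $\boldsymbol{\lambda}_k = (1, 1)$, $\rho_k = 0$) are hypothetical.

```python
import random


def unweighted_average(judgments):
    """Mean of 2D judgments with every informant weighted equally."""
    n = len(judgments)
    return tuple(sum(y[d] for y in judgments) / n for d in range(2))


def competence_weighted_average(judgments, E):
    """Weighted mean with weights 1 / E_i^2 (GLS estimate if parameters are known)."""
    w = [1.0 / e ** 2 for e in E]
    total = sum(w)
    return tuple(sum(wi * y[d] for wi, y in zip(w, judgments)) / total
                 for d in range(2))


def squared_error(estimate, truth):
    return sum((a - b) ** 2 for a, b in zip(estimate, truth))


def compare(n_informants=20, sd_log_E=1.0, reps=2000, seed=1):
    """Mean squared error of both estimators over simulated items.

    Hypothetical setting: T_k = (0, 0), lambda_k = (1, 1), rho_k = 0, and the
    true E_i are plugged in instead of being estimated.
    """
    rng = random.Random(seed)
    truth = (0.0, 0.0)
    err_unweighted = err_weighted = 0.0
    for _ in range(reps):
        E = [rng.lognormvariate(0.0, sd_log_E) for _ in range(n_informants)]
        # sigma_ikd = E_i * lambda_kd = E_i here (Eq. 4 with lambda_kd = 1)
        Y = [(rng.gauss(0.0, e), rng.gauss(0.0, e)) for e in E]
        err_unweighted += squared_error(unweighted_average(Y), truth)
        err_weighted += squared_error(competence_weighted_average(Y, E), truth)
    return err_unweighted / reps, err_weighted / reps


print(compare(sd_log_E=1.0))  # weighted error is smaller
print(compare(sd_log_E=0.0))  # the two errors are identical
```

With $\sigma_{\log E} = 0$ all weights are equal and both estimators coincide, mirroring the equal-weighting argument above.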
All simulations were conducted with the software JAGS (Plummer, 2003) in R
using the packages rjags and runjags (Denwood, 2016; Plummer, 2021). For
parameter estimation, we used 8,000 Markov chain Monte Carlo (MCMC) samples from
six chains with 1,000 adaptation iterations, 1,500 burn-in iterations, and a thinning factor of 3.
These MCMC settings were selected to achieve a potential scale reduction factor of
$\hat{R} < 1.1$ for all parameters. For this purpose, we first performed a small-scale simulation
study with only a few informants, few items, and a small variance in informants’
competence and item difficulty to adjust the settings for JAGS. In the main simulation
study, only 56 simulations (0.22%) did not converge, meaning that more than 10% of parameters
had a potential scale reduction factor of $\hat{R} > 1.1$; these simulations were excluded from the
analysis. For the remaining simulations, the average potential scale reduction factor was
$\hat{R} = 1.002$ (99% quantile = 1.02). The model code for JAGS can be found in Appendix
A.
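The convergence criterion $\hat{R} < 1.1$ refers to the potential scale reduction factor of Gelman and Rubin, which compares the variance between chains to the variance within chains. A minimal sketch of its basic form for a single parameter is shown below; it is an illustration rather than the code used for the reported simulations, and R packages such as coda may apply additional corrections. The function name is hypothetical.

```python
import math


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin R-hat for one parameter.

    chains: list of m chains (m >= 2), each a list of n >= 2 post-burn-in draws.
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    B = n * sum((mu - grand_mean) ** 2 for mu in means) / (m - 1)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * W + B / n
    return math.sqrt(var_hat / W)


mixed = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]      # chains agree
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]  # chains disagree
print(potential_scale_reduction(mixed))  # well below 1.1
print(potential_scale_reduction(stuck))  # about 12.3
```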
3.2 Parameter recovery
Figure 2
Parameter recovery of the CCT-2D model for a single simulated data set.
[Figure: four scatterplot panels for $\log E_i$, $\log \lambda_{kd}$, $\rho_k$, and $T_{kd}$, plotting the estimated parameter (y-axis) against the data-generating parameter (x-axis), both ranging from −2 to 2. Panel annotations: r = 0.99, RMSE = 0.18; r = 0.98, RMSE = 0.22; r = 0.99, RMSE = 0.15; r = 0.99, RMSE = 0.18.]
Note. Parameter recovery for a single simulated data set with $N = 20$ informants, $M = 10$
items, $\sigma_{\log E} = 1$, and $\sigma_{\log \lambda} = 0.5$. The first two panels show the logarithm of informants’
competence ($\log E_i$) and item difficulty ($\log \lambda_{kd}$).
To examine parameter recovery of the CCT-2D model, we first consider a single simulated data set with $N = 20$ informants, $M = 10$ items, a standard deviation of informants' competence of $\sigma_{\log E} = 1$, and a standard deviation of item difficulty of $\sigma_{\log \lambda} = 0.5$. Figure 2 shows the data-generating and estimated values of $\log E_i$, $\log \lambda_{kd}$, $\rho_k$, and $T_{kd}$, including the correlation between data-generating and estimated parameters and the root-mean-square error (RMSE). For the vector-valued parameters $\lambda_k$ and $T_k$, the data-generating and estimated values for the x- and y-dimension are displayed jointly in the respective panels. All correlations are .98 or higher, and the RMSEs of the estimates range between 0.15 and 0.22. This indicates that the CCT-2D model recovers its parameters well even with a moderate number of informants and items.
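The two recovery measures can be sketched as follows. This is a minimal illustration with hypothetical values, not the authors' analysis script; it also shows why no correlation can be computed when a standard deviation such as $\sigma_{\log E}$ is zero: all data-generating values are then identical, so the correlation is undefined (returned here as `None`), whereas the RMSE is still defined.

```python
import math
from statistics import mean


def recovery_metrics(true_values, estimates):
    """Return (Pearson r, RMSE) between data-generating and estimated values.

    r is None when either vector is constant, e.g., when all data-generating
    log E_i equal 0 because sigma_log_E = 0, so the correlation is undefined.
    """
    if len(true_values) != len(estimates) or len(true_values) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = mean(true_values), mean(estimates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_values, estimates))
    sxx = sum((x - mx) ** 2 for x in true_values)
    syy = sum((y - my) ** 2 for y in estimates)
    r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else None
    rmse = math.sqrt(mean([(y - x) ** 2 for x, y in zip(true_values, estimates)]))
    return r, rmse


# Hypothetical data-generating and posterior-mean values of log E_i
true_log_e = [-1.2, -0.4, 0.0, 0.5, 1.1]
est_log_e = [-1.0, -0.5, 0.1, 0.4, 1.3]
r, rmse = recovery_metrics(true_log_e, est_log_e)

# With sigma_log_E = 0, all data-generating values are identical
r_const, rmse_const = recovery_metrics([0.0, 0.0, 0.0], [0.1, -0.1, 0.0])
print(r, rmse, r_const, rmse_const)
```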
Figure 3
Parameter recovery across 25,544 replications.
[Figure: average correlation and RMSE between data-generating and estimated values of $\log E_i$, $\log \lambda_{kd}$, $\rho_k$, and $T_{kd}$ as a function of the number of informants N (10, 20, 50, 100), separately for the number of items M (5, 10, 25, 50).]
Note. Average correlations of data-generating and estimated parameters and RMSEs are displayed with 95% confidence intervals. For simulations with $\sigma_{\log E} = 0$ or $\sigma_{\log \lambda} = 0$, no correlations could be computed for the parameters $\log E_i$ and $\log \lambda_{kd}$, respectively.
To judge the performance of the CCT-2D model across various scenarios, we assessed parameter recovery by computing the average correlation and RMSE between the data-generating and the estimated parameters across all 25,544 replications. Again, we display the correlation and RMSE for $\log \lambda_{kd}$ and $T_{kd}$ for both dimensions in one panel. For all simulations with $\sigma_{\log E} = 0$ or $\sigma_{\log \lambda} = 0$, the correlation between data-generating and posterior values of $\log E_i$ or $\log \lambda_{kd}$, respectively, cannot be computed because the data-generating values are then constant across informants or items. This affected 11,188 replications for which $\sigma_{\log E}$, $\sigma_{\log \lambda}$, or both were zero.
Figure 3 displays the average correlation and RMSE for all combinations of $N$ and $M$. The recovery of the item parameters $\log \lambda_{kd}$, $\rho_k$, and $T_{kd}$ was clearly affected by the number of informants ($N$), because reliable estimation of item parameters requires a sufficient number of informants who respond to each item. In contrast, the recovery of the person parameters $E_i$ was more strongly affected by the number of items ($M$), showing that reliable estimation of person parameters requires a sufficient number of items. The RMSEs of the cultural truth $T_{kd}$ were somewhat more affected by varying levels of $N$ than those of all other parameters, with RMSEs as high as 0.30. However, the correlations between data-generating and estimated values of $\log \lambda_{kd}$ and $\log E_i$ were more strongly affected by varying levels of $N$ and $M$, respectively, with correlations just above .80 for both parameters.
Furthermore, Figure 4 displays the parameter recovery of $\log E_i$ (Panel A) and $\log \lambda_{kd}$ (Panel B) for varying levels of $\sigma_{\log E}$ and $\sigma_{\log \lambda}$, respectively. While RMSEs are very small when there is no variance in the respective parameter, the recovery of $E_i$ and $\lambda_{kd}$ is worse for low levels of $\sigma_{\log E}$ and $\sigma_{\log \lambda}$, respectively, with correlations between data-generating and estimated parameters as low as .64 for $\log E_i$ and .65 for $\log \lambda_{kd}$. However, as already observed in Figure 3, parameter recovery for $\log E_i$ improves with increasing $M$, and parameter recovery for $\log \lambda_{kd}$ improves with increasing $N$.
Overall, parameter recovery is acceptable even for small $N$ and $M$ or for low levels of $\sigma_{\log E}$ and $\sigma_{\log \lambda}$. As expected, all parameters are recovered better the larger $N$ and $M$ are and the larger the variances in informants' competence and item difficulty are. Conversely, if $N$ and $M$ are small and there is also little variance in competence and item difficulty (i.e., small $\sigma_{\log E}$ and $\sigma_{\log \lambda}$), the parameters $\log E_i$ and $\log \lambda_{kd}$ cannot be estimated reliably.
3.3 Comparing the accuracy of CCT-2D and unweighted averaging
In the following, we compare the accuracy of aggregating two-dimensional location judgments either with the CCT-2D model or with unweighted averaging. To obtain unweighted group-level estimates, we simply computed the unweighted mean of all location judgments for each item (separately for the x- and the y-coordinate). As a measure of accuracy, we used the Euclidean distance between the aggregate estimate and the correct position of each item. Figure 5 displays the mean Euclidean distances across all items between the correct values and the CCT-2D estimates (gray points) and between the correct values and the estimates obtained with unweighted averaging (black points). To facilitate interpretation of the results, we aggregated across replications with varying numbers of items.

Figure 4
Average parameter recovery for different $\sigma_{\log E}$ or $\sigma_{\log \lambda}$.
[Figure: (A) Parameter recovery of $\log E_i$ for varying levels of $\sigma_{\log E}$; (B) Parameter recovery of $\log \lambda_{kd}$ for varying levels of $\sigma_{\log \lambda}$. Each part shows the correlation and RMSE between data-generating and estimated parameters as a function of the number of informants N (10, 20, 50, 100), separately for the number of items M (5, 10, 25, 50).]
Note. Mean correlations and RMSEs are displayed with 95% confidence intervals. For simulations with $\sigma_{\log E} = 0$ or $\sigma_{\log \lambda} = 0$, no correlations could be computed for $\log E_i$ and $\log \lambda_{kd}$, respectively.

Figure 5
Marginal accuracy of aggregate location estimates.
[Figure: mean Euclidean distance between aggregate estimates and true locations as a function of the number of informants N (10, 20, 50, 100) for Cultural Consensus Theory (2D) and Unweighted Averaging; columns: $\sigma_{\log \lambda}$ = 0, 0.25, 0.5, 1; rows: $\sigma_{\log E}$ = 0, 0.25, 0.5, 1.]
Note. The scaling of the y-axis differs across rows to improve readability. Mean accuracy is displayed with 95% confidence intervals.
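The unweighted baseline and the accuracy measure used in this comparison can be sketched as follows. The judgments and true positions are hypothetical, and the function names are ours; the sketch only shows the per-coordinate mean for each item and the mean Euclidean distance across items.

```python
import math
from statistics import mean


def unweighted_average(judgments):
    """Group estimate for one item: mean of the x- and of the y-coordinates."""
    return (mean([x for x, _ in judgments]), mean([y for _, y in judgments]))


def euclidean_distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def mean_accuracy(estimates, truths):
    """Mean Euclidean distance across items (smaller values = more accurate)."""
    return mean([euclidean_distance(e, t) for e, t in zip(estimates, truths)])


# Hypothetical judgments of three and two informants for two items
judgments = {
    "item1": [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)],
    "item2": [(4.0, 4.0), (6.0, 4.0)],
}
truths = {"item1": (1.0, 0.0), "item2": (5.0, 1.0)}

estimates = {k: unweighted_average(v) for k, v in judgments.items()}
accuracy = mean_accuracy([estimates[k] for k in truths], [truths[k] for k in truths])
print(estimates, accuracy)  # accuracy = (1 + 3) / 2 = 2.0
```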
As expected, Figure 5 shows that aggregating location judgments with CCT-2D yielded more accurate estimates than unweighted averaging. However, without any variance in either informants' competence ($\sigma_{\log E} = 0$) or item difficulty ($\sigma_{\log \lambda} = 0$), both methods led to similarly accurate location estimates (upper left panel). In line with the principle that averaging cancels out individual errors, Figure 5 shows that both unweighted averaging and CCT-2D generally provided more accurate estimates the larger the sample of informants was. However, increasing the sample size was more beneficial for unweighted averaging than for CCT-2D. Furthermore, estimates obtained with unweighted averaging became less accurate the larger the variance in informants' competence. This was expected because greater heterogeneity in informants' competence yields larger variation in judgments, which in turn results in larger Euclidean distances to the correct position. The CCT-2D model accounts for this additional variance in the observed location judgments by assigning larger weights to more competent informants, thereby resulting in a better recovery of the latent truth.
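Why competence weighting helps can be illustrated with a small simulation, under simplifying assumptions that differ from the actual CCT-2D fit: the competence values $E_i$ are treated as known (the model infers them jointly with the cultural truths), and the item parameters are hypothetical. If informant $i$'s covariance matrix for item $k$ is $E_i^2$ times a common item-specific matrix (as in the JAGS code of Appendix A), the inverse-variance combination of the judgments reduces to coordinate-wise weights proportional to $1/E_i^2$, so $\lambda_{kd}$ and $\rho_k$ cancel out of the weights.

```python
import math
import random


def simulate_item(truth, competence, lam, rho, rng):
    """One bivariate normal judgment per informant, following the CCT-2D
    likelihood: SD on dimension d is E_i * lam[d], error correlation rho."""
    judgments = []
    for e in competence:
        sx, sy = e * lam[0], e * lam[1]
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        dx = sx * z1
        dy = sy * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # corr(dx, dy) = rho
        judgments.append((truth[0] + dx, truth[1] + dy))
    return judgments


def unweighted_average(judgments):
    n = len(judgments)
    return (sum(x for x, _ in judgments) / n, sum(y for _, y in judgments) / n)


def competence_weighted_average(judgments, competence):
    """Inverse-variance weights 1 / E_i**2 (E_i is an error scale)."""
    w = [1.0 / e ** 2 for e in competence]
    total = sum(w)
    return (sum(wi * x for wi, (x, _) in zip(w, judgments)) / total,
            sum(wi * y for wi, (_, y) in zip(w, judgments)) / total)


rng = random.Random(2022)
truth, lam, rho = (0.0, 0.0), (1.0, 0.5), 0.3  # hypothetical item parameters
reps, n_informants = 2000, 20
err_unweighted = err_weighted = 0.0
for _ in range(reps):
    # Heterogeneous competence: log E_i ~ Normal(0, 1), i.e., sigma_log_E = 1
    competence = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n_informants)]
    data = simulate_item(truth, competence, lam, rho, rng)
    err_unweighted += math.dist(unweighted_average(data), truth) / reps
    err_weighted += math.dist(competence_weighted_average(data, competence), truth) / reps

print(f"unweighted: {err_unweighted:.3f}, weighted: {err_weighted:.3f}")
```

With known competence, the weighted mean never has a larger error variance than the unweighted mean; CCT-2D has to estimate the weights, so its practical advantage is smaller than in this idealized case.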
Even in the absence of differences in competence (first row in Figure 5), CCT-2D resulted in more accurate location estimates than unweighted averaging. This effect is due to shrinkage of the item parameters in the hierarchical Bayesian model. More precisely, the CCT-2D model assumes a hierarchical group-level distribution of the cultural-truth parameters $T_k$ across items. Shrinkage of these random-effect parameters results in estimates closer to the group mean $\mu_T$ compared to estimates based on independent item parameters (i.e., fixed effects; Heck, 2019). As a consequence, extreme estimates are avoided, especially when there are only a few judgments per item (i.e., if the sample size $N$ is small). In Figure 5, this results in a higher accuracy of CCT-2D compared to unweighted averaging even in the absence of differences in competence. However, with increasing numbers of judgments per item (i.e., for larger $N$), shrinkage decreases because the item parameters can be estimated more precisely, which in turn results in similar accuracy for CCT-2D and unweighted averaging. Overall, our comparison shows that CCT-2D can increase the accuracy of aggregated location judgments by accounting for heterogeneity in competence and item difficulty.
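The shrinkage mechanism can be illustrated with a deliberately simplified normal-normal model for a single coordinate: $n$ judgments with a known standard deviation scatter around $T_k$, and $T_k$ itself follows a normal group-level distribution with known mean and standard deviation. This is an assumption-laden sketch, not the CCT-2D posterior (which also estimates competence, difficulty, and the hyperparameters), but it shows how the pull toward the group mean shrinks as $n$ grows.

```python
def shrunken_estimate(item_mean, n, judgment_sd, group_mean, group_sd):
    """Posterior mean of one coordinate of T_k in a normal-normal model:
    n judgments with known SD around T_k, and T_k ~ Normal(group_mean, group_sd)."""
    data_precision = n / judgment_sd ** 2
    prior_precision = 1.0 / group_sd ** 2
    weight = data_precision / (data_precision + prior_precision)
    # weight -> 1 as n grows, so shrinkage toward the group mean vanishes
    return weight * item_mean + (1.0 - weight) * group_mean


# Hypothetical values: item mean 3, group mean 0, judgment SD 2, group SD 1
for n in (5, 20, 100):
    print(n, round(shrunken_estimate(3.0, n, 2.0, 0.0, 1.0), 3))
# prints: 5 1.667 / 20 2.5 / 100 2.885
```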
4 Empirical study
In addition to the simulation study, we apply the CCT-2D model to empirical data from participants who located various European cities on geographic maps (Mayer & Heck, 2021), and we compare the accuracy of aggregated location judgments obtained with CCT-2D and unweighted averaging. Since multiple informants provided judgments for multiple items from the same knowledge domain (i.e., locations of European cities), the data fulfill the necessary requirements for an analysis with CCT-2D. All data and R scripts are available at https://osf.io/jbzk7/.
4.1 Methods
In the following, we reanalyze the data of a study by Mayer and Heck (2021) in which participants judged the locations of 57 European cities on seven different maps. We recruited 417 adult participants via a commercial German panel provider for an experiment on collaboration. Of these, 235 participants completed a condition in which they provided independent location judgments for all presented items, which makes their data suitable for a reanalysis with both CCT-2D and unweighted averaging. We excluded seven participants who positioned more than 10% of the cities outside of the countries of interest (which were highlighted in white), resulting in a final sample of 228 participants. In this sample, the mean age was 46.68 years (SD = 15.23), and 46.9% of the participants were female. Most participants had a college degree (34.2%) or a high-school diploma (25.9%), while 24.1% had vocational education and 15.8% had a lower educational attainment.
A comprehensive overview of all presented cities and maps can be found in Table B1 (Appendix B). All maps were scaled to 1:5,000,000 and were presented as images of 800 × 500 pixels. At this scale, the influence of the Earth's curvature is small and can be neglected in further analyses. The maps showed only oceans (colored in blue), landmasses (colored in white for countries of interest and in gray for all other countries), and national borders (black lines), as shown in Figure 7.
During the study, participants indicated the position of each of the 57 cities independently in separate trials. The maps were presented in random order, as were the cities within each map. Since the study was conducted online, we implemented a time limit of 40 seconds per item to prevent participants from looking up the correct locations of the cities (for details, see Mayer & Heck, 2021).
4.2 Results
Figure 6
Accuracy of location estimates for 57 European cities.
Note. Reanalysis based on $N = 228$ participants from the data by Mayer and Heck (2021).
To compare the accuracy of CCT-2D and unweighted averaging, we first computed the group-level location estimates for all 57 cities. For unweighted averaging, we simply aggregated the independent location judgments for each city by taking the mean in the x- and the y-direction. For the CCT-2D model, we extracted the posterior-mean estimates of the two-dimensional cultural-truth parameters $T_k$. We then quantified the accuracy of the estimated locations by their Euclidean distance to the actual locations of the presented cities.
Figure 6 displays the mean Euclidean distances across the 57 cities for the aggregate location estimates of CCT-2D and unweighted averaging. The results show that aggregating location judgments with CCT-2D resulted in more accurate estimates than unweighted averaging. To illustrate the advantage of CCT-2D for aggregating location judgments, Figure 7 displays the estimated locations of both methods as well as the correct locations of the five cities on the map of the United Kingdom and Ireland. CCT-2D provides more accurate estimates than unweighted averaging for four of the five cities (Birmingham, Dublin, Glasgow, and London) and an equally accurate estimate for one city (Liverpool). Notably, for some cities such as London, the distance between the true and the estimated location is approximately half as large for CCT-2D as for unweighted averaging. The supplementary material provides plots of all seven European maps used in the study, each displaying the location estimates obtained with unweighted averaging and CCT-2D as well as the cities' actual positions (https://osf.io/jbzk7/).
The descriptive patterns shown in Figures 6 and 7 were also supported by a statistical analysis. A paired-samples t-test showed that the CCT-2D estimates were significantly more accurate than the estimates obtained with unweighted averaging, t(56) = 10.43, p < .001, with a large effect size (Cohen's d = 1.38). Across all cities, the CCT-2D estimates were on average 12.35 pixels closer to the correct position, which is comparable to the improvement of 15.63 pixels for Glasgow shown in Figure 7.
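The paired comparison works on per-city differences in Euclidean distance. Assuming that the reported effect size is $d_z$ (mean difference divided by the standard deviation of the differences), it follows directly from the test statistic via $d_z = t/\sqrt{n}$; with $t(56) = 10.43$ and $n = 57$ cities, $10.43/\sqrt{57} \approx 1.38$, which matches the reported value. The sketch below uses hypothetical distances and only the standard library; p-values require the t distribution, for which `scipy.stats.ttest_rel` could be used instead.

```python
import math
from statistics import mean, stdev


def paired_t_and_dz(dist_unweighted, dist_cct):
    """Paired-samples t statistic, Cohen's d_z, and df for per-city differences
    in Euclidean distance (positive = CCT-2D closer to the true location)."""
    diffs = [u - c for u, c in zip(dist_unweighted, dist_cct)]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)  # mean difference / SD of differences
    t = d_z * math.sqrt(n)           # equivalently mean / (SD / sqrt(n))
    return t, d_z, n - 1


# Consistency check with the reported values, assuming d is d_z
print(round(10.43 / math.sqrt(57), 2))  # 1.38

# Hypothetical distances (pixels) for four cities
t, d_z, df = paired_t_and_dz([30.0, 25.0, 40.0, 22.0], [20.0, 18.0, 26.0, 15.0])
print(t, d_z, df)
```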
To further examine the validity of the CCT-2D model, we also computed the correlation between the estimated competence parameters $\log E_i$ and individuals' education level. Individuals with a higher education level should have more geographic knowledge and thus provide more accurate judgments that are closer to the cultural truth. Since smaller values of the competence parameter indicate higher individual competence (i.e., a smaller variance of judgments around the cultural truth), we expected a negative correlation between the estimated competence parameters and education level. With education level encoded as an ordinal variable, a Spearman rank correlation indeed showed a medium-sized negative correlation ($r_s = -.35$, $p < .001$), thus supporting the validity of the CCT-2D model and the $\log E_i$ parameters.
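Because education is an ordinal variable with many tied values, the rank correlation relies on average ranks for ties. The following standard-library sketch computes Spearman's correlation as the Pearson correlation of average ranks; the education codes and $\log E_i$ values are hypothetical, and this is not necessarily the routine used in the authors' R scripts (`scipy.stats.spearmanr` handles ties the same way).

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values (e.g., the same education level) get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: education coded 1 (lowest) to 4 (college degree)
education = [1, 2, 2, 3, 4, 4, 3, 1]
log_e = [0.8, 0.3, 0.5, -0.1, -0.6, -0.2, 0.1, 0.4]
print(round(spearman(education, log_e), 2))
```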
Figure 7
Estimated versus actual locations of five cities.
5 Discussion
We proposed a novel model of Cultural Consensus Theory for two-dimensional location judgments (CCT-2D). The model is based on the hierarchical Bayesian CCT model by Anders et al. (2014) for one-dimensional data. The CCT-2D model estimates the latent cultural truths of the presented items, that is, the group's consensus knowledge concerning the (unknown) positions of the items. To do so, the model infers the informants' competence based on the distance of their responses to the shared consensus, as well as the difficulty of the items. To account for the spatial structure of the two-dimensional data, the model allows judgment errors to be correlated between the two dimensions, with a separate correlation parameter $\rho_k$ for each item.
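For reference, the likelihood and priors implemented in the JAGS code of Appendix A can be written out as follows. This is our transcription of that code: $\mathbf{y}_{ik}$ is the judgment of informant $i$ for item $k$, $\tau_T$ denotes the precision `Ttau`, and $\boldsymbol{\Sigma}_{\lambda}$ is the covariance matrix of the log item difficulties with standard deviations `lamsigmax`, `lamsigmay` and correlation `lamrho`.

```latex
\begin{aligned}
\mathbf{y}_{ik} &\sim \mathcal{N}_2\!\left(\mathbf{T}_k,\ \boldsymbol{\Sigma}_{ik}\right),
\qquad
\boldsymbol{\Sigma}_{ik} = E_i^2
\begin{pmatrix}
\lambda_{k1}^2 & \rho_k \lambda_{k1}\lambda_{k2} \\
\rho_k \lambda_{k1}\lambda_{k2} & \lambda_{k2}^2
\end{pmatrix},\\
\log E_i &\sim \mathcal{N}\!\left(0,\ \sigma_{\log E}^2\right),
\qquad
(\log \lambda_{k1}, \log \lambda_{k2})^\top \sim \mathcal{N}_2\!\left(\mathbf{0},\ \boldsymbol{\Sigma}_{\lambda}\right),\\
T_{kd} &\sim \mathcal{N}\!\left(\mu_T,\ \tau_T^{-1}\right),
\qquad
\rho_k \sim \mathcal{U}(-1, 1).
\end{aligned}
```

Because the entire covariance matrix scales with $E_i^2$, informants with smaller $E_i$ produce judgments that lie closer to $\mathbf{T}_k$ on both dimensions, while $\lambda_{kd}$ and $\rho_k$ shape the size and orientation of the error ellipse for each item.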
We successfully applied the new model to both simulated and empirical data. Using simulations, we showed that the CCT-2D model has good parameter recovery across a wide range of numbers of informants and items, although competence and item-difficulty parameters cannot be estimated reliably when few informants and items are combined with little variance in these parameters. Moreover, the simulations showed that the CCT-2D group-level estimates of the latent truths of the locations were more accurate, in terms of the Euclidean distance to the true locations, than the estimates obtained with unweighted averaging of individual judgments. This is because the CCT-2D model uses additional information obtained by inferring differences in the items' difficulty and the informants' competence. Furthermore, a reanalysis of an empirical study in which informants located 57 European cities on seven maps showed a large increase in accuracy for CCT-2D compared to unweighted averaging. These findings conceptually replicate the results of Merkle et al. (2020), who found that a CCT-inspired mechanism of weighting informants' judgments by their expertise outperformed unweighted averaging for one-dimensional forecasting judgments (i.e., for point spread forecasts of the Australian Football League).
5.1 Limitations and future research
While our results provide preliminary evidence for the usefulness of the proposed CCT-2D model, the model has several limitations that should be addressed in future research. First, response biases may lead to a general shift of location judgments away from the borders into the interior regions of the presented maps. A similar effect may also occur due to certain geographic features such as coastlines or national borders (Friedman, Brown, et al., 2002; Friedman et al., 2005). Note that a simple, additive shift of all location judgments in a certain direction by a certain distance, similar to the additive bias in the one-dimensional CCT model by Anders et al. (2014), cannot describe such a complex, nonlinear bias towards inner regions. However, it may generally be difficult, both empirically and conceptually, to disentangle complex, item-independent response biases from distortions of the latent consensus knowledge about the locations of specific items.
Second, the proposed CCT-2D model assumes bivariate normal distributions of the observed location judgments and of the latent truths concerning the positions of the presented items. However, locations on maps are naturally constrained by the borders of the map and by geographic features such as coasts or national borders (Friedman et al., 2005). It is thus likely that our assumption that location judgments and latent truths follow bivariate normal distributions with unbounded support is violated. As a remedy, the CCT-2D model of location judgments may be improved by truncating the support of these distributions in the two-dimensional space according to the geographic features of the map. For instance, when estimating the location of Dublin, one may exclude observed judgments that position the city in the Atlantic Ocean, while also implementing a corresponding truncation of the support of the bivariate normal distribution of observed judgments (Gelfand et al., 1992). In our application to empirical data, we simply excluded participants who positioned more than 10% of their judgments outside the highlighted countries of interest, so that the data more adequately fulfill this assumption.
In principle, it is also possible to truncate the support of the bivariate distribution of latent truths to landmasses only. This would ensure that all posterior samples of the inferred locations in MCMC sampling are located on land rather than in the sea. However, implementing complex, nonlinear, two-dimensional truncations in JAGS or other software is not straightforward. Even when considering only a set of simple, linear order constraints, tailored MCMC algorithms are usually required to ensure that all posterior samples satisfy the constraints (Heck & Davis-Stober, 2019). Moreover, these methods often assume that the truncated parameter space is convex, which is not the case for landmasses on geographic maps. Thus, we leave the implementation of truncated distributions in the CCT-2D model to future research.
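To illustrate what such a truncation involves, the following sketch draws from a bivariate normal distribution restricted to a non-convex region by simple rejection sampling. The L-shaped "landmass" is a hypothetical toy region rather than a real map, and rejection sampling is a swapped-in illustration, not the Gibbs-based approach of Gelfand et al. (1992) or an implementation for JAGS. The acceptance rate estimates the probability mass inside the region, which is the normalizing constant a truncated likelihood would need; the distribution's mean deliberately lies in the notch of the L, off "land".

```python
import math
import random


def on_land(x, y):
    """Hypothetical, non-convex 'landmass': an L-shaped union of two rectangles."""
    return (0 <= x <= 4 and 0 <= y <= 1) or (0 <= x <= 1 and 0 <= y <= 4)


def draw_bvn(mean, sd, rho, rng):
    """One draw from a bivariate normal with SDs sd and correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = mean[0] + sd[0] * z1
    y = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def truncated_bvn(mean, sd, rho, region, n, rng, max_tries=1_000_000):
    """Rejection sampling: keep only draws inside `region`. Also returns the
    acceptance rate, a Monte Carlo estimate of the mass inside the region."""
    samples, tries = [], 0
    while len(samples) < n and tries < max_tries:
        tries += 1
        p = draw_bvn(mean, sd, rho, rng)
        if region(*p):
            samples.append(p)
    return samples, len(samples) / tries


rng = random.Random(7)
print(on_land(1.5, 1.5))  # False: the mean itself lies off "land"
samples, mass = truncated_bvn((1.5, 1.5), (1.0, 1.0), 0.3, on_land, 500, rng)
print(len(samples), round(mass, 2))
```

Rejection sampling becomes inefficient when little mass falls inside the region, and it does not by itself yield a posterior sampler for the truncated model, which is why tailored MCMC methods would be needed.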
Besides aggregating location judgments on geographic maps, our extension of CCT to two-dimensional continuous data can also be applied to other types of judgments, such as continuous ratings of both the emotional arousal and the valence of pictures on two visual analogue scales (Funke & Reips, 2012; Reips & Funke, 2008). When using such response scales, it is reasonable to include response-bias shifts and scaling biases as in Anders et al. (2014) to account for different response styles. The CCT-2D model can also be extended to $d$-dimensional multivariate responses on an arbitrary number of judgment dimensions. Such an approach could be useful, for instance, when faces are rated on several dimensions such as trustworthiness, attractiveness, and symmetry using continuous scales (Oosterhof & Todorov, 2008).
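One technical point in such an extension concerns the error correlations. A minimal sketch, assuming the two-dimensional covariance structure of Appendix A generalizes to $\Sigma_{ik} = E_i^2 \Lambda_k R_k \Lambda_k$ with $\Lambda_k = \mathrm{diag}(\lambda_{k1}, \dots, \lambda_{kd})$ and an item-specific correlation matrix $R_k$: for $d = 2$, any $\rho_k \in (-1, 1)$ yields a valid covariance matrix, which is why a uniform prior suffices in Appendix A, but for $d \geq 3$ pairwise correlations that are each valid can be jointly invalid. The Cholesky check below detects this; a prior over whole correlation matrices (e.g., an LKJ prior as offered by Stan) would be one option, not something proposed in this paper.

```python
import math


def covariance_matrix(e_i, lam_k, corr_k):
    """Sigma_ik[a][b] = E_i^2 * lam_k[a] * lam_k[b] * corr_k[a][b] for d dimensions."""
    d = len(lam_k)
    return [[e_i ** 2 * lam_k[a] * lam_k[b] * corr_k[a][b] for b in range(d)]
            for a in range(d)]


def cholesky(m):
    """Lower-triangular L with L L^T = m; raises if m is not positive definite."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                v = m[i][i] - s
                if v <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(v)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


# Hypothetical three-dimensional item (e.g., trustworthiness, attractiveness, symmetry)
ok_corr = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.1], [0.2, 0.1, 1.0]]
sigma = covariance_matrix(2.0, [1.0, 0.5, 2.0], ok_corr)
L = cholesky(sigma)

# Pairwise correlations that are each in (-1, 1) but jointly invalid
bad_corr = [[1.0, 0.9, 0.9], [0.9, 1.0, -0.9], [0.9, -0.9, 1.0]]
```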
5.2 Conclusions
The proposed CCT-2D model extends the scope of applications of cultural consensus theory to two-dimensional continuous data. Researchers can now analyze and aggregate geographical location judgments consisting of x- and y-coordinates or longitude and latitude to infer the group's cultural knowledge about unknown locations. In doing so, the model weights the observed judgments by both the informants' competence and the items' difficulty. Concerning the study design, it is necessary to recruit multiple informants who provide judgments for multiple items from the same knowledge domain. We showed that the CCT-2D model provides good parameter recovery and, in cases where the factual truth is known, yields aggregate group-level estimates that are more accurate than those obtained by unweighted averaging of location judgments.
6 References
Al-shabeeb, A. R. (2016). The use of AHP within GIS in selecting potential sites for water harvesting sites in the Azraq Basin—Jordan. Journal of Geographic Information System, 8(1), 73–88. https://doi.org/10.4236/jgis.2016.81008
Anders, R., & Batchelder, W. H. (2012). Cultural consensus theory for multiple consensus truths. Journal of Mathematical Psychology, 56, 452–469. https://doi.org/10.1016/j.jmp.2013.01.004
Anders, R., & Batchelder, W. H. (2015). Cultural consensus theory for the ordinal data case. Psychometrika, 80, 151–181. https://doi.org/10.1007/s11336-013-9382-9
Anders, R., Oravecz, Z., & Batchelder, W. H. (2014). Cultural consensus theory for continuous responses: A latent appraisal model for information pooling. Journal of Mathematical Psychology, 61, 1–13. https://doi.org/10.1016/j.jmp.2014.06.001
Aßfalg, A. (2018). Consensus theory for mixed response formats. Journal of Mathematical Psychology, 86, 51–63. https://doi.org/10.1016/j.jmp.2018.08.005
Aßfalg, A., & Klauer, K. C. (2020). Consensus theory for multiple latent traits and consensus groups. Journal of Mathematical Psychology, 97, 102374. https://doi.org/10.1016/j.jmp.2020.102374
Batchelder, W. H. (2009). Cultural consensus theory: Aggregating expert judgments about ties in a social network. Social Computing and Behavioral Modeling, 1–9. https://doi.org/10.1007/978-1-4419-0056-2_5
Batchelder, W. H., & Anders, R. (2012). Cultural consensus theory: Comparing different concepts of cultural truth. Journal of Mathematical Psychology, 56, 316–332. https://doi.org/10.1016/j.jmp.2012.06.002
Batchelder, W. H., Kumbasar, E., & Boyd, J. P. (1997). Consensus analysis of three-way social network data. The Journal of Mathematical Sociology, 22, 29–58. https://doi.org/10.1080/0022250X.1997.9990193
Batchelder, W. H., & Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53, 71–92. https://doi.org/10.1007/BF02294195
Casana, J. (2014). Regional-scale archaeological remote sensing in the age of big data: Automated site discovery vs. brute force methods. Advances in Archaeological Practice, 2(3), 222–233. https://doi.org/10.7183/2326-3768.2.3.222
Davis-Stober, C. P., Budescu, D. V., Dana, J., & Broomell, S. B. (2014). When is a crowd wise? Decision, 1, 79–101. https://doi.org/10.1037/dec0000004
Denwood, M. J. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1–25. https://doi.org/10.18637/jss.v071.i09
Embretson, S. E., & Reise, S. P. (2000). Item response theory. Psychology Press. https://doi.org/10.4324/9781410605269
Faghri, A., Lang, A., Hamad, K., & Henck, H. (2002). Integrated knowledge-based geographic information system for determining optimal location of park-and-ride facilities. Journal of Urban Planning and Development, 128, 18–41. https://doi.org/10.1061/(ASCE)0733-9488(2002)128:1(18)
Friedman, A., Brown, N. R., & McGaffey, A. P. (2002). A basis for bias in geographical judgments. Psychonomic Bulletin & Review, 9, 151–159. https://doi.org/10.3758/BF03196272
Friedman, A., Kerkman, D. D., & Brown, N. R. (2002). Spatial location judgments: A cross-national comparison of estimation bias in subjective North American geography. Psychonomic Bulletin & Review, 9, 615–623. https://doi.org/10.3758/BF03196321
Friedman, A., Kerkman, D. D., Brown, N. R., Stea, D., & Cappello, H. M. (2005). Cross-cultural similarities and differences in North Americans' geographic location judgments. Psychonomic Bulletin & Review, 12, 1054–1060. https://doi.org/10.3758/BF03206443
Friedman, A., Mohr, C., & Brugger, P. (2012). Representational pseudoneglect and reference points both influence geographic location estimates. Psychonomic Bulletin & Review, 19, 277–284. https://doi.org/10.3758/s13423-011-0202-x
Funke, F., & Reips, U.-D. (2012). Why semantic differentials in web-based research should be made from visual analogue scales and not from 5-point scales. Field Methods, 24, 310–327. https://doi.org/10.1177/1525822X12444061
Gelfand, A. E., Smith, A. F. M., & Lee, T.-M. (1992). Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. Journal of the American Statistical Association, 87, 523–532. https://doi.org/10.2307/2290286
Heck, D. W. (2019). Accounting for estimation uncertainty and shrinkage in Bayesian within-subject intervals: A comment on Nathoo, Kilshaw, and Masson (2018). Journal of Mathematical Psychology, 88, 27–31. https://doi.org/10.1016/j.jmp.2018.11.002
Heck, D. W., Boehm, U., Böing-Messing, F., Bürkner, P.-C., Derks, K., Dienes, Z., Fu, Q., Gu, X., Karimova, D., Kiers, H., Klugkist, I., Kuiper, R. M., Lee, M. D., Leenders, R., Leplaa, H. J., Linde, M., Ly, A., Meijerink-Bosman, M., Moerbeek, M., ... Hoijtink, H. (2022). A review of applications of the Bayes factor in psychological research. Psychological Methods. In press. https://doi.org/10.1037/met0000454
Heck, D. W., & Davis-Stober, C. P. (2019). Multinomial models with linear inequality constraints: Overview and improvements of computational methods for Bayesian inference. Journal of Mathematical Psychology, 91, 70–87. https://doi.org/10.1016/j.jmp.2019.03.004
Hueffer, K., Fonseca, M. A., Leiserowitz, A., & Taylor, K. M. (2013). The wisdom of crowds: Predicting a weather and climate-related event. Judgment and Decision Making, 8, 91–105.
Larrick, R. P., & Soll, J. B. (2006). Intuitions about combining opinions: Misappreciation of the averaging principle. Management Science, 52, 111–127. https://doi.org/10.1287/mnsc.1050.0459
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Lord, F. M., Novick, M. R., & Birnbaum, A. (1968). Statistical theories of mental test scores. Addison-Wesley.
Mahdavi, A., Niknejad, M., & Karami, O. (2015). A fuzzy multi-criteria decision method for ecotourism development locating. Caspian Journal of Environmental Sciences, 13(3), 221–236. https://cjes.guilan.ac.ir/article_1373.html
Mayer, M., & Heck, D. W. (2021). Sequential collaboration: Comparing the accuracy of dependent, incremental judgments to wisdom of crowds. https://doi.org/10.31234/osf.io/w4xdk
Merkle, E. C., Saw, G., & Davis-Stober, C. (2020). Beating the average forecast: Regularization based on forecaster attributes. Journal of Mathematical Psychology, 98, 102419. https://doi.org/10.1016/j.jmp.2020.102419
Oosterhof, N. N., & Todorov, A. (2008). The functional basis of face evaluation. Proceedings of the National Academy of Sciences, 105, 11087–11092. https://doi.org/10.1073/pnas.0805664105
Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling.
Plummer, M. (2021). rjags: Bayesian graphical models using MCMC. https://CRAN.R-project.org/package=rjags
Reips, U.-D., & Funke, F. (2008). Interval-level measurement with visual analogue scales in internet-based research: VAS generator. Behavior Research Methods, 40, 699–704. https://doi.org/10.3758/BRM.40.3.699
Romney, A. K., Weller, S. C., & Batchelder, W. H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 313–338. https://www.jstor.org/stable/677564
Singmann, H., & Kellen, D. (2019). An introduction to mixed models for experimental psychology. In D. H. Spieler & E. Schumacher (Eds.), New Methods in Cognitive Psychology. Psychology Press.
Steyvers, M., Miller, B., Hemmer, P., & Lee, M. (2009). The wisdom of crowds in the recollection of order information. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, & A. Culotta (Eds.), Advances in neural information processing systems (Vol. 22). Curran Associates, Inc.
Surowiecki, J. (2005). The wisdom of crowds. Anchor Books.
Thorndyke, P. W., & Hayes-Roth, B. (1982). Differences in spatial knowledge acquired from maps and navigation. Cognitive Psychology, 14, 560–589. https://doi.org/10.1016/0010-0285(82)90019-6
Trotter, R., Weller, S., Baer, R., Pachter, L., Glazer, M., Alba-García, J., & Klein, R. (1999). Consensus theory model of AIDS/SIDA beliefs in four Latino populations. AIDS Education and Prevention, 11, 414–426.
Waubert de Puiseau, B., Aßfalg, A., Erdfelder, E., & Bernstein, D. M. (2012). Extracting the truth from conflicting eyewitness reports: A formal modeling approach. Journal of Experimental Psychology: Applied, 18, 390–403. https://doi.org/10.1037/a0029801
Waubert de Puiseau, B., Greving, S., Aßfalg, A., & Musch, J. (2017). On the importance of considering heterogeneity in witnesses' competence levels when reconstructing crimes from multiple witness testimonies. Psychological Research, 81, 947–960. https://doi.org/10.1007/s00426-016-0802-1
Weller, S. C. (2007). Cultural consensus theory: Applications and frequently asked questions. Field Methods, 19, 339–368. https://doi.org/10.1177/1525822X07303502
Yarbrough, G. L., Wu, B., Wu, J., He, Z. J., & Leng, T. (2002). Judgments of object location behind an obstacle depend on the particular information selected. Journal of Vision, 2, 625. https://doi.org/10.1167/2.7.625
Appendix A
JAGS code for the CCT-2D model of two-dimensional location judgments
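model{
  # Likelihood: Y[i,k,1:2] is the (x, y) location judgment of informant i
  # (i = 1..n) for item k (k = 1..m)
  for(i in 1:n){
    for(k in 1:m){
      # Judgment SD on each dimension = informant error scale E[i]
      # times item difficulty lam[k,d]
      sigma[i,k,1] <- E[i]*lam[k,1]
      sigma[i,k,2] <- E[i]*lam[k,2]
      # 2x2 covariance matrix with item-specific error correlation rho[k]
      Sigma[i,k,1,1] <- pow(sigma[i,k,1], 2)
      Sigma[i,k,2,2] <- pow(sigma[i,k,2], 2)
      Sigma[i,k,1,2] <- rho[k] * sigma[i,k,1] * sigma[i,k,2]
      Sigma[i,k,2,1] <- rho[k] * sigma[i,k,1] * sigma[i,k,2]
      # dmnorm is parameterized by the precision (inverse covariance) matrix
      Tau[i,k,1:2,1:2] <- inverse(Sigma[i,k,1:2,1:2])
      # Judgments scatter around the item's cultural truth T[k,1:2]
      Y[i,k,1:2] ~ dmnorm(T[k,1:2], Tau[i,k,1:2,1:2])
    }
  }
  # Parameters
  # Informant competence on the log scale (larger E[i] = larger errors)
  for (i in 1:n){
    Elog[i] ~ dnorm(Emu,Etau)
    E[i] <- exp(Elog[i])
  }
  # Covariance matrix of the log item difficulties (x- and y-dimension)
  lamSigma[1,1] <- pow(lamsigmax, 2)
  lamSigma[2,2] <- pow(lamsigmay, 2)
  lamSigma[1,2] <- lamrho * lamsigmax * lamsigmay
  lamSigma[2,1] <- lamSigma[1,2]
  for (k in 1:m){
    # Cultural truth (latent location) of item k on both dimensions
    T[k,1] ~ dnorm(Tmu,Ttau)
    T[k,2] ~ dnorm(Tmu,Ttau)
    # Log item difficulty; dmnorm.vcov takes a covariance matrix
    lamlog[k,1:2] ~ dmnorm.vcov(lammu[1:2], lamSigma[1:2,1:2])
    lam[k,1] <- exp(lamlog[k,1])
    lam[k,2] <- exp(lamlog[k,2])
  }
  # Hyperparameters
  # Group mean and (half-Cauchy) precision of the cultural truths,
  # shared by both dimensions
  Tmu ~ dnorm(0,0.25)
  Ttau ~ dt(0,1,1)T(0,)
  # Mean log item difficulty fixed at zero (fixes the scale of E[i]*lam[k,d])
  lammu[1] <- 0
  lammu[2] <- 0
  # Half-Cauchy priors on the SDs of log item difficulty, uniform prior on their correlation
  lamsigmax ~ dt(0,3,1)T(0,)
  lamsigmay ~ dt(0,3,1)T(0,)
  lamrho ~ dunif(-1, 1)
  # Mean log competence fixed at zero; half-Cauchy prior on its SD
  Emu <- 0
  Etau <- pow(Esigma, -2)
  Esigma ~ dt(0,1,1)T(0,)
  # Item-specific correlation of judgment errors between the two dimensions
  for(k in 1:m){
    rho[k] ~ dunif(-1, 1)
  }
}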
Appendix B
European cities used in the reanalysis
Table B1
European cities and maps from the study by Mayer and Heck (2021).
Map 1. Austria and Switzerland (8 cities): Zurich, Geneva, Basel, Bern, Vienna, Graz, Linz, Salzburg
Map 2. France (5 cities): Paris, Marseille, Lyon, Toulouse, Nice
Map 3. Italy (5 cities): Rome, Milan, Naples, Florence, Venice
Map 4. Spain and Portugal (5 cities): Madrid, Barcelona, Seville, Lisbon, Porto
Map 5. United Kingdom and Ireland (5 cities): London, Birmingham, Glasgow, Liverpool, Dublin
Map 6. Poland, Czech Republic, Hungary, and Slovakia (4 cities): Warsaw, Prague, Bratislava, Budapest
Map 7. Germany (25 cities): Berlin, Hamburg, Cologne, Frankfurt, Stuttgart, Düsseldorf, Leipzig, Dortmund, Essen, Bremen, Dresden, Hannover, Nuremberg, Duisburg, Wuppertal, Bielefeld, Bonn, Münster, Karlsruhe, Mannheim, Augsburg, Wiesbaden, Braunschweig, Kiel, Munich