Dealing with measurement error in list experiments: Choosing the right control list design

Mattias Agerberg and Marcus Tannenberg
University of Gothenburg, Sweden

Research and Politics, April-June 2021: 1–8. © The Author(s) 2021.
DOI: 10.1177/20531680211013154
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Abstract
List experiments are widely used in the social sciences to elicit truthful responses to sensitive questions. Yet, the research design commonly suffers from the problem of measurement error in the form of non-strategic respondent error, where some inattentive participants might provide random responses. This type of error can result in severely biased estimates. A recently proposed solution is the use of a necessarily false placebo item to equalize the length of the treatment and control lists in order to alleviate concerns about respondent error. In this paper we show theoretically that placebo items do not in general eliminate bias caused by non-strategic respondent error. We introduce a new option, the mixed control list, and show how researchers can choose between different control list designs to minimize the problems caused by inattentive respondents. We provide researchers with practical guidance to think carefully about the bias that inattentive respondents might cause in a given application of the list experiment. We also report results from a large novel list experiment fielded to over 4900 respondents, specifically designed to illustrate our theoretical argument and recommendations.

Keywords
List experiment, non-strategic respondent error, placebo item, sensitive questions, control list design

Corresponding author: Mattias Agerberg, University of Gothenburg, Sprängkullsgatan 19, Göteborg, 41123, Sweden. Email: mattias.agerberg@gu.se
Introduction
While list experiments remain popular in the social sci-
ences, several studies have highlighted the problem of
measurement error in the form of non-strategic respondent
error, where some inattentive subjects might provide ran-
dom responses to the list (Ahlquist, 2018; Alvarez et al.,
2019; Blair et al., 2019; Kramon and Weghorst, 2019).
Given that the length of the treatment and control list dif-
fers, this type of error can result in heavily biased estimates.
To address this issue, a recent study recommends the use of
a necessarily false placebo item1 in order to equalize the
length of the treatment and control lists and, in turn, allevi-
ate concerns about respondent error (Riambau and Ostwald,
2020) (see also Ahlquist et al. (2014) and De Jonge and
Nickerson (2014)).
In this paper we show that placebo items are not a
straightforward universal solution to the problem of non-
strategic respondent error in list experiments and that
their inclusion can even increase bias in many cases. In
general, modifying the length of the control list will not
eliminate bias caused by inattentive respondents.
However, we show that researchers, by carefully consid-
ering the expected bias associated with a specific control
list design, can choose a design that minimizes the prob-
lems caused by non-strategic respondent error. In addi-
tion to the conventional and placebo control list, we
introduce a novel third alternative, the mixed control list,
that combines the two former designs. We calculate the
expected bias associated with each control list design and
provide concrete recommendations for researchers for
choosing between them in different situations. Using data
from a recent meta analysis (Blair et al., 2020), we show
that a clear majority of existing studies using list experi-
ments might have benefited from a non-conventional
control list design. We also report results from a
large list experiment fielded in China to over 4900
respondents, designed to test our theoretical predictions.
The study includes a novel approach to designing placebo
items that we recommend researchers to use whenever
the inclusion of such an item is warranted. We conclude
by discussing the general implications of our results and
summarize our overall recommendations for researchers
using the list experiment.
Setup
The list experiment works by aggregating a sensitive
item of interest with a list of control items to protect
respondents’ integrity (Glynn, 2013). We adopt the nota-
tion in Blair and Imai (2012) and consider a list experi-
ment with $J$ binary control items and one binary sensitive item denoted $Z_{i,J+1}$. Respondents are randomly assigned to either a control group ($T_i = 0$) and given the list with $J$ control items, or to a treatment group ($T_i = 1$), given the list with $J$ control items and the sensitive item $Z_{i,J+1}$. The total number of items in the treatment group is thus $J + 1$. $Y_i$ denotes the observed response to the list experiment for respondent $i$ (the total number of items answered affirmatively). If we denote the total number of affirmative answers to the $J$ control items with $Y_i^*$, the process generating the observed response can be written as:

$$Y_i = Y_i^* + T_i Z_{i,J+1} \quad (1)$$
Blair and Imai (2012) show that if we assume that the
responses to the sensitive item are truthful (“no liars”) and
that the addition of the sensitive item does not alter
responses to the control items (“no design effects”), the
proportion of affirmative answers to the sensitive item,
denoted $\tau$, can be estimated by taking the difference between the average response among the treatment group and the average response among the control group, that is, a difference-in-means (DiM) estimator. The DiM estimator can be written as:

$$\hat{\tau} = \frac{1}{N_1} \sum_{i=1}^{N} T_i Y_i - \frac{1}{N_0} \sum_{i=1}^{N} (1 - T_i) Y_i \quad (2)$$

where $\hat{\tau}$ is the estimated proportion of affirmative answers to the sensitive item, $N_1 = \sum_{i=1}^{N} T_i$ is the size of the treatment group and $N_0 = N - N_1$ is the size of the control group. When assignment to the treatment group is random and we invoke the two assumptions above, the DiM estimator is unbiased: $E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0] - \tau = 0$ (Blair et al., 2019).
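To make the estimator concrete, here is a minimal simulation sketch (our own illustration, not code from the article) that generates data satisfying the "no liars" and "no design effects" assumptions and computes the DiM estimate in equation (2). The sample size, number of control items, and prevalence are arbitrary illustrative values.

```python
# A minimal sketch of the difference-in-means (DiM) estimator in equation (2),
# applied to simulated list-experiment data (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(0)

N, J = 5000, 4          # respondents; number of control items (assumed values)
tau = 0.30              # true prevalence of the sensitive item (assumed)

T = rng.integers(0, 2, size=N)            # random assignment to treatment (T_i = 1)
Y_star = rng.binomial(J, 0.5, size=N)     # affirmative answers to the J control items
Z = rng.binomial(1, tau, size=N)          # truthful answer to the sensitive item
Y = Y_star + T * Z                        # observed item count, equation (1)

# Equation (2): difference in mean counts between treatment and control group
tau_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(round(tau_hat, 3))                  # close to 0.30 under "no liars"/"no design effects"
```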
Non-strategic respondent error and bias
Modeling inattentive respondents
Strategic respondent error in list experiments, where the
respondent for instance might avoid selecting the maxi-
mum or minimum number of items, can generally be mini-
mized by choosing control items (non-sensitive items) in a
well-thought-out manner (Glynn, 2013). Non-strategic
respondent error, on the other hand, arises when respond-
ents provide a random response to the list experiment. This
type of respondent error is likely to be prevalent when
respondents do not pay enough attention to the survey
(Berinsky et al., 2014) or when respondents resort to satis-
ficing (Krosnick, 1991). Given that the list lengths differ
between the treatment and control group, this type of error
will often be correlated with treatment assignment and can
hence dramatically increase both bias and variance in the
estimate of interest (Ahlquist, 2018; Blair et al., 2019).
We refer to respondents providing random responses as
inattentive2 and let $W_i = 1$ if a respondent is inattentive ($W_i = 0$ otherwise). Blair et al. (2019) model inattentives as providing responses to the list experiment according to a discrete uniform distribution $U\{0, J\}$, that is, by randomly picking a number between 0 and $J$ ($J + 1$ in the treatment group). We refer to this as the uniform error model. The process generating the observed response under this model can be written as:

$$Y_i = (1 - W_i)\left(T_i Z_{i,J+1} + Y_i^*\right) + W_i \left( T_i\, U\{0, J+1\} + (1 - T_i)\, U\{0, J\} \right) \quad (3)$$
The consequences of this error process in terms of bias
in the estimate of $\tau$ (see below) are identical to several
other plausible error processes. We discuss this further in
Appendix A.
Let $s$ denote the share of inattentive respondents. As shown by Blair et al. (2019) [appendix A], under the uniform error model, the bias of the difference-in-means estimator ($E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0] - \tau$) amounts to:

$$\left\{ E\left[(1 - s)\left(Y_i^* + Z_{i,J+1}\right) + s\,\frac{J + 1}{2}\right] - E\left[(1 - s)Y_i^* + s\,\frac{J}{2}\right] \right\} - \tau = (1 - s)\tau + s\,\frac{J + 1}{2} - s\,\frac{J}{2} - \tau = s\left(\frac{1}{2} - \tau\right) \quad (4)$$
This follows from the fact that the expected value of a (discrete) uniform distribution is $\frac{a + b}{2}$, where the interval $[a, b]$ denotes the support of the distribution. The expected difference between the treatment and control group for inattentive respondents is thus:

$$E[\hat{\tau} \mid W_i = 1] = E[Y_i \mid T_i = 1, W_i = 1] - E[Y_i \mid T_i = 0, W_i = 1] = \frac{J + 1}{2} - \frac{J}{2} = \frac{1}{2} \quad (5)$$

The amount of bias will hence be larger when $\tau$ is further away from $\frac{1}{2}$ and the estimate will be "pulled" toward $\frac{1}{2}$ as $s$ increases. What is sometimes referred to as "artificial inflation" (De Jonge and Nickerson, 2014; Riambau and Ostwald, 2020) is simply what we would expect under this error process, as long as $\tau < \frac{1}{2}$.
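The following short simulation (again our own sketch, with illustrative parameter values) adds inattentive respondents according to the uniform error model in equation (3) and checks that the resulting bias of the DiM estimator is close to the $s\left(\frac{1}{2} - \tau\right)$ expression in equation (4).

```python
# A minimal simulation sketch of the uniform error model: a share s of inattentive
# respondents answer with a random count, biasing the DiM estimate by about s(1/2 - tau).
import numpy as np

rng = np.random.default_rng(1)
N, J, tau, s = 200_000, 4, 0.15, 0.20     # assumed illustrative values

T = rng.integers(0, 2, size=N)
W = rng.binomial(1, s, size=N)            # W_i = 1 for inattentive respondents
Y_star = rng.binomial(J, 0.5, size=N)
Z = rng.binomial(1, tau, size=N)

attentive = Y_star + T * Z                            # equation (1)
random_count = rng.integers(0, J + 1 + T)             # U{0, J} in control, U{0, J+1} in treatment
Y = np.where(W == 1, random_count, attentive)         # equation (3)

tau_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(round(tau_hat - tau, 3), round(s * (0.5 - tau), 3))  # empirical vs. predicted bias
```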
Introducing a placebo item
How is the amount of bias affected by the introduction of a
placebo item? When adding a placebo item (with expected
value equal to 0) to the control list, the bias under this error
model3 instead becomes:
$$\left\{ E\left[(1 - s)\left(Y_i^* + Z_{i,J+1}\right) + s\,\frac{J + 1}{2}\right] - E\left[(1 - s)Y_i^* + s\,\frac{J + 1}{2}\right] \right\} - \tau = (1 - s)\tau + s\,\frac{J + 1}{2} - s\,\frac{J + 1}{2} - \tau = -s\tau \quad (6)$$

Hence, without a placebo item the bias will be 0 only when $\tau = \frac{1}{2}$, and with a placebo item the bias will be 0 only when $\tau = 0$ (assuming $s$ is fixed and positive). The addition of the placebo item will thus in general cause negative bias ($-s\tau$) under the uniform error model. This is because the expected estimated prevalence of the sensitive item in the inattentive group is 0: $E\left[\frac{J+1}{2} - \frac{J+1}{2}\right] = 0$.

When does the addition of a placebo item decrease the absolute amount of bias? This happens when $s\left|\frac{1}{2} - \tau\right| > s\tau$, that is, when $\tau < \frac{1}{4}$. The addition of a placebo item will thus only decrease the absolute amount of bias when the true prevalence of $Z_{i,J+1}$ is below $\frac{1}{4}$.
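For a concrete sense of this threshold, take illustrative numbers of our own (not values from the article): $s = 0.10$ and $\tau = 0.35$. The conventional list then gives a bias of $s\left(\frac{1}{2} - \tau\right) = 0.10 \times 0.15 = 0.015$, while the placebo list gives $-s\tau = -0.10 \times 0.35 = -0.035$, so adding the placebo item more than doubles the absolute bias, in line with the $\tau < \frac{1}{4}$ condition.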
The mixed control list
We also consider a third option for constructing a control list:
allocating respondents in the control group to either the pla-
cebo or conventional control list. We focus on a design where
respondents are allocated to the two groups with $p^* = 0.5$ (conditional on being assigned to the control group). It is also possible to assign respondents to the different control lists with unequal probability ($p^*$ and $1 - p^*$, with $p^* \neq 0.5$), however we focus on the equal probability scenario since this is the easiest to implement.4 Under the mixed design, the bias under the uniform error model becomes:

$$\left\{ E\left[(1 - s)\left(Y_i^* + Z_{i,J+1}\right) + s\,\frac{J + 1}{2}\right] - E\left[(1 - s)Y_i^* + s\left(p^* \frac{J}{2} + (1 - p^*)\frac{J + 1}{2}\right)\right] \right\} - \tau = (1 - s)\tau + s\,\frac{J + 1}{2} - s\left(p^* \frac{J}{2} + (1 - p^*)\frac{J + 1}{2}\right) - \tau = s\left(\frac{p^*}{2} - \tau\right) \quad (7)$$

With $p^* = 0.5$ (the equal probability scenario), the bias is hence 0 when $\tau = \frac{1}{4}$.
Which control list design should you choose?
In summary, adding a placebo item might decrease bias but
should not in general be considered a silver bullet solution
to the problem of non-strategic respondent error; in many
cases it might even increase bias if not implemented care-
fully. It is clear that this type of bias varies considerably
between different designs and that it has the potential to
strongly influence the resulting estimate. Unfortunately, the
bias caused by inattentive respondents cannot be easily
eliminated by modifying the control list. However, by mak-
ing an informed choice regarding the control list design, the
problem can be mitigated.
As the sections above show, the bias under the different
control list designs – conventional, placebo, and mixed –
will depend on $\tau$ (when $s > 0$). Researchers thus need to make an informed guess about the true prevalence rate to choose the best control list design in a given situation. We call this guess the predicted prevalence rate and denote it by $\tau^*$. To set $\tau^*$, researchers should rely on previous research, including other list experiments. If previous evidence is insufficient, an alternative is to run a pilot study with a few hundred respondents to estimate agreement with the direct question. Based on this estimate, $\tau$ can be approximated
by making assumptions about the degree of sensitivity bias.
As a starting point, Blair et al. (2020) find that sensitivity
biases are typically smaller than 10 percentage points. The
authors also provide a discussion of sensitivity bias in dif-
ferent literatures.
Which control list design to choose in a specific study
depends mainly on two considerations:
$\tau^*$ and the specific goal of the study (e.g., what is the estimand?). If the goal of the study is to use a list experiment to get a good descriptive estimate of $\tau$, we advocate choosing the control list design that minimizes the absolute expected bias. This amount is calculated by plugging in a value for $\tau^*$ in equation (4), (6), and (7) and comparing the absolute value of each result.
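As a sketch of this comparison (our own helper, not software provided by the authors), the function below plugs a predicted prevalence $\tau^*$ into the bias expressions in equations (4), (6), and (7) and returns the design with the smallest absolute expected bias; since $s$ only scales all three expressions, its exact value does not affect the ranking.

```python
# Sketch of choosing a control list by minimizing |expected bias| (equations 4, 6, 7).
def expected_bias(design: str, tau_star: float, s: float, p_star: float = 0.5) -> float:
    """Expected bias from inattentive respondents under the uniform error model."""
    if design == "conventional":
        return s * (0.5 - tau_star)          # equation (4)
    if design == "placebo":
        return -s * tau_star                 # equation (6)
    if design == "mixed":
        return s * (p_star / 2 - tau_star)   # equation (7)
    raise ValueError(f"unknown design: {design}")

def best_descriptive_design(tau_star: float, s: float = 0.1) -> str:
    """Design minimizing the absolute expected bias for a descriptive estimate of tau."""
    return min(["conventional", "placebo", "mixed"],
               key=lambda d: abs(expected_bias(d, tau_star, s)))

print(best_descriptive_design(0.10))    # -> 'placebo'
print(best_descriptive_design(1 / 6))   # -> 'mixed' (the prevalence in the empirical illustration below)
print(best_descriptive_design(0.45))    # -> 'conventional'
```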
If the goal of the study is to estimate the amount of sen-
sitivity bias, we advocate an approach that likely will lead
to a conservative estimate. Typically, researchers do not
reflect upon how non-strategic respondent error may influ-
ence their list experiment estimate. In many cases this bias
works in favor of the tested hypothesis: when $\tau$ is below
0.5 and the researcher wants to test if the quantity is under-
reported (when measured with a direct question), a list
experiment with a conventional control list will pull the
estimate toward 0.5 due to non-strategic respondent error.
This may thus inflate the type 1 error rate – e.g., concluding
that underreporting is prevalent when it is in fact negligible.
Instead, we advocate choosing a control list design where
potential bias due to respondent error works against the
tested hypothesis. This approach is described in Table 1.
To illustrate, when $\tau^*$ is assumed to be approximately
0.35 and hypothesized to be underreported, the researcher
should choose the mixed design. This will pull the estimate
from the list experiment toward 0.25 (the expected bias
among inattentives under the mixed design) and hence
yield a conservative estimate of the degree of underreport-
ing when comparing the list estimate to a direct estimate.
Note that when $0.5 < \tau^* < 1$ and the researcher predicts
overreporting, there is no design that yields a conservative
estimate. The conventional design is still the “least bad” in
this case.
To what extent are our design recommendations relevant
for researchers using list experiments? To take the 264 list
experiments included in Blair et al.’s (2020) meta analysis
as an indication: only 36 percent of the studies have an esti-
mated prevalence and hypothesized sensitivity bias for
which the conventional control list is expected to yield the
smallest bias granted a preference for a conservative esti-
mate.5 For 37 percent, the placebo control list would have
been recommended. The remaining 27 percent of the list
experiments would have benefited from our proposed
mixed control list design (see Figure 1). In summary,
assuming a non-negligible share of inattentive respondents,
the vast majority of existing studies using list experiments
have used a sub-optimal control list design, among which
we are more likely to find type 1 errors.
This concerns some sub-fields more than others. For
example, studies of relatively low prevalence behaviors,
with a hypothesized underreporting, such as vote buying
(Gonzalez-Ocantos et al., 2012), clientelism (Corstange,
2018), and corruption (Agerberg, 2020; Tang, 2016)
would often be recommended to choose the placebo or
mixed control list in order to guard against type 1 errors.
In contrast, studies of voter turnout (Holbrook and
Krosnick, 2010) and support in authoritarian regimes
(Frye et al., 2017; Robinson and Tannenberg, 2019),
where the hypothesized sensitivity bias concerns overre-
porting, are more often in the predicted prevalence range
where the conventional control list is the best choice. It
should be noted that several of the latter studies are in a
range ($0.5 < \tau^* < 1$), where it is not possible to obtain
conservative estimates (see bottom right of Figure 1).
Table 1. Control lists yielding (mostly) conservative estimates when testing for sensitivity bias.

Predicted prevalence   | Hypothesized sensitivity bias | Control list design | Estimate
$0 < \tau^* < 0.25$    | Underreporting                | Placebo             | Conservative (pulled toward 0)
$0.25 < \tau^* < 0.5$  | Underreporting                | Mixed               | Conservative (pulled toward 0.25)
$0.5 < \tau^* < 1$     | Underreporting                | Conventional        | Conservative (pulled toward 0.5)
$0 < \tau^* < 0.25$    | Overreporting                 | Mixed               | Conservative (pulled toward 0.25)
$0.25 < \tau^* < 0.5$  | Overreporting                 | Conventional        | Conservative (pulled toward 0.5)
$0.5 < \tau^* < 1$     | Overreporting                 | Conventional        | Non-conservative (pulled toward 0.5)
[Figure 1. Distribution of estimated prevalence by hypothesized sensitivity bias (underreporting vs. overreporting) from the list experiments included in Blair et al.'s (2020) meta analysis; x-axis: estimated prevalence (0.00–1.00); legend: recommended control list (conventional, placebo, or mixed). Recommendations are based on a preference for a conservative estimate (see Table 1).]
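The decision rule in Table 1 can also be written down directly; the small function below is our own sketch of that mapping (not code from the article), returning the control list design expected to yield a conservative estimate given $\tau^*$ and the hypothesized direction of sensitivity bias.

```python
# Sketch of the Table 1 recommendations for a conservative test of sensitivity bias.
def conservative_design(tau_star: float, hypothesized_bias: str) -> str:
    """hypothesized_bias: 'underreporting' or 'overreporting'."""
    if not 0.0 < tau_star < 1.0:
        raise ValueError("tau_star must lie in (0, 1)")
    if hypothesized_bias == "underreporting":
        if tau_star < 0.25:
            return "placebo"        # estimate pulled toward 0
        if tau_star < 0.5:
            return "mixed"          # estimate pulled toward 0.25
        return "conventional"       # estimate pulled toward 0.5
    if hypothesized_bias == "overreporting":
        if tau_star < 0.25:
            return "mixed"          # estimate pulled toward 0.25
        return "conventional"       # pulled toward 0.5; non-conservative when tau_star > 0.5
    raise ValueError(f"unknown hypothesis: {hypothesized_bias}")

print(conservative_design(0.35, "underreporting"))   # -> 'mixed', as in the example above
```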
An empirical illustration
To empirically demonstrate the consequences of the differ-
ent control list designs we designed a list experiment that
was fielded online in China to 4973 respondents, in col-
laboration with the survey research company Lucid. In the
experiment, one-third of respondents were assigned to a
conventional control list with four items, another third to a
placebo list with the same four control items plus a placebo
item, and the remaining third were assigned to a treatment
list with the four control items plus an additional “item of
interest” (corresponding to the “sensitive item” in regular
designs). By sampling from the conventional and the pla-
cebo control list with $p^* = 0.5$ (the equal probability sce-
nario), we create the third control group representing the
mixed design.
The four items on the control list were presented in ran-
dom order and two of the items were chosen to be nega-
tively correlated, following best practice (Glynn, 2013)
(see Appendix B, Table 2). The placebo list added a fifth
item to match the length of the treatment list. The order of
the items was randomized. The placebo item was designed
to have a true expected prevalence for each respondent of 0
(see section How to choose good placebo items below for
further details).
The treatment group was given the control list plus an
additional item of interest. We designed the experiment so
that the item of interest had the following three specific
properties: (1) the true quantity of the item was known, (2)
the item was independent of all items on the control list, (3)
the item was independent of all (observed and unobserved)
respondent characteristics. Given the focus of our study, we
also wanted an item that was not sensitive, to avoid mixing
strategic and non-strategic respondent error. We constructed
an item of interest that met these criteria by randomly
selecting one item from a separate set of items for each
individual respondent in the treatment group. This set con-
tained items regarding one’s zodiac animal, for example, “I
was born in the year of the Dog or in the year of the Pig.” In
China, respondents’ knowledge of their zodiac animal is
safe to assume. Each given year is associated with one ani-
mal of which there are 12 in total. The specific animal com-
bination presented to each respondent was randomly drawn
with uniform probability from a set of 6 different combina-
tions (including all 12 animals) and piped into the treatment
list (see Appendix B Table 2 for the 6 zodiac statements).
Hence, agreement with the proposed item ($\tau$) was exactly 1/6 in expectation for each respondent by construction. Since it was randomly determined whether any specific respondent would agree with the item, it also had properties (2) and (3).
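A sketch of this randomization (our own reconstruction, not the authors' survey code; the exact animal pairings used in the survey are listed in the article's Appendix B) could look as follows: each treatment-group respondent is shown one of six two-animal statements drawn uniformly at random, so agreement is 2/12 = 1/6 in expectation by construction.

```python
# Sketch of an item of interest with known prevalence 1/6, built from zodiac animals.
import random

ZODIAC = ["Rat", "Ox", "Tiger", "Rabbit", "Dragon", "Snake",
          "Horse", "Goat", "Monkey", "Rooster", "Dog", "Pig"]

# Six disjoint animal pairs covering all 12 animals (placeholder pairings).
PAIRS = [ZODIAC[i:i + 2] for i in range(0, 12, 2)]

def item_of_interest(rng: random.Random) -> str:
    # Draw one pair uniformly at random and pipe it into the treatment list.
    a, b = rng.choice(PAIRS)
    return f"I was born in the year of the {a} or in the year of the {b}."

# Each statement covers 2 of 12 birth years, so expected agreement is 1/6 for every
# respondent, independent of the control items and of respondent characteristics.
rng = random.Random(2)
print(item_of_interest(rng))
```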
The design allows us to estimate a known quantity
($\tau = 1/6$) using the conventional control list, the placebo
list, and the mixed control list design. At this prevalence,
we expect the conventional control list design to be the
most biased (estimate pulled toward 0.5), the placebo
design to be moderately biased (estimate pulled toward 0),
and the mixed design to yield the smallest absolute bias
(estimate pulled toward 0.25). This follows from equation
(4), (6), and (7) (see Table 1 for a quick overview). Since
this is an application where we are simply trying to estimate
the prevalence of the item of interest, our recommendations
suggest that we should choose the mixed control list design
since this is expected to minimize the absolute bias, assum-
ing that we set $\tau^*$ to approximately 1/6.
How to choose good placebo items: the piped-in approach
Designing a good placebo item is not as easy as it may
first appear. Ideally, a placebo item should be plausible
for all respondents, yet by design necessarily false for any
one given respondent. We propose a novel design for
assigning a placebo item that can be implemented in any
programmed survey, such as web-administered or tablet-administered surveys, where it is possible to pipe in an
item utilizing information gained earlier in the survey or
collected beforehand. This can be done in a number of dif-
ferent ways. In our application we gave survey respond-
ents who indicated earlier in the survey that they were
below 30 years of age the statement “I was born in the
70s,” and respondents who indicated that they were 30 or
above the placebo statement “I was born in the 2000s.”
Depending on the other items on the list, it may for exam-
ple fit better to construct a contextually adjusted placebo
item such as, “I was born after the September 11 attacks”
for the US, or another well-known national event, by
using prior data on respondents' date of birth. Theoretically, the piped-in approach has one clear benefit vis-à-vis exist-
ing approaches. For example, in the Singaporean setting
Riambau and Ostwald (2020) use “I have been invited to
have dinner with PM Lee at Sri Temasek [the Prime
Minister of Singapore’s residence] next week,” which
they suggest is “plausible but false” for all respondents.
The authors caution against using ridiculous items (such
as having the ability to teleport oneself) so as not to risk
compromising the perceived seriousness of the survey.
We take this one step further and suggest that there is a
benefit to having a placebo item that is truly plausible to
signal seriousness. Using an item that is necessarily true or necessarily false due to implausibility, even if not impossi-
ble, risks signaling to the respondent that their responses
are not important or valuable to the researchers, which
could result in lower attentiveness.
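As an illustration of the piped-in approach (a minimal sketch of our own, mirroring the age-based statements described above), the placebo statement is selected from the respondent's earlier answer so that it is plausible in general but necessarily false for that particular respondent:

```python
# Sketch of a piped-in placebo: use an earlier answer (here, age) to pick a statement
# that is plausible in general but has expected prevalence 0 for this respondent.
def piped_in_placebo(age: int) -> str:
    if age < 30:
        return "I was born in the 70s."      # impossible for anyone under 30 at the time of the survey
    return "I was born in the 2000s."        # impossible for anyone 30 or older

print(piped_in_placebo(24))   # -> "I was born in the 70s."
print(piped_in_placebo(47))   # -> "I was born in the 2000s."
```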
Results
Figure 2 shows the estimated prevalence of the item of
interest using the DiM estimator with the conventional
control group (point with solid line); the placebo control
group (triangle with short dotted line); and the mixed con-
trol group (square with long dotted line). The vertical dot-
ted line notes the true prevalence of the item being estimated
($\tau = 1/6$). Notably, the true prevalence falls outside of the
associated confidence intervals when using both the con-
ventional and the placebo control group (see Appendix B,
Table 4 for additional details). While the former results in
an overestimation of the item by 7.6 percentage points, the
latter results in an underestimation of the true prevalence
by 7.9 percentage points. In this particular empirical appli-
cation, it is clear that including a placebo item on the con-
trol list is no remedy, as both the conventional and placebo
approaches result in similarly and severely biased esti-
mates. It should be noted that in an application where the
true prevalence of the item of interest is less than one quar-
ter ($\tau < \frac{1}{4}$), we expect the placebo approach to yield a
smaller bias than the conventional approach (see equation
7). The observed discrepancy is likely due to standard sam-
pling variability, which is known to be large in list experi-
ments (Blair et al., 2020), and we can conclude that neither
approach is particularly precise. The mixed design, where
respondents in the control group receive (in this application
were sampled from) the conventional list or the placebo list
with equal probability, yields an estimate shy of the true
prevalence by just 0.3 percentage points.
Conclusions
Respondents’ lack of attention is a core threat to ob-
taining unbiased estimates through list experiments.
Unfortunately, this threat is not easily averted by modify-
ing the length of the control list. The inclusion of placebo
items in list experiments should hence not in general be
regarded as costless and is not a universal solution to the
problem of non-strategic respondent error. Rather, by mod-
eling the error process, we show that different control list
designs are associated with different kinds of bias. We
focus on two control list designs from previous research,
the conventional and the placebo design, and introduce a
third, the mixed design.
Our first recommendation for researchers is therefore
to focus on eliminating $s$ – the share of inattentive respondents. This is the only reliable way of minimizing non-strategic respondent error. Reducing the share of inat-
tentive respondents is arguably more important in list
experiments than in many other designs: The resulting
measurement error not only results in noisier estimates,
but is also associated with specific forms of bias, as
shown in this paper. The share of inattentives in the sam-
ple can be minimized by excluding respondents who fail
instructional manipulation checks or similar control
questions (Berinsky et al., 2014), or by trying to increase
the effort and attention among respondents in the study
(see Clifford and Jerit (2015)). However, all these meth-
ods involve trade-offs. Excluding some respondents
might, for instance, decrease the representativeness of
the sample if attentiveness is correlated with certain
respondent characteristics (Alvarez et al., 2019; Berinsky
et al., 2014). How to best reduce $s$ in the context of the
list experiment is clearly an important area for future
research.
Our second recommendation is to encourage research-
ers to think carefully about the specific bias that inatten-
tive respondents might cause in a given study and adjust
their control list design based on this. The share of inat-
tentive respondents is often large (Alvarez et al., 2019;
Berinsky et al., 2014) and it is unlikely that researchers
will be able to eliminate $s$ entirely ($s$ is of course in gen-
eral not known). We argue that a major problem in previ-
ous studies is that the expected bias often works in favor
of the researcher’s hypothesis. When estimating the prev-
alence of phenomena that are relatively uncommon (for
instance vote buying), the list experiment might provide
evidence of “underreporting” when compared with a
direct question simply because the list estimate is pulled
toward 0.5 by inattentive respondents under the conven-
tional design.
We advocate the following approach for choosing and
designing control lists. First, the researcher should pro-
vide a best guess regarding the true prevalence rate of the
item of interest ($\tau^*$). This guess should be guided by pre-
vious research or, potentially, by a pilot study estimating
agreement with the direct question. Second, the researcher
should decide on a control list design (conventional, pla-
cebo, or mixed) based on $\tau^*$ and the specific goal of the
study. When the goal is to simply estimate the prevalence
of the item of interest, the researcher should choose the
design that minimizes the expected absolute bias. This is
done by plugging $\tau^*$ into equation (4), (6), and (7), and taking the absolute value.

[Figure 2. DiM estimates of the item of interest using the conventional (N = 3284), placebo (N = 3310), and mixed (N = 3317) control groups; x-axis: estimated prevalence. The dotted line at $\tau = 1/6$ marks the true prevalence of the item being estimated.]

When the goal is to test for the existence of hypothesized under- or overreporting, the researcher should choose a control list design according to Table 1. This way, the bias caused by inattentive respond-
ents will (in most cases) result in a conservative estimate
that works against the tested hypothesis. There are oppor-
tunities for improvement: Using data from Blair et al.'s
(2020) meta analysis, we show that a majority of existing
studies using list experiments likely would have benefited
from choosing a different control list design.
Our third recommendation is for researchers to always
(when possible) use our proposed piped-in approach to
construct plausible placebo items, utilizing information
collected previously. We view this as a distinct improve-
ment over previous methods that is flexible and avoids
undermining the credibility of the experiment by including
implausible items.
We provide an empirical illustration in which we
fielded a list experiment with a known prevalence rate, and
where one-third of respondents received the treatment list,
another a conventional control list, and the remaining
third a placebo list. We also constructed a mixed list from
the two other control lists. The empirical study confirms
many of our theoretical results: We find that the preva-
lence is overestimated when we use the conventional con-
trol list, but underestimated when using the placebo list.
In line with our recommendations applied to the specific
case, we find that the mixed list yields the estimate with the lowest bias.
As our paper shows, there is no simple solution to the
issue of non-strategic respondent error in list experiments
without decreasing the share of inattentive respondents in
the sample. However, by thinking more carefully about the
expected bias in a specific application, researchers can sub-
stantially mitigate the problem.
Acknowledgements
We thank Adam Glynn, Jacob Sohlberg, and Kyle Marquardt for
their helpful comments at the early stage of this project. We would
also like to thank Eddie Yang and Shane Xuan for generous help
with fine-tuning the Chinese question wording. Finally, we thank
the editor at R&P and the anonymous reviewers whose comments
and insightful input helped us in making the paper stronger.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect
to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed the following financial support for the
research, authorship, and/or publication of this article: We are
grateful to Helge Ax:son Johnssons Stiftelse and Lundgrens
Vetenskapsfond for financial support for data collection.
ORCID iDs
Mattias Agerberg https://orcid.org/0000-0001-7813-6109
Marcus Tannenberg https://orcid.org/0000-0003-0077-4711
Supplementary materials
The supplementary files are available at http://journals.sagepub.com/doi/suppl/10.1177/20531680211013154.
Notes
1. Riambau and Ostwald (2020) refer to this as a “placebo state-
ment.” We define a placebo item as an item that is added to
the control list in addition to the regular items but that is nec-
essarily false for each respondent (the expected prevalence
of the item is 0).
2. We use “inattentive” in a broad sense to also include response
patterns often described as “satisficing” (Krosnick, 1991).
3. In Appendix A we show that the consequences of adding a
placebo item are identical under at least two other plausi-
ble error models: a binomial model where each respondent
agrees with each item with p= 0.5 and a model where inat-
tentive respondents select the middle response. This also
implies that any weighted average of these three error models
will exhibit the same pattern of bias.
4. As long as $\tau$ is between 0 and 0.5, the researcher can choose to allocate respondents "optimally" (in the sense of minimizing absolute bias) between the two control groups by simply solving $p^*/2 = \tau$ for $p^*$, where $\tau$ is set by the researcher (see equation (7) and the discussion below). However, since this type of unequal randomization is much trickier to implement on most platforms, we focus on the equal probability scenario.
5. Assuming each study's estimated prevalence is our best guess for $\tau^*$.
Carnegie Corporation of New York Grant
This publication was made possible (in part) by a grant from the
Carnegie Corporation of New York. The statements made and
views expressed are solely the responsibility of the author
References
Agerberg M (2020) Corrupted estimates? Response bias in citizen
surveys on corruption. Political Behavior 1–26.
Ahlquist JS (2018) List experiment design, non-strategic respond-
ent error, and item count technique estimators. Political
Analysis 26(1): 34–53.
Ahlquist JS, Mayer KR and Jackman S (2014) Alien abduction and voter impersonation in the 2012 U.S. general election: Evidence from a survey list experiment. Election Law Journal 13(4): 460–475.
Alvarez RM, et al. (2019) Paying attention to inattentive survey
respondents. Political Analysis 27(2): 1–18.
Berinsky AJ, Margolis MF and Sances MW (2014) Separating
the shirkers from the workers? Making sure respondents pay
attention on self-administered surveys. American Journal of
Political Science 58(3): 739–753.
Blair G and Imai K (2012) Statistical analysis of list experiments.
Political Analysis 20(1): 47–77.
Blair G, Chou W and Imai K (2019) List experiments with meas-
urement error. Political Analysis 27(4): 455–480.
Blair G, Coppock A and Moor M (2020) When to worry about
sensitivity bias: A social reference theory and evidence from
30 years of list experiments. American Political Science
Review 114(4): 1297–1315.
Clifford S and Jerit J (2015) Do attempts to improve respondent
attention increase social desirability bias? Public Opinion
Quarterly 79(3): 790–802.
Corstange D (2018) Clientelism in competitive and uncompetitive
elections. Comparative Political Studies 51(1): 76–104.
De Jonge CPK and Nickerson DW (2014) Artificial inflation or
deflation? Assessing the item count technique in compara-
tive surveys. Political Behavior 36(3): 659–682.
Frye T, et al. (2017) Is Putin’s popularity real? Post-Soviet Affairs
33(1): 1–15.
Glynn AN (2013) What can we learn with statistical truth serum?
Design and analysis of the list experiment. Public Opinion
Quarterly 77(S1): 159–172.
Gonzalez-Ocantos E, et al. (2012) Vote buying and social
desirability bias: Experimental evidence from Nicaragua.
American Journal of Political Science 56(1): 202–217.
Holbrook AL and Krosnick JA (2010) Social desirability bias in
voter turnout reports. Public Opinion Quarterly 74(1): 37–67.
Kramon E and Weghorst K (2019) (Mis)Measuring sensitive attitudes
with the list experiment: Solutions to list experiment breakdown
in Kenya. Public Opinion Quarterly 83(S1): 236–263.
Krosnick JA (1991) Response strategies for coping with the cog-
nitive demands of attitude measures in surveys. Applied
Cognitive Psychology 5(3): 213–236.
Riambau G and Ostwald K (2020) Placebo statements in list exper-
iments: Evidence from a face-to-face survey in Singapore.
Political Science Research and Methods: 1–8.
Robinson D and Tannenberg M (2019) Self-censorship of
regime support in authoritarian states: Evidence from
list experiments in China. Research & Politics 6(3). doi:
10.1177/2053168019856449.
Tang W (2016) Populist Authoritarianism: Chinese political cul-
ture and regime sustainability. Oxford: Oxford University
Press.