Measuring Subgroup Preferences in Conjoint
Experiments
Thomas J. Leeper, Sara B. Hobolt, and James Tilley
May 24, 2019
Abstract
Conjoint analysis is a common tool for studying political preferences. The method disen-
tangles patterns in respondents’ favorability toward complex, multidimensional objects, such
as candidates or policies. Most conjoints rely upon a fully randomized design to generate
average marginal component effects (AMCEs). These measure the degree to which a given
value of a conjoint profile feature increases, or decreases, respondents’ support for the overall
profile relative to a baseline, averaging across all respondents and other features. While the
AMCE has a clear causal interpretation (about the effect of features), most published conjoint
analyses also use AMCEs to describe levels of favorability. This often means comparing AM-
CEs among respondent subgroups. We show that using conditional AMCEs to describe the
degree of subgroup agreement can be misleading as regression interactions are sensitive to the
reference category used in the analysis. This leads to inferences about subgroup differences in
preferences that have arbitrary sign, size, and significance. We demonstrate the problem us-
ing examples drawn from published articles and provide suggestions for improved reporting
and interpretation using marginal means and an omnibus F-test. Given the accelerating use of
these designs in political science, we offer advice for best practice in analysis and presentation
of results.
We thank Benjamin Lauderdale, Jamie Druckman, Yusaku Horiuchi, the editor, and anonymous
reviewers for feedback on this manuscript. Replication data and code for this article are available from
the Political Analysis Dataverse: https://doi.org/10.7910/DVN/ARHZU4. This work was funded, in
part, by the United Kingdom Economic and Social Research Council (Grant ES/R000573/1).
One aspect of the dramatic increase in the use of experiments within political sci-
ence (Druckman et al., 2006; Mutz, 2011) is the establishment of conjoint experimen-
tal designs as a prominent methodological tool. While survey experiments have tra-
ditionally examined just one or two factors that might shape outcomes (see, for re-
views, Gaines, Kuklinski, and Quirk, 2007; Sniderman, 2011), conjoint designs allow
researchers to study the independent effects on preferences of many features of com-
plex, multidimensional objects. These include many different types of phenomena,
such as political candidates (Campbell et al., 2016; Teele, Kalla, and Rosenbluth, 2018),
immigrant admissions (Hainmueller and Hopkins, 2015; Bansak, Hainmueller, and
Hangartner, 2016; Wright, Levy, and Citrin, 2016), and public policies (Gallego and
Marx, 2017; Hankinson, 2018). Factorial designs of this sort have a long history, but the
driving force behind this use of conjoint analysis has been the introduction by Hain-
mueller, Hopkins, and Yamamoto (2014) of a small-sample, fully randomized conjoint
design. The associated analytic approach emphasizes a single quantity of interest:
the average marginal component effect (AMCE). By capturing the multidimensional-
ity of target objects, the randomized conjoint design breaks any explicit, or implicit,
confounding between features of these objects. This gives the AMCE a clear causal
interpretation: the degree to which a given value of a feature increases, or decreases,
respondents’ favorability towards a packaged conjoint profile relative to a baseline.
While randomization of profile features gives the AMCE a causal interpretation,
most published conjoint analyses in political science use AMCEs not only for causal
purposes (interpreting AMCEs as effect sizes), but also for descriptive purposes. The
aim is to map levels of favorability toward a multidimensional object across its various
features.¹ In this sense, conjoints are often applied like list experiments, using
randomization to measure a sample’s preferences over something difficult to measure
with direct questioning. A positive AMCE for a given feature can be read as a descrip-
tive measure of high favorability towards profiles with that feature. The quantity is
causal, but it is often read descriptively.
¹ See Shmueli (2010) for an elaboration on the distinctions between explanatory (causal) modelling,
descriptive modelling, and predictive modelling.
This is particularly the case for subgroup analyses of conjoint experiments. Such
exercises are an increasingly common feature of experimental analysis (Green and
Kern, 2012; Ratkovic and Tingley, 2017; Grimmer, Messing, and Westwood, 2017; Egami
and Imai, 2018). For example, the Hainmueller, Hopkins, and Yamamoto (2014) study
of immigration attitudes splits the sample in two using a measure of ethnocentrism
and then compares AMCEs for the two subgroups. Similarly, Bansak, Hainmueller,
and Hangartner (2016) compare preferences toward immigrants across a number of binary
respondent characteristics: age, education, left-right ideology, and income. Other
examples abound. Ballard-Rosa, Martin, and Scheve (2016) compare preferences over
tax policies across a number of subgroups defined by demographics and political ori-
entations; Bechtel and Scheve (2013) compare AMCEs on climate agreements across
four different countries, and across subgroups of respondents; and Teele, Kalla, and
Rosenbluth (2018) compare AMCEs for features of male and female political candi-
dates among male and female respondents. Most of these comparisons are visual or
informal. But some involve explicit estimation of the subgroup difference, such as
when Kirkland and Coppock (2017) compare conditional AMCEs across hypothetical
partisan and nonpartisan elections. Interpretation of subgroup AMCEs thus involves
an implied quantity of interest: the difference between two conditional AMCEs.
What is not necessarily obvious in such analyses is that differences-in-preferences
(that is to say, the difference in degree of favorability toward profiles containing a
given feature) are not directly reflected in subgroup differences-in-AMCEs. A differ-
ence in effect sizes is distinct from a difference in preferences. We show that a dif-
ference in two (or more) subgroups’ favorability toward a conjoint feature — like a
difference in willingness to support a particular type of immigrant between high and
low ethnocentrism respondents — is only rarely reflected in the difference-in-AMCEs.
In fact, no information about the similarity of the subgroups’ preferences is provided
by comparisons of subgroup AMCEs, yet such comparisons are commonly made in
practice.
As we will show, where preferences in subgroups toward the experimental reference
category are similar, the difference-in-AMCEs conveys preferences reasonably
well. The problem occurs when preferences between subgroups diverge in the refer-
ence category. Here, the difference-in-AMCEs is a misleading representation of un-
derlying patterns of favorability. Given most published conjoint studies report re-
sults based upon reference categories chosen for substantive reasons about the nature
or meaning of the levels rather than the configuration of preferences revealed in the
experiment, difference-in-AMCEs should not be assumed to be interpretable as dif-
ferences in subgroup preferences. The root of this error is likely familiar to many
researchers: it is simply a matter of regression specification for models involving inter-
actions between categorical regressors. Egami and Imai (2018), for example, provide
an extensive discussion of the implications of this property for interpreting causal in-
teractions between randomized features of conjoint profiles. The state of the published
literature would suggest the problem remains non-obvious when applied to descriptive
analysis of subgroups in conjoint designs.²
In what follows, we demonstrate the challenges of conjoint analysis and remind
readers of how reference category choice for profile features creates problems for com-
paring conditional AMCEs across respondent subgroups. We show how the use of
an arbitrary reference category means the size, direction, and statistical significance of
differences-in-AMCEs have little relationship to the underlying degree of favorability
of the subgroups toward profiles with particular features. Reference category choices
can make similar preferences look dissimilar and dissimilar preferences look similar.
We demonstrate this with examples drawn from the published political science litera-
ture (namely experiments by Hainmueller, Hopkins, and Yamamoto 2014; Bechtel and
Scheve 2013; Teele, Kalla, and Rosenbluth 2018). The paper then provides suggestions
for improved conjoint reporting and interpretation based around two quantities of in-
terest drawn from the factorial experimentation literature: (a) unadjusted marginal
means, a quantity measuring favorability toward a given feature, and (b) an omnibus
F-test, measuring differences therein. Software for the R programming language that
supports our findings (Leeper, 2018) is demonstrated throughout using example data
(Leeper, Hobolt, and Tilley, 2019). It can be used to examine the sensitivity of conjoint
analysis to reference category selection, calculate AMCEs and marginal means, perform
subgroup analyses, and test for subgroup differences in any conjoint experiment. We
conclude with advice for best practices in the analysis and presentation of
conjoint results.
² Since this manuscript has been under review, we have been made aware of one working paper by
Clayton, Ferwerda, and Horiuchi (2018), on the topic of immigration preferences, that correctly notes
the need to address the arbitrary reference category in order to compare subgroup preferences.
Quantities of Interest in Conjoint Experiments
Conjoint analysis serves two purposes. One is to assess causal effects. Another is
preference description.³ In causal inference, fully randomized conjoints provide a design
and analytic approach that allows researchers to understand the causal effect of a
given feature on overall support for a multidimensional object, averaging across other
features of the object included in the design. Such inferences can be thought of as state-
ments of the form: “shifting an immigrant’s country of origin from India to Poland
increases favorability by X percentage points.” In descriptive inference, conjoints pro-
vide information about both (a) the absolute favorability of respondents toward objects
with particular features or combinations of features, and (b) the relative favorability of
respondents toward an object with alternative combinations of features. Such infer-
ences can be thought of as statements of the form “Polish immigrants are preferred
by X% of respondents” or “Polish immigrants are more supported than Mexican im-
migrants, by X percentage points.” Thus both causal and descriptive interpretations
of conjoints are based upon the distribution of preferences across profile features and
differences in preferences across alternative feature combinations.
³ Here we use “preference” as Hainmueller, Hopkins, and Yamamoto (2014) do: that is, as a statement
of favorability or support for a profile, not the more narrow economic definition of a strict rank ordering
of objects by favorability.
Analytically, a fully randomized conjoint design without constraints between profile
features is simply a full-factorial experiment (with some cells possibly, albeit randomly,
left unobserved). All quantities of interest relevant to the analysis of conjoint
designs therefore derive from combinations of cell means, marginal means, and the
grand mean, as in the traditional analysis of factorial experiments. In a forced choice
conjoint design, the grand mean is by definition 0.5 (i.e., 50% of all profiles shown are
chosen and 50% are not chosen). Cell means are the mean outcome for each particular
combination of feature levels. In the full-factorial design discussed by Hainmueller,
Hopkins, and Yamamoto (2014) and now widely used in political science, many or
perhaps most cell means are unobserved. For example, in their candidate choice experiment,
there are $2 \times 6 \times 6 \times 6 \times 2 \times 6 \times 6 \times 6 = 186{,}624$ cell means, but only 3,466
observations. About 98% of cell means are unobserved. While this would be problematic
for attempting to infer pairwise comparisons between cells, conjoint analysts
mostly focus on the marginal effects of each feature rather than more complex interactions.
Appendix A provides detailed notation and elaborations of these definitions of
quantities of interest.
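The cell-count arithmetic quoted above can be checked with a few lines of Python. This is only a sketch: the list of level counts (two features with two levels, six features with six levels) is read off the product given in the text, and the variable names are illustrative.

```python
from math import prod

# Levels per feature, taken from the product quoted in the text:
# two features with 2 levels and six features with 6 levels.
levels_per_feature = [2, 6, 6, 6, 2, 6, 6, 6]

n_cells = prod(levels_per_feature)   # size of the full factorial
n_obs = 3466                         # observations reported in the text

# Even if every observation landed in a distinct cell,
# at least this share of cells would have no data at all.
min_share_unobserved = 1 - n_obs / n_cells

print(n_cells)                          # 186624
print(round(min_share_unobserved, 3))   # 0.981
```

Because several observations can fall in the same cell, the true share of empty cells can only be higher than this lower bound.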
In fully randomized designs, the average marginal component effects (AMCEs)
are simply marginal effects of changing one feature level to another, all else constant.
AMCEs therefore depend only upon marginal means: that is the column and row mean
outcomes for each feature level averaging across all other features. A marginal mean
describes the level of favorability toward profiles that have a particular feature level,
ignoring all other features. For example, in the common forced-choice design with two
alternatives, marginal means have a direct interpretation as probabilities. A marginal
mean of 0 indicates respondents select profiles with that feature level with probability
$P(Y = 1 \mid X = x) = 0$, while a marginal mean of 1 indicates respondents select profiles
with that feature level with probability $P(Y = 1 \mid X = x) = 1$, where $Y$ is a binary
outcome and $X$ is a vector of profile features.⁴ With rating scale outcomes, marginal
means can vary arbitrarily along the outcome scale used.
⁴ It is not possible for the marginal mean to equal zero or one if pairs of profiles shown together are
allowed to have the same level of a given feature (for example, both immigrants are from Germany). Instead,
the marginal mean can range from the probability of co-occurrence to 1 minus that probability. If
there are five levels of a feature, each shown with equal probability, then the probability of co-occurrence
is $\frac{1}{5} \times \frac{1}{5} = 0.04$ such that the marginal mean can take values in the range (0.04, 0.96). If the design is
constrained so that features cannot be the same for both immigrants, then the marginal means fully
range from zero to one. This constraint on the range of the marginal means also constrains the range of
AMCEs. Notably, many conjoints provide features with only two levels, such as the male-versus-female
candidate feature examined by Teele, Kalla, and Rosenbluth (2018) or Hainmueller, Hopkins, and Yamamoto
(2014) in their conjoints on candidate choice. In such cases, the probability of co-occurrence is
$\frac{1}{2} \times \frac{1}{2} = 0.25$, bounding the AMCE for female (as opposed to male) candidates to the range (−0.5, 0.5)
if both candidates can have the same sex. Caution is therefore needed in comparing the relative size of
features with few levels to features with many levels given that effects have different bounds.
Figure 1: Replication of Hainmueller et al. (2014) Candidate Experiment using AMCEs and MMs
[Left panel: Estimated AMCE. Right panel: Marginal Mean. Rows in both panels: levels of Gender, Age, Race/Ethnicity, Income, Profession, College, Religion, and Military Service.]
Because levels of features are randomly assigned, pairwise differences between
two marginal means for a given feature (e.g., between candidates who are male versus
female) have a direct causal interpretation. For fully randomized designs, the AMCE
proposed by Hainmueller, Hopkins, and Yamamoto (2014) is equivalent to the average
marginal effect of each feature level for a model where each feature is converted
into a matrix of indicator variables with one level left out as a reference category. This
is no different from any other regression context wherein one level of any categorical
variable must be omitted from the design matrix in order to avoid perfect multicollinearity.⁵
This close relationship between AMCEs and marginal means is visible in
Figure 1 which presents a replication of the AMCE-based analysis of the Hainmueller
et al. candidate experiment (left panel) and an analogous examination of the results
using marginal means (right panel). Note, in particular, how marginal means convey
information about the preferences of respondents for all feature levels while AMCEs
definitionally restrict the AMCE for the reference category to zero (or undefined). For
example, the AMCE for a candidate serving in the military is 0.09 (a 9 percentage-point
increase in favorability), reflecting marginal means for serving and non-serving
candidates of 0.54 and 0.46, respectively. Similarly, the zero effect size for candidate
gender reflects identical marginal means for male and female candidates (0.50 in each
case). AMCEs in fully randomized designs are simply differences between marginal
means at each feature level and the marginal mean in the reference category, ignoring
other features.
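The relationship between marginal means and AMCEs described above can be sketched in a few lines of Python. The data are a toy construction, not the replication data: 200 hypothetical profiles built so that the marginal means come out at 0.54 and 0.46. The function names `marginal_means` and `amces` are illustrative and are not taken from the authors' R software.

```python
from collections import defaultdict

def marginal_means(rows, feature, outcome="chosen"):
    """Mean outcome for each level of `feature`, ignoring all other features."""
    totals = defaultdict(lambda: [0.0, 0])   # level -> [sum of outcomes, count]
    for row in rows:
        acc = totals[row[feature]]
        acc[0] += row[outcome]
        acc[1] += 1
    return {level: s / n for level, (s, n) in totals.items()}

def amces(mm, reference):
    """In a fully randomized design: each marginal mean minus the reference level's."""
    return {level: m - mm[reference] for level, m in mm.items()}

# Toy data (hypothetical): one row per profile shown, 1 = profile was chosen.
rows = (
    [{"military": "Served", "chosen": 1}] * 54
    + [{"military": "Served", "chosen": 0}] * 46
    + [{"military": "Did not serve", "chosen": 1}] * 46
    + [{"military": "Did not serve", "chosen": 0}] * 54
)

mm = marginal_means(rows, "military")
effects = amces(mm, reference="Did not serve")

print({k: round(v, 2) for k, v in mm.items()})
print({k: round(v, 2) for k, v in effects.items()})
```

The marginal means describe both levels, while the AMCE output carries information only about the gap between them: the reference level is fixed at zero by construction.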
The AMCE is often described as an estimate of the relative favorability of profiles
with counterfactual levels of a feature. For example, Teele, Kalla, and Rosenbluth
(2018) summarize their conjoint on public support as: “female candidates are favored
[over men] by 7.3 percentage points” (6). Similarly, Hainmueller, Hopkins, and Yamamoto
(2014) describe some of the results of their conjoint on preferences toward political
candidates:
We also see a bias against Mormon candidates, whose estimated level of
support is 0.06 (SE = 0.03) lower when compared to a baseline candidate with
no stated religion. Support for Evangelical Protestants is also 0.04 percentage
points lower (SE = 0.02) than the baseline. (19)
These examples make clear that despite the causal inference potentially provided by
the AMCE, the quantity of interest is frequently used to provide a characterization of
preferences that has a distinctly descriptive flavor about the relative levels of support
across profiles and also across subgroups of respondents. Indeed, this style of description
is widespread in conjoint analyses. This use of conjoints to provide descriptive
inferences about patterns of preferences is important because AMCEs are defined as
relative quantities, requiring that patterns of preferences are expressed against a baseline
reference category for each conjoint feature. A positive AMCE is read as higher favorability,
but it is only higher relative to whatever category serves as the baseline. For
example, in the Hainmueller, Hopkins, and Yamamoto candidate experiment, choosing
a non-religious candidate as a baseline and interpreting the resulting AMCEs means
that the differences between other pairs of marginal means (e.g., evaluations of Mormon
and Evangelical candidates) are not obvious. The negative direction, and the size,
of the AMCEs for Mormon and Evangelical candidates would be different if the least-liked
category of Mormons were the reference group. More trivially, Teele, Kalla, and
Rosenbluth (2018) describe their comparisons about public preferences for female candidates
relative to male candidates, but could have equivalently described patterns of
equal size but opposite sign comparing preferences over male relative to female candidates.
Appendix B includes some additional illustrations of this point for interested
readers.
⁵ In designs that entail constraints between profile features, the average marginal effect is a weighted
average of effects across each combination of the constrained features where the weights on the effects
are arbitrary but typically uniform. We ignore this distinction in the remainder of this article, as all of
our results apply equally to fully randomized and to constrained designs.
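A small numeric sketch may help make the dependence on the baseline concrete. The marginal means below are hypothetical values chosen only so that the Mormon and Evangelical AMCEs against a "None" baseline match the differences quoted above (0.06 and 0.04 lower); they are not estimates from the replication data, and the `amces` helper is illustrative.

```python
# Hypothetical marginal means for a six-level religion feature (illustrative only).
mm = {
    "None": 0.52,
    "Jewish": 0.52,
    "Catholic": 0.51,
    "Mainline protestant": 0.53,
    "Evangelical protestant": 0.48,
    "Mormon": 0.46,
}

def amces(mm, reference):
    """AMCEs as differences between each marginal mean and the reference level's."""
    return {level: m - mm[reference] for level, m in mm.items()}

# Same preferences, two different baselines.
by_ref = {ref: amces(mm, ref) for ref in ("None", "Mormon")}

for ref, effects in by_ref.items():
    print(f"reference = {ref}")
    for level, effect in effects.items():
        print(f"  {level:<24}{effect:+.2f}")

# The gap between any two levels is the same under every baseline.
gaps = [e["Mormon"] - e["Evangelical protestant"] for e in by_ref.values()]
print([round(g, 2) for g in gaps])
```

With "None" as the baseline, the Mormon and Evangelical AMCEs are negative. With "Mormon" as the baseline, every AMCE is zero or positive, although the underlying marginal means have not changed.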
Consequences of Arbitrary Reference Category Choice
How do researchers decide which of tens of thousands of possible experimental cells
should be selected as the reference category? Examining recently published conjoint
analyses, it appears that the choice of reference category is either arbitrary or based
upon substantive intuition about the meaning of feature levels. For example, Hain-
mueller, Hopkins, and Yamamoto (2014) choose female immigrants as a baseline in
their immigration experiment, thus providing an estimate of the AMCE of being male,
while Teele, Kalla, and Rosenbluth (2018) choose male candidates as a baseline in their
conjoint, thus providing an estimate of the AMCE of being female. The choice is
seemingly innocuous. Sometimes choices of reference category appear to be driven
by substantive knowledge: on language skills of immigrants in their immigration ex-
periment, Hainmueller, Hopkins, and Yamamoto (2014) choose fluency as a baseline;
on the prior trips to the US feature, “never” is chosen as the baseline.
While seemingly arbitrary and innocuous, the choice of reference category can
provide highly distorted descriptive interpretations of preferences among subgroups
of respondents. This occurs when researchers examine conditional AMCEs, wherein
AMCEs are calculated separately for subgroups of respondents and those conditional
estimates are directly compared (Hainmueller, Hopkins, and Yamamoto, 2014, 13).
Conditional AMCEs convey the causal effect of an experimental factor on overall fa-
vorability among the subgroup of interest. Consider, for example, a two-condition
candidate choice experiment where Democratic and Republican respondents are ex-
posed to either a male or female candidate and opinions toward the candidate serve
as the outcome. It is reasonable to imagine that effects of candidate sex might differ
for the two groups and therefore to compare the size of the treatment effect between the two
groups. Perhaps Democrats are more responsive to candidate sex than are Republi-
cans, making the causal effect larger for Democrats than Republicans. When conjoint
analysts engage in subgroup comparisons, they are engaging in this kind of search for
heterogeneous treatment effects across subgroups, but across a much larger number
of experimental factors.
As Table 1 shows, discussions of conditional AMCEs in conjoint analyses often
compare the size, and direction, of subgroup causal effects. Given the common prac-
tice of descriptively interpreting conjoint experimental results, such subgroup analy-
ses seem perfectly intuitive. The set of subgroups listed in the last column of Table 1
contains some unsurprising covariates, such as partisanship, that are of obvious the-
oretical interest in almost any study of individual preferences. If interpreted as a dif-
ference in the size of the causal effect for two groups, such comparisons are perfectly
consistent with more traditional experimental analysis and a perfectly acceptable in-
terpretation of the conjoint results.
Table 1: Uses of Subgroup Analysis Published in Political Science Journals
| Paper | Journal | Topic | Subgroup Comparisons |
|---|---|---|---|
| Bechtel and Scheve (2013) | PNAS | Climate agreement preferences | Environmentalism and International Reciprocity Attitudes |
| Franchino and Zucchini (2014) | PSRM | Candidate preferences | Political Interest, Left-right self-placement |
| Hainmueller, Hopkins, and Yamamoto (2014) | Political Analysis | Immigration preferences | Ethnocentrism |
| Hansen, Olsen, and Bech (2014) | Political Behavior | Policy preferences | Partisanship |
| Carlson (2015) | World Politics | Candidate preferences | Co-ethnicity |
| Bansak, Hainmueller, and Hangartner (2016) | Science | Immigration preferences | Left-right self-placement, age, education, income |
| Ballard-Rosa, Martin, and Scheve (2016) | JOP | Tax preferences | Various |
| Campbell et al. (2016) | BJPS | Candidate preferences | Partisanship |
| Carnes and Lupu (2016) | APSR | Candidate preferences | Partisanship |
| Mummolo (2016) | JOP | News selection | Various |
| Vivyan and Wagner (2016) | EJPR | Candidate preferences | Political attitudes |
| Mummolo and Nall (2017) | JOP | Mobility preferences | Partisanship |
| Bechtel, Genovese, and Scheve (2017) | BJPS | Climate agreement preferences | Employment sector emissions |
| Bechtel, Hainmueller, and Margalit (2017) | EJPR | International bailout preferences | Various |
| Gallego and Marx (2017) | J. European Public Policy | Labor market policy | Left-right self-placement |
| Kirkland and Coppock (2017) | Political Behavior | Candidate preferences | Partisanship |
| Sen (2017) | PRQ | Judicial candidate preferences | Partisanship |
| Sobolewska, Galandini, and Lessard-Phillips (2017) | J. Ethnic & Migration Studies | Immigrant integration | Various |
| Eggers, Vivyan, and Wagner (2018) | JOP | Candidate preferences | Sex |
| Hankinson (2018) | APSR | Housing policy preferences | Various |
| Oliveros and Schuster (2018) | CPS | Bureaucrat candidate preferences | Various |
| Teele, Kalla, and Rosenbluth (2018) | APSR | Candidate preferences | Sex, Partisanship |
| Carey et al. (2018) | Politics, Groups, and Identities | Hiring preferences | Various |
All articles in this table use subgroup conditional AMCEs to make inferences about
differences in preferences between subgroups.
Yet, just as analysis of full sample conjoint data is often descriptive in nature, it
is also the case that conjoint analysts frequently interpret differences in conditional
AMCEs descriptively rather than causally. For example, in one analysis Hainmueller,
Hopkins, and Yamamoto (2014) visually compare the pattern of AMCEs among high-
and low-ethnocentrism respondents and interpret that “the patterns of support are
generally similar for respondents irrespective of their level of ethnocentrism” (22).
Ballard-Rosa, Martin, and Scheve (2016) make similar comparisons in their tax policy
conjoint: “While there are few strong differences in preferences for taxing the lower
three income groups (the ‘hard work’ group has slightly lower elasticities for taxing
the poor), there are strong differences in preferences for taxing the rich” (12). In the
Bechtel and Scheve (2013) conjoint on support for international climate change agree-
ments in the United States, United Kingdom, Germany, and France, they summarize
their results as “We find that individuals in all four countries largely agree on which
dimensions are important and to what extent” (13765). In these examples, the differ-
ences between conditional AMCEs are used as a way of descriptively characterizing
differences in preferences (i.e. levels of support) between the groups rather than differ-
ences in causal effects on preferences in the groups.
The selection of a reference category, while earlier an innocuous analytic decision,
becomes substantially consequential for a descriptive reading of conditional AMCEs.
Most obviously, using AMCEs descriptively prevents any description of the levels of
favorability in the reference category. It can also lead to misinterpretations of pat-
terns in preferences. AMCEs are relative, not absolute, statements about preferences.
As such, there is simply no predictable connection between subgroup causal effects
and the levels of underlying subgroup preferences. Yet analysts and their readers
frequently interpret differences in conditional AMCEs as differences in underlying
preferences. AMCEs do provide insight into the descriptive variation in preferences
within-group and across-features, and conditional AMCEs do estimate the size of
causal effects of features within groups. But AMCEs cannot provide direct insight into
the pattern of preferences between groups because they do not provide information
about absolute levels of favorability toward profiles with each feature (or combination
of features).
This additional information matters. Consider again the simple two-condition experiment
in which the effect of a male as opposed to female candidate, $x \in \{0, 1\}$, is
compared across a single two-category covariate, $z \in \{0, 1\}$, such as Democratic or Republican
self-identification. Subgroup regression equations to estimate effects for each
group are:
$$\hat{y} = \beta_0 + \beta_1 x + e, \quad z = 0$$
$$\hat{y} = \beta_2 + \beta_3 x + e, \quad z = 1$$
The effect of $x$ when $z = 0$ is given by $\beta_1$. The effect of $x$ when $z = 1$ is given by
$\beta_3$. These are, in essence, the conditional AMCEs in a conjoint analysis. Yet the difference
in AMCEs ($\beta_3 - \beta_1$) is not equal to the difference in preferences between the
two groups, which is $\bar{y}_{z=1 \mid x=1} - \bar{y}_{z=0 \mid x=1}$ (estimated by $(\beta_2 + \beta_3) - (\beta_0 + \beta_1)$). The
difference-in-AMCEs only equals the difference in preferences when $\beta_2 = \beta_0$. Yet
the standard AMCE-centric conjoint analysis does not present absolute favorability in
the reference category. Similarity of conditional AMCEs only means similarity of the
causal effect of the feature across groups, not similarity of preferences unless preferences
toward profiles with the reference category are equivalent in both groups. Given the
reference category choice is typically arbitrary or driven by substantive knowledge of
the levels, there is never any reason to expect that the reference category satisfies this
equality requirement. When a difference-in-AMCEs comparison is used to estimate a
difference in preferences, the size and direction of the bias are determined by the
difference between the subgroups in preferences toward the reference category.
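
Rearranging the two subgroup equations makes the source of the bias explicit. The identity below only restates quantities already defined above (the $\beta$ coefficients of the two subgroup regressions); it is not an additional result.

```latex
\begin{align*}
\underbrace{\beta_3 - \beta_1}_{\text{difference in AMCEs}}
  = \underbrace{(\beta_2 + \beta_3) - (\beta_0 + \beta_1)}_{\text{difference in preferences at } x = 1}
  - \underbrace{(\beta_2 - \beta_0)}_{\text{difference in preferences at } x = 0}
\end{align*}
```

The last term is the subgroup difference in favorability toward the reference category, so the two quantities coincide only when that term is zero.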
Figure 2: Replication of Results for ‘Candidate Sex’ Feature from Teele et al. (2018)
Candidate Experiment using Full Sample AMCEs and MMs and Subgroup AMCEs
and MMs for Democrats and Republicans
[Figure omitted. Four panels: Full Sample AMCEs, Full Sample MMs, Subgroup AMCEs,
and Subgroup MMs, each showing estimates for male and female candidates; the
subgroup panels distinguish Republicans and Democrats.]

Figure 3: True Difference in Favorability and Implied Preference Differences between
High and Low Environmentalism Respondents for ‘Monthly Cost’ Feature from
Bechtel and Scheve (2013) Climate Agreement Experiment for Each Possible Reference
Category
[Figure omitted. Horizontal axis: Estimated Preference Difference between High and
Low Environmentalism Respondents. Rows: €28, €56, €84, €113, and €141 per month.
Statistics shown: Originally Reported Difference, Potential Differences in AMCEs,
Difference in MMs.]

To draw this example out more fully, the upper panel of Figure 2 shows AMCEs
for Teele, Kalla, and Rosenbluth’s candidate choice experiment for the full sample of
respondents. The second panel shows full sample marginal means. Respondents’
preference for female candidates is very apparent in both forms of analysis in the upper
two panels because the AMCE definitionally equals the difference in marginal means.
But how do Republicans and Democrats differ in their preferences over male and
female candidates? The third panel shows conditional AMCEs separately for
Democratic and Republican voters, as provided in the original paper, and the lower
panel shows the results using conditional marginal means for Democratic and
Republican voters.⁶ By requiring a reference category fixed to zero, the conditional
AMCE results in the third panel suggest that there is a very large difference in
favorability toward female candidates between Republican and Democratic respondents.
In reality, however, the difference in these conditional AMCEs (0.089) reflects the true
difference in favorability toward female candidates (difference: 0.045; Democrats:
0.537, Republicans: 0.492) plus the difference in favorability toward male candidates
(difference: 0.045; Democrats: 0.463, Republicans: 0.508). Because Democrats and
Republicans actually differ in their views of profiles containing the reference (male)
category, AMCEs sum the true differences in preferences for a given feature level with
the difference in preferences toward the reference category.⁷
⁶ We opt here for visual presentation of results; tabular presentation of AMCEs,
marginal means, and associated standard errors for all examples is included in the
Appendix.
⁷ The discrepancy between differences in preferences and differences in conditional
AMCEs can also be seen very clearly in the “political experience” feature of this
experiment (see Appendix C).
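
The arithmetic behind the candidate sex example can be checked directly from the subgroup marginal means quoted in the text. The following is a minimal sketch: the variable names are ours, and because the inputs are the rounded three-decimal values, the computed difference-in-AMCEs agrees with the reported 0.089 only up to rounding.

```python
# Subgroup marginal means for the candidate sex feature, as quoted in the text.
mm = {
    "Democrats": {"female": 0.537, "male": 0.463},
    "Republicans": {"female": 0.492, "male": 0.508},
}

# Conditional AMCE of "female" with "male" as the reference category:
# the within-group difference in marginal means.
amce = {group: levels["female"] - levels["male"] for group, levels in mm.items()}

# Difference in conditional AMCEs (Democrats minus Republicans).
diff_in_amces = amce["Democrats"] - amce["Republicans"]

# Differences in marginal means (Democrats minus Republicans) for each level.
diff_female = mm["Democrats"]["female"] - mm["Republicans"]["female"]
diff_male = mm["Democrats"]["male"] - mm["Republicans"]["male"]

print(f"difference in AMCEs:                {diff_in_amces:.3f}")
print(f"difference in MMs (female):         {diff_female:.3f}")
print(f"difference in MMs (male, reference): {diff_male:.3f}")
```

The difference-in-AMCEs equals the difference in marginal means for female candidates minus the (signed) difference for the male reference category, which is why it is roughly twice the size of the difference in favorability toward female candidates.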
Visual or numerical similarity of subgroup AMCEs is therefore an analytical arte-
fact, not an accurate statement of the similarity of patterns of preferences. We can see
this bias in a reanalysis of Bechtel and Scheve’s four-country climate change agreement
experiment. Figure 3 shows an analysis for the feature capturing the monthly house-
hold cost for a potential international climate agreement. This replicates a portion
of their results which compare high- and low-environmentalism respondents pooled
across countries (Bechtel and Scheve, 2013, 13767 figure 4). The original analysis has
conditional AMCEs for the two subgroups with 28 Euro per month as the reference cat-
egory. Conditional AMCEs for both groups are presented as negative with conditional
AMCEs for low-environmentalism respondents being more negative than the condi-
tional AMCEs for high-environmentalism respondents at every feature level. This
implies positive differences in favorability toward each monthly cost between high-
and low-environmentalism respondents. Figure 3 presents the implied difference-in-
AMCEs from the original analysis as black circles, demonstrating the substantial and
positive apparent differences between the two groups. For example, the difference-in-
AMCEs for the 56 Euro per month level (incorrectly) implies that high-environmentalism
respondents are more favorable toward a 56 Euro per month household cost of an
agreement than are low-environmentalism respondents. Yet the opposite is actually
true: high environmentalism respondents are less favorable toward this option than
low environmentalism respondents. By using the 28 Euro per month level as the refer-
ence category, the original analysis implies that preferences are identical between the
two groups when in reality high-environmentalism respondents are much less favor-
able toward a 28 Euro per month cost than low-environmentalism respondents. The
black diamonds in Figure 3 show these true differences in favorability as marginal
means for the two groups.
Furthermore, the gray dots in Figure 3 represent the alternative differences-in-AMCEs
that could have been generated from alternative choices of reference category
using the same data. Not only can the choice of reference category significantly
color the apparent size of differences between subgroups, that choice can also
affect the direction and statistical significance of subgroup differences. An analyst
could easily choose a reference category that presents differences between these two
groups as large and positive, small and positive, small and negative, large and
negative, or negligible. The original analysis (again, black circles) happens to show
large and positive differences between the groups.
It is worth highlighting two further features in Figure 3. First, the alternative
differences-in-AMCEs estimates vary mechanically around the difference in marginal
means as the reference category varies. The differences between marginal means for
the two groups are always fixed in the data, so the differencing of subgroup AMCEs is
merely an exercise in centering those differences at arbitrary points along the range of
observed differences in marginal means. Second, and more practically, because there
is no category for which the preferences of the two subgroups in this example are iden-
tical, no choice of reference category would have led to inferences from differences-in-
AMCEs that accurately reflect the underlying difference in preferences. Even in the
84 Euro per month level, the difference between the two groups is slightly positive.
Were there a category for which subgroup preferences were exactly equal, then we
could choose that as the reference category and interpret differences-in-AMCEs as dif-
ferences in preferences. But there is never any guarantee that such a reference category
exists. Thus, there is no way to use conditional AMCEs or differences between those
conditional AMCEs to convey the underlying similarity or differences in preferences
across sample subgroups.
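
The mechanical dependence on the reference category can be illustrated with a small sketch. The marginal means below are invented for illustration and are not taken from any study; the function names are ours. The sketch relies on the point made earlier that, in a fully randomized design, an AMCE equals the difference between a level's marginal mean and the reference level's marginal mean.

```python
# Hypothetical subgroup marginal means for one feature with three levels.
# These numbers are invented for illustration only.
mm_group_a = {"low": 0.60, "mid": 0.50, "high": 0.40}
mm_group_b = {"low": 0.52, "mid": 0.49, "high": 0.49}


def amces(marginal_means, reference):
    """AMCE of each level: its marginal mean minus the reference level's."""
    base = marginal_means[reference]
    return {level: value - base for level, value in marginal_means.items()}


def diff_in_mms(mm_a, mm_b):
    """Difference in marginal means (group A minus group B) for each level."""
    return {level: mm_a[level] - mm_b[level] for level in mm_a}


def diff_in_amces(mm_a, mm_b, reference):
    """Difference in conditional AMCEs (A minus B) for a given reference."""
    a = amces(mm_a, reference)
    b = amces(mm_b, reference)
    return {level: a[level] - b[level] for level in mm_a}


true_diff = diff_in_mms(mm_group_a, mm_group_b)
print("difference in MMs:", {k: round(v, 2) for k, v in true_diff.items()})

# The same data yield a different set of differences-in-AMCEs for each
# reference category: each set is the difference in MMs shifted by the
# difference in MMs at the reference level.
for ref in mm_group_a:
    implied = diff_in_amces(mm_group_a, mm_group_b, ref)
    print(f"reference = {ref}:", {k: round(v, 2) for k, v in implied.items()})
```

In this toy example the implied difference for the middle level is negative under one reference category and positive under another, although the underlying difference in marginal means never changes.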
Improved Subgroup Analyses in Conjoint Designs
Researchers and consumers of conjoints interested in describing levels of respondent
favorability toward profiles with varying features can avoid the inferential errors that
accompany conditional AMCEs by focusing attention on (subgroup) marginal means,
differences between subgroup marginal means to infer subgroup differences in
preferences toward particular features, and omnibus nested model comparisons to infer
subgroup differences across many features. To demonstrate each of these three
techniques we provide a complete example based upon Hainmueller, Hopkins, and
Yamamoto’s analysis of their immigration conjoint by respondent ethnocentrism, which
finds that “the patterns of support are generally similar for respondents irrespective of
their level of ethnocentrism” (Hainmueller, Hopkins, and Yamamoto, 2014, 22). First,
we show how different reference categories could have led to distinctly different
conditional AMCEs and, therefore, interpretations of subgroup preference similarity.
Second, we show how differences in marginal means clearly convey the similarity of
these two subgroups without any sensitivity to reference category. Finally, we show
how nested model comparisons would have provided Hainmueller, Hopkins, and
Yamamoto with a statistical test of the claimed similarity in levels of support between
these two respondent subgroups.

Figure 4: Comparison of AMCEs for Low- and High-Ethnocentrism Respondents
Using Two Alternative Reference Category Choices for Three Features from
Hainmueller et al.’s (2014) Immigration Experiment
[Figure omitted. Panels A and B plot the estimated AMCE (horizontal axis, −0.2 to
0.2) for each level of the country of origin, job, and education features, separately for
high- and low-ethnocentrism respondents.]

To begin, consider the left and right facets of Figure 4, which show estimated
subgroup AMCEs for three features from the immigration study. In panel “A” (left),
all features are configured so that the reference category is the one with the largest
difference in levels of support between the two subgroups thus distorting the size of
differences at all other levels. In panel “B” (right), all features are configured so that
the reference category is the one with the smallest difference in preferences between
the two subgroups.
Panel A gives the impression that there are significant differences in preferences
between high and low ethnocentrism respondents toward immigrants from different
countries of origin, with different careers, and with different educational attainments
because the reference category choice cascades the difference in reference category
favorability into AMCEs for all other feature levels. By contrast, Panel B gives the
impression that these differences are negligible. The experimental data and analytic
approach in the two portrayals are identical; the only difference is the choice of reference
category. Given what we have shown about the relationship between differences in
conditional AMCEs and differences in conditional marginal means, Panel B is a more
“truthful” visualization, which Cairo (2016) uses to mean avoidance of self-deception
in the presentation of data, and a more “functional” visualization, by which Cairo
means choosing graphics based on how they will be interpreted by the visualization’s
consumers. The differences between subgroup AMCEs there more accurately convey
differences in underlying preferences because the reference categories used in Panel B
are the most similar between the two groups.
Figure 5: Differences in Conditional Marginal Means, by Ethnocentrism, for Three
Features From Hainmueller et al.’s (2014) Immigration Experiment
[Figure omitted. Horizontal axis: Estimated Difference in Marginal Means (−0.2 to
0.2), shown for each level of the job, country of origin, and education features.]

Next, making a comparison of levels of favorability toward different types of
immigrants without using AMCEs would have been even more truthful. Figure 5 directly
shows that comparison of preferences as differences in subgroup marginal means
between the two groups for these three features, with 95% confidence intervals for the
difference.⁸ The two groups indeed have similar preferences, something that would
have been apparent had the conditional AMCEs in the right panel of Figure 4 been
presented, but far less obvious had those in the left panel been presented. Pairwise
difference in means tests would provide formal procedures for testing the statistical
significance of these differences.

⁸ A presentation of subgroup marginal means for all features can be found in Appendix E.
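
A pairwise comparison of this kind can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure used for Figure 5: the data are invented, the function names are ours, and the interval is a normal approximation that treats observations as independent. Conjoint data normally contain several rated profiles per respondent, so an applied analysis would usually account for that clustering.

```python
import math


def marginal_mean(outcomes):
    """Mean outcome across all profiles containing a given feature level."""
    return sum(outcomes) / len(outcomes)


def diff_in_marginal_means(y_a, y_b, z=1.96):
    """Difference in marginal means (A minus B) with an approximate 95% interval.

    Assumes independent observations and uses a normal approximation;
    clustering by respondent is ignored in this sketch.
    """
    mean_a, mean_b = marginal_mean(y_a), marginal_mean(y_b)
    var_a = sum((y - mean_a) ** 2 for y in y_a) / (len(y_a) - 1)
    var_b = sum((y - mean_b) ** 2 for y in y_b) / (len(y_b) - 1)
    se = math.sqrt(var_a / len(y_a) + var_b / len(y_b))
    diff = mean_a - mean_b
    return diff, (diff - z * se, diff + z * se)


# Invented 0/1 choice outcomes for profiles containing one feature level,
# split by a hypothetical respondent subgroup.
high_group = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
low_group = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

diff, (lower, upper) = diff_in_marginal_means(high_group, low_group)
print(f"difference in MMs: {diff:.2f}, 95% CI: [{lower:.2f}, {upper:.2f}]")
```

No reference category appears anywhere in this calculation, which is the property that makes differences in marginal means suitable for describing subgroup differences in preferences.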
Yet, finally, the similarity of subgroup preferences in conjoints is often character-
ized in an omnibus fashion, as in the quote from Hainmueller, Hopkins, and Yamamoto
(2014) describing “patterns of support.” An appropriate test in such cases is one that
evaluates whether a model of support that accounts for group differences better fits the
data than a model of support with only conjoint features as predictors. This type of
test is known as a “nested model comparison” which compares the fit of a “restricted”
regression (the restriction being that interactions between features and a subgroup
identifier are held to be zero) nested within an “unrestricted” regression that allows
for arbitrary interactions between conjoint features and the subgroup identifier.
Formally, a nested model comparison provides an F-test of the null hypothesis that all
interaction terms are equal to zero.⁹
To make this concrete, for a feature with four levels (one treated as a reference
category), the first (restricted) equation would be:

$$Y = \beta_0 + \beta_1 \text{Level}_2 + \beta_2 \text{Level}_3 + \beta_3 \text{Level}_4 + u \tag{1}$$

The second (unrestricted) equation would allow for interactions between feature levels
and the subgroup identifier:

$$\begin{aligned}
Y = {} & \beta_0 + \beta_1 \text{Level}_2 + \beta_2 \text{Level}_3 + \beta_3 \text{Level}_4 + \beta_4 \text{Group} \\
& + \beta_5 \text{Level}_2 \times \text{Group} + \beta_6 \text{Level}_3 \times \text{Group} + \beta_7 \text{Level}_4 \times \text{Group} + u
\end{aligned} \tag{2}$$

While Equation 1 imposes the constraint that $\beta_4 = \beta_5 = \beta_6 = \beta_7 = 0$,
Equation 2 allows for subgroup differences in favorability. Testing this null entails
computing an F-statistic comparing the fit of each equation:

$$F = \frac{(SSR_{\text{Restricted}} - SSR_{\text{Unrestricted}}) / r}{SSR_{\text{Unrestricted}} / (n - k - 1)} \tag{3}$$

where $SSR_{\text{Restricted}}$ is the sum of squared residuals for Equation 1,
$SSR_{\text{Unrestricted}}$ is the sum of squared residuals for Equation 2, $r$ is the
number of restrictions (in the above example, 4), $n$ is the number of cases, and $k$ is
the number of feature levels in the unrestricted model.¹⁰

For the education feature, the resulting F-test for the model comparison in this case
again gives us little reason to believe there are subgroup differences:
F(7, 11493) = 0.68, p ≤ 0.69. We could repeat such pairwise comparisons or omnibus
comparisons for each feature in the design, for country of origin (F(10, 11490) = 1.56,
p ≤ 0.11) or job (F(11, 11489) = 0.87, p ≤ 0.56), or for all features as a whole
(F(98, 11402) = 1.16, p ≤ 0.14).

⁹ Like any ANOVA, this hypothesis test may yield substantively different insight from a
series of tests of pairwise mean differences. Figure 5 shows three instances where the
95% confidence intervals for pairwise differences in marginal means do not include zero
even though the omnibus test fails to reject the null at α = 0.05.
¹⁰ Note that this test is not sensitive to reference category even though it requires
specifying a regression equation.
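
Equation 3 can be computed by hand for a toy case. The sketch below is illustrative only: the data are invented, the function names are ours, and it covers a single categorical feature, for which the fitted values of the restricted model are the level means and those of the fully interacted model are the level-by-group cell means. The value passed for `k` in the example call (the count of non-intercept coefficients in the unrestricted model) is a common textbook convention and an assumption of this sketch; no p-value is computed because that would require the F distribution.

```python
from collections import defaultdict


def ssr_from_cells(rows, key):
    """Sum of squared residuals when each row is predicted by its cell mean.

    `key(row)` identifies the cell a row belongs to.
    """
    cells = defaultdict(list)
    for row in rows:
        cells[key(row)].append(row["y"])
    means = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}
    return sum((row["y"] - means[key(row)]) ** 2 for row in rows)


def nested_f(ssr_restricted, ssr_unrestricted, r, n, k):
    """F-statistic of Equation 3 for a nested model comparison."""
    numerator = (ssr_restricted - ssr_unrestricted) / r
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator


# Invented toy data: one feature with two levels, observed in two subgroups.
toy_cells = {
    ("A", 0): [1, 1, 0, 1],
    ("A", 1): [0, 1, 0, 0],
    ("B", 0): [0, 1, 0, 0],
    ("B", 1): [1, 0, 1, 1],
}
rows = [
    {"level": level, "group": group, "y": y}
    for (level, group), ys in toy_cells.items()
    for y in ys
]

# Restricted model: feature levels only. Unrestricted: levels, group, and
# their interaction, which for one feature is saturated in the cells.
ssr_r = ssr_from_cells(rows, key=lambda row: row["level"])
ssr_u = ssr_from_cells(rows, key=lambda row: (row["level"], row["group"]))

# r = 2 restrictions (group main effect and one interaction);
# k = 3 non-intercept coefficients in the unrestricted model (assumption).
f_stat = nested_f(ssr_r, ssr_u, r=2, n=len(rows), k=3)
print(f"SSR restricted = {ssr_r}, SSR unrestricted = {ssr_u}, F = {f_stat}")
```

Because both sums of squared residuals depend only on level means and cell means, the statistic is the same whichever level is treated as the reference category.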
This visual display in Figure 5 and these statistical tests make clear what could not
be directly inferred from conditional AMCEs alone: there are indeed no sizeable and
only a few statistically apparent differences in preferences between the two groups.
This kind of nested model comparison test can also be used to assess heterogeneity
across conjoint features (see also Egami and Imai, 2018). For example, Teele, Kalla,
and Rosenbluth (2018) report just such a test for how effects of features other than
candidate sex may differ between male and female candidates, finding no such het-
erogeneity (8–9). Fortunately, the original analysis accurately detected an absence of
subgroup differences, yet a subtly different set of analytic decisions about reference
categories (as shown in Figure 4) could have led to quite different inferences. As an
example, Bechtel and Scheve (2013) argue that their conjoint results show “individ-
uals in all four countries [Germany, France, United States, United Kingdom] largely
agree on which dimensions are important and to what extent” (Bechtel and Scheve,
2013, 13765), but a nested model comparison shows the countries do differ in their
preferences (F(54, 67982) = 3.72, p ≤ 0.00). This cross-country variation is largely
driven by differences in sensitivity to the monthly household cost feature
(F(15, 67995) = 3.80, p ≤ 0.00), with the United Kingdom and United States being more
cost sensitive than
Germany and France. Visual comparisons of conditional AMCEs can sometimes pro-
vide accurate insights into subgroup differences in preferences (as in the Hainmueller,
Hopkins, and Yamamoto case), but ultimately there is no guarantee that they do in
any particular analysis.
Conclusion
This article has identified several challenges related to the analysis and reporting of
conjoint experimental designs, particularly analyses of subgroup differences. We sug-
gest that conjoint analyses should report not only average marginal component effects
(AMCEs) but also descriptive quantities about levels of favorability that better convey
underlying preferences over profile features and better convey subgroup differences
in those preferences. Marginal means contain all of the information provided by AM-
CEs and more. Consequently, our intention here is not to substantively undermine
any previous set of results, but instead to urge researchers moving forward to demon-
strate considerable caution in how they design, analyze, and present the results of
these types of descriptive experiments and how they test for differences in preferences
between subgroups.
We have relatively straightforward and hopefully uncontroversial advice for how
analysts of conjoint experiments should proceed:
1. Always report unadjusted marginal means when attempting to provide a descrip-
tive summary of respondent preferences in addition to, or instead of, AMCEs.
2. Exercise caution when explicitly, or implicitly, interpreting differences-in-AMCEs
across subgroups. Differences-in-AMCEs are differences in effect sizes for sub-
groups, not statements about the relative favorability of the subgroups toward
profiles with a given feature. Heterogeneous effects do not necessarily mean dif-
ferent underlying preferences. If differences in AMCEs are reported, the choice
of reference categories should be discussed explicitly and diagnostics should be
provided to justify it.
3. When descriptively characterizing differences in preference level between sub-
groups, directly estimate the subgroup difference using conditional marginal means
and differences between conditional marginal means, rather than relying on the
difference-in-AMCEs.
4. To formally test for group differences in preferences, regression with interaction
terms between the subgrouping covariate and all feature levels will generate
estimates of level-specific differences in preferences via the coefficients on the
interaction terms. A nested model comparison between this equation and one
without such interactions provides an omnibus test of subgroup differences, which
should be reported when characterizing overall patterns of subgroup differences.
Following this advice, we hope, will allow researchers to more clearly and more accu-
rately represent descriptive results of conjoint experiments.
The popularity of conjoint analyses in recent years highlights the power of the de-
sign and the important contributions made by Hainmueller, Hopkins, and Yamamoto
(2014) in providing a novel causal interpretation of these fully randomized factorial
designs. Yet with new tools always come new challenges. The now-common prac-
tice of descriptively interpreting conjoints requires more caution than is immediately
obvious. To facilitate improved analysis and, especially, to provide easy-to-use tools
for calculating marginal means and performing reference category selection diagnos-
tics, we provide software called cregg (Leeper, 2018) available from the Comprehen-
sive R Archive Network. Additionally, this manuscript is written as a reproducible
knitr document (Xie, 2015) that contains complete code examples that will perform all
analyses and visualization used throughout this article. With these resources in-hand,
researchers should be well-equipped to analyze subgroup preferences in conjoint de-
signs without running into the analytic challenges discussed here.
References
Ballard-Rosa, Cameron, Lucy Martin, and Kenneth Scheve. 2016. “The Structure of
American Income Tax Policy Preferences.” The Journal of Politics 79(1): 1–16.
Bansak, Kirk, Jens Hainmueller, and Dominik Hangartner. 2016. “How economic, hu-
manitarian, and religious concerns shape European attitudes toward asylum seek-
ers.” Science 354(6309): 217–222.
Bechtel, Michael M., and Kenneth F. Scheve. 2013. “Mass Support for Global Climate
Agreements Depends on Institutional Design.” Proceedings of the National Academy of
Sciences 110(34): 13763–13768.
Bechtel, Michael M., Federica Genovese, and Kenneth F. Scheve. 2017. “Interests,
Norms and Support for the Provision of Global Public Goods: The Case of Climate
Co-operation.” British Journal of Political Science: Forthcoming.
Bechtel, Michael M., Jens Hainmueller, and Yotam Margalit. 2017. “Policy Design and
Domestic Support for International Bailouts.” European Journal of Political Research
56(4): 864–886.
Cairo, Alberto. 2016. The Truthful Art. New Riders.
Campbell, Rosie, Philip Cowley, Nick Vivyan, and Markus Wagner. 2016. “Legislator
Dissent as a Valence Signal.” British Journal of Political Science: Forthcoming.
Carey, John M., Kevin R. Carman, Katherine P. Clayton, Yusaku Horiuchi, Mala
Htun, and Brittany Ortiz. 2018. “Who wants to hire a more diverse faculty? A
conjoint analysis of faculty and student preferences for gender and racial/ethnic
diversity.” Politics, Groups, and Identities: Forthcoming.
Carlson, Elizabeth. 2015. “Ethnic Voting and Accountability in Africa: A Choice Ex-
periment in Uganda.” World Politics 67(2): 353–385.
Carnes, Nicholas, and Noam Lupu. 2016. “Do Voters Dislike Working-Class Candi-
dates? Voter Biases and the Descriptive Underrepresentation of the Working Class.”
American Political Science Review 110(04): 832–844.
Clayton, Katherine, Jeremy Ferwerda, and Yusaku Horiuchi. 2018. “Exposure to
Immigration and Admission Preferences: Evidence from France.” Unpublished paper,
Dartmouth University.
Druckman, James N., Donald P. Green, James H. Kuklinski, and Arthur Lupia. 2006.
“The Growth and Development of Experimental Research in Political Science.”
American Political Science Review 100(4): 627–635.
Egami, Naoki, and Kosuke Imai. 2018. “Causal Interaction in Factorial Experiments:
Application to Conjoint Analysis.” Journal of the American Statistical Association:
Forthcoming.
Eggers, Andrew C., Nick Vivyan, and Markus Wagner. 2018. “Corruption, Account-
ability, and Gender: Do Female Politicians Face Higher Standards in Public Life?”
The Journal of Politics 80(1): 321–326.
Franchino, Fabio, and Francesco Zucchini. 2014. “Voting in a Multi-dimensional Space:
A Conjoint Analysis Employing Valence and Ideology Attributes of Candidates.”
Political Science Research and Methods 3(2): 221–241.
Gaines, Brian J., James H. Kuklinski, and Paul J. Quirk. 2007. “The Logic of the Survey
Experiment Reexamined.” Political Analysis 15(1): 1–20.
Gallego, Aina, and Paul Marx. 2017. “Multi-dimensional preferences for labour market
reforms: a conjoint experiment.” Journal of European Public Policy 24(7): 1027–1047.
Green, Donald P., and Holger L. Kern. 2012. “Modeling Heterogeneous Treatment Ef-
fects in Survey Experiments with Bayesian Additive Regression Trees.” Public Opin-
ion Quarterly 76(3): 491–511.
Grimmer, Justin, Solomon Messing, and Sean J. Westwood. 2017. “Estimating Het-
erogeneous Treatment Effects and the Effects of Heterogeneous Treatments with En-
semble Methods.” Political Analysis 25(4): 413–434.
Hainmueller, Jens, and Daniel J. Hopkins. 2015. “The Hidden American Immigration
Consensus: A Conjoint Analysis of Attitudes toward Immigrants.” American Journal
of Political Science: Forthcoming.
Hainmueller, Jens, Daniel J. Hopkins, and Teppei Yamamoto. 2014. “Causal Inference
in Conjoint Analysis: Understanding Multi-Dimensional Choices via Stated Prefer-
ence Experiments.” Political Analysis 22: 1–30.
Hankinson, Michael. 2018. “When Do Renters Behave Like Homeowners? High Rent,
Price Anxiety, and NIMBYism.” American Political Science Review 112(3): 473–493.
Hansen, Kasper M., Asmus L. Olsen, and Mickael Bech. 2014. “Cross-National Yard-
stick Comparisons: A Choice Experiment on a Forgotten Voter Heuristic.” Political
Behavior 37(4): 767–789.
Kirkland, Patricia A., and Alexander Coppock. 2017. “Candidate Choice Without
Party Labels.” Political Behavior 40(3): 571–591.
Leeper, Thomas J. 2018. cregg: Simple Conjoint Analyses and Visualization. R package
version 0.2.1.
Leeper, Thomas J., Sara B. Hobolt, and James Tilley. 2019. Replication Data for ‘Measur-
ing Subgroup Preferences in Conjoint Experiments’. doi:10.7910/DVN/ARHZU4.
Mummolo, Jonathan. 2016. “News from the Other Side: How Topic Relevance Limits
the Prevalence of Partisan Selective Exposure.” The Journal of Politics 78(3): 763–773.
Mummolo, Jonathan, and Clayton Nall. 2017. “Why Partisans Do Not Sort: The Con-
straints on Political Segregation.” The Journal of Politics 79(1): 45–59.
Mutz, Diana C. 2011. Population-Based Survey Experiments. Princeton, NJ: Princeton
University Press.
Oliveros, Virginia, and Christian Schuster. 2018. “Merit, Tenure, and Bureaucratic Be-
havior: Evidence From a Conjoint Experiment in the Dominican Republic.” Compar-
ative Political Studies 51(6): 759–792.
Ratkovic, Marc, and Dustin Tingley. 2017. “Sparse Estimation and Uncertainty with
Application to Subgroup Analysis.” Political Analysis 25(1): 1–40.
Sen, Maya. 2017. “How Political Signals Affect Public Support for Judicial Nomina-
tions.” Political Research Quarterly 70(2): 374–393.
Shmueli, Galit. 2010. “To Explain or to Predict?” Statistical Science 25(3): 289–310.
Sniderman, Paul M. 2011. “The Logic and Design of the Survey Experiment: An Auto-
biography of a Methodological Innovation.” In Cambridge Handbook of Experimental
Political Science, eds. James N. Druckman, Donald P. Green, James H. Kuklinski, and
Arthur Lupia. New York: Cambridge University Press.
Sobolewska, Maria, Silvia Galandini, and Laurence Lessard-Phillips. 2017. “The public
view of immigrant integration: multidimensional and consensual: Evidence from
survey experiments in the UK and the Netherlands.” Journal of Ethnic and Migration
Studies 43(1): 58–79.
Teele, Dawn Langan, Joshua Kalla, and Frances Rosenbluth. 2018. “The Ties That
Double Bind: Social Roles and Women’s Underrepresentation in Politics.” American
Political Science Review 112(3): 525–541.
Vivyan, Nick, and Markus Wagner. 2016. “House or home? Constituent preferences
over legislator effort allocation.” European Journal of Political Research 55(1): 81–99.
Wright, Matthew, Morris Levy, and Jack Citrin. 2016. “Public Attitudes Toward Im-
migration Policy Across the Legal/Illegal Divide: The Role of Categorical and
Attribute-Based Decision-Making.” Political Behavior 38(1): 229–253.
Xie, Yihui. 2015. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida:
Chapman and Hall/CRC. ISBN 978-1498716963.
27
... Survey experiments have proven to be valuable tools for the investigation of public attitudes on international cooperation, ranging from climate change to potential reforms of the European Union ( Bechtel and Scheve 2013 ;Gampfer 2013 , Anderson, Bernauer, andKachi 2019 ;Dellmuth, Scholte, and Tallberg 2019 ;Hahm et al. 2019 ;Hahm, Hilpert, and König 2020 ;Kuhn, Nicoli, and Vandenbroucke 2020 ;Dellmuth and Tallberg 2021 ). Specifically, we used a full-profile conjoint analysis approach ( Hainmueller, Hopkins, and Yamamoto 2014 ;Leeper, Hobolt, and Tilley 2020 ). We showed respondents four pairs of UN profiles consisting of randomized combinations of reform or status quo options, and asked them to choose between and rate the two options they saw on each screen. ...
... Our analyses focus on four quantities of interest: marginal means (MMs), average marginal component effects (AM-CEs), differences in MMs (MM diffs), and omnibus Ftests ( Hainmueller, Hopkins, and Yamamoto 2014 ;Leeper, Hobolt, and Tilley 2020 ). Each of them contributes information relevant to different aspects of our research questions. ...
... AMCE analysis thus allows us to test whether one attribute level's MM is statistically different from another's, for example, whether direct enforcement by the UN is preferred to collective enforcement by UN member states (in a specific subsample of people or in the aggregate sample). Similar to Hahm, Hilpert, and König (2020) with respect to the European Union, we are particularly interested in the UN's status quo design features as baselines, testing whether and to what extent the reform proposals have positive or negative effects. 2 In the Tables folder of our online supplementary material, we present all our AMCE results using different reference categories, bearing in mind the advice of Leeper, Hobolt, and Tilley (2020) . ...
Article
Full-text available
Scholars and policy makers have intensely debated institutional reforms of the United Nations (UN) since its creation. Yet, relatively little attention has been given to institutional design preferences among the public in UN member states. This study examines two questions: Which possible rules concerning UN authority and representation do citizens prefer? Which personal and country characteristics are associated with their varying institutional preferences? A population-based conjoint survey experiment conducted in Argentina, China, India, Russia, Spain, and the United States is used to identify public preferences on nine distinct institutional design dimensions figuring prominently in UN reform debates. We find widespread support for increasing or at least maintaining UN authority over member states and for handing control over its decision-making to UN organs that would represent the citizens of every member state more directly. Citizens’ institutional preferences are associated with their political values and vary depending on whether their home countries would gain or lose influence from a specific reform.
... A conjoint analysis evaluates the association of vaccine attributes with the decision over one of the two COVID-19 vaccine choices provided in one task. This experimental design has been widely employed in the marketing research to examine consumer preferences (Green et al., 2001) and increasingly used in the social sciences (Hainmueller et al., 2014;Leeper et al., 2020) and the medical research (Almario et al., 2018). This design is especially effective in minimizing social desirability bias and predicting real-world behavior (Hainmueller et al., 2015). ...
... By contrast, the bias against vaccines manufactured in Russia and China is homogeneously strong and negative for all nationalism subgroups. To complement these findings, we provide the Marginal means (MMs) estimates in Online Appendix H. Marginal means (MMs) estimates are an important reference for AMCEs' causal interpretation and subgroup comparisons in a context when there are no well-defined baselines in a conjoint experiment (Leeper et al., 2020). While this is not a concern for our experiment since the homegrown vaccine was a natural choice, we report the results of MMs in Online Appendix H. Figures S5 and S6 show that our results remain unchanged. ...
Article
What types of vaccines are citizens most likely to accept? We administered a conjoint experiment requesting 15,000 adult citizens across 14 individual countries from around the world to assess 450,000 profiles of vaccines that randomly varied on seven attributes. Beyond vaccine fundamentals such as efficacy rate, number of doses, and duration of the protection, we find that citizens systematically favor vaccines developed and produced in their own country of residence. These results indicate that a rarely discussed form of vaccine nationalism shapes vaccine acceptance. The extent of preference in favor of vaccines developed and produced within the national borders is particularly large among citizens who identify more strongly with their nation, suggesting nationalism plays a role in explaining the bias in favor of vaccines developed and produced locally. This public opinion bias on vaccine preferences has significant theoretical and practical implications.
... We look at three subgroups: Islamists and non-Islamists; opponents and supporters of the government; and individuals who support or oppose suicide terrorism. It should be noted that we compare AMCEs across these subgroups rather than marginal means because we are explicitly interested in the direction of and difference between the treatment effects of the political levels relative to the apolitical reference categories (Leeper, Hobolt, and Tilley 2020). 10 For each subgroup analysis, we report only coefficients for the attributes of interest, with full results available in SI-C in the Online Supplementary Material. ...
... First, we compare self-described Islamists (n = 3,463) to non-Islamists (n = 8,562). As part of the survey, respondents were asked to identify their political ideology from a list of eleven options, 10 Marginal means are better for comparing subgroup effects in conjoint experiments when there is no clear justification for using one reference category over the others and when researchers are interested in estimating the difference in levels of subgroup favorability toward the various attributes rather than differences in average treatment effects (Leeper, Hobolt, and Tilley 2020). Comparing AMCEs is more relevant to our hypothesis. ...
Article
A growing body of research demonstrates that political involvement by Christian religious leaders can undermine the religion's social influence. Do these negative consequences of politicization also extend to Islam? Contrary to scholarly and popular accounts that describe Islam as inherently political, we argue that Muslim religious leaders will weaken their religious authority when they engage with politics. We test this argument with a conjoint experiment implemented on a survey of more than 12,000 Sunni Muslim respondents in eleven Middle Eastern countries. The results show that connections to political issues or politically active religious movements decrease the perceived religious authority of Muslim clerics, including among respondents who approve of the clerics' political views. The article's findings shed light on how Muslims in the Middle East understand the relationship between religion and politics, and they contribute more broadly to understanding of how politicized religious leaders can have negative repercussions for religion.
... This also implies that we do not follow the standard approach in conjoint experiments and do not estimate Average Marginal Component Effects (AMCE; Hainmueller et al., 2014) or Marginal Means (MMs; Leeper et al., 2020). Instead, we estimate logistic regressions in which the dependent variable is 1 if a respondent selected the illiberal politician and 0 otherwise. ...
Article
Many democracies are witnessing the rise and continuing success of parties and politicians who oppose fundamental principles of liberal democracy. Recent research finds that voters support illiberal politicians, because they trade off policy congruence against attitudes toward liberal democracy. Other studies, however, suggest that authoritarian and populist voters might actually have a preference to vote for illiberal candidates. We argue that both factors interact: Authoritarian and populist voters are more willing to trade off policy representation against support for liberal democracy. To test this mechanism, we rely on a survey experiment conducted in Germany. The results clearly demonstrate that voters indeed trade off policy congruence against liberal democracy. Moreover, this effect is particularly strong for populist and authoritarian voters. Overall, the results have important implications for understanding when and which voters support or oppose liberal democracy.
... In practice, we regressed a dummy variable indicating whether a respondent preferred a particular set of attributes, using cluster-robust SEs to account for within-respondent clustering. We displayed the results in the form of marginal means (49). Given that our analysis combines responses across different contexts, we also ran mixed-effects regressions with the data from all cities independently (SI Appendix, Tables S2-S4). ...
Article
Dense and compact cities yield several benefits for both the population and the environment, including the containment of urban sprawl, reduced carbon emissions, and increased housing supply. Densification of the built environment is thus a key contemporary urban planning paradigm worldwide. However, local residents often oppose urban densification, motivating a need to understand their underlying concerns. In order to do so, we examined different factors driving public acceptance of housing densification projects through a combination of a conjoint survey experiment and different proximity frames among 12,402 participants across Berlin, Chicago, London, Los Angeles, New York, and Paris. Respondents compared housing densification projects with varying attributes, including their geographic proximity, project-related factors, and accompanying planning instruments. The results indicate that the acceptance of such projects decreases with project proximity and that project-related factors, such as the type of investor, usage, and climate goals, impact densification project acceptance. More specifically, we see a negative effect on acceptance levels for projects with for-profit investors and a positive effect when the suggested developments are mixed use or climate neutral. In addition, planning instruments, such as rent control, inclusionary zoning, and participatory planning, appear to positively influence acceptance. Interestingly, a cross-continental comparison shows overall higher acceptance levels of densification by US respondents. These multifaceted results allow us to better understand what drives people's acceptance of housing projects and how projects and planning processes can be designed to increase democratic acceptance of urban densification.
... AMCEs capture the change in support for the policy bundle caused by a specific stringency level of the respective policy instrument, relative to a baseline value (Hainmueller et al. 2014). Marginal Means, in contrast, indicate the overall proportion, or probability, of support for a given value of a policy instrument, averaging over all values of the other policy instruments (Leeper et al. 2019). It thus indicates average support for a policy bundle containing the specific attribute value of a policy instrument within the bundle. ...
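The distinction drawn in this excerpt can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the excerpt: the toy `ratings` data, the attribute levels `low`, `medium`, and `high`, and the helper names `marginal_means` and `amces` are all hypothetical. It also assumes a fully randomized design with uniform level probabilities, in which case an AMCE can be read as the difference between a level's marginal mean and the marginal mean of the baseline level.

```python
from collections import defaultdict

# Hypothetical toy data: (stringency level of one policy instrument,
# 1 if the policy bundle was supported, 0 otherwise).
ratings = [
    ("low", 1), ("low", 1), ("low", 0), ("low", 1),
    ("medium", 1), ("medium", 0), ("medium", 1), ("medium", 0),
    ("high", 0), ("high", 0), ("high", 1), ("high", 0),
]

def marginal_means(rows):
    """Share of supported bundles at each level, averaging over everything else."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for level, supported in rows:
        totals[level] += supported
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in counts}

def amces(mm, baseline):
    """Difference between each level's marginal mean and the baseline's.

    Assumption: uniform, fully randomized levels, so this difference
    matches the AMCE relative to the chosen baseline.
    """
    return {level: value - mm[baseline]
            for level, value in mm.items() if level != baseline}

mm = marginal_means(ratings)
amce_vs_low = amces(mm, "low")
print(mm)
print(amce_vs_low)
```

The marginal means describe the level of support at each attribute value, while the AMCEs only describe changes relative to whichever level is chosen as the baseline.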
Article
Environmental protection efforts commonly make use of two types of government interventions: command and control policies (C&C) and market-based instruments (MBIs). While MBIs are favored for their economic efficiency, visible prices on pollution may generate political backlash. We examine whether citizens are more likely to support policies that tend to obfuscate policy costs (C&C), as opposed to MBIs, which impose visible costs. Using conjoint experiments in Beijing and New Delhi, we examine support for 'policy bundles', including both C&C policies and MBIs, aimed at limiting air pollution from vehicles. In both cities, increasing fuel taxes (a MBI) reduces policy support. However, pledging revenue usage from fuel taxes to subsidize electric cars or public transport eliminates this negative effect. Furthermore, individuals with a lower evaluation of their government respond more negatively to MBIs. MBIs may be economically efficient, but are politically difficult unless policy-makers can offset visible costs through additional measures.
... illustrates to what extent partisans punish undemocratic behavior depending on the number of parties in the system. The upper panel shows the marginal means of vote shares for in-partisan candidates who either violate democratic principles or behave democratically compliant in two- and three-party scenarios across the increase-in-profiles (IP) and constituency-information (CI) designs (Leeper et al. 2020).8 The lower panel provides the direct test of the theoretical expectation by illustrating the effects of undemocratic behavior and interaction with the party system treatments. The theoretical expectation gains support if the interactions between undemocratic behavior and three-party scenarios are significantly negative. ...
Preprint
Does the number of choices offered by the party system affect whether citizens punish undemocratic behavior? I employ two innovative candidate choice experiments fielded in England to answer this question. Specifically, I implement two designs manipulating the number and effective number of parties displayed between two and three, reflecting the ambiguity of England's party system. Contrary to expectations, I find that Labour and Conservative identifiers do not defect more from undemocratic in-partisan candidates when they face three (effective) parties---Labour, the Conservatives, and the Liberal Democrats---rather than just the two major parties. Instead, defection from undemocratic in-partisans to the out-party drops and relocates to the Liberal Democrats even when the latter have no chance of winning. These findings highlight that more parties do not generate more defection from undemocratic politicians---and that voters prefer defecting to the option ideologically nearest to the in-party even when this option is chanceless.
... The AMCE represents the average difference in the probability of being chosen when comparing two different attribute values (e.g., a candidate who worked as a lawyer compared to a candidate who worked as a firefighter) where the average is taken over all other possible attribute combinations. To make sure our results are not driven by our choice of reference categories and to make valid subgroup comparisons, we also provide the estimates of marginal means in addition to AMCEs (Leeper et al., 2020). ...
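The concern about reference categories raised in this excerpt can be sketched with a small Python example. All numbers are hypothetical: `group_a` and `group_b` stand for two respondent subgroups, the values are invented marginal means expressed in percentage points, and the `teacher` level is added purely for illustration. The sketch again assumes uniform full randomization, so that a conditional AMCE equals a difference in marginal means within a subgroup.

```python
# Hypothetical marginal means (percent of profiles chosen) by candidate occupation.
mm_by_group = {
    "group_a": {"lawyer": 60, "firefighter": 50, "teacher": 40},
    "group_b": {"lawyer": 50, "firefighter": 50, "teacher": 50},
}

def amce(mm, level, baseline):
    """Conditional AMCE as a difference in marginal means (assumes uniform randomization)."""
    return mm[level] - mm[baseline]

def subgroup_gap_in_amce(level, baseline):
    """Difference between the two subgroups' conditional AMCEs for one level."""
    a = amce(mm_by_group["group_a"], level, baseline)
    b = amce(mm_by_group["group_b"], level, baseline)
    return a - b

def subgroup_gap_in_mm(level):
    """Difference between the two subgroups' marginal means for one level."""
    return mm_by_group["group_a"][level] - mm_by_group["group_b"][level]

gap_vs_lawyer = subgroup_gap_in_amce("firefighter", "lawyer")
gap_vs_teacher = subgroup_gap_in_amce("firefighter", "teacher")
gap_mm = subgroup_gap_in_mm("firefighter")
print(gap_vs_lawyer, gap_vs_teacher, gap_mm)
```

In this toy setup the two subgroups are equally favorable toward firefighters, yet the difference in conditional AMCEs is negative with `lawyer` as the baseline and positive with `teacher` as the baseline. The difference in marginal means does not depend on any baseline.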
Preprint
Is populism electorally effective and, if so, why? While scholars agree that populism is a complex communication construct that combines anti-pluralist, people-centric, and moralist rhetoric with various political stances, it is unclear to what extent its appeal among voters is based on each of these components or their combination. Since populist rhetoric is always combined with other ideologies, it is also hard to separate the effects of populism from the hosting ideology. To address these questions, we conduct a novel conjoint experiment to determine which parts of populist communication and related policy messaging are most effective for candidate choice. Our US survey asks respondents to evaluate pairs of realistic campaign messages with varying populism-related characteristics given by hypothetical primary candidates. Although party-congruent policy positions are expectedly much more popular, we find that none of the rhetorical elements of populist speech have had an independent or a combined effect on candidate choice. We conclude by discussing the implications of our findings for understanding the role of populist rhetoric in politics and its apparent (in)effectiveness.
... Our quantity of interest is the difference in marginal means for every feature in our three conjoint experiments between partisans and non-partisans. We focus on the marginal means, instead of the more heavily used Average Interactive Component Effect (Hainmueller et al., 2014), because these quantities are more appropriate to identify heterogeneous, subgroup effects when dealing with conjoint designs (Leeper et al., 2020). In addition, we separate the results between leftist and conservative partisans using the vote choice independent variable. ...
Preprint
Our paper describes how the users' decisions to share content alter the frequencies of the frame elements observed by social media peers. Changes in the frequency of distinct frame elements, in different regions of a social network, shape how individuals interpret, classify, and define situations and events. We label this process Network Activated Frames (NAF). We test the mechanisms behind NAF with an original image-based conjoint design that replicates network activation in three surveys. Results show that partisans share more content than non-partisans and that their preferences are different from that of non-partisans. Our findings show that a network of peers with cross-cutting ideological preferences may be perceived as a bubble if partisans amplify content they like at higher rates. Beginning with fully randomized probabilities, the output from our experiments is more extreme than the preference of the median users, as partisans activate more and different frame elements than non-partisans. We implement the survey experiments in Argentina, Brazil, and Mexico.
Article
How rampant is political discrimination in the United States, and how does it compare to other sources of bias in apolitical interactions? We employ a conjoint experiment to juxtapose the discriminatory effects of salient social categories across a range of contexts. The conjoint framework enables identification of social groups’ distinct causal effects, ceteris paribus, and minimizes ‘cheap talk,’ social desirability bias, and spurious conclusions from statistical discrimination. We find pronounced discrimination along the lines of party and ideology, as well as politicized identities such as religion and sexual orientation. We also find desire for homophily along more dimensions, as well as specific out-group negativity. We also find important differences between Democrats and Republicans, with discrimination by partisans often focusing on other groups with political relevance of their own. Perhaps most striking, though, is how much discrimination emerges along political lines – both partisan and ideological. Yet, counter-stereotypic ideological labels can counter, and even erase, the discriminatory consequences of party.
Article
What explains the scarcity of women and under-represented minorities among university faculty relative to their share of Ph.D. recipients? Among many potential explanations, we focus on the “demand” side of faculty diversity. Using fully randomized conjoint analysis, we explore patterns of support for, and resistance to, the hiring of faculty candidates from different social groups at two large public universities in the U.S. We find that faculty are strongly supportive of diversity: holding other attributes of (hypothetical) candidates constant, for example, faculty at both universities are between 11 and 21 percentage points more likely to prefer a Hispanic, black, or Native American candidate to a white one. Furthermore, preferences for diversity in faculty hiring are stronger among faculty than among students. These results suggest that the primary reason for the lack of diversity among faculty is not a lack of desire to hire them, but the accumulation of implicit and institutionalized biases, and their related consequences, at later stages in the pipeline.
Article
Mitigating climate change requires countries to provide a global public good. This means that the domestic cleavages underlying mass attitudes toward international climate policy are a central determinant of its provision. We argue that the industry-specific costs of emission abatement and internalized social norms help explain support for climate policy. To evaluate our predictions we develop novel measures of industry-specific interests by cross-referencing individuals’ sectors of employment and objective industry-level pollution data and employing quasi-behavioral measures of social norms in combination with both correlational and conjoint-experimental data. We find that individuals working in pollutive industries are 7 percentage points less likely to support climate co-operation than individuals employed in cleaner sectors. Our results also suggest that reciprocal and altruistic individuals are about 10 percentage points more supportive of global climate policy. These findings indicate that both interests and norms function as complementary explanations that improve our understanding of individual policy preferences.
Article
Financial bailouts for ailing Eurozone countries face deep and widespread opposition among voters in donor countries, casting major doubts over the political feasibility of further assistance efforts. What is the nature of the opposition and under what conditions can governments obtain broader political support for funding such large-scale, international transfers? This question is addressed by distinguishing theoretically between ‘fundamental’ and ‘contingent’ attitudes. Whereas the former entail complete rejection or embrace of a policy, the latter depend on the specific features of the policy and could shift if those features are altered. Combining unique data from an original survey in Germany – the largest donor country – together with an experiment that varies salient policy dimensions, the analysis indicates that less than a quarter of the public exhibits fundamental opposition to the bailouts. Testing a set of theories on contingent attitudes, particular sensitivity is found to the burden-sharing and cost dimensions of the bailouts. The results imply that the choice of specific features of a rescue package has important consequences for building domestic support for international assistance efforts.
Article
This paper theorizes three forms of bias that might limit women's representation: outright hostility, double standards, and a double bind whereby desired traits present bigger burdens for women than men. We examine these forms of bias using conjoint experiments derived from several original surveys—a population survey of American voters and two rounds of surveys of American public officials. We find no evidence of outright discrimination or of double standards. All else equal, most groups of respondents prefer female candidates, and evaluate men and women with identical profiles similarly. But on closer inspection, all is not equal. Across the board, elites and voters prefer candidates with traditional household profiles such as being married and having children, resulting in a double bind for many women. So long as social expectations about women's familial commitments cut against the demands of a full-time political career, women are likely to remain underrepresented in politics.
Article
We study causal interaction in factorial experiments, in which several factors, each with multiple levels, are randomized to form a large number of possible treatment combinations. Examples of such experiments include conjoint analysis, which is often used by social scientists to analyze multidimensional preferences in a population. To characterize the structure of causal interaction in factorial experiments, we propose a new causal interaction effect, called the average marginal interaction effect (AMIE). Unlike the conventional interaction effect, the relative magnitude of the AMIE does not depend on the choice of baseline conditions, making its interpretation intuitive even for higher-order interactions. We show that the AMIE can be nonparametrically estimated using ANOVA regression with weighted zero-sum constraints. Because the AMIEs are invariant to the choice of baseline conditions, we directly regularize them by collapsing levels and selecting factors within a penalized ANOVA framework. This regularized estimation procedure reduces false discovery rate and further facilitates interpretation. Finally, we apply the proposed methodology to the conjoint analysis of ethnic voting behavior in Africa and find clear patterns of causal interaction between politicians' ethnicity and their prior records. The proposed methodology is implemented in an open source software package.
Article
How does spatial scale affect support for public policy? Does supporting housing citywide but “Not In My Back Yard” (NIMBY) help explain why housing has become increasingly difficult to build in once-affordable cities? I use two original surveys to measure how support for new housing varies between the city scale and neighborhood scale. Together, an exit poll of 1,660 voters during the 2015 San Francisco election and a national survey of over 3,000 respondents provide the first experimental measurements of NIMBYism. While homeowners are sensitive to housing’s proximity, renters typically do not express NIMBYism. However, in high-rent cities, renters demonstrate NIMBYism on par with homeowners, despite continuing to support large increases in the housing supply citywide. These scale-dependent preferences not only help explain the deepening affordability crisis, but show how institutions can undersupply even widely supported public goods. When preferences are scale dependent, the scale of decision-making matters.
Article
Previous research suggests that female politicians face higher standards in public life, perhaps in part because female voters expect more from female politicians than from male politicians. Most of this research is based on observational evidence. We assess the relationship between accountability and gender using a novel survey vignette experiment fielded in the United Kingdom in which voters choose between a hypothetical incumbent (who could be male or female, corrupt or noncorrupt) and another candidate. We do not find that female politicians face significantly greater punishment for misconduct. However, the effect of politician gender on punishment varies by voter gender, with female voters in particular more likely to punish female politicians for misconduct. Our findings have implications for research on how descriptive representation affects electoral accountability and on why corruption tends to correlate negatively with women’s representation.
Article
Randomized experiments are increasingly used to study political phenomena because they can credibly estimate the average effect of a treatment on a population of interest. But political scientists are often interested in how effects vary across subpopulations—heterogeneous treatment effects—and how differences in the content of the treatment affects responses—the response to heterogeneous treatments. Several new methods have been introduced to estimate heterogeneous effects, but it is difficult to know if a method will perform well for a particular data set. Rather than using only one method, we show how an ensemble of methods—weighted averages of estimates from individual models increasingly used in machine learning—accurately measure heterogeneous effects. Building on a large literature on ensemble methods, we show how the weighting of methods can contribute to accurate estimation of heterogeneous treatment effects and demonstrate how pooling models lead to superior performance to individual methods across diverse problems. We apply the ensemble method to two experiments, illuminating how the ensemble method for heterogeneous treatment effects facilitates exploratory analysis of treatment effects.
Article
In the absence of party labels, voters must use other information to determine whom to support. The institution of nonpartisan elections, therefore, may impact voter choice by increasing the weight that voters place on candidate dimensions other than partisanship. We hypothesize that in nonpartisan elections, voters will exhibit a stronger preference for candidates with greater career and political experience, as well as candidates who can successfully signal partisan or ideological affiliation without directly using labels. To test these hypotheses, we conducted conjoint survey experiments on both nationally representative and convenience samples that vary the presence or absence of partisan information. The primary result of these experiments indicates that when voters cannot rely on party labels, they give greater weight to candidate experience. We find that this process unfolds differently for respondents of different partisan affiliations: Republicans respond to the removal of partisan information by giving greater weight to job experience while Democrats respond by giving greater weight to political experience. Our results lend microfoundational support to the notion that partisan information can crowd out other kinds of candidate information.