UNDER WHAT CONDITIONS? STEREOTYPE THREAT AND PRIME ATTRIBUTES

Article · January 2009
Abstract
The experimental literature has consistently uncovered evidence of stereotype threat, but the field studies evaluating stereotype threat have not. An important difference between most lab and field studies is the primes used. I explore this issue through a meta-analysis correlating cross-study variation in primes and stereotype findings. I also conduct a lab experiment systematically manipulating gender primes to examine the effects on the gender gap in math. The results from both exercises suggest that stereotypical primes (i.e., presenting a stereotype in the presumed direction) predict stereotype threat, except when they are also self-affirming (i.e., allowing students to state an opinion, make an argument against the stereotype, or re-affirm positive values about themselves), in which case an opposite reactance effect is more likely. These findings highlight the importance of framing and also suggest that seemingly conflicted findings between field and lab studies may be partially explained by differences in how primes are worded.
Current Draft: October 15, 2009 1
First Draft: November 12, 2008
Thomas E. Wei
Harvard University
1 Comments welcome and can be sent to twei@fas.harvard.edu. I am grateful to Iris Bohnet, Susan Dynarski, Brian Jacob,
Richard Freeman, and Lawrence Katz for their essential guidance. I thank Ayres Heller and the
Harvard Decision Science Laboratory for facilities use, as well as the Women and Public Policy Program and Taubman
Center for State and Local Governments at Harvard Kennedy School for financial support. All remaining errors are my
own.
Stereotype threat suggests that priming a group along an identity domain (e.g., gender) for
which there exists a negative stereotype (e.g., girls cannot do math) may lead to worse performance
on a relevant task (e.g., math tests). This phenomenon has been well documented in experimental
settings along many identity domains, including race and intelligence (Steele and Aronson, 1995),
gender and math (Spencer, Steele, and Quinn, 1999), and age and memory (Levy, 1996). Such
findings have been offered as an environmental explanation for persistent gender and racial gaps.2
Nevertheless, results from the few large-sample field studies have been less consistent with lab
findings (cf. Cullen, Hardison, and Sackett, 2004; Stricker and Ward, 2004; Cullen, Waters and
Sackett, 2006; Wei, 2009). For instance, Wei (2009) – from this point forward referred to as the
“NAEP study” – exploits a quirk in the National Assessment of Educational Progress (NAEP) to
quasi-experimentally test for stereotype threat in a nationally representative sample of 9-, 13-, and
17-year-olds over a 20-year period. No stereotype effects are found for some primes, and reactance
effects are found for others, whereby girls who are gender primed perform relatively better.
Explanations for disparate findings include differences in lab and natural settings or in representative
and selected samples (cf. List, 2006; Levitt and List, 2007; Andreoni and Bernheim, 2009).
The recent literature suggests that stereotype threat is complex. Several conditions appear
relevant for the existence of stereotype threat, such as test difficulty (Neuville and Croizet, 2007),
high identification with the domain (Aronson et al, 1999), relevance of the stereotype to the tested
group (Spencer, Steele, and Quinn, 1999), and perception that ability is being evaluated (Steele and
Aronson, 1995). Another potential mediator of stereotype threat is the prime’s attributes (Wheeler
and Petty, 2001; Shih et al, 2002; Smith and White, 2002; Brown and Pinel, 2003; Kiefer and
Sekaquaptewa, 2007). Though the NAEP study finds that stereotype effects are sensitive to the type
of primes used, few studies have systematically examined this relationship. Given findings from
behavioral economics and psychology that highlight the importance of framing (cf. Roth, 1995;
Burnham, McCabe, and Smith, 2000), this issue warrants further investigation.
2 For example, some economists have found that teachers’ own stereotypes influence how they rate their students (cf.
Lavy, 2004; Hanna and Linden, 2009; Mechtenberg, 2009). Combined with experimental evidence suggesting that
stereotype threat may reduce women’s desire to pursue math and science majors (Davies et al, 2002; Gupta and Bhawe,
2007), this highlights the importance of studying socio-psychological determinants of the gender gap in labor markets.
In this paper, I more definitively reconcile the disparate findings between lab and field studies
of stereotype threat. I also continue the recent trend of mapping the stereotype threat “genome” by
systematically exploring another condition that may mediate stereotype threat – namely, the
characteristics of primes. I employ two methodological approaches: 1) a meta-analysis exploiting
cross-study variation in primes (e.g., asking the respondent to indicate gender, flashing stereotypically
feminine words, claiming men outperform women in math, or asking respondents to argue their
opinion on the gender-math stereotype), and 2) a lab experiment that systematically manipulates the
wording of primes to examine its effect on stereotype threat.
My findings suggest that how one words primes matters for stereotype threat. In particular, the
quantitative literature review reveals that stereotypical primes (i.e., primes that present a stereotype
in the typical direction) are generally associated with threat, except when they are also self-affirming
(i.e., primes that allow subjects to state an opinion, make an argument against the stereotype, or just
generally affirm positive values about themselves), in which case reactance is more likely. Since
most experimental studies have used primes that differ from the ones in the NAEP study, this may
partly explain why the field results do not appear to accord with the experimental literature.
When I systematically manipulate the prime attributes in a lab setting using the same primes
from the NAEP study, I obtain results that mirror the NAEP findings and that also accord with
both theoretical predictions and those borne from the literature analysis.3 In particular, I find that
certain primes generate stereotype reactance, whereby girls perform relatively better when primed,
while others yield no effects. With stereotype reactance, I find that priming reduces the gender test
score gap by three-quarters, and that this effect is primarily driven by girls performing better rather
than boys performing worse, as hypothesized in the NAEP study.
3 The one caveat is that when using the same prime from prior lab experiments, I fail to replicate traditional stereotype
threat, as have some recent studies (cf. Fryer, Levitt, and List, 2008). See section V for more discussion.
These findings do not imply that we should have full faith in experimental extrapolation. They
do however suggest that there may be more to the story than simply that lab results are non-
generalizable to the field. They also add to the puzzle of why recent experiments have not replicated
the original stereotype threat finding even with the same primes. Most importantly, the findings
confirm that increasing the salience of gender may lead to very different effects depending on how
gender is primed. As in the NAEP study, stereotype language may actually reduce the gender gap in
test scores. This is important in light of the well-documented gender gaps in labor market outcomes
(Goldin, 1994; Altonji and Blank, 1998) and the considerable amount of economics research
devoted to trying to understand the sources of these gaps (cf. Blau and Kahn, 2000; Croson and
Gneezy, 2009).
The remainder of the paper is organized as follows: section I describes how the meta-analytic
studies were gathered and presents hypotheses for how prime attributes affect stereotype threat,
section II presents results from the quantitative literature summary, section III describes the design
of the lab experiment, section IV presents results from the experiment, and section V concludes.
I. Summarizing the Literature and Hypotheses
With numerous studies on stereotype threat since the seminal experiments in 1995, conducting
a quantitative literature review is both feasible and fitting. I model my analysis after the “vote
counting” methods from the educational resources debate (cf. Krueger, 2003). A search for
prospective studies was conducted using the primary psychological research database, PsycINFO,
entering keywords such as “stereotype threat” and “primes.” The website
“ReducingStereotypeThreat.org” run by Professors Steven Stroessner and Catherine Good was also
consulted for sources. This search led to the inclusion of 103 studies spanning from January 1995 to
June 2008. Since some studies contain multiple experiments, all relevant experiments from each
study are included as separate data points, with a final sample size of 163 estimates.
Prospective studies were screened on several criteria. First, they had to be randomized
experimental studies rather than correlational ones. Second, they had to be published in peer-
reviewed academic journals. Third, they had to contain task performance scores as a dependent
measure (e.g., math test, intelligence test, athletic ability test, or memory test). Fourth, they had to
include a sample for which a negative stereotype about the group was being studied. For example, a
study examining how white men perform on math tests after being primed with the gender-math
stereotype would be excluded because men are not stereotyped to be worse than women in math.
However, a study examining the same group of white men but measuring how they perform after
being primed with the race-math stereotype, would be included if the prime activates the stereotype
that whites are less mathematically talented than Asians.
For each valid experiment, I recorded the stereotyped group domain (e.g., sex, age, race) and
the performance task domain (e.g., math, memory, verbal). I also noted whether there were
significant stereotype threat, stereotype reactance, or null effects. Threat (reactance) occurs when
the performance of the stereotyped group is significantly worse (better) with the prime than with
no/control primes. Null effects occur when no significant differences are found.
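The classification rule above can be sketched in Python as follows. This is only an illustration: the function name, the sign convention (primed mean compared with control mean), and the 0.05 significance threshold are assumptions, since the text does not state the significance level applied to each study.

```python
def classify_finding(primed_mean, control_mean, p_value, alpha=0.05):
    """Label one experiment's result for the stereotyped group.

    alpha is an assumed significance threshold; the paper does not state one.
    """
    if p_value >= alpha:
        # No significant difference between primed and control groups.
        return "null"
    # Significant difference: the direction decides threat vs. reactance.
    return "threat" if primed_mean < control_mean else "reactance"
```

For instance, a primed group scoring significantly below its control group would be recorded as threat, and one scoring significantly above it as reactance.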
Stereotype primes in each experiment were categorized along two dimensions: valence and
self-affirmation. Valence is the direction of the prime, either neutral (e.g., asking for the subject’s
gender), stereotypical (e.g., boys are better than girls at math), or counter-stereotypical (e.g., girls are
better than boys at math). Self-affirmation refers to whether the prime makes a claim (e.g., research
has shown that boys are better than girls at math) or instead allows subjects to self-affirm on a
particular issue by stating their opinion or writing an open-ended response to a particular prime
(e.g., do you think math is more for boys than girls?).
For valence, a prime was coded as -1 (counter-stereotypical), 0 (neutral or silent on the
stereotype), or 1 (stereotypical). For self-affirmation, a prime was coded as either 0 (not self-
affirming) or 1 (self-affirming). This 3x2 paradigm yields six types; most primes fit neatly into one
type. For example, the primes from the NAEP study are “do you feel math is more for boys than
girls” (stereotypical and self-affirming) and “do you feel math is more for girls than boys” (counter-
stereotypical and self-affirming). Other common primes include manipulating experimenter race,
having participants state their gender (neutral and not self-affirming), or claiming “research has
shown that boys outperform girls on math tests” (stereotypical and not self-affirming).
There are a few primes that are subject to interpretation, however. The most common
example: “this test is a diagnostic of your intellectual ability.” This is clearly not self-affirming and
could be neutral since it does not make an explicit suggestion about the relative intelligence of blacks
and whites. However, it is conceivable to interpret this instead as stereotypical since the “intellectual
ability” phrase is so loaded that respondents may immediately link this to the black-white intelligence
stereotype, especially when coupled with a “state your race” question. To ensure that my results are
not being driven by ad hoc categorizations, I default code “intellectual ability diagnostic” primes as
neutral but also alternatively code them as stereotypical. I conduct analyses for all categorizations as
a sensitivity check.
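The coding scheme, including the sensitivity recoding of the "intellectual ability diagnostic" prime, could be represented roughly as below. The class name, the dictionary keys (shorthand labels for primes quoted in the text), and the `alternate` flag are illustrative assumptions, not part of the original analysis files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimeCode:
    valence: int         # -1 counter-stereotypical, 0 neutral, 1 stereotypical
    self_affirming: int  # 0 not self-affirming, 1 self-affirming


# Codes for primes named in the text; keys are shorthand labels (assumed).
PRIME_CODES = {
    "math is more for boys than girls (opinion)": PrimeCode(1, 1),
    "math is more for girls than boys (opinion)": PrimeCode(-1, 1),
    "state your gender": PrimeCode(0, 0),
    "research shows boys outperform girls": PrimeCode(1, 0),
    "intellectual ability diagnostic": PrimeCode(0, 0),  # default coding
}


def code_prime(label, alternate=False):
    """Return the prime's code; under the alternate scheme the
    diagnostic prime is recoded from neutral to stereotypical."""
    code = PRIME_CODES[label]
    if alternate and label == "intellectual ability diagnostic":
        return PrimeCode(1, code.self_affirming)
    return code
```

Running every analysis once with `alternate=False` and once with `alternate=True` corresponds to the sensitivity check described above.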
The prime attributes paradigm is based on the NAEP study primes, and how they principally
differ from each other and from the primes typically used in experimental settings. This paradigm is
useful on practical grounds since one of my goals is to test whether differences in primes can explain
differences in lab and field results. It is also justifiable on theoretical grounds, as I am interested in
understanding how prime attributes affect whether we observe traditional threat, reactance, or no
effects. For example, “do you feel math is more for boys than girls” is stereotypical and self-
affirming. Instead of provoking anxiety, this prime may provide incentives for greater effort,
whereby girls wish to “prove everyone wrong” (see Dee (2009) for a formal model of this
mechanism). The priming question triggers this response by increasing the salience of the negative
stereotype while empowering girls by enabling them to explicitly reject the stereotype. This “saying-
is-believing” effect (Higgins and Rholes, 1978) is consistent with the NAEP study’s findings of
reactance with this prime and that girls tend to more strongly reject all gender stereotypes regarding
math than boys do. This is also consistent with other evidence suggesting that women who endorse
gender stereotypes are more susceptible to threat effects, while those who reject the stereotypes take
pride in themselves and in other women’s success when the stereotype is disconfirmed (Blanton,
Christie, and Dye, 2002; Schmader, Johns, and Barquissau, 2004).
The prime “do you feel math is more for girls than boys” did not yield any reactance effects in
the NAEP study. The hypothesized reason is that although it empowers girls by allowing them to
voice their opinions (self-affirming), it lacks the polarizing effect necessary for reactance because the
direction of the implied stereotype is the opposite of the norm (counter-stereotypical). Several
studies focused on identifying reactance have found that it arises when primes are explicit and high
valence (Aronson et al, 1999; Oswald and Harvey, 2000/2001; Kray, Galinsky, and Thompson,
2001; Inzlicht and Ben-Zeev, 2003; Kray et al, 2004).4
This mechanism implies that drawing attention to stereotypes and then having women voice
their rejection of those stereotypes could soften the detrimental performance effects of stereotype
threat. Several studies have found that informing students about stereotype threat and having
students affirm the malleability of aptitude can substantially reduce decrements (Aronson, Fried, and
Good, 2002; Good, Aronson, and Inzlicht, 2003; Dar-Nimrod and Heine, 2006). Thus, I wish to
verify in a more rigorous way what comes out of a casual literature review: a stereotypical prime is
likely to generate threat, except when it is also self-affirming, in which case reactance is more likely.
4 For example, directly challenging women with the claim that good negotiators possess traditionally masculine traits, or
exposing women to denigrating cartoons that illustrate girls having difficulty solving math problems.
II. Vote Counting Results
A. Summary Statistics
Table 1 displays summary statistics for the 103 studies in the literature review. The first panel
characterizes the level of inequality in the distribution of estimates across studies. Column 1 shows
the number of studies from which various numbers of estimates were extracted. For example, a
single estimate was extracted from 64 studies, while two estimates were extracted from 26 studies.
Multiplying the number of estimates by column 1 gives the total number of estimates from studies
of various types in column 2. For example, 64 total estimates were gathered from studies with one
estimate, while 52 total estimates were gathered from studies with two estimates, and so on for a
total of 163 estimates. Columns 3 and 4 convert the counts in columns 1 and 2 into percentages.
For example, 62.1 percent of studies had one estimate, which accounts for 39.3 percent of all
estimates. Similarly, 3.9 percent of studies had four estimates, which accounts for 9.8 percent of all
estimates. There is unequal representation across studies (i.e., each study does not always have only
one estimate), but it is clear that a few studies are not dominating the entire pool of estimates.
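The percentages quoted from Table 1 can be reproduced from the counts given in the text. The sketch below uses only the rows mentioned above; the count of four studies with four estimates is inferred from the reported 3.9 percent of 103 studies and is an assumption rather than a figure stated directly.

```python
TOTAL_STUDIES = 103
TOTAL_ESTIMATES = 163


def shares(estimates_per_study, n_studies):
    """Return (total estimates, percent of studies, percent of estimates)."""
    n_estimates = estimates_per_study * n_studies
    pct_studies = round(100 * n_studies / TOTAL_STUDIES, 1)
    pct_estimates = round(100 * n_estimates / TOTAL_ESTIMATES, 1)
    return n_estimates, pct_studies, pct_estimates


print(shares(1, 64))  # studies contributing one estimate: (64, 62.1, 39.3)
print(shares(2, 26))  # studies contributing two estimates
print(shares(4, 4))   # four studies is inferred from the 3.9 percent figure
```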
The second panel reports the percentage of estimates from experiments of various groups. A
majority, 63.2 percent, are from studies of gender, while 26.4 percent are from experiments on race.
The remaining 10.4 percent examined a mixture of domains including age, college major, socio-economic
status, sexual orientation, and social group (e.g., musicians vs. athletes).
The third panel summarizes the findings as significant threat, significant reactance, or
insignificant. Column 1 reports the distribution with equal weights for all 163 estimates. As noted
in the educational resources literature (Krueger, 2003), results may be sensitive to weighting
schemes. Thus, I report distributions with alternative weights in columns 2, 3, and 4. Column 2
gives equal weight to all 103 studies by weighting each estimate by the inverse number of estimates
taken from its study. Column 3 weights the estimates by journal impact factor, which was gathered
from the 2007 Journal Citation Reports issued by the Institute for Scientific Information. There are
a total of 37 academic journals from the 103 studies, and all but six journals are in the Journal
Citation Reports. These six journals account for six of the 103 studies and nine of the 163 estimates
(about six percent of both studies and estimates). I assign these journals the lowest value from the
sample of journals with impact factors. Most journals are from the field of psychology, with a few
miscellaneous journals from economics (American Economic Review) and broader academic arenas
(Science). Column 4 weights the estimates by the square root of each study’s sample size to adjust for
the precision of estimates.
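As a rough illustration of the four weighting schemes, the sketch below computes the weighted share of threat findings under each one. The records are made-up toy data, and the function and scheme names are assumptions introduced only for this example.

```python
from collections import Counter
from math import sqrt

# Hypothetical records: (study_id, found_threat, journal_impact_factor, sample_size)
ESTIMATES = [
    ("A", True, 4.0, 64),
    ("A", False, 4.0, 36),
    ("B", True, 2.0, 100),
    ("C", False, 1.0, 16),
]


def weighted_threat_share(estimates, scheme):
    """Share of threat findings under one of the four weighting schemes."""
    per_study = Counter(study for study, *_ in estimates)

    def weight(record):
        study, _, impact, n = record
        if scheme == "estimate":   # column 1: every estimate counts equally
            return 1.0
        if scheme == "study":      # column 2: every study counts equally
            return 1.0 / per_study[study]
        if scheme == "impact":     # column 3: journal impact factor
            return impact
        if scheme == "sample":     # column 4: square root of sample size
            return sqrt(n)
        raise ValueError(f"unknown scheme: {scheme}")

    total = sum(weight(r) for r in estimates)
    threat = sum(weight(r) for r in estimates if r[1])
    return threat / total
```

Comparing the four shares side by side is the kind of check that the columns of the third panel provide.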
The distribution of findings is very similar across weighting schemes, which reduces concern
about bias in how estimates were extracted. Most studies have found significant threat (58
to 67 percent of estimates). More precisely, I compute the ratio of threat to non-threat findings,
which ranges from 1.40 to 2.00, depending on the weighting scheme. I then calculate the p-value
for the likelihood that such a ratio or higher would be observed if threat and non-threat findings
were equally likely. This calculation is based on a binomial distribution with 103 independent draws
to reflect the 103 studies in my sample. The p-values are significant across weighting schemes,
which confirms the premise from the literature that most experimental studies have confirmed the
existence of significant stereotype threat.
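A ratio of threat to non-threat findings of 1.40 corresponds to a threat share of 1.40/2.40, or about 58 percent, and a ratio of 2.00 to a share of about 67 percent, which matches the range quoted above. The sketch below shows one way the binomial tail probability could be computed. How the weighted share is converted into a whole number of successes is not stated in the text, so the rounding step here is an assumption.

```python
from math import comb


def ratio_to_share(ratio):
    """Convert a threat : non-threat ratio into the share of threat findings."""
    return ratio / (1.0 + ratio)


def binom_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def threat_p_value(ratio, n_studies=103):
    """One-sided p-value under the null that threat and non-threat are equally likely."""
    # Assumption: the implied count of threat findings is rounded to the nearest integer.
    k = round(ratio_to_share(ratio) * n_studies)
    return binom_upper_tail(n_studies, k)
```

Calling `threat_p_value(1.40)` and `threat_p_value(2.00)` brackets the range of weighting schemes reported above; the exact values depend on the rounding assumption.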
The fourth panel presents the number of estimates for the two attribute categories in my
model: valence and self-affirmation. A prime can be stereotypical, counter-stereotypical, or neutral
valence, and self-affirming or not. This yields six possible categorizations. There is variation in
primes, although primes are predominantly not self-affirming (149 estimates) and neutral or
stereotypical (72 and 76 estimates, respectively). The lower matrix reports the same cell counts but
using the alternate prime categorization described in section I. The primary difference is that many
neutral primes are instead categorized as stereotypical (now 39 and 106 estimates, respectively). This
reflects the fact that the prime “this test is an accurate measure of mathematical ability” actually may
promote the stereotype that boys outperform girls in math, rather than being neutral.
B. The Relationship Between Prime Attributes and Stereotype Threat
I next examine whether a systematic relationship exists between prime attributes and stereotype
findings by running the following ordinary least squares (OLS) regression models:
$$react_i = \beta_1 \cdot val\_threat_i + \beta_2 \cdot affirm_i + \beta_3 \cdot (val\_threat_i \times affirm_i) + \varepsilon_i \qquad [1]$$
where $i$ indexes the study estimate, and $\varepsilon_i$ is a random error term with known distribution. react is a
dummy variable that indicates whether or not significant reactance was found. The val_threat and
affirm variables are indicators for whether an estimate comes from a stereotypical or neutral prime
and whether an estimate comes from a self-affirming prime or not, respectively.5 $\beta_1$ estimates the
average difference in likelihood of finding reactance between a stereotypical or neutral and
counter-stereotypical prime, while $\beta_2$ estimates the average difference in likelihood of finding reactance
between self-affirming and non-self-affirming primes. The coefficient on the interaction term ($\beta_3$)
captures whether having both a stereotypical or neutral and self-affirming prime affects the
likelihood of reactance relative to other primes. This specification provides a simple test of the
hypothesis in section I: stereotypical primes are more likely to lead to threat, except when also
self-affirming, in which case reactance is more likely. The first part implies that $\beta_1$ should be negative,
while the second part implies that $\beta_3$ should be positive.
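To see how the signs of the coefficients map onto the hypothesis, it may help to write out the effect of a stereotypical or neutral prime separately for the two values of affirm. This is a reading aid that follows from the specification in equation 1, not a result reported in the paper:

```latex
\begin{aligned}
E[react_i \mid val\_threat_i = 1,\ affirm_i = 0] - E[react_i \mid val\_threat_i = 0,\ affirm_i = 0] &= \beta_1 \\
E[react_i \mid val\_threat_i = 1,\ affirm_i = 1] - E[react_i \mid val\_threat_i = 0,\ affirm_i = 1] &= \beta_1 + \beta_3
\end{aligned}
```

A negative $\beta_1$ therefore means less reactance for stereotypical or neutral primes that are not self-affirming, while a positive $\beta_3$ offsets this for self-affirming primes. Whether the net effect $\beta_1 + \beta_3$ is positive depends on the relative magnitudes of the two estimates.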
Columns 1 to 4 of Table 2 show estimates under the first valence categorization, with equal
weights for each estimate, equal weights for each study, journal impact factor weights, and sample
size weights, respectively. Columns 6 to 9 report analogous results but instead use the alternate
valence categorization described in section I. All other things equal, including a stereotypical or
neutral prime decreases the likelihood of observing reactance by 49.4 to 78.4 percentage points on
average, relative to a counter-stereotypical prime. The main effects of self-affirmation are imprecise, but the
interaction of valence and self-affirmation is positive and significant. This suggests that primes that are both
self-affirming and stereotypical or neutral are especially likely to induce reactance, relative to
other primes. These results support my hypothesis and are robust to different methods of
categorizing prime valence and weighting.6
5 I combine the stereotypical and neutral valence categories for simplicity, since preliminary analyses show that these
valences yield similar effects.
I also computed the effect size for each study, which is the difference in treatment and control
means divided by the pooled standard deviation, adjusting for sample sizes across groups (Cohen’s
d). Negative effect sizes indicate threat while positive ones indicate reactance. The full sample mean
effect size (standard deviation), weighted by one minus the p-value so that significant estimates
receive more weight, is -0.39 (0.74). The median effect size is -0.61. This confirms the notion that
studies typically find stereotype threat. I then ran the same regression in equation 1 with effect size
as the dependent variable (including p-value weights to account for estimate precision). Columns 5
and 10 in Table 2 show the results using the original and alternative valence categorizations,
respectively. Consistent with the findings using a binary indicator for reactance, a stereotypical or
neutral prime is associated with a significant 1.2 to 1.5 lower effect size (stronger stereotype threat)
than counter-stereotypical primes. However, a stereotypical prime that is also self-affirming is
associated with a significant 1.1 to 1.2 higher effect size (stronger reactance).
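The effect size calculation can be sketched as follows, using the standard pooled-standard-deviation form of Cohen's d and a weighted mean with weights of one minus the p-value. The function names and the example numbers are illustrative assumptions; the paper does not report the inputs for individual studies.

```python
from math import sqrt


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Treatment minus control mean, divided by the pooled standard deviation.

    Negative values indicate threat; positive values indicate reactance.
    """
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)


def weighted_mean(values, weights):
    """Weighted average, e.g. effect sizes weighted by (1 - p-value)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical study: primed group averages 8 correct, control group 10, both SD 2.
d = cohens_d(8.0, 2.0, 30, 10.0, 2.0, 30)
print(d)  # -1.0, i.e. a threat effect of one pooled standard deviation
```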
I do not attempt to precisely distinguish between the effects of stereotypical vis-à-vis
counter-stereotypical primes that are both self-affirming (this is how the primes differed in the NAEP study).
An examination of the cells in the bottom panel of Table 1 shows that there are few studies in the
literature that use self-affirming and counter-stereotypical primes. Absent such variation, attempts
to estimate these effects would yield low statistical power. I instead conduct a lab experiment that
directly explores this issue.
6 Appendix Table 1 shows the same set of results but instead has an indicator for a stereotype threat finding as the
dependent variable. The results also accord with my main hypothesis: having a stereotypical or neutral prime increases
the likelihood of observing threat by a significant 58.8 to 82.2 percentage points, relative to counter-stereotypical primes.
However, threat is much less likely to be observed when the prime is both stereotypical or neutral and self-affirming, as
evidenced by the negative and statistically significant interaction term. As with the reactance regressions, these estimates
are robust to alternative valence categorizations and weighting schemes.
III. Experimental Design
The experiment complements the vote counting analysis by systematically manipulating the
types of primes given in a study, rather than exploiting differences in primes across studies. This
approach allows me to use the same primes from the NAEP study and directly compare effects with
the typical primes used in prior experiments. Undergraduate students from Harvard University were
recruited to participate in the Spring of 2009 via campus flyers, e-mail lists, and class
announcements. Participants were not informed about the true purpose of the experiment, but were
simply told that they would be participating in a “study of SAT-style math problems.” Participants
were promised a $15 show-up fee for a 30-minute session.7
Participants arrived at the experiment and were placed in a classroom together. The
experimenter was an Asian-American male, who was blind to which conditions participants were
assigned to. Participants were first asked to provide informed consent. Participants were next read
a common set of instructions by the experimenter, which stated that they would be taking a math
test. They were informed that wrong answers would not be penalized and that their score would be
reported to them at the end of the experiment, at which time they would also be paid the show-up
fee. Participants were then given a set of written instructions and a math test. The math test was
invariant across participants and contained 20 difficult multiple choice questions taken from SAT
and GRE study guides.
7 This experiment was modeled after the stereotype threat study conducted by Fryer, Levitt, and List (2008). A complete
set of experimental instructions is available upon request.
The written instructions determined which condition participants were assigned to. In the
control condition, participants were simply instructed that “there are 20 questions on this test, for
which you will have 20 minutes to complete. For each of the following questions, decide which is
the BEST of the choices given.” This corresponds to a situation with no gender prime (control).
There were three stereotype treatment conditions. Each one contained the same instructions as the
control and a gender prime. The first treatment condition’s prime was: “This is a diagnostic test of
your mathematical ability. As you may know, there have been some academic findings about gender
differences in math ability. The test you are going to take today is one where men have typically
outperformed women.” This prime is stereotypical and not self-affirming (diagnostic). The second
treatment condition’s prime was: “How do you feel about the following statement: ‘math is more for
boys than girls.’ Do you strongly disagree, disagree, undecided, agree, or strongly agree?” This
prime is stereotypical and self-affirming (math for boys). The third treatment condition’s prime was:
“How do you feel about the following statement: ‘math is more for girls than boys.’ Do you
strongly disagree, disagree, undecided, agree, or strongly agree?” This prime is counter-stereotypical
and self-affirming (math for girls). The math for boys and math for girls primes are the same primes used
in the NAEP study, whereas the diagnostic prime is similar to many experimental studies, including
the seminal one on gender and stereotype threat (Spencer, Steele, and Quinn, 1999).8
After each participant was randomly and blindly assigned a condition, the experimenter
provided ample time for participants to carefully read the instructions and reminded them just
before the start of the exam to read the instructions. Following this, participants were given 20
minutes to take the test, with the experimenter providing ten, five, and one minute warnings.
Immediately after taking the test, participants were asked to complete a brief questionnaire that
requested demographics (such as age, class standing, gender, and race) and information on SAT I
math test scores, college grade point average, test anxiety, guessing behavior, and effort levels.
Participants were then compensated and dismissed.
8 Ideally, additional primes would be given to better identify the effect of prime attributes on stereotype effects;
however, this must be balanced with the importance of having sufficient cell sizes for the sake of statistical power.
A total of 107 men and 80 women participated in the experiment, the majority (about three-
quarters) being freshmen and sophomores from large introductory courses such as “Principles of
Economics.” The number of participants for each gender in a given experimental cell (2x4 design)
ranged from 18 to 29. The experiment took place in 11 sessions over 5 days.
IV. Experimental Results
A. Summary Statistics and Randomization Checks
Table 3 shows group means for self-reported background variables by gender and experimental
condition. On average, participants scored highly on the SAT I math test. The sample is largely
White and Asian, and each session generally contained more men than women (about 53 to 62
percent men). In addition to demographics, information was also gathered on the participant’s
familiarity with the concept of stereotype threat and on the participant’s guess of how the
experimenter expected him or her to perform on the math test. Approximately one-quarter to one-third
of participants claimed to be familiar with stereotype threat. The mean perceived experimenter
expectation was slightly less than 2 (“1” = worse than normal, “2” = about normal, “3” = better
than normal), indicating that participants generally believed that the experimenter expected normal
performance on the math test.
The bottom of Table 3 reports F-statistics (p-values) from a joint F-test of differences between
the variables shown plus a few others not in the table, such as indicators for missing SAT math
score or missing GPA, session room location, session day of week, and session time of day, for each
treatment group relative to the control, separately by gender. All but the math for girls and diagnostic
treatment groups for women yield insignificant F-statistics; moreover, the significant F-statistics for
these two groups are driven by differences in math SAT score. When math SAT score is removed from
the joint test, all F-statistics are insignificant. Thus, the randomization appears successful, with
generally good covariate balance across conditions.
B. The Effect of Primes on Test Scores
Table 4 reports the coefficients $\beta_1$, $\beta_2$, and $\beta_3$ from the following OLS regression:
$out_i = \beta_1 \cdot math\_boys_i + \beta_2 \cdot math\_girls_i + \beta_3 \cdot diag_i + X_i + \varepsilon_i$   [2]
where i indexes the participant, and $\varepsilon_i$ is a random error term with known distribution. out is the
dependent variable of interest. The math_boys, math_girls, and diag variables are indicators for
whether a participant was in the math for boys, math for girls, or diagnostic treatment groups. X is a
vector of controls including age, SAT math score, cumulative GPA, class standing, race, and
experimental session fixed effects. $\beta_1$, $\beta_2$, and $\beta_3$ are estimates of the average difference in
outcomes relative to the omitted control group for the math for boys, math for girls, and diagnostic prime
groups, respectively. This approach identifies the average treatment effects for an outcome,
controlling for other relevant variables.
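To make the specification in equation 2 concrete, the following is a minimal Python sketch and not the estimation code used for Table 4. It runs on purely synthetic data: the simulated effect sizes, the single standardized covariate standing in for X, the explicit intercept, and the helper names solve_linear_system and ols are all illustrative assumptions. It shows how the coefficients on mutually exclusive treatment indicators estimate differences relative to the omitted control group after adjusting for a covariate.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Return OLS coefficients by solving the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data only: these "true" effects are made up for illustration
# and are NOT the estimates reported in Table 4.
random.seed(0)
TRUE_EFFECTS = {"control": 0.0, "math_boys": 1.0, "math_girls": -0.5, "diag": 0.25}
TREATMENTS = ("math_boys", "math_girls", "diag")

rows, y = [], []
for group, effect in TRUE_EFFECTS.items():
    for _ in range(100):
        covariate = random.gauss(0.0, 1.0)  # hypothetical standardized control
        dummies = [float(group == g) for g in TREATMENTS]
        rows.append([1.0] + dummies + [covariate])  # intercept is an assumption
        y.append(8.0 + effect + 1.5 * covariate + random.gauss(0.0, 1.0))

coefs = ols(rows, y)
intercept, b1, b2, b3, b_cov = coefs
print("beta_1 (math_boys vs control): ", round(b1, 3))
print("beta_2 (math_girls vs control):", round(b2, 3))
print("beta_3 (diag vs control):      ", round(b3, 3))
```

Because the control group is the omitted category, each printed coefficient is read as a covariate-adjusted difference from the control group, which is the interpretation given to the estimates above.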
The first two columns show the treatment effects of primes on test scores separately for men
and women. Women with the math for boys prime exhibited stereotype reactance, answering a statistically
significant 2 more questions correctly than the control group, whose mean was 8.17 questions correct
out of 20. The other two primes showed no differences relative to the control women. The math for
boys women also performed significantly better than the other two prime groups, on average. For
men, there were no significant differences in questions correctly answered among the four
experimental groups.9
9 Since two treatment groups were imbalanced on math SAT score, I also ran these regressions excluding math SAT
score, as a sensitivity check. The results are virtually identical with all coefficients retaining their sign and significance.
The stereotype reactance effect is quite large. Consistent with empirical evidence showing a
math gender gap in favor of boys (Fryer and Levitt, 2008), men generally outperformed women on
the test. The control group gender gap is 3 questions (control mean score of 11.17 for men and 8.17
for women), which with a control group standard deviation of 3.36 is a practically and statistically
significant 0.9 standard deviations. However, the gender gap shrinks to a statistically insignificant
0.6 questions with the math for boys prime, about a three-quarters reduction in the gap.
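The magnitudes in this paragraph can be checked with a short Python sketch that uses only the rounded figures quoted above; the variable names are illustrative. With these rounded inputs the proportional reduction comes out at 0.8, in the neighborhood of the approximate three-quarters cited in the text.

```python
# Figures quoted in the text (rounded as reported).
control_mean_men = 11.17
control_mean_women = 8.17
control_sd = 3.36
primed_gap = 0.6  # gender gap in questions under the math for boys prime

control_gap = control_mean_men - control_mean_women  # gap in questions
gap_in_sd = control_gap / control_sd                  # gap in control-group SD units
reduction = 1 - primed_gap / control_gap              # proportional shrinkage of the gap

print(f"control gap: {control_gap:.2f} questions")
print(f"control gap: {gap_in_sd:.2f} standard deviations")
print(f"reduction in gap: {reduction:.0%}")
```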
These experimental findings are remarkably consistent with the findings from the NAEP study
that used the same gender primes (math for boys and math for girls). There is little evidence of
traditional stereotype threat (priming gender does not lead women to perform worse, either
absolutely or relative to men). There is instead evidence of stereotype reactance, with women
performing significantly better for some primes. The NAEP study finds that the gender gap in
favor of men in the control group shrinks significantly in the group receiving the math for boys prime
(indicating reactance), whereas the gender gap does not significantly change for the group with the
math for girls prime. This mirrors the findings from the experiment and quantitative literature review.
Since an identical set of test questions was administered to all participants in this experiment,
cells can be directly compared in absolute levels, which was not possible in the NAEP study.10 The
experimental findings thus help clarify an unresolved issue from the NAEP study: was the gender
gap shrinkage due to girls performing better or boys performing worse with the math for boys prime?
This experiment shows a small and insignificant decline in men’s performance with the math for boys
prime relative to control men; on the other hand, women perform significantly better with the math for
boys prime relative to control women. Stereotype reactance appears to be primarily driven by girls
leveling up rather than boys leveling down.
10 Because students were not given identical questions, the NAEP study employed a differences-in-differences
identification strategy whereby gender gaps were compared between the prime and control groups. See Wei (2009) for
further details.
C. Potential Mechanisms
Columns 3 to 10 in Table 4 employ the regression approach in equation 2 with different
outcomes to explore the mechanisms driving the test score effects. Columns 3 and 4 examine the
effect of primes on stress, measured by asking the participant: “on a scale of 1 to 10, how stressful
was this test for you?” Low numbers correspond to low stress. The estimates are all insignificant,
suggesting negligible differences in stress between experimental groups.
A similar conclusion arises for the effect on effort, measured by asking the participant: “on a
scale of 1 to 10, how much effort did you put into doing well on this test?” Low numbers
correspond to low effort. Columns 5 and 6 show insignificant differences in effort between
experimental groups for men and women.
Columns 7 and 8 examine the effect on general test anxiety, measured as the average score of
the following three statements: “on a scale of 1 to 4 where 1 = almost never and 4 = almost always,
1. during important tests I feel tense; 2. during examinations I get so nervous that I forget facts I
really know; 3. doing very well on math assessments is important to me.” Similar to Fryer, Levitt,
and List (2008), I average these three scores to form an index of general test anxiety. Lower values
correspond to lower test anxiety. Group differences in general test anxiety are insignificant for
women. However, for men, math for girls and diagnostic primed participants exhibited significantly less
general test anxiety relative to the control group.
Columns 9 and 10 examine guessing behavior, which may proxy for effort level. This was
measured by asking the participant: “toward the end of the experiment, did you guess the remaining
answers?” The dependent variable is an indicator that equals one if the participant answered “yes”
to this question. The coefficients can be interpreted as the difference in guessing likelihood between
the control and treatment groups. The results show insignificant effects, except that for men, math
for girls participants were a significant 34.5 percentage points more likely to guess relative to the control.
In sum, there are few group differences along the more subjective, non-test score outcomes in
columns 3 to 10 of Table 4. The variation in these outcomes is not well explained by the treatment
manipulations and other demographic covariates, as indicated by the small R-squared values.
As in the NAEP study, the vast majority of experimental participants receiving the math for boys
and math for girls attitude questions reject the gender stereotype about math in either direction. Still,
there are gender differences in attitudes, with women being more likely to strongly reject that math
is more for boys than girls (over 60 percent of women strongly disagree compared to 20 percent of
men). This was also true in the NAEP study and is consistent with the notion that the math for boys
prime empowers girls to disprove the stereotype by enabling them to reaffirm their rejection of the
stereotype and providing an immediate opportunity to demonstrate it. Although the difference is marginally
insignificant, column 5 in Table 4 shows math for boys primed women reporting higher effort levels
than any other group. Finally, there were a few women who, without solicitation, reiterated at the
end of the post-test questionnaire that they reject the math for boys stereotype. One wrote, “As a
dedicated mathie, I BEG to differ…”; another wrote, “I answered ‘strongly disagree’ to the ‘math is
more for boys than girls’ statement”; a third wrote, “I felt pressure to disprove the statement.”
These anecdotes provide some confirmation of how the priming mechanism functions, despite the
fact that the regression results from the non-test score outcomes are generally inconclusive.
D. Sensitivity Checks
Table 5 reports regression results on the subgroups listed in the first row of the table. The
treatment effect estimates are analogous to those from the OLS regressions reported in Table 4
(equation 2), but for all columns in Table 5, the outcome is the test score.11
11 I also conducted this analysis on all other outcomes from Table 4, with similar results.
As a manipulation check, a question was included at the end of the post-test questionnaire
inquiring whether there was anything in the written instructions besides the basic directions. In the
treatment conditions, the primes were placed beneath the directions. Most participants correctly
answered this manipulation check question, although some did not. This is not too concerning for a
few reasons. First, participants were explicitly asked, given designated time, and reminded to
carefully read everything on the instructions page. Second, the manipulation check question was
cryptically phrased, and so it is likely that many found it confusing or were unclear about what the
question was asking. This is suggested by the many “?” responses to the question. Nevertheless, to
be cautious, I estimate the treatment effects using only the subset of individuals who definitively
passed the manipulation check. The results, shown in columns 1 and 2, are reassuringly similar to
those from the full sample.
Finally, one boundary condition in past stereotype threat work is that participants should be
“highly identified with the domain” (Aronson et al., 1999). In other words, stereotype threat affects
those who care about doing well in math. To see if this accounts for the lack of traditional
stereotype threat in this experiment, I examine a subgroup that is more likely to include those highly
identified with the domain: high-scorers on the SAT I math test, defined as having a score above
750.12 The results in columns 3 and 4 are quite similar to the full sample results. The findings of
stereotype reactance in women with the math for boys prime and the lack of traditional stereotype
threat overall appear to be quite robust.13
12 Another approach to identifying participants who are highly identified with the domain is to see who indicated that doing
“very well on math assessments is important to me.” However, since this question was asked at the end of the
experiment, it may be endogenous with respect to the treatment. Ignoring this problem and running a regression on
those indicating a 3 or 4 on a 4-point scale yields results that are very similar to the full sample.
13 I also ran these specifications with different combinations of covariates: for example, controlling for specific
experimental session characteristics such as time of day, day of week, classroom location, number of total participants,
and proportion of males, instead of using session fixed effects. The results are robust.
V. Discussion
This study contributes to the literature in two ways: first, it sheds light on why findings from
the recent NAEP field study seem not to accord with those from prior lab experiments, and second,
it systematically tests a potential mediator of stereotype threat, namely the way primes are framed.
The findings, using two methodological approaches (lab experiment and meta-analysis), suggest that
rather than simply whether one primes or not, how one words primes also matters for stereotype
threat. In particular, the quantitative literature review reveals that stereotypical primes (i.e., primes
that present a stereotype in the typical direction) are generally associated with threat, except when
they are also self-affirming (i.e., primes that allow subjects to state an opinion, make an argument
against the stereotype, or generally affirm certain positive values about themselves), in which case
reactance is more likely.
When I systematically manipulate prime attributes in a controlled lab setting with the same
primes from the NAEP study, I obtain results that mirror the NAEP findings and that also mostly
accord with the theory and evidence from the literature analysis. In particular, there is stereotype
reactance in women who are given a stereotypical and self-affirming prime (i.e., do you agree or
disagree with the statement that ‘math is more for boys than girls’), but no effects for those who
receive a counter-stereotypical and self-affirming prime (i.e., do you agree or disagree with the
statement that ‘math is more for girls than boys’) or who receive a stereotypical and not self-
affirming prime (i.e., this test measures math ability and men generally outperform women on it).
For men, all prime manipulations yielded insignificant effects. The reactance effect was, moreover,
driven primarily by women performing better rather than men performing worse, as suggested but
not fully confirmed in the NAEP study.
The hypothesized mechanism is that a stereotypical and self-affirming prime, unlike the others,
polarizes women but at the same time empowers them by enabling them to reaffirm their rejection
of the stereotype and to immediately demonstrate it on a test. There is some empirical support for
this theoretical mechanism given the fact that women much more strongly reject the ‘math is more
for boys’ attitude than men do, and that some respondents provided unsolicited comments
reasserting their rejection of the stereotype at the end of the experiment. Subjective outcomes of
stress, effort, general test anxiety, and guessing behavior are inconclusive. Thus, rigorously
understanding how reactance effects work is still an open empirical question.
This study suggests that differences in findings between the field and lab may not be solely due
to experimenter demand effects, situational differences, or selected samples. It does not mean that
we should always feel confident about experimental extrapolation; however, it does suggest that
there may be more complexity than simply that lab results do not generalize. Most experimental
studies have used primes that differ from the ones in the field, particularly in the NAEP study, so
this could partially explain why the NAEP results do not accord with the basic findings from the
literature. Still, the NAEP study is one study from a small set of field studies on stereotype threat.
Future field studies should use a variety of primes to further explore this issue.
This study raises a puzzle for why some recent studies have not replicated the traditional
stereotype threat findings from the 1990s. Perhaps it can be explained by differing experimenter
demand effects: recent researchers may have a bias against replicating the traditional findings,
although it is unclear why this would be true. Moreover, the experimenter is typically blind to who
receives which prime, making it unlikely that the experimenter could have treated some participants
differently, even unconsciously. In this study, participants’ guesses about how the experimenter
expected them to perform did not differ by treatment condition.
Publication bias is another possible explanation: older studies require significant findings to
withstand the test of time in a way that recent studies have yet to face, and given the small sample
sizes used in labs, effect sizes have to be much larger to attain significance. For these reasons, only
the experiments with the strongest findings may survive, even if they are not representative of the
universe of experiments on stereotype threat. If stereotype threat is a truly small effect, then
statistically significant results necessarily overstate the effect sizes, given small samples and
underpowered lab experiments. If publication bias is strong, many of the published studies may
reflect significant results that are purely due to chance (Gelman and Weakliem, 2009). This is
unlikely, however, since so many studies have found stereotype threat, and it is difficult to believe
that only five percent of all experiments on stereotype threat were published. Moreover, the
interaction between prime types and findings is quite consistent. Instead, this may suggest that
stereotype threat has many boundary conditions. In other words, the phenomenon could be fragile,
which in itself may make replication a challenge.
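The claim that significant results from small samples overstate a truly small effect can be illustrated with a brief simulation. The Python sketch below is a hypothetical example and does not analyze any study in the literature review: the true effect of 0.2 standard deviations, the 20 participants per group, the number of simulated studies, and the normal-approximation cutoff of 1.96 are all assumptions chosen for illustration. It simulates many small two-group experiments, keeps only those that reach significance, and compares the average surviving estimate with the true effect.

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 0.2    # assumed true effect, in standard deviation units
N_PER_GROUP = 20     # assumed small lab-style cell size
N_STUDIES = 5000     # number of simulated experiments
CRITICAL_Z = 1.96    # two-sided 5 percent cutoff (normal approximation)

significant_estimates = []
for _ in range(N_STUDIES):
    treated = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    control = [random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / N_PER_GROUP
          + statistics.variance(control) / N_PER_GROUP) ** 0.5
    # Keep only the "publishable" studies, i.e. those reaching significance.
    if abs(diff / se) > CRITICAL_Z:
        significant_estimates.append(diff)

share_significant = len(significant_estimates) / N_STUDIES
mean_abs_significant = statistics.mean([abs(d) for d in significant_estimates])
exaggeration = mean_abs_significant / TRUE_EFFECT

print(f"share of studies reaching significance: {share_significant:.3f}")
print(f"mean |estimate| among significant studies: {mean_abs_significant:.3f}")
print(f"exaggeration ratio relative to the true effect: {exaggeration:.2f}")
```

Under these assumptions an estimate can only clear the significance threshold if it is well above the true effect in absolute value, so the exaggeration ratio among the surviving studies exceeds one by construction.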
Yet another explanation is that the seminal study of gender stereotype threat (Spencer, Steele,
and Quinn, 1999) used a control group that received a “stereotype nullifying” prime mentioning that
the test showed no gender differences, instead of a control group that received no prime. Even if in
both studies the ‘stereotype threat’ group performed identically, disparate results would be expected
if ‘no prime’ groups tend to perform worse or if ‘stereotype null’ groups tend to perform better.
Still, several studies that have used the same diagnostic prime and ‘no prime’ control groups as the
experiment presented in this paper have found traditional stereotype threat (cf. Walsh, Hickey, and
Duffy, 1999; Wicherts, Dolan, and Hessen, 2005). This does nevertheless raise the important
conceptual question about counterfactuals: what is the baseline setting, from which a prime-induced
deviation would be considered a stereotype effect?
A final possibility is that something has fundamentally changed over time. Perhaps there is
greater awareness of stereotype threat,14 or perhaps norms have been changing and acting at a
subconscious level, undetected by shifts in stated attitudes.15 A check of the vote counting literature
reveals that studies with significant threat findings are statistically significantly more likely to be
published in earlier years than studies with insignificant or reactance findings (p-value = 0.034).16
This speculative discussion uncovers many fascinating questions that are ripe for future research.
14 Running the experimental regressions from Table 4, split by whether participants indicated that they were familiar with
stereotype threat, yields results consistent with the full sample for those unfamiliar with stereotype threat. For those
who claim to be familiar with it, there instead appears to be significant traditional stereotype threat in girls with the math
for girls and diagnostic primes. This contradicts the hypothesis that greater awareness explains the lack of stereotype threat.
However, using these subgroups is a flawed approach since the familiarity with stereotype threat question was asked at
the end of the experiment, thus making responses potentially endogenous with the treatment. Eyeballing the mean
responses in Table 3 suggests that those who were primed were more likely to state that they were familiar with
stereotype threat relative to the unprimed control group (although these differences are generally too small to attain
conventional levels of significance).
15 Attitudes regarding gender stereotypes did not change substantially over time in the NAEP study, although the time
frame (1978-1999) does not fully coincide with the recent wave of stereotype threat studies (1995-2008). It is possible
that stated attitudes reflect stereotypes at the distribution mean. So while students may not have changed their opinions
about the average boy and girl’s relative performance in math, they may now believe that those in the upper tail of the
math distribution are much less likely to be boys. This could account for the results, particularly since stereotype effects
are thought to be most potent at the tails of the distribution.
16 This may reflect publication bias if it is easier to publish insignificant findings on a previously well-established
phenomenon than on a phenomenon that has yet to garner substantial empirical support.
References 17
Abrams, D., Bryant, J., and Eller, A. (2006). An age apart: the effects of intergenerational contact
and stereotype threat on performance and intergroup bias. Psychology and Aging. 21, 691-702.
Adams, G., Garcia, D.M., Purdie-Vaughns, V., and Steele, C.M. (2006). The detrimental effects of a
suggestion of sexism in an instruction situation. Journal of Experimental Social Psychology. 42, 602-
615.
*Altonji, J.G., and Blank, R. (1998). Race and gender in the labor market. In O. Ashenfelter and D.
Card (Eds.), Handbook of Labor Economics Volume 3C. Amsterdam: Elsevier.
Ambady, N., Paik, S.K., Steele, J., Owen-Smith, A., and Mitchell, J.P. (2004). Deflecting negative
self-relevant stereotype activation: the effects of individuation. Journal of Experimental Social
Psychology. 40, 401-408.
Ambady, N., Shih, M., Kim, A., and Pittinsky, T.L. (2001). Stereotype susceptibility in children:
effects of identity activation on quantitative performance. Psychological Science. 12, 385-390.
Andreoletti, C., and Lachman, M.E. (2004). Susceptibility and resilience to memory aging
stereotypes: education matters more than age. Experimental Aging Research. 30, 129-148.
*Andreoni, J., and Bernheim, B.D. (2009). Social image and the 50-50 norm: a theoretical and
experimental analysis of audience effects. Forthcoming in Econometrica.
Aronson, J., Fried, C.B., and Good, C. (2002). Reducing the effects of stereotype threat on African-
American college students by shaping theories of intelligence. Journal of Experimental Social
Psychology. 38, 113-125.
Aronson, J., Lustina, M.J., Good, C., and Keough, K. (1999). When white men can’t do math:
necessary and sufficient factors in stereotype threat. Journal of Experimental Social Psychology. 35,
29-46.
Beilock, S.L., McConnell, A.R., and Rydell, R.J. (2007). Stereotype threat and working memory:
mechanisms, alleviation, and spillover. Journal of Experimental Psychology. 136, 256-276.
Ben-Zeev, T., Fein, S., and Inzlicht, M. (2005). Arousal and stereotype threat. Journal of Experimental
Social Psychology. 41, 174-181.
Bergeron, D.M., Block, C.J., and Echtenkamp, B.A. (2006). Disabling the able: stereotype threat and
women’s work performance. Human Performance. 19, 133-158.
*Blanton, H., Christie, C., and Dye, M. (2002). Social identity versus reference frame comparisons:
the moderating role of stereotype endorsement. Journal of Experimental Social Psychology. 38, 253-
267.
17 Those references marked with an asterisk (*) were not used in the quantitative literature analysis.
Blascovich, J., Spencer, S.J., Quinn, D., and Steele, C. (2001). African Americans and high blood
pressure: the role of stereotype threat. Psychological Science. 12, 225-229.
*Blau, F.D., and Kahn, L.M. (2000). Gender differences in pay. Journal of Economic Perspectives. 14,
75-99.
Bosson, J.K., Haymovitz, E.L., and Pinel, E.C. (2004). When saying and doing diverge: the effects
of stereotype threat on self-reported versus non-verbal anxiety. Journal of Experimental Social
Psychology. 40, 247-255.
Brown, R.P., and Day, E.A. (2006). The difference isn’t black and white: stereotype threat and the
race gap on Raven’s Advanced Progressive Matrices. Journal of Applied Psychology. 91, 979-985.
Brown, R.P., and Pinel, E.C. (2003). Stigma on my mind: individual differences in the experience of
stereotype threat. Journal of Experimental Social Psychology. 39, 626-633.
*Burnham, T., McCabe, K., and Smith, V.L. (2000). Friend-or-foe intentionality priming in an
extensive form trust game. Journal of Economic Behavior and Organization. 43, 57-73.
Cadinu, M., Maass, A., Frigerio, S., Impagliazzo, L., and Latinotti, S. (2003). Stereotype threat: the
effect of expectancy on performance. European Journal of Social Psychology. 33, 267-285.
Cadinu, M., Maass, A., Lombardo, M., and Frigerio, S. (2006). Stereotype threat: the moderating
role of locus of control beliefs. European Journal of Social Psychology. 36, 183-197.
Cadinu, M., Maass, A., Rosabianca, A., and Kiesner, J. (2005). Why do women underperform under
stereotype threat? Psychological Science. 16, 572-578.
Chasteen, A.L., and Bhattacharyya, S. (2005). How feelings of stereotype threat influence older
adults’ memory performance. Experimental Aging Research. 31, 235-260.
Cheryan, S., and Bodenhausen, G.V. (2000). When positive stereotypes threaten intellectual
performance: the psychological hazards of “model minority” status. Psychological Science. 11, 399-
402.
Cohen, G.L., and Garcia, J. (2005). “I am us”: negative stereotypes as collective threats. Journal of
Personality and Social Psychology. 89, 566-582.
Cohen, G.L., Garcia, J., Apfel, N., and Master, A. (2006). Reducing the racial achievement gap:
a social-psychological intervention. Science. 313, 1307-1310.
Croizet, J.C., and Claire, T. (1998). Extending the concept of stereotype threat to social class: the
intellectual underperformance of students from low socioeconomic backgrounds. Personality and
Social Psychology Bulletin. 24, 588-594.
Croizet, J.C., Despres, G., Gauzins, M.E., Huguet, P., Leyens, J.P., and Meot, A. (2004). Stereotype
threat undermines intellectual performance by triggering a disruptive mental load. Personality and
Social Psychology Bulletin. 30, 721-731.
*Croson, R., and Gneezy, U. (2009). Gender differences in preferences. Journal of Economic Literature.
47, 1-27.
*Cullen, M.J., Hardison, C.M., and Sackett, P.R. (2004). Using SAT-grade and ability-job
performance relationships to test predictions derived from stereotype threat theory. Journal of
Applied Psychology. 89, 220-230.
*Cullen, M.J., Waters, S.D., and Sackett, P.R. (2006). Testing stereotype threat theory predictions
for math-identified and non-math-identified students by gender. Human Performance. 19, 421-
440.
Danaher, K., and Crandall, C.S. (2008). Stereotype threat in applied settings re-examined. Journal of
Applied Social Psychology. 38, 1639-1655.
Dar-Nimrod, I., and Heine, S.J. (2006). Exposure to scientific theories affects women’s math
performance. Science. 314, 435.
Davies, P.G., Spencer, S.J., Quinn, D.M., and Gerhardstein, R. (2002). Consuming images: how
television commercials that elicit stereotype threat can restrain women academically and
professionally. Personality and Social Psychology Bulletin. 28, 1615-1628.
Davies, P.G., Steele, C.M., and Spencer, S.J. (2005). Clearing the air: identity safety moderates the
effects of stereotype threat on women’s leadership aspirations. Journal of Personality and Social
Psychology. 88, 276-287.
Davis, C., Aronson, J., and Salinas, M. (2006). Shades of threat: racial identity as a moderator of
stereotype threat. Journal of Black Psychology. 32, 399-417.
*Dee, T.S. (2009). Stereotype threat and the student-athlete. NBER Working Paper No. 14705.
Eriksson, K., and Lindholm, T. (2007). Making gender matter: the role of gender-based
expectancies and gender identification on women’s and men’s math performance in Sweden.
Scandinavian Journal of Psychology. 48, 329-338.
Ford, T.E., Ferguson, M.A., Brooks, J.L., and Hagadone, K.M. (2004). Coping sense of humor
reduces effects of stereotype threat on women’s math performance. Personality and Social
Psychology Bulletin. 30, 643-653.
*Fryer, R.G., and Levitt, S.D. (2008). An analysis of the gender gap in mathematics. Working
Paper, Harvard University.
Fryer, R.G., Levitt, S.D., and List, J.A. (2008). Exploring the impact of financial incentives on
stereotype threat: evidence from a pilot study. American Economic Review. 98, 370-375.
*Gelman, A., and Weakliem, D. (2009). Of beauty, sex, and power: statistical challenges in
estimating small effects. Under revision for American Scientist.
*Goldin, C. (1994). Understanding the gender gap: an economic history of American women. In P.
Burstein (Ed.), Equal Employment Opportunity: Labor Market Discrimination and Public Policy. Edison,
NJ: Aldine Transaction.
Gonzales, P.M., Blanton, H., and Williams, K.J. (2002). The effects of stereotype threat and double-
minority status on the test performance of Latino women. Personality and Social Psychology Bulletin.
28, 659-670.
Good, C., Aronson, J., and Harder, J.A. (2008). Problems in the pipeline: stereotype threat and
women’s achievement in high-level math courses. Journal of Applied Developmental Psychology. 29,
17-28.
Good, C., Aronson, J., and Inzlicht, M. (2003). Improving adolescents’ standardized test
performance: an intervention to reduce the effects of stereotype threat. Journal of Applied
Developmental Psychology. 24, 645-662.
Gresky, D.M., Ten Eyck, L.L., Lord, C.G., and McIntyre, R.B. (2005). Effects of salient multiple
identities on women’s performance under mathematics stereotype threat. Sex Roles. 53, 703-
716.
*Gupta, V.K., and Bhawe, N.M. (2007). The influence of proactive personality and stereotype
threat on women’s entrepreneurial intentions. Journal of Leadership and Organizational Studies. 13,
73-85.
*Hanna, R., and Linden, L. (2009). Measuring discrimination in education. NBER Working Paper
No. 15057.
Harrison, L.A., Stevens, C.M., Monty, A.N., and Coakley, C.A. (2006). The consequences of
stereotype threat on the academic performance of white and non-white lower income college
students. Social Psychology of Education. 9, 341-357.
Hess, T.M., and Hinson, J.T. (2006). Age-related variation in the influences of aging stereotypes on
memory in adulthood. Psychology and Aging. 21, 621-625.
Hess, T.M., Hinson, J.T., and Statham, J.A. (2004). Explicit and implicit stereotype activation effects
on memory: do age and awareness moderate the impact of priming? Psychology and Aging. 19,
495-505.
*Higgins, E.T., and Rholes, W.S. (1978). “Saying is believing”: effects of message modification on
memory and liking for the person described. Journal of Experimental Social Psychology. 14, 363-378.
Huguet, P., and Regner, I. (2007). Stereotype threat among schoolgirls in quasi-ordinary classroom
circumstances. Journal of Educational Psychology. 99, 545-560.
Inzlicht, M., Aronson, J., Good, C., and McKay, L. (2006). A particular resiliency to threatening
environments. Journal of Experimental Social Psychology. 42, 323-336.
Inzlicht, M., and Ben-Zeev, T. (2000). A threatening intellectual environment: why females are
susceptible to experiencing problem-solving deficits in the presence of males. Psychological Science.
11, 365-371.
Inzlicht, M., and Ben-Zeev, T. (2003). Do high-achieving female students underperform in private?
The implications of threatening environments on intellectual processing. Journal of Educational
Psychology. 95, 796-805.
Johns, M., Schmader, T., and Martens, A. (2005). Knowing is half the battle. Psychological Science. 16,
175-179.
Josephs, R.A., Newman, M.L., Brown, R.P., and Beer, J.M. (2003). Status, testosterone, and human
intellectual performance: stereotype threat as status concern. Psychological Science. 14, 158-163.
Keller, J. (2002). Blatant stereotype threat and women’s math performance: self-handicapping as a
strategic means to cope with obtrusive negative performance expectations. Sex Roles. 47, 193-
198.
Keller, J. (2007a). Stereotype threat in classroom settings: the interactive effect of domain
identification, task difficulty and stereotype threat on female students’ maths performance.
British Journal of Educational Psychology. 77, 323-338.
Keller, J. (2007b). When negative stereotypic expectancies turn into challenge or threat: the
moderating role of regulatory focus. Swiss Journal of Psychology. 66, 163-168.
Kellow, J.T., and Jones, B.D. (2008). The effects of stereotypes on the achievement gap:
reexamining the academic performance of African American high school students. Journal of
Black Psychology. 34, 94-120.
Kiefer, A.K., and Sekaquaptewa, D. (2007). Implicit stereotypes and women’s math performance:
how implicit gender-math stereotypes influence women’s susceptibility to stereotype threat.
Journal of Experimental Social Psychology. 43, 825-832.
Klein, O., Pohl, S., and Ndagijimana, C. (2007). The influence of intergroup comparisons on
Africans’ intelligence test performance in a job selection context. The Journal of Psychology. 141,
453-467.
Koenig, A.M., and Eagly, A.H. (2005). Stereotype threat in men on a test of social sensitivity. Sex
Roles. 52, 489-496.
Kray, L.J., Galinsky, A., and Thompson, L. (2001). Battle of the sexes: gender stereotype
confirmation and reactance in negotiations. Journal of Personality and Social Psychology. 80, 942-958.
Kray, L.J., Galinsky, A., and Thompson, L. (2002). Reversing the gender gap in negotiations: an
exploration of stereotype regeneration. Organizational Behavior and Human Decision Processes. 87,
386-409.
*Kray, L.J., Reb, J., Galinsky, A.D., and Thompson, L. (2004). Stereotype reactance at the
bargaining table: the effect of stereotype activation and power on claiming and creating value.
Personality and Social Psychology Bulletin. 30, 399-411.
Krendl, A.C., Richeson, J.A., Kelley, W.M., and Heatherton, T.F. (2008). The negative
consequences of threat. Psychological Science. 19, 168-175.
*Krueger, A.B. (2003). Economic considerations and class size. The Economic Journal. 113, F34-F63.
*Lavy, V. (2004). Do gender stereotypes reduce girls’ human capital outcomes? Evidence from a
natural experiment. NBER Working Paper No. 10678.
Lesko, A.C., and Corpus, J.H. (2006). Discounting the difficult: how high math-identified women
respond to stereotype threat. Sex Roles. 54, 113-125.
*Levitt, S.D., and List, J.A. (2007). What do laboratory experiments measuring social preferences
reveal about the real world? Journal of Economic Perspectives. 21, 153-174.
Levy, B. (1996). Improving memory in old age through implicit self-stereotyping. Journal of
Personality and Social Psychology. 71, 1092-1107.
Leyens, J.P., Desert, M., Croizet, J.C., and Darcis, C. (2000). Stereotype threat: are lower status and
history of stigmatization preconditions of stereotype threat? Personality and Social Psychology
Bulletin. 26, 1189-1199.
*List, J.A. (2006). The behavioralist meets the market: measuring social preferences and reputation
effects in actual transactions. Journal of Political Economy. 114, 1-37.
Martens, A., Johns, M., Greenberg, J., and Schimel, J. (2006). Combating stereotype threat: the
effect of self-affirmation on women’s intellectual performance. Journal of Experimental Social
Psychology. 42, 236-243.
Marx, D.M., and Goff, P.A. (2005). Clearing the air: the effect of experimenter race on target’s test
performance and subjective experience. British Journal of Social Psychology. 44, 645-657.
Marx, D.M., and Roman, J.S. (2002). Female role models: protecting women’s math test
performance. Personality and Social Psychology Bulletin. 28, 1183-1193.
Marx, D.M., and Stapel, D.A. (2006a). Distinguishing stereotype threat from priming effects: on the
role of the social self and threat-based concerns. Journal of Personality and Social Psychology. 91,
243-254.
Marx, D.M., and Stapel, D.A. (2006b). It’s all in the timing: measuring emotional reactions to
stereotype threat before and after taking a test. European Journal of Social Psychology. 36, 687-698.
Marx, D.M., Stapel, D.A., and Muller, D. (2005). We can do it: the interplay of construal orientation
and social comparisons under threat. Journal of Personality and Social Psychology. 88, 432-446.
Mayer, D.M., and Hanges, P.J. (2003). Understanding the stereotype threat effect with “culture-
free” tests: an examination of its mediators and measurement. Human Performance. 16, 207-230.
McFarland, L.A., Lev-Arey, D.M., and Ziegert, J.C. (2003). An examination of stereotype threat in a
motivational context. Human Performance. 16, 181-205.
McGlone, M.S., and Aronson, J. (2006). Stereotype threat, identity salience, and spatial reasoning.
Journal of Applied Developmental Psychology. 27, 486-493.
McIntyre, R.B., Lord, C.G., Gresky, D.M., Ten Eyck, L.L., Frye, G.D., and Bond, C.F. (2005). A
social impact trend in the effects of role models on alleviating women’s mathematics stereotype
threat. Current Research in Social Psychology. 10, 116-136.
McIntyre, R.B., Paulson, R.M., and Lord, C.G. (2003). Alleviating women’s mathematics stereotype
threat through salience of group achievements. Journal of Experimental Social Psychology. 39, 83-90.
McKay, P.F., Doverspike, D., Bowen-Hilton, D., and Martin, Q.D. (2002). Stereotype threat effects
on the Raven Advanced Progressive Matrices scores of African Americans. Journal of Applied
Social Psychology. 32, 767-787.
McKay, P.F., Doverspike, D., Bowen-Hilton, D., and McKay, Q.D. (2003). The effects of
demographic variables and stereotype threat on black/white differences in cognitive ability test
performance. Journal of Business and Psychology. 18, 1-14.
McKown, C., and Weinstein, R.S. (2003). The development and consequences of stereotype
consciousness in middle childhood. Child Development. 74, 498-515.
*Mechtenberg, L. (2009). Cheap talk in the classroom: how biased grading at school explains gender
differences in achievements, career choices, and wages. Forthcoming in The Review of Economic
Studies.
Muzzatti, B., and Agnoli, F. (2007). Gender and mathematics: attitudes and stereotype threat
susceptibility in Italian children. Developmental Psychology. 43, 747-759.
Neuville, E., and Croizet, J.C. (2007). Can salience of gender identity impair math performance
among 7-8 years old girls? The moderating role of task difficulty. European Journal of Psychology of
Education. 22, 307-316.
Nguyen, H.H., O’Neal, A., and Ryan, A.M. (2003). Relating test-taking attitudes and skills and
stereotype threat effects to the racial gap in cognitive ability test performance. Human
Performance. 16, 261-293.
O’Brien, L.T., and Crandall, C.S. (2003). Stereotype threat and arousal: effects on women’s math
performance. Personality and Social Psychology Bulletin. 29, 782-789.
Osborne, J.W. (2007). Linking stereotype threat and anxiety. Educational Psychology. 27, 135-154.
*Oswald, D.L., and Harvey, R.D. (2000/2001). Hostile environments, stereotype threat, and math
performance among undergraduate women. Current Psychology: Developmental, Learning, Personality,
Social. 19, 338-356.
Ployhart, R.E., Ziegert, J.C., and McFarland, L.A. (2003). Understanding racial differences on
cognitive ability tests in selection contexts: an integration of stereotype threat and applicant
reactions research. Human Performance. 16, 231-259.
Quinn, D.M., and Spencer, S.J. (2001). The interference of stereotype threat with women’s
generation of mathematical problem-solving strategies. Journal of Social Issues. 57, 55-71.
Rahhal, T.A., Colcombe, S.J., and Hasher, L. (2001). Instructional manipulations and age
differences in memory: now you see them, now you don’t. Psychology and Aging. 16, 697-706.
Rosenthal, H.E.S., Crisp, R.J., and Suen, M.W. (2007). Improving performance expectancies in
stereotypic domains: task relevance and the reduction of stereotype threat. European Journal of
Social Psychology. 37, 586-597.
*Roth, A.E. (1995). Bargaining experiments. In J.H. Kagel and A.E. Roth (Eds.), Handbook of
Experimental Economics. Princeton, NJ: Princeton University Press.
Sawyer, T.P., and Hollis-Sawyer, L.A. (2005). Predicting stereotype threat, test anxiety, and cognitive
ability test performance: an examination of three models. International Journal of Testing. 5, 225-
246.
Schimel, J., Arndt, J., Banko, K.M., and Cook, A. (2004). Not all self-affirmations were created
equal: the cognitive and social benefits of affirming the intrinsic (vs. extrinsic) self. Social
Cognition. 22, 75-99.
Schmader, T. (2002). Gender identification moderates stereotype threat effects on women’s math
performance. Journal of Experimental Social Psychology. 38, 194-201.
Schmader, T., and Johns, M. (2003). Converging evidence that stereotype threat reduces working
memory capacity. Journal of Personality and Social Psychology. 85, 440-452.
Schmader, T., Johns, M., and Barquissau, M. (2004). The costs of accepting gender differences: the
role of stereotype endorsement in women’s experience in the math domain. Sex Roles. 50, 835-
850.
Seibt, B., and Forster, J. (2004). Stereotype threat and performance: how self-stereotypes influence
processing by inducing regulatory foci. Journal of Personality and Social Psychology. 87, 38-56.
Sekaquaptewa, D., and Thompson, M. (2002). The differential effects of solo status on members of
high- and low-status groups. Personality and Social Psychology Bulletin. 28, 694-707.
Sekaquaptewa, D., and Thompson, M. (2003). Solo status, stereotype threat, and performance
expectancies: their effects on women’s performance. Journal of Experimental Social Psychology. 39,
68-74.
Shih, M., Ambady, N., Richeson, J.A., Fujita, K., and Gray, H.M. (2002). Stereotype performance
boosts: the impact of self-relevance and the manner of stereotype activation. Journal of Personality
and Social Psychology. 83, 638-647.
Shih, M., Bonam, C., Sanchez, D., and Peck, C. (2007). The social construction of race: biracial
identity and vulnerability to stereotypes. Cultural Diversity and Ethnic Minority Psychology. 13, 125-
133.
Shih, M., Pittinsky, T.L., and Ambady, N. (1999). Stereotype susceptibility: identity salience and
shifts in quantitative performance. Psychological Science. 10, 80-83.
Shih, M., Pittinsky, T.L., and Trahan, A. (2006). Domain-specific effects of stereotypes on
performance. Self and Identity. 5, 1-14.
Smith, J.L., and White, P.H. (2002). An examination of implicitly activated, explicitly activated, and
nullified stereotypes on mathematical performance: it’s not just a woman’s issue. Sex Roles. 47,
179-191.
Spencer, S.J., Steele, C.M., and Quinn, D.M. (1999). Stereotype threat and women’s math
performance. Journal of Experimental Social Psychology. 35, 4-28.
Steele, C.M., and Aronson, J. (1995). Stereotype threat and the intellectual test performance of
African Americans. Journal of Personality and Social Psychology. 69, 797-811.
Stone, J., Lynch, C.I., Sjomeling, M., and Darley, J.M. (1999). Stereotype threat effects on black and
white athletic performance. Journal of Personality and Social Psychology. 77, 1213-1227.
Stone, J., and McWhinnie, C. (2008). Evidence that blatant versus subtle stereotype threat cues
impact performance through dual processes. Journal of Experimental Social Psychology. 44, 445-452.
Stricker, L.J., and Ward, W.C. (2004). Stereotype threat, inquiring about test takers’ ethnicity and
gender, and standardized test performance. Journal of Applied Social Psychology. 34, 665-693.
Von Hippel, W., von Hippel, C., Conway, L., Preacher, K.J., Schooler, J.W., and Radvansky, G.A.
(2005). Coping with stereotype threat: denial as an impression management strategy. Journal of
Personality and Social Psychology. 89, 22-35.
Walsh, M., Hickey, C., and Duffy, J. (1999). Influence of item content and stereotype situation on
gender differences in mathematical problem solving. Sex Roles. 41, 219-240.
Walton, G.M., and Cohen, G.L. (2007). A question of belonging: race, social fit, and achievement.
Journal of Personality and Social Psychology. 92, 82-96.
*Wei, T. (2009). Stereotype threat, gender, and math performance: evidence from the National
Assessment of Educational Progress. Working paper, Harvard University.
*Wheeler, S.C., and Petty, R.E. (2001). The effects of stereotype activation on behavior: a review of
the possible mechanisms. Psychological Bulletin. 127, 797-826.
Wicherts, J.M., Dolan, C.V., and Hessen, D.J. (2005). Stereotype threat and group differences in test
performance: a question of measurement invariance. Journal of Personality and Social Psychology. 89,
696-716.
Yopyk, D.J.A., and Prentice, D.A. (2005). Am I an athlete or a student? Identity salience and
stereotype threat in student-athletes. Basic and Applied Social Psychology. 27, 329-336.
TABLE 1 – SUMMARY STATISTICS OF STEREOTYPE THREAT LITERATURE
I. Distribution of Estimates:
                                  (1)           (2)           (3)           (4)
Number of estimates extracted     Number of     Total number  Percent of    Percent of
                                  studies       of estimates  studies       estimates
1                                 64            64            62.1          39.3
2                                 26            52            25.2          31.9
3                                 8             24            7.8           14.7
4                                 4             16            3.9           9.8
7                                 1             7             1.0           4.3
Total                             103           163           100.0         100.0

II. Group Domain:
                                  Percent of
                                  estimates
Sex                               63.2
Race                              26.4
Other *                           10.4

III. Findings:
                                  Percent of    Percent of    Journal       Sample
                                  estimates     studies       Impact ***    Size +
Threat                            62.0          66.7          66.4          58.4
Insignificant Results             21.5          17.6          13.3          25.6
Reactance                         16.6          15.7          20.3          16.0
Ratio of Threat to No Threat      1.63          2.00          1.98          1.40
p-value **                        0.0088        0.0004        0.0004        0.0378

IV. Prime Attributes:
                                  Self-Affirmation
Valence                           Not Self-     Self-         Total:
                                  Affirming     Affirming
Counter-stereotypical             14            1             15
Neutral                           64            8             72
Stereotypical                     71            5             76
Total:                            149           14            163

Alternate Categorization:
Counter-stereotypical             16            2             18
Neutral                           37            2             39
Stereotypical                     96            10            106
Total:                            149           14            163
Notes: *Other group domains include age, college major, socio-economic status, sexual orientation, and social
group (i.e., musician or athlete). **p-values come from a binomial distribution of the likelihood of
experiencing this ratio or higher, given 103 independent draws assuming an equal probability of getting threat
and no threat estimates. ***Estimates weighted by each study’s journal impact factor. + Estimates weighted
by the square root of each study’s sample size.
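The binomial p-values in Panel III can be approximated with a short calculation. The sketch below is illustrative only. It assumes the reported value is the one-sided tail P(X >= k) for X ~ Binomial(103, 0.5), and it assumes that a threat share is converted to a count by rounding share × 103 to the nearest whole number (so 62.0 percent becomes 64). Neither the function names nor the rounding rule come from the paper, so the output need not match the reported p-values exactly.

```python
from math import comb


def binomial_upper_tail(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5), computed exactly with integers."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    favorable = sum(comb(n, j) for j in range(k, n + 1))
    return favorable / 2 ** n


N_DRAWS = 103  # number of independent draws stated in the table notes


def tail_for_share(share_percent: float, n: int = N_DRAWS) -> float:
    # ASSUMPTION: round the share to the nearest whole number of draws.
    k = round(share_percent / 100 * n)
    return binomial_upper_tail(n, k)


if __name__ == "__main__":
    for label, share in [("Percent of estimates", 62.0),
                         ("Percent of studies", 66.7)]:
        print(f"{label}: share={share}%, p ~ {tail_for_share(share):.4f}")
```

Exact integer arithmetic is used for the tail sum so that no precision is lost before the final division.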
TABLE 2 – RELATIONSHIP BETWEEN PRIME ATTRIBUTES AND STEREOTYPE THREAT
                        Original Valence Categorization                        Alternate Valence Categorization
                        Equal      Equal      Weights by Weights by Weights by Equal      Equal      Weights by Weights by Weights by
                        weight for weight for Journal    Sample     p-value    weight for weight for Journal    Sample     p-value
                        estimates  studies    Impact     Size                  estimates  studies    Impact     Size
                        (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)        (10)
Stereotype or Neutral   -0.749**   -0.722**   -0.722**   -0.784**   -1.484**   -0.580**   -0.545**   -0.494**   -0.630**   -1.195**
Valence                 (0.069)    (0.069)    (0.092)    (0.063)    (0.141)    (0.072)    (0.070)    (0.087)    (0.067)    (0.150)
Self-Affirming          0.214      0.250      0.227      0.188      0.255      -0.125     -0.079     -0.036     -0.201     0.192
                        (0.253)    (0.197)    (0.238)    (0.280)    (0.470)    (0.203)    (0.159)    (0.188)    (0.219)    (0.449)
Interaction Term        0.518*     0.468**    0.576**    0.588**    1.097**    0.913**    0.882**    0.887**    1.019**    1.174**
                        (0.263)    (0.208)    (0.246)    (0.287)    (0.491)    (0.219)    (0.176)    (0.199)    (0.232)    (0.476)
R²                      0.58       0.60       0.60       0.64       0.55       0.47       0.50       0.54       0.55       0.45
Notes: N = 163 for all regressions. Regressions are OLS estimates with dependent variable being the binary indicator of a significant reactance finding
(columns 1-4 and 6-9) or the standardized effect sizes as measured by Cohen’s d, with positive values indicating reactance (columns 5 and 10). The
mean effect size (standard deviation), weighted by p-values for the full sample is: -0.390 (0.737). Explanatory variables are an indicator whether the
study’s prime is stereotypical/neutral or not, an indicator whether the study’s prime is self-affirming or not, and the interaction between these two
indicators. Each column corresponds to a regression conducted with alternate prime valence categorizations and weighting schemes, as noted. See text
for discussion. * = significant at 10% level. ** = significant at 5% level.
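The specification described in these notes can be written compactly. The notation is an assumption introduced here for illustration; the symbols $S_i$, $A_i$, and $\beta$ do not appear in the source.

```latex
Y_i = \beta_0 + \beta_1 S_i + \beta_2 A_i + \beta_3 \,(S_i \times A_i) + \varepsilon_i
```

Here $Y_i$ is the reactance indicator (or Cohen's d in columns 5 and 10), $S_i = 1$ if the prime is stereotypical or neutral, and $A_i = 1$ if the prime is self-affirming. The three rows of the table report $\beta_1$, $\beta_2$, and $\beta_3$. Under this reading, a prime that is both stereotypical/neutral and self-affirming differs from the omitted baseline by $\beta_1 + \beta_2 + \beta_3$, which in column (1) is $-0.749 + 0.214 + 0.518 = -0.017$.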
TABLE 3 – EXPERIMENTAL SUMMARY STATISTICS
                                    Women                                           Men
                                    Control     Math for    Math for    Diagnostic  Control     Math for    Math for    Diagnostic
                                                Boys        Girls                               Boys        Girls
Age                                 18.94       18.85       18.68       18.96       19.55       18.85       19.11       18.79
SAT Math Score                      732         718         759         763         720         722         733         755
Cumulative GPA                      2.904       3.526       3.391       3.321       3.649       3.354       3.235       3.505
Sophomore                           0.167       0.300       0.211       0.217       0.276       0.115       0.357       0.083
Junior                              0.000       0.050       0.053       0.087       0.103       0.154       0.036       0.042
Senior                              0.056       0.000       0.053       0.043       0.069       0.000       0.000       0.000
Asian                               0.389       0.600       0.526       0.565       0.207       0.154       0.214       0.292
Black                               0.056       0.050       0.000       0.043       0.034       0.115       0.071       0.042
Hispanic                            0.056       0.100       0.053       0.043       0.000       0.000       0.036       0.000
Experimenter Expectation            1.888       1.900       1.737       1.783       1.655       1.808       1.732       2.000
Know Stereotype Threat              0.278       0.300       0.368       0.391       0.207       0.269       0.286       0.375
Percent Male in Session             0.575       0.565       0.535       0.525       0.570       0.582       0.594       0.619
Number in Session                   19.0        19.8        20.2        19.3        19.9        19.5        19.3        20.3
Joint F-Stat (p-value)                          0.10        1.54*       3.16**                  0.00        0.11        1.09
                                                (1.00)      (0.05)      (0.00)                  (1.00)      (1.00)      (0.35)
Joint F-Stat (p-value), excluding               0.26        0.38        0.12                    0.28        0.33        0.31
  SAT Math Score                                (0.99)      (0.99)      (1.00)                  (0.99)      (0.99)      (0.99)
Number of Participants              18          20          19          23          29          26          28          24
Notes: Each cell shows means for the variable and experimental group indicated. Control participants received no pre-test prime. Math for Boys
participants received a pre-test question asking them to agree or disagree with the statement: “Do you feel math is more for boys than girls?” Math for
Girls participants received a pre-test question asking them to agree or disagree with the statement: “Do you feel math is more for girls than boys?”
Diagnostic participants received a pre-test statement indicating that the test is a measure of mathematical ability and that men have typically outperformed
women on this test. The “experimenter expectation” variable indicates whether the participant thought the experimenter wanted him or her to perform
worse than normal (=1), about normal (=2), or better than normal (=3). The “know stereotype threat” variable indicates whether the participant said
he or she was familiar with the concept of stereotype threat. The F-Statistics are from a joint test of significance for the indicated treatment group
relative to the controls, separately for each gender. In addition to the variables shown, the following variables were included in this test: indicators for
missing SAT score or missing GPA, session room location, session day of week, and session time of day. * = significant at 10% level. ** = significant
at 5% level.
TABLE 4 – MAIN EXPERIMENTAL RESULTS
                Questions           Stress on           Effort on           General Test        Guess at End
                Correct             Current Test        Current Test        Anxiety             of Test?
                Women     Men       Women     Men       Women     Men       Women     Men       Women     Men
                (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)       (10)
Control Mean    8.17      11.17     5.44      3.79      6.11      6.41      2.74      2.73      0.333     0.276
Math for Boys   2.057**   -0.520    -1.171    0.019     1.035     0.308     0.061     -0.315    0.037     0.142
                (0.885)   (0.910)   (0.997)   (0.666)   (0.869)   (0.696)   (0.253)   (0.221)   (0.170)   (0.150)
Math for Girls  -0.576    -0.771    -1.331    0.337     0.162     -0.196    -0.240    -0.449**  0.006     0.345**
                (0.841)   (0.833)   (0.948)   (0.610)   (0.826)   (0.637)   (0.240)   (0.202)   (0.162)   (0.137)
Diagnostic      -0.158    0.329     -1.021    -0.104    -0.013    -1.050    -0.149    -0.521**  -0.134    0.108
                (0.787)   (0.881)   (0.887)   (0.645)   (0.773)   (0.673)   (0.225)   (0.214)   (0.151)   (0.145)
Adj-R²          0.35      0.37      0.13      0.11      0.06      0.13      0.01      0.04      0.13      0.10
Notes: Dependent variables are noted in the first row (number of test questions correctly answered, 10-point scale self-report of test stress, 10-point
scale self-report of test effort, 4-point aggregated index self-report of general test anxiety, and an indicator for whether guessing occurred near the end
of the test). Each row reports the coefficient from an OLS regression of the dependent variable on indicators for whether participants received the
Math for Boys (do you feel math is more for boys than girls?), Math for Girls (do you feel math is more for girls than boys?), or Diagnostic (this test
measures math ability; men typically outperform women on it) gender primes. These are the mean treatment effects of being primed in that manner
relative to not being primed (control). The control group’s mean score is also reported for reference. N = 80 for female sample, N = 107 for male
sample. Each column corresponds to a separate OLS regression. Results are shown separately for women and men. Although coefficient estimates are
omitted, each regression controls for age, SAT math score, cumulative GPA, class standing, race, and experimental session fixed effects. Standard
errors are in parentheses. See text for discussion. * = significant at 10% level. ** = significant at 5% level.
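One way to read the significance markers is to compare each coefficient with its standard error. The sketch below is a rough check, not the paper's procedure. It assumes a two-sided test with large-sample normal critical values (1.645 and 1.960), whereas the exact thresholds depend on degrees of freedom that the table does not report. The function name is hypothetical; the numbers are copied from Table 4.

```python
def significance_marker(coef: float, se: float) -> str:
    """Return '**', '*', or '' from the ratio of a coefficient to its standard error."""
    ratio = abs(coef / se)
    # ASSUMPTION: normal critical values for two-sided 5% and 10% tests.
    if ratio >= 1.960:
        return "**"
    if ratio >= 1.645:
        return "*"
    return ""


# (label, coefficient, standard error), copied from Table 4
ENTRIES = [
    ("Math for Boys, questions correct, women", 2.057, 0.885),
    ("Math for Girls, general test anxiety, men", -0.449, 0.202),
    ("Math for Girls, guess at end of test, men", 0.345, 0.137),
    ("Diagnostic, effort on current test, men", -1.050, 0.673),
]

if __name__ == "__main__":
    for label, coef, se in ENTRIES:
        marker = significance_marker(coef, se)
        print(f"{label}: ratio={coef / se:.2f} marker='{marker}'")
```

The first three entries carry a ** marker in the table and the fourth carries none, so the check can be compared directly against the printed markers.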
TABLE 5 – EXPERIMENTAL RESULTS BY SUBGROUP
                Manipulation Check OK   Math SAT Score Above 750
                Women       Men         Women       Men
                (1)         (2)         (3)         (4)
Control Mean    8.17        11.17       9.38        12.75
Math for Boys   2.033**     -0.719      2.813**     -0.153
                (0.913)     (0.978)     (1.193)     (1.244)
Math for Girls  -0.698      -0.795      -0.608      -1.855
                (0.887)     (0.861)     (1.044)     (1.164)
Diagnostic      -0.044      -0.060      -0.202      0.453
                (0.849)     (1.056)     (1.056)     (1.225)
Adj-R²          0.33        0.31        0.52        0.38
N               75          96          46          64
Notes: Dependent variable is number of questions correctly answered. Each row reports the coefficient from
an OLS regression of the number of correct questions on indicators for whether participants received the
Math for Boys (do you feel math is more for boys than girls?), Math for Girls (do you feel math is more for girls
than boys?), or Diagnostic (this test measures math ability; men typically outperform women on it) gender
primes, restricted to the subsample indicated in the first row (participants satisfied the manipulation check if
they properly answered the post-exam question about the presence of a pre-test prime; participants are
treated as highly identified with the math domain if they scored above a 750 on the SAT math test). These
are the mean treatment effects of being primed in that manner relative to not being primed (control). The
control group’s mean score is also reported for reference. Each column corresponds to a separate OLS
regression. Results are shown separately for women and men. Although coefficient estimates are omitted,
each regression controls for age, SAT math score, cumulative GPA, class standing, race, and experimental
session fixed effects. Standard errors are in parentheses. See text for discussion. * = significant at 10% level.
** = significant at 5% level.
APPENDIX TABLE 1 – RELATIONSHIP BETWEEN PRIME ATTRIBUTES AND STEREOTYPE THREAT
Dependent Variable = Threat
                        Original Valence Categorization             Alternate Valence Categorization
                        Equal      Equal      Weights by Weights by Equal      Equal      Weights by Weights by
                        weight for weight for Journal    Sample     weight for weight for Journal    Sample
                        estimates  studies    Impact     Size       estimates  studies    Impact     Size
                        (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
Stereotype or Neutral   0.748**    0.810**    0.822**    0.702**    0.619**    0.707**    0.620**    0.588**
Valence                 (0.112)    (0.106)    (0.124)    (0.119)    (0.109)    (0.100)    (0.115)    (0.117)
Self-Affirming          0.000      0.000      0.000      0.000      -0.125     -0.105     -0.201     -0.109
                        (0.414)    (0.304)    (0.322)    (0.531)    (0.309)    (0.227)    (0.248)    (0.379)
Interaction Term        -0.748*    -0.810**   -0.822**   -0.702     -0.619*    -0.707**   -0.620**   -0.588
                        (0.430)    (0.320)    (0.331)    (0.545)    (0.333)    (0.251)    (0.262)    (0.401)
R²                      0.33       0.42       0.46       0.27       0.28       0.39       0.42       0.23
Notes: N = 163 for all regressions. Regressions are OLS estimates with binary dependent variable indicating
a significant threat finding. Explanatory variables are an indicator whether the study’s prime is
stereotypical/neutral valence or not, an indicator whether the study’s prime is self-affirming or not, and the
interaction between these two indicators. Each column corresponds to a regression conducted with alternate
prime valence categorizations and weighting schemes, as noted. See text for discussion. * = significant at
10% level. ** = significant at 5% level.
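Because the regression includes both indicators and their interaction, the coefficients can be summed to compare the four types of prime. The sketch below does this for column (1), using only the coefficients reported in the table. The intercept is not reported, so every value is a difference from the omitted baseline (a counter-stereotypical prime that is not self-affirming). The function and variable names are hypothetical.

```python
def implied_difference(b_valence: float, b_affirm: float, b_interaction: float,
                       stereo_or_neutral: bool, self_affirming: bool) -> float:
    """Difference in the predicted threat indicator relative to the baseline cell."""
    s = int(stereo_or_neutral)
    a = int(self_affirming)
    return b_valence * s + b_affirm * a + b_interaction * s * a


# Column (1) of Appendix Table 1
B_VALENCE, B_AFFIRM, B_INTERACTION = 0.748, 0.000, -0.748

if __name__ == "__main__":
    for s in (False, True):
        for a in (False, True):
            diff = implied_difference(B_VALENCE, B_AFFIRM, B_INTERACTION, s, a)
            print(f"stereotypical/neutral={s}, self-affirming={a}: {diff:+.3f}")
```

Under this reading, the 0.748 higher likelihood of a threat finding for stereotypical or neutral primes applies only when the prime is not self-affirming. For self-affirming primes the valence and interaction coefficients cancel.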
- 38 -
  • Article
    This paper provides experimental evidence on the effect of affirmative action (AA). In particular, we investigate whether affirmative action has a ”stereotype threat effect” – that is, whether AA cues a negative stereotype that leads individuals to conform to the stereotype and adversely affects their performance. Stereotype threat has been shown in the literature to be potentially significant for individuals who identify strongly with the domain of the stereotype and who engage in complex stereotype-relevant tasks. We therefore explore this question in the context of gender-based AA for a complex math task. In this context, the stereotype is most relevant for women with high math ability, and the stereotype threat effects can be expected to work in the opposite direction to AA’s competition effect that encourages women to compete. We find that, consistent with the presence of a stereotype threat, AA has an overall negative effect on the performance of high-ability women performing complex math tasks.
  • Research
    Full-text available
    Age-related attitudes and stereotypes about aging have been shown to impact a variety of outcomes, including older adults’ cognitive performance. Past research has often focused on memory, a variable which does show age-related decline. The present research explored the relationship of such attitudes and stereotypes with vocabulary, a novel outcome variable in this area that differs from memory in that it remains stable well into older adulthood. Study 1 used data from 3631 participants in a large dataset to explore the ability of attitudinal variables (i.e., aging satisfaction, aging expectations, and optimism) to predict vocabulary performance. Optimism was the only statistically significant attitudinal predictor of vocabulary, though it did not have strong practical significance. Study 2 was designed to test the impacts of experimentally manipulated age-based stereotype threat on vocabulary performance and confidence in vocabulary performance in 71 young adults and 74 older adults. Participants were presented with a stereotype threat manipulation stating that their age group was either expected to do better or worse on vocabulary performance than the other age group. Older adults performed better on the vocabulary task and were more confident in their performance, consistent with past findings. However, the hypothesized effect of stereotype threat on older adults’ vocabulary performance, related to young adults’, was not found. Interpretation of the impact of stereotype threat is limited, however, as many participants did not accurately respond to a manipulation check item. Overall, this research did not support a relationship between age-related attitudes, optimism, age-based stereotypes, and vocabulary performance.
  • Article
    The present research examined the relationship between number of successful role models and alleviation of performance deficits that women suffer under mathematics stereotype threat. Men and women were reminded of the stereotype, read brief biographies of 0-4 successful women, and took a difficult math test. Women who read no biographies scored worse than men; women who read 4 biographies scored as well as men. Increases in women's performance across the number of role models were consistent with a power function trend predicted by social impact theory (Latane, 1981). This relationship with social impact theory suggests new directions in understanding how role models alleviate stereotype threat.
  • Article
    At the highest levels of math achievement, gender differences in favor of men persist on standardized math tests. We hypothesize that stereotype threat depresses women's math performance through interfering with their ability to formulate problem-solving strategies. In Study 1, women underperformed in comparison to men on a word problem test, however, women and men performed equally when the word problems were converted into their numerical equivalents. In Study 2, men and women worked on difficult problems, either in a high- or reduced-stereo-type-threat condition. Problem-solving strategies were coded. When stereo-type threat was high, women were less able to formulate problem-solving strategies than when stereotype threat was reduced. The effect of stereotype threat on cognitive resources and the implications for gender differences in mathematical testing are discussed.
  • Article
    Full-text available
    Recent theory and research suggest that certain situational factors can harm women’s math test performance. The three studies presented here indicate that female role models can buffer women’s math test performance from the debilitating effects of these situational factors. In Study 1, women’s math test performance was protected when a competent female experimenter (i.e., a female role model) administered the test. Study 2 showed that it was the perception of the female experimenter’s math competence, not her physical presence, that safeguarded the math test performance of women. Study 3 revealed that learning about a competent female experimenter buffered women’s self-appraised math ability, which in turn led to successful performance on a challenging math test.
  • Article
    Research on “stereotype threat” (Aronson, Quinn, & Spencer, 1998; Steele, 1997; Steele & Aronson, 1995) suggests that the social stigma of intellectual inferiority borne by certain cultural minorities can undermine the standardized test performance and school outcomes of members of these groups. This research tested two assumptions about the necessary conditions for stereotype threat to impair intellectual test performance. First, we tested the hypothesis that to interfere with performance, stereotype threat requires neither a history of stigmatization nor internalized feelings of intellectual inferiority, but can arise and become disruptive as a result of situational pressures alone. Two experiments tested this notion with participants for whom no stereotype of low ability exists in the domain we tested and who, in fact, were selected for high ability in that domain (math-proficient white males). In Study 1 we induced stereotype threat by invoking a comparison with a minority group stereotyped to excel at math (Asians). As predicted, these stereotype-threatened white males performed worse on a difficult math test than a nonstereotype-threatened control group. Study 2 replicated this effect and further tested the assumption that stereotype threat is in part mediated by domain identification and, therefore, most likely to undermine the performances of individuals who are highly identified with the domain being tested. The results are discussed in terms of their implications for the development of stereotype threat theory as well as for standardized testing.
  • Article
    Full-text available
    Claude Steele’s stereotype threat hypothesis has attracted significant attention in recent years. This study tested one of the main tenets of his theory—that stereotype threat serves to increase individual anxiety levels, thus hurting performance—using real‐time measures of physiological arousal. Subjects were randomly assigned to either high or low stereotype threat conditions involving a challenging mathematics task while physiological measures of arousal were recorded. Results showed significant physiological reactance (skin conductance, skin temperature, blood pressure) as a function of a stereotype threat manipulation. These findings are consistent with the argument that stereotype threat manipulations either increase or decrease situational‐specific anxiety, and hold significant implications for thinking about fair assessment and testing practices in academic settings.
  • Article
    African American college students tend to obtain lower grades than their White counterparts, even when they enter college with equivalent test scores. Past research suggests that negative stereotypes impugning Black students' intellectual abilities play a role in this underperformance. Awareness of these stereotypes can psychologically threaten African Americans, a phenomenon known as “stereotype threat” (Steele & Aronson, 1995), which can in turn provoke responses that impair both academic performance and psychological engagement with academics. An experiment was performed to test a method of helping students resist these responses to stereotype threat. Specifically, students in the experimental condition of the experiment were encouraged to see intelligence—the object of the stereotype—as a malleable rather than fixed capacity. This mind-set was predicted to make students' performances less vulnerable to stereotype threat and help them maintain their psychological engagement with academics, both of which could help boost their college grades. Results were consistent with predictions. The African American students (and, to some degree, the White students) encouraged to view intelligence as malleable reported greater enjoyment of the academic process, greater academic engagement, and obtained higher grade point averages than their counterparts in two control groups.
  • Article
    Three studies investigated whether affirming the self intrinsically (vs. extrinsically) would reduce defensive concerns and improve cognitive and social functioning in evaluative contexts. Study 1 found that an intrinsic self-affirmation reduced self-handicapping and increased performance on a threatening serial subtraction task relative to an extrinsic self-affirmation. Study 2 replicated the effects of Study 1, showing that an intrinsic (vs. extrinsic) self-affirmation increased women's performance on a math test under conditions that arouse stereotype threat. A third study extended these findings to threatening social contexts. Focusing participants on intrinsic (vs. extrinsic) aspects of self reduced thoughts about social rejection prior to an evaluative social interaction. Discussion focused on the need for further investigation into the multifaceted nature of the self and self-esteem.
  • Article
    This study investigated Black racial identity attitudes as a moderator of intellectual performance in potentially stereotype threatening situations. Ninety-eight African American students were randomly assigned to one of three stereotype threatening conditions: low threat, medium threat, or high threat. Analyses confirmed a stereotype threat effect with participants performing significantly better on the task in the low threat condition. Additional analyses of the test takers’ racial identity profiles under high and low threat conditions revealed a significant interaction between Internalization status attitudes and the type of threat condition. In the low stereotype threat condition, Internalization status attitudes moderated performance on the intellectual task (i.e., items from the verbal section of the GRE). In this condition, after controlling for SAT verbal score, students who strongly endorsed Internalization racial identity attitudes correctly solved more items than students who did not identify as strongly with Internalization status attitudes. Implications of these findings are discussed.
  • Article
    Women in quantitative fields risk being personally reduced to negative stereotypes that allege a sex-based math inability. This situational predicament, termed stereotype threat, can undermine women’s performance and aspirations in all quantitative domains. Gender-stereotypic television commercials were employed in three studies to elicit the female stereotype among both men and women. Study 1 revealed that only women for whom the activated stereotype was self-relevant underperformed on a subsequent math test. Exposure to the stereotypic commercials led women taking an aptitude test in Study 2 to avoid math items in favor of verbal items. In Study 3, women who viewed the stereotypic commercials indicated less interest in educational/vocational options in which they were susceptible to stereotype threat (i.e., quantitative domains) and more interest in fields in which they were immune to stereotype threat (i.e., verbal domains).
  • Article
    This study investigated the interactive influences of diagnosticity instructions, gender, and ethnicity as they related to task performance. In a laboratory experiment with 120 male and female Latino and White college students, both a gender-based and an ethnicity-based stereotype-threat effect were found to influence performance on a test of mathematical and spatial ability. Closer inspection revealed that the gender effect was qualified by ethnicity, whereas the ethnicity effect was not qualified by gender. This suggests that the ethnicity of Latino women sensitized them to negative stereotypes about their gender, leading to a performance decrement in a context in which stereotype threat was activated. In contrast, it appeared that the gender of Latino women did not sensitize them to negative stereotypes about their ethnicity, because both male and female Latinos evidenced ethnicity-based stereotype threat. These findings have implications for the interplay between multiple group identities as they relate to concern for confirming negative stereotypes.