
Teacher Mindsets Help Explain Where a

Growth Mindset Intervention Does and Doesn’t Work

In press at Psychological Science

Authors:

David S. Yeager1*, Jamie M. Carroll1, Jenny Buontempo1, Andrei Cimpian2, Spencer Woody1,

Robert Crosnoe1, Chandra Muller1, Jared Murray1, Pratik Mhatre1, Nicole Kersting3, Christopher

Hulleman4, Molly Kudym1, Mary Murphy5, Angela Duckworth6, Gregory M. Walton7, & Carol

S. Dweck7

Affiliations:

1 University of Texas at Austin, Austin, TX, 2 New York University, New York, NY,

3 University of Arizona, Tucson, AZ, 4 University of Virginia, Charlottesville, VA,

5 Indiana University, Bloomington, IN, 6 University of Pennsylvania, Philadelphia, PA,

7 Stanford University, Stanford, CA

* Address correspondence to David S. Yeager (dyeager@utexas.edu).

Acknowledgments:

This paper uses data from the National Study of Learning Mindsets (PI: D. Yeager; Co-Is: R.

Crosnoe, C. Dweck, C. Muller, B. Schneider, & G. Walton; doi.org/10.3886/ICPSR37353.v1),

which was made possible through methods and data systems created by the Project for Education

Research That Scales (PERTS, PI: Dave Paunesku), data collection carried out by ICF (Project

directors: Kate Flint and Alice Roberts), meetings hosted by the Mindset Scholars Network at the

Center for Advanced Study in the Behavioral Sciences, assistance from M. Levi, M. Shankar, T.

Brock, C. Romero, C. Macrander, T. Wilson, E. Konar, E. Horng, H. Bloom, and M. Weiss, and

funding from the Raikes Foundation, the William T. Grant Foundation, the Spencer Foundation,

the Bezos Family Foundation, the Character Lab, the Houston Endowment, and the President and

Dean of Humanities and Social Sciences at Stanford University. Writing of this paper was

supported by the National Institutes of Health under award number R01HD084772, the National

Science Foundation under grant numbers 1761179 and 2004831, the William T. Grant

Foundation under grant 189706, the Bill & Melinda Gates Foundation under grant

numbers OPP1197627 and INV-004519, the UBS Optimus Foundation under grant number

47515, and an Advanced Research Fellowship from the Jacobs Foundation to David Yeager.

This research was also supported by grant P2CHD042849, Population Research Center, awarded

to the Population Research Center at The University of Texas at Austin by the National Institutes

of Health. The content is solely the responsibility of the authors and does not necessarily

represent the official views of the National Institutes of Health, the National Science Foundation,

or the other funders.

Teacher and Student Mindsets

Abstract

A growth mindset intervention teaches the belief that intellectual abilities can be

developed. Where does the intervention work best? A prior paper examined school-level

moderators using data from the National Study of Learning Mindsets (NSLM), which delivered a

short growth mindset intervention during the first year of high school. This paper uses the NSLM

to examine moderation by teachers’ mindsets and answers a new question: Can students

independently implement their growth mindsets in virtually any classroom culture, or must

students’ growth mindsets be supported by their teacher’s own growth mindsets (i.e., the mindset

+ supportive context hypothesis)? The present analysis (N = 9,167 student records matched with

N = 223 math teachers) supported the latter hypothesis. This result stood up to potentially

confounding teacher factors and to a conservative Bayesian analysis. Thus, sustaining growth

mindset effects may require contextual supports that allow the proffered beliefs to take root and

flourish.

Keywords: Wise interventions, Growth mindset, Motivation, Adolescence, Affordances,

Implicit theories.

Teacher Mindsets Help Explain Where a

Growth Mindset Intervention Does and Doesn’t Work

Psychological interventions change the ways that people make sense of their experiences,

and have led to improvement in a wide variety of domains of importance to society and to public

policy (Harackiewicz & Priniski, 2018; Walton & Wilson, 2018). These interventions offer

people new beliefs that encourage them to tackle rather than avoid a challenge or to persist rather

than give up. To the extent that people put these beliefs into practice, the interventions can

improve outcomes months or even years later (see Brady et al., 2020).

For instance, a growth-mindset of intelligence intervention conveys to students the

malleability of intellectual abilities in response to hard work, effective strategies, and help from

others. Short (<50-minute), online growth mindset interventions evaluated in randomized

controlled trials—including two pre-registered replications—have improved the academic

outcomes of lower-achieving high school students and first-year college students (e.g., Yeager et

al., 2019; see Dweck & Yeager, 2019). These interventions seek to dispel a fixed mindset, the

idea that intellectual abilities cannot be changed, which has been associated with more “helpless”

responses to setbacks and lower achievement around the world (OECD, 2021).

Is successfully teaching students a growth mindset enough? A fundamental tension

centers on the role of the educational context. Should psychological interventions be thought of

as “giving” people adaptive beliefs that they can apply and reap the benefits from in almost any

context, even ones that do not directly support its use? Or do interventions simply offer beliefs

that must later be supported by the context if they are to bear fruit?

In a previous paper (Yeager et al., 2019), we examined the role of a school factor,

namely, the peer norms in a school, and found that the student growth mindset intervention could

not overcome the obstacle of a peer culture that did not share or support growth mindset

behaviors, such as challenge seeking. Here we ask how the growth mindset intervention might

fare in classrooms led by teachers who endorse more of a fixed mindset (a less supportive

context for students’ growth mindsets) versus classrooms led by teachers who endorse more of a

growth mindset (a more supportive context).

Why Might a Growth Mindset Intervention Depend on Teacher Beliefs?

The present paper tests the viability of the “mindset + supportive context” hypothesis. In

this hypothesis, a teacher’s growth mindset acts as an “affordance” (Walton & Yeager, 2020;

also see Gibson, 1977) that can draw out a student’s nascent growth mindset and make it tenable

and actionable in the classroom.

This hypothesis grows out of the recognition that as people try

to implement a belief or behavior in a given context, they become aware of whether it is

beneficial and legitimate in that context by attending to cues in their environments.

According to the mindset + supportive context hypothesis, teachers with a growth

mindset may convey how, in their class, mistakes are learning opportunities, not signs of low

ability, and back up this view with assignments and evaluations that reward continual

improvement (Canning, Muenks, Green, & Murphy, 2019; Muenks et al., 2020). This could

encourage a student to continue acting on their growth mindsets. By contrast, teachers with more

of a fixed mindset may implement practices that make a budding growth mindset inapplicable

and locally invalid. For instance, they may convey that only some students have the talent to get

an A, or say that not everyone is “a math person” (Rattan, Good, & Dweck, 2012; also see

Muenks et al., 2020). These messages could make students think that their intelligence would be

evaluated negatively if they had to work hard or if they asked a question that revealed their

confusion, discouraging students from acting out key growth mindset behaviors. According to

this hypothesis, the intervention is like planting a “seed,” but one that will not take root and

flourish unless the “soil” is fertile (a classroom with growth mindset affordances) (see Walton &

Yeager, 2020).

Despite its intuitive appeal, the mindset + supportive context hypothesis was not a

foregone conclusion. Perhaps students are more like independent agents who can achieve in any

classroom context so long as they bring adaptive beliefs to the context and put forth effective

effort. Therefore, teachers’ mindsets could be irrelevant to the effectiveness of the intervention.

Research could even find stronger effects in a classroom led by teachers espousing more of a

fixed mindset. This would imply that the intervention fortifies students to find ways to achieve

(for example, by being less daunted by difficult tasks, working harder, persisting longer) even in

contexts that are not directly encouraging these behaviors (Canning et al., 2019; Leslie, Cimpian,

Meyer, & Freeland, 2015; Muenks et al., 2020). In this view, a student’s growth mindset could

be like an asset that can compensate for something lacking in the environment. Because no study

has examined classroom context moderators of the growth mindset intervention, a direct test of

the mindset + supportive context hypothesis was needed.

The Importance of Studying Treatment Effect Heterogeneity

Our attention to teachers’ mindsets as a moderating agent continues an important

development in psychological intervention research: a focus on treatment effect heterogeneity

(Tipton, Yeager, Iachan, & Schneider, 2019). Psychologists have often viewed heterogeneous

effects as a limitation, as meaning that the effects are unreliable, small, or applicable in too

limited a way, and therefore not important (for a discussion see Miller, 2019).

But this view is

shifting. First, heterogeneity is now seen as the way things in the world actually are (Bryan,

Tipton, & Yeager, in press; Gelman & Loken, 2014). Nothing, and particularly no psychological

phenomenon, works the same way for all people in all contexts. This fact has been pointed

out for generations (Bronfenbrenner, 1977; Cronbach, 1957; Lewin, 1952), but it has only

recently begun to be appreciated sufficiently. Second, systematically probing where an

intervention does and does not work provides a unique opportunity to develop better theories and

interventions (Bryan et al., in press; McShane, Tackett, Böckenholt, & Gelman, 2019), including

by revealing mechanisms through which the intervention operates.

The Present Research

This study analyzed data from the National Study of Learning Mindsets (NSLM; Yeager,

2019), which was an intervention experiment conducted with a U.S. representative sample of 9th

grade students (registration: osf.io/tn6g4). The NSLM focused on the start of high school

because this is when academic standards often rise and when students establish a trajectory of

higher or lower academic achievement with lifelong consequences (Easton, Johnson, & Sartain,

2017). The NSLM was designed primarily to study treatment effect heterogeneity. The first

paper, as mentioned, focused on a school’s peer norms as a moderator (Yeager et al., 2019). The

second planned analysis, presented here, focuses on teacher factors. Teachers are important to

students directly because they lead the classroom and establish its culture. For example, teachers

create the norms for instruction, set the parameters for student participation, and control grading

and assessments, and thereby influence student motivation and engagement (Jackson, 2018;

Kraft, 2019).

The present focus on math grades (rather than overall GPA as in Yeager et al., 2019) is

motivated by the fact that students tend to find math challenging and anxiety-inducing (Hembree,

1990) and therefore a growth mindset might help students confront those challenges

productively. Further, our focus on math is relevant to policy. Success in 9th grade math is a

gateway to a lifetime of advanced education, profitable careers, and even longevity (Carroll,

Muller, Grodsky, & Warren, 2017).

In this study of heterogeneous effects, what kinds of effect sizes should be expected?

Brief online growth mindset interventions have tended to improve the grades of lower-achieving

high school students by about .10 grade points (or .11 SD) (Yeager et al., 2019, 2016). This may

seem small relative to benchmarks from laboratory research, but that is not an appropriate

comparison for understanding intervention effects obtained in field settings (Kraft, 2020). An

entire year of learning in 9th grade math is worth .22 SD as assessed by achievement tests (Hill,

Bloom, Black, & Lipsey, 2008), and having a high-quality math teacher for a year during

adolescence, as compared to an average one, is worth .16 SD (Chetty, Friedman, & Rockoff,

2014). Expensive and comprehensive education reforms for adolescents show a median effect of

.03 SD. The largest effects top out at around 0.20 SD, with effects this large representing striking

outliers (Boulay et al., 2018). Thus, Kraft (2020) concluded that “effects of .15 or even .10 SD

should be considered large and impressive” (p. 248), especially if the intervention is scalable,

rigorously evaluated, and assessed in terms of consequential, official outcomes (e.g., grades).
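The benchmark comparisons above amount to simple arithmetic in SD units. The sketch below is illustrative only: the grade-point SD of 0.91 is back-calculated from the reported equivalence of .10 grade points and .11 SD (it is not a reported statistic), and `to_sd_units` is a hypothetical helper.

```python
def to_sd_units(effect_points: float, sd_points: float) -> float:
    """Standardize a raw grade-point effect by the outcome's SD."""
    return effect_points / sd_points

grade_sd = 0.91        # assumed SD of 9th-grade math GPA (back-calculated)
mindset_effect = 0.10  # reported intervention effect in grade points

effect_sd = to_sd_units(mindset_effect, grade_sd)  # approximately .11 SD

# Benchmarks from the text, already in SD units
benchmarks = {
    "one year of 9th-grade math learning (Hill et al., 2008)": 0.22,
    "high- vs. average-quality math teacher (Chetty et al., 2014)": 0.16,
    "median comprehensive reform (Boulay et al., 2018)": 0.03,
}
for label, b in benchmarks.items():
    print(f"{label}: intervention is {effect_sd / b:.2f} of this benchmark")
```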

Method

Data

Data come from the NSLM, which as noted was a randomized trial conducted with

a nationally representative sample of 9th grade students during the 2015-2016 school year (Yeager,

2019). The NSLM was approved by the IRBs at Stanford University, the University of Texas at

Austin, and ICF International. The current analysis, which focuses on math teachers, was central

to the original design of the study, appeared in our grant proposals, and was referenced as the next

analysis in our previous pre-analysis plan (osf.io/afmb6/). The present study followed the Yeager

et al. (2019) pre-registered analysis plan for every step that could be repeated from the first paper

(e.g., data processing, stopping rule, covariates, and statistical model). Analysis steps that are

unique to the present paper are outlined in detail in the SOM-R and previewed below. There was

no additional pre-registration for the present paper. Instead, we used a combination of a

conservative Bayesian analysis and a series of robustness tests to guard against false positives

and portray statistical uncertainty more accurately for analysis steps not specified in the pre-

registration. The two planned analyses (i.e., for the present paper and Yeager et al., 2019) were

conducted sequentially. The present study’s math teacher variables were not merged with the

student data until after the Yeager et al. (2019) analyses were completed.

The analytic sample included students with a valid condition variable, a math grade, and

their math teacher’s self-reported mindset (see online supplement Table S6). This sample

included 9,167 records (8,775 unique students, as some students had more than one math

teacher) nested within 223 unique teachers. It comprises 76% of the overall NSLM sample of

students with a math grade. Those who are missing data either could not be matched to a math

teacher or their math teacher did not answer the mindset questions. Missing data did not differ by

condition (see online supplement Table S7). We retained students who took two math courses

with different teachers, each of whom completed the survey. Listwise deletion of these students produced

the same results (see Table 2). In terms of math level, 7% of records were from a math class at a

level below Algebra 1, 70% were in Algebra 1, 19% were in Geometry, and 3% were in Algebra

II or above. Students were 50% female and racially diverse; 14% reported being Black/African-

American, 21% Latinx, 6% Asian, 4% Native American or Middle Eastern, and 55% white;

37% reported mothers with a bachelor’s degree. Teachers’ characteristics were similar to

population estimates: 58% were female, 86% were white, non-Latinx, and 51% reported having

earned a master’s degree; they had been teaching an average of 13.83 years (SD = 9.95).

The previous, between-school analysis (Yeager et al. 2019) examined grades in all

subjects (math, science, English, and social studies). That analysis focused on the pre-registered

group of lower-achieving students (whose pre-treatment grades were below the school median)

because it would be harder to detect improvement among the already higher-achieving students

and because a previous pre-registered study had shown the effects to be concentrated among the

lower achievers (Yeager et al., 2016), which replicated prior work (Paunesku et al., 2015). The

current focus on math teachers and math grades, however, required us to include students at all

achievement levels, a decision we made before seeing the results. This is because classrooms are

smaller units than schools, so excluding half the sample would have left us with too few students

in many teachers’ classrooms and could have made estimates too imprecise. In addition, math

grades are on average substantially lower than in other subjects, probably because students in the

U.S. are tracked into advanced math classes earlier than in other subject areas, which suggests

that students overall tend to be in math classes that they find challenging. This means that fewer

students were already earning As, and more students’ grades could improve in response to an

intervention, particularly one focused on helping students engage with and learn from challenges.

(See Table 2 for supplementary analyses among low-achievers).

Procedure

The NSLM implemented a number of procedures that allowed it to be informative with

respect to contextual sources of intervention effect heterogeneity (Tipton et al., 2019). First,

students were randomly assigned on an individual basis (i.e., within classroom and school) to a

growth mindset intervention or a control group, while math teachers (who were unaware of

condition and study procedures) were surveyed to measure their mindsets. Thus, each teacher in

the analytic sample had some students in the control group and some students in the treatment

group. Consequently, we could estimate a treatment effect for each teacher and examine

variation in effects across teachers. The study procedures appear in Figure 2 and are described in

more detail next. Additional information is reported in the technical documentation available

from ICPSR (Yeager, 2019) and in the supplemental material in Yeager et al. (2019).

Figure 2. The student and teacher data collection procedure in the National Study of Learning

Mindsets. Note: The non-seeing eye icon represents masking of condition assignments from

teachers, students, and researchers. The coin flip icon represents the moment of random

assignment. +About 85% of students took session 1 during the fall semester before the

Thanksgiving break, as planned; the rest took it in January or early February, to accommodate

school schedules. The pre-analysis plan specifying data processing rules can be found here:

osf.io/afmb6/. The data processing was carried out by MDRC, an independent research firm,

while unaware of condition assignment or results.
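Because students were randomized to condition within each teacher's roster, a treatment effect can be computed teacher by teacher, as the design above describes. A minimal sketch of that within-teacher contrast; the records, teacher IDs, and grades below are hypothetical, not study data:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (teacher_id, condition, math_grade) records. Every teacher in
# the analytic sample has students in both conditions.
records = [
    ("T1", "treatment", 2.9), ("T1", "control", 2.5),
    ("T1", "treatment", 3.1), ("T1", "control", 2.7),
    ("T2", "treatment", 2.2), ("T2", "control", 2.3),
    ("T2", "treatment", 2.4), ("T2", "control", 2.1),
]

def within_teacher_effects(rows):
    """Treatment-minus-control difference in mean grades for each teacher."""
    by_teacher = defaultdict(lambda: {"treatment": [], "control": []})
    for teacher, condition, grade in rows:
        by_teacher[teacher][condition].append(grade)
    return {t: mean(g["treatment"]) - mean(g["control"])
            for t, g in by_teacher.items()}

effects = within_teacher_effects(records)
print({t: round(e, 2) for t, e in effects.items()})  # {'T1': 0.4, 'T2': 0.1}
```

Variation in these per-teacher differences is what a moderation analysis then relates to teacher characteristics such as mindset.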

Data collection and processing. To reduce bias in the research process, three

professional research firms were contracted to form the sample, administer the intervention, and

collect all the data. ICF International selected and recruited a nationally-representative sample of

public schools in the U.S. during the 2015-2016 academic year. Students within those schools

completed online surveys hosted by the firm PERTS, during which they were randomly assigned

to a growth mindset intervention or a control group. The final student response rates were high

(median student response rate across schools: 98%), and the recruited sample of schools closely

matched the population of interest (Gopalan & Tipton, 2018).

Random assignment to condition was conducted by the survey software at the student

level, with 50/50 probability, when students logged on to the survey for the first time. To prevent

expectancy effects, condition information was masked from involved parties, in that students did

not know there were two conditions (i.e., a “treatment” and a “control”), while teachers in the

school were not allowed to “take” the treatment, were not told the hypotheses of the study, and

were not told that students were randomly assigned to alternative conditions. The treatment and

control conditions looked remarkably similar, to reduce the likelihood that teachers saw a

difference. The intervention sessions generally occurred during electives (like health or PE), and

schools were discouraged from conducting sessions in math classes. Math teachers were not used

as proctors (usually, non-teaching staff coordinated data collection) so as to keep math teachers

as unaware of the study as possible. The intervention involved two ~25-minute sessions,

generally 1 to 4 weeks apart, and under 50 minutes in total for nearly all students. Immediately

after the second intervention session, students completed self-reports of mindsets (which served

as a manipulation check).
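The assignment step described above amounts to an independent 50/50 draw for each student at first login. A sketch, with an arbitrary seed standing in for the survey software's randomizer (function name and seed are illustrative):

```python
import random

def assign_condition(rng: random.Random) -> str:
    """One student's assignment: treatment or control with equal probability,
    mirroring the software's coin flip when the student first logs on."""
    return "treatment" if rng.random() < 0.5 else "control"

rng = random.Random(2015)  # arbitrary seed, for reproducibility of the sketch
assignments = [assign_condition(rng) for _ in range(10_000)]
share_treated = assignments.count("treatment") / len(assignments)
print(round(share_treated, 3))  # close to 0.5 in a large sample
```

Individual-level randomization of this kind is what guarantees that each teacher ends up with students in both conditions.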

Prior to data collection, schools provided the research firm with a list of all instructors

who taught a math class that academic year with more than two 9th grade students—the

definition of a “9th grade math teacher” used here. This sample restriction was necessary because

each teacher would need both treated and control students to provide a within-teacher treatment

effect estimate. All such teachers were invited to complete an approximately one-hour online

survey in return for a $40 incentive, and a large majority of teachers (86.8%) did so. This high

response rate reduced the likelihood that biased non-response could have affected the distribution

or validity of the teacher mindset measure.

The independent research firm ICF International obtained student survey data from the

technology vendor PERTS and administrative data (e.g., grades) from the schools and readied

both for final processing. MDRC, another independent research firm, then processed these data

following a registered pre-analysis plan. They were all unaware of students’ condition

assignments. Only then did our research team access the data and execute the planned analyses.

(In parallel, MDRC developed an independent evaluation report that reproduced the overall

intervention impacts and between-school heterogeneity results; Zhu, Garcia, & Alonzo, 2019).

Growth mindset intervention. The growth mindset intervention presented students with

information about how the brain learns and develops using this metaphor: The brain is like a

muscle that grows stronger (and smarter) when it learns from difficult challenges (Aronson et

al., 2002). Then, the intervention unpacked the meaning of this metaphor for experiences in

school, namely that struggles in school are not signs that one lacks ability but instead that one is

on the path to developing one’s abilities. Trusted sources—scientists, slightly older students,

prominent individuals in society—provided and supported these ideas. Students were then asked

to generate their own suggestions for putting a growth mindset into practice; for example, by

persisting in the face of difficulty, seeking out more challenging work, asking teachers for

appropriate help, and revising one’s learning strategies when needed, among others.

The intervention involved a number of other exercises designed to help students articulate

the growth mindset, how they could use it in their lives, and how other students like them might

use it. It was deliberately not a lecture or an “exhortation,” so as to avoid the impression that the

intervention was telling young people what to think, since we know that for adolescents an

autonomy-threatening framing could be ineffective or even backfire. Instead, the intervention

treated young people as collaborators in the improvement of the intervention, sharing their own

unique expertise on what it is like to be a high school student. Additional detail on the

intervention (and control) groups appears in the supplement to Yeager et al. (2019) (also see the

SOM-R).

Control group. The control group was provided with interesting information about brain

functioning and its relation to memory and learning, but the program did not mention the

malleability of the brain or intellectual abilities. As in the growth mindset condition, trusted

sources—scientists, older peers, and prominent individuals in society—provided this information

and students were asked for their opinions and treated as having their own unique expertise. The

graphic art, headlines, and overall visual layout were very similar to the treatment, to help

students and teachers remain masked and to discourage comparison of materials. Because most

students were taking biology at the time, the neuroscience taught in the control group would have

added content above and beyond what students were learning in class and could even have

increased interest in science and in school. Indeed, students have sometimes found the control

material, if anything, more interesting than the treatment material (Yeager, Romero, et al., 2016).

In sum, the active control condition was designed to provide a rather rigorous test of the

effectiveness of the growth mindset intervention.

Measures

Primary outcome: Math grades. The primary dependent variable was students’ post-

treatment grades in their math course, which were generally recorded 7 or 8 months after the

intervention. All math grades were obtained from schools’ official records. Grades ranged from 0

(an F) to 4.3 (an A+). The mean math GPA was 2.44, leaving considerable room for

improvement for many students.

Grades are the dependent variable of interest, not test scores, for three reasons. First,

grades are typically better predictors of college enrollment and lifespan outcomes than test

scores, and the signaling power of grades is apparent even though schools and teachers could

potentially inflate their grading scales (Pattison, Grodsky, & Muller, 2013). Thus, grades are

relevant for policy and for understanding trajectories of development. Second, grades represent

the accumulation of many different assignments (homework, quizzes, tests) and therefore signal

the kind of dedicated persistence that a growth mindset is designed to instill. Third, test scores

were not an option in this study because 9th grade is not always a grade in which state

achievement tests are administered, and most students did not have a math test score.

Primary moderator: Teacher mindset. Math teachers rated two fixed mindset

statements: “People have a certain amount of intelligence and they really can't do much to

change it” and “Being a top math student requires a special talent that just can’t be taught”

(1=Strongly agree, 6=Strongly disagree, M = 4.74, SD = 0.76). The first is a general fixed

mindset item intended to capture beliefs that might lead to mindset practices that are not specific

to math, such as not allowing students to revise and resubmit their work or discouraging low-

achievers’ questions. The second item captures a belief that could lead to more math-specific

mindset practices (see Leslie, Cimpian, Meyer, & Freeland, 2015). The two items were

correlated (r = .48, p < .001) and were averaged. We scored them so that higher values

corresponded to more growth mindset beliefs. We note that respondent time on this national

math teacher survey was limited to encourage participation and survey completion, so every

construct, even teacher mindset, was limited to a small number of items.
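The scoring of the two-item measure described above can be sketched as follows. The five response vectors are hypothetical (not study data) and `pearson_r` is a hand-rolled helper; because the anchors run from 1 = Strongly agree to 6 = Strongly disagree on fixed-mindset statements, higher raw scores already indicate more of a growth mindset, so no reverse-keying is applied.

```python
from statistics import mean, pstdev

def mindset_composite(item1: float, item2: float) -> float:
    """Average the two fixed-mindset items (1-6 agreement scale)."""
    return (item1 + item2) / 2

def pearson_r(xs, ys):
    """Pearson correlation, computed by hand to stay dependency-free."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical responses from five teachers
general = [5, 6, 4, 5, 3]        # "People have a certain amount of intelligence..."
math_specific = [4, 6, 5, 5, 2]  # "Being a top math student requires a special talent..."

composites = [mindset_composite(a, b) for a, b in zip(general, math_specific)]
print(composites)
print(round(pearson_r(general, math_specific), 2))
```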

The two mindset items used for the composite had not been administered to large samples

of high school math teachers before, so we assessed their concurrent validity by administering

them to a large, pilot sample of high school teachers along with items that assessed teacher

practices (N = 368 teachers). (The details of the sample and the exact item wordings are reported

in the SOM-R.) In the pilot, we found that teachers’ mindsets in fact predicted their endorsement

of practices expected to follow from teachers’ mindsets, based on theory and past research

(Canning et al., 2019; Haimovitz & Dweck, 2017; Leslie et al., 2015; Muenks et al., 2020).

Specifically, teachers’ endorsement of a growth mindset was positively associated with learning-

focused practices, r = .30, p<.001 (e.g., saying to a hypothetical struggling student, “Let’s see

what you don’t understand and I’ll explain it differently,” and not agreeing that, “It slows my

class down to encourage lower achievers to ask questions”). Further, teacher mindsets were

negatively associated with ability-focused practices (emphasizing raw ability and implying that

high effort was a negative sign about ability), r = −.28, p<.001 (e.g., comforting a hypothetical

struggling student with “Don’t worry, it’s okay to not be a math person,” à la Rattan, Good, &

Dweck, 2012, and praising a succeeding student with “You’re lucky that you’re a math person”

or “It’s great that it’s so easy for you”). This is by no means an exhaustive list of potential

mindset teacher practices, and this is certainly not the only way to measure teacher practices. But

this validation study suggests that the teacher mindset measure captures differences in teachers

that extend to classroom practices—practices that the student growth mindset treatment could

either overcome or that could afford the opportunity for it to work.

Potential confounds for teacher mindsets. Because only the student mindset

intervention was randomly assigned, and not teachers’ mindsets, other characteristics of teachers

could be correlated with their mindsets and with the magnitude of the intervention effect. For

instance, perhaps teachers’ growth mindsets are simply a proxy for competent and fair

instructional practices in general. To account for this possibility, we measured several potential

confounds for teacher mindsets: a video-based assessment of pedagogical content knowledge, a

fluid intelligence test for teachers, teachers’ masters-level preparation in education or math, and

an assessment of implicit racial bias. We call these “potential” confounds because, during the

design of the study, these were raised by at least one advisor to the study as something that could

interfere with the interpretation of teacher mindsets (although, in the end, these factors showed

rather weak associations with teacher mindsets; see Table S10). To this list of a priori,

theoretically-motivated teacher confounds, we added teacher race, gender, years teaching, and

whether they had heard of growth mindset before. We describe the potential confounds in the

supplement because their inclusion or exclusion does not change the sign, significance, or

magnitude of the key moderation results. To these potential teacher-level moderators we can also

add the pre-registered school-level moderators (challenge-seeking norms among students/peers,

school achievement level, and school percent racial/ethnic minority; see Yeager et al. 2019).

Adding these school factors in interaction with the treatment did not change the teacher mindset

interaction (see Table 2), suggesting that these factors examined previously (Yeager et al., 2019)

and the classroom-level factors examined here account for independent sources of moderation.

Last, in a post-hoc analysis we examined three student perceptions of the classroom climate that

could be confounded with teacher mindset: the level of cognitive challenge in the course, how

interesting the course was, and how much students thought the teacher was “good at teaching.”


None of these factors were moderators and none altered the teacher mindset interaction (see

Table S11 in the SOM-R).

Manipulation check and moderator: Students’ mindset beliefs. At pre-test and again

at immediate post-test participants indicated their level of agreement with the three fixed-mindset

statements used as a manipulation check by Yeager et al. (2019) (e.g. “You have a certain

amount of intelligence, and you really can’t do much to change it.”, 1 = Strongly agree, 6 =

Strongly disagree). We averaged responses (Pre-test, M = 2.95, SD = 1.14; α = .72; Post-test, M = 2.70, SD = 1.19; α = .78), and higher values corresponded to more of a fixed mindset. An

extensive discussion of the validity of this three-item mindset measure and its relation to the

growth mindset “meaning system” appears in Yeager and Dweck (2020). The scale at pre-test

was used in exploratory moderation analyses. The scale at post-test was used as a planned

manipulation check.
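The scale construction described above (averaging three 1-to-6 items and checking their internal consistency) can be sketched as follows. The data here are simulated for illustration, not NSLM responses, and cronbach_alpha is a helper defined in this sketch rather than a study function.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
n = 500
latent = rng.normal(3.0, 1.0, n)  # underlying fixed-mindset belief per student
# Three noisy 1-6 Likert-style items tapping the same latent belief.
items = np.clip(latent[:, None] + rng.normal(0, 0.8, (n, 3)), 1, 6)
scale = items.mean(axis=1)        # averaged scale: higher = more fixed mindset
alpha = cronbach_alpha(items)     # internal consistency of the three items
```

Because the three items share a common latent component, the averaged scale is more reliable than any single item, which is what the reported alphas of .72 and .78 index.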

Student-level covariates. Student-level control variables related to achievement

included: the pre-treatment measure of low-achieving student status specified in the overall

NSLM pre-analysis plan (osf.io/afmb6/), which indicates that the student received an 8th grade

GPA below the median of other incoming 9th graders in the school; students’ expectations of

how well they would perform in math class (“Thinking about your skills and the difficulty of

your classes, how well do you think you’ll do in math in high school?”; 1=Extremely poorly to

7=Extremely well); students’ racial minority status, gender, and whether their mother had earned

a bachelor’s or above. These covariates were specified in the NSLM pre-analysis plan because

each could be related to achievement, and so a chance imbalance with respect to any of these

within a teacher’s classroom could bias treatment effect estimates. Controlling for these factors

reduces the influence of chance imbalances. Covariates were school-mean-centered.
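School-mean-centering subtracts each school's average from its own students' values, so each covariate indexes a student's standing within their school. A minimal sketch with toy values (the school ids and GPAs are illustrative, not study data):

```python
import numpy as np

# Toy data: a covariate for students nested in two schools.
school = np.array([0, 0, 0, 1, 1, 1, 1])
prior_gpa = np.array([2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])

# School-mean-center: subtract each school's mean from its students' values,
# so the centered covariate compares students within, not between, schools.
centered = prior_gpa.copy()
for s in np.unique(school):
    mask = school == s
    centered[mask] -= prior_gpa[mask].mean()
```

After centering, the covariate averages exactly zero within every school, so between-school differences in the covariate cannot leak into the within-school treatment effect estimates.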


Analysis Plan

Estimands. The primary analysis focused on the sign and significance of the student

growth mindset intervention × teacher growth mindset interaction. If the interaction was positive

and significant it would be more consistent with the mindset + supportive context hypothesis.

The primary estimands of interest (i.e., values we wished to estimate) were the simple

effects listed in Table 1. Row 1 assumes that teacher mindsets are unassociated with other

teacher factors, but this is not sufficiently conservative so it is not our primary analysis of

interest. Row 2 in Table 1 accounts for potential confounding in the interpretation of teachers’

mindsets by fixing the levels of potentially-confounding moderators to their population averages

(denoted by c in Table 1) and looking at the moderated effects of teacher mindsets (see row 2 of

Table 1). Thus, later when we present the key results in the paper in Table 2, those estimates

correspond to the estimands in row 2 of Table 1.

Table 1. Estimands of Interest: Conditional Average Treatment Effects (CATEs).

Assuming no confounding of the moderator:
  Teachers reporting fixed mindsets:
    CATE(S = Fixed) = E[Y_ij | T_ij = 1, S_j = Fixed] - E[Y_ij | T_ij = 0, S_j = Fixed]
  Teachers reporting growth mindsets (i.e., mindset + supportive context):
    CATE(S = Growth) = E[Y_ij | T_ij = 1, S_j = Growth] - E[Y_ij | T_ij = 0, S_j = Growth]

Adjusting for potential confounding (primary estimand of interest):
  Teachers reporting fixed mindsets:
    CATE(S = Fixed, C = c) = E[Y_ij | T_ij = 1, S_j = Fixed, C_j = c] - E[Y_ij | T_ij = 0, S_j = Fixed, C_j = c]
  Teachers reporting growth mindsets:
    CATE(S = Growth, C = c) = E[Y_ij | T_ij = 1, S_j = Growth, C_j = c] - E[Y_ij | T_ij = 0, S_j = Growth, C_j = c]

Note: CATE = Conditional average treatment effect, or the treatment effect within a subgroup. i indexes students, j indexes teachers, Y = math grades, T (for treatment) = treatment status, S = teacher mindset, C (for confounds) = vector of teacher mindset confounds, c = population average for potential teacher or school confounds. See proofs and justifications in Yamamoto and Yeager (2019).
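The estimands in Table 1 are differences of conditional means within subgroups. A minimal numerical sketch, using simulated data rather than study estimates, in which a treatment effect exists only under growth-mindset teachers:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
treat = rng.integers(0, 2, n)           # T: randomized treatment indicator
growth_teacher = rng.integers(0, 2, n)  # S: growth- vs. fixed-mindset teacher
# Simulated grades: a 0.15-point treatment effect only under growth teachers.
grades = 2.5 + 0.15 * treat * growth_teacher + rng.normal(0, 0.5, n)

def cate(y, t, subgroup):
    """E[Y | T=1, subgroup] - E[Y | T=0, subgroup]."""
    return y[(t == 1) & subgroup].mean() - y[(t == 0) & subgroup].mean()

cate_growth = cate(grades, treat, growth_teacher == 1)
cate_fixed = cate(grades, treat, growth_teacher == 0)
```

Because treatment is randomized, these subgroup mean differences recover the subgroup effects; the adjustment in Table 1's second row additionally holds the confound vector at its population average, which simple subgroup means do not do.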

Primary statistical model: Linear mixed effects analysis. The primary analysis

examined the cross-level interaction using a typical multilevel, linear mixed effects model, with

a random treatment effect that varied across teachers and was predicted by teacher-level factors,


but with one twist: fixed teacher intercepts. Such a model has become the standard approach for

multi-site trial heterogeneity analyses (Bloom, Raudenbush, Weiss, & Porter, 2017) because the

fixed intercept for each group prevents biases from chance imbalances in the random assignment

to treatment within small groups. This “hybrid” (fixed intercept, random slope) approach can

make a big difference in the present analysis, since some teachers may have small numbers of

students and, due to random sampling error, be more likely to have chance imbalances.


This is

why the fixed intercept, random slope model was specified in the NSLM pre-analysis plan

(Yeager et al., 2019). As in all standard multilevel models, the random slope allows different

teachers’ students to have different treatment effects, but uses corrections to avoid overstating

the heterogeneity (called an empirical Bayesian shrinkage estimator). Specifically, the model we

estimate appears in Eq. 1,

y_ij = α_j + β′X_ij + [γ1 T_ij + γ2 (T_ij × S_j) + γ3′(T_ij × C_j) + u_j T_ij] + ε_ij    (1)

where y_ij is the math grade for student i in teacher j's classroom. At the student level, X_ij is a vector of k-2 student-level covariates (prior achievement, prior expectations for success, race/ethnicity, gender, and parents' education, all school-centered). At the teacher level, α_j is a fixed intercept for each teacher. The large section in brackets represents the multi-level moderation portion of the model, our main interest. The student-level treatment status, T_ij, is interacted with the continuous measure of teachers' mindset beliefs (S_j), with controls for potential confounds of teacher mindset beliefs (C_j, a vector that includes implicit bias, pedagogical content knowledge, fluid intelligence, and teacher master's certification). The teacher-level random error is u_j and the student-level error term is ε_ij.

The primary hypothesis test concerns the regression coefficient γ2, which is the cross-level student treatment × teacher mindset interaction. When γ2 is positive and significant, it


means that treatment effects are higher when teachers’ growth mindset scores are higher. The

case for a stronger interpretation of γ2 is bolstered if the coefficient's sign and significance persist even when accounting for the potential confounds indexed by C_j. The model in Eq. 1 allows S_j (teachers' mindsets, the primary moderator) to remain a continuous variable. We

estimated the CATEs in Table 1 by implementing a standard approach in psychology: calculating

the treatment simple effect at -1 SD (teachers reporting relatively more of a fixed mindset) and

+1 SD (teachers reporting relatively more of a growth mindset), while holding confounding

moderators constant. We used the margins post-estimation command in Stata SE to do so. We

call the former teachers “relatively” more fixed mindset because their position on the scale

suggests they are in an intermediate group, not clearly growth mindset, but, on the whole, not

extremely fixed.
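The simple-effects logic, evaluating the treatment effect at +1 SD and -1 SD of the moderator, can be illustrated with ordinary least squares on simulated data. This sketch deliberately omits the fixed teacher intercepts, random slope, covariates, and survey weights of the actual Eq. 1 model, and stands in for the Stata margins computation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
T = rng.integers(0, 2, n).astype(float)  # student treatment indicator
S = rng.normal(4.5, 1.0, n)              # teacher mindset score (1-6 scale)
S_c = S - S.mean()                       # center the moderator

# Simulated grades: the treatment effect grows with teacher mindset.
y = 2.5 + 0.05 * T + 0.07 * T * S_c + rng.normal(0, 0.5, n)

# OLS for y ~ 1 + T + S_c + T:S_c (the cross-level interaction term).
X = np.column_stack([np.ones(n), T, S_c, T * S_c])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Treatment simple effects at +/- 1 SD of the moderator.
sd = S_c.std(ddof=1)
effect_growth = beta[1] + beta[3] * sd   # teachers at +1 SD (more growth)
effect_fixed = beta[1] - beta[3] * sd    # teachers at -1 SD (more fixed)
```

With the moderator centered, beta[1] is the treatment effect at the average teacher mindset, and the two simple effects are linear combinations of the treatment and interaction coefficients, which is exactly what margins computes from the fitted model.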

Secondary statistical model: Bayesian analysis. The primary model had at least one

major limitation: it presumed that all student and teacher-level variables had linear effects and

did not interact. The pre-analysis plan for the NSLM therefore stated that we would follow-up

the primary analysis by using a multi-level application of a flexible but conservative approach

called Bayesian Causal Forest (BCF), which relaxes the assumptions of linearity and of no

higher-order interactions. BCF has been found, in multiple open competitions and simulation

studies, to detect true sources of complex treatment effect heterogeneity while not lending much

credence to noise (Hahn, Murray, & Carvalho, 2020). See Eq.2:

y_ij = α_j + μ(X_ij, S_j, C_j) + [τ(S_j, C_j) + u_j] T_ij + ε_ij    (2)

The BCF model in Eq. 2 retained the key features of the primary statistical model in Eq. 1:

teacher-specific intercepts, student-level covariates, random variation in the treatment effect

across teachers (unexplained by covariates), and potential confounds for teacher mindset beliefs


(collected in the vector C_j). The most notable change is that BCF replaces the additive linear functions from the primary model with the nonlinear functions μ and τ. These nonlinear

functions have “sum-of-trees” representations that can flexibly represent interactions and other

non-linearities (thus avoiding the researcher degree of freedom of specifying a functional form),

and that can allow the data to determine how and whether a given covariate contributes to the

model predictions (thus avoiding the researcher degree of freedom of covariate selection). The

nonlinear functions are estimated using machine-learning techniques and Bayesian Additive

Regression Trees (BART) prior distributions that shrink the functions toward simpler structures

(like additive or nearly additive functions) while allowing the data to speak. See the SOM-R for

more detail about the priors used for BCF.
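As a toy illustration of the sum-of-trees idea (not the BCF implementation, which uses Bayesian priors and posterior sampling rather than the greedy boosting used here), a sum of depth-1 regression stumps can recover a rise-then-plateau effect curve that a single linear term cannot:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 6, 800)
# Nonlinear "effect curve": rises with the moderator, then plateaus.
y = np.minimum(x, 4.0) + rng.normal(0, 0.3, 800)

def fit_stump(x, resid):
    """Greedy depth-1 tree: the single split that minimizes squared error."""
    best = None
    for cut in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = resid[x <= cut], resid[x > cut]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, cut, left.mean(), right.mean())
    return best[1:]

# Boosting: each new stump fits the residuals of the current ensemble,
# so the final prediction is literally a sum of small trees.
pred = np.zeros_like(y)
for _ in range(50):
    cut, left_mean, right_mean = fit_stump(x, y - pred)
    pred += 0.3 * np.where(x <= cut, left_mean, right_mean)

mse_trees = np.mean((y - pred) ** 2)
slope, intercept = np.polyfit(x, y, 1)
mse_linear = np.mean((y - (slope * x + intercept)) ** 2)
```

The stump ensemble tracks the plateau while the straight line averages over it; this rise-then-plateau shape is the kind of nonlinearity a sum-of-trees model can detect without the analyst pre-specifying a functional form.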

From the BCF output, there is no single regression coefficient to interpret, as there would

be in a typical linear regression model, because the output of the BCF model is a richer posterior

distribution of treatment effect estimates for each of the 9,167 teacher mindset/student grade

records in the sample. This means that we do not have to set the moderator to +1 or -1 SD.

Instead, we can summarize the subgroup treatment effects for each level of teacher mindsets,

while holding all of the potential confounds constant at their population means (see Figure 2 for

the plot). We note that conducting subgroup comparisons or hypothesis tests does not entail

changes to the model fit or prior specifications. The data were used exactly one time, to move

from the prior distribution over treatment effects to the posterior distribution. This facilitates

honest Bayesian inference concerning subgroup effects and subgroup differences, and eliminates

concerns with multiple hypothesis testing that can threaten the validity of a frequentist p-value

(Woody, Carvalho, & Murray, 2020).


The BCF analysis had another advantage: it could accommodate the fact that there were

researcher degrees of freedom about which aspect of math classrooms might moderate the

treatment effect—teacher mindsets, the other teacher variables, or qualities of the schools in

which teachers were embedded. BCF allowed all of these teacher and school factors to have the

same possibility of moderating the treatment effect, and gave them equal likelihood in the prior

distribution. In other words, BCF built uncertainty into the model output, which helped to guard

against spurious findings (see the SOM-R).

Results

Preliminary Analyses

Effectiveness of random assignment. The intervention and control groups did not differ

in terms of pre-random-assignment characteristics (see Table S5 and see Yeager et al. 2019).

Average effect on the manipulation check. The manipulation check was successful on

average. The growth mindset intervention led students to report lower fixed mindset beliefs

relative to the control group, (Control M = 2.91, SD = 1.17; Growth mindset M = 2.48, SD =

1.16), t = 16.82, p < .001, d = .37.
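As an arithmetic check, the reported effect size follows from the group statistics above. This uses an equal-weight pooled standard deviation, a simplification; an exact computation would weight by group sizes:

```python
import math

# Group statistics reported for the manipulation check (fixed-mindset scale).
m_control, sd_control = 2.91, 1.17
m_treat, sd_treat = 2.48, 1.16

# Cohen's d with an equal-weight pooled SD.
pooled_sd = math.sqrt((sd_control ** 2 + sd_treat ** 2) / 2)
d = (m_control - m_treat) / pooled_sd
```

The result matches the reported d = .37: a mean difference of 0.43 scale points against a pooled SD of about 1.17.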

Homogeneity of the manipulation check. The immediate treatment effect on student

mindsets (the beliefs students reported on the post-treatment manipulation check) was not

significantly moderated by teachers’ mindsets, B = .04 [95% CI: -.031, .102], t = 1.04, p = .297.

Further, there was very little cross-teacher variability in effects on the manipulation checks to

explain. According to the BCF model’s posterior distribution, the standard deviation of the

intervention effect across teachers was just 5% of the average intervention effect, which means

that the posterior prediction interval ranged from 90% to 110% of the average intervention

effect, a very narrow range. Here is what this means: treated students, regardless of their math


teacher mindsets, ended the intervention session with similarly strong growth mindsets that could

be tried out. If we later found heterogeneous effects on math grades, measured months into the

future, it could reflect differences in the affordances that allowed students to act on their

mindsets in class.

Preliminary analyses of effect on math grades. A previous paper (Yeager et al., 2019,

Extended Data Table 1) and an independent impact evaluation (Zhu et al., 2019) reported the

significant main effect of the growth mindset treatment on math grades for the sample overall (p

= .001). Next, the present study’s BCF model found that there was about as much heterogeneity

in treatment effects across teachers (47% of the variation) as there was across schools (49%, with

the remaining 4% of variation coming from covariation between the two). Combined, these

analyses mean that the present paper was justified in focusing on heterogeneity in the treatment

effect on math grades independently from the school factors reported by Yeager et al. (2019).

Primary Analyses: Moderation by Teachers’ Mindsets

Linear mixed effects model. Teachers’ mindsets positively interacted with the

intervention effect on math grades: Student intervention × Teacher mindset interaction B = .09

[95% CI: .026, .150], t = 2.79, p = .005 (see Eq. 1). This result was robust to changes to the

model, including consideration of the school-level moderators previously reported by Yeager et

al. (2019), and changes in the sub-sample of participating students (see Table 2).

Thus the data were consistent with the mindset + supportive context hypothesis: the

intervention could alter students’ mindsets, but a growth-affording context was necessary for

students’ grades to be improved. Students whose teachers did not clearly endorse growth mindset

beliefs showed a significant manipulation check effect immediately after the treatment, but their

math grades did not improve.


Effect sizes. The CATEs (conditional average treatment effects) for students with more

fixed versus more growth mindset teachers are presented in Table 2. The effect for students in

classrooms with growth mindset teachers was 0.11 grade points and was significant at p<.001,

and there was no significant effect in classrooms of teachers reporting more of a fixed mindset

(compare columns 2 and 3). Notably, our primary analyses did not exclude students whose

grades could not have been lifted any further. If we limit our sample to the three-fourths of

students who were not already making straight As across all of their core classes before the

study, and who therefore had room to improve their grades, the estimated effect among students

in classrooms with growth mindset teachers becomes slightly larger, .14 grade points (see row 5,

Table 2).

The present analysis included a representative sample and used “intent-to-treat” analyses.

This means that we included students who could not speak or read English, who had visual or

physical impairments, who had attentional problems, whose computers malfunctioned, and more.

Thus, there were many students in the data who could not possibly have shown treatment effects.

This study therefore estimates effects that could be anticipated under naturalistic circumstances.


Table 2. Effect of Growth Mindset Intervention on Math Grades in 9th Grade Among Students with Fixed Versus Growth Mindset Math Teachers, Estimated in Linear Mixed Effects Models.

Primary Model Specification

Teacher mindset as moderator + potential teacher confounds (N = 9,167):
  Teachers reporting more of a fixed mindset: CATE = -.02 [-.074, .038], t = -0.63, p = .531
  Teachers reporting more of a growth mindset: CATE = .11 [.046, .167], t = 3.46, p < .001
  Student intervention × Teacher mindset (continuous) interaction: B = .09 [.026, .150], t = 2.79, p = .005

Robustness Test: Accounting for School-Level Moderators from Yeager et al. (2019)

Plus school-level moderators (N = 9,167):
  Teachers reporting more of a fixed mindset: CATE = -.02 [-.075, .039], t = -0.61, p = .542
  Teachers reporting more of a growth mindset: CATE = .11 [.045, .168], t = 3.37, p < .001
  Student intervention × Teacher mindset (continuous) interaction: B = .09 [.025, .151], t = 2.76, p = .006

Robustness Tests: Alternative Sub-samples#

Only students with only one math teacher (N = 8,383):
  Teachers reporting more of a fixed mindset: CATE = -.04 [-.108, .026], t = -1.20, p = .230
  Teachers reporting more of a growth mindset: CATE = .11 [.040, .170], t = 3.18, p = .001
  Student intervention × Teacher mindset (continuous) interaction: B = .09 [.028, .159], t = 2.81, p = .005

Only previously-lower-achieving (i.e., below-median pre-intervention GPA) students† (N = 4,811):
  Teachers reporting more of a fixed mindset: CATE = .02 [-.050, .097], t = 0.63, p = .527
  Teachers reporting more of a growth mindset: CATE = .13 [.067, .196], t = 4.01, p < .001
  Student intervention × Teacher mindset (continuous) interaction: B = .09 [.008, .165], t = 2.17, p = .030

Only students previously without straight As (N = 6,958):
  Teachers reporting more of a fixed mindset: CATE = -.01 [-.062, .041], t = -0.39, p = .696
  Teachers reporting more of a growth mindset: CATE = .14 [.071, .203], t = 4.07, p < .001
  Student intervention × Teacher mindset (continuous) interaction: B = .11 [.040, .180], t = 3.10, p = .002

Note: CATE = Conditional average treatment effect, in GPA units (0 to 4.3 scale), estimated with the margins postestimation command in Stata SE, holding potentially-confounding moderators constant at their population means. All CATEs estimated using teacher survey weights provided by ICF International to make the estimates generalizable to the nation as a whole. Teachers with more of a growth mindset in this analysis are those reporting mindset at +1 SD for the continuous teacher mindset measure, while teachers with more of a fixed mindset are at -1 SD. Numbers in brackets represent 95% confidence intervals. Regression model specified in Eq. 1. B = unstandardized regression coefficient (i.e., expected treatment effects on GPA). † This was the pre-registered subgroup in Yeager et al. (2019). # Models included all teacher-level moderators.


Bayesian machine-learning analysis. The BCF analyses yielded conclusions consistent

with the primary linear mixed effects model. First, there was a positive zero-order correlation of

r(223) = .55 between teachers’ mindsets and the estimated magnitude of the classroom’s

treatment effect (i.e., the posterior mean for the CATE for each teacher), which mirrors the

moderation results of the primary linear model. Figure 2, which depicts the posterior distribution

for each level of teacher mindset, holding all other moderators constant at the population mean,

shows no overlap between the interquartile range (IQR) for teachers with more of a growth

mindset (5 or 5.5) and the IQR for teachers with more of a fixed mindset (4 or lower). This

supports the conclusion of a positive interaction, again consistent with the mindset + supportive

context hypothesis.

The model also shows that teachers who strongly endorse growth mindset beliefs show an average intervention effect greater than zero with approximately 100% posterior certainty (see Figure 2), confirming the results of the simple effects analysis from the linear model. We note again that the BCF model is relatively conservative: it utilizes a prior distribution centered at a homogeneous treatment effect of zero. That these results emerged despite such a conservative prior should be taken as strong evidence of moderation and strong evidence that the intervention was effective for students of growth mindset teachers.

The BCF analysis also yielded new evidence that extended the primary linear model’s results.

Figure 2 shows that teachers’ growth mindsets were related to higher treatment effect sizes in a

linear fashion for most of the distribution, but there was no marginal increase in treatment effects

when teachers endorsed a growth mindset to an even greater extent once they were already high

on the scale (see the rightmost groups of teachers in Figure 2). The non-linearity, discovered by

the BCF analysis, should invite further investigation into whether teachers already endorsing a

very high growth mindset are using practices that encourage all of their students (even those in


the control group) to engage in growth mindset behaviors, potentially narrowing the contrast

between treatment and control group students.

Figure 2. Evidence for the mindset + supportive context hypothesis regarding teacher mindsets

and a student mindset intervention—up to a point—in a flexible Bayesian Causal Forest

model. Note: Posterior distributions are of the conditional average treatment effect (CATE), as a

proportion of the average treatment effect (ATE). Thus, 100% means the CATE is equal to the

population ATE. Red dots represent the estimated intervention effect (posterior means) at each

level of teacher mindset. The widths of the bars, from wide to narrow, represent the middle 50%

(i.e., IQR), 80% and 90% of the posterior distribution, respectively. The teacher mindset measure

ranges from 1 to 6. The dashed vertical line represents the population mean for teacher mindsets.

However, the x-axis stops at 3 because only five teachers had a mindset score below this and the

model cannot make precise predictions with so few teachers.

Exploratory Analyses of Baseline Student Mindsets

The brief, direct-to-student growth mindset intervention did not appear to overcome local

contextual factors that can suppress achievement (e.g., a teacher with a fixed mindset). Could it

address individual risk factors suppressing achievement, such as the student’s own fixed

mindset? A slight suggestion of this possibility appeared in one of the original growth-mindset

intervention experiments (Blackwell, Trzesniewski, & Dweck, 2007); a student’s prior growth

mindset negatively interacted with the intervention effect, but the result was imprecise (p = .07).

To revisit this question, we added students’ baseline mindsets as a moderator in the present


study’s primary linear mixed effects model. We found a significant negative interaction with

student baseline growth mindsets, B = -.06 [95% CI: -.098, -.018], t = -2.85, p = .004, suggesting

stronger effects for students with more of a fixed mindset. Thus the (marginal) Blackwell et al.

(2007) moderation finding was borne out. This interaction was additive with, but not interactive

with, the teacher mindset interaction, which did not change in magnitude or significance by

including the student mindset interaction (two-way still p = .005; three-way interaction p > .20).

Exploring the CATEs, students reporting more fixed mindsets at baseline (-1 SD), in classrooms

with a teacher reporting more of a growth mindset (+1 SD), showed an intervention effect on

their math grades of 0.16 grade points [0.079, 0.234], t = 3.957, p < .001. By contrast, there

was no significant effect among students who already reported a strong growth mindset in

growth mindset classes, and, as noted, no effect overall in more fixed-mindset classes.

Discussion

In this nationally-representative, double-blind clinical trial, successfully teaching a

growth mindset to students lifted math grades overall, but this was not enough for all students to

reap the benefits of a growth mindset intervention. Supportive classroom contexts also mattered.

Students who were in classrooms with teachers who espoused more of a fixed mindset did not

show gains in their math grades over 9th grade compared to the control group, whereas students

in classrooms with more growth mindset teachers showed meaningful gains. This finding

suggests that students cannot simply carry their newly enhanced growth mindset to any

environment and implement it there. Rather, the classroom environment needs to support, or at

least permit, the mindset, by providing necessary affordances (see Walton & Yeager, 2020).

In addition, we discovered that students who formerly reported more of a fixed mindset

and who went back into a classroom with a teacher who had more of a growth mindset showed


larger gains in achievement than did students who began the study with more of a growth

mindset. This finding supports the Walton and Yeager (2020) hypothesis that individuals at the

intersection of vulnerability (prior fixed mindset) and opportunity (high affordances) are the

most likely to benefit from psychological interventions.

The national sampling, and the use of an independent firm to administer the intervention,

permits strong claims of generalizability to U.S. public high school math classrooms. Future

studies could use or adapt a similar methodology to assess generalizability to other age groups,

content areas, or cultural contexts. In general, materials may need to be adapted, sometimes

extensively (see Yeager et al., 2016), to be appropriate to new settings.

A main limitation in our study is that teachers’ mindsets were measured, not manipulated.

The fact that teacher mindsets were moderators above and beyond other teacher confounders

lends support to our hypotheses about the importance of classroom affordances. But more

research is needed to determine whether teachers’ mindset beliefs, or the practices that follow

from them, play a direct, causal role. Thus, the mindset × context approach opens the window to

a new, experimental program of research.

If a future experimental intervention targeted both students and teachers, what kinds of

moderation patterns might be expected? There, we actually might see the largest effects for

formerly fixed mindset teachers. That is, the benefits of planting a seed and fertilizing the soil

should be greatest where soil was formerly inhospitable, and smaller where the soil was already

adequate.

In general, we view the testing and understanding of the causal effect of teacher mindsets

as the next step for mindset science—followed, if successful, by the creation of programs to

promote more growth-mindset-sustaining classroom practices. Such research will be challenging


to carry out, however. For example, we do not think it will be enough to simply copy or adapt the

student intervention and provide it to teachers. A new intervention for teachers will need to be

carefully developed and tested. We do not yet know which teacher beliefs or practices (or

combinations thereof) may be most important in which learning environments. Even if we did,

there is much to be learned about how to best encourage and support key beliefs and practices in

teachers. The current findings, along with other recent findings about the importance of

instructors’ mindsets in promoting achievement for all groups and reducing inequalities between

groups (Canning et al., 2019; Leslie et al., 2015; Muenks et al., 2020), point to the urgency and

value of this research.


References

Bailey, D. H., Duncan, G. J., Cunha, F., Foorman, B. R., & Yeager, D. S. (in press). Persistence

and fadeout of educational intervention effects: Mechanisms and potential solutions.

Psychological Science in the Public Interest.

Blackwell, L. S., Trzesniewski, K. H., & Dweck, C. S. (2007). Implicit theories of intelligence

predict achievement across an adolescent transition: A longitudinal study and an

intervention. Child Development, 78(1), 246–263. doi: 10.1111/j.1467-8624.2007.00995.x

Bloom, H. S., Raudenbush, S. W., Weiss, M. J., & Porter, K. (2017). Using multisite

experiments to study cross-site variation in treatment effects: A hybrid approach with

fixed intercepts and a random treatment coefficient. Journal of Research on Educational

Effectiveness, 10(4), 817–842. doi: 10.1080/19345747.2016.1264518

Boulay, B., Goodson, B., Olsen, R., McCormick, R., Darrow, C., Frye, M., … Sarna, M. (2018).

The investing in innovation fund: Summary of 67 evaluations (No. NCEE 2018-4013).

Washington, DC: National Center for Education Evaluation and Regional Assistance,

Institute of Education Sciences, U.S. Department of Education.

Brady, S. T., Cohen, G. L., Jarvis, S. N., & Walton, G. M. (2020). A brief social-belonging

intervention in college improves adult outcomes for black Americans. Science Advances,

6(18), eaay3689. doi: 10.1126/sciadv.aay3689

Bronfenbrenner, U. (1977). Toward an experimental ecology of human development. American

Psychologist, 32(7), 513–531. doi: 10.1037/0003-066X.32.7.513

Bryan, C. J., Tipton, E., & Yeager, D. S. (in press). Behavioural science is unlikely to change the

world without a heterogeneity revolution. Nature Human Behaviour.


Canning, E. A., Muenks, K., Green, D. J., & Murphy, M. C. (2019). STEM faculty who believe

ability is fixed have larger racial achievement gaps and inspire less student motivation in

their classes. Science Advances, 5(2), eaau4734. doi: 10.1126/sciadv.aau4734

Carroll, J. M., Muller, C., Grodsky, E., & Warren, J. R. (2017). Tracking health inequalities from

high school to midlife. Social Forces, 96(2), 591–628. doi: 10.1093/sf/sox065

Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers I:

Evaluating bias in teacher value-added estimates. American Economic Review, 104(9),

2593–2632. doi: 10.1257/aer.104.9.2593

Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist,

12(11), 671.

Dweck, C. S., & Yeager, D. S. (2019). Mindsets: A view from two eras. Perspectives on

Psychological Science. doi: 10.1177/1745691618804166

Easton, J. Q., Johnson, E., & Sartain, L. (2017). The predictive power of ninth-grade GPA.

Chicago, IL: University of Chicago Consortium on School Research. Retrieved from

https://consortium.uchicago.edu/sites/default/files/publications/Predictive%20Power%20of%20Ninth-Grade-Sept%202017-Consortium.pdf

Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 460–465.

doi: 10.1511/2014.111.460

Gibson, J. J. (1977). The theory of affordances. In R. Shaw & J. Bransford (Eds.), Perceiving,

Acting, and Knowing (pp. 67–82). Hillsdale, NJ: Lawrence Erlbaum.


Gopalan, M., & Tipton, E. (2018). Is the National Study of Learning Mindsets nationally-

representative? Retrieved from https://psyarxiv.com/dvmr7/

Hahn, P. R., Murray, J. S., & Carvalho, C. M. (2020). Bayesian regression tree models for causal

inference: Regularization, confounding, and heterogeneous effects. Bayesian Analysis.

doi: 10.1214/19-BA1195

Haimovitz, K., & Dweck, C. S. (2017). The origins of children’s growth and fixed mindsets:

New research and a new proposal. Child Development, 88(6), 1849–1859. doi:

10.1111/cdev.12955

Harackiewicz, J. M., & Priniski, S. J. (2018). Improving student outcomes in higher education:

The science of targeted intervention. Annual Review of Psychology, 69, 409–435. doi:

10.1146/annurev-psych-122216-011725

Hembree, R. (1990). The nature, effects, and relief of mathematics anxiety. Journal for Research

in Mathematics Education, 33–46. doi: 10.2307/749455

Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008). Empirical benchmarks for

interpreting effect sizes in research. Child Development Perspectives, 2(3), 172–177. doi:

10.1111/j.1750-8606.2008.00061.x

Kraft, M. A. (2020). Interpreting Effect Sizes of Education Interventions. Educational

Researcher, 49(4), 241–253. doi: 10.3102/0013189X20912798

Lazarus, R. S. (1993). From psychological stress to the emotions: A history of changing

outlooks. Annual Review of Psychology, 44(1), 1–22. doi:

10.1146/annurev.ps.44.020193.000245


Leslie, S.-J., Cimpian, A., Meyer, M., & Freeland, E. (2015). Expectations of brilliance underlie

gender distributions across academic disciplines. Science, 347(6219), 262–265. doi:

10.1126/science.1261375

Lewin, K. (1952). Field theory in social science: Selected theoretical papers (D. Cartwright,

Ed.). London, England: Tavistock. Retrieved from

http://trove.nla.gov.au/version/21157377

McShane, B. B., Tackett, J. L., Böckenholt, U., & Gelman, A. (2019). Large-scale replication

projects in contemporary psychological research. The American Statistician, 73(sup1),

99–105. doi: 10.1080/00031305.2018.1505655

Miller, D. I. (2019). When do growth mindset interventions work? Trends in Cognitive Sciences, 23(11), 910–912. doi: 10.1016/j.tics.2019.08.005

Muenks, K., Canning, E. A., LaCosse, J., Green, D. J., Zirkel, S., Garcia, J. A., & Murphy, M. C.

(2020). Does my professor think my ability can change? Students’ perceptions of their

STEM professors’ mindset beliefs predict their psychological vulnerability, engagement,

and performance in class. Journal of Experimental Psychology: General. doi:

10.1037/xge0000763

OECD. (2021). Sky’s the limit: Growth mindset, students, and schools in PISA. Paris: PISA,

OECD Publishing. Retrieved from PISA, OECD Publishing website:

https://www.oecd.org/pisa/growth-mindset.pdf

Pattison, E., Grodsky, E., & Muller, C. (2013). Is the sky falling? Grade inflation and the

signaling power of grades. Educational Researcher, 42(5), 259–265. doi:

10.3102/0013189X13481382


Rattan, A., Good, C., & Dweck, C. S. (2012). “It’s ok — not everyone can be good at math”:

Instructors with an entity theory comfort (and demotivate) students. Journal of

Experimental Social Psychology, 48(3), 731–737. doi: 10.1016/j.jesp.2011.12.012

Tipton, E., Yeager, D. S., Iachan, R., & Schneider, B. (2019). Designing probability samples to

study treatment effect heterogeneity. In P. J. Lavrakas (Ed.), Experimental Methods in

Survey Research: Techniques That Combine Random Sampling with Random Assignment

(pp. 435–456). New York, NY: Wiley. Retrieved from

https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119083771.ch22

Walton, G. M., & Wilson, T. D. (2018). Wise interventions: Psychological remedies for social

and personal problems. Psychological Review, 125(5), 617–655. doi:

10.1037/rev0000115

Walton, G. M., & Yeager, D. S. (2020). Seed and soil: Psychological affordances in contexts

help to explain where wise interventions succeed or fail. Current Directions in

Psychological Science, 29(3), 219–226. doi: 10.1177/0963721420904453

Woody, S., Carvalho, C. M., & Murray, J. S. (2020). Model interpretation through lower-

dimensional posterior summarization. ArXiv:1905.07103 [Stat]. Retrieved from

http://arxiv.org/abs/1905.07103

Yamamoto, T., & Yeager, D. S. (2019). Causal mediation and effect modification: A unified

framework. Working Paper, MIT.

Yeager, D. S. (2019). The National Study of Learning Mindsets, [United States], 2015-2016.

Inter-university Consortium for Political and Social Research [distributor]. doi:

10.3886/ICPSR37353.v1


Yeager, D. S., Hanselman, P., Walton, G. M., Murray, J. S., Crosnoe, R., Muller, C., … Dweck,

C. S. (2019). A national experiment reveals where a growth mindset improves

achievement. Nature, 573(7774), 364–369. doi: 10.1038/s41586-019-1466-y

Yeager, D. S., Romero, C., Paunesku, D., Hulleman, C. S., Schneider, B., Hinojosa, C., …

Dweck, C. S. (2016). Using design thinking to improve psychological interventions: The

case of the growth mindset during the transition to high school. Journal of Educational

Psychology, 108(3), 374–391. doi: 10.1037/edu0000098

Zhu, P., Garcia, I., & Alonzo, E. (2019). An independent evaluation of growth mindset

intervention. New York, NY: MDRC. Retrieved from MDRC website:

https://files.eric.ed.gov/fulltext/ED594493.pdf

i. The mindset + supportive context, or “affordances,” hypothesis is akin to what Bailey and colleagues (2020) call the “sustaining environments” hypothesis, which is the idea that intervention effects will fade out when people enter post-intervention environments that lack adequate resources for an intervention to continue paying dividends.

ii. Lazarus (1993) summarized well the field’s pejorative view of treatment effect heterogeneity: “psychology has long been ambivalent … opting for the view that its scientific task is to note invariances and develop general laws. Variations around such laws are apt to be considered errors of measurement” (pg. 3).

iii. An exploratory analysis allowed the intercept and slope to vary randomly. It showed the same sign and significance of results and supported the same conclusions as the pre-registered fixed intercept, random slope model.


Teacher Mindsets Help Explain Where a

Growth Mindset Intervention Does and Doesn’t Work:

Supplemental Online Materials – Reviewed

This online supplement contains the following information:

• An explanation of how the present paper aligns with the NSLM pre-registration
• Details of the Bayesian Causal Forest analysis
• Representativeness of the analytic sample (Tables S1 to S3)
• Screenshots of the treatment and control conditions (reproduced from Yeager et al., 2019)
• Descriptive statistics for student-level, teacher-level, and school-level variables (Table S4)
• Student-level descriptive statistics by treatment condition (Table S5)
• Description of exclusions to the sample due to merging student and teacher data (Table S6)
• Student-level descriptive statistics by sample selection (Table S7)
• Student-level measures in the National Study of Learning Mindsets
• Teacher-level measures in the National Study of Learning Mindsets
• Cross-level interaction effect coefficients of the measured teacher-level confounders (Table S8)
• Correlations between teacher growth mindset beliefs and teacher practices from an independent concurrent validity pilot sample (Table S9)
• Correlations between teacher growth mindset beliefs and other teacher- and school-level moderators (Table S10)
• Post-hoc analysis of student perceptions of the classroom climate that could confound the teacher mindset analysis (Table S11)


Alignment with the Pre-Analysis Plan

Here we explain how our analyses followed the analysis plan (https://osf.io/afmb6/) and how we made decisions about analysis steps that were not described in the pre-analysis plan.1 We summarize the alignment with the pre-registration in the table below.

The pre-registered analysis plan’s primary question (RQ4) focused on school-level moderators. At the end of the plan we stated that “we plan to study whether variance in the treatment impact across math classes varies due to characteristics of teachers and classrooms” (pg. 15). In the present paper, we did exactly that: we performed analyses following the methods of the school-level pre-registered plan, replacing school-level factors with teacher-level factors.

No changes were made to the underlying dataset except for the exclusions listed in the paper

(only including students whose data could be matched to teachers’ mindset reports), which

means that this analysis follows and builds on the extensive pre-registered steps for processing

the grades dataset. We further note that all decisions about data processing were made by third

parties (ICF International, which obtained the grades data, and MDRC, which processed them

and identified math courses). Therefore in the present document we do not review in detail the

many data processing steps that were pre-registered and followed.

| Pre-registration component | Pre-registration | Present paper |
|---|---|---|
| Public record of the study and all measures on OSF/ICPSR | Yes | Pre-registration applies |
| Disclose all manipulations | Yes | Pre-registration applies |
| Data processing rules | Yes | Pre-registration applies |
| Sample size and stopping rule | Yes | Pre-registration applies |
| Covariates | Yes | Pre-registration applies |
| Primary statistical model (fixed intercept, random slope) | Yes | Pre-registration applies |
| Secondary statistical model (Bayesian Causal Forest) | Yes | Pre-registration applies |
| Student subgroup operationalization (i.e., low-achieving students) | Yes | Pre-registration applies to the definition of the low-achiever group, which was tested in a supplemental analysis. A non-pre-registered group (all students) is primary here. |
| Outcome variable | Overall GPA | The pre-registration said that we would focus on math classes next. This refers to the use of math GPA, as in the current study. |
| School-level moderators | Yes | Pre-registration applies |
| Teacher-level moderators | No | The overall focus on teachers was mentioned in the pre-registration, but the specific variables were not. |

1 The pre-analysis plan stated that we would pre-register a new plan for the between-teacher analyses. However, we were already unblinded to the data, so this would not have been useful. Therefore, in this document we have been transparent about how we made each choice, and we report robustness analyses for aspects of the analyses that allowed for degrees of freedom. Furthermore, we note that the Bayesian machine-learning analysis (which was mentioned in the analysis plan) builds in penalties for uncertainty about model specifications. This helps our results to be even more conservative and robust to researcher degrees of freedom.

Research Question: Do teacher-level factors explain the variability in the size of the CATE of the growth mindset intervention on math GPA in U.S. public high schools? (compare to RQ4 in the plan)

- We hypothesized that the intervention effect would vary with respect to what the analysis

plan called “mindset saturation level”, which we define here as math teachers’ growth

mindsets. In a previous paper (Yeager et al. 2019), mindset saturation was examined on

the school level and was defined by peers’ challenge-seeking behavioral norms. We

showed that the intervention effect varied by peer challenge-seeking norms at the school

level. The current paper’s focus on teachers’ mindsets is uniquely predictive, above and

beyond the school-level peer norms, as shown in Table 2.

- We had two competing directional hypotheses about contexts interacting with student growth mindset interventions. Here is what we stated in the analysis plan:2

i. “Larger effects on GPA in higher mindset saturation [contexts]. The reason why is that the environment reinforces the message over time. Giving the intervention in a high mindset saturation [context] is like ‘planting a seed in tilled soil.’” We now call it “fertile” soil. We have labeled this hypothesis the “mindset + context” hypothesis in the present paper. The label is different, but it is the same hypothesis that we pre-registered.

ii. “Larger effects on GPA in lower mindset saturation [contexts]. The reason why is that in high mindset saturation [contexts] students are already receiving growth mindset from their teachers and peers (because the control group is getting ‘treated’) – the intervention is a ‘drop in the bucket.’ Meanwhile, in lower mindset saturation [contexts], students are most in need of a growth mindset – the intervention is like ‘water on parched soil.’” In the present paper we call this the “mindset only” hypothesis, because implied in this second alternative is the hypothesis that students can overcome a fixed mindset classroom. Although the label is different, it is the same hypothesis that we pre-registered.

2 In the pre-analysis plan, this predictions section said “schools.” We have replaced it with [contexts] to facilitate the application to teachers, which is the level to which we extended these predictions in the present paper.


- Here we find support for the “seed in fertile soil” hypothesis (called the mindset +

context hypothesis in the paper), just like the previous between-school analysis (Yeager

et al., 2019). In this case, we find that there are larger effects of the treatment on math

GPA in classrooms with a teacher who had a higher growth mindset. The Bayesian

analysis finds hints of support for the “drop in the bucket” hypothesis at the very high

end, and we discuss this in the paper in two places, following our pre-registered analysis

plan.

- The analysis plan stated that we would test a “hybrid” mixed effects model (teacher fixed

intercepts and random slope), using survey weights, and that is what we did.

- The analysis plan said that we would turn to math GPA next, and this is what we did.

Math GPA is the outcome because math teachers participated in the survey, thus only

math grades were relevant to math teachers’ mindsets.

- Our main results do not focus on previously low-performing students because math

grades overall were lower than other subjects, giving more students opportunities for

growth. As a supplementary analysis, we present the results among previously low-

performing students and students without a pre-treatment GPA of 4.00 and above in

Table 2. Table 2 shows that the primary results we report are conservative. Excluding

higher-achieving students yielded directionally-larger effect sizes.

- The analysis plan stated that “with consultation from statisticians, we will evaluate

potential non-parametric models to examine the independent and interactive impact of

school-level moderators on between-school variability in the treatment (e.g. likely a

variation on Bayesian Additive Regression Trees).” We have done this, focusing on

teacher-level moderators, by including the Bayesian Causal Forest (BCF) analysis, which

builds on the BART methodology, as described by Hahn et al. (2020).

- The analysis plan stated that we would include three school-level factors in the models:

mindset norms, school achievement level, and school minority percentage. These school-

level variables were of interest in Yeager et al. (2019). Here, when we included these

school-level factors in our primary model, the results were unchanged (see Table 2 in the

paper).

- The analysis plan stated that we would test for a significant reduction in variability in the treatment effect once accounting for the context-level factor. This could not be estimated in the lmer models, so instead we used the BCF analyses to understand how much of the variability in treatment effects was explained by school versus teacher factors. The answer was that they were equally important, each explaining about half of the variability, as reported in the paper. Thus our paper includes variability statistics that are more advanced and reliable than what we pre-registered.

- The analysis plan stated that we would conduct continuous and categorical analyses of the school-level moderators. Here, when focusing on teacher mindsets, we only conducted continuous analyses because they are more conservative than cutting teacher mindsets into sub-groups. Further, the BCF analysis makes this point moot because the BART algorithm can cut the results into subgroups using machine learning rather than researcher-selected cutpoints, which can be arbitrary. Rather than choose cutpoints, in Figure 2 in the paper we simply present data for all of the points of the scale.

- The analysis plan listed several exploratory analyses that were conducted for the NSLM

(see the online supplement for Yeager et al., 2019). We did not re-conduct them here

because they primarily focused on school-level factors that are not of interest here. These

exploratory analyses would also have inflated Type-I error rates due to multiple

hypothesis testing. However the analytic dataset for the present paper has been posted on

ICPSR so future analysts may conduct them.

- The analysis plan stated that we plan to “conduct correlational analyses of the math

classroom data.” We still plan to do this, but these correlational analyses will be the

subject of future research papers.
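To make the pre-registered primary model concrete, here is a minimal sketch of a teacher fixed-intercept, random treatment-slope specification. This is a hypothetical illustration on simulated data, without the survey weights used in the actual analysis; all variable names are made up for the example:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_teachers, per_class = 30, 20
teacher = np.repeat(np.arange(n_teachers), per_class)
mindset = rng.normal(size=n_teachers)[teacher]      # teacher-level moderator
z = rng.integers(0, 2, size=teacher.size)           # randomized treatment
slope_re = rng.normal(scale=0.1, size=n_teachers)[teacher]
gpa = (0.10 * z + 0.15 * z * mindset + slope_re * z
       + rng.normal(scale=0.3, size=n_teachers)[teacher]   # teacher intercepts
       + rng.normal(size=teacher.size))                    # student-level noise

df = pd.DataFrame({"gpa": gpa, "z": z, "mindset": mindset, "teacher": teacher})

# Teacher fixed intercepts (C(teacher)) absorb the mindset main effect, which
# is constant within teacher; the cross-level z-by-mindset interaction remains
# identified, and the treatment slope also varies randomly across teachers.
model = smf.mixedlm("gpa ~ z + z:mindset + C(teacher)", df,
                    groups="teacher", re_formula="0 + z")
result = model.fit()
cross_level = result.params["z:mindset"]  # cross-level interaction estimate
```

The cross-level interaction coefficient is the quantity of interest: how much the treatment effect shifts per unit of the teacher-level moderator.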


Details of the Bayesian Causal Forest (BCF) Analysis

As noted in the paper, BCF has been found, in multiple open competitions and simulation

studies, to detect true sources of treatment effect heterogeneity while not lending much credence

to noise (Hahn et al., 2020; McConnell & Lindner, 2019; Wendling et al., 2018). BCF builds on

and has in several cases surpassed the popular Bayesian Additive Regression Trees (BART,

Chipman, George, & McCulloch, 2010) approach. Both Bayesian regression tree models and

BCF in particular are consistently top performers in empirical evaluations of methods for causal

inference (Dorie et al., 2019; Hahn et al., 2019; McConnell & Lindner, 2019; Wendling et al.,

2018). Here we provide more details about how the BCF model was estimated.

In the BCF analysis, the math GPA for student i in the classroom of teacher j is denoted by y_ij, and is modeled by

y_ij = μ(x_ij, w_j, v_j) + τ(w_j, v_j) z_ij + α_j + γ_j z_ij + ε_ij.

As before, x_ij is the vector of student-level covariates (prior achievement, prior expectations for success, race/ethnicity, gender, parents’ education). There are treatment effect moderators for the school and teacher, denoted by v_j and w_j respectively, which interact with the student treatment indicator z_ij. For the teacher confounds w_j, we use teacher mindsets, a measure of racial prejudice, and a measure of pedagogical ability. For the school confounds v_j, we use school achievement level, peer challenge-seeking norms, and percentage of students who are a minority, as in Yeager et al. (2019). We also allow for teacher-level intercept random effects, α_j, and random effects on the treatment, γ_j. The student-level error term is ε_ij and is assumed to be normally distributed with variance σ².

Here, μ and τ are nonparametric functions which allow for nonlinearities and interactions between covariates in affecting the expected outcome. Furthermore, the model allows the level of the intervention effect to vary across teachers and schools, via the interaction of τ and z_ij. This model is meant to mimic the linear analysis in terms of the specification of the control and treatment modifier variables, but to relax the strict assumption of linearity and additivity between the covariates and the expected value of the outcome.

Using a nonparametric Bayesian approach in this manner has several advantages. First, it allows

the data to speak and better inform us about the relationships between the covariates and the

outcome. It allows us to uncover (possibly unanticipated) sources of heterogeneity in the

intervention effect while requiring few prior assumptions ahead of time. The prior we use results

in posterior estimates that are inherently conservative, making it unlikely that we will

dramatically over- or under-estimate the effect of the intervention.
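BCF itself is fit with specialized software (see Hahn et al., 2020). As a rough, hypothetical illustration of the underlying idea — flexibly estimating treatment effects that vary with a moderator — here is a simplified sketch using two regression forests (a “T-learner,” which is not BCF and has none of BCF’s regularization toward homogeneous effects); all data are simulated:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=(n, 3))            # student-level covariates
w = rng.normal(size=n)                 # teacher-level moderator (e.g., mindset)
z = rng.integers(0, 2, size=n)         # randomized treatment indicator
tau = 0.10 + 0.15 * w                  # true heterogeneous treatment effect
y = 0.5 * x[:, 0] + tau * z + rng.normal(size=n)

# Fit separate outcome models under treatment and control, then difference
# the predictions to get a per-student estimated conditional effect.
X = np.column_stack([x, w])
f_treat = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[z == 1], y[z == 1])
f_ctrl = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[z == 0], y[z == 0])
tau_hat = f_treat.predict(X) - f_ctrl.predict(X)

# Estimated effects should rise with the moderator, mirroring the true tau.
hi = tau_hat[w > 0.5].mean()
lo = tau_hat[w < -0.5].mean()
```

The contrast between `hi` and `lo` plays the role of the moderation pattern in Figure 2; BCF additionally shrinks τ toward homogeneity and yields full posterior uncertainty.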

Prior specification

To complete our Bayesian model, we must specify prior distributions for the unknown values in the equation above. These include the nonparametric functions μ and τ, the random teacher effects α_j and γ_j, and the error variance σ².

The prior for the functions μ and τ is taken from the Bayesian causal forest model (BCF; Hahn et al., 2020). Under this model, both functions have a sum-of-trees representation, as first defined for Bayesian methods in Chipman, George, and McCulloch (2010). Each tree consists of a set of internal decision nodes which partition the covariate space, and a set of terminal nodes, or leaves, corresponding to each element of the partition. The prior for each of μ and τ is comprised of three parts: the number of trees, two parameters controlling the depth of each tree, and a prior on the leaf parameters. Use of this sum-of-trees term allows for detection of nonlinearity and interactions between covariates.

The key feature of the BCF model is that the prior for τ, which explains heterogeneity in the intervention effect, is regularized more heavily than the control function μ in order to shrink toward homogeneous effects (i.e., toward the case that the intervention effect is constant across all values of the moderators). The prior for τ uses fewer trees, with each tree being regularized to be shallower (that is, to contain fewer partitions). Details on prior specification are given in Hahn, Murray, and Carvalho (2020) and Chipman, George, and McCulloch (2010).

The random effects α_j and γ_j are given a Gaussian prior, with the standard deviation having a half-t prior with 3 degrees of freedom, as recommended by Gelman (2006). Finally, the error variance σ² is given an inverse chi-squared prior with 3 degrees of freedom and a scale parameter informed by the data.

Posterior Inference

After conditioning on observed data, we update the prior distribution to obtain a posterior distribution. To calculate the posterior distribution for quantities of interest, we implement a Markov chain Monte Carlo (MCMC) sampling scheme. Since the primary component of the model is the sum-of-trees functions μ and τ, the MCMC scheme relies on a Bayesian backfitting algorithm (Hastie & Tibshirani, 2000). The rest of the parameters in the model are conditionally conjugate, making their posterior sampling relatively efficient.

The model as specified has a causal interpretation under the Rubin causal model (Imbens & Rubin, 2015) if several identifying assumptions are met. First, there must be no unmeasured confounders, i.e., no unmeasured covariates which affect both the treatment and the outcome under either control or treatment. Second, each student must have a positive probability of assignment to treatment. Both assumptions hold by virtue of the randomized study design.

Having met these assumptions, we may estimate relevant causal estimands. For instance, we can estimate the individual treatment effect for any particular student: the difference between their observed GPA (whether they received the treatment or the control) and what their GPA would have been had their treatment assignment been the opposite. In mathematical terms, for one student i in classroom j we can estimate the quantity

y_ij(z = 1) − y_ij(z = 0),

even though we only actually observe one of these potential outcomes, in the terminology of the Rubin causal model. We define the intervention effect for one teacher to be the average of the estimated intervention effects across the students in their classroom. Moreover, we can estimate the conditional average treatment effect (CATE) function, which describes how the intervention effect fluctuates as a function of the moderators:

CATE(w, v) = E[y_ij(z = 1) − y_ij(z = 0) | w_j = w, v_j = v].

That is, the CATE function is identical to τ, and therefore the prior and posterior for the CATE function are identical to those of τ.

Furthermore, we can obtain valid posterior distributions for functions of τ. In particular, we can analyze the shift in the treatment effect when changing teacher growth mindset while fixing the other moderators at their population means. This function can be written mathematically as

f(mindset) = τ(mindset, w̄_−m, v̄),

where w̄_−m is the population average of the teacher confounds other than mindset, and v̄ is the population average of the school moderators. This is the function produced in Figure 2 in the main text.

The BCF estimate of τ was approximately additive. Therefore, to give an interpretable estimate of this conditional intervention effect, we created an additive summary of the fitted function τ̂ using splines, and looked at the partial effect of changing teacher mindset while fixing the other confounds as specified above. This additive summary captures approximately 98% of the predictive variance in the posterior of τ̂, so it is a faithful recapitulation of the fitted CATE function.
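As a toy illustration of this summarization step (with hypothetical numbers; the real analysis summarizes the full BCF posterior per Woody et al., 2020, and uses splines rather than a polynomial), one can fit a smooth one-dimensional summary to teacher-level CATE estimates and check how much predictive variance it captures:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical posterior-mean CATEs for 223 teachers vs. mindset scores (1-6 scale):
# effects rise with mindset up to 5, then flatten, plus a little noise.
mindset = rng.uniform(1, 6, size=223)
tau_hat = 0.05 + 0.04 * np.minimum(mindset, 5) + rng.normal(scale=0.01, size=223)

# Low-order polynomial as a simple stand-in for the spline summary.
coefs = np.polyfit(mindset, tau_hat, deg=3)
fitted = np.polyval(coefs, mindset)

# Share of the variance in the CATEs captured by the smooth summary.
r2 = 1 - np.var(tau_hat - fitted) / np.var(tau_hat)
```

A summary capturing a high share of the predictive variance (here, `r2` close to 1; approximately 98% in the actual analysis) indicates the smooth curve is a faithful recapitulation of the fitted CATE function.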

Plotting the Results of the BCF Analyses

The additive summaries (i.e. generalized additive models) plotting the BCF results are presented

below. Each panel plots the partial effect of a moderator on the posterior predicted treatment

effect, which means that it is the effect of each moderator controlling for the alternative

moderators. Each dot represents a teacher’s treatment effect. Panel A is the focal moderator

(teacher mindset) and it shows an increasing treatment effect up to 5, and no increase after 5.


Panels B, C and D show no meaningful differences in the treatment effect across the spectrum of

the alternative moderators. Note that the y-axis in the top row of panels is not the average

treatment effect, but rather the “offset” – that is, by how much would you expect the average

treatment effect to go up or down depending on the level of the moderators. So negative numbers

do not mean the treatment was harmful in fixed mindset classrooms; it means that the treatment

effect was just smaller (i.e. subtracting from the average) in those classrooms.

Panel E in the figure above is a heat map of the posterior probability of a difference in treatment

effects at each percentile level of the moderator. It shows that teachers from approximately the

60th to the 100th percentile on teacher mindset (i.e. growth mindset teachers) show a rather high

likelihood of a difference in treatment effects relative to teachers at 40th percentile or lower (i.e.

fixed mindset teachers). Meanwhile, Panels F, G, and H show that there are no strong differences

between any point for the alternative moderators (i.e. the heat maps are very pale and close to

white, or a 50/50 probability of a directional difference).

Overall, that is rather strong evidence for moderation by teacher mindset and not by alternative,

potential confounds with teacher mindset. BCF found meaningful moderation for the focal

moderator (teacher mindset) but not for alternative, confounding moderators (teacher implicit

bias, pedagogical content knowledge, and fluid intelligence).


Representativeness of the Analytic Sample

School-Level:

Because not all schools in the National Study of Learning Mindsets (NSLM) provided student

grades attached to teachers’ names, we evaluated whether the schools in our analytic sample are

representative of our sampling frame (all regular U.S. public high schools with at least 25

students in 9th grade and in which 9th is the lowest grade). We performed this analysis in two

steps. First, we compared the schools in the analytic sample to schools in the sampling frame

using publicly available data such as the Common Core of Data (CCD), the Office of Civil

Rights (OCR), and a district-level tabulation of American Community Survey (ACS) data (See

Gopalan & Tipton, 2018 for more details). Table S1 below show that there are few differences

between characteristics of schools in the analytic sample as compared to the sampling frame.

Table S1: Comparisons of Benchmarks Across the Analytic Sample Schools and Schools in the National Sampling Frame

| Benchmark | Sampling Frame Mean | Analytic Sample Mean | SMD | p |
|---|---|---|---|---|
| Proportion of 9th Grade Male Students | 0.52 | 0.52 | 0.00 | 0.989 |
| Proportion of 9th Grade Black Students | 0.14 | 0.15 | -0.01 | 0.712 |
| Proportion of 9th Grade Hispanic Students | 0.21 | 0.17 | 0.04 | 0.115 |
| Proportion of 9th Grade White Students | 0.57 | 0.60 | -0.03 | 0.421 |
| Proportion of 9th Grade Other Race Students | 0.05 | 0.06 | -0.01 | 0.408 |
| Total 9th Grade Enrollment | 285.00 | 301.00 | -0.06 | 0.564 |
| Proportion of High School Students Enrolled in Algebra 1 | 0.22 | 0.23 | -0.01 | 0.512 |
| Proportion of High School Students Enrolled in Algebra 2 | 0.20 | 0.19 | 0.01 | 0.137 |
| Proportion of High School Students Enrolled in at Least One AP Course | 0.19 | 0.18 | 0.01 | 0.726 |
| Proportion of High School Students Who Took at Least One AP Exam | 0.70 | 0.70 | 0.00 | 0.925 |
| Proportion of Students Who Are Chronically Absent | 0.21 | 0.20 | 0.01 | 0.799 |

Note: SMD is the standardized mean difference. For proportions we report the absolute differences. A small number of schools do not have information available from the CCD and/or CRDC; those schools are excluded from the mean calculations for missing benchmarks, as appropriate. P-values are shown from one-sample t-tests comparing mean differences.
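For reference, the standardized mean difference reported in these tables follows the usual convention (difference in means over a pooled standard deviation); a minimal sketch with a hypothetical helper function, assuming a pooled-SD definition, is:

```python
import numpy as np

def smd(a, b):
    """Standardized mean difference: (mean(a) - mean(b)) / pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

def proportion_diff(p_frame, p_sample):
    """For proportions, the tables report the raw (absolute) difference instead."""
    return p_frame - p_sample
```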


Second, we calculate the generalizability index (Tipton, 2014), a summary measure that provides

the degree of distributional similarity between the schools in the analytic sample and the

inference population, conditional on a set of covariates. The index is calculated using propensity

scores from a sampling propensity score model, which predicts membership in the analytic

sample, given a set of observed school-level characteristics, using logistic regression. The

generalizability index takes on values between 0 and 1, where a value of 0 means that the

analytic sample and inference population are completely different and a value of 1 means that the

analytic sample is an exact miniature of the inference population on the selected covariates (see

Tipton 2014 for more details). Table S2 below shows that the generalizability index is .98,

suggesting that the analytic sample is as good as a random sample from the population of

interest, conditional on the covariates included in the propensity score model. In all, we find that

site-level non-response does not compromise the generalizability of the results from the analytic

sample.
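The computation behind the generalizability index can be sketched as follows. This is a hypothetical simplification of Tipton (2014): fit a logistic regression predicting sample membership, then measure the overlap of the sample and population propensity-score distributions with a Bhattacharyya coefficient over shared histogram bins (the actual index uses kernel density estimates). All data below are simulated:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Hypothetical school covariates; the "sample" is a random subset of the frame,
# so the index should come out near 1.
frame = rng.normal(size=(9522, 4))
in_sample = np.zeros(9522, dtype=int)
in_sample[rng.choice(9522, size=52, replace=False)] = 1

# Sampling propensity score model: P(selected into analytic sample | covariates).
ps = LogisticRegression(max_iter=1000).fit(frame, in_sample).predict_proba(frame)[:, 1]

# Bhattacharyya coefficient between the two propensity-score densities:
# 0 = completely different distributions, 1 = identical distributions.
bins = np.histogram_bin_edges(ps, bins=10)
p, _ = np.histogram(ps[in_sample == 1], bins=bins, density=True)
q, _ = np.histogram(ps[in_sample == 0], bins=bins, density=True)
index = np.sum(np.sqrt(p * q) * np.diff(bins))
```

Because the toy sample is drawn at random, the two propensity distributions overlap heavily and the index is close to 1, mirroring the .988 reported in Table S2.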

Table S2. Generalizability Index for the Present Sample

| Inference Population | Generalizability Index | Inference Population N | Analytic Sample N |
|---|---|---|---|
| All Public High Schools (9th grade lowest level) | .988 | 9,522 | 52 |

Note: The analytic sample includes 58 schools, but schools must be omitted from calculations due to missing values on one or more benchmarks. N refers to the number of schools with non-missing benchmarks used in the sampling propensity score model: racial composition (%African American, %Hispanic, %White), socioeconomic composition (%Free/Reduced-Price Lunch Recipients), gender composition (%Male), the number of students in the school, the proportion of students in the district that are English Language Learners, and the proportion of Special Education students in the district.


Teacher-Level:

The above analysis shows that the schools in the analytic sample are representative of the

inference population, but we also want to ensure that the teachers in the analytic sample are

representative of 9th grade math teachers within these schools. For this analysis, we used student

characteristics from the survey and administrative records to examine whether the characteristics

of students within the classrooms we examine are representative of the characteristics of students

in all of the classrooms in the NSLM. We aggregate student characteristics at the teacher-level

for three groups of teachers: those included in the analytic sample, those asked to participate in

the survey, and those who were matched to any students in our sample. We compare these

groups to ensure that nonresponse on the math teacher survey is not limiting the generalizability

of our sample. As Table S3 below shows, there are few differences between the teachers

represented in our sample compared to the full sample of teachers invited to participate in the

survey and all of the 9th math teachers matched to students in the survey.

Table S3. Generalizability of the Teacher Sample

                                          Teachers in   Teachers Asked to         Teachers Matched to
                                          Analytic      Participate in Survey     Students in Sample
                                          Sample        (N = 255)                 (N = 439)
                                          (N = 223)
Aggregate Student Characteristics         Mean          Mean    SMD     p         Mean    SMD     p
Proportion Underrepresented
  Minority Students                       0.39          0.41    0.02    0.109     0.38   -0.01    0.687
Proportion Students with
  College Educated Parent                 0.36          0.35   -0.01    0.612     0.37    0.01    0.583
Proportion Female Students                0.50          0.49   -0.01    0.502     0.47   -0.03    0.002
Proportion Low-Achieving Students         0.51          0.51    0.00    0.776     0.50   -0.01    0.540
Proportion Students in Treatment Group    0.50          0.50    0.00    0.880     0.50    0.00    0.891
Average Student Math Grade                2.49          2.48   -0.01    0.842     2.60    0.11    0.020
Average Student Expectations              5.22          5.18   -0.04    0.237     5.21   -0.01    0.670
Teacher Course Levels
Proportion teach Algebra 1 or below       0.80          0.82    0.02    0.443     0.73   -0.07    0.006
Proportion teach Geometry or above        0.45          0.43   -0.02    0.604     0.45    0.00    0.923

Note: SMD is the standardized mean difference. For proportions we report the absolute

differences. A small number of schools do not have information available from the CCD and/or

CRDC. Those schools are excluded from the mean calculations for missing benchmarks, as

appropriate. P-values are shown from one-sample t-tests comparing mean differences.
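The comparisons in Table S3 can be sketched in code. This is an illustrative reconstruction, not the authors' analysis script; it assumes each group is represented as a list of teacher-level aggregates, and the p-value for the t statistic would come from a t distribution with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def smd(analytic, benchmark):
    """Standardized mean difference between a benchmark group and the
    analytic sample, scaled by the pooled standard deviation."""
    n1, n2 = len(analytic), len(benchmark)
    pooled_var = ((n1 - 1) * stdev(analytic) ** 2 +
                  (n2 - 1) * stdev(benchmark) ** 2) / (n1 + n2 - 2)
    return (mean(benchmark) - mean(analytic)) / sqrt(pooled_var)

def t_statistic(diffs):
    """One-sample t statistic testing whether teacher-level differences
    average to zero (diffs = benchmark value minus analytic value)."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

For identical groups the SMD is zero; larger absolute values indicate that the analytic sample drifts from the benchmark on that characteristic.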


Screenshots of the Growth Mindset Intervention and Control Materials

Illustrative Screenshots from the Treatment Condition

Treated students are presented with information about the malleability of the brain.

Treated students are presented with relevant scientific information.


Treated students are asked for their help in communicating these ideas to others as a means of
bringing them into the story and including them in the narrative.

Treated students receive the message that effort is not enough—you also need strategies to

overcome challenges and develop your skills.


Materials convey that the teenage years are a special time for brain growth that students can
leverage to their advantage.

Relating growth mindset to their own lives helps students internalize the message by customizing

it, and reduces defensive reactions that might emanate from the perception that adults are telling

the students what to believe.


Treated students are encouraged to see the value of applying a growth mindset to their own lives.

Student testimonials, obtained from prior study participants, help communicate that holding a

growth mindset puts them in line with what other students think, and that they’re not alone in

their concerns about school.


Treatment materials summarize evidence showing that holding a “learning mindset” (the term

used for growth mindset here) is helpful, using national data from Chile.


Illustrative Screenshots from the Control Condition

Control students are asked to help improve a lesson about the brain.

Control students are presented with information about the physiological features of the brain that

does not include the mindset content.


The control content includes examples of evidence about the brain.

Control students were asked to engage with the material by responding to short answer prompts.


Control students saw student testimonials about the value of the content.


Table S4. Descriptive statistics for student-, teacher-, and school-level variables

                                                          Mean/%     SD
Student-Level Variables (N = 8,775)
  In treatment group                                      49.55%
  Grade in math class                                     2.44       1.24
  Expectations of success in math                         5.20       1.11
  Lower achieving student                                 51.79%
  Female                                                  49.99%
  Black, Latinx, or Native-American                       40.19%
  Mother earned a bachelor's or above                     36.80%
Teacher-Level Variables (N = 223)
  Growth Mindset                                          4.74       0.76
  Pro-white Implicit Racial Bias                          -0.03      0.16
  Math Pedagogical Content Knowledge                      0.39       0.28
  Years Teaching                                          13.82      9.94
  Number of Raven's problems correct (out of 4)           3.39       1.38
  Female                                                  58.74%
  White, non-Hispanic                                     86.04%
  Earned a Master's Degree or higher                      51.12%
  Heard About Growth Mindset                              44.91%
School Sample (N = 58)
  Challenge Seeking Norms (# of hard problems chosen)     2.57       0.58
  School % Minority (Black, Latinx, or Native-American)   0.34       0.27
  School Achievement Level (z-score)                      0.08       0.95


Table S5. Student descriptive statistics by experimental group

                                      Growth mindset         Control               p-value for
                                      intervention           (N = 4,427)           difference
                                      (N = 4,348)
                                      Mean or %    SD        Mean or %    SD
Grade in Math Class                   2.45         1.24      2.42         1.25     .256
Expectations                          5.18         1.12      5.21         1.10     .341
Lower achieving student               52.02%                 51.57%               .670
Background
  Female                              50.13%                 49.85%               .798
  Black, Latinx, or Native-American   40.18%                 40.21%               .978
  Mother earned a Bachelor's or Above 37.05%                 36.55%               .625


Table S6. Description of sample exclusions

Starting Sample: 12,381 records (11,508 students)
  Students in the intent-to-treat (ITT) sample who were assigned a math grade
  (for information on this starting sample, see Yeager et al., 2019).

Sample matched to teachers: 11,689 records (10,816 students)
  Excludes students in schools that did not provide us with any math teacher
  IDs (n = 438) and student records that could not be matched to any math
  teacher (n = 254).

Sample matched to at least one teacher who took the survey: 9,622 records
(9,180 students)
  Drops students linked to teachers who did not take the survey (n = 172
  teachers).

Sample matched to teachers' mindsets: 9,167 records (8,775 students)
  Drops students linked to teachers with item nonresponse on the teacher
  mindset questions (n = 11 teachers).


Table S7. Student descriptive statistics by whether students were matched to a teacher who
completed the survey

                                       Matched                 Not Matched
                                       (in analytic sample)    (not in analytic sample)
                                       (N = 8,775)             (N = 2,733)
                                       Mean or %    SD         Mean or %    SD
Treatment                              49.55%                  50.68%
Grade in Math Class                    2.44         1.24       2.52         1.25
Expectations                           5.20         1.11       5.16         1.24
Lower achieving student                51.79%                  48.66%
Background
  Female                               49.99%                  46.81%
  Black, Latinx, or Native-American    40.19%                  41.53%
  Mother earned a Bachelor's or Above  36.80%                  38.57%


Student Survey Measures in the National Study of Learning Mindsets

Students’ Expectations in Math

Scale

7=Extremely well

…

1=Extremely Poorly

Thinking about your skills and the difficulty of your classes, how well do you think you’ll do

in math in high school?

Students’ Racial Minority Status

Options

1=Black/African American

2=Hispanic/Latino

3=Native American/American Indian

4=White, not Hispanic

5=Asian/Asian-American

6=Middle Eastern

7=Pacific Islander/Native Hawaiian

8=Other

How would you classify your racial or ethnic group? Please check all that apply.

Coding Note: Students who checked Black/African American, Hispanic/Latino, Native

American/American Indian, Middle Eastern, and Pacific Islander/Native Hawaiian are

included in our indicator of racial minority status.
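This check-all-that-apply coding rule can be sketched as follows; this is an illustrative reconstruction (the authors' actual code is not shown), assuming each student's responses are stored as a set of option labels.

```python
# Option labels flagged as underrepresented racial minority groups (per the coding note).
URM_GROUPS = {
    "Black/African American",
    "Hispanic/Latino",
    "Native American/American Indian",
    "Middle Eastern",
    "Pacific Islander/Native Hawaiian",
}

def is_racial_minority(selected_groups):
    """Check-all-that-apply coding: a student is flagged as a racial
    minority if any selected group falls in the underrepresented set."""
    return any(group in URM_GROUPS for group in selected_groups)
```

Because students could select multiple groups, a student who checked both "Hispanic/Latino" and "White, not Hispanic" would be included in the indicator.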

Mother’s Highest Education

Options

1=Did not finish high school

2=Finished high school, no college degree

3=Took some college courses, no college degree

4=AA or AS: Associate’s degree (i.e.,

community college or junior college)

5=BA or BS: Bachelor’s degree (four-year

college or university)

6=MA, MS, or MBA: Master’s degree

7=Doctorate: Lawyer, Doctor or PhD

8=Do not know

To the best of your knowledge, what is the HIGHEST level of education earned by your

mother?

If your mother was not the person who raised you, then answer this question thinking about

the adult who you spent the most time with growing up (such as your grandmother, father, or

legal guardian).

My mother or guardian’s highest level of education is:

Coding Note: Students who selected Bachelor’s degree, Master’s degree, or Doctorate are

included in our indicator of whether their mother earned bachelor’s degree or above.


Gender

Options

1=Male

2=Female

Are you:

Student Fixed Mindset

α=0.787

Scale

6=Strongly Agree

…

1=Strongly Disagree

1. Your intelligence is something about you that you can’t change very much.

2. You have a certain amount of intelligence, and you can’t do much to change it.

3. Being a math person or not is something you really can’t change. Some people are

good at math and other people aren’t.


Teacher Survey Measures in the National Study of Learning Mindsets

Teacher Mindset

α=0.646

Scale

6=Strongly Disagree

…

1=Strongly Agree

4. People have a certain amount of intelligence, and they really can’t do much to change

it.

5. Being a top math student requires a special talent that just can’t be taught.

Coding Note: Answers to these questions were averaged and the scale was reverse-coded so
that higher values indicate more of a growth mindset. In the primary analysis we bottom-coded
the scale at 3.5 because very few teachers reported a very fixed mindset, and this reduced the
influence of extreme cases (N = 15). In the BCF analysis we left the teacher mindset values
as-is because the nonlinear model is less likely to be misled by such outliers.
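A minimal sketch of this scoring rule, assuming raw responses on a 1-6 agreement scale where 6 indicates strong agreement with the fixed-mindset statement (an assumption about the raw response coding, which the coding note does not fully specify):

```python
def teacher_growth_mindset(item1, item2, floor=3.5):
    """Average the two fixed-mindset items, reverse-code on a 1-6 scale
    (so higher = more growth mindset), then bottom-code at `floor` to limit
    the influence of the few very fixed-mindset teachers.
    Pass floor=None to keep raw values, as in the BCF analysis."""
    reversed_mean = 7 - (item1 + item2) / 2  # reverse-code: 1<->6, 2<->5, ...
    return reversed_mean if floor is None else max(reversed_mean, floor)
```

For example, a teacher who strongly agrees with both fixed-mindset items (6, 6) reverse-codes to 1 and is bottom-coded up to 3.5 in the primary analysis, but stays at 1 when floor=None.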

Teacher Covariates

Teacher Education: What degree or degrees have you earned? (Mark all that apply)

Response Options

Associate's

-Discipline

-College(s)/university(ies)

Bachelor's

-Discipline

-College(s)/university(ies)

Master's

-Discipline

-College(s)/university(ies)

PhD / JD / MD / EdD

-Discipline

-College(s)/university(ies)

Coding Note: We create a dichotomous indicator of whether the teacher received a master’s

degree or not.

Teacher Race: Which of these best matches how you would identify yourself? Please check all

that apply.

Response Options

Black / African-American

Hispanic / Latino

Native American / American Indian

White, not Hispanic

Asian / Asian-American - If so: how would

you describe your Asian descent? (e.g.

Chinese, Korean, Indian, etc.)

Middle Eastern

Pacific Islander / Native Hawaiian

Coding Note: We create a dichotomous indicator of whether the teacher is White (not Hispanic)
or Asian.

Teacher Gender: How would you identify yourself?

Response Options

Male

Female

Other

Years Teaching: In what year did you first start a paid position as a teacher at any level in any

subject?

Open Response

Coding Note: We code the number of years teaching as 2016 – their response.

Previous Knowledge of Growth Mindset: Have you ever heard of the concept of a “Growth

Mindset?” If so, what have you heard about it? It’s okay if not, we’re just curious what

different people have heard. Write your answer in the text box below.

Response Options

Yes I have heard of a growth mindset

No I have not heard of a growth mindset


Potential Confounds for Teachers’ Mindsets

Three measures assessed potential confounds for teachers’ mindsets in the moderation analysis.

To illustrate why this can be informative, see Figure S1 below. It shows that a measured
moderator, such as teachers' mindsets, may be correlated with other teacher characteristics,
so the moderation analysis could be confounded unless those correlates are accounted for.

Figure S1. Schematic showing why accounting for potential correlates of teacher mindsets

could clarify the role of teacher mindsets in the recursive processes that sustain the effects of a

short, online growth mindset intervention on students’ math grades.

Prior to conducting the study, we did not hypothesize that these variables would necessarily be
confounds for teachers' mindsets. However, each was raised by at least one advisor or reviewer
of a grant proposal as a plausible counter-explanation for our results. Therefore, we included
three task-based measures to assess and account for these potential confounds.

Teacher fluid intelligence. The investigative team thought it unlikely that growth mindset
teachers simply had higher levels of fluid intelligence and therefore could create classrooms
that afforded more opportunities for learning. Nevertheless, to control for this possibility,
teachers' fluid intelligence was assessed with the Raven's Progressive Matrices task (RPM).
The RPM was chosen because its correlation with the fluid intelligence factor is high and
because it is straightforward for respondents (and therefore easy to mass-administer). The
NSLM administered a subset of 5 items. Item selection was informed by Item Response Theory
(IRT) analyses with large, validated samples of RPM respondents; these analyses of past
datasets led us to select the 5 items that best captured variance across the range of ability.
RPM scores ranged from 0 to 5 correct (for the sample overall, not just the analytic sample:
M = 3.4, SD = 1.38; chance = .83 correct). See Figure S2.


Figure S2. One of the easier puzzles from the measure of fluid intelligence.

Teacher pedagogical content knowledge. To control for the possibility that growth mindset
teachers are simply more skilled in the formal content of math pedagogy, teachers' pedagogical
content knowledge (PCK) was measured using a new, scalable method: the Classroom-Video-
Analysis assessment (Kersting et al., 2014). See the screen capture below in Figure S3.

During this assessment, math teachers view three clips of math classroom instruction, lasting
roughly 3 minutes. Teachers then answer a single open-ended prompt: “Using your professional
judgment, in the box below please write the question or questions you might ask the students in
this situation. Then explain how your question(s) would improve the students’ mathematical
understanding.” These responses were coded by Kersting, the developer of the measure, and a
second reliable coder on each of three dimensions of PCK (Kersting et al., 2014). Teachers’
open-ended responses were coded as having given a high-knowledge answer (1) or not (0) for
each dimension. Codes were averaged for each respondent (M = .39, SD = .28, range: 0 to 1);
higher values correspond to greater pedagogical content knowledge.


Figure S3. The math pedagogical content knowledge assessment (see Kersting et al. 2014).

Teacher implicit racial bias. To rule out the possibility that fixed mindset teachers were simply
racially biased against minority groups, we administered a leading method for assessing implicit
racial bias: the Affect Misattribution Procedure (AMP; Payne & Lundberg, 2014). The AMP is
uniquely suited for the present purposes because of its record of predicting consequential
behaviors in real-world settings better than some other implicit bias measures. For instance, in the

2008 American National Election Study, white survey respondents who identified as democrats

but scored high on the AMP’s measure of implicit anti-black bias were more likely to abstain

from voting than to vote for Barack Obama, a black democratic nominee for U.S. President

(Payne et al., 2010).

Figure S4. The Affect Misattribution Procedure (AMP).

The AMP follows the pattern shown in Figure S4. Participants view a face (of varying race or
ethnicity), followed by a pictogram. They are asked to guess whether the pictogram probably
refers to something pleasant or unpleasant. They then see a “noise” screen and make their
judgment. The overall measure used here is pro-white bias: the proportion of times that
participants guessed that the pictogram was “pleasant” following a white face prime, minus the
proportion of times that participants guessed that the pictogram was “pleasant” following a black
face prime.
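The bias score just described is a simple difference of conditional proportions; a minimal sketch (illustrative, not the scoring code actually used, assuming trials are stored as prime/judgment pairs):

```python
def amp_pro_white_bias(trials):
    """Pro-white bias score from AMP trials. `trials` is a sequence of
    (prime_race, judged_pleasant) pairs, e.g. ("white", True). The score is
    P(pleasant | white face prime) minus P(pleasant | black face prime)."""
    def pleasant_rate(race):
        flags = [pleasant for prime, pleasant in trials if prime == race]
        return sum(flags) / len(flags)
    return pleasant_rate("white") - pleasant_rate("black")
```

A positive score means the respondent judged pictograms as pleasant more often after white primes than after black primes; zero indicates no difference.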


Table S8. Coefficients of the other teacher-level potential moderators, from a single
multilevel regression model predicting student math grades.

                                                B       95% CI           SE      t       p
Treatment x Implicit Racial Bias                .207    [-.143, .556]    .179    1.16    .248
Treatment x Fluid Intelligence Score           -.009    [-.044, .026]    .018   -0.50    .614
Treatment x Pedagogical Content Knowledge      -.064    [-.235, .108]    .088   -0.73    .466
Treatment x White/Asian Teacher                -.096    [-.239, .047]    .073   -1.32    .187
Treatment x Male Teacher                       -.061    [-.157, .035]    .049   -1.24    .216
Treatment x Years Teaching                      .003    [-.002, .008]    .002    1.10    .271
Treatment x Heard About Growth Mindset         -.017    [-.136, .069]    .050   -0.34    .736
Treatment x Master's Degree                    -.087    [-.186, .013]    .051   -1.71    .087

Note: Unstandardized regression coefficients from the teacher-level x treatment interactions,

estimated in a model including all potential confounders (Table 2 in the main text).
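The moderation terms in this model are treatment-by-covariate products. Building those terms can be sketched as follows; the names and data layout are illustrative only (the actual estimation used a multilevel regression model, not shown here):

```python
def interaction_terms(treated, covariates):
    """Build treatment x covariate product columns for one teacher-level
    record: each product term lets the estimated treatment effect vary
    with that teacher characteristic. `treated` is 0/1; `covariates`
    maps moderator names to their values."""
    return {f"treatment_x_{name}": treated * value
            for name, value in covariates.items()}
```

In the control group (treated = 0) every interaction column is zero, so the interaction coefficients capture how the treatment effect, not the outcome level, shifts with each moderator.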


Concurrent Validity Analysis

To assess the concurrent validity of teachers' mindset beliefs with respect to mindset-relevant
beliefs and practices, we collected survey data from N = 368 teachers in the OnRamps
professional development network in Texas. These teachers taught math and science courses
(e.g., pre-calculus, college algebra, computer science) and came from a set of schools that was
racially and ethnically diverse and matched the demographics of Texas. Teachers answered the
two teacher mindset items administered in the NSLM (and described in the paper). They then
completed measures assessing teacher beliefs and practices that could afford a growth mindset
(or not). Correlations of teacher mindset with these beliefs and practices appear in Table S9.

Table S9: In the concurrent validity sample, correlations between teacher growth mindset
beliefs and teacher practices.

                                                                      Correlation with Teachers'
                                                                      Growth Mindset Beliefs
Learning-focused practices composite                                  r = .30
  It slows my class down to encourage lower achievers to ask
    questions. (reverse-coded)                                        r = .17
  There is usually only one correct way to solve a math problem.
    (reverse-coded)                                                   r = .21
  Mathematics mostly involves learning facts and procedures.
    (reverse-coded)                                                   r = .15
  Imagine a student was feeling discouraged in math class in the
    way just described on the previous page. How likely would you
    be to say each of the following statements? "Let's see what you
    don't understand and I'll explain it differently."                r = .13
Ability-focused practices composite                                   r = -.28
  Imagine a student was feeling discouraged in math class in the
    way just described on the previous page. How likely would you
    be to say each of the following statements? "Don't worry - it's
    okay to not be a math/science/computer science person."           r = -.20
  Imagine a student was doing well in your math class in the way
    just described on the previous page. How likely would you be to
    say each of the following statements? "You're lucky that you're
    a math/science/computer science person."                          r = -.27
  "It's great that it's so easy for you."                             r = -.18

Note: All correlations significant at p < .01.


Table S10: In the NSLM, correlations between teacher growth mindset beliefs and other
teacher- and school-level moderators

                                   1      2      3      4      5      6      7      8      9      10     11     12
1  Teacher Growth Mindset         1.000
Teacher-level
2  Implicit Racial Bias           -.094  1.000
3  Fluid Intelligence Score       -.038  -.022  1.000
4  Pedagogical Content Knowledge  -.087  -.009   .195  1.000
5  White/Asian Teacher            -.096  -.084   .065   .098  1.000
6  Female Teacher                  .058  -.045  -.102   .008  -.047  1.000
7  Years Teaching                 -.129  -.050  -.090  -.065   .032  -.066  1.000
8  Heard About Growth Mindset      .148  -.002   .142   .027  -.040  -.055   .009  1.000
9  Master's Degree                 .046  -.033   .007  -.050  -.030  -.091   .196   .170  1.000
School-level
10 Challenge Seeking Norms        -.051   .004  -.010   .076   .081  -.075   .085   .110   .017  1.000
11 Percent Minority                .043  -.066   .005  -.073  -.411   .026   .012   .181   .005   .032  1.000
12 Achievement Level              -.096  -.056   .032   .115   .292  -.060  -.028  -.014  -.020   .358  -.443  1.000


Post-hoc Analysis of Student Perceptions of the Classroom Climate that Could Confound

the Teacher Mindset Analysis

Student perception measures were created by taking the average of student responses within
teacher. We drop from the analysis any teachers without student survey responses to these
items. On average, about 28 students per teacher were used to create these indicators, described
below.
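The within-teacher averaging just described can be sketched as follows (an illustrative reconstruction; the data layout of teacher/rating pairs is an assumption):

```python
from collections import defaultdict

def classroom_means(responses):
    """Average student ratings within teacher. `responses` is a sequence
    of (teacher_id, rating) pairs; teachers with no responding students
    never appear in the result, mirroring their exclusion from the
    analysis."""
    by_teacher = defaultdict(list)
    for teacher_id, rating in responses:
        by_teacher[teacher_id].append(rating)
    return {t: sum(r) / len(r) for t, r in by_teacher.items()}
```

Each resulting value is a teacher-level indicator of how that teacher's students, on average, perceived the classroom.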

Student Perceptions: Good Teacher

Options

5=Extremely True

…

1=Not at all True

How true or not true is the statement below?

My math teacher is good at teaching math.

N= 4,128 students

Student Perceptions: Interesting

Combined Questions

5=Extremely True

…

1=Not at all True

How true or not true is the statement below?

1. My math teacher makes lessons interesting.

2. My math class does not keep my attention – I get bored. (Reverse Coded)

Coding Note: Students were asked either question 1 (N = 2,091) or question 2 (N = 2,039) above.
We reverse coded question 2 and combined these responses to make one indicator of whether
students think their teacher makes class interesting for them.

N = 4,130

Student Perceptions: Academic Press

α=0.726

Scale

5=Extremely True

…

1=Not at all True

How true or not true is the statement below?

1. My math teacher accepts nothing less than our full effort.

2. In my math class, we learn a lot almost every day.

3. My math teacher asks questions to be sure we are following along when s/he is

teaching.

N=8,250


Table S11. Coefficients for student perceptions of teachers as potential confounding
moderators, from separate multilevel regression models predicting student math grades.

                                                      B         95% CI          SE      t        p
Model 1 (N = 9,151 records, 219 teachers)
  Treatment x Growth Mindset Beliefs                  0.089**   [.027, .151]    0.032   2.815    0.005
  Treatment x Student Perceptions: Good Teacher      -0.077     [-.209, .055]   0.067  -1.138    0.255
Model 2 (N = 9,151 records, 219 teachers)
  Treatment x Growth Mindset Beliefs                  0.087**   [.025, .150]    0.032   2.759    0.006
  Treatment x Student Perceptions: Interesting       -0.048     [-.156, .060]   0.055  -0.873    0.383
Model 3 (N = 9,160 records, 221 teachers)
  Treatment x Growth Mindset Beliefs                  0.088**   [.026, .150]    0.032   2.772    0.006
  Treatment x Student Perceptions: Academic Press     0.051     [-.129, .230]   0.092   0.551    0.581

Note: Unstandardized regression coefficients from the teacher-level x treatment interactions,

estimated in a model including all potential confounders (Table 2 in the main text).


References

Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression

trees. The Annals of Applied Statistics, 4(1), 266–298. https://doi.org/10.1214/09-

AOAS285

Dorie, V., Hill, J., Shalit, U., Scott, M., & Cervone, D. (2019). Automated versus do-it-yourself

methods for causal inference: Lessons learned from a data analysis competition.

Statistical Science, 34(1), 43–68. https://doi.org/10.1214/18-STS667

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment

on article by Browne and Draper). Bayesian Analysis, 1(3), 515–534.

https://doi.org/10.1214/06-BA117A

Hahn, P. R., Dorie, V., & Murray, J. S. (2019). Atlantic Causal Inference Conference (ACIC)
data analysis challenge 2017. arXiv preprint arXiv:1905.09515.

Hahn, P. R., Murray, J. S., & Carvalho, C. M. (2020). Bayesian regression tree models for causal

inference: Regularization, confounding, and heterogeneous effects. Bayesian Analysis.

https://doi.org/10.1214/19-BA1195

Hastie, T., & Tibshirani, R. (2000). Bayesian backfitting (with comments and a rejoinder by the
authors). Statistical Science, 15(3), 196–223. https://doi.org/10.1214/ss/1009212815

Imbens, G. W., & Rubin, D. B. (2015). Causal inference for statistics, social, and biomedical

sciences: An introduction. https://doi.org/10.1017/CBO9781139025751

Kersting, N. B., Sherin, B. L., & Stigler, J. W. (2014). Automated scoring of teachers’ open-

ended responses to video prompts: Bringing the classroom-video-analysis assessment to

scale. Educational and Psychological Measurement, 74(6), 950–974.

https://doi.org/10.1177/0013164414521634

McConnell, K. J., & Lindner, S. (2019). Estimating treatment effects with machine learning.

Health Services Research, 54(6), 1273–1282. https://doi.org/10.1111/1475-6773.13212

Payne, B. K., Krosnick, J. A., Pasek, J., Lelkes, Y., Akhtar, O., & Tompson, T. (2010). Implicit

and explicit prejudice in the 2008 American presidential election. Journal of

Experimental Social Psychology, 46(2), 367–374.

https://doi.org/10.1016/j.jesp.2009.11.001

Payne, B. K., & Lundberg, K. (2014). The affect misattribution procedure: Ten years of evidence

on reliability, validity, and mechanisms: Affect misattribution procedure. Social and

Personality Psychology Compass, 8(12), 672–686. https://doi.org/10.1111/spc3.12148

Tipton, E. (2014). How generalizable is your experiment? An index for comparing experimental

samples and populations. Journal of Educational and Behavioral Statistics, 39(6), 478–

501. https://doi.org/10.3102/1076998614558486

Wendling, T., Jung, K., Callahan, A., Schuler, A., Shah, N., & Gallego, B. (2018). Comparing

methods for estimation of heterogeneous treatment effects using observational data from

health care databases. Statistics in Medicine, 37(23), 3309–3324.

https://doi.org/10.1002/sim.7820

Yeager, D. S., Hanselman, P., Walton, G. M., Murray, J. S., Crosnoe, R., Muller, C., Tipton, E.,

Schneider, B., Hulleman, C. S., Hinojosa, C. P., Paunesku, D., Romero, C., Flint, K.,

Roberts, A., Trott, J., Iachan, R., Buontempo, J., Yang, S. M., Carvalho, C. M., …

Dweck, C. S. (2019). A national experiment reveals where a growth mindset improves

achievement. Nature, 573(7774), 364–369. https://doi.org/10.1038/s41586-019-1466-y