
Questionnaires as interventions: can taking a survey increase teachers’ openness to student feedback surveys?

Abstract

Administrators often struggle in getting teachers to trust their school’s evaluation practices – a necessity if teachers are to learn from the feedback they receive. We attempted to bolster teachers’ support for receiving evaluative feedback from a particularly controversial source: student-perception surveys. For our intervention, we took one of two approaches to asking 309 teachers how they felt about students evaluating their teaching practice. Control participants responded only to core questions regarding their attitudes towards student-perception surveys. Meanwhile, treatment participants were first asked whether teachers should evaluate administrators in performance reviews and were then asked the core items about student-perception surveys. Congruent with cognitive dissonance theory, this juxtaposition of questions bolstered treatment teachers’ support for using student surveys in teacher evaluations relative to the control group. We discuss the implications of these findings with respect to increasing teacher openness to alternative evaluation approaches, and consider whether surveys show promise as a vehicle for delivering interventions.
Educational Psychology
An International Journal of Experimental Educational Psychology
ISSN: 0144-3410 (Print), 1469-5820 (Online)
To cite this article: Hunter Gehlbach, Carly D. Robinson, Ilana Finefter-Rosenbluh, Chris Benshoof & Jack Schneider (2017): Questionnaires as interventions: can taking a survey increase teachers' openness to student feedback surveys?, Educational Psychology. Published online: 25 July 2017.
Questionnaires as interventions: can taking a survey increase
teachers' openness to student feedback surveys?

Hunter Gehlbach (a), Carly D. Robinson (b), Ilana Finefter-Rosenbluh (c), Chris Benshoof (d)
and Jack Schneider (e)

(a) Gevirtz Graduate School of Education, University of California, Santa Barbara, CA, USA; (b) Harvard Graduate
School of Education, Cambridge, MA, USA; (c) Faculty of Education, Monash University, Clayton, Australia;
(d) Lathrop High School, Fairbanks, AK, USA; (e) College of the Holy Cross, Worcester, MA, USA
Shakespeare’s winter of discontent may well apply to the current sentiment surrounding
teacher accountability systems in the United States. Frustrated educational researchers lament
that (over) emphasising test-score-based approaches to assessing teachers ignores major
confounding factors such as poverty and the complexity of teaching (Berliner, 2013; Good,
2014; Koedinger, Booth, & Klahr, 2013). Teachers worry that they teach a narrower subset of
curricula than ever before and that they often must spend, ‘substantial instructional time on
exercises that look just like the test-items’ (Darling-Hammond, 2010, p. 71). To the chagrin of
many policy-makers, almost all teachers continue to receive 'proficient' ratings despite
principals reporting that the range of teacher competencies is more variable (Kraft & Gilmour,
2016). While the discontent is unlikely to turn into glorious summer any time soon, new
developments for districts aspiring to fairly evaluate their teachers offer some hope.

© 2017 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access
article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which
permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly
cited, and is not altered, transformed, or built upon in any way.

ARTICLE HISTORY: Received 18 November 2016; Accepted 29 June 2017.
KEYWORDS: brief interventions; cognitive dissonance; student feedback surveys; questionnaire; teacher evaluation.
CONTACT: Hunter Gehlbach. Supplemental data for this article can be accessed online.
Recent research suggests new approaches to assessing teacher quality – in particular,
students reporting their perceptions of their teachers – may be a promising component of
a teacher evaluation programme (Kane, McCarey, Miller, & Staiger, 2013). However, this
potential addition to a teacher evaluation system faces a major problem: teacher resistance.
Many teachers and their unions oppose integrating student feedback into teacher evalua-
tions (Cromidas, 2012; Decker, 2012). This opposition is understandable – it is far from intu-
itive that good data might be gleaned from the reports of capricious second graders or surly
sophomores. Furthermore, while some forms of evaluation can improve teaching (Taylor &
Tyler, 2012), it remains to be seen whether teachers might learn from this kind of feedback.
Still, one thing is clear: If teachers consider student-perception surveys to be unfair or biased,
the likelihood that their teaching will improve from this feedback seems vanishingly small.
This study tests the eects of a brief intervention designed to nudge teachers’ attitudes
to be more favourable towards the use of student-perception surveys in evaluating teaching
Broader context of the study
A brief sampling of the scholarship on evaluating teacher effectiveness contextualises the
fraught nature of student-perception surveys. In the United States, the adoption of the No
Child Left Behind act generated dissatisfaction as teachers garnered nearly universal ‘satis-
factory’ ratings. In response, districts experimented with new evaluation systems. In particu-
lar, many districts began to assess their teachers based in part on students’ standardised test
scores (Steinberg & Donaldson, 2016). Research suggested that more effective teachers in
early grades (as measured through this test-score approach) impacted a host of long-term
student outcomes such as pregnancies and lifetime earnings (Chetty, Friedman, & Rockoff,
2011). Consequently, enthusiasm for these evaluation methods grew. Simultaneously, scep-
ticism and critique of this approach erupted (Baker et al., 2010). Some argued that because
of the complexity of teaching (Koedinger et al., 2013), students’ standardised test scores
should only comprise a part of teachers’ evaluations – leaving open the question of what
other data might provide useful feedback on teachers' effectiveness.
Based in large part on the findings from the Measures of Effective Teaching study, student-
perception surveys gained traction as a potentially useful component of teacher evaluation
systems. The study’s authors found that students’ perceptions were not only reliable, but possibly
more accurate in predicting gains in student learning than observation protocols (Kane et al.,
2013). Others found that student surveys about their teachers better predicted student scores
on criterion-referenced tests than teacher self-ratings and principal ratings (Wilkerson, Manatt,
Rogers, & Maughan, 2000). Additionally, student surveys remain relatively cheap and easy to
administer. Perhaps most importantly, these surveys can potentially capture a much richer array
of desired teacher qualities than might be gleaned from students' test scores (Ferguson, 2012).
However, this idea was hardly less controversial than evaluating teachers on their students'
test scores. Which aspects of teaching might students reasonably report on? At which grade
levels? For all courses or just academic ones? Should stakes be attached to these surveys –
possibly causing students to misreport their true feelings – or should the surveys solely be
used to drive improvements in teaching?
Thus, those interested in improving teacher evaluations faced a tough choice. On the one
hand, preliminary studies suggested that student reports might be an important, straight-
forward way to expand our approaches to evaluating teachers (Kane et al., 2013; Wilkerson
et al., 2000). On the other hand, if teachers were not open to this approach, it seemed unlikely
that the system would work well or that teachers would learn much from the student feed-
back. Consequently, researchers and district administrators interested in the viability of
student-perception surveys as part of teacher evaluations faced a Catch-22: They needed
teachers to be open to using student-perception surveys as part of their evaluation systems.
Only then could researchers fairly adjudicate whether student-perception surveys might
work as a component of these evaluation systems.
For the sake of the present research, two key points should be remembered. First, the
controversial topic of student-perception surveys has emerged within larger controversies
surrounding teacher evaluation. So when asked about their attitudes towards student-
perception surveys, teachers likely have thought about the issue and may well have strong
opinions, i.e. they are unlikely to be blank slates.
Second, these surveys are already happening across the United States and internationally,
so school leaders need teachers to buy in to learning from student feedback.
Numerous states now encourage the use of student-perception surveys to assess K-12 teach-
ers (The Colorado Education Initiative, 2015; MET Project, 2012; TEAMTN, 2015). As Steinberg
and Donaldson (2016) report, 17% of the largest US districts employ student-perception
surveys in some way. Thus, students are already generating vast quantities of feedback. The
question is whether teachers will learn from it. If an intervention could nudge teachers to
be slightly more open to learning from this feedback, the resulting effects could improve
teaching across much of the United States. Particularly because teaching is so context-
dependent – what works for one group of students may or may not translate to the next
class period, the next day, or the following year’s class – getting feedback that is specic to
a particular group of students is vital for teachers.
Leveraging cognitive dissonance through surveys
Our intervention leveraged the social psychological principle of cognitive dissonance
(Festinger, 1962). Cognitive dissonance research has been one of the most robust and
influential areas of inquiry within social psychology (Brehm, 2007). Current dissonance scholars
largely agree that this psychological state arises when individuals experience tension
between inconsistent cognitions. Because people desire internal consistency, experiencing
incompatible cognitions causes discomfort. In most situations, this uncomfortable tension
motivates action to alleviate the tension (Brehm, 2007; Gawronski, 2012; Martinie, Milland,
& Olive, 2013). Numerous experiments show that people employ a range of strategies to
mitigate this discomfort: by changing one of their beliefs or attitudes, through recalibrating
the importance of the relevant cognitions, by engaging in a new behaviour, through chang-
ing their ongoing behaviour, or by feeling less responsible for their behaviour (Martinie
et al., 2013).
Much of the work on dissonance has focused on the alignment of cognitions and behaviours.
For instance, Harmon-Jones, Harmon-Jones, and Levy (2015) describe three main paradigms
of cognitive dissonance research, each of which implicates a person's behaviours:
induced compliance, decision-making, and effort justification studies. From this perspective
on cognitive dissonance, 'the negative affective state of dissonance is aroused not by all
cognitive conflict but, specifically when cognitions with action implications conflict with
each other making it difficult to act' (Harmon-Jones et al., 2015, p. 185).
However, others suggest that behaviours or actions may not be required for individuals
to experience dissonance. In this view, inconsistent cognitions may serve as a cue for the
presence of errors in one’s belief system (Gawronski, 2012). Thus, an intriguing question –
and one with important practical implications – becomes whether attitude change might
be sparked through inconsistent cognitions even if the thoughts have little potential to
inuence behaviour.
Social psychologists have applied the basic idea of cognitive dissonance across an array
of real-world settings to generate a variety of interventions. Through 'foot-in-the-door'
techniques, participants find that it becomes much harder to say no to someone after having
already made a small concession or done a modest favour (e.g. Freedman & Fraser, 1966). In
‘saying-is-believing’ interventions, participants publicly espouse a point of view and then
subsequently tend to endorse that point of view more strongly (e.g. Aronson, Fried, & Good,
2002; Walton & Cohen, 2011). In other words, to say one thing and believe another would
be inconsistent. In these field-experiments the dissonant cognitions again tend to implicate behaviours.
Before dissonance theory came to the fore in social psychology, scholars in other fields
utilised people’s desire for internal consistency to demonstrate biased responding in ques-
tionnaires. For example, in the late 1940s asking Americans whether communist reporters
should be allowed to report on visits to the United States garnered little endorsement (37%
of respondents said 'yes'). However, first asking whether US reporters should be allowed to
report on the Soviet Union (an idea most everyone endorsed) and then asking about the
communist reporters dramatically shifted endorsements to 73% (Dillman, Smyth, & Christian,
2014). In this instance, presumably the respondents felt awkward about maintaining a
double-standard for Soviet and US reporters and thus shifted their opinions. Thus, experi-
mental evidence exists that is congruent with a cognitive dissonance explanation, even
though no actions are implicated. However, one could argue that most respondents have
no personal stake in what happens to reporters of different nationalities. Therefore, they
might be motivated only by presenting themselves consistently to the administrator of the
survey. Because the content of the cognitions is not particularly relevant at a personal level,
participants are unlikely to have held strong opinions about these reporters previously.
Consequently, changing one's opinion on this issue seems relatively cost-free. To the extent
that dissonance occurs at all, it is likely a weak version that might be easily resolved.
The situation becomes more intriguing when we shift to a case that has personal relevance
(but no action implications) for survey respondents. In this instance, we might anticipate
more strongly held prior attitudes that would be correspondingly harder to shift. In other
words, can cognitive dissonance still be sparked by attitudes alone when respondents are
personally invested in an issue? This is exactly the case we examined.
The present study
We applied this same psychological principle of cognitive dissonance to the challenge of
cultivating teachers’ support for using student-perception surveys as a component of teacher
evaluations. We randomly assigned a group of teachers to respond to survey questions about
their support for student-perception surveys under one of two contexts. Control teachers
simply took a ve-item survey scale assessing their feelings towards student-perception
surveys as the initial part of their survey. Treatment teachers answered the same items, but
did so after rst responding to a parallel scale about teachers evaluating their
As described in our Statement of Transparency, we anticipated that most teachers would
endorse their own capacity to capably evaluate their administrators. These relatively high
ratings would then spark a sense of dissonance when teachers next answered the items
regarding students evaluating teachers. In other words, we anticipated that teachers in the
treatment group would think something akin to: (1) Yes, teachers are capable of evaluating
and giving feedback to their administrators, (2) I am a fair person, who does not hold
double-standards; I am not a hypocrite, and (3) Although some students might be too young,
if it is reasonable for teachers to evaluate administrators, it should be reasonable for students
to evaluate their teachers.
Congruent with recent best practices for experimental studies (Gehlbach & Robinson,
manuscript under review; Simmons, Nelson, & Simonsohn, 2011), we submitted our
Statement of Transparency using Open Science Framework and pre-registered our main
hypothesis that: Treatment teachers will report greater support for student-perception
surveys on our five-item composite than their control counterparts (controlling for their status
as a national- or state-level award-winning teacher). Increasingly, scholars have raised concerns
about ‘researcher degrees of freedom’ in which investigators engage in various practices
that have problematic repercussions. On the one hand, some of these practices, such as
testing numerous covariates, can provide an exhaustive sense of what a data-set might tell
us about a particular population. On the other hand, the practices enumerated by Simmons
et al. (2011) all serve to inflate the false-positive rate of any given analysis. By pre-registering our analysis
plan and specifying the model we t ahead of time, we avoid this concern. By describing a
set of exploratory analyses, we also hope to gain additional insights that might be generated
from the data-set. Readers should have more faith in the findings corresponding to the
pre-registered analysis and should treat the exploratory analysis as hypothesis generating.
Finally, we report our ndings using condence intervals and eect sizes rather than relying
on null-hypothesis signicance testing (Cumming, 2014; Thompson, 1996).
Participants

We recruited participants through snowball sampling using teachers from a prominent
teacher organisation as our initial base of participants. Specifically, we partnered with the
National Network of State Teachers of the Year (NNSTOY), an organisation
of teachers who were selected as finalists or winners of State or National Teachers of the Year
competitions across the US. In addition to their broad geographic representation, we decided
to start from this sample of NNSTOY teachers based on the potential implications of our
study. We were especially interested in whether this intervention might work with teachers
who were leaders in their respective school communities. If school administrators could use
this approach successfully to get buy-in from the leaders in their school, we expected that
other teachers might be more likely to be persuaded.
The study focused on K-12 teachers at the end of the 2014–2015 school year. Of the 407
teacher participants who clicked into the survey, 309 participants (n = 157 control; n = 152
treatment) continued the survey long enough to complete the intervention and primary
dependent measure (i.e. control participants completed the Support for Student-Perception
Surveys scale and treatment participants completed both scales). No participants who began
the intervention dropped out before completing the primary dependent measure; thus,
there was no dierential attrition for the treatment participants simply because they had to
complete ve extra items. Of the 279 participants who completed the entire survey (i.e. all
the way through the demographic questions at the end of the survey), 76% were female
and 32% were members of the NNSTOY. In terms of race/ethnicity, 85% of participants
identified as white or Caucasian, 5% Latino, and less than 5% each for teachers who categorised
themselves as African-American, American Indian/Alaskan Native, Middle Eastern, or 'Other'.
Participants taught in 44 states and the District of Columbia, and teachers from all grades,
K-12, were represented. Approximately 50% of teachers reported having taught high school
in the prior year, 24% taught middle school and 26% taught elementary school. The average
amount of teaching experience was 18 years, with a standard deviation of 8.2 years and a
maximum of 39 years.
Thus, the sample was relatively representative of the US population of teachers on dimen-
sions such as race and gender – the overall teaching population for 2011–2012 was 82.7%
white and 76.2% female (National Center for Educational Statistics, 2013). However, the large
proportion of award-winning teachers, high numbers of high school teachers, and substantial
years of experience were not representative of the broader population of teachers. Given
the experimental design, the extent to which these discrepancies limit the generalisability
of the results is unclear.
Measures

Our measure of Support for Student-Perception Surveys consisted of a five-item scale
(α = .86) to assess teachers' views of using student-perception surveys to evaluate teachers.
After correlating the errors for items 2 and 3, a confirmatory factor analysis showed that the
data fit a one-factor model (χ²(4, N = 309) = 5.89, p = .21; CFI = .997; RMSEA = .039). This measure
included questions such as, 'Overall, to what extent is it a good idea to have teachers'
performance reviews be partially based on student input?' Both treatment and control
participants completed this scale. See Table 1a for item-level descriptive statistics on this measure.
Only treatment participants completed the Support for Teacher-Perception Surveys
measure – a five-item scale (α = .75) that mirrored the student-perception survey scale and
assessed teachers' views of using teacher-perception surveys to evaluate administrators. See
Table 1b. After correlating the errors for items 2 and 5, a confirmatory factor analysis showed
that the data fit a one-factor model (χ²(4, N = 151) = 5.36, p = .25; CFI = .993; RMSEA = .048). This
measure included questions such as, ‘Overall, to what extent is it a good idea for adminis-
trators’ evaluations to be based partially on teacher input?’
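The reliabilities reported for both scales (α = .86 and α = .75) are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed, the following uses invented 1-to-5 ratings from six hypothetical respondents, not the study's data:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per scale item, all the same length."""
    k = len(items)
    item_vars = sum(pvariance(scores) for scores in items)
    totals = [sum(resp) for resp in zip(*items)]  # per-respondent total score
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Five items ('fair', 'useful', 'objective', 'others', 'good idea'),
# six made-up respondents (columns).
ratings = [
    [3, 4, 2, 5, 1, 4],
    [3, 5, 2, 4, 2, 4],
    [2, 4, 3, 5, 1, 3],
    [3, 4, 2, 4, 1, 4],
    [2, 5, 3, 5, 2, 3],
]
print(round(cronbach_alpha(ratings), 2))
```

Because the invented respondents answer very consistently, the toy alpha comes out higher than the scales' reported values; only the formula, not the number, carries over.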
Beyond these ndings regarding the reliability and structural validity (Messick, 1995) of
each scale, acquiring additional indicators of validity was challenging because we developed
both scales explicitly for this project. However, we took seriously the notion that validity
should be built into each measure from the outset of the scale development process
(Gehlbach & Brinkworth, 2011). As such, we reviewed the literature on both topics, solicited
input from numerous teachers about both scales, synthesised these two distinct sources of
information, and adhered to standard best practices in survey design in writing the items
(Dillman et al., 2014; Gehlbach & Brinkworth, 2011, steps 1–4 of their survey design process).
Feedback from a pilot allowed us to revise the scales. We present the final versions of both
measures in the Appendix 1.
Frequently, the claim of a scale being ‘validated’ rests upon a series of correlations with
other measures which show particular patterns of convergent and discriminant validity. For
example, we take the fact that our two scales correlated moderately (r(152) = .52, p < .001) as
evidence that they are measuring related concepts as expected (i.e. both are tapping into
a general attitude towards feedback surveys). To our knowledge though, no other similar
measures of these constructs exist, making it challenging to enact this traditional approach
to establishing validity. Furthermore, in actuality, validity is not an achieved state but an
ongoing process (Gehlbach, 2015). Thus, for newly developed scales we feel as though we
have reasonable preliminary evidence of construct validity, though this will be an important
area to build upon through future research.
Finally, we also collected demographic data and information on the participants’ teaching
career at the end of the survey.
Table 1a.Descriptive statistics for support for student-perception survey scale (unadjusted mean, SD,
and Pearson (r) correlations).
Notes: 1) Ns=152 for Treatment; 157 for Control.
2) The observed range for each item and the composite were 1 through 5.
3) All correlations are significant at the p<.05 level.
4) Intra-scale correlations are below the diagonal for treatment and above the diagonal for control.
Treatment Pearson r correlations Control
MSD 1 2 3 4 5 6 SD M
1) Fair 2.84 1.04 .48 .50 .52 .71 .84 1.08 2.47
2) Useful 3.85 .97 .57 .54 .25 .47 .73 1.12 3.86
3) Objective 2.68 .97 .70 .60 .41 .53 .77 .95 2.63
4) Others 2.12 .98 .61 .41 .56 .51 .66 .80 1.83
5) Good idea 2.64 1.16 .78 .56 .70 .61 .84 1.05 2.18
6) Overall composite 2.83 .85 .89 .75 .86 .77 .89 .77 2.60
Table 1b.Descriptive statistics for the teacher-perception survey scale (unadjusted mean, SD, and Pear-
son (r) correlations).
Notes: 1) N=151–152.
2) The observed range for each item 1 through 5, except for ‘useful’ (2 through 5); the overall composite was 1.6 through 5.
3) All correlations in the table (except for the Others-by-Useful correlation) are significant at the p<.05 level.
Pearson r correlations
1) Fair 3.38 1.09
2) Useful 4.43 .71 .32
3) Objective 3.07 1.07 .55 .19
4) Others 3.30 .98 .59 .14 .51
5) Good idea 3.84 .96 .37 .45 .29 .27
6) Overall composite 3.61 .68 .83 .53 .75 .73 .66
Procedure

Through the NNSTOY network, we recruited teachers via emails and posts on social media
outlets. We encouraged the NNSTOY participants to take the survey themselves and then
to email the survey link to their fellow teachers in their schools and professional networks.
Participants were given the opportunity to win a $100 gift card in a lottery. Towards the end
of the survey, participants answered open-ended questions and could sign up for future
interviews/focus groups to discuss student-perception surveys as part of an ongoing, com-
plementary study.
The survey, administered via Qualtrics, took 5–10 min to complete and remained open
for two weeks in June of 2015. After participants completed their consent forms, the Qualtrics
platform randomly assigned them to treatment and control. All participants were told that
schools and districts across the country are considering using perception surveys as part of
performance reviews for teachers, and researchers wanted to get teachers’ input on this
practice. Control group participants then answered the five-item scale regarding
their views about the use of student-perception surveys to evaluate teachers.
Before being asked about student-perception surveys, participants in the treatment
condition were first told that schools and districts across the country are considering using
teacher-perception surveys as part of performance reviews for administrators, and that
researchers wanted to get teachers' perspectives on this idea. They then answered the
five-item scale regarding their views about the use of teacher-perception surveys to evaluate administrators.
Analytic approach
As noted in our Statement of Transparency, we evaluated our hypothesis using ordinary
least-squares regression with NNSTOY status as a covariate:

Support_i = β0 + β1 Treatment1_i + β2 X2_i + ε_i,

where Treatment1_i indicates whether teacher i was exposed to the cognitive dissonance
treatment or not, X2_i is a dummy variable indicating whether the teacher was a member of
NNSTOY or not, and ε_i is a residual. We included NNSTOY as a covariate because we assumed
that teachers who received such positive, public acclaim for their teaching would be more
condent teachers and more open to feedback from students than their non-NNSTOY peers.
We hoped the covariate would sharpen the precision of our estimates by accounting for this
additional source of variation.
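The model above can be sketched in code. The following fits the same two-predictor OLS specification to simulated data; the sample size (309), approximate NNSTOY share (32%) and coefficient values are taken from the paper, but the generated data, the noise level, and the `ols` helper are illustrative assumptions:

```python
# Fit Support_i = b0 + b1*Treatment1_i + b2*X2_i + e_i by ordinary least squares.
import random

def ols(X, y):
    """Solve the normal equations (X'X)b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):                       # forward elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * ai for a, ai in zip(A[r], A[i])]
            b[r] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta

random.seed(1)
rows = []
for _ in range(309):
    treat = random.randint(0, 1)             # random assignment to condition
    nnstoy = int(random.random() < 0.32)     # roughly 32% NNSTOY members
    support = 2.6 + 0.23 * treat + 0.41 * nnstoy + random.gauss(0, 0.8)
    rows.append(([1.0, treat, nnstoy], support))
X, y = [r for r, _ in rows], [s for _, s in rows]
b0, b1, b2 = ols(X, y)
print(f"estimated treatment effect: {b1:.2f}")  # b1 estimates the simulated effect of 0.23
```

With a real data-set one would of course use a statistics library rather than hand-rolled normal equations; the sketch only makes the model's structure concrete.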
As a rst step in our analyses, we checked for violations of random assignment with
respect to teachers’ gender, race, NNSTOY status, level of schooling taught or years of teach-
ing experience. Second, as a manipulation check, we examined whether teachers generally
endorsed the notion that they were competent to evaluate their administrators.
Preliminary analyses
We found no evidence that our random assignment produced non-equivalent groups.
Specifically, the treatment and control groups appeared similar with respect to the
distribution of: males and females, χ²(1) = 1.03, p = .31; NNSTOY membership, χ²(1) = .07,
p = .79; different racial and/or ethnic backgrounds, χ²(5) = 5.76, p = .33; grade-level taught
(i.e. elementary, middle, or high school), χ²(2) = 2.00, p = .37; or years of teaching experience,
Mcontrol = 18.43, SD = 8.43 versus Mtreatment = 17.37, SD = 8.13, t(277) = 1.07, p = .29.
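Balance checks of this kind are chi-square tests of independence on condition-by-characteristic contingency tables. A minimal sketch with hypothetical gender counts (the cell counts are invented; only the group sizes of 157 and 152 come from the paper):

```python
# Pearson chi-square statistic for a contingency table (rows = conditions).
def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical [female, male] counts for control (n = 157) and treatment (n = 152).
counts = [[120, 37],
          [112, 40]]
stat = chi_square(counts)
# For a 2x2 table, df = 1 and the p < .05 critical value is 3.84.
print(round(stat, 2), "balanced" if stat < 3.84 else "imbalanced")
```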
Next, our manipulation was predicated on the assumption that teachers would feel com-
petent to provide objective, fair and useful feedback to their administrators. Had they felt
they could not competently provide administrators with feedback, no dissonance would be
aroused by concluding that students could not reasonably provide teachers with feedback
either. Our assumption appears reasonable. Teachers’ mean rating of 3.6 (SD = .68) on the
Support for Teacher-Perception Surveys scale is closer to the ‘quite’ than to ‘moderately’
response options on the scale. For example, in response to being asked ‘Overall, to what
extent is it a good idea for administrators’ evaluations to be based partly on teacher input?’
teachers’ mean response was closest to the ‘quite a good idea’ anchor.
Pre-specied hypothesis test
With these preliminary ndings in mind, we tested our primary hypothesis: that our inter-
vention would nudge teachers’ opinions about student-perception surveys in a positive
direction. As predicted, we found that teachers in the treatment condition supported
student-perception surveys more than their control counterparts while controlling for
participants' NNSTOY status (B = .23, SE = .10, CI: .04, .42). These between-group differences
correspond to an effect size of β = .14, or Cohen's d = .28. The confidence interval excludes
0, indicating that the difference between the group means is statistically reliable. Figure 1
shows the means and 95% confidence intervals. As noted by Cumming (2014), overlapping
confidence intervals should not be confused as being equivalent to a 'non-significant' result:
'If the two groups' CIs overlap by only a moderate amount … approximately, p is less than
.05' (p. 13). On average then, the treatment teachers were close to 'moderately' endorsing
the idea of student-perception surveys while the control teachers were about half-way
between the 'mildly' and 'moderately' response options.

Figure 1. Mean differences and 95% confidence intervals for Support for Student-Perception Surveys
by condition, controlling for whether teachers were members of the National Network of State Teachers
of the Year (or not).
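The reported effect size is consistent with dividing the adjusted mean difference (B = .23) by a pooled standard deviation of the composite (SDs of .85 and .77 from Table 1a). Whether the authors computed it exactly this way is an assumption, but the arithmetic reproduces d ≈ .28:

```python
# Cohen's d as mean difference over the pooled standard deviation.
from math import sqrt

def cohens_d(mean_diff, sd1, n1, sd2, n2):
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return mean_diff / pooled_sd

# B = .23; composite SDs .85 (treatment, n = 152) and .77 (control, n = 157).
d = cohens_d(0.23, 0.85, 152, 0.77, 157)
print(round(d, 2))  # → 0.28
```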
Exploratory analyses
We conducted three main types of exploratory analyses – analyses that should be viewed as hypothesis generating or suggestive. The first set of these additional analyses helped us better understand our results and place them into context. Toward this end, we first re-ran our equation testing our core hypothesis without the NNSTOY covariate. Removing teachers’ NNSTOY status made essentially no difference (B = .23, SE = .09, CI: .05, .41; β = .14). Similarly, when we included the grade-level taught as a covariate in our original equation, the treatment effect was essentially unchanged (B = .24, SE = .10, CI: .05, .43; β = .14).
We also wanted to know whether teachers’ support of student-perception surveys differed based on whether or not they were NNSTOY members. We anticipated that NNSTOY teachers probably received more positive feedback from students (and others) over time and thus might be more open-minded about having their teaching practice evaluated by students. To investigate this possibility, we regressed the Support for Student-Perception Surveys composite on teachers’ NNSTOY status. Congruent with our assumption, we found that NNSTOY teachers were more supportive of student-perception surveys than teachers who have not received this recognition (B = .41, SE = .10, CI: .21, .62; β = .23). Similarly, we expected that teachers of earlier grades would be more sceptical that their younger students would have the capacity to provide trustworthy evaluations (as compared to teachers of older students). We explored this assumption by regressing the Support for Student-Perception Surveys composite on the (average) grade-level that teachers taught. Teachers of younger students were, in fact, less likely to endorse student-perception surveys (B = .04, SE = .01, CI: .01, .06; β = .18).
Finally, Table 1a reveals that the treatment and control groups did not diverge on all items. Specifically, both groups’ scores were similar on the utility of student evaluations and the potential for students to be objective; by contrast, bigger differences appeared to emerge for the ‘fairness’ and ‘good idea’ items.
The second set of exploratory analyses reflect our attempt to learn more about the plausibility of cognitive dissonance as the hypothesised mechanism driving the group differences. Presumably, for the Support for Teacher-Perception Surveys scale to influence treatment participants on the Support for Student-Perception Surveys scale, their responses – at both the item and scale levels – should be correlated. Moreover, one might imagine that the correlation between the parallel items from each scale that invoked implicit comparisons might be higher than the correlation between parallel items that do not invoke such comparisons. For example, the ‘fairness’ item might invite respondents to think about whether an activity that is fair for teachers to do would also be fair for students.
As shown in Table 2, each parallel item and the overall scales are significantly correlated at greater than r = .30. Furthermore, we see a particularly strong correlation between the ‘fairness’ item on the two scales (relative to the correlations between the other parallel items).
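Computing such a cross-scale correlation is straightforward. The sketch below uses simulated (not actual) ratings to show how a parallel-item Pearson correlation like the fairness–fairness cell of Table 2 would be calculated; the variable names and the shared-latent-attitude setup are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 152  # roughly the number of treatment participants in Table 2

# Simulated (not actual) data: a shared latent attitude drives both
# 'fairness' ratings, producing a positive cross-scale correlation.
latent = rng.normal(0, 1, n)
ratings = pd.DataFrame({
    "student_fair": latent + rng.normal(0, 1, n),
    "teacher_fair": latent + rng.normal(0, 1, n),
})

# Pearson r between the parallel 'fairness' items on the two scales
r = ratings["student_fair"].corr(ratings["teacher_fair"])
print(f"r = {r:.2f}")  # around .5 by construction
```

A full item-by-item matrix like Table 2 would come from placing all ten items in one DataFrame and calling its `.corr()` method.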
The nal analyses involved a follow-up survey that we conducted about three months
after the initial survey. Our hope was to use those participants (n = 234) who provided contact
information (for potential participation in focus groups) to gauge the persistence of the
eects of the intervention. In this follow-up, we re-administered only the scale on
student-perception surveys. About a third (32%) of the eligible participants responded. For
those in the treatment group (n = 31), opinions remained consistent over this three-month
span (
= 2.88, SD = .91; Mpost = 2.90, SD = .92). Their pre- and post-ratings also correlated
with each other strongly (r31 = .83).
While a potentially encouraging sign for the endurance of our intervention, this result
should be interpreted cautiously. The control teachers (n = 44) who completed both surveys
became slightly more positive over the three-month span (
= 2.77, SD = .67;
= 2.86,
SD = .62) and showed less stability in their opinions between these pre- and post-assess-
ments (r44 = .46).
Further analyses showed evidence of differential rates of volunteering for the follow-up survey. The subgroup of treatment participants who completed both surveys was similar to the original treatment group (Mpre-original = 2.83, SD = .85 versus Mpre-both = 2.88, SD = .91). However, the subgroup of control participants who completed both surveys was not representative of the original control group (Mpre-original = 2.60, SD = .77 versus Mpre-both = 2.77, SD = .67). In addition to the small sample sizes for these follow-up analyses, we found sampling differences between the participants who participated in both surveys (as compared to the composition of the original sample) and differences in consistency of opinions over time for the two groups – all of which make interpretation of these results challenging.
Discussion

Through a modest, dissonance-based intervention, we find that asking teachers about student-perception surveys in different ways can affect teachers’ receptivity to this evaluative practice. Specifically, we find that juxtaposing questions on the viability of teachers evaluating administrators with questions about the viability of students evaluating teachers makes teachers more receptive to student-perception surveys as a component of their evaluation (as compared to directly asking them about the viability of student-perception surveys). Although the effect size of this intervention was modest, effect sizes should be calibrated with respect to the magnitude of the intervention (Cumming, 2014). In this case, the intervention was exceedingly brief (less than two minutes for most participants) and simple to administer.

Despite being more suggestive in nature, the exploratory analyses provide additional signals that participants’ responses on these surveys comport with what one would expect.
Table 2.Pearson (r) correlations for treatment participants between Support for Student-Perception
Survey and Support for Teacher-Perception Survey responses.
Notes: 1) N=151–152.
2) All correlations presented in the table above .20 are significant at the p<.05 level.
Student-Perception Surveys
Teacher-Perception Surveys 1) Fair .53 .29 .32 .36 .38 .45
2) Useful .27 .37 .21 .16 .25 .30
3) Objective .34 .25 .32 .28 .25 .34
4) Others .36 .29 .27 .33 .23 .35
5) Good idea .36 .30 .30 .26 .34 .38
6) Overall composite .53 .41 .41 .40 .41 .52
NNSTOY teachers are more open to student-perception surveys than their colleagues who have not received the same recognition. Teachers of younger students view this evaluative practice with less enthusiasm than their colleagues who teach older students. These two findings accord with the logic that (a) teachers who have received positive reinforcement about their performance may be less apprehensive about being evaluated by students and that (b) teachers intuit that older students are more capable of providing fair, objective, potentially useful feedback. Because these explanations are speculative – our data do not speak directly to either finding – these results offer potential avenues for future study.
In Table 1a, we also saw signs that the intervention affected certain aspects of teachers’ perceptions of student-perception surveys more than others. Specifically, the intervention did not appear to affect teachers’ perceptions of the utility of student feedback or their concerns about students’ objectivity. Instead, it appears that the intervention most affected teachers’ perceptions of fairness and whether student-perception surveys were a good idea.

This finding also helps rule out an alternative explanation that a mere ordering effect caused the results. For instance, in ‘anchoring’ (Dillman et al., 2014), respondents answer subsequent items with ratings similar to an initial item because of the standard that is brought to mind by the initial item; in ‘anchoring and adjusting’ (Gehlbach & Barge, 2012) respondents answer similar adjacent items with similar ratings. However, neither of these potential explanations seems viable given that the intervention affected some items but not others.
Our next analyses sought to provide additional evidence regarding whether cognitive dissonance seemed plausible as the explanatory mechanism. Identifying a causal mechanism is inherently a speculative endeavour – for our research design, it is probably more reasonable to expect to learn about the effects of causes rather than the causes of effects (Bullock, Green, & Ha, 2010; Holland, 1986). In other words, for experimental designs such as ours it is easy to articulate how groups differ on particular outcomes; describing which part of the intervention is responsible for causing that difference cannot be done with the same precision.

With this caveat in mind, our data are congruent with a cognitive dissonance explanation. More specifically, we find that treatment participants’ responses on the two scales covary (at both the item and scale levels). Had we found no correlation between the responses on the scales, it would be hard to imagine that the cognitive dissonance from the juxtaposition of the scales caused the responses on the second scale to be higher. In addition, the correlations were particularly strong for the fairness item – an item likely to engender implicit comparisons between the student- and teacher-perception surveys.
Our attempts to ascertain whether the effects of the intervention endured over time were somewhat frustrated. Only a modest proportion of our original participants responded. These respondents may have been reasonably representative of the larger treatment group. Yet, it appeared that the control group of follow-up respondents was not representative of the original control group. Specifically, they held much more favourable initial views about student-perception surveys as compared to the overall control group. Furthermore, the control group showed much greater fluctuation in their opinions over these three months than their peers in the treatment group. All of these factors muddy our attempts to understand the persistence of the intervention. However, our attempt to gauge persistence was not devoid of information. Given the brief nature of the intervention, it would hardly have been surprising if the treatment effects had disappeared over time (Rogers & Frey, 2015). However, we find no evidence that the more positive attitudes of those in the treatment condition drifted back to baseline. Thus, while we are reticent to make a strong claim that the effects endured, we can produce no evidence that they faded either. Consequently, assessing the longevity of these effects seems like an especially important area for future research.
Finally, our study helps shed new light on a current debate in the cognitive dissonance literature: Does behaviour need to be implicated for dissonance to occur, or can dissonance result merely from incongruous cognitions that have no action implications (Brehm, 2007; Harmon-Jones et al., 2015)? Past studies on the ‘even-handedness effect’ (Dillman et al., 2014) suggest that, in at least some cases, dissonance can occur without implications for a respondent’s behaviour. However, these studies asked respondents about topics that they were unlikely to have thought about much and that were largely irrelevant to their personal lives (i.e. freedoms for communist vs. western reporters in one example). We asked teachers about a topic of clear personal relevance, but which lacked clear action implications for them. Because of the clear personal relevance, one might have anticipated that their attitudes might be more deeply held, and thus more resistant to change simply by being brought into conflict with another cognition. Yet, our study finds that the treatment group still shifted their attitude towards student-perception surveys relative to the control group. This finding provides additional evidence congruent with the notion that cognitive dissonance may occur through conflicting cognitions alone; action implications may not always be necessary.
In addition to the problems that arose in our attempts to learn about the duration of the effects of the intervention, other limitations of the study are important to weigh. One obvious issue is that the study provides only minimal evidence about what the mediating mechanism might be. Our theory is that participants in the treatment group have different attitudes towards student-perception surveys because they experienced a form of cognitive dissonance. However, other explanations may well be plausible, and additional evidence to support (or disconfirm) our explanation would clearly strengthen our study.
Perhaps the most prominent question is the extent to which the sample might affect the validity of the findings. One version of this question revolves around internal validity. Does having a high proportion of nationally recognised teachers (and their friends and colleagues) in the sample jeopardise the integrity of the intervention? All participants were randomly assigned to condition, random assignment appeared to work (so far as we could check it), and we controlled for NNSTOY status. As a result of these checks and safeguards, we cannot come up with a plausible story as to how the internal validity might be threatened by the sample.
The second question is whether the sample affected the external validity or generalisability of the results. This possibility seems more concerning. Relative to a nationally representative sample of US teachers, our sample was more accomplished. While the level of accomplishment is clear for the NNSTOY teachers in our sample, it seems possible that the colleagues and associates of these teachers are also stronger and/or more experienced teachers than typical US teachers. As such, many teachers in our sample may have received more positive reinforcement about their teaching over the years than typical teachers. As a result, teachers in our sample might be more open to student-perception surveys as a component of how they are evaluated. So one potential threat to external validity is that a more typical population of teachers would be so averse to the use of student-perception surveys that a modest intervention such as this one could not possibly work.
On the other hand, an equally compelling story might be told that NNSTOY teachers (and their colleagues) are sufficiently confident in their teaching capacities that they are relatively unafraid of student-perception surveys as an evaluation component. Consequently, the effects of the intervention may have been muted on this relatively elite sample of teachers. In this case, the threat to validity would be that the effects of our intervention would be stronger on a more typical population of teachers than the effects found in this study. The range of scores for each item and the overall Support for Student-Perception Surveys composite all extended from 1 to 5. This suggests that we did obtain a diverse sample of teachers with respect to their views on student-perception surveys. Presumably some of them are relatively representative of a more typical sample of US teachers. However, like almost all studies, the real test for the external validity of this study lies in replication attempts with varied samples.
With these limitations in mind, we want to be appropriately cautious about the potential implications of this study. However, assuming that the intervention could be replicated on future populations of teachers, we think these findings raise two especially intriguing possibilities. The first is a practical policy consideration. If school administrators wish to nudge their teachers to be more open regarding student-perception surveys, they may want to consider whether teachers should have opportunities to evaluate administrators. If future research suggests that the intervention worked, in part, because of a norm of even-handedness (Dillman et al., 2014) or reciprocity (Cialdini, 2009), expanding the scope of these types of evaluations seems reasonable to entertain. A number of businesses have employed ‘360-degree evaluations’ – a system in which any given individual receives feedback from subordinates, peers, and managers – as part of a cultural norm in their organisations (Peiperl, 2001). Perhaps schools might benefit from a similar approach.
Second, we think our findings signal some promise for the use of surveys as interventions. While typically thought of as data collection tools, surveys can be used to shift respondents’ attitudes and beliefs. At times, surveys-as-interventions have been used with nefarious intentions, particularly in politics. The practice of push-polling consists of setting up a fraudulent poll in which a large number of respondents are typically asked a relatively small number of questions about a single candidate or issue where the questions are uniformly negative (AAPOR, 2007). The intent of these ‘polls’ is not to collect data but rather to push the opinions of voters by sowing seeds of doubt about particular candidates or issues. Other instances of surveys-as-interventions have been for more neutral purposes – e.g. to illustrate response order effects in survey design as described in the introduction. However, we think that surveys as interventions might be used to positively impact educational outcomes. The present study serves as a proof of concept for one such instance – our intervention shows how support might be generated for particular school policies. Providing individuals with feedback from surveys offers a related type of intervention that also may yield positive benefits for educational settings (Gehlbach et al., 2016). Thus, there may be future possibilities for scholars to use surveys as interventions that might help facilitate desired educational outcomes.
Conclusion

To our knowledge, this study is the first of its kind to leverage a survey as an intervention to shift teachers’ beliefs – in this case, about the viability of using student-perception surveys as a component of their evaluation system. While much remains to be learned about the efficacy of this particular intervention – with respect to other populations of teachers and to the longevity of the effects – the basic approach offers some new ways to think about constructing interventions in education.
We expect that some school leaders might perceive a technique such as this to be too ‘manipulative’ for their tastes. Though they may be reluctant to use this survey approach in their own schools, perhaps they may still perceive potential benefits from employing 360-degree evaluations. Other school leaders will likely view this survey as no more manipulative than the array of positive and negative reinforcers already used in schools (e.g. linking teachers’ pay with their students’ standardised test scores as a means to bolster teachers’ effort, or giving students extra recess for good behaviour). Perhaps school leaders might even use this intervention directly – for example, by having teachers complete a survey similar to the treatment group’s version prior to a faculty meeting where the school’s evaluation system is under discussion. They might use this approach to begin a conversation around the costs and benefits of implementing a more comprehensive evaluation system for all school personnel.
Thus, we presume that employing an intervention such as this one will be more appealing
to some school leaders than others. However, we also hope that this type of survey-as-
intervention approach sparks some creative new developments in how researchers think
about improving an array of educational outcomes.
Acknowledgements

The authors are grateful to Katherine Bassett, Bob Williams and their team at the National Network of State Teachers of the Year for their tremendous support in conducting this study and thoughtful comments on a draft of this manuscript.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding

This work was supported by the Gevirtz Graduate School of Education.
ORCID

Hunter Gehlbach
References

AAPOR. (2007). AAPOR statements on “push” polls. Retrieved from
Aronson, J., Fried, C. B., & Good, C. (2002). Reducing the effects of stereotype threat on African American college students by shaping theories of intelligence. Journal of Experimental Social Psychology, 38,
Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., … Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers (EPI Briefing Paper #278). Retrieved from
Berliner, D. C. (2013). Effects of inequality and poverty vs. teachers and schooling on America’s youth. Teachers College Record, 115(12), 1–26.
Brehm, J. W. (2007). A brief history of dissonance theory. Social and Personality Psychology Compass, 1, 381–391. doi:10.1111/j.1751-9004.2007.00035.x
Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what’s the mechanism? (Don’t expect an easy answer). Journal of Personality & Social Psychology, 98, 550–558.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2011). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood (17699). Washington, DC: National Bureau of Economic Research.
Cialdini, R. B. (2009). Influence: Science and practice (5th ed.). Boston, MA: Pearson.
The Colorado Education Initiative. (2015). Using student perception survey results in educator evaluations. Retrieved from
Cromidas, R. (2012). Survey of students about student surveys yields mixed opinions. Retrieved from
Cumming, G. (2014). The new statistics. Psychological Science, 25, 7–29. doi:10.1177/0956797613504966
Darling-Hammond, L. (2010). The flat world and education: How America’s commitment to equity will determine our future. New York, NY: Teachers College Press.
Decker, G. (2012). Student surveys seen as unlikely evaluations element, for now. Retrieved from https://
Dillman, D. A., Smyth, J. D., & Christian, L. M. (2014). Internet, phone, mail, and mixed-mode surveys: The tailored design method (4th ed.). Hoboken, NJ: Wiley.
Ferguson, R. F. (2012). Can student surveys measure teaching quality? Phi Delta Kappan, 94, 24–28.
Festinger, L. (1962). Cognitive dissonance. Scientific American, 207, 93–106.
Freedman, J. L., & Fraser, S. C. (1966). Compliance without pressure: The foot-in-the-door technique. Journal of Personality and Social Psychology, 4, 195–202. doi:10.1037/h0023552
Gawronski, B. (2012). Back to the future of dissonance theory: Cognitive consistency as a core motive. Social Cognition, 30, 652–668. doi:10.1521/soco.2012.30.6.652
Gehlbach, H. (2015). Seven survey sins. The Journal of Early Adolescence, 35, 883–897.
Gehlbach, H., & Barge, S. (2012). Anchoring and adjusting in questionnaire responses. Basic and Applied Social Psychology, 34, 417–433. doi:10.1080/01973533.2012.711691
Gehlbach, H., & Brinkworth, M. E. (2011). Measure twice, cut down error: A process for enhancing the validity of survey scales. Review of General Psychology, 15, 380–387. doi:10.1037/a0025704
Gehlbach, H., Brinkworth, M. E., King, A. M., Hsu, L. M., McIntyre, J., & Rogers, T. (2016). Creating birds of similar feathers: Leveraging similarity to improve teacher–student relationships and academic achievement. Journal of Educational Psychology, 108, 342–352. doi:10.1037/edu0000042
Gehlbach, H., & Robinson, C. (manuscript under review). Mitigating illusory results through pre-registration in education.
Good, T. L. (2014). What do we know about how teachers influence student performance on standardized tests: And why do we know so little about other student outcomes? Teachers College Record, 116(1),
Harmon-Jones, E., Harmon-Jones, C., & Levy, N. (2015). An action-based model of cognitive-dissonance processes. Current Directions in Psychological Science, 24, 184–189. doi:10.1177/0963721414566449
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81,
Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment. Seattle, WA: Bill & Melinda Gates Foundation.
Koedinger, K. R., Booth, J. L., & Klahr, D. (2013). Instructional complexity and the science to constrain it. Science, 342, 935–937. doi:10.1126/science.1238056
Kraft, M. A., & Gilmour, A. F. (2016). Revisiting the widget effect: Teacher evaluation reforms and the distribution of teacher effectiveness (Working Paper). Brown University.
Martinie, M. A., Milland, L., & Olive, T. (2013). Some theoretical considerations on attitude, arousal and affect during cognitive dissonance. Social and Personality Psychology Compass, 7, 680–688.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
MET Project. (2012). Asking students about teaching. Retrieved from
National Center for Educational Statistics. (2013). Digest of education statistics. Retrieved from https://
Peiperl, M. A. (2001). Getting 360 degrees feedback right. Harvard Business Review, 79, 142–147, 177.
Rogers, T., & Frey, E. (2015). Changing behavior beyond the here and now. In The Wiley Blackwell handbook of judgment and decision making (pp. 723–748).
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology. Psychological Science, 22, 1359–1366. doi:10.1177/0956797611417632
Steinberg, M. P., & Donaldson, M. L. (2016). The new educational accountability: Understanding the landscape of teacher evaluation in the post-NCLB era. Education Finance and Policy, 11, 340–359.
Taylor, E. S., & Tyler, J. H. (2012). The effect of evaluation on teacher performance. American Economic Review, 102, 3628–3651. doi:10.1257/aer.102.7.3628
TEAMTN. (2015). Tennessee educator acceleration model: A Tennessee department of education website. Overview. Retrieved from
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25, 26–30.
Walton, G. M., & Cohen, G. L. (2011). A brief social-belonging intervention improves academic and health outcomes of minority students. Science, 331, 1447–1451. doi:10.1126/science.1198364
Wilkerson, D. J., Manatt, R. P., Rogers, M. A., & Maughan, R. (2000). Validation of student, principal and self-ratings in 360° Feedback® for teacher evaluation. Journal of Personnel Evaluation in Education, 14, 179–192. doi:10.1023/A:1008158904681
Appendix 1.

Descriptions of the measures used in this study:

The 5-item Support for Student-Perception Surveys scale:

1. How fair is it for student-perception surveys to be one of the sources of information in assessing your teaching? (Not fair at all / Mildly fair / Moderately fair / Quite fair / Extremely fair)
2. How useful is it for you to receive feedback on your teaching from your students? (Not at all useful / Mildly useful / Moderately useful / Quite useful / Extremely useful)
3. How objectively can your students assess your teaching performance? (Not at all / Mildly / Moderately / Quite / Extremely)
4. How supportive do you think other teachers are of using student-perception surveys to assess teaching performance? (Not at all / Mildly / Moderately / Quite / Extremely)
5. Overall, to what extent is it a good idea to have teachers’ performance reviews be partially based on student input? (Not a good idea at all / A mildly good idea / A moderately good idea / Quite a good idea / An extremely good idea)

The 5-item Support for Teacher-Perception Surveys scale:

1. How objectively can teachers evaluate their administrators? (Not at all / Mildly / Moderately / Quite / Extremely)
2. How fair is it for teacher-perception surveys to be one of the sources in assessing the performance of school administrators? (Not fair at all / Mildly fair / Moderately fair / Quite fair / Extremely fair)
3. How supportive do you think other teachers are of using teacher-perception surveys to assess administrators’ performance? (Not at all / Mildly / Moderately / Quite / Extremely)
4. How useful is it for administrators to receive feedback on their job performance from their faculty? (Not at all useful / Mildly useful / Moderately useful / Quite useful / Extremely useful)
5. Overall, to what extent is it a good idea for administrators’ evaluations to be based partially on teacher input? (Not a good idea at all / A mildly good idea / A moderately good idea / Quite a good idea / An extremely good idea)

Note: For each item, the response options were scored on a 1-through-5 system where 1 = ‘Not at all’ and 5 = ‘Extremely’.
... Scholars lamented that students neither have the knowledge nor the skills to assess all facets of teaching, and the surveys' measures in themselves may be incapable of effectively capturing these facets (e.g., Goe et al., 2008). Additionally, it can be challenging to convince teachers to use SPS (Gehlbach et al., 2018), particularly when such surveys are part of a summative evaluation process that involves teacher fear of biased or lightly taken feedback. Being a relatively isolated form of assessment (i.e., disconnected from teacher surveys to principals), SPS can also fail to capture the 'whole story' and thus have representation issues (Finefter-Rosenbluh, 2020a;Schulz et al., 2014). ...
... This research illustrates how SPS are interpreted and 'translated' by the teachers (policy actors), rather than simply implemented as 'practice changers', which is the intention of student voice-based assessment policies. It illuminates the consistency of teacher resistance to SPS; complementing studies noting the importance of identifying teachers' particular needs and enhancing their trust in SPS prior to implementing such surveys (Gehlbach et al., 2018)dcreating an inclusive survey implementation plan that acknowledges teachers' diverse needs (Finefter-Rosenbluh, 2020a). ...
Student perception surveys (SPS) play an increasing assessment role envisioned to improve teaching. Yet, it is unclear what their impact is on teachers' practice. Data collected from Australian teacher interviews, student focus groups and SPS based on validated frameworks of effective teaching revealed no significant change in teachers' practice; illustrating student skepticism in the power of their voice, and teacher resistance and struggle to act upon SPS. Contributing to the literature on effective teaching, teacher resistance and student voice, the study's implications call for teacher education that treats SPS as diagnostic tools that empower teachers in their search for autonomy.
... This qualitative descriptive/interpretive study (Elliott & Timulak, 2005) -part of a larger project that aimed to examine the extent of teachers' support for SPS (Gehlbach, Robinson, Finefter-Rosenbluh, Benshoof, & Schneider, 2018) -sought to understand the multidimensional understanding of SPS among US public school teachers, and, more particularly, identify what the experience with such surveys mean to teachers. Focusing on teachers who had experienced some kind of SPS (see Teacher classification, Table 1), the study interrogated the teachers' understanding of such surveys in the hope of providing a thick description of the complex formation of SPS in schools. ...
... This study's findings suggest that contemplating the use of SPS from a principal's perspective -that is, stepping outside the constraints of their own immediate biased frames of reference (Gehlbach, Brinkworth, & Harris, 2012) -may deepen teachers' understanding and extend their perceptions of SPS. The findings also complement a previous study that showed how teachers' perceptions of student surveys could be bolstered when reflecting on the use of teacher surveys to principals (Gehlbach et al., 2018). ...
Full-text available
For decades, policymakers deliberated whether student perception surveys (SPS) should include a component of teacher evaluation programmes in schools. However, much research has focused on SPS’ reliability and validity, and little is known about teachers’ interpretation of SPS or what preparation should be instituted before administering such surveys. Guided by a qualitative descriptive/interpretive approach, this paper draws upon 20 teacher interviews from different public schools in 14 US states. Teachers’ understanding of SPS appeared to provide insight into their self-efficacy beliefs in accountability-driven systems. Taking the perspective of principals illustrated teachers’ valuing SPS mostly as formative assessment. SPS also stood as isolated voice-based forms of evaluation, offering limited understanding of the educational processes when disconnected from an inclusive 360-degree feedback culture, grounded in principles of reciprocity and even-handedness. The paper holds promise for policymakers, implementers and educators seeking to buttress support for the use of voice initiatives in schools.
... Specifically, educators will have to determine how the evidence provided by students could be assessed to estimate whether instructional designs meet their needs in a reliable and valid manner. If teachers feel doubtful about the validity of student feedback, they are less likely to use the feedback for improvement (Gehlbach, Robinson, Finefter-Rosenbluh, Benshoof, & Schneider, 2017). As such, establishing a culture of trust or a mutual feedback system will potentially increase educators' willingness and openness to utilizing PLSI to support the design of a learning environment that aligns with UDL to support learner variability and students' PL experiences. ...
The purpose of this study was to develop and content validate a student self-report instrument that could be used to measure whether a learning environment supports personalized learning (PL) experiences of students in middle and high schools (i.e., Grades 6 through 12). This instrument, the Personalized Learning Supporting Instrument (PLSI), was developed according to an inclusive instructional design framework called Universal Design for Learning (UDL). Seven experts in UDL were recruited to evaluate whether PLSI consists of appropriate items for the constructs developed to measure student perceptions of three core UDL instructional design elements and two desired outcomes of UDL implementation. Preliminary evidence from this study showed that PLSI yielded an excellent level of item-level content validity index (I-CVI) for relevance across all items. Additionally, PLSI yielded an average scale-level content validity index (S-CVI) of 0.97 for relevance and an average S-CVI of 0.99 for clarity across all constructs.
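The item- and scale-level indices reported in the abstract above follow the standard content-validity formulas (I-CVI: the proportion of experts rating an item 3 or 4 on a 4-point relevance scale; S-CVI/Ave: the mean of the I-CVIs across items). A minimal sketch of those computations, using hypothetical expert ratings rather than the PLSI data:

```python
def i_cvi(ratings, relevant_threshold=3):
    """Item-level CVI: share of experts rating the item 3 or 4 on a 4-point scale."""
    return sum(1 for r in ratings if r >= relevant_threshold) / len(ratings)

def s_cvi_ave(items):
    """Scale-level CVI (averaging method): mean of the item-level CVIs."""
    cvis = [i_cvi(r) for r in items]
    return sum(cvis) / len(cvis)

# Hypothetical ratings from 7 experts for 3 items (4-point relevance scale)
items = [
    [4, 4, 3, 4, 4, 3, 4],  # all 7 experts rate the item relevant
    [4, 3, 4, 4, 2, 4, 3],  # 6 of 7 experts rate the item relevant
    [3, 4, 4, 4, 4, 4, 4],  # all 7 experts rate the item relevant
]

print([round(i_cvi(r), 2) for r in items])  # -> [1.0, 0.86, 1.0]
print(round(s_cvi_ave(items), 2))           # -> 0.95
```

With indices this close to 1.0, as in the study's reported S-CVI of 0.97 for relevance, nearly every expert judged nearly every item relevant.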
... When feedback is solicited through the medium of surveying and transferred back to relevant stakeholders for the purpose of diagnosis and intervention, it is called survey feedback (intervention) (Nadler, 1976). Throughout the industrial and organizational (IO) psychology literature, this is generally referred to as "survey feedback," though such interventions can also be applied in other contexts, such as education or research (e.g., Gehlbach et al., 2018). In the work context, survey feedback interventions entail systematic data collection and feeding the results back to organizational members (Nadler, 1976). ...
Employee surveys are often used to support organizational development (OD), and particularly the follow-up process after surveys, including action planning, is important. Nevertheless, this process is oftentimes neglected in practice, and research on it is limited as well. In this article, we first define the employee survey follow-up process and differentiate it from other common feedback practices. Second, we develop a comprehensive conceptual framework that integrates the relevant variables of this process. Third, we describe the methods and results of a systematic review that synthesizes the literature on the follow-up process based on the conceptual framework with the purpose of discussing remaining research gaps. Overall, this paper contributes to a better understanding of the organizational and human factors that affect this process. This is useful for practitioners, as it provides guidance for the successful implementation of this human resource practice. For example, research suggests that it is important to enable managers as change agents and to provide them with sufficient resources.
... These deals are only a drop in the ocean of state and edu-business collaborations (see Chowdhury & Ha, 2014; Sidhu, 2005). Other "inspection initiatives" include the use of student feedback surveys to evaluate teachers (see Hanover Research, 2013; The Colorado Education Initiative, 2019), with 17% of the largest American districts administering some form of such surveys (Steinberg & Donaldson, 2016), drawing mixed responses from teachers (Finefter-Rosenbluh, 2020a; Gehlbach, Robinson, Finefter-Rosenbluh, Benshoof, & Schneider, 2018). Survey companies like Tripod Education Partners (2019) help reformers assess and evaluate school quality, assisting them in governing from afar and presenting their "normalized" calculation of effective education through the lens of consumerism and commodification (Ball, 2004). ...
Critically considering the history of educational assessment, this analysis problematizes the way in which certain constructions of assessment have achieved privileged status over others in the past two centuries in Western discourses, particularly in the US educational landscape. The analysis adopts the position that a centralized, authoritarian control through various government mechanisms has resulted in the gradually diminishing power of school leaders and teachers, who once had the responsibility of not only designing learners’ assessment tasks, but of presenting or “exhibiting” their outcomes to the public. It traces the épistémès of both thought and practice and the way in which standardized testing has become an end in itself rather than a means of assessment and improvement; an “unquestioned” social-educational norm, capturing state agents’ push for quantitative measures that do not fully appreciate the complexity of teaching and learning. In the name of neoliberal agendas and economics, these calculative power discourses have shaped public understandings of educational “performance,” identifying standardized tests as the key tool to control funding entitlement and other incentives. Using the Foucauldian framework of archaeology, the analysis portrays the ideas, assumptions, beliefs, ideologies, and theories which have formed, evolved, and ultimately normalized the ruptures and discontinuities resulting in reductive standardized assessment. In developing an intellectual archaeology of evaluation, the analysis offers the concept of reproductive power as a way to capture the circularity of mechanisms intended to centralize the power for decision-making and administration in the hands of state policy actors. It concludes with commentary on future trends in evaluation and assessment.
... Over the years, other examples of teacher monitoring or teacher surveillance have been reported by studies highlighting the way educators are scrutinised in schools, including by harsh accountability reforms (Koretz 2017); frequent class observation by peers and administrators (Finefter-Rosenbluh 2016; Page 2015); school inspections (Choi 2017; Perryman 2006; Perryman et al. 2018); CCTV cameras (Perry-Hazan and Birnhack 2019); technologies that enable data collection (Hope 2015; Selwyn 2011); and student voice activities (e.g., Gehlbach, Robinson, Finefter-Rosenbluh, Benshoof and Schneider 2018) under the guise of school improvement initiatives, when in fact they encapsulate a rationality of governmentality (Finefter-Rosenbluh 2020; Page 2017, 2018). ...
Widespread neoliberal approaches to education consider schools increasingly accountable for self-management and 'client' recruitment, encapsulating economic ideologies that assume privatisation is essential for social progress. With an ever-shifting landscape of market-driven policies and the increasing growth of private education settings, more research is needed to cast light on emerging or under-researched aspects of autonomous schools. Located within a U.S. state that has strict constraints on tax subsidies for religious K-12 education, this paper investigates how an extensive form of decentralisation corresponds with schools' discipline and ethical environment. It analyses teacher interviews and web documents from faith-based, autonomous schools in a state that has devolved power and authority for decision-making to parents and other independent 'agents', having a distanced relationship with its non-state 'actors'. The paper follows Foucault's use of the metaphor of the panopticon and adopts his power analysis to examine the nature of parental control and its influence on disciplinary and ethical practices. Evidence suggests that these autonomous schools are driven by a 'mini-public' ideology that constrains educators' autonomy and generates particular disciplinary norms; entangling ethical, educational, and social ramifications, including teacher resistance and teacher demoralisation. Implications for policy are discussed in this context of control.
Ensuring teacher effectiveness within higher education contexts continues to shape institutional policy discourse and practice. However, there is limited research exploring how assessment practices correspond with teachers’ and students’ perceptions of teacher effectiveness, particularly in vocational higher education settings. Acknowledging the complexities of assessing vocational competence and the dearth of literature exploring teacher and student views on vocational assessment practices, this study interrogates the perceptions of 13 teachers and 15 students from five Indonesian vocational higher education institutions to better understand what constitutes fair assessment as an aspect of effective teaching. This study examines how teachers and students are positioned within the assessment process and the existing opportunities to make decisions regarding assessment design and processes. It shows (i) discrepancies in teachers’ and students’ perceptions of student positionality within assessment processes; (ii) blurry lines between ideas of ‘assessment contract’ and ‘assessment practice’; and (iii) contested views on student learning processes and attitudes as attributes of vocational competence. The study illuminates the tensions between students’ and teachers’ positionality within assessment practices, highlighting the need for increased student participation while maintaining teachers’ desire to make assessment decisions.
This paper draws upon Foucault's problematisation of governmentality analysis to explore teacher interviews from Australian secondary schools, where student voice was 'enacted' within a teacher assessment reform strategy. By bringing teacher voices into relation with theory, it illustrates how the current 'sociality of performativity' is situating student voice-based assessment initiatives as power apparatuses of teacher surveillance that shape teacher-student relationships. The analysis portrays teachers' responses to such 'techniques of power', employing forms of auditable commodification, physical proximity, and reflective practice as a means of managing student voice 'risk'. In so doing, the teachers relegated teacher-student relationships to the margins, struggling to profess an ethic of care; paradoxically disadvantaging students through voice initiatives intended to advance them. Demonstrating how affective fundamentals are eclipsed by performative-invested practices, the analysis highlights the discursive policy contestations of rapport and performance that should be taken into consideration in future implementations of student voice-based assessment initiatives.
Psychological intervention targeting distress is now considered an integral component of inflammatory bowel disease (IBD) management. However, significant barriers to access exist which necessitate the development of effective, economic, and accessible brief and remote interventions. Acceptance and commitment therapy (ACT) is a therapy with demonstrated acceptability and a growing evidence base for the treatment of distress in IBD populations. The present paper trialled two brief ACT interventions via randomized multiple baseline designs. Study 1 trialled a single-session ACT intervention (delivered face-to-face and lasting approximately two hours) targeting stress and experiential avoidance, respectively. Participants were seven people with an IBD diagnosis who presented with moderate to extremely severe stress (five females, two males; M age = 39.57, SD = 5.74). The findings of study 1 indicate that a single-session ACT intervention represented an insufficient dosage to reduce stress and experiential avoidance. Study 2 investigated a brief telehealth ACT intervention (delivered via a video conferencing platform and lasting approximately four hours) targeting stress and increased psychological flexibility. Participants (N = 12 people with an IBD diagnosis and mild to extremely severe stress) completed baselines lasting from 21 to 66 days before receiving a two-session ACT telehealth intervention supplemented by a workbook and phone consultation. Approximately half of participants experienced reduced stress, increased engagement in valued action, and increased functioning. Despite shortcomings such as missing data and the context of COVID-19, the present findings suggest that brief ACT interventions in this population may be effective and economic, though further research and replications are necessary.
Introduction. In the educational institutions of law enforcement agencies, feedback from cadets and an understanding of their attitude to the educational process are essential, given the range of educational and service-professional activities carried out continuously: educational, patriotic, research, and service-professional work. The purpose of this article is to study the attitudes of cadets and teachers towards the educational process at the University of the Ministry of Emergency Situations and, taking the information received into account, to identify priority areas for improving the quality of the educational process and graduates' readiness for professional activity. Of particular interest is how adequately the teaching staff of this departmental university assesses cadets' attitudes to various types of training sessions and other educational and service-professional activities. Materials and Methods. To achieve the aims of the research, interviewing, questionnaire surveys, and conversations were used, which allow objective, multiple-choice responses to be obtained from respondents on the issues under study. The methodology was implemented as a questionnaire survey in the digital Fire Test program, which supports automated surveys of registered persons. Using statistical processing tools, the information obtained was systematised, analysed, and generalised, on which basis reliable results are presented and conclusions drawn. Results. The questionnaire survey revealed that the teaching staff's view of the educational process and cadets' attitudes to various types of training sessions and other educational and service-professional activities do not always correspond to each other.
Teachers' attitudes to the various types of training match cadets' attitudes towards practical classes but differ with respect to lectures, seminars, and independent work. The comparative results indicate a generally positive assessment of practical classes by both trainees and teaching staff. At the same time, teachers underestimate the importance of lectures and independent work, which cadets rate highly, and overestimate seminars and extracurricular activities, which cadets perceive less positively. Implementing the measures that teachers consider a priority makes it possible to increase the effectiveness of training sessions and to form a positive attitude among cadets towards them. Discussion and Conclusions. The results of the study at the University of the Ministry of Emergency Situations made it possible to assess the degree of mutual understanding between the subjects of the educational process and to identify priority areas for improving the quality of training sessions and forming cadets' positive attitudes towards them.
When people perceive themselves as similar to others, greater liking and closer relationships typically result. In the first randomized field experiment that leverages actual similarities to improve real-world relationships, we examined the affiliations between 315 9th grade students and their 25 teachers. Students in the treatment condition received feedback on 5 similarities that they shared with their teachers; each teacher received parallel feedback regarding about half of his or her 9th grade students. Five weeks after our intervention, those in the treatment conditions perceived greater similarity with their counterparts. Furthermore, when teachers received feedback about their similarities with specific students, they perceived better relationships with those students, and those students earned higher course grades. Exploratory analyses suggest that these effects are concentrated within relationships between teachers and their “underserved” students. This brief intervention appears to close the achievement gap at this school by over 60%.
Background/Context: This paper arises out of frustration with the results of school reforms carried out over the past few decades. These efforts have failed. They need to be abandoned. In their place must come recognition that income inequality causes many social problems, including problems associated with education. Sadly, compared to all other wealthy nations, the USA has the largest income gap between its wealthy and its poor citizens. Correlates associated with the size of the income gap in various nations are well described in Wilkinson & Pickett (2010), whose work is cited throughout this article. They make it clear that the bigger the income gap in a nation or a state, the greater the social problems a nation or a state will encounter. Thus it is argued that the design of better economic and social policies can do more to improve our schools than continued work on educational policy independent of such concerns. Purpose/Objective/Research Question: The research question asked is why so many school reform efforts have produced so little improvement in American schools. The answer offered is that the sources of school failure have been thought to reside inside the schools, resulting in attempts to improve America's teachers, curriculum, testing programs and administration. It is argued in this paper, however, that the sources of America's educational problems are outside school, primarily a result of income inequality. Thus it is suggested that targeted economic and social policies have more potential to improve the nation's schools than almost anything currently being proposed by either political party at federal, state or local levels. Research Design: This is an analytic essay on the reasons for the failure of almost all contemporary school reform efforts. It is primarily a report about how inequality affects all of our society, and a review of some research and social policies that might improve our nation's schools.
Conclusions/Recommendations: It is concluded that the best way to improve America's schools is through jobs that provide families living wages. Other programs are noted that offer some help for students from poor families. But in the end, it is inequality in income, and the poverty that accompanies such inequality, that matters most for education.
Background/Context Since the 1970s, researchers have attempted to link observational measures of instructional process to student achievement (and occasionally to other outcomes of schooling). This paper reviews extensively both historical and contemporary research to identify what is known about effective teaching. Purpose/Objective Good, after reviewing what is known about effective teaching, attempts to apply this to current descriptions of effective teaching and its application value for practice. Good notes that much of the "new" research on effective teaching has simply replicated what has been known since the 1980s. Although this is not unimportant (since it shows that older findings still pertain to contemporary classrooms), it is unfortunate that research has not moved beyond the relationship between general teacher behavior (behaviors that cut across subject areas) and student achievement (as measured by standardized tests). How this information can be applied, and the difficulty of using it, is examined in the paper. Research Design The paper is a historical analysis and reviews research on teaching from the 1960s to today. Conclusion This paper has stressed that our database on effective teaching is limited; still, it has some implications for practice. Even though the knowledge base is limited, there is no clear knowledge that teachers-in-training learn and have the opportunity to practice and use. It would seem that teacher education programs would want to assure that their graduates, in addition to possessing appropriate knowledge, would also have clear conceptual understanding and skills related to active teaching, proactive management, communication of appropriate expectations for learning, and the ability to plan and enact instruction that balances procedural and conceptual knowledge. Future research on the use of this knowledge base and its effects in teacher education programs would be informative.
If done correctly, research on teaching can improve instruction. However, the research must be applied carefully if it is to have useful effects. And, as noted often in this paper, research must consider outcomes of schooling other than achievement such as creativity, adaptability, and problem finding.
In 2009, the New Teacher Project’s The Widget Effect documented the failure of U.S. public school districts to recognize and act on differences in teacher effectiveness. We revisit these findings by compiling teacher performance ratings across 24 states that adopted major reforms to their teacher evaluation systems. In the vast majority of these states, the percentage of teachers rated unsatisfactory remains less than 1%. However, the full distributions of ratings vary widely across states, with 0.7% to 28.7% rated below proficient and 6% to 62% rated above proficient. We present original survey data from an urban district illustrating that evaluators perceive more than 3 times as many teachers in their schools to be below proficient than they rate as such. Interviews with principals reveal several potential explanations for these patterns.
Behavioral decision research has provided insights that naturally suggest innovations in how interventions are structured. There has been an explosion in the use of randomized field experiments across the social sciences. This chapter uses the insights and perspective of behavioral science to analyze how field interventions to improve societal well-being work over time. It discusses features of interventions that affect their success at bridging this intervention-behavior lag. The chapter explores why interventions might or might not continue to have an impact with each successive round of treatment. It unveils several pathways that might lead to the persistence of treatment effects after an intervention has been discontinued. Interventions targeting a range of societal challenges are also elucidated. The chapter also discusses two ways that thoughts can be momentarily changed so as to immediately induce target behaviors: the framing of risky choices and changing the accessibility of goals relevant to the target behavior.
In the past five years, teacher evaluation has become a preferred policy lever at the federal, state, and local levels. Revisions to teacher evaluation systems have made teachers individually accountable for student achievement to a greater extent than ever before. We describe and analyze the components, processes, and consequences embedded in new teacher evaluation policies in all fifty states, the twenty-five largest school districts, and Washington, DC. We contextualize these policies by basing our analysis in prior research on teacher evaluation and by examining key comparisons between state and district policies, including their treatment of teachers in tested versus untested subjects and of beginning versus career teachers. We find notable differences in how states and the largest districts have structured evaluation policies for all teachers and, in particular, for early career teachers compared with their more veteran counterparts, and for teachers in nontested grades and subjects compared with those in tested grades and subjects.
Problems involving causal inference have dogged at the heels of Statistics since its earliest days. Correlation does not imply causation and yet causal conclusions drawn from a carefully designed experiment are often valid. What can a statistical model say about causation? This question is addressed by using a particular model for causal inference (Rubin, 1974; Holland and Rubin, 1983) to critique the discussions of other writers on causation and causal inference. These include selected philosophers, medical researchers, statisticians, econometricians, and proponents of causal modelling.
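The "particular model for causal inference" the abstract refers to, commonly called the Rubin potential-outcomes model, can be stated in a few lines. This sketch uses standard notation rather than the paper's own:

```latex
% Each unit $u$ has two potential outcomes: $Y_t(u)$ under treatment $t$
% and $Y_c(u)$ under control $c$. The unit-level causal effect is
\[
\tau(u) = Y_t(u) - Y_c(u),
\]
% which is unobservable because only one of $Y_t(u)$, $Y_c(u)$ can ever
% be measured for a given unit (the Fundamental Problem of Causal
% Inference). Under random assignment, however, the average causal
% effect is identifiable from observed group means:
\[
T = E[Y_t] - E[Y_c].
\]
```

This is why correlation alone does not imply causation, yet a carefully randomized experiment supports valid causal conclusions: randomization makes the observed group means unbiased estimates of the unobservable counterfactual quantities.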
The action-based model extends the original theory of cognitive dissonance by proposing why cognitive inconsistency causes both dissonance and dissonance reduction. The model begins by assuming that many perceptions and cognitions automatically impel us to act in specific ways. It then posits that the negative affective state of dissonance is aroused not by all cognitive conflict but, specifically, when cognitions with action implications are in conflict with each other, making it difficult to act. The dissonance signals to the organism that there is a problem and that the cognitive inconsistency needs to be resolved so that behavior can occur. After presenting the action-based model, we review results from behavioral and neuroscience experiments that have tested predictions derived from it.
observable teacher characteristics like graduate education and experience (beyond the first few years) are not typically correlated with increased productivity. Many researchers and policymakers have suggested that, under these conditions, the only way to adjust the teacher distribution for the better is to gather information on individual productivity through evaluation and then dismiss low performers. This paper offers evidence that evaluation can shift the teacher effectiveness distribution through a different mechanism: by improving teacher skill, effort, or both in ways that persist long-run. We study a sample of mid-career math teachers in the Cincinnati Public Schools (CPS) who were assigned to evaluation in a manner that permits a quasi-experimental analysis. All teachers in our sample were evaluated by a year-long classroom observation–based program, the treatment, between 2003–2004 and 2009–2010; the timing of each teacher’s specific evaluation year was determined years earlier by a district planning process. To this setting we add measures of student achievement, which were not part of the evaluation, and use the within-teacher over-time variation to compare teacher performance before, during, and after their evaluation year. We find that teachers are more productive during the school year when they are being evaluated, but even more productive in the years after evaluation. A student taught by a teacher after that teacher has been through the Cincinnati evaluation will score about 10 percent of a standard deviation higher in math than a similar student taught by the same teacher before the teacher was evaluated. Under our identification strategy, these estimates may be biased by patterns of student assignment that favor previously evaluated teachers, or by preexisting positive