Educational Assessment Evaluation and Accountability

Published by Springer Nature
Online ISSN: 1874-8600
Print ISSN: 1874-8597
Recent publications
Three main themes and codes consisting of each theme
Sociodemographic background of the study participants
Although there has been intense criticism of NAPLAN in educational policy debates in Australia, little scholarly effort has been made to understand the underlying cognitive mechanisms that contribute to the public narrative about the national testing program. We aim to provide tentative evidence about the way public perceptions of NAPLAN may be formed. Our results show empirical support for the incentive, interpretative, and institutional effects, which suggest ways that the national testing program can be improved. That is, it needs to (a) provide a diverse range of incentives that appeal to people’s self-interest (incentive effect); (b) demonstrate good alignment with the core values, social norms, and attitudes of the given society (interpretative effect); and (c) build a consensus about the institutional use of the test results (institutional effect). We conclude with practical implications and recommendations for seeking public support for this seemingly unpopular national educational policy.
 
Model for lower secondary teachers’ beliefs about changes in instruction and perceived pressure from various stakeholders
Model for upper secondary teachers’ beliefs about changes in instruction and perceived pressure from various stakeholders
Goodness-of-fit indices of measurement models
Means, standard deviations and group differences for sources of pressure variables
One of the main aims of national assessment programmes is to improve the efficacy of education systems; realizing this aim often takes the form of implementing a variety of accountability measures. Using assessment results for accountability purposes is highly controversial, and one of its undesirable impacts is that it generates negative attitudes towards educational assessments among teachers. The aim of this study is to examine lower and upper secondary teachers’ (N = 1552) opinions and beliefs about testing and, more specifically, about the national assessment programme in Hungary. A questionnaire was used to explore teachers’ beliefs about the effects of the assessment system on how they teach, perceived pressure from stakeholders, teachers’ acceptance of assessment programmes and the relationships between these beliefs. Results show that assessment programmes compel teachers to revise their teaching practices — some change to make meaningful gains in student learning, while others turn to practices that are not conducive to a genuine improvement in students’ knowledge, focussing instead on assessment scores. Pressure from inside the school (colleagues and school leaders) and teachers’ attitudes towards assessments bring about changes in instruction, such as the reallocation of coaching and improvement in teaching. Sources of pressure outside school (local government and the media) have an indirect effect on changes in teaching because their pressure influences in-school motivators. Pressure from parents and students is felt directly by teachers, but only in limited areas. The results demonstrate that a national assessment programme has a more significant impact on teaching in lower than in upper secondary schools.
 
Citation map for assessment competence including assessment competence (black hourglass), assessment literacy (orange triangle), and publications not included in this review (blue box). 1 = Standards (AFT, NCME, & NEA, 1990); 2 = Stiggins (1991a); 3 = Herppich et al. (2018). The larger blue box indicates the large number of studies that Herppich et al. (2018) included in their model of assessment competence that, if shown here, would negatively impact the interpretability of the citation map. Only publications cited by Herppich et al. (2018) and already included in this review are shown here
Citation map for assessment literacy including assessment literacy (orange triangles), assessment competence (black hourglass), assessment capability (pink circle), assessment identity (yellow diamonds), and publications not included in this review (blue box). 1 = Standards (AFT, NCME, & NEA, 1990); 2 = Stiggins (1991b); 3 = Popham (2009a); 4 = Popham (2011); 5 = Willis et al. (2013); 6 = Xu and Brown (2016); 7 = Inbar-Lourie (2008); 8 = Fulcher (2012); 9 = Hay and Penney (2009). Similar to Fig. 2, large blue boxes represent the large number of studies included by publications that developed models of assessment literacy (i.e. Pastore & Andrade, 2019; Xu & Brown, 2016). Only publications cited by these models and already included in this review are shown here
Citation map for assessment capability including assessment capability (pink circle) and publications not included in this review (blue box). 1 = Absolum et al. (2009)
Citation map for assessment identity including assessment identity (yellow diamond), assessment literacy (orange triangle) and publications not included in this review (blue box). Similar to Figs. 1 and 2, the large blue box represents the large number of studies included by a publication that developed models of assessment identity (i.e. Looney et al., 2018). Only publications cited by this model and already included in this review are shown here. 1 = Wyatt-Smith et al. (2010); 2 = Looney et al. (2018)
Over the past three decades, policy and professional standards have repeatedly called on teachers to integrate assessment continuously across their practice in various ways to identify, monitor, support, evaluate, and report on student learning. Educational researchers have conceptualized and operationalized multiple constructs to understand teachers’ classroom assessment practice, including ‘assessment competency,’ ‘assessment literacy,’ and later, ‘assessment capability’ and ‘assessment identity.’ The result of these multiple constructs is a constellation of assessment discourses, which have influenced contemporary educational policies and professional development practices across systems, shaping understandings of teachers’ assessment work. Yet of concern is the confusion that ensues when multiple discourses related to the same professional responsibility proliferate in a short timeframe, arising from dissimilar historic foundations, and each replete with epistemological assumptions and unique connotations for practice. As such, our aim in this paper is to critically map the constellation of assessment capacity discourses through a scoping review methodology to examine how these related discourses have been conceptualized for pre-service or in-service teachers. Driving this analytic mapping was the following research question: How are assessment competence, assessment literacy, assessment capability, and assessment identity conceptualized in peer-reviewed research? Specifically, we were interested in analyzing the evolution of each construct over time (i.e. since the introduction of the construct into peer-reviewed literature) and space (i.e. geography), and in considering how the constructs contribute toward a current view of teachers’ assessment work. To this end, our review provides the basis for theorizing new directions and possibilities for supporting teachers in their assessment roles and responsibilities.
 
School self-evaluation process-procedure
Research design model
School self-evaluation implementation plan
Model of the qualitative findings regarding the education process—nutrition and health—relationships and communication and participation in management at the school
School self-evaluation based on multiple data sources has been used to evaluate the quality of education at schools. This study aims to reveal the state of the education process, health, safety, relationships and communication, and participation in management at school. The study was conducted with a case study design in a public secondary school. The researchers developed a School Self-Evaluation Model Supporting School Development and applied it to this study. Two study groups were formed through purposeful sampling. Data were obtained from students, parents, teachers, and school administrators during the quantitative phase of the study, and these stakeholders were interviewed during the qualitative phase. In addition, while directing the implementation of their planned approach at the school, the authors conducted observations. The materials used to support the quantitative and qualitative data were also evaluated, and possible indicators were identified. According to the research findings, the stakeholders had good perceptions of the education processes in the school. Furthermore, the views of the stakeholders on health, safety, relationships and communication, and participation in school management are also generally positive. However, some problems were identified in student nutrition services, student relations, and participation in school management. All told, the stakeholders were generally satisfied with the education process at the school. These findings suggest that school self-evaluation is well suited to the continuous improvement of schools based on multiple forms of evidence.
 
This paper presents new evidence that explores the strengths, weaknesses and overall quality of school inspection practice in China. In one city region of Shandong Province, the research examines stakeholder perceptions of inspection purposes, processes and outcomes, as well as the potential to improve inspection practice and compulsory education quality in China. A mixed-methods empirical design was employed to conduct the research involving ten purposively selected junior high schools. Data collection methods included a survey of 364 teachers and headteachers and 13 stakeholder interviews with headteachers, teachers, city and national inspectors, and an educational officer. The survey data were analysed through descriptive analysis and a repeated-measures one-way ANOVA, and the interview data were analysed thematically. The findings supply new empirical evidence regarding the context specificity of school inspection in China and identify pertinent issues regarding school inspection quality. This study overall argues that school inspection criteria and methods in Shandong Province, and more broadly in China, could be improved by taking better account of stakeholder views and school contexts and by putting more stress on providing school-based professional improvement guidance integrated within or alongside inspection processes, instead of just intense bureaucratic monitoring of inspection outcomes.
 
Binary logistic regression model (significant variables)
Although studies have been conducted on educators' perceptions of assessment practices, few studies have explored students' perceptions of the ethical issues in classroom assessment. A mixed design research method was used to examine factors associated with students' perceptions of the ethicality of classroom assessment practices. A sample of 1996 undergraduate students enrolled in 177 colleges and universities in China participated in the quantitative phase of the study, completing a survey measuring students' perceptions regarding the ethicality of classroom assessment. In the qualitative phase, 579 participants responded to open-ended questions concerning their justification of the ethicality of the individual assessment situations. Quantitative analyses indicated that students' gender, grade level, major, and program were associated with their perceptions of the ethicality of multiple assessment practices (i.e., multiple assessment, surprise items, considering effort and attendance in grading, and giving feedback). Qualitative analyses showed that conflicting needs of different stakeholders in classroom assessment (i.e., student needs vs. assessment needs, teacher needs vs. assessment needs, student needs vs. student needs) were associated with their perceptions. Findings of the current study offer insights for teachers regarding how to make classroom assessment practices ethical based on the diverse needs of stakeholders involved in assessment.
 
Distribution of births in the year by grade retention. Note: All OECD recommended practices (final student weights, BRR weights and five plausible values) have been employed. Source: Authors’ own calculations from PISA 2006, 2009, 2012 and 2015 Spanish students’ data
a Students’ standardised scores in reading, complete sample. b Students’ standardised scores in reading by grade retention. Note: All OECD recommended practices (final student weights, BRR weights and five plausible values) have been employed. Source: Authors’ own calculations from PISA 2006, 2009, 2012 and 2015 Spanish students’ data
a Students’ standardised scores in mathematics, complete sample. b Students’ standardised scores in mathematics by grade retention. Note: All OECD recommended practices (final student weights, BRR weights and five plausible values) have been employed. Source: Authors’ own calculations from PISA 2006, 2009, 2012 and 2015 Spanish students’ data
a Students’ standardised scores in science, complete sample. b Students’ standardised scores in science by grade retention. Note: All OECD recommended practices (final student weights, BRR weights and five plausible values) have been employed. Source: Authors’ own calculations from PISA 2006, 2009, 2012 and 2015 Spanish students’ data
Grade retention has been the focus of the education debate in Spain for decades. On average, more than 30% of students have repeated at least one grade before they finish (or drop out of) their compulsory studies. The present research provides new evidence on this issue by investigating the influence of Spain’s school entry age upon students’ grade retention. Using data from 15-year-old students who participated in the PISA 2006, 2009, 2012 and 2015 assessments, we implement a regression discontinuity analysis. Our key finding is that students who were born late in the year (younger students) are more likely to repeat a grade. Yet, once they reach secondary education, the disadvantage they suffer due to their younger school starting age seems to disappear. Hence, the key reason why younger students have lower PISA scores than older students in Spain is their increased likelihood of repeating a grade, rather than their relative age per se. To avoid these artificial disadvantages of younger students and unfair retention, we suggest that policymakers inform families about this school entry issue and also make the school entry law more flexible. This would allow parents of younger children to choose whether to delay their children’s school enrolment.
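The relative-age logic behind this design can be sketched with simulated data (a minimal sketch; the cutoff month, effect sizes, and variable names below are hypothetical and do not come from the study): students born late in the year are the youngest in their cohort, and the comparison of retention rates on either side of the birth-date threshold mimics the discontinuity being exploited.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: birth months 1-12 for 10,000 students.
n = 10_000
birth_month = rng.integers(1, 13, size=n)

# Assumed data-generating process: the probability of repeating a grade
# is 0.15 higher for students born after month 9 (relatively younger
# at school entry). Both numbers are illustrative.
p_retention = 0.20 + 0.15 * (birth_month > 9)
retained = rng.random(n) < p_retention

# Minimal discontinuity comparison: retention just before vs. just
# after the (hypothetical) relative-age threshold.
early = retained[birth_month <= 9].mean()
late = retained[birth_month > 9].mean()
gap = late - early
print(f"retention early-born: {early:.3f}, late-born: {late:.3f}, gap: {gap:.3f}")
```

A full regression discontinuity analysis would additionally model the running variable (birth date) on each side of the cutoff; the sketch only shows the group contrast the design is built on.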
 
Examples of the nature and diversity of tasks from a Finnish classroom
Examples of how teachers may sequence lessons. Case A—the teacher has a clearly sequenced lesson, where all instances of high-quality feedback occur in the last segment; case B—the teacher lets the students work independently or in pairs on the task at hand, walks around, and provides individual guidance. One instance of high-quality feedback occurs in each of the three segments
Capturing and measuring instructional patterns by using standardized observation manuals has become increasingly popular in classroom research. While researchers argue that a common vocabulary of teaching is necessary for the field of classroom research to move forward, instructional features vary across classrooms and contexts, which poses serious measuring challenges. In this article, we argue that potential biases embedded in observation systems have to be identified and addressed in order for interpretations of results across different classrooms and contexts to be valid and relevant. We identify three aspects of possible systematic biases (related to the grain size of conceptualization, operationalization, and sequencing of lessons) and how these may influence ratings of instructional quality when an established observation system (the Protocol for Language Arts Teaching Observations [PLATO]) is applied in the contexts of Nordic mathematics classrooms. We discuss implications of such possible biases and make suggestions for how they may be addressed.
 
One-dimensional (left side) and multidimensional (right side) modeling of pre-service teachers’ professional knowledge for teaching early literacy. Note: CK content knowledge, PCK pedagogical content knowledge, GPK general pedagogical knowledge
Item–person map of three-dimensional Rasch scaling
Means and 95% confidence interval of test scores by groups. CK content knowledge, PCK pedagogical content knowledge, GPK general pedagogical knowledge
This study suggests a comprehensive conceptualization of teacher knowledge for teaching early literacy in primary schools. Following the discourse on the professional knowledge of teachers, we argue that teachers’ knowledge relevant to supporting reading and writing at the beginning of primary school education is multidimensional by nature: Teachers need content knowledge (CK), pedagogical content knowledge (PCK), and general pedagogical knowledge (GPK). Although research on teacher knowledge has made remarkable progress over the last decade, particularly in domains such as mathematics, relevant empirical research using standardized assessments that would allow in-depth analyses of how pre-service teachers acquire such knowledge during teacher education and how teacher knowledge influences instructional quality and student learning in early literacy is very scarce. We focus on the following research questions: (1) Can teachers’ professional knowledge for teaching early literacy be conceptualized in terms of CK, PCK, and GPK in a way that allows empirical measurement? (2) How do teachers acquire such knowledge during initial teacher education? (3) Is teachers’ professional knowledge a premise for instructional quality in teaching early literacy to students? We present the conceptualization of teacher knowledge for teaching early literacy in primary schools in Germany, the country of our study, and specific measurement instruments recently developed by our research group. Assessment data from 386 pre-service teachers at different teacher education stages are used to analyze our research questions. Findings show (1) construct validity of the standardized tests related to the hypothesized structure, (2) curricular validity related to teacher education, and (3) predictive validity related to instructional quality. Implications for teacher education and the professional development of teachers are discussed.
 
The present study explores the antecedents of first- and second-generation (1G and 2G) immigrant students’ academic performance using PISA 2018 data. The study draws on an international sample of 11,582 students from 534 schools in 20 countries and focuses on PISA schools that catered to a mix of 1G and 2G students. The study explores the role that student attributes, student-perceived peer and parental support, school provisions, and school equity-oriented policies have on immigrant student academic achievement. The analysis involved specifying three separate stepwise multi-level regression models for mathematics, science, and reading achievement. Findings suggested that, at the within-school level, perceived parental support, teacher enthusiasm and the adaptation of instruction were associated with improved academic performance, while student experience of bullying was associated with more substantive negative academic outcomes. At the between-school level, the opportunity to participate in creative extracurricular activities was associated with improved academic performance. In contrast, a higher proportion of 1G students and the overall perceived level of bullying of immigrant students were associated with substantively negative academic outcomes between schools. Tests of moderation effects suggested that parental emotional support appeared to be of particular relevance to 1G students’ math and reading outcomes, while higher socioeconomic status appeared to be specifically relevant to improved science and reading outcomes for 1G students. Implications for policy and practice are discussed.
 
Diagram of the proposed teacher learning progression
This theoretical piece discusses the concept of a teacher learning progression in an attempt to integrate teacher learning and assessment. From the authors’ perspective, the main features of the teacher learning progression are the longitudinal understanding of teacher knowledge and practice, and the opportunity to align teacher evaluations’ formative and summative purposes. Criteria to assess existing teacher learning progressions are proposed and used to examine examples of teacher assessment systems implemented in different parts of the world. The concept of teacher learning progression has national and international implications for teacher training, for teaching assessment and for the design and implementation of educational policies.
 
Graphical representation of the six operationalisations used in the current study. * = not applied to PISA datasets. TIMSS 2019 grade 8 dataset used as an example. The first science achievement plausible value was used as the proxy for achievement
Academic resilience captures academic success despite adversity and thus is an important concept for promoting equity within education. However, our understanding of how and why rates of academic resilience differ between contexts is currently limited by variation in the ways that the construct has been operationalised in quantitative research. Similarly, comparing the strength of protective factors that promote academic resilience is hindered by differing approaches to the measurement of academic resilience. This methodological variation has complicated attempts to reconcile disparate findings about academic resilience. The current study applied six commonly used operationalisations of academic resilience that combined different thresholds of high risk and high achievement, to three international large-scale assessments, to explore how these different operationalisations impacted the findings produced. The context of Aotearoa New Zealand was chosen as a case study to further academic resilience research within this context and investigate how academic resilience manifests in an education system with relatively high levels of average achievement alongside low levels of educational equity. Within international large-scale assessment datasets, prevalence rates differed markedly across subject areas, grade levels, and collection cycles, as a function of the measure of academic resilience employed, while the strength of protective factors was more consistent. Thresholds that were norm-referenced produced more consistent findings across the different datasets compared to thresholds that were criterion-referenced. High levels of missing data prevented the analysis of some datasets, and differences in the way that key constructs were measured undermined the comparability of findings across international large-scale assessments. 
The findings emphasise the strengths and limitations of utilising international large-scale assessment data for the study of academic resilience, particularly within the Aotearoa New Zealand context. Furthermore, the study highlights that researchers' methodological decisions have important impacts on the conclusions drawn about academic resilience. Supplementary information: The online version contains supplementary material available at 10.1007/s11092-022-09384-0.
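The distinction drawn above between norm-referenced and criterion-referenced thresholds can be illustrated with simulated scores (a minimal sketch; the scale, cut-offs, and disadvantage rate are hypothetical and not taken from the study). The point is that the prevalence of "academic resilience" depends directly on which operationalisation is chosen.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical achievement scores on a PISA-like scale, plus a
# disadvantage flag marking the "high risk" group (rate is illustrative).
scores = rng.normal(500, 100, 20_000)
disadvantaged = rng.random(20_000) < 0.25

# Criterion-referenced: a fixed score cut-off applied to everyone
# (607 is an arbitrary illustrative proficiency threshold).
resilient_criterion = disadvantaged & (scores >= 607)

# Norm-referenced: top quartile of achievement *within* the
# disadvantaged group, so prevalence is pinned near 25% by construction.
cut = np.quantile(scores[disadvantaged], 0.75)
resilient_norm = disadvantaged & (scores >= cut)

prev_criterion = resilient_criterion.sum() / disadvantaged.sum()
prev_norm = resilient_norm.sum() / disadvantaged.sum()
print(f"criterion-referenced prevalence: {prev_criterion:.2%}")
print(f"norm-referenced prevalence: {prev_norm:.2%}")
```

Because the norm-referenced rate is fixed by the quantile, it stays comparable across datasets, whereas the criterion-referenced rate moves with the score distribution — a stylised version of the consistency pattern the study reports.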
 
A contextual framework for moderation
Suitable execution of moderation policy is challenging but crucial for the trustworthiness and credibility of internal high-stakes assessment systems. In formal education, policies are rarely implemented as intended. Instead, they are enacted in ways influenced by mediating factors including the internal and external contexts of organisations. Ball, Maguire and Braun’s (2012) contextual-dimensions heuristic provides a conceptualisation of organisation-specific contexts, which is useful when the organisation is the unit of analysis. However, comprehensive analysis of policy enactment—including that relating to moderation—warrants consideration of contexts narrower in scope than whole organisations and wider in scope than individual organisations. In this article, we modify Ball and colleagues’ heuristic, incorporating Biggs’ (1993) application of systems theory, to develop a new contextual framework for moderation that is applicable on multiple scales and enables such analysis. This framework is applied to a selection of contemporary moderation studies with scopes that vary from one course, to jurisdiction-wide, to illustrate its utility. Our framework captures the hierarchy of embedded, interacting systems within which moderation is enacted and makes contextual relationships visible, allowing consideration of perspectives between units of analysis. Our framework provides a nuanced conceptualisation of context that distinguishes between material and human factors, and intrinsic and extrinsic contexts. We present potential uses of the framework for education organisations, central agencies and researchers including as a tool for identifying contextual factors involved in executing moderation initiatives and identifying possible pressures, tensions and enablers.
 
Consistency measures of benchmark classifications as compared with the classifications made by the model that included all of the covariates. Consistency measures for school VA scores in math are shown on the right and school VA scores for language on the left. Below the plots, the color of the dots indicates the inclusion (black) or exclusion (white) of the respective covariate sets
Range of percentiles resulting from math (white) and language (gray) VA scores for five example schools. Every dot represents the school VA percentile as obtained from a certain VA model. The VA models with all the covariates included are marked in black. At the 25th and 75th percentiles, there are cut-off lines to define the border between schools classified as “needs improvement,” “moderately effective,” and “highly effective”
There is no final consensus regarding which covariates should be used (in addition to prior achievement) when estimating value-added (VA) scores to evaluate a school's effectiveness. Therefore, we examined the sensitivity of evaluations of schools' effectiveness in math and language achievement to covariate selection in the applied VA model. Four covariate sets were systematically combined, including prior achievement from the same or a different domain, sociodemographic and sociocultural background characteristics, and domain-specific achievement motivation. School VA scores were estimated using longitudinal data from the Luxembourg School Monitoring Programme with some 3600 students attending 153 primary schools in Grades 1 and 3. VA scores varied considerably, despite high correlations between VA scores based on the different sets of covariates (.66 < r < 1.00). The explained variance and consistency of school VA scores substantially improved when including prior math and prior language achievement in VA models for math, and prior language achievement with sociodemographic and sociocultural background characteristics in VA models for language. These findings suggest that prior achievement in the same subject, the most commonly used covariate to date, may be insufficient to control for between-school differences in student intake when estimating school VA scores. We thus recommend using VA models with caution and applying VA scores for informative purposes rather than as a basis for accountability decisions. Supplementary information: The online version contains supplementary material available at 10.1007/s11092-022-09386-y.
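The basic mechanics behind covariate-sensitive VA scores can be sketched with simulated data (a minimal sketch, not the study's model or data; school counts, effect sizes, and variable names are hypothetical): a school's VA score is taken here as its mean residual from a regression of current achievement on the chosen covariates, and swapping covariate sets changes the resulting scores even when the variants correlate highly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 50 schools x 60 students.
n_schools, n_per = 50, 60
school = np.repeat(np.arange(n_schools), n_per)
school_effect = rng.normal(0, 1, n_schools)        # "true" school effectiveness
ses = rng.normal(0, 1, n_schools)[school]          # school-level intake difference
prior = rng.normal(0, 1, n_schools * n_per) + 0.5 * ses
current = prior + 0.8 * ses + school_effect[school] + rng.normal(0, 1, school.size)

def va_scores(covariates):
    """School VA = mean residual from an OLS regression of current
    achievement on an intercept plus the chosen covariates."""
    X = np.column_stack([np.ones(school.size)] + covariates)
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    resid = current - X @ beta
    return np.array([resid[school == s].mean() for s in range(n_schools)])

va_prior_only = va_scores([prior])        # prior achievement only
va_with_ses = va_scores([prior, ses])     # + background characteristics

r = np.corrcoef(va_prior_only, va_with_ses)[0, 1]
print(f"correlation between VA variants: {r:.2f}")
```

Even with a high correlation between variants, individual schools can shift across classification cut-offs (e.g. the 25th/75th percentiles mentioned in the figure caption), which is the sensitivity the study documents.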
 
Participants’ concept of empathy in light of the concept proposed by Mercer & Reynolds (2002)
Places to study at medical schools are scarce, which makes well-designed selection procedures employing criteria with predictive validity for good students and doctors necessary. In Germany, the pre-university grade point average (pu-GPA) is the main selection criterion for medical school application. However, this is criticised. According to a decision by the Federal Constitutional Court, selection must be supplemented with a criterion other than the pu-GPA. Empathy is a core competency in medical care. Therefore, it seems to be an appropriate criterion. This study evaluates the feasibility of an empathy questionnaire and empathy appraisal by a panel for applicant selection. We employed a sequential explanatory mixed-methods design. Results of self- and external assessments of empathy were compared in a quantitative analysis. Thereafter, the concept of empathy and the approach to empathy appraisal by the selection panel members were explored qualitatively in six focus groups with 19 selection panel members using a semi-structured guideline. Transcripts were content analysed using both deductive and inductive coding. We found no significant correlation of self- and external empathy assessment (ρ(212) = − .031, p > .05). The results of the focus groups showed that, while panel members judged the external empathy assessment to be useful, they had neither a homogenous concept of empathy nor an implicit basis for this assessment. This diversity in panel members’ concepts of empathy and differences in the concepts underlying the Davis Interpersonal Reactivity Index seem to be the main reasons for the lack of correlation between self- and external empathy assessments. While empathy is a possible amendment to established selection criteria for medical education in Germany, its external assessment should not be employed without training panel members based on an established theoretical concept of empathy and an objective self-assessment measure.
 
Predicted school readiness (TS GOLD total score) for 3- and 4-year-old preschool students as a function of child-initiated instructional time. Student and classroom covariates are held constant
Standardized mean differences in school readiness (TS GOLD total score) predicted from Model 7 (see Table 7) as a function of child-initiated instructional time and student age. Other student and classroom covariates are held constant
Although research suggests that the use of child-initiated vs. teacher-directed instructional practices in early childhood education has implications for learning and development, the precise nature of these effects remains unclear. Using data from the Midwest Child-Parent Center (CPC) Expansion Project, the present study examined the possibility that a blend of child- and teacher-directed practices best promotes school readiness among preschoolers experiencing high levels of sociodemographic risk and explored whether the optimal blend varies based on child characteristics. Sixty-two CPC preschool teachers reported their instructional practices throughout the year, using a newly developed questionnaire—the Classroom Activity Report (CAR). The average reported proportion of child-initiated instruction was examined in relation to students’ end-of-year performance on a routine school readiness assessment (N = 1289). Although there was no main effect of child-initiated instruction on school readiness, there was a significant interaction between instruction and student age. Four-year-olds’ school readiness generally improved as the proportion of child-initiated time increased, while 3-year-olds showed a U-shaped pattern. The present findings add to the evidence that child-initiated instruction might support preschoolers’ school readiness, although they also suggest this relation may not always be linear. They also point to the importance of examining instructional strategies in relation to student characteristics, in order to tailor strategies to the student population. The CAR has potential as a brief, practical measurement tool that can support program monitoring and professional development.
 
The quadrant model of flow (Csikszentmihalyi & Csikszentmihalyi, 1988)
Illustrative diagrams of factor models compared for the Flow State Scale-2 and Dispositional Flow Scale-2: a single-factor model, b multi-factor model, c hierarchical model. Squares represent item indicators, and small circles represent error terms
Illustrative diagram depicting a hypothetical model in which four dimensions of flow are causal antecedents of the construct and five dimensions represent the actual experience of it (Moneta, 2012)
The validity of inferences made with test results depends on meeting the assumptions of the test users, one of which is the presumption of optimal performance (i.e., test-takers are doing their best). Flow theory identifies the conditions under which optimal performance is achieved and can be used to inform test users about the degree to which optimal performance has been attained. This sequential explanatory mixed-methods study examined the presence of flow in 159 middle-school test-takers during a high-stakes standardized test of math and reading achievement. Strong evidence was obtained for multiple facets of validity for measuring flow, including internal consistency reliability; structural validity via confirmatory factor analysis; concurrent and predictive validity via correlations among state and trait flow, test anxiety, and test performance; and convergent validity via structured participant interviews. Flow theory provides a road map to test stakeholders for fostering motivation and optimal performance in adolescent test-takers.
 
Early childhood care and education (ECCE) programs are an important mechanism for supporting foundational skill development and successful progression through later schooling. School readiness assessments that serve as reliable indicators of children’s later educational outcomes are most useful for education systems, but little is known about how well existing assessments predict primary school performance in low- and middle-income countries. This study uses longitudinal data from Ghana to investigate how the International Development and Early Learning Assessment (IDELA) predicts future reading and math skills as measured by the Early Grade Reading Assessment (EGRA) and Early Grade Mathematics Assessment (EGMA). Results demonstrate significant and meaningful associations between IDELA scores in ECCE and early primary school academic skills. We find that all domains of development measured by IDELA (i.e., Emergent Literacy, Emergent Numeracy, Motor Development, and Social-emotional Development) are predictive of later academic performance, and that the domains of Emergent Literacy and Emergent Numeracy are the strongest predictors of EGRA and EGMA. Our findings indicate that it is feasible to monitor children’s school readiness with strong predictive validity across several years of schooling, and they point to several promising avenues for future research for education monitoring systems across the region.
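Predictive validity of the kind reported above is commonly summarized by the association between earlier readiness scores and later academic scores. A minimal sketch of that computation, using invented scores rather than the Ghana data:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation, a simple index of predictive association."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Toy scores: earlier IDELA-style readiness vs. later EGRA-style reading.
idela = [45, 52, 60, 63, 70, 75, 81, 90]
egra = [20, 28, 25, 35, 40, 38, 50, 55]
r = pearson_r(idela, egra)  # positive: higher readiness predicts higher reading
```

The study itself uses regression models with covariates; the correlation above only illustrates the underlying notion of an earlier score carrying information about a later one.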
 
A model of competence as a continuum (Blömeke et al., 2015)
Multidimensional adapted process model of teaching (MAP)
In the present study, we aimed to specify the key competence domains perceived to be critical for the teaching profession and depict them as a comprehensive teacher competence model. An expert panel that included representatives from seven units providing university-based initial teacher education in Finland carried out this process. To produce an active construction of a shared understanding and an interpretation of the discourse in the field, the experts reviewed literature on teaching. The resulting teacher competence model, the multidimensional adapted process model of teaching (MAP), represents a collective conception of the relevant empirical literature and prevailing discourses on teaching. The MAP is based on Blömeke et al.’s (2015, Zeitschrift für Psychologie, 223, 3–13) model, which distinguishes among teacher competences (referring to effective performance of teachers’ work), competencies (knowledge, skills, and other individual characteristics underlying and enabling effective teaching), and situation-specific skills of perceiving, interpreting, and making decisions in situations involving teaching and learning. The implications of the MAP for teacher education and student selection for initial teacher education are discussed.
 
CFA Models. a Unidimensional model. b Six-correlated factor solution
Bifactor (S-I) model
The calculation of the three components
The Bracken School Readiness Assessment (BSRA) has been used in large studies such as the Millennium Cohort Study (MCS). Taking advantage of such large-scale evaluation, important conclusions might be drawn regarding its reliability for predicting children’s school readiness. Although the BSRA has been widely used, few item-level studies under a latent variable approach have investigated its psychometric features. Using data from 14,899 2- to 3-year-olds who participated in the MCS, we used Bayesian confirmatory factor analysis to examine the multidimensionality of the subtests of the BSRA and their consistencies, specificities, and reliabilities. We found clear indications of multidimensionality. Of the 88 items, 10 showed low reliability. Future research may consider excluding these low-reliability items to improve the psychometric properties of the BSRA and its use as a multidimensional measurement tool.
 
The use of data for governance purposes has been widely recognised as a way for national authorities to coordinate their activities across administrative levels and improve educational quality. This places the mid-central authority (in many countries, the municipal level) in the midst of modern education governing. This article reports a case study analysis of the particular uses of performance data and numbers by mid-central municipal authorities in the daily work of governing schools in Norway. The three empirical case studies combine an analysis of policy documents and fieldwork interviews with municipal administrators. The article contributes important insights into the role of municipal administrators as interpreters of policy goals at a crucial yet understudied level of the education system. In contrast to the dominant perspective in the data use literature, which often addresses implementation and the effectiveness of how numbers and data can be ideally designed and used, the results provide grounds for a more nuanced understanding of the institutional processes related to setting performance goals.
 
There is an extensive need for school systems to reliably assess the data literacy and data use skills of their educators. To address this need, the current study seeks to refine the NU Data Knowledge Scale (NUDKS) for assessing teacher data literacy for classroom data. A data-based decision-making framework provides the theoretical underpinnings for the instrument. The study’s objective is to refine the NUDKS such that items are located at various points along the data literacy continuum. In this fashion, the NUDKS should be able to measure teacher data literacy throughout the data literacy continuum. To this end, item response theory is used to provide the estimates of the items’ locations and teacher data literacy. Analyses revealed that the NUDKS conformed to the Rasch model. To facilitate the future use of the NUDKS, concordance tables were created to provide a quick determination of teacher data literacy.
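The abstract above reports that the NUDKS conformed to the Rasch model, under which the probability of a correct response depends only on the gap between a teacher's data-literacy estimate and an item's location on the continuum. A minimal illustrative sketch of that relationship (not the NUDKS scoring code; the values are invented):

```python
import math

def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# A teacher whose data-literacy estimate equals an item's difficulty
# has a 50% chance of answering that item correctly; ability above
# the item's location pushes the probability above 0.5.
p_at_item = rasch_probability(theta=0.0, difficulty=0.0)
p_above_item = rasch_probability(theta=1.0, difficulty=0.0)
```

Locating items at various points along the continuum, as the study aims to do, amounts to spreading the `difficulty` parameters across the range of plausible `theta` values.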
 
Increasing job duties and responsibilities associated with the changing role of school principals have prompted even greater accountability. As a result, principals are faced with competing demands and expectations in various forms of accountability from multiple stakeholders. This study examines principals’ perception of accountability in the context of work intensification with a particular focus on the question of “accountable to whom and why.” A total of 1434 practicing principals responded to an online survey that sought to determine the groups and individuals to whom principals feel accountable, and why principals feel accountable to those particular individuals or groups. The survey achieved a response rate of 52.68%. The research results show that balancing competing accountabilities concerning students has become a daunting task for school principals. The competing if not conflicting expectations from (federal and state/provincial) educational authorities, teachers, parents, students, and various interest groups often pose significant challenges to principals’ work and add to the complexity of principals’ role. The unrealistic expectations imposed on principals make it imperative to critically examine the changing role of school principals and identify essential and legislatively mandated duties and responsibilities of principalship to better reflect and address their intensified work realities.
 
Statewide distribution of principal’s classroom observation ratings for the 2016–2017 school year for three teaching practices in the Network for Educator Effectiveness (NEE)
Diagram of exploratory sequential mixed-method design of the study showing the priority qualitative (QUAL) strand and the secondary quantitative (quan) strand
A concerning attribute of teacher evaluations across countries is the systemic leniency of principals during classroom observations. However, little is known about the motivations behind this phenomenon. The purpose of this study is to explore the motivating factors behind principals’ leniency in an authentic teacher evaluation system. In this study, we apply an explanatory-sequential mixed-method design. Using focus groups (qualitative strand; n = 15 principals) and a state-wide survey (quantitative strand; n = 364 principals), we apply goal theory to investigate influences on principals’ ratings in a Midwestern state in the USA. Results suggest that multiple goals may drive principals during observations. These include the following: (1) providing accurate ratings and feedback to teachers, (2) keeping teachers open to growth-promoting feedback, (3) supporting teachers’ morale and fostering positive relationships, (4) avoiding difficult conversations, (5) managing limited time wisely, and (6) maintaining self-efficacy as an instructional leader. Implications are that principals hold beneficial goals that may compete with accuracy when evaluating teachers, and that contextual differences in evaluation systems may influence the way principals act upon these goals. When responding to systemic leniency in teacher evaluations, solutions should increase accuracy in ways that minimally interfere with principals’ other beneficial goals.
 
A simplified illustration of principal-agent relationships within education
The cross-national relationship between the extent of school accountability and the percentage of staff stressed by accountability. Accountability scale derived using PISA 2018 data, based upon how headteachers use student assessment data, how achievement data are disseminated to stakeholders and whether external evaluation is used in quality assurance. Higher values on this scale indicate greater levels of school accountability. OLS regression estimate illustrated by dashed line. Pearson correlation = 0.31 in panel a and 0.32 in panel b
Accountability—the monitoring and use of student performance data to make judgements about school and teacher effectiveness—is increasing within school systems across the globe. In theory, by increasing accountability, the aims and incentives of governments, parents, school leaders and teachers become more closely aligned, potentially improving student achievement as a result. Yet, in practice, concerns are mounting about the stress that accountability is putting schools and teachers under. This paper presents new evidence on this issue, drawing upon data from more than 100,000 teachers across over 40 countries. We find evidence of a modest, positive correlation between school system accountability and how stressed teachers and headteachers are about this aspect of their job. When looking within schools, there is little evidence that the management practices of headteachers differ when they report feeling stressed about accountability, or that they transmit these feelings onto their staff. However, we do find strong evidence of ‘emotional contagion’ of stress amongst colleagues within schools, with teachers more likely to feel stressed by accountability if their colleagues do as well.
 
PISA Exclusion Rates 2009 – 2018 by Province. NL: Newfoundland; PE: Prince Edward Island; NS: Nova Scotia; NB: New Brunswick; QC: Quebec; ON: Ontario; MN: Manitoba; SK: Saskatchewan; AB: Alberta; BC: British Columbia
2015 PISA Mathematics Average Scores for Canadian Provinces and their Exclusion Rate (%). NL: Newfoundland; PE: Prince Edward Island; NS: Nova Scotia; NB: New Brunswick; QC: Quebec; ON: Ontario; MN: Manitoba; SK: Saskatchewan; AB: Alberta; BC: British Columbia
PISA Exclusion Rates (%) and Averages from 2009 – 2018 Stratified by Three Ways of Reporting the Number of Non-writers. The percent excluded from PISA includes students excluded due to a disability, the percent absent represents the percentage of students absent on the day of the assessment, and the percent of non-participants includes students who did not write the PISA for other reasons
The purpose of this study was to examine patterns and factors influencing exclusion rates and achievement on large-scale assessments in Canada. Data analysis employed a case study to examine policies and practices of exclusion, absenteeism, and social promotion related to large-scale assessments at the international, national, and provincial levels. In addition, information was solicited from assessment experts regarding exclusion rate practices. Findings revealed significant increases in student performance, which paralleled significant increases in exclusion rates. At the provincial level, the analysis led to the discovery of a relationship between social promotion policies and a document guiding assessment practices in Canada (i.e., Principles of Fair Assessment Practices). This relationship was the rationale given for excluding poor-performing students, rather than students with learning or physical disabilities, from participating in large-scale assessments. Recommendations include aligning exclusion policies across the three levels of administration, documenting students who are unable to participate in large-scale assessments because social promotion has left them operating too far below grade level, and oversampling provinces and schools with high absenteeism on assessment days.
 
Data analysis process
Despite the widely acknowledged pro-learning function of formative assessment and its wide adoption around the globe, the gaps between policy intention, interpretation and implementation remain a problem to be solved. While this problem is noted universally, it could be particularly serious in China, where Confucian Heritage Culture is deeply ingrained and education development is unbalanced. This study, via interview data from English teachers and deans at eight universities in an underdeveloped region of mid-western China, explores the overall environment for a formative assessment initiative that is currently in place. Data analysis reveals multiple issues, such as insufficient support, improper dissemination and ineffective training at the meso-level, and instructors’ limited assessment ability, large class sizes and students’ resistance at the micro-level. The conclusion is thus drawn that the overall environment in this region is by no means favourable for the effective implementation of formative assessment, and implications are derived for better realisation of assessment innovations in this and other underdeveloped regions of China.
 
In this study, we examine children’s National Assessment Program—Literacy and Numeracy (NAPLAN) achievement predictors, which may enable or limit their numeracy performance, and assess the relative importance of the predictor variables. Our data source was the NAPLAN numeracy results of Queensland schools from 2014 to 2017. Years 3 and 5 children’s NAPLAN numeracy scores were analysed using a hierarchical multiple regression model. We examined eight variables grouped into four themes to determine their predictive value for children’s numeracy performance in NAPLAN. Findings from this study indicate that parent’s educational level, parent’s occupation and indigenous status variables accounted for 10–11% of the total variance, while geolocation and sector type contributed an additional 0.2–0.4% of the variance. Gender and language background other than English (LBOTE) contributed 0.1–0.4% of the variance. These results were consistent across levels (Years 3 and 5) and test years (2014–2017). When these predictors were controlled, the influences of parent’s post-school education and LBOTE status were smaller and non-significant. Previous NAPLAN numeracy results for Year 5 children were found to have very large predictive value (R2 = 0.50). The implications of these results for teachers, parents and researchers are described.
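The hierarchical regression logic described above adds predictor blocks one theme at a time and attributes to each block the increment in explained variance (ΔR²). A minimal sketch of that block-wise computation; the blocks and data are toy stand-ins, not the NAPLAN variables or results:

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """In-sample R^2 from an OLS fit, with an intercept column added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
n = 500
block1 = rng.normal(size=(n, 2))  # e.g. parental education/occupation (toy)
block2 = rng.normal(size=(n, 2))  # e.g. geolocation/sector type (toy)
y = block1 @ np.array([0.5, 0.3]) + 0.1 * block2[:, 0] + rng.normal(size=n)

r2_block1 = r_squared(block1, y)
r2_both = r_squared(np.column_stack([block1, block2]), y)
delta_r2 = r2_both - r2_block1  # incremental variance explained by block 2
```

In-sample R² can only grow as blocks are added, which is why the abstract reports each theme's contribution as an increment over the previous blocks.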
 
Plots of interaction effects of country and academic discipline by assessment type
This study examined differences in the assessment criteria used by US and Spanish university instructors to assign course grades. The US sample included 250 course syllabi (159 from universities and 91 from 4-year colleges) developed by randomly selected instructors from five academic disciplines (education, math, science, psychology, and English). The Spanish data set included 175 syllabi, chosen from the national database from the same five domains. The results revealed that university instructors employed a number of criteria when assigning course grades, with the US instructors relying equally on process and product criteria, and Spanish instructors using a higher proportion of product indicators. We also found that self- and peer assessment were scarcely used in both countries and that no syllabi employed progress criteria. Theoretical, practical, and policy implications are discussed along with avenues for further research.
 
Studies of measurement invariance are of significant importance for validating high-stakes tests and for ensuring that their results are fair for students with special needs. The aim of this study is to examine the measurement invariance of the “Central Examination for Secondary Education Institutions” in Turkey according to participant disability status. A focal group comprised of 369 visually impaired students was formed. An equal number of non-visually impaired peers were randomly selected as a reference group. Mantel-Haenszel, logistic regression, Breslow-Day, and standardization methods of classical test theory were used to detect items with differential item functioning (DIF). DIF analysis results showed that 16 (17.78%) of the 90 test items indicated DIF, and that ten of the DIF-detected items (62.5%) represented a disadvantage for visually impaired participants. A total of 17 experts were consulted in order to investigate item bias. As a result of the collective expert opinion, five items were found to be “biased” in the Turkish (n = 1), English (n = 2), and Science (n = 2) subtests. Close agreement was obtained among the experts that the “biased” items favored non-visually impaired participants. Use of visuals/graphics, complex/lengthy texts and response options, the need for rereading questions, and the negative attitudes of readers/coders were pointed out as sources of item bias.
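The Mantel-Haenszel procedure named above compares the odds of a correct response for the reference and focal groups within matched total-score strata, pooling the comparison into a common odds ratio. A minimal sketch with invented counts (not the study's data):

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across score strata.

    Each stratum is a 2x2 table (a, b, c, d):
      a = reference-group correct, b = reference-group incorrect,
      c = focal-group correct,     d = focal-group incorrect.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Toy data: three total-score strata in which the odds of a correct
# answer are consistently higher for the reference group, the pattern
# that flags an item for possible DIF.
strata = [(30, 10, 20, 20), (40, 10, 30, 20), (50, 5, 40, 15)]
alpha_mh = mantel_haenszel_odds_ratio(strata)  # > 1 favours the reference group
```

An odds ratio near 1 across strata indicates no DIF; values well above (or below) 1 flag an item for the kind of expert bias review the study then carried out.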
 
IPA grid
IPA grid of teaching quality dimensions
Students’ feedback is usually gathered in institutions of higher education to evaluate teaching quality from the students’ perspective, using questionnaires administered at the end of courses. These evaluations are useful to pinpoint course strengths, identify areas of improvement, and understand the factors that contribute to students’ satisfaction. They are an important mechanism for improving the teaching and learning processes. However, there is little standardisation in how this kind of feedback is collected, analysed, and used, and its active use for improving the teaching and learning processes is low. Additionally, students are rarely asked whether they consider the aspects included in the questionnaires to be really important; this information would allow relativizing students’ evaluations of teaching. This research proposes the use of importance-performance analysis (IPA) together with a students’ evaluation of teaching questionnaire as a tool for lecturers to collect, analyse, and interpret the data obtained from students’ feedback. This work shows how, using IPA, lecturers can obtain a visual representation of which teaching attributes are important for their students, how important each attribute is, and how well the instructors performed on each attribute from their students’ point of view. The usefulness of this tool for lecturers to assess students’ evaluation of their teaching and to guide course programming in higher education is shown. Keywords: Students’ feedback · Evaluation of teaching · Importance-performance analysis · Higher education · Course programming. Access: https://rdcu.be/b8Ka1
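The IPA grid places each teaching attribute in one of four quadrants according to whether its mean importance and performance ratings fall above or below the grand means. A toy sketch with invented ratings; the quadrant labels are the conventional IPA names, not necessarily the authors' wording:

```python
def ipa_quadrant(importance, performance, imp_mean, perf_mean):
    """Classify one teaching attribute into an importance-performance quadrant."""
    if importance >= imp_mean and performance < perf_mean:
        return "concentrate here"      # important to students, underperforming
    if importance >= imp_mean and performance >= perf_mean:
        return "keep up the good work"
    if importance < imp_mean and performance < perf_mean:
        return "low priority"
    return "possible overkill"         # strong performance on low-importance items

# Toy (importance, performance) ratings on a 1-5 scale, split at grand means.
attrs = {"feedback": (4.6, 3.1), "materials": (3.0, 4.5), "pace": (4.4, 4.2)}
imp_mean = sum(i for i, _ in attrs.values()) / len(attrs)
perf_mean = sum(p for _, p in attrs.values()) / len(attrs)
grid = {name: ipa_quadrant(i, p, imp_mean, perf_mean)
        for name, (i, p) in attrs.items()}
```

Plotting the attributes on importance-performance axes with lines at the grand means yields the visual grid the article proposes for guiding course programming.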
 
The proliferation of standardized testing and administrative statistics in compulsory education is embedded in the rise to prominence of quantified accountability as a mechanism of education governance. Numbers work by stripping away the contexts of their production and the granular and ambiguous detail of the phenomena they claim to represent. The article re-examines qualitative interviews collected during a completed international project that studied policies and practices of accountability reforms and quality evaluation in Russian school education to understand how actors involved in quantified accountability—from producers to users of large-scale assessments of learning outcomes and administrative statistical data—articulate measuring or being measured in a decontextualized manner. The article offers two theoretical contributions on the enactment of accountability policies. First, it shows the importance of analytical attention to the materiality of accountability policies, that is, their specifically numerical nature. Second, it proposes that the study of accountability enactment should address the question of how actors deal with the decontextualizing propensities of quantified accountability. I conclude that enactment of accountability is fuelled by quantified decontextualization and the diverse ways in which actors experience, make sense of and act upon it.
 
The diffusion of national standardized testing, large-scale survey assessments and the promotion of policies of self-evaluation are making large amounts of data on education systems available and transforming schools into collecting units for a notable range of educational, institutional and socioeconomic indicators. The datafication and related digital technologies for collecting, analysing, retrieving and displaying data activate, at least in principle, new spaces of visibility and forms of school data-based managerialism (Williamson, 2017). While the policy of transparency is oriented to the development and consolidation of data-based school governance (Selwyn in European Educational Research Journal, 15, 54–68, 2016), its implementation in practice remains an open question. It calls for an analysis of the enactment of school data infrastructures to understand their mobilization in the governance of schooling. Schools can align with digital technologies and data, or they can resist them in many ways. Drawing on a multi-sited ethnography on the development and consolidation of the digital governance of education in Italy (Landri, 2018), I display how schools can align with, imitate, and fabricate their data, use them partially and instrumentally, game them, or opt out from the current regime of accountability. These findings complexify a typology of resistance to digitalization proposed by Souto-Otero and Beneito-Montagut (European Educational Research Journal, 15, 14–33, 2016). They trouble the either/or logic that presents ‘alignment’ and ‘resistance’ as if they were distinct alternatives, underlining the subtleties of the policy enactment of data-based school governance. The investigation illustrates that the space of school agency is not entirely lost: the destiny of the digital governance of education, in other words, is not inevitable. It draws attention to the singularity of schools with respect to the policies of digital accountability.
This singularity is a capacity to react that ultimately depends on sedimented circuits of knowledge, which orient in different ways how schools notice, interpret and draw conclusions from data.
 
Across the globe, education quality has become synonymous with student performance. The shift towards test-based accountability (TBA) has changed what is required of schools and what it means to be a ‘good teacher’. Different tools may trigger a performance orientation within schools, from administrative (such as the Inspectorate) to market (schools competing for students). It is logical to assume that TBA policies will be interpreted and enacted differently in schools at different ends of the performance spectrum, and this, in turn, will affect the expectations placed on teachers and the pressures they feel. Based on interviews with teachers (n = 15), principals (n = 4) and the school board (n = 1), this study compares the experiences of teachers in two ‘high’ and two ‘low’ performing primary schools under the same management in one Dutch city. Findings reveal that the schools respond differently to TBA, and are facing different performance pressures, yet in all four, test data was found to significantly shape educational practices. It was further found that teachers experience pressure in different ways; however, it cannot be said that those in high-performing schools experience less pressure compared to those in low-performing schools, or vice versa. Rather, teachers’ experience of pressure is more closely connected to their schools’ logics of action: the practices the schools adopted in response to accountability measures and their relative market position.
 
Ideal-type accountability model
The accountability system enacted in Berlin
The accountability system enacted in Thuringia
The paper proposes to study side effects of accountability in education within the theoretical framework of enactment research. The potential value of this approach for the study of side effects is shown by using the example of the side effect “Dependency on Expert Judgements.” Therefore, findings from the research project “Unintended Effects of Accountability in the School System” (acronym “Nefo”) are presented. The project entailed an analysis of policy documents to describe the accountability contexts under study, as well as a survey study with 2637 participating teachers and principals to examine the distribution of side effects in the no- and low-stakes contexts of the four German federal states of Berlin, Brandenburg, Thuringia, and Rhineland-Palatinate. The findings for Berlin and Thuringia are triangulated with the results from a qualitative in-depth group-discussion substudy of the Nefo project on how teachers in Berlin and Thuringia deal with accountability measures. The analysis of policy documents reveals that in Berlin, performance results gain the status of objective data that teachers need to improve their work, while teachers in Thuringia are prompted to judge by themselves the meaning of performance results for their teaching. This suggests that “Dependency on Expert Judgements” is a larger issue in Berlin than in Thuringia. However, contrary to what would be expected, the findings of the survey study indicate that the side effect plays a greater role in Thuringia than in Berlin. To explain this counterintuitive finding, teachers’ responses to standards-based accountability in Berlin and Thuringia are delineated. The paper shows that the study of meaning-making processes, which is central to the enactment framework, must not be ignored if one tries to understand the processes that lead to the occurrence or absence of side effects. It is also important for preventing fallacies in the interpretation of survey data.
 
Performance-based accountability (PBA) policies are increasingly adopted in a wide range of education systems in order to reform school governance and to improve students’ results and schools’ performance. Countries around the world have been implementing national large-scale assessments to make school actors more accountable and responsible for students’ results. This policy model has been generalized in countries with different administrative traditions, including those with a short tradition in New Public Management. This is the case in Spain, where PBA has been adopted unevenly in different regions, with Madrid being one of the earliest adopters. In recent decades, Madrid has developed a model that combines administrative test-based accountability with a system of broad parental school choice, which also facilitates the activation of market forms of accountability. However, the combination and interaction between market and administrative forms of accountability is understudied. This paper adopts a policy enactment perspective to analyze, through a case study approach, the interaction of administrative and market forms of accountability and its enactment at the school level. The case study is based on a set of 41 semi-structured interviews with teachers, principals, and school inspectors in a sample of eight schools in Madrid, combined with document analysis of school educational projects and improvement plans. The evidence suggests that administrative and market forms of accountability tend to generate dynamics of interdependence, resulting in increasing external pressures which schools tend to address with superficial responses, including teaching to the test, or second-order competition between schools.
 
Overview of participating schools
Overview of three response patterns
In recent decades, performance-based accountability (PBA) has become an increasingly popular policy instrument to ensure educational actors are responsive to and assume responsibility for achieving centrally defined learning goals. Nonetheless, studies report mixed results with regard to the impact of PBA on schools' internal affairs and instructional practices. With the aim of contributing to the understanding of the social mechanisms and processes that induce particular school responses, this paper reports on a study that examines how Norwegian principals perceive, interpret, and translate accountability demands. The analysis is guided by the policy enactment perspective and the sociological concept of "reactivity", and relies on 23 in-depth interviews with primary school principals in nine urban municipalities in Norway. The findings highlight three distinct response patterns in how principals perceive, interpret, and translate PBA demands: alignment, balancing multiple purposes, and symbolic responses. The study also shows how different manifestations of two social mechanisms constitute important explanatory factors for understanding principals' varying responses, and highlights that these mechanisms are more likely to operate under particular conditions, relating both to principals' trajectories and views on education, and to school-specific characteristics and the local accountability regime. The study contributes to the accountability literature by showing how, even in the relative absence of material consequences and low levels of marketization, standardized testing and PBA can drive behavioral change, by reframing norms of good educational practice and by affecting how educators make sense of core aspects of their work.
 
Despite the growing body of research on performance-based accountability (PBA) in education, there is still scarce evidence on the mediating role of subjective variables (e.g., perceived pressure and alignment with PBA mandates) in the enactment of PBA in socially disadvantaged contexts. This is paradoxical because marginalized schools are usually those that are on probation and have to cope with the threat of sanctions more frequently. Existing investigations of PBA enactment have paid increasing attention to the role of situated and material contexts, but there is still limited knowledge of how subjective variables can mediate policy enactment processes and enable the adoption of different school responses. To address these gaps, the article aims to explore how perceived accountability pressure, the school's performative culture, and meaning-making processes at the school level mediate the enactment of PBA policies in disadvantaged schools. At the theoretical level, the study is informed by sense-making and policy enactment frameworks. Methodologically, the investigation uses a comparative case study approach based on two extreme cases, selected on the basis of a factorial analysis that combines both survey and secondary data. The extreme cases represent two different scenarios which, despite operating in similar situated contexts, are characterized by opposite levels of perceived pressure and alignment with the performative culture. The case studies combine survey data (n = 39) with documentary analysis and semi-structured interviews with the management team and teachers (n = 7). The findings show that subjective variables, in interaction with other contextual factors, can exacerbate or inhibit PBA regulatory pressures and trigger diverging school responses. Full-text view-only version of the paper: https://rdcu.be/cbdEv
 
Iris’s concept map
Sophie’s concept map
This paper examines staff’s enactment and perceptions of a continuous independent school self-evaluation (SSE) process implemented at a semi-private school network over the past decade. In light of research arguing that SSE is perceived and used primarily as a self-inspection or self-regulation tool emphasizing accountability goals, this case suggests the promise of engaging in SSE that staff perceive as positive and aimed at their school’s improvement. Findings reported in this work are based on analysis of Concept Structuring Analysis Task (ConSAT) interviews, in which participants created their own concept maps, and on participant observation of a two-year-long SSE process. This paper identifies three organizational mechanisms that facilitated a sustainable improvement-oriented SSE: the role of the evaluator, the pooling of resources through the network structure, and the way the network uses evaluation data. These findings yield implications for (a) research on the enactment of sustainable SSE and (b) implementation of SSE that balances accountability and improvement goals.
 
Data-driven decision-making (DDDM) refers to the process of using data to inform educational decisions. Due to DDDM’s positive effects on student achievement and the pressure for educational accountability, DDDM has become a recent focus of numerous educational policies. However, few teachers fully utilize DDDM. While DDDM may broadly draw on various types of data to inform different types of decisions, the current study focuses on the use of formative assessment data to guide instructional adaptations. This study serves as an elicitation study to explore teachers’ perceptions of DDDM, illuminating both facilitating and inhibitory factors affecting assessment practices. The Theory of Planned Behavior was applied as a theoretical framework, which suggests that individuals’ behaviors can be explained by their attitudes, perceptions of social norms, and perceived behavioral control. Nine elementary teachers from Indiana (USA) participated in focus groups. The findings indicated that teachers (a) had positive thoughts (e.g., helpful) but negative feelings (e.g., stressful) about DDDM, (b) were highly impacted by their schools’ culture of assessment, and (c) had mixed perceptions about their capacity and autonomy in conducting DDDM. These findings will be used to develop a quantitative instrument for future research. Furthermore, they can be used to support educational leaders’ efforts to provide better professional development and to facilitate more supportive school environments so that teachers can successfully implement DDDM practices.
 
School improvement research has insufficiently considered the importance of intervening in schools with declining academic performance. Fields such as engineering and medicine have prioritized predicting decline in order to save structures or patients before they are in peril. Unfortunately, in education, school improvement policies and interventions are enacted only once schools reach low levels of academic performance. In this study, we apply sophisticated statistical models to analyze more than 10 years of longitudinal student achievement data in English/language arts and mathematics in the US state of Texas. We find that a considerable number of schools consistently decline over time. Significant predictors of decline included shifting student demographics and changes in the percentage of economically disadvantaged students. Higher starting percentages of students labeled as English language learners also increased the likelihood of decline, but increasing percentages of English language learners over time reduced the rate of decline. Leadership stability also appears to be important in impeding decline. We close by discussing implications for research, policy, and practice.
 
Top-cited authors
Stephan Gerhard Huber
  • Institute for the Management and Economics of Education
Christoph Helm
  • Johannes Kepler University Linz
Joseph Murphy
  • Vanderbilt University
Sabine Ogrin
  • Technische Universität Darmstadt
Anne Roth
  • Technische Universität Darmstadt