Article

Developing an Internationally Comparable Balanced Assessment System That Supports High-Quality Learning


Abstract

Created by Educational Testing Service (ETS) to forward a larger social mission, the Center for K-12 Assessment & Performance Management has been given the directive to serve as an independent catalyst and resource for the improvement of measurement and data systems to enhance student achievement.

... It refers to all the activities performed by teachers and students, which provide information that can be used as feedback to adapt the teaching and learning activities that it relates to. A variety of assessment tools can be used for formative assessment purposes (Darling-Hammond and Pecheone 2010; Wiliam 2006). The tool must fit the type of assignment that the students perform. ...
Article
This study sought to delineate the implementation of Problem-Based Learning (PBL) and peer- and self-assessment in a teacher training programme. This intervention was accompanied by measuring the participants’ perceptions of the PBL environment and the assessment methods used compared with those of other courses they were previously enrolled in. Another aim was to reveal the most effective perceived PBL constructivist activities in enhancing the assessment methods. Data were gathered from 61 second-year students in a M.Ed. study track by the Constructivist Learning in Higher Education Settings Questionnaire (CLHES), the Peer- and Self-Assessment Questionnaire, and reflective journals. Quantitative results have shown that the PBL related activities were more pronounced in the designed intervention than in previous courses the students were enrolled in. Large effect size results were found for the perceptions of the peer- and self-assessment factors. Social interaction was found to be connected to the peer assessment variable, whereas Cooperative dialogue was related to the self-assessment construct. Analysis of the students’ reflective journals revealed three key categories: 1. Knowledge and lifelong learning skills; 2. Social and cultural aspects of joint learning; 3. Perceptions of peer assessment, self-assessment and instructor’s assessment. Implications, limitations, and suggestions for future studies are discussed.
... For example, in 2001 an influential committee of the National Research Council advanced, in a seminal report, Knowing What Students Know, an ambitious vision for comprehensive assessment systems that specifically included formative assessment. This vision has been taken up by others in the United States and formative assessment is routinely discussed together with summative and interim/benchmark assessment in conceptions of 'balanced assessment systems' (see, for example, Darling-Hammond and Pecheone, 2010; Stiggins, 2006, 2008). ...
Article
This paper discusses the emergence of assessment for learning (AfL) across the globe with particular attention given to Western educational jurisdictions. Authors from Australia, Canada, Ireland, Israel, New Zealand, Norway, and the USA explain the genesis of AfL, its evolution and impact on school systems, and discuss current trends in policy directions for AfL within their respective countries. The authors also discuss the implications of these various shifts and the ongoing tensions that exist between AfL and summative forms of assessment within national policy initiatives.
... There has also been substantial interest in formative assessment in the USA during recent years, with increasing numbers of educators regarding it as a means of not only improving student learning, but also increasing student scores on significant achievement examinations. The extent of this attention is manifested by its inclusion in notions of 'balanced assessment systems' (see, for example, Darling-Hammond & Pecheone, 2010; Stiggins, 2008), which propose a system that integrates curriculum and assessments, both formative and summative, and is designed to support higher quality, coherent instruction. To promote the use of formative assessment in the USA, the Council of Chief State School Officers formed the Formative Assessment Advisory Group in 2006 to expand the implementation of formative assessment in the classroom. ...
Article
The notion that future performance can be affected by information about previous performance is often expressed in terms of ‘closing the gap’. Feedback has long been recognised as a mechanism through which teaching and learning may be influenced. The current wave of support in the United Kingdom for assessment for learning echoes these sentiments. This paper examines the feedback strategies employed by two experienced literacy practitioners in England. Using data gathered from field observations, interviews and documentary sources, the paper presents evidence of espoused practice associated with feedback, demonstrating that whilst teachers may claim that they make effective use of some feedback strategies to support pupils’ learning and motivation, this claim is not supported by empirical data. The paper also identifies that whilst some teachers aim to mark every piece of pupils’ written work for perceived motivational benefits, such a strategy can undermine pupils’ intrinsic motivation and lead to a culture of over-dependency, whereby the locus of control with regard to feedback lies solely with the teacher. The paper concludes by exploring some possible implications for practice with regard to the provision of written feedback in particular.
... For example, a state choosing to administer two off-the-shelf tests where each test measures a mathematics construct may combine the two tests so that the mathematics construct is more fully assessed. Moving further still is the notion of the comprehensive assessment discussed by Darling-Hammond and Pecheone (2010). Here students are administered tests where each test measures a specific portion of the overall construct. Among its many practical uses in testing applications, IRT is a tool often used in the augmentation of standardized tests. ...
... In order to meet the educational demands of an evolving society, new comprehensive assessment systems have been implemented around the world with positive results (Darling-Hammond & Pecheone, 2010). For example, in countries such as the United Kingdom, Singapore, Hong Kong and Australia, the assessment system focuses on central concepts in the disciplines and students' using knowledge to solve authentic problems, and demonstrating higher order skills. ...
Article
This paper describes the characteristics of games and how they can be applied to the design of innovative assessment tasks for formative and summative purposes. Examples of current educational games and game-like assessment tasks in mathematics, science, and English language learning are used to illustrate some of these concepts. We argue that the inclusion of some aspects from gaming technology may have a positive effect in the development of innovative assessment systems (e.g., by supporting the development of highly engaging assessment tasks). However, integrating game elements as part of assessment tasks is a complex process that needs to take into account not only the engaging or motivational aspects of the activity but also the quality criteria that are needed according to the type of assessment that is being developed.
Article
Universal primary education continues to be a major issue in developing countries, including Pakistan. The purpose of this study was to analyze the perceptions of teachers and headteachers about the context of primary education in Khyber Pakhtunkhwa. The objectives of this study were: (a) to analyze the perceptions of teachers and headteachers about the objectives of primary education in Khyber Pakhtunkhwa; (b) to examine the perceptions of teachers and headteachers about the context of primary education in terms of the quality improvement in Khyber Pakhtunkhwa; (c) to study the perceptions of teachers and headteachers about the context of primary education in terms of character building and skill development in Khyber Pakhtunkhwa. Five hundred and six (506) respondents i.e. teachers and headteachers were selected from five districts (out of a total of 26 districts of Khyber Pakhtunkhwa) using a proportionate stratified random sampling technique. Data was collected using a questionnaire. The statistics used for the analysis of collected data included frequency, mean, percentage, and chi-square. The results indicate that learners are mostly made to memorize and learn through rote instead of developing them as independent learners. In addition, the curriculum is not updated, and teachers are not properly qualified to teach according to the needs of students for conceptual learning.
Article
Students’ social identity is considered to have profound effects on their learning and academic performance, but there was a lack of research on whether students’ social identity would also influence their aspirations to pursue STEM careers. This research studied the associations between the social identity of students and their aspirations to engage in STEM careers. Nearly 4,000 Hong Kong students from three key learning stages participated in a survey related to their STEM aspirations. Data analysis using multiple regression and structural equation modelling showed that local and national identity were predictive of STEM career aspirations. Moreover, students’ sense of contribution to the nation and local society, as well as the perceived value of STEM professionals were mediators of the impact of their social identity on their STEM career aspirations. The implications provide insights for policymakers and educators to formulate appropriate curricula benefiting students’ social identity development as well as their STEM aspirations.
Article
Approaches to test score use and test purpose lack the well-developed methodological guidelines and established sources of evidence available for intended score interpretation. We argue in this paper that this lack fails to reflect the ultimate purpose of a test score: to help solve an important problem faced by intended test users. We explore the treatment of intended test purpose and test score use under the chain of assumption/inferences perspective identified within an argument-based approach to validity. Next, we revisit the notion of test score use and argue that, at least for classroom assessments based on complex constructs, such as learning progressions in math and science, test score use can be more effectively conceptualized as part of a potential solution to a problem, or “job-to-be-done.” We argue for shifting from the definition of validity to the concept of effectiveness. Finally, we illustrate an argument-based approach to test score effectiveness by contrasting effectiveness arguments for interim assessments based on a conventional test blueprint or a test blueprint augmented with learning progressions.
Chapter
To do things the way they were done in the past will not help our youngsters face our changing times. The ability to recognize, choose, and invent options, the ability to decide to learn how to learn—all this will better equip our youngsters for the future.
Article
With increasingly tight budgets, many public school districts lack research personnel to evaluate program efficacy or investigate best practices that raise student achievement. We highlight an example of a successful university-district partnership that offers district-driven research support while providing opportunities for practitioner-scholars to learn first-hand how to perform rigorous evaluation work. This article details the Early Kindergarten Transition program evaluation study conducted by a university-district partnership as well as testimony from district leadership on the utility of the research deliverables and long-term benefits of the research collaboration.
Chapter
The purpose of this study is to describe the perception of teachers in Pasir Puteh, Kelantan with regard to the school-based assessment (SBA). This study attempts to identify how quality of work life, flexible learning, and balance between academic and nonacademic jobs predict the perception of process improvement. In addition, this study aims to identify the factor that contributes most to teachers' perception of this system. A total of 335 respondents were selected, with a questionnaire as the instrument for data collection. Of the three independent variables, flexible learning and balance between academic and nonacademic jobs were found to be significant in explaining the perception of process improvement. Moreover, flexible learning was also found to be the strongest predictor of the perception of process improvement (β = 0.654, p < 0.001). The Ministry of Education should encourage teachers to use their own strategies to teach and educate students according to their personal creativity and flexibility in implementing this system, which may improve the level of education in this country.
Chapter
At a time when accountability for student performance continues to be a central theme in education reform policy, literacy in student assessment is considered key, if not indispensable, for successful educational leaders at every administrative level. Drawing from a wide range of studies published in the last decade on the links between student assessment and educational leadership, three overarching themes are demonstrated to provide a framework for understanding assessment literacy for educational leaders. The ‘Triple-A Model’ proposes three intersecting points to reflect the complex construction of assessment: aims, approach, and accountability. First, today’s leader understands that the aim of educational assessment is no longer straightforward but encompasses a wide variety of purposes that are often confused and poorly understood. Second, an intentional, knowledgeable, and visionary approach to leadership is shown to be a key factor in the quality of instruction that occurs in classrooms. Third, accountability is not only a means for communication between schools and the public, but the result of increasing demands by society and its governments to know and understand what actually happens with students in classrooms.
Article
This article reports on the collaboration of six states to study how simulation-based science assessments can become transformative components of multi-level, balanced state science assessment systems. The project studied the psychometric quality, feasibility, and utility of simulation-based science assessments designed to serve formative purposes during a unit and to provide summative evidence of end-of-unit proficiencies. The frameworks of evidence-centered assessment design and model-based learning shaped the specifications for the assessments. The simulations provided the three most common forms of accommodations in state testing programs: audio recording of text, screen magnification, and support for extended time. The SimScientists program at WestEd developed simulation-based, curriculum-embedded, and unit benchmark assessments for two middle school topics, Ecosystems and Force & Motion. These were field-tested in three states. Data included student characteristics, responses to the assessments, cognitive labs, classroom observations, and teacher surveys and interviews. UCLA CRESST conducted an evaluation of the implementation. Feasibility and utility were examined in classroom observations, teacher surveys and interviews, and by the six-state Design Panel. Technical quality data included AAAS reviews of the items' alignment with standards and quality of the science, cognitive labs, and assessment data. Student data were analyzed using multidimensional Item Response Theory (IRT) methods. IRT analyses demonstrated the high psychometric quality (reliability and validity) of the assessments and their discrimination between content knowledge and inquiry practices. Students performed better on the interactive, simulation-based assessments than on the static, conventional items in the posttest. 
Importantly, gaps between performance of the general population and English language learners and students with disabilities were considerably smaller on the simulation-based assessments than on the posttests. The Design Panel participated in development of two models for integrating science simulations into a balanced state science assessment system. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 363–393, 2012
Book
Value-added methods refer to efforts to estimate the relative contributions of specific teachers, schools, or programs to student test performance. In recent years, these methods have attracted considerable attention because of their potential applicability for educational accountability, teacher pay-for-performance systems, school and teacher improvement, program evaluation, and research. Value-added methods involve complex statistical models applied to test data of varying quality. Accordingly, there are many technical challenges to ascertaining the degree to which the output of these models provides the desired estimates. Despite a substantial amount of research over the last decade and a half, overcoming these challenges has proven to be very difficult, and many questions remain unanswered, at a time when there is strong interest in implementing value-added models in a variety of settings. The National Research Council and the National Academy of Education held a workshop, summarized in this volume, to help identify areas of emerging consensus and areas of disagreement regarding appropriate uses of value-added methods, in an effort to provide research-based guidance to policy makers who are facing decisions about whether to proceed in this direction. © 2010 by the National Academy of Sciences. All rights reserved.
Chapter
This chapter addresses design, scoring, and psychometric advances in performance assessments that allow for the evaluation of twenty-first century skills. It begins with a discussion of advances in the design of performance assessments, including a description of the important learning outcomes that can be assessed by performance assessments, and not by other assessment formats. Then, the chapter discusses advances in the scoring of performance assessments, including both the technical and substantive advances in automated scoring methods that allow for timely scoring of student performances on innovative items. Next, it addresses issues related to the validity and fairness of the use and interpretation of scores derived from performance assessments. The type of evidence needed to support the validity of score interpretations and use, such as content representation, cognitive complexity, fairness, generalizability, and consequential evidence, is discussed. Finally, the chapter addresses additional psychometric advances in performance assessments, including advances in measurement models.
Article
Current educational policy debate in the US centres on educational standards and their possible role as primary motors of reform. A striking feature of the policy debate is its international character, with the question of the comparability of educational standards in the USA to those of its economic competitors becoming a primary concern of policy-makers, curriculum designers and other major actors in curriculum governance policy. Defining what constitute 'world-class' standards remains problematic. The US Research Centre for the Third International Mathematics and Science Study (TIMSS) has designed research to measure and portray standards in order to make empirical benchmarking studies possible. Using data from the USA and 21 high-achieving countries that participated in the study of national samples of curriculum documents and textbooks conducted as a component of the TIMSS, we explore two questions: What are the expectations regarding the performance of students held in the educational systems of the countries that perform among the best in the world? And how do academic standards in the USA compare to the expectations of this select group of countries?
Article
In recent years, US curriculum policy has emphasized standards‐based conceptions of curricula in mathematics and science. This paper explores the data from the Third International Mathematics and Science Study (TIMSS) to argue that the presence of content standards is not sufficient to guarantee curricula that lead to high‐quality instruction and achievement. An examination of the content topics covered in each grade of a group of six of the highest‐achieving TIMSS countries in mathematics and science shows a pattern in which new topics are gradually introduced, are a part of instruction for a few grades, and then often leave the curriculum as separate topics. This contrasts sharply with mapping of topics in the various US national standards in mathematics and science. Topics enter and linger, so that each grade typically devotes instructional attention to many more topics than is typical of the six high‐achieving countries; in addition, each topic stays in the curriculum for more grades than in the high‐achieving countries. An examination of mathematics and science content standards from 21 states and 50 districts in the US shows a pattern more like that of the US national standards than those of the high‐achieving TIMSS countries. While content standards have become integral to US curriculum development and reform, they have yet to reflect the coherence that is typical of countries that achieved significantly better than the US in the TIMSS study.
Chapter
Why are policies not implemented as planned? Why are classroom practices so hard to change? The “implementation problem” was discovered in the early 1970s as policy analysts took a look at the school level consequences of the Great Society’s sweeping education reforms. The 1965 passage of the Elementary and Secondary Education Act (ESEA), with its support for compensatory education, innovation, strengthened state departments of education, libraries and, subsequently, bilingual education, signaled the substantive involvement of the federal government in local educational activities. ESEA’s comprehensive intergovernmental initiatives meant that implementation no longer was just primarily a management problem, confined to relations between a boss and a subordinate, or an administrator and a teacher, or even to processes within a single institution. Implementation of the Great Society’s education policies stretched across levels of government (from Washington to state capitals to local districts and schools) and across agents of government: legislative, executive, administrative. As federal, state and local officials developed responses to these new education policies, implementation issues were revealed in all their complexity, intractability, and inevitability.
Achieve. (2004). Do graduation tests measure up? A closer look at state high school exit exams: Executive summary. Washington, DC: Author.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Commonwealth of Kentucky. (2009). Commonwealth accountability testing system 2007-08 technical report. Kentucky Department of Education and Measured Progress.
Hong Kong Examinations and Assessment Authority. (2009). School-based assessment: Changing the assessment culture. Retrieved October 1, 2009, from http://www.hkeaa.edu.hk/en/sba/
Jaquith, A. (2009). The creation and use of instructional resources: The puzzle of professional development (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3364072)
Obama, B. (2009, March 10). Remarks by the President to the Hispanic Chamber of Commerce on a complete and competitive American education. Retrieved from http://www.whitehouse.gov/the_press_office/Remarks-of-the-President-to-the-United-States-Hispanic-Chamber-of-Commerce.
Organisation for Economic Co‐operation and Development. (2007). Programme for international student assessment 2006. Paris, France: Author.
Picus, L., Adamson, F., Montague, W., & Owens, M. (2010). A new conceptual framework for analyzing the costs of performance assessment. Stanford, CA: Stanford Center for Opportunity Policy in Education.
Singapore Examinations and Assessment Board. (2009). Project work assessment document. Singapore: Author.
Tucker, M. (2010). An assessment system for the United States: Why not build on the best? Washington, DC: National Center for Education and the Economy.
Pennock-Roman, M., & Rivera, C. (2007). The differential effects of time on accommodated vs. unaccommodated content assessments for English language learners. Houston, TX: Center on Instruction.
Shepard, L., Hammerness, K., Darling-Hammond, L., & Rust, F., with Baratz Snowden, J., Gordon, E., Gutierrez, & Pacheco, A. (2005). Assessment. In L. Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world: What teachers should learn and be able to do (pp. 275-326). San Francisco, CA: Jossey-Bass.