Studies in Educational Evaluation

Published by Elsevier
Print ISSN: 0191-491X
After delineating the major rationale for computer education, the article presents data from Stage 1 of the IEA Computers in Education Study showing international comparisons that may reflect differing priorities. Rapid technological change and the lack of consensus on the goals of computer education impede the establishment of stable curricula for “general computer education” or computer literacy. In this context, constructing instruments for student assessment remains a challenge. Seeking to anticipate and measure what educators will view as the essential computer-related abilities for students in the mid-1990s, the second stage of the IEA Computers in Education Study developed a student assessment instrument grounded in the perspective of “functionality,” that is, the prerequisites students need to function effectively with practical information-related tasks. The threat of test obsolescence, as well as philosophical differences among the experts about the goals of general computer education, challenged traditional test construction procedures. The resulting content objectives and test procedures can serve as guideposts for research and planning in computer education.
 
The article examines the processes and challenges involved in conducting participatory research and evaluation in schools through a case study of a collaborative evaluation study conducted in a secondary school in Israel by two researchers and three teachers. It describes the research team's experiences while conducting the study, the processes that developed during this collaboration, and the kinds of knowledge that emerged over two years of teamwork. It also describes the features of the relationships among the individuals who participated in the study. Implications for evaluation in schools are presented.
 
This article reports findings from a large-scale national study (299 schools; 2761 students) that examined the academic achievement of immigrant students in Israeli schools. It focused on two distinct groups of immigrant students (those from the former USSR and those from Ethiopia), two subject areas (mathematics and academic language, i.e., Hebrew) and three grade levels (5, 9 and 11). The scores of the immigrant students were compared with those of a parallel group of native-born students. The findings demonstrate differences in achievement between the groups. The scores also show that immigrants require a substantial number of years to reach achievement levels similar to those of Israeli-born students in academic subjects: 5–7, 9 or 11 years in mathematics, and 5–7, 8 or 11 years in academic Hebrew, depending on the grade level and the group. The study discusses the implications of using large-scale evaluation of educational achievement for educational policy and evaluation designs.
 
This study describes the development and validation of an admission test, labeled Performance Samples on Academic Tasks in Educational Sciences (PSAT-Ed), designed to assess samples of performance on academic tasks characteristic of those that examinees would eventually encounter in an Educational Sciences program. The test was based on one of Doyle's (1983) categories of academic tasks, namely comprehension tasks. A total of 108 examinees completed the test, which consists of nine comprehension tasks. Factor analysis indicated that the test is essentially unidimensional. Furthermore, generalizability analysis indicated adequate reliability of the pass/fail decisions. Regression analysis then showed that the test significantly predicted later academic performance. The implications of using performance assessments such as PSAT-Ed in admission procedures are discussed.
 
In general, studies on gender and mathematics show that boys' advantage over girls in mathematics achievement has diminished markedly over the last 40 years. Some researchers even argue that gender differences in mathematics achievement are no longer a relevant issue. However, the results of the Trends in International Mathematics and Science Study of 2003 (TIMSS-2003), as well as the participation rates of girls in (advanced) mathematics courses, show that in some countries, such as the Netherlands, gender equity in mathematics is still far from a reality. Research on gender and mathematics is often limited to the relationship between gender differences in attitudes toward mathematics and gender differences in mathematics achievement. In school effectiveness research, theories and empirical evidence emphasize the importance of certain school and class characteristics (e.g., strong educational leadership, a safe and orderly learning climate) for achievement and attitudes. However, little information is available as to whether these factors have the same or a different influence on the achievement of girls and boys. This study used the Dutch data from TIMSS-2003 to explore the relationship between school and class characteristics and the mathematics achievement and attitudes of girls and boys in Grade 4 of primary school. The explorations documented in this paper were guided by a conceptual model of concentric circles and used multilevel analysis. Interaction effects with gender were assessed for each factor that turned out to have a significant effect. The results of these analyses provide additional insight into the influence that non-school-related and school-related factors have on the mathematics achievement and attitudes of girls and boys.
 
During the 1990s, Spain experienced a phenomenon that is somewhat unusual in educational systems: the coexistence of two secondary education models with a single tertiary education model. The overall purpose of this article is to study possible differences in students' academic performance at university according to the model they followed in secondary education. We carried out comparative monitoring of several cohorts of students at the Universities of Barcelona, Oviedo, the Basque Country, Salamanca and Zaragoza (Spain). These institutions are located in regions of differing size and nature and are representative of the Spanish university system as a whole. In all, we analysed the academic performance of nearly 150,000 university students. Our main conclusion is that students' academic performance at university does not differ according to the secondary education system they followed.
 
The studies reviewed here support the proposition that more accurate and informative measures of the home educational environment are possible than the status characteristics typically used. The stability of SES measures and the ease with which such data can be collected have promoted the continued use of static variables as measures of family background. But, as we have indicated in this review, studies have shown that what parents do, rather than who they are, is the more important determinant of the home's influence on the child's achievement. Thus, the emphasis is on process variables in the home, although static variables remain relevant to gaining a complete picture of the home environment and its influence. The home-based intervention studies further support these results and show that process characteristics can be altered. The alterability of home processes is an important factor in moving away from the restrictions of prediction and classification imposed by traditional static measures of the home environment.
 
This study investigates how different types of prior knowledge influence student achievement and how different assessment measures influence the observed effect of prior knowledge. We introduce a model of prior knowledge that distinguishes between types of prior knowledge and uses different assessment measures to assess each type. The sample consists of 202 mathematics students who completed the prior knowledge test during the first lesson. Student achievement was measured by the final course grade. The results indicate that the type of prior knowledge makes a difference: the measures assessing procedural knowledge predicted final grades best, whereas the measures assessing declarative knowledge did not predict final grades. Additionally, previous study success was the best predictor of student achievement. These results are discussed in relation to assessment measures and their implications for practice.
 
Opportunity to learn is considered an important contributing factor in learning outcomes. In some recent international comparative studies of mathematics achievement, such as SIMS and TIMSS, painstaking efforts have been made to determine what opportunities the participating students had had to learn mathematics. However, relating these findings to student outcomes has proved problematic. This article approaches opportunity to learn in three different ways. Among these approaches, an item-based analysis of textbook contents yielded fairly high correlations with student performance at the item level in TIMSS 1999. This implies that even a fairly simple analysis of textbooks can produce valuable information when looking for explanations of student achievement in mathematics.
 
The purpose of the present study was to improve a multivariate multilevel model from the research literature that estimates the consistency in rates of growth between mathematics and science achievement among students and schools. We introduced a new multivariate multilevel model using a latent variable approach. Data from the Longitudinal Study of American Youth (LSAY) provided scores on basic skills, algebra, geometry, and quantitative literacy as indicators of the latent variable mathematics achievement, and scores on biology, physics, and environmental science as indicators of the latent variable science achievement. Using this multivariate multilevel model with latent variables, we examined the relationship between growth in mathematics and science achievement during middle and high school among students and schools, and we demonstrated that such a model was more sensitive to this relationship than the existing model.
 
The topic of teacher credentials and student performance is revisited in an international setting using the TIMSS-99 data. The lack of a consistent positive link between credentials and performance can be explained via three routes: measurement problems with the “teacher quality” input, measurement problems with the “student outcomes”, and the form of the production function assumed to link input and output. Although a small literature focuses on problems in measuring student outcomes and suggests using cognitive achievements rather than test scores, in most cases those cognitive measures are nothing more than mathematics and science scores. This study contributes to the literature by drawing on measurement and psychometric theory to decompose single scores into three categories of cognitive abilities. The hypothesis is that teachers may play a crucial role in developing some student cognitive skills but not others. Using a “rule-space” model, this article identifies three cognitive skills: the process skill, the reading skill and the mathematical thinking skill. The study finds that (1) in general, teacher credentials have no effect on the development of any type of cognitive skill or on the test score, and (2) the within-teacher variance of student performance is much larger than the between-teacher variance in Japan and Korea, whereas the reverse is true in the US and the Netherlands. The phenomenon of “private tutoring” is offered as an explanation for this pattern.
 
Investigations of children's learning contexts have typically shown that, relative to family environments, school measures have weak associations with children's achievements. Much of this research is limited, however, either by an inadequate conceptualization of both contexts or, when one of the environments is assessed with proximal social-psychological variables, by defining the other with gross social indicators. The present study first offers a selective evaluation of prior environmental investigations. It then considers two analyses that attempt to overcome some of the limitations of previous research. In both studies, regression surfaces are plotted to examine relations between school environments and children's achievements at different levels of the family environment, with both environments defined by proximal social-psychological variables.
 
One of the strongest traditions during the past decade of classroom environment research has involved investigation of the predictability of students' cognitive and affective learning outcomes from their perceptions of the psychosocial characteristics of their classrooms. Moreover, numerous research programs involving many thousands of students from various nations have provided convincing and consistent support for the incremental predictive validity of student perceptions in accounting for appreciable amounts of variance in learning outcomes beyond that attributable to initial student characteristics such as pretest performance and general ability (see Walberg, Singh and Rasher, 1977; Haertel, Walberg and Haertel, 1979; Walberg, 1979; Fraser, 1980; Walberg and Haertel's paper in this issue).
 
This paper investigates the usefulness of a verbal protocol approach in examining the underlying construct of a cloze test, i.e., the reasons a test writer had for deleting certain lexical items from a passage to construct a cloze test. The informants were asked to “think aloud” while doing the cloze test. Observing the informants as they verbalised their thoughts revealed inadequacies in this approach. To compensate, retrospective interviews were conducted after the verbal protocols, in which the informants were asked about their choices. Analyses of the informants' think-aloud protocols and retrospections showed that, when thinking aloud, they could not verbalise all the mental processes they used in taking the test. The results nevertheless suggest that verbal protocols are useful instruments for collecting a particular type of data that is inaccessible with other approaches.
 
Large-scale surveys in education have to face non-response issues that might bias the results. Non-response can occur at three levels: (i) a school refuses to participate, (ii) a sampled student fails to participate, and (iii) a participating student refuses to answer a particular question. Until now, school and student non-response have been compensated for by a non-response weight adjustment. This assumes that, within classes, school and student non-respondents have characteristics similar to those of school and student respondents, respectively. This article presents the results of analyses conducted on the Student Tracking Form data of the OECD/PISA 2000 survey. The non-randomness of student absenteeism or refusal is demonstrated. Next, a simulation comparing the relative efficiency of the student weight adjustment with a multiple imputation method is presented; it shows the superiority of the multiple imputation method, in particular for educational systems with small between-school variance. Finally, applying the multiple imputation method to some PISA 2000 countries identifies biases that become substantial in a longitudinal perspective.
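The abstract does not spell out how the weight adjustment is computed. As a rough illustration only, the sketch below implements the generic weighting-class form of non-response adjustment, which encodes the assumption described above: within each adjustment class, respondents' weights are scaled up so that they also represent the non-respondents. The function name `adjust_weights` and the record fields (`cls`, `weight`, `responded`) are hypothetical and are not taken from the PISA 2000 technical procedures.

```python
from collections import defaultdict


def adjust_weights(students):
    """Generic weighting-class non-response adjustment (illustrative sketch).

    students: list of dicts with hypothetical keys
      'cls'       -> adjustment class label
      'weight'    -> base (design) weight
      'responded' -> True if the student participated
    Returns adjusted weights aligned with the input; non-respondents get 0.0.
    """
    total = defaultdict(float)  # sum of base weights of all sampled students, per class
    resp = defaultdict(float)   # sum of base weights of respondents only, per class
    for s in students:
        total[s["cls"]] += s["weight"]
        if s["responded"]:
            resp[s["cls"]] += s["weight"]

    adjusted = []
    for s in students:
        if s["responded"]:
            # Respondents absorb the weight of non-respondents in the same class.
            factor = total[s["cls"]] / resp[s["cls"]]
            adjusted.append(s["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


if __name__ == "__main__":
    sample = [
        {"cls": "A", "weight": 10.0, "responded": True},
        {"cls": "A", "weight": 10.0, "responded": True},
        {"cls": "A", "weight": 10.0, "responded": False},
        {"cls": "B", "weight": 20.0, "responded": True},
        {"cls": "B", "weight": 20.0, "responded": True},
    ]
    print(adjust_weights(sample))  # -> [15.0, 15.0, 0.0, 20.0, 20.0]
```

The adjusted weights preserve each class's weighted total, but respondents simply stand in for non-respondents, so estimates stay biased whenever absentees differ systematically from respondents within a class, which is the non-randomness the article reports.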
 
Data from the Health Behavior in School-Aged Children Study (Currie, Samdal, Boyce, & Smith, 2001) were used to analyze differences in perceptions of educational experiences among over 10,000 sixth to tenth graders of different grades, genders, races and ethnicities. The relationships between students’ evaluations of their school experiences and their perceptions of their achievement were also examined. The results indicated that older students tended to feel more negative about their educational experience than younger students. Male students tended to have more negative attitudes than female students. African American students reported more negative evaluations of their school environment but tended to report liking school more. Perceptions of achievement were significantly associated with liking school and with perceptions of teacher caring.
 
This article uses data collected in Australia in 1970 as part of the International Association for the Evaluation of Educational Achievement (IEA) Six Subject Survey, in a model which posits that affective factors mediate the influence of student background on performance at high school. The data used are those for Population II (students aged between 14 years 0 months and 14 years 11 months) and are augmented by follow-up data on the same students two years later. This second data set was collected by the Australian Council for Educational Research and includes information on students still at school and those who had left school. Only students who were at school at the time of both surveys are included in the analysis. An analysis of the data on school leavers has recently been published by Rosier (1978). As Australia collected data on only one of the six subjects in the survey program, this analysis is limited to performance in that subject, science.
 
Given South Africa's divided past, it is imperative to improve educational outcomes in order to overcome labour market inequalities. Historically white and Indian schools still outperform black and coloured schools in examinations, and intraclass correlation coefficients (rho) reflect far greater between-school variance than in other countries. SACMEQ's rich data sets provide new possibilities for investigating relationships between educational outcomes, socio-economic status (SES), pupil and teacher characteristics, and school resources and processes. Because a different data-generating process applied in affluent, historically white schools (test scores showed bimodal distributions), part of the analysis excluded these schools, which sharply reduced rho. Test scores were regressed on various SES measures and school inputs for the full and reduced samples, using survey regression and hierarchical (multilevel or HLM) models. The results show that poor schools were least able to systematically overcome inherited socio-economic disadvantage. Schools diverged in their ability to convert inputs into outcomes, with large random effects in the HLM models. Outside the richest schools, SES had only a mild impact on test scores, which were quite low in the SACMEQ context.
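For readers unfamiliar with rho: in a two-level model with pupils nested in schools, the intraclass correlation is conventionally defined as the share of total test-score variance that lies between schools. The abstract does not say which estimator was used; the expression below is only the standard definition, with sigma_u^2 denoting the between-school variance and sigma_e^2 the within-school (pupil-level) variance.

```latex
\rho = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}
```

Removing a group of schools whose mean scores lie far above the rest shrinks sigma_u^2 relative to sigma_e^2, which is consistent with the sharp drop in rho reported when the affluent, historically white schools were excluded.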
 
One of the most important components of teacher education is its practical part, the Practicum, and the assessment of candidates’ performance plays a major part in forming the future generation of teachers. Little is known about the extent of agreement between the two main actors in the Practicum, the candidates and the school-based teacher educators (mentors). The aim of this paper is to add information about a rather blurred area of assessment in teacher education. The findings indicate a considerable extent of disagreement between the mentors and the candidates about assessment in the Practicum. It is suggested that, instead of being seen merely as obstacles to valid assessment, these disagreements can be exploited to initiate professional learning for the candidates.
 
This article presents a method for computer-aided tutor evaluation: Bayesian Networks are used to organize the collected data about tutors and to enable accurate estimations and predictions of future tutor behavior. The model indicates each tutor's strengths and weaknesses, which enables the evaluator to exploit those strengths to the benefit of the university and to offer advice to help tutors improve. It also allows the evaluator to form hypotheses about potential tutor approaches and to test the effect of such approaches on the educational process in advance. The article briefly discusses Bayesian Networks and introduces a model that has been used at the Hellenic Open University to aid tutor evaluation.
 
Test anxiety research in West German schools is explored, and the measures currently in use are described. Test anxiety is examined in relation to academic achievement research and theory. Distinctions between state and trait anxiety and between worry and emotionality are discussed. Test anxiety is investigated both in the traditional tripartite school system and in the non-traditional unitary comprehensive school. Results of several longitudinal studies point to socialization effects, as explained by reference group theory, and indicate that these effects are specific to the particular learning environment that students experience in each school system. Future directions of test anxiety research in West German schools are outlined.
 
From this evidence, it is clear that tutors do make assessments of personality during interview, and that this assessment determines selection. In general, their responses on the forms show that this is done with some care. Tutors try to put candidates at ease and give them every opportunity to display their best; for example, a number of forms refer to candidates' initial nervousness, but this made no difference to the final outcome, and ‘nervous’ is only included in the categories if it refers to a specific and persistent quality (e.g., ‘too nervous a disposition’). All rejected candidates had something positive said about them, although this was often qualified and offset by unfavourable judgments. In general, the remarks about rejected candidates are fuller and more individual than those about accepted candidates.
The expressions tutors record on the interview form may differ in detailed wording, but the kinds of qualities laid down by the DES are measured, and the students selected are perceived to have those qualities to a greater degree than those rejected. The insights gleaned from comparing the judgments made at interview with the outcomes of the course indicate that interviewers not only take this responsibility seriously but also carry it out well with respect to subsequent performance in practical teaching.
 
Approaches used to appraise student progress depend on the philosophy of education involved. Each philosophy uniquely determines what learners should acquire.
The testing and measurement movement stresses the use of predetermined objectives written in measurable terms. The objectives are written before learners are instructed. Given appropriate learning opportunities, a student either does or does not achieve one or more precise objectives. Measuring student progress against the stated objectives embodies the concept of criterion-referenced testing (CRT).
The testing and measurement movement also advocates norm-referenced tests (NRT). Students are spread out on a continuum from highest to lowest based on test scores. Predetermined objectives tend not to exist when norm-referenced tests are used to measure student achievement. Norm-referenced tests spread students' scores much more widely than criterion-referenced tests do. With CRTs, students attempt to attain predetermined objectives. The measurably stated objectives represent absolute standards, and a large number of students may well achieve them, as the teacher usually intends.
Self-evaluation by the student is an approach opposite to the appraisal of learner progress advocated by the testing and measurement movement. With self-evaluation, responsibility rests with learners themselves to acknowledge their strengths, their weaknesses, and the modifications needed to achieve at a higher level. Learners evaluating themselves need to perceive the processes and products they have completed from the frame of reference of personal improvement. The truth resulting from evaluation may well reside within the student. Subjectivity in the results is to be expected, since open-ended criteria are used to appraise progress, and objective tests will tend not to be used to ascertain progress.
With self-evaluation, the student may well perceive increased purpose in self-assessment. The teacher is a stimulator and initiator who guides the self-evaluation process.
Idealism advocates students' achievement in mental development. Mental maturity is prized more highly here than affective and psychomotor objectives; the affective dimension matters only insofar as it helps learners achieve well academically and intellectually. Students' attainment of vital concepts and generalizations is of utmost importance to an idealist. To appraise learner progress effectively, the teacher must evaluate student growth in acquiring worthwhile subject matter consisting of vital broad ideas. Discussions and essay tests, in particular, help the idealist teacher determine student acquisition of subject matter.
Experimentalists depend basically on teacher observation to evaluate student progress. The experimentalist teacher evaluates students in life-like situations in which they select information and solve problems. Tentative hypotheses attempt to provide answers to identified problems. Since each hypothesis is tested within a social context, modifications or revisions may need to be made.
Perennialism is a philosophy of conservation that rejects the continually changing environment identified and defined by experimentalists. The great ideas of past thinkers provide the subject matter. The abstract and academic are preferred to the concrete and practical. Transitory ideas from the past have no place in a perennialist's curriculum; rather, content must remain salient, vital, and significant as decades and centuries pass. Endurance over time and across diverse geographical regions characterizes what is classic. Perennialists emphasize the liberal arts and general education for all. Preparation for jobs, careers, and the professions has no place in such a curriculum.
Vocational needs are not to be emphasized at the elementary, high school, or baccalaureate level, but only at the graduate level, which should prepare the student for a niche in the world of work. Before that, however, students should acquire common learning in the form of a liberal arts education.
Liberal education, which is non-vocational, needs to emphasize as its objective of instruction the development of the mind towards maturity, so that the great ideas of the past may be understood and accepted. The classics provide the intellectual system of subject matter offered to students. The content needs to be intellectually challenging and to enable retention of major concepts and generalizations.
 
Table: t-test results for low- and high-confidence pupils
Pupils’ attitudes influence both learning and teaching processes and affect the way pupils will engage with art as adults. This article introduces an attitude scale, the Attitude Scale for Art Experienced in School (ASAES), which comprises four subscales: enjoyment, confidence, usefulness, and support. A three-step procedure was followed to construct and validate the scale, which was administered to 420 primary school pupils in Cyprus. The scale's psychometric properties are evaluated through confirmatory factor analysis. The findings indicate that teachers’ art specialisation and attitudes towards art teaching, pupils’ perceived competence, and pupils’ gender are three important variables that influence the formation of pupils’ attitudes. Important interactions between these variables are also reported.
 
Pupil monitoring systems support the teacher in tailoring teaching to the individual level of a student and in comparing the progress and results of teaching with national standards. The systems are based on the availability of an item bank calibrated using item response theory. The assessment of the students’ progress and results can be further supported by using computerized adaptive testing where the items selected from the item bank are targeted at the specific ability level of the student. The present article discusses psychometric issues of pupil monitoring systems, such as ability estimation, the optimal construction of tests from the item bank and monitoring of progress.
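The abstract does not give the article's own item selection procedure. As a minimal sketch of the targeting step in computerized adaptive testing, the code below assumes a Rasch-calibrated item bank (one difficulty parameter per item) and picks the unused item with maximum Fisher information at the current ability estimate, a common but by no means the only selection rule. The item bank contents and the function names are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def select_next_item(theta_hat, item_bank, administered):
    """Return the unused item with maximum information at theta_hat.

    item_bank: dict mapping item id -> Rasch difficulty b (hypothetical bank)
    administered: set of item ids already given to this student
    """
    candidates = [i for i in item_bank if i not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: item_information(theta_hat, item_bank[i]))


if __name__ == "__main__":
    bank = {"easy": -2.0, "medium": 0.1, "hard": 1.5}  # hypothetical difficulties
    print(select_next_item(0.0, bank, set()))  # -> medium
```

Under the Rasch model an item's information P(1 - P) peaks when its difficulty equals the ability estimate, so this rule amounts to choosing the remaining item whose difficulty lies closest to the student's current estimate; in a full system the estimate would be updated after each response before the next selection.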
 
Two papers recently published in this journal claim effectiveness for certain training programs (development of the social abilities and skills of disadvantaged pupils in Israeli youth villages; change in prejudices toward people of other races in a West German integrated comprehensive school) on the basis of certain changes between sets of variables. Some scepticism about the authors' conclusions seems in order, however. Detailed reanalyses of the data were therefore performed using an explicitly structurally oriented approach via target analysis; the results contradict those reached by the authors.
 
Cognitive styles characterize individuals' personalities as well as their social and cognitive functioning. An assessment of cognitive styles extends the appraisal of mental performance beyond levels of achievement to patterns of cognitive functioning, and can thus add another dimension of individual differences in children as a basis for planning and evaluating early childhood programs.
 
The benefits to be gained from evaluating motor development in young children far outweigh the problems and pitfalls. As information accumulates and is replicated, a more accurate picture of children's motor development is being formed. Research efforts that combine longitudinal and cross-sectional approaches, that focus on both form and performance, and that recognize the need for patient, unbiased data collection are making significant contributions to our understanding of motor development.
 
Numerous factors affect classroom instruction, yet the teacher is recognized as having the greatest influence on program success. But how is teacher effectiveness measured? On what standards is a teacher judged? Whatever the specific answers to these questions, one thing is certain: any system of teacher evaluation requires a detailed description of teacher performance. This paper deals with a technique for examining teacher effectiveness as a function of behaviors exhibited in the classroom. The technique applies pattern-analytic procedures to interaction data and provides detailed profiles of teacher behavior that are useful in training programs, research, and teacher assessment studies. Following a discussion of the technique, selected data from a junior high school science assessment study are presented to illustrate the application of pattern-analytic procedures.
 
High-quality assessment practice is expected to yield valid and useful score-based interpretations of what examinees know and are able to do with respect to a defined target domain. Building on this assertion, the article presents a framework based on the “unified view of validity” advanced by Cronbach and Messick over two decades ago, to help build an evidence-based argument regarding the quality of a given assessment practice. The framework encompasses ten sources of evidence pertaining to six aspects: content, structure, sampling, contextual influences, score production, and utility. Each source is addressed in terms of the kinds of evidence that can be accumulated to support the quality argument and to refute rival hypotheses about systematic and unsystematic errors that can bias score-based interpretations. Methods and tools for obtaining the evidence are described, and a sample of guiding questions for planning an assessment evaluation is presented in the concluding section.
 
Peer assessment can be a valuable learning tool in teacher education because it helps student teachers acquire skills that are essential in their professional working life. This article presents a conceptual framework in which the training of peer assessment skills by means of peer assessment tasks is integrated into teacher education courses. Theories about constructive alignment, student involvement, instructional design, and performance assessment underlie the framework. Furthermore, two recently published empirical studies are briefly described to provide empirical support for the value of the framework. Results of these studies show that the framework offers powerful guidelines for the design and integration of peer assessment activities in teacher training courses. In general, the peer assessment tasks embedded in the courses led to an improvement in students' peer assessment skills as well as in their task performance in the domain of the course. Implications for course and curriculum design are discussed.
 
This article argues that a probabilistic interpretation of competence can provide the basis for a link between assessment, teaching and learning, curriculum resources, and policy development. Competence is regarded as a way of interpreting the quality of performance on a coherent series of hierarchical tasks. The work of Glaser is combined with that of Rasch and Vygotsky. When assessment performance is reported in terms of competence levels, the score is simply a code for a level of development and helps to identify Vygotsky's zone of proximal development, where the student is ready to learn.
 
This study investigates the effect of assessment method on student performance. Each of five research conditions is paired with one of four assessment modes, namely portfolio, case-based, peer assessment, and multiple-choice evaluation. Data were collected with a pre-test/post-test design using two standardised tests (N = 816). Results show that assessment method does make a difference: assessments do not produce overall effects on student performance. Moreover, student-activating instructional efforts do not automatically result in greater learning gains. Finally, compared with the other assessments, the multiple-choice test had a statistically significant positive effect on students' test scores. However, students' level of preparation and the closed-book format of the tests might explain this result.
 
This study examines how different stakeholders experience the quality of a nationally developed assessment framework for summative, competence-based assessment (CBA) in AVET, which aims to reflect the theoretical characteristics of high-quality CBAs. The quality of two summative CBAs based on this national framework is evaluated against an extensive, validated set of quality criteria for CBA evaluation, with the involvement of key stakeholders (i.e., students, teachers, developers, and employers). By triangulating the quantitative and qualitative evaluations and arguments of key stakeholders, this study gives insight into the processes and characteristics that determine CBA quality in VET educational practice in relation to theoretical notions of high-quality CBAs. Results support many theoretical characteristics and refine them for achieving quality in actual assessment practice. Strikingly, developers and teachers are more critical of the assessment quality than students and employers. The discussion reflects on the theoretical CBA characteristics in light of the empirical findings and derives practical implications for the national assessment framework as well as for other summative CBAs in VET.
 
Learning outcomes are statements of intended learning within a module. In practice, students may consider various options when undertaking assessments. They may feel they can meet the stated learning outcomes by demonstrating other aspects of study, which are termed distractions. Thirty-three undergraduate students constructed scientific posters and completed questionnaires before and on the day of poster submission. Additionally, ten of these students were interviewed. Analysis of the questionnaire data using principal components analysis, and of the interview data by clustering units of relevant meaning, revealed that students did not differentiate between learning outcomes and distractions, both of which steered their learning during the assessment. These results are discussed in the context of assessment tasks, and implications for practice are considered.
 
As assessment methods change, the way their quality is determined needs to change accordingly. This article argues for the use of Competence Assessment Programs (CAPs): combinations of traditional tests and new assessment methods that involve both formative and summative assessments. To assist schools in evaluating their CAPs, a self-evaluation procedure was developed, based on 12 quality criteria for CAPs developed in earlier studies. Self-evaluation was chosen because it is increasingly used as an alternative to external evaluation. The CAP self-evaluation is carried out by a group of functionaries from the same school and comprises individual self-evaluations and a group interview. The CAP is rated on the 12 quality criteria, and evidence is requested to support each rating. In this study, three functionaries from each of eight schools (N = 24) evaluated their CAP using the self-evaluation procedure. Results show that the group interview was very important, because it brought the different perspectives on the CAP together into an overall picture of the CAP's quality. Schools seem to rely mainly on personal experiences to support their ratings and need support in carrying out a self-evaluation.
 
The paper reports the results of three studies that used a formative assessment (FA) framework to compare schools that vary in their level of functioning as professional learning communities with respect to three processes: classroom assessment (study 1), development and implementation of school-based curriculum (study 2), and pedagogical conversations at teachers' lounge professional meetings (study 3). When performed at their best, these three are inquiry processes that follow the phases of formative assessment cycles. Results supported the conclusion that school-based professional learning communities (SBPLC) make a difference in the FA practices enacted at both the classroom and the organizational level. Moreover, classroom culture seems to mirror organizational culture: attributes considered to enhance FA practice are clearly more noticeable in the high SBPLC group than in the low one.
 
The collection of which this article is a part is entitled “Functions of Assessment in Teacher Education.” The other papers in this set draw our attention to multiple functions of assessment including peer evaluation, promoting reflection, use of technology in demonstrating teacher competence, and validating new national standards of excellence among veteran teachers. Our contribution to the conversation is a modest one — a simple heuristic device that we have found useful in designing and administering assessments in teacher education programs in the USA. Our purpose is to give a brief description of the heuristic device and the assumptions that support it and then to illustrate how we have made use of it in our work as teacher educators.
 
A Comparison Between the Characteristics of Project-Based Education and the Student Research Course
Because of increasing concern about quality in higher education, more attention is being paid to new educational principles with a more student- and competence-centred vision. Project-based education is one of the learning environments congruent with these principles. Ideally, students in this learning environment are assessed with suitable assessment modes, such as peer assessment and co-assessment. This article critically examines this apparently self-evident assumption, focusing on how instructors and students perceive project-based learning environments and group-based assessment methods. The main conclusion is that it seems very difficult to create a complete assessment procedure in which the assessment expectations of both parties are met. This is due to crucial contradictions in their opinions about assessment in project-based education.
 
Accountability-oriented reforms demand immediate and continual gains on achievement tests, for all students, without diminishing other outcomes or undermining instruction. This paper describes a framework for aligning classroom assessment and external testing with the aim of reconciling these seemingly contradictory goals. The framework varies the sensitivity to instruction and the representations of knowledge across approaches to assessment. Cycles of design-based studies successively refine the relationships between a curriculum and the frame that each assessment provides. Doing so, we argue, leverages the unique formative and summative balance across assessments in order to scaffold learning and demonstrate the "consequential" validity of our strategy without compromising curricula, instruction, or the "evidential" validity that warrants their continued use.
 
The study aimed to determine whether students in a redesigned course, first, held different perceptions of the assessment demands and, second, adjusted their learning strategies towards deeper learning. Contrary to expectations, the students in the original assignment-based (ABL) course (n = 406 students) adopted more deep-learning strategies and fewer surface-learning strategies than the students in the problem-based (PBL) course (n = 312 students). Although both course format and assessment clearly differed between the two conditions, this did not result in different perceptions of the assessment demands. Additionally, the results clearly show that students who expressed the intention to employ a certain learning strategy perceived the assessment demands accordingly and actually employed a related learning strategy.
 
This article explores the role of voluntary quality assessments in higher education, underscoring their main features and potentialities, and showing that basic principles and guidelines can be customised in different countries or single institutions. The article extensively presents an assessment project developed at the University of Siena, the first integrated assessment of teaching and research quality of a whole university carried out in Italy. The article clearly highlights the relevance of quality assurance and performance measurement systems and the behavioural and organisational impacts they may have.
 
In 1989, the IEA Computers in Education study collected data on computer use in elementary, lower secondary, and upper secondary schools in 22 educational systems. The data collection included attitude measures for principals of computer-using as well as non-using schools, and for teachers of computer education courses and teachers of existing subjects. The latter group consisted of computer-using as well as non-computer-using teachers of mathematics, science, and mother tongue. This article raises the question of the extent to which attitudes play a role in the process of integrating computers into the existing curriculum. The article focuses mainly on lower secondary schools in 14 countries and shows that the attitudes of principals and of teachers of existing subjects vary greatly, both between and within countries. Moreover, the data show that attitudes covary in a meaningful way with the extent to which computers are used by computer-using teachers.
 
Special education has seen rapid and extensive changes over the last decade in many countries. Even where a legislative mandate has been introduced (as with P.L. 94–142 in the United States), evaluation has developed more slowly than the provision of services. For example, the comprehensive volume by Kauffman and Hallahan (1981) contains no index entry for evaluation. Furthermore, it will be argued here that contemporary emphases in educational evaluation have not addressed certain values that are critical for special education.
 
In many universities authorship credit plays an important role in academic decision-making, such as for tenure and promotion. The purpose of this study was to identify factors that influence the placement of names in coauthored works in specific education-related journals, and to identify perceived benefits of single and coauthored publications. Results indicate that both contribution amount and idea origination were typically used to determine name placement, but respondents also noted that authorship credit was assigned based upon other criteria, such as seniority and assistance to colleagues. A number of benefits for both sole and coauthored publications were also found.
 
This article discusses the positive and negative consequences of large-scale testing for five key stakeholders in testing results: students, teachers, administrators, policymakers, and parents. The factors that affect the nature of testing consequences are also discussed, and possible remedies for the associated pitfalls are proposed.
 
This article compares the invariance properties of two methods of psychometric instrument calibration for the development of a measure of wealth among families of Grade 5 pupils in five provinces in Vietnam. The measure is based on self-reported lists of possessions in the home, and its stability has been measured over two time periods. The concept of fundamental measurement and the properties of construct and measurement invariance are outlined. Item response modelling (IRM) and confirmatory factor modelling (CFM) are discussed as comparative methodologies, together with the processes used to evaluate them. Each procedure was used to calibrate a 23-item instrument with data collected from a probability sample of Grade 5 pupils in a total of 60 schools. The two procedures were compared on their capacity to provide evidence of construct and measurement invariance, stability of parameter estimates, and bias for or against subsamples, as well as on the simplicity of the procedures and their interpretive power. Both provided convincing evidence of construct invariance, but only the Rasch procedure was able to provide firm evidence of measurement invariance, parameter stability, and a lack of bias across samples.
 
This article focuses on eight evaluation studies conducted in one school district in California during a period of 25 months (July 1999 to July 2001). The district enrolls close to 30,000 students in kindergarten through Grade 8. Three goals guide this article, as follows: (1) to highlight the studies, (2) to examine them in relation to educational administration, and (3) to derive lessons and implications from the analysis.
 
Recent decades have witnessed a remarkable rise in the regulation of public services and servants, education being a case in point. External evaluation and inspection have been an important element of this trend. Increasingly, however, as the limitations of external surveillance systems have become clear, the concept of internal or self-evaluation has grown in importance. This paper explores the concept of self-evaluation in education and gives an account of some of the possibilities and problems associated with it. In particular, it is argued that enabling individual schools and teachers to self-evaluate effectively is a complex task that will require help and support from the community of professional evaluators.
 
A number of strategies relating to the evaluation of preschool programs have been changing. One change has been a move away from evaluating service delivery systems and child development outcomes independently of one another. A second area of change has been from evaluating the impact of a particular intervention on an individual child to evaluating the impact of an intervention that is part of a dynamic (ecological) setting on a child who is within that setting. A third change has been from evaluating simply in terms of child development outcomes to evaluating both outcomes and those aspects of a child development program that can be manipulated or controlled by policy and management decisions. These changes are not mutually exclusive, and indeed a good deal of overlap will be evident. However, the approaches differ sufficiently to warrant separate treatment. Each of these areas of change is described largely within the context of the programs and concerns of the Administration for Children, Youth and Families (ACYF).
 
Top-cited authors
Filip Dochy
  • KU Leuven
Dylan Wiliam
  • University College London
Richard M. Clifford
  • University of North Carolina at Chapel Hill
Mien R. Segers
  • Maastricht University School of Business and Economics
Kari Smith
  • Norwegian University of Science and Technology