Educational measurement: Prospects for research and innovation

December 1988


1 Read


2 Citations

The Australian Educational Researcher

Geofferey N. Masters

Reforming the Assessment of Student Achievement in the Senior Secondary School

November 1988


10 Reads


10 Citations

Australian Journal of Education

The challenge that confronts agencies responsible for assessment and reporting in the senior secondary school is to extend systematic assessment procedures to a broader range of learning outcomes than those currently assessed by public examinations, to develop methods of reporting which are more descriptive of individual achievement and which provide a better basis for describing and maintaining standards, and to provide results which are sufficiently comparable across schools to enable fair comparisons of applicants for tertiary study. Some recent developments in assessment and reporting practice are considered with a view to identifying methods and approaches capable of satisfying this diverse set of demands. An approach which is particularly appealing because of its potential to provide simultaneously more descriptive reports of student achievement and adequate levels of comparability is the use of a set of common assessment tasks attempted by all students enrolled in each Year 12 course of study.

The Analysis of Partial Credit Scoring

October 1988


311 Reads


86 Citations

Applied Measurement in Education

This article discusses a range of issues in the practical application of an item response theory (IRT) method for partial credit scoring. After a brief discussion of partial credit scoring as an alternative to right-wrong scoring in the measurement of educational achievement, an IRT model for partial credit analysis is developed and described. This model is presented as a straightforward and logical application of Rasch's dichotomous model to a sequence of ordered response alternatives. The distinctive nature of the item parameters in the model is described and these parameters are contrasted with two more familiar sets of parameters: Thurstone thresholds and the difficulties of dichotomously scored subitems. Issues in marking out and interpreting variables using this model are discussed. Brief mention is made of several special cases of the partial credit model that may be useful in particular applications and for particular kinds of test and questionnaire data.
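The relationship between the partial credit model and Rasch's dichotomous model can be illustrated with a short sketch. It uses the usual statement of the partial credit model, in which the probability of scoring x on an item depends on the cumulative sum of (theta - delta_k) over the steps completed. The function names and the example values are illustrative assumptions, not taken from the article.

```python
import math


def pcm_probabilities(theta, deltas):
    """Probabilities of scores 0..m on one item under the partial credit model.

    theta  : person ability in logits (illustrative name)
    deltas : step parameters delta_1..delta_m for the item (illustrative name)
    """
    # Cumulative sums of (theta - delta_k); the term for score 0 is defined as 0.
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def rasch_probability(theta, delta):
    """Probability of success under Rasch's dichotomous model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


if __name__ == "__main__":
    # Hypothetical three-step item and person ability.
    print(pcm_probabilities(0.5, [-1.0, 0.0, 1.2]))
    # With a single step, the model reduces to the dichotomous Rasch model.
    print(pcm_probabilities(0.3, [-0.4])[1], rasch_probability(0.3, -0.4))
```

With a single step parameter the two printed values in the last line agree, which is the sense in which the dichotomous model is a special case.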

Anchor Tests, Score Equating and Sex Bias

April 1988


4 Reads


2 Citations

Australian Journal of Education

This paper discusses the use of anchor tests (scaling tests) to bring two or more sets of scores to a common scale. Particular attention is given to the rescaling of school-based assessments against an external test or examination and to potential sources of bias in this procedure. The need for routine validity checks is emphasized, and a latent trait approach to constructing a statistical framework for tests and examination score equating is described and illustrated. Bias caused by rescaling school assessments against an inappropriate anchor test is illustrated using a 1984 attempt to rescale students' assessments in English against the Australian Scholastic Aptitude Test.

Item Discrimination: When More Is Worse

March 1988


57 Reads


116 Citations

Journal of Educational Measurement

High item discrimination can be a symptom of a special kind of measurement disturbance introduced by an item that gives persons of high ability a special advantage over and above their higher abilities. This type of disturbance, which can be interpreted as a form of item “bias,” can be encouraged by methods that routinely interpret highly discriminating items as the “best” items on a test and may be compounded by procedures that weight items by their discrimination. The type of measurement disturbance described and illustrated in this paper occurs when an item is sensitive to individual differences on a second, undesired dimension that is positively correlated with the variable intended to be measured. Possible secondary influences of this type include opportunity to learn, opportunity to answer, and test wiseness.

Measurement Models for Ordered Response Categories

January 1988


11 Reads


40 Citations

Quantitative educational research depends on the availability of carefully constructed variables. The construction and use of a variable begin with the idea of a single dimension or line on which students can be compared and along which progress can be monitored. This idea is operationalized by inventing items intended as indicators of this latent variable and using these items to elicit observations from which students’ positions on the variable might be inferred.

Banking Non-Dichotomously Scored Items

December 1986


39 Reads


18 Citations

Applied Psychological Measurement

A method for constructing a bank of items scored in two or more ordered response categories is described and illustrated. This method enables multistep problems, rating scale items, question "clusters," and other items using partial credit scoring to be calibrated and incorporated into an item bank, and it provides a mechanism for computer adaptive testing with items of this type. Procedures are described for calibrating an initial set of items, for testing the fit of items to the underlying measurement model, and for linking new items to an existing item bank. The method is illustrated using items from the Watson-Glaser Critical Thinking Appraisal.

A Comparison of Latent Trait and Latent Class Analyses of Likert-Type Data

March 1985


39 Reads


44 Citations


Common-Person Equating with the Rasch Model

March 1985


40 Reads


35 Citations

Applied Psychological Measurement

Two procedures, one based on item difficulties, the other based on person abilities, were used to equate 14 forms of a reading comprehension test using the Rasch model. These forms had no items in common. For practical purposes, the two procedures produced equivalent results. An advantage of common-person equating for testing the unidimensionality assumption is pointed out, and the need for caution in interpreting tests of common-item invariance is stressed.
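One simple way to picture equating through common persons is sketched below. It assumes that the same persons took both forms, that their abilities were estimated separately on each form in logits, and that the forms differ only by a shift of origin. The mean-difference translation and all names here are illustrative assumptions, not necessarily the procedure used in the paper.

```python
from statistics import mean


def common_person_shift(abilities_form_a, abilities_form_b):
    """Translation constant that places form B measures on the form A scale.

    Both lists hold ability estimates (in logits) for the same persons,
    in the same order, estimated separately from each form.
    """
    if not abilities_form_a or len(abilities_form_a) != len(abilities_form_b):
        raise ValueError("Both forms need estimates for the same, non-empty set of persons.")
    # Under a pure shift of origin, the difference in mean ability is the link.
    return mean(abilities_form_a) - mean(abilities_form_b)


def to_form_a_scale(measures_form_b, shift):
    """Apply the translation constant to measures expressed on the form B scale."""
    return [m + shift for m in measures_form_b]


if __name__ == "__main__":
    # Hypothetical ability estimates for three common persons.
    form_a = [0.5, 1.0, -0.2]
    form_b = [0.1, 0.6, -0.6]
    shift = common_person_shift(form_a, form_b)
    print(shift)
    print(to_form_a_scale(form_b, shift))
```

A plot of the paired ability estimates against each other gives a direct check on the assumption that both forms measure the same variable, since the points should scatter around a line of unit slope.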

The Essential Process in a Family of Measurement Models

December 1984


9 Reads


179 Citations


