Journal of Applied Measurement (J Appl Meas)

Journal description

Journal of Applied Measurement publishes refereed scholarly work from all academic disciplines that relates to measurement theory and its application to developing variables. The construction and interpretation of meaningful and unambiguous variables is a salient feature of measurement. It represents the congruence of measurement theory and substantive research in a wide range of scientific endeavors. The development of variables that map persons and items onto a common metric, operationally defined by the items, and that are invariant across samples of persons and items is a cornerstone of developing an understanding of the phenomena being measured and of constructing and verifying hypotheses based on these phenomena. The journal will also publish invited articles that provide examples of methodological issues that are relevant to constructing useful variables.

Current impact factor: 0.00

Additional details

5-year impact 0.00
Cited half-life 0.00
Immediacy index 0.00
Eigenfactor 0.00
Article influence 0.00
Other titles Journal of applied measurement
ISSN 1529-7713
OCLC 43888528
Material type Periodical
Document type Journal / Magazine / Newspaper

Publications in this journal

  • ABSTRACT: The most common approach to modelling item discrimination and guessing for multiple-choice questions is the three-parameter logistic (3PL) model. However, proponents of Rasch models generally avoid using the 3PL model because modelling guessing entails sacrificing the distinctive property and advantages of Rasch models. One approach to dealing with guessing based on the application of Rasch models is to omit responses in which guessing appears to play a significant role. However, this approach entails loss of information and it does not account for variable item discrimination. It has been shown, though, that provided specific constraints are met, it is possible to parameterize discrimination while preserving the distinctive property of Rasch models. This article proposes an approach that uses Rasch models to account for guessing on standard multiple-choice items simply by treating it as a source of low item discrimination. Technical considerations are noted, although a detailed examination of such considerations is beyond the scope of this article.
    Journal of applied measurement 06/2015; 16(2):193-203.
  • ABSTRACT: The latest national science framework has formally stated the need for developing assessments that test both students' content knowledge and scientific practices. In response to this call, a science assessment that consists of (a) content items that measure students' understanding of a grade eight physics topic and (b) argumentation items that measure students' argumentation competency has been developed. This paper investigated the function of these content and argumentation items with a multidimensional measurement framework from two perspectives. First, we performed a dimensionality analysis to investigate whether the relationship between the content and argumentation items conformed to the test design. Second, we conducted a differential item functioning analysis in the multidimensional framework to examine whether any content or argumentation item unfairly favored students with an advanced level of English literacy. Methods and findings of this study could inform future research on the validation of assessments measuring higher-order and complex abilities.
    Journal of applied measurement 06/2015; 16(2):171-92.
  • ABSTRACT: Engelhard (1996) proposed a rater accuracy model (RAM) as a means of evaluating rater accuracy in rating data, but very little research exists to determine the efficacy of that model. The RAM requires a transformation of the raw score data to accuracy measures by comparing rater-assigned scores to true scores. Indices computed based on raw scores also exist for measuring rater effects, but these indices ignore deviations of rater-assigned scores from true scores. This paper compares the efficacy of two versions of the RAM (based on dichotomized and polytomized deviations of rater-assigned scores from true scores) with that of two versions of raw score rater effect models (i.e., a Rasch partial credit model, PCM, and a Rasch rating scale model, RSM). Simulated data are used to demonstrate the efficacy with which these four models detect and differentiate three rater effects: severity, centrality, and inaccuracy. Results indicate that the RAMs are able to detect, but not differentiate, rater severity and inaccuracy, and are unable to detect rater centrality. The PCM and RSM, on the other hand, are able to both detect and differentiate all three of these rater effects. However, the RSM and PCM do not take into account true scores and may, therefore, be misleading when pervasive trends exist in the rater-assigned data.
    Journal of applied measurement 06/2015; 16(2):153-60.
  • ABSTRACT: The main aim of this study was to evaluate whether the construct validity of the Tampa Scale for Kinesiophobia (TSK) is consistent with respect to its scaling properties, unidimensionality and targeting among workers with different levels of pain. The 311 participating Danish workers reported kinesiophobia using the TSK (13-statement version) and the number of days with pain during the past year (less than 8 days, less than 90 days and greater than 90 days). A Rasch analysis was used to evaluate the measurement properties of the TSK in the workers across pain levels, ages, genders and ethnicities. The TSK did not fit the Rasch model, but removing one item resolved the misfit. Invariance was found across the pain levels, ages and genders. Thus, with a few modifications, the TSK was shown to capture a unidimensional construct of fear of movement in workers with different pain levels, ages, and genders.
    Journal of applied measurement 06/2015; 16(2):218-27.
  • ABSTRACT: Chi-square statistics are commonly used for tests of fit of measurement models. However, chi-square is sensitive to sample size, which is why several approaches to handling large samples in test-of-fit analyses have been developed. One strategy to handle the sample size problem may be to adjust the sample size in the analysis of fit. An alternative is to adopt a random sample approach. The purpose of this study was to analyze and to compare these two strategies using simulated data. Given an original sample size of 21,000, for reductions of sample sizes down to the order of 5,000 the adjusted sample size function works as well as the random sample approach. In contrast, when applying adjustments to sample sizes of lower order, the adjustment function is less effective at approximating the chi-square value for an actual random sample of the relevant size. Hence, the fit is exaggerated and misfit under-estimated using the adjusted sample size function. Although there are big differences in chi-square values between the two approaches at lower sample sizes, the inferences based on the p-values may be the same.
    Journal of applied measurement 06/2015; 16(2):204-217.
  • ABSTRACT: This study describes how the Programme for International Student Assessment (PISA) can be used to internationally benchmark state performance standards. The process is accomplished in three steps. First, PISA items are embedded in the administration of the state assessment and calibrated on the state scale. Second, the international item calibrations are then used to link the state scale to the PISA scale through common item linking. Third, the statistical linking results are used as part of the state standard setting process to help standard setting panelists determine how high their state standards need to be in order to be internationally competitive. This process was carried out in Delaware, Hawaii, and Oregon, in three subjects (science, mathematics and reading), with initial results reported by Phillips and Jiang (2011). An in-depth discussion of methods and results is reported in this article for one subject (mathematics) and one state (Hawaii).
    Journal of applied measurement 06/2015; 16(2):161-70.
  • ABSTRACT: Gendered language attitudes (GLAs) are gender-based perceptions of language varieties based on connections between gender-related and linguistic characteristics of individuals, including the perception of language varieties as possessing degrees of masculinity and femininity. This study combines substantive theory about language learning and gender with a model based on Rasch measurement theory to explore the psychometric properties of a new measure of GLAs. Findings suggest that GLAs form a unidimensional construct and that the items can be used to describe differences among students in terms of the strength of their GLAs. Implications for research, theory, and practice are discussed. Special emphasis is given to the teaching and learning of languages.
    Journal of applied measurement 01/2015; 16(1):95-112.
  • ABSTRACT: Employers frequently have to select one employee from among numerous candidates. They must evaluate these candidates on multiple criteria, which raises the problem of how to determine the relative importance of these criteria. Traditionally, when hiring a new employee, the employer develops a set of criteria and associated weightings in accordance with the institution's goals. However, the weight setting also reflects the priority of those goals, a point that is frequently ignored. That is to say, it is necessary to recheck whether the weighting set appropriately reflects the priority of the institution's goals. In this research, we propose a mechanism that gives the employer the chance to review the criteria weighting to see whether it adequately satisfies the institution's actual goals. This double-check procedure can further help the employer select appropriate personnel for his or her institution.
    Journal of applied measurement 01/2015; 16(1):82-94.
  • ABSTRACT: English and number literacy are important for successful learning, and testing student literacy and numeracy standards enables early identification and remediation of children who have difficulty. Rasch measures were created with the RUMM2020 computer program for the perceptual constructs of visual discrimination of upper case letters, lower case letters and numbers. Thirty items for Visual Discrimination of Upper Case Letters (VDUCL), 36 for Visual Discrimination of Lower Case Letters (VDLCL) and 20 for Visual Discrimination of Numbers (VDN) were presented to 324 Pre-Primary through Year 4 children, aged 4-9 years. All students attended school in Perth, Western Australia. Eighteen of the original 30 items for VDUCL, 31 of the original 36 items for VDLCL and 13 of the original 20 items for VDN were used to create linear scales (the others were deleted due to misfit), and these clearly showed which letters and numbers children said were easy and which were hard.
    Journal of applied measurement 01/2015; 16(1):24-40.
  • ABSTRACT: This article reports the results of an application of the Rasch rating scale model to the Teaching Strategies GOLD assessment system in a norm sample of children from birth to 71 months of age. The analyses focused on the examination of dimensionality, rating scale effectiveness, the hierarchy of item difficulties, and the relationship of developmental scale scores to child age. Results show that each subscale satisfies the Rasch model for unidimensionality. Ratings were found to be less reliable at the lowest and highest ends of the scale and less distinct at 'In-between' levels. Items appear to form theoretically expected hierarchies, providing evidence of construct validity for the measures. Moderately high correlations of developmental scale scores with child age suggest that teachers are able to make valid ratings of the developmental progress of children across the intended age range.
    Journal of applied measurement 09/2014; 15(4):405-421.
  • ABSTRACT: The model of hierarchical complexity (MHC) provides an analytic a priori measurement of the difficulty of tasks. As part of the theory of measurement in mathematical psychology, the model of hierarchical complexity (Commons and Pekker, 2008) defines a new kind of scale. It is important to note that the orders of hierarchical complexity of tasks are postulated to form an ordinal scale. A formal definition of the model of hierarchical complexity is presented along with descriptions of its five axioms, which help determine how the model of hierarchical complexity orders actions to form a hierarchy. The fourth and the fifth axioms are of particular importance in establishing that the orders of hierarchical complexity form an equally spaced ordinal scale. Previously, it was shown that Rasch-scaled items followed the same sequence as their orders of hierarchical complexity. Here, it is shown that gaps exist between the highest Rasch-scaled item scores at a lower order and the lowest scores at the next higher order. First, we found there was no overlap between the Rasch-scaled item scores at one order of complexity and those of the adjoining orders. There are 'gaps' between the stages of performance on those items. Second, we tested for equal spacing between the orders of hierarchical complexity. We found that the orders of hierarchical complexity were equally spaced. To deviate significantly from the data, the orders had to deviate from linearity by over .25 of an order. This would appear to be an empirical and mathematical confirmation for the equally spaced stages of development.
    Journal of applied measurement 09/2014; 15(4):422-449.
  • ABSTRACT: This research provides a demonstration of the utility of mixture Rasch models (MRMs) for the analysis of survey data. Specifically, a framework based on a mixture partial credit model (MPCM) is presented. MRMs are able to provide information regarding latent classes (subpopulations without manifest grouping variables) and separate item parameter estimates for each of these latent classes. Analyses can provide insight into how a survey scale is functioning and how survey respondents differ from one another. The paper provides a detailed example, using real data from a higher education survey administered to college seniors, that covers all stages of model estimation and selection, description of model results, and follow-up analyses using the MRM results. The analysis identified three distinct classes, and each class is discussed in terms of the pattern of item parameter estimates within the class. The paper also investigates differences in class assignment based on the college the student belongs to on campus.
    Journal of applied measurement 09/2014; 15(4):394-404.