Journal of applied measurement (J Appl Meas)

Journal description

Journal of Applied Measurement publishes refereed scholarly work from all academic disciplines that relates to measurement theory and its application to developing variables. The construction and interpretation of meaningful and unambiguous variables is a salient feature of measurement. It represents the congruence of measurement theory and substantive research in a wide range of scientific endeavors. The development of variables that map persons and items onto a common metric, operationally defined by the items and invariant across samples of persons and items, is a cornerstone of developing an understanding of the phenomena being measured and of constructing and verifying hypotheses based on these phenomena. The journal will also publish invited articles that provide examples of methodological issues that are relevant to constructing useful variables.

Current impact factor: 0.00

Additional details

5-year impact: 0.00
Cited half-life: 0.00
Immediacy index: 0.00
Eigenfactor: 0.00
Article influence: 0.00
Website: Journal of Applied Measurement website
Other titles: Journal of applied measurement
ISSN: 1529-7713
OCLC: 43888528
Material type: Periodical
Document type: Journal / Magazine / Newspaper

Publications in this journal

  •
    ABSTRACT: Chi-square statistics are commonly used for tests of fit of measurement models. Chi-square is also sensitive to sample size, which is why several approaches to handling large samples in test-of-fit analysis have been developed. One strategy to handle the sample size problem may be to adjust the sample size in the analysis of fit. An alternative is to adopt a random sample approach. The purpose of this study was to analyze and compare these two strategies using simulated data. Given an original sample size of 21,000, for reductions of sample size down to the order of 5,000 the adjusted sample size function works as well as the random sample approach. In contrast, when adjustments are applied to sample sizes of lower order, the adjustment function is less effective at approximating the chi-square value for an actual random sample of the relevant size. Hence, the fit is exaggerated and misfit under-estimated using the adjusted sample size function. Although there are big differences in chi-square values between the two approaches at lower sample sizes, the inferences based on the p-values may be the same.
    Journal of applied measurement 06/2015; 16(2):204-217.
  •
    ABSTRACT: Gendered language attitudes (GLAs) are gender-based perceptions of language varieties based on connections between gender-related and linguistic characteristics of individuals, including the perception of language varieties as possessing degrees of masculinity and femininity. This study combines substantive theory about language learning and gender with a model based on Rasch measurement theory to explore the psychometric properties of a new measure of GLAs. Findings suggest that GLAs form a unidimensional construct and that the items can be used to describe differences among students in terms of the strength of their GLAs. Implications for research, theory, and practice are discussed. Special emphasis is given to the teaching and learning of languages.
    Journal of applied measurement 01/2015; 16(1):95-112.
  •
    ABSTRACT: English and number literacy are important for successful learning, and testing student literacy and numeracy standards enables early identification and remediation of children who have difficulty. Rasch measures were created with the RUMM2020 computer program for the perceptual constructs of visual discrimination of upper case letters, lower case letters and numbers. Thirty items for Visual Discrimination of Upper Case Letters (VDUCL), 36 for Visual Discrimination of Lower Case Letters (VDLCL) and 20 for Visual Discrimination of Numbers (VDN) were presented to 324 Pre-Primary through Year 4 children, aged 4-9 years. All students attended school in Perth, Western Australia. Eighteen of the original 30 items for VDUCL, 31 of the original 36 items for VDLCL and 13 of the original 20 items for VDN were used to create linear scales (the others were deleted due to misfit), and these clearly showed which letters and numbers children said were easy and which were hard.
    Journal of applied measurement 01/2015; 16(1):24-40.
  •
    ABSTRACT: Employers frequently select an employee from among numerous candidates. They have to evaluate these candidates by multiple criteria, which raises the problem of how to determine the relative importance of these criteria. Traditionally, when engaging a new employee, the employer develops a set of criteria and their associated weightings in accordance with the institution's goals. However, the weight setting also reflects the priority of those goals, a fact that is frequently ignored. That is to say, it is necessary to recheck whether the weighting set appropriately reflects the priority of the institution's goals. In this research, we propose a mechanism for reviewing the criteria weighting to see whether it adequately satisfies the institution's actual goals. This double-check procedure can further help the employer select appropriate personnel for his or her institution.
    Journal of applied measurement 01/2015; 16(1):82-94.
  •
    ABSTRACT: This article reports the results of an application of the Rasch rating scale model to the Teaching Strategies GOLD assessment system in a norm sample of children aged birth to 71 months. The analyses focused on the examination of dimensionality, rating scale effectiveness, the hierarchy of item difficulties, and the relationship of developmental scale scores to child age. Results show that each subscale satisfies the Rasch model for unidimensionality. Ratings were found to be less reliable at the lowest and highest ends of the scale and less distinct at 'In-between' levels. Items appear to form theoretically expected hierarchies, supporting evidence for construct validity for the measures. Moderately high correlations of developmental scale scores with child age suggest that teachers are able to make valid ratings of the developmental progress of children across the intended age range.
    Journal of applied measurement 01/2014; 15(4):405-421.
  •
    ABSTRACT: The assessment of differential item functioning (DIF) remains an area of active research in psychometrics and educational measurement. In recent years, methodological innovations involving mixture Rasch models have provided researchers with an additional set of tools for more deeply understanding the root causes of DIF, while at the same time increased interest in the role of disabilities and accommodations has also made itself felt in the measurement community. The current study furthered work in both areas by using the newly described multilevel mixture Rasch model to investigate the presence of DIF associated with disability and accommodation status at both examinee and school levels for a 3rd grade language assessment. The results indicated that DIF was indeed present at both levels of analysis, and that it was associated with the presence of disabilities and the receipt of accommodations. Implications of these results for both practitioners and researchers are discussed.
    Journal of applied measurement 01/2014; 15(2):133-151.
  •
    ABSTRACT: Automatic item generation (AIG) is a broad class of methods that are being developed to address psychometric issues arising from internet and computer-based testing. In general, issues emphasize efficiency, validity, and diagnostic usefulness of large scale mental testing. Rapid prominence of AIG methods and their implicit perspective on mental testing is bringing painful scrutiny to many sacred psychometric assumptions. This report reviews basic AIG ideas, then presents conceptual foundations, image model development, and operational application to artistic judgment aptitude testing.
    Journal of applied measurement 01/2014; 15(1):1-25.
  •
    ABSTRACT: This research provides a demonstration of the utility of mixture Rasch models (MRMs) for the analysis of survey data. Specifically, a framework based on a mixture partial credit model (MPCM) is presented. MRMs are able to provide information regarding latent classes (subpopulations without manifest grouping variables) and separate item parameter estimates for each of these latent classes. Analyses can provide insight into how a survey scale is functioning and how survey respondents differ from one another. The paper provides a detailed example with real survey data from a higher education survey administered to college seniors, covering all stages of model estimation and selection, description of model results, and follow-up analyses using the MRM results. The analysis identified three distinct classes, and each class is discussed in terms of the pattern of item parameter estimates within the class. The paper also investigates differences in class assignment based on the college the student belongs to on campus.
    Journal of applied measurement 01/2014; 15(4):394-404.
  •
    ABSTRACT: A large body of literature exists describing how rater effects may be detected in rating data. In this study, we compared the flag and agreement rates for several rater effects based on calibration of a real data set under two psychometric models: the Rasch rating scale model (RSM) and the Rasch testlet-based rater bundle model (RBM). The results show that the RBM provided more accurate diagnoses of rater severity and leniency than did the RSM, which is based on the local independence assumption. However, the statistical indicators associated with rater centrality and inaccuracy remained consistent between these two models.
    Journal of applied measurement 01/2014; 15(2):152-159.
  •
    ABSTRACT: The purpose of this study was to develop a general procedure for evaluation of a dynamic assessment and to demonstrate an analysis of a dynamic assessment, the CITM (Tzuriel, 1995b), as an objective measure for use as a group assessment. The techniques used to determine the fit of the CITM to a Rasch partial credit model are explicitly outlined. A modified format of the CITM was administered to 266 diverse second grade students in the USA; 58% of participants were identified as low SES. The participants (males n = 144) were White Anglo and Latino American students (55%), many of whom were first generation Mexican immigrants. The CITM was found to adequately fit a Rasch partial credit model (PCM), indicating that the CITM is a likely candidate for a group-administered dynamic assessment that can be measured objectively. The data also supported the feasibility of a model for objectively measuring change in learning ability for inferential thinking in the CITM.
    Journal of applied measurement 01/2014; 15(1):40-52.
  •
    ABSTRACT: This study assessed the ability of military aircrews to adapt to stressors when undergoing centrifuge training and determined what equipment items caused perceived stress and needed to be upgraded. We used questionnaires and the Rasch model to measure aircrew personnel's ability to adapt to centrifuge training. The measurement items were ranked by 611 military aircrew personnel. Analytical results indicated that the majority of the stress perceived by aircrew personnel resulted from the lightproof cockpit without outer reference. This study prioritized the equipment requiring updating as the lightproof cockpit design, the dim lighting of the cockpit, and the pedal design. A significant difference was found between pilot and non-pilot subjects' stress from the pedal design; and considerable association was discernible between the seat angle design and flight hours accrued. The study results provide aviators, astronauts, and air forces with reliable information as to which equipment items need to be urgently upgraded as their present physiological and psychological effects can affect the effectiveness of centrifuge training.
    Journal of applied measurement 01/2014; 15(2):200-212.
  •
    ABSTRACT: The Infit mean square W and the Outfit mean square U are commonly used person fit indexes under Rasch measurement. However, they suffer from two major weaknesses. First, their asymptotic distribution is usually derived by assuming that the true ability levels are known. Second, such distributions are not even clearly stated for indexes U and W. Both issues can seriously affect the selection of an appropriate cut-score for person fit identification. Snijders (2001) proposed a general approach to correct some person fit indexes when specific ability estimators are used. The purpose of this paper is to adapt this approach to the U and W indexes. First, a brief sketch of the methodology and its application to U and W is given. Then, the corrected indexes are compared to their classical versions through a simulation study. The suggested correction yields controlled Type I errors against both conservatism and inflation, while the power to detect specific misfitting response patterns is significantly increased.
    Journal of applied measurement 01/2014; 15(1):82-93.
  •
    ABSTRACT: The model of hierarchical complexity (MHC) provides an analytic a priori measurement of the difficulty of tasks. As part of the theory of measurement in mathematical psychology, the model of hierarchical complexity (Commons and Pekker, 2008) defines a new kind of scale. It is important to note that the orders of hierarchical complexity of tasks are postulated to form an ordinal scale. A formal definition of the model of hierarchical complexity is presented along with descriptions of its five axioms, which help determine how the model orders actions to form a hierarchy. The fourth and fifth axioms are of particular importance in establishing that the orders of hierarchical complexity form an equally spaced ordinal scale. Previously, it was shown that Rasch-scaled items followed the same sequence as their orders of hierarchical complexity. Here, it is shown that gaps exist between the highest Rasch-scaled item scores at a lower order and the lowest scores at the next higher order. First, we found no overlap between the Rasch-scaled item scores at one order of complexity and those of the adjoining orders; there are 'gaps' between the stages of performance on those items. Second, we tested for equal spacing between the orders of hierarchical complexity. We found that the orders of hierarchical complexity were equally spaced. To deviate significantly from the data, the orders had to deviate from linearity by over .25 of an order. This would appear to be an empirical and mathematical confirmation of the equally spaced stages of development.
    Journal of applied measurement 01/2014; 15(4):422-449.