Journal of applied measurement (J Appl Meas)

Journal description

Journal of Applied Measurement publishes refereed scholarly work from all academic disciplines that relates to measurement theory and its application to developing variables. The construction and interpretation of meaningful and unambiguous variables is a salient feature of measurement. It represents the congruence of measurement theory and substantive research in a wide range of scientific endeavors. The development of variables that map persons and items onto a common metric, operationally defined by the items and invariant across samples of persons and items, is a cornerstone of developing an understanding of the phenomena being measured and of constructing and verifying hypotheses based on these phenomena. The journal will also publish invited articles that provide examples of methodological issues that are relevant to constructing useful variables.

Current impact factor: 0.00

Additional details

5-year impact: 0.00
Cited half-life: 0.00
Immediacy index: 0.00
Eigenfactor: 0.00
Article influence: 0.00
Website: Journal of Applied Measurement website
Other titles: Journal of applied measurement
ISSN: 1529-7713
OCLC: 43888528
Material type: Periodical
Document type: Journal / Magazine / Newspaper

Publications in this journal

  • ABSTRACT: This study explored the utility of logistic mixed models for the analysis of differential item functioning when item response data were testlet-based. Decomposition of differential item functioning (DIF) into item level and testlet level for the testlet-based data was introduced to separate possible sources of DIF: (1) an item, (2) a testlet, and (3) both the item and the testlet. A simulation study was conducted to investigate the performance of several logistic mixed models, as well as the Mantel-Haenszel method, under conditions in which item-related DIF and testlet-related DIF were present simultaneously. The results revealed that a new DIF model based on a logistic mixed model with random item effects and item covariates could capture the item-related DIF and testlet-related DIF well under certain conditions.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: Summative didactic evaluation often involves multiple choice questions which are then aggregated into exam scores, course scores, and cumulative grade point averages. To be valid, each of these levels should have some relationship to the topic tested (dimensionality) and be sufficiently reproducible between persons (reliability) to justify student ranking. Evaluation of dimensionality is difficult and is complicated by the classic observation that didactic performance involves a generalized component (g) in addition to subtest specific factors. In this work, 183 students were analyzed over two academic years in 13 courses with 44 exams and 3352 questions for both accuracy and speed. Reliability at all levels was good (>0.95). Assessed by bifactor analysis, g effects dominated most levels, resulting in essential unidimensionality. Effect sizes on predicted accuracy and speed due to nesting in exams and courses were small. There was little relationship between person ability and person speed. Thus, the hierarchical grading system appears warranted because of its g-dependence.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: This study looked at numerous aspects of item parameter drift (IPD) and its impact on measurement in computer adaptive testing (CAT). A series of CAT simulations was conducted, varying the amount and magnitude of IPD, as well as the size of the item pool. The effects of IPD on measurement precision, classification, and test efficiency were evaluated using a number of criteria. These included bias, root mean square error (RMSE), absolute average difference (AAD), total percentages of misclassification, the number of false positives and false negatives, the total test lengths, and item exposure rates. The results revealed negligible differences when comparing the IPD conditions to the baseline condition for all measures of precision, classification accuracy, and test efficiency. The most relevant finding indicates that magnitude of drift has a larger impact on measurement precision than the number of items with drift.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: This paper describes an approach to the assessment of human-to-human collaborative problem solving using a set of online interactive tasks completed by student dyads. Within the dyad, roles were nominated as either A or B and students selected their own roles. The question as to whether role selection affected individual student performance measures is addressed. Process stream data were captured from 3402 students in six countries who explored the problem space by clicking, dragging the mouse, moving the cursor and collaborating with their partner through a chat box window. Process stream data were explored to identify behavioural indicators that represented elements of a conceptual framework. These indicative behaviours were coded into a series of dichotomous items. These items represented actions and chats performed by students. The frequency of occurrence was used as a proxy measure of item difficulty. Then, given a measure of item difficulty, student ability could be estimated using the difficulty estimates of the range of items demonstrated by the student. The Rasch simple logistic model was used to review the indicators to identify those that were consistent with the assumptions of the model and were invariant across national samples, language, curriculum and age of the student. The data were analysed using one- and two-dimensional one-parameter models. Rasch separation reliability, fit to the model, distribution of students and items on the underpinning construct, estimates for each country and the effect of role differences are reported. This study provides evidence that collaborative problem solving can be assessed in an online environment involving human-to-human interaction using behavioural indicators shown to have a consistent relationship between the estimate of student ability and the probability of demonstrating the behaviour.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: The authors investigated the effect of missing completely at random (MCAR) item responses on partial credit model (PCM) parameter estimates in a longitudinal study of Positive Affect. Participants were 307 adults from the older cohort of the Notre Dame Study of Health and Well-Being (Bergeman and Deboeck, 2014) who completed questionnaires including Positive Affect items for 56 days. Additional missing responses were introduced to the data by randomly replacing 20%, 50%, and 70% of the responses on each item and each day with missing values, in addition to the existing missing data. Results indicated that item locations and person trait level measures diverged from the original estimates as the level of degradation from induced missing data increased. In addition, standard errors of these estimates increased with the level of degradation. Thus, MCAR data do damage the quality and precision of PCM estimates.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: Satisfied patients are more likely to be compliant, have better outcomes, and are more likely to return to the same provider or institution for future care. The Satisfaction with a Continuum of Care survey (SCC) was designed to improve patient care using measures of patient satisfaction and facilitate a cultural shift from a "silos-of-care" to a "continuum-of-care" mentality by fostering inter-departmental communication as patients moved between environments of care at a Midwestern rehabilitation hospital. This study provides a Rasch measurement framework for investigating issues related to survey reliability and validity. The results indicate that although certain aspects of the survey seem to function in a psychometrically sound manner, the questions are too easy to endorse and provide little information to help improve patient care. Suggestions for future revisions to this survey instrument are provided.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: Mobility, or the freedom and ability to move, is gendered in many cultural contexts. In this paper I analyse mobility associated with work from the capability approach perspective of Sen. This is an empirical paper which uses the Rasch Rating Scale Model (RSM) to construct the measure of mobility of women for the first time in the development studies discourse. I construct a measure of mobility (latent trait) of women workers engaged in two types of informal work, namely, peeling work and fish vending, in fisheries in the cultural context of India. The scale measure makes it possible, first, to test the unidimensionality of my construct of mobility of women and, second, to analyse the domains of mobility of women workers. The comparative analysis of the scale of permissibility of mobility constructed using the RSM for the informal women workers shows that women face constraints on mobility in social and personal spaces in the socially advanced state of Kerala in India. Work mobility does not expand the real freedoms; hence work mobility can be termed a bounded capability, that is, a capability limited or bounded by social, cultural or gender norms or a combination of these. Therefore, at the macro level, growth in informal employment in sectors like fisheries, which improve mobility of women through work mobility, does not necessarily expand the capability sets by contributing to greater freedoms and transformational mobility. This paper makes a significant methodological contribution in that it uses an innovative method for the measurement of mobility of women in the development studies discipline.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: Differences between working conceptually and procedurally with mathematics are well documented. In short, working procedurally can be characterized as learning and applying rules without reason. Working conceptually, in contrast, means creating and applying a web of knowledge. To continue this line of research, an instrument that is able to measure the level of conceptual work, and that is based on the basic requirements of measurement, is desirable. As such, this paper presents a Rasch calibrated instrument that measures the extent to which students work conceptually with mathematics. From a sample of 133 student teachers and 185 Civil Engineering students, 20 items were found to be productive for measurement.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: Research on teacher self-efficacy, a construct that has been shown to be linked to teacher performance, has a long history that can be traced back to Bandura (1986). This article presents evidence for teacher self-efficacy in urban schools, a construct that is separate from but related to the more general construct of teacher self-efficacy. An instrument was developed and validated by a team of university faculty, urban teachers, and school administrators. The Teachers' Sense of Efficacy in Urban Schools (SEUS) is a 15-item instrument designed to address factors that are important for success in teaching in an urban environment, including working effectively with English language learners, students with disabilities, economically disadvantaged students, cultural diversity, literacy, technology, differentiation, and assessment data. The present study analyzes SEUS on multiple levels, using the Rasch partial credit model.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: This study investigated the perceptions of 1235 students of their form teachers' interpersonal behaviors across 40 classrooms in 24 Singaporean secondary schools. The 32-item Questionnaire on Teacher Interaction (QTI) survey was administered to obtain the initial quantitative data of teacher behaviors perceived by the students in these classrooms. The eight scales of the QTI are: Leadership, Helping/Friendly, Understanding, Student Responsibility/Freedom, Uncertain, Dissatisfied, Admonishing, and Strict. The Rasch measurement model was used to estimate students' traits with respect to each subscale, and then to examine its proposed multidimensional structure. Findings demonstrate overall good fit of the responses with the Rasch model for each subscale. Findings also support the hypothesized relationships among the eight dimensions proposed for the QTI.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: In this paper we investigate a novel method, penalized joint maximum likelihood estimation (PJMLE), for estimating the parameters of the Rasch model (Rasch, 1960). Here we use joint maximum likelihood estimation (JMLE) along with elastic net penalization using the glmnet package (Friedman, Hastie, and Tibshirani, 2010) in R to obtain estimates for item difficulties and examinee abilities. Through simulation we compared the accuracy of PJMLE to conditional maximum likelihood estimation (CMLE), marginal maximum likelihood estimation (MMLE), and marginal Bayes modal estimation (MBME). We show that PJMLE successfully estimates parameters of a Rasch model when the number of items is greater than the number of examinees, which is a shortcoming of traditional estimation techniques. In addition, we show that PJMLE performs similarly to traditional techniques when the number of examinees is greater than the number of assessment items, without specifying a mixing distribution or a prior distribution.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: This article describes the development and validation of the Attitudes toward Physical Activity Scale (APAS) to measure the attitudes, beliefs, and self-efficacy toward physical activity of children at the primary school level. The framework included: physical fitness, self-efficacy, personal best goal orientation in physical activity, interest in physical activity, importance of physical activity, benefits of physical activity, contributions of video exercise to learning in school subjects, contributions of video exercise to learning about health and environmental support. The sample comprised 630 school students between grades 1 and 7 from five countries, namely Lithuania (29%), Poland (26%), Serbia (19%), Singapore (16%) and Zimbabwe (11%). Rasch analysis found empirical evidence in support of measurement validity of the APAS in terms of Rasch item reliabilities, unidimensionality, effectiveness of response categories, and absence of gender differential item functioning (DIF). The validation of the APAS according to the Rasch model meant that a dependable tool was established for gauging the effectiveness of intervention programs on physical activity of primary school children in classroom settings at various geographical locations globally.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: Margin is a function of the relationship of stress to strength. The greater the margin, the more likely students are able to successfully navigate academic structures. This study examined the psychometric properties of a newly created instrument designed to measure margin: the Power-Load-Margin Inventory (PLMI). The PLMI was created using eight domains: (A) Student's aptitude and ability, (B) Course structure, (C) External motivation, (D) Student health, (E) Instructor style, (F) Internal motivation, (G) Life opportunities, and (H) University support structure. A three-point response scale was used to measure the domains: (1) stress, (2) neither stress nor strength, and (3) strength. The PLMI was administered to 586 medical, dental, and pharmacy students. A Rasch rating scale model was used to examine the psychometric properties of the PLMI. The PLMI demonstrated acceptable psychometric properties for use with pharmacy, dental, and medical students. The PLMI's primary weakness was with the subscales' reliability. We attribute this to the small number of items per subscale.
    Article · Jan 2016 · Journal of applied measurement
  • ABSTRACT: Effectively assessing children’s academic development can help school professionals make placement decisions and prepare appropriate instructional supports. The KeyMath-3 Diagnostic Assessment (Connolly, 2008) is a widely used assessment of children’s mathematical abilities; however, despite much use, the measurement properties of the KeyMath-3 DA have not been examined, aside from the development and standardization phases. The current study conducted a Rasch analysis of the Foundational Concepts content area of the KeyMath-3 DA in a diverse sample of 308 young children to assess the quality of the assessment. Rasch analytic procedures examined unidimensionality, item and person fit statistics, reliability, and item hierarchy. Misfitting items were further examined, and response patterns were modified. In general, results show that the Foundational Concepts subscale is a good measure of the underlying construct of young children’s understanding of the basic concepts in mathematics. Implications are discussed.
    Article · Dec 2015 · Journal of applied measurement

  • Article · Sep 2015 · Journal of applied measurement
  • ABSTRACT: The most common approach to modelling item discrimination and guessing for multiple-choice questions is the three parameter logistic (3PL) model. However, proponents of Rasch models generally avoid using the 3PL model because to model guessing entails sacrificing the distinctive property and advantages of Rasch models. One approach to dealing with guessing based on the application of Rasch models is to omit responses in which guessing appears to play a significant role. However, this approach entails loss of information and it does not account for variable item discrimination. It has been shown, though, that provided specific constraints are met, it is possible to parameterize discrimination while preserving the distinctive property of Rasch models. This article proposes an approach that uses Rasch models to account for guessing on standard multiple-choice items simply by treating it as a source of low item discrimination. Technical considerations are noted although a detailed examination of such considerations is beyond the scope of this article.
    Article · Jun 2015 · Journal of applied measurement
  • ABSTRACT: Engelhard (1996) proposed a rater accuracy model (RAM) as a means of evaluating rater accuracy in rating data, but very little research exists to determine the efficacy of that model. The RAM requires a transformation of the raw score data to accuracy measures by comparing rater-assigned scores to true scores. Indices computed based on raw scores also exist for measuring rater effects, but these indices ignore deviations of rater-assigned scores from true scores. This paper compares the efficacy of two versions of the RAM (based on dichotomized and polytomized deviations of rater-assigned scores from true scores) with that of two versions of raw score rater effect models (i.e., a Rasch partial credit model, PCM, and a Rasch rating scale model, RSM). Simulated data are used to demonstrate the efficacy with which these four models detect and differentiate three rater effects: severity, centrality, and inaccuracy. Results indicate that the RAMs are able to detect, but not differentiate, rater severity and inaccuracy, and are unable to detect rater centrality. The PCM and RSM, on the other hand, are able to both detect and differentiate all three of these rater effects. However, the RSM and PCM do not take into account true scores and may, therefore, be misleading when pervasive trends exist in the rater-assigned data.
    Article · Jun 2015 · Journal of applied measurement
  • ABSTRACT: The latest national science framework has formally stated the need for developing assessments that test both students' content knowledge and scientific practices. In response to this call, a science assessment that consists of (a) content items that measure students' understanding of a grade eight physics topic and (b) argumentation items that measure students' argumentation competency has been developed. This paper investigated the function of these content and argumentation items with a multidimensional measurement framework from two perspectives. First, we performed a dimensionality analysis to investigate whether the relationship between the content and argumentation items conformed to the test design. Second, we conducted a differential item functioning analysis in the multidimensional framework to examine if any content or argumentation item unfairly favored students with an advanced level of English literacy. Methods and findings of this study could inform future research on the validation of assessments measuring higher-order and complex abilities.
    Article · Jun 2015 · Journal of applied measurement
  • ABSTRACT: The main aim of this study was to evaluate whether the construct validity of the Tampa Scale for Kinesiophobia (TSK) is consistent with respect to its scaling properties, unidimensionality and targeting among workers with different levels of pain. The 311 participating Danish workers reported kinesiophobia by TSK (13 statement version) and number of days with pain during the past year (less than 8 days, less than 90 days and greater than 90 days). A Rasch analysis was used to evaluate the measurement properties of the TSK in the workers across pain levels, ages, genders and ethnicities. The TSK did not fit the Rasch model, but removing one item resolved the misfit. Invariance was found across the pain levels, ages and genders. Thus, with a few modifications, the TSK was shown to capture a unidimensional construct of fear of movement in workers with different pain levels, ages, and genders.
    Article · Jun 2015 · Journal of applied measurement