Educational and Psychological Measurement Journal Impact Factor & Information

Publisher: American College Personnel Association; Science Research Associates; SAGE Publications

Journal description

Educational and Psychological Measurement publishes data-based studies in educational measurement, as well as theoretical papers in the measurement field. The journal focuses on discussions of problems in measurement of individual differences, as well as research on the development and use of tests and measurement in education, psychology, industry and government.

Current impact factor: 1.15

Impact Factor Rankings

2015 Impact Factor Available summer 2016
2014 Impact Factor 1.154
2013 Impact Factor 1.167
2012 Impact Factor 1.07
2011 Impact Factor 1.158
2010 Impact Factor 0.831
2009 Impact Factor 0.633
2008 Impact Factor 0.872
2007 Impact Factor 0.831
2006 Impact Factor 0.921
2005 Impact Factor 0.773
2004 Impact Factor 0.756
2003 Impact Factor 0.815
2002 Impact Factor 1.661
2001 Impact Factor 0.789
2000 Impact Factor 0.608
1999 Impact Factor 0.623
1998 Impact Factor 0.618
1997 Impact Factor 0.444
1996 Impact Factor 0.316
1995 Impact Factor 0.317
1994 Impact Factor 0.368
1993 Impact Factor 0.228
1992 Impact Factor 0.324

Additional details

5-year impact 1.51
Cited half-life >10.0
Immediacy index 0.23
Eigenfactor 0.00
Article influence 0.89
Website Educational and Psychological Measurement website
Other titles Educational and psychological measurement, EPM
ISSN 0013-1644
OCLC 1567567
Material type Periodical, Internet resource
Document type Journal / Magazine / Newspaper, Internet Resource

Publisher details

SAGE Publications

  • Pre-print
    • Author can archive a pre-print version
  • Post-print
    • Author can archive a post-print version
  • Conditions
    • Authors retain copyright
    • Pre-print on any website
    • Author's post-print on author's personal website, departmental website, institutional website or institutional repository
    • On other repositories, including PubMed Central, after a 12-month embargo
    • Publisher copyright and source must be acknowledged
    • Publisher's version/PDF cannot be used
    • Post-print version with changes from referees' comments can be used
    • "as published" final version with layout and copy-editing changes cannot be archived but can be used on a secure institutional intranet
    • Must link to publisher version with DOI
    • Publisher last reviewed on 29/07/2015
  • Classification
    • green

Publications in this journal

  • ABSTRACT: Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., the reference and the focal group) is necessary. In the Rasch model, therefore, the same linear restriction is imposed in both groups. Items in the restriction are termed the "anchor items". Ideally, these items are DIF-free to avoid artificially inflated false alarm rates. However, the question of how DIF-free anchor items can be selected appropriately is still a major challenge. Furthermore, various authors point out the lack of new anchor selection strategies and the lack of a comprehensive study, especially for dichotomous IRT models. This article reviews existing anchor selection strategies that do not require any knowledge prior to DIF analysis, offers a straightforward notation, and proposes three new anchor selection strategies. An extensive simulation study is conducted to compare the performance of the anchor selection strategies. The results show that an appropriate anchor selection is crucial for suitable item-wise DIF analysis. The newly suggested anchor selection strategies outperform the existing strategies and can reliably locate a suitable anchor when the sample sizes are large enough.
    Educational and Psychological Measurement 12/2015; 75(1). DOI:10.1177/0013164414529792
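    The anchoring step the abstract refers to can be made concrete with a small sketch. The Python snippet below is illustrative only: it uses simulated data, a crude logit-of-proportion-correct stand-in rather than a real Rasch calibration, and a made-up flagging threshold. It centres each group's item difficulty estimates on a presumed DIF-free anchor set before comparing them, which is the role the anchor items play in the strategies the article evaluates.

      import numpy as np

      rng = np.random.default_rng(1)

      # Toy setup: 10 items, two groups; item 5 carries true DIF (0.8 logits)
      # against the focal group, items 0-3 are the presumed DIF-free anchor.
      n_items, n_per_group = 10, 500
      b_ref = np.linspace(-1.5, 1.5, n_items)
      b_foc = b_ref.copy()
      b_foc[5] += 0.8
      anchor = [0, 1, 2, 3]

      def simulate(difficulties, mean_ability):
          theta = rng.normal(mean_ability, 1, n_per_group)
          p = 1 / (1 + np.exp(-(theta[:, None] - difficulties[None, :])))
          return rng.binomial(1, p)

      def crude_difficulty(responses):
          # Stand-in for a Rasch calibration: logit of the proportion incorrect.
          p = responses.mean(axis=0).clip(0.01, 0.99)
          return -np.log(p / (1 - p))

      d_ref = crude_difficulty(simulate(b_ref, 0.0))
      d_foc = crude_difficulty(simulate(b_foc, 0.5))   # impact: focal group is abler

      # Anchoring: impose the same linear restriction in both groups by centring
      # each group's estimates at the mean difficulty of the anchor items.
      d_ref -= d_ref[anchor].mean()
      d_foc -= d_foc[anchor].mean()

      for j, gap in enumerate(d_foc - d_ref):
          flag = "  <-- candidate DIF" if abs(gap) > 0.4 else ""
          print(f"item {j}: focal - reference difficulty = {gap:+.2f}{flag}")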
  • ABSTRACT: This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice between pairwise and composite group definitions in DIF is a reflection of how one defines fairness in DIF studies. A simulation was conducted based on data from a 60-item ACT Mathematics test (ACT; Hanson & Béguin). The unsigned area measure method (Raju) was used as the DIF detection method. An application to operational data was also completed, as was a comparison of observed Type I error rates and false discovery rates across the two methods of defining groups. Results indicate that the amount of flagged DIF and the interpretations about DIF were not the same across the two methods in all conditions, and there may be some benefits to using composite group approaches. The results are discussed in connection with differing definitions of fairness. Recommendations for practice are made.
    Educational and Psychological Measurement 08/2015; 75(4):648-676. DOI:10.1177/0013164414549764
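    For readers unfamiliar with Raju's unsigned area measure used above, the sketch below computes the same kind of quantity by brute force: the area between the reference-group and focal-group item response functions of a two-parameter logistic item, integrated numerically over a wide ability range. The item parameters are hypothetical, and Raju's closed-form expressions are not reproduced here.

      import numpy as np

      def irf_2pl(theta, a, b):
          # Two-parameter logistic item response function.
          return 1 / (1 + np.exp(-a * (theta - b)))

      def unsigned_area(a_ref, b_ref, a_foc, b_foc, lo=-6.0, hi=6.0, n=4001):
          # Area between the two curves, approximated by a simple Riemann sum
          # over a wide ability range; Raju's method gives it in closed form.
          theta = np.linspace(lo, hi, n)
          gap = np.abs(irf_2pl(theta, a_ref, b_ref) - irf_2pl(theta, a_foc, b_foc))
          return gap.sum() * (hi - lo) / n

      # Hypothetical reference- and focal-group parameters for one item.
      print(unsigned_area(1.2, 0.0, 1.2, 0.4))   # pure difficulty shift: area close to 0.4
      print(unsigned_area(1.2, 0.0, 0.8, 0.0))   # discrimination difference only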
  • ABSTRACT: Interest in using Bayesian methods for estimating item response theory (IRT) models has grown at a remarkable rate in recent years. This attentiveness to Bayesian estimation has also inspired a growth in available software such as WinBUGS, R packages, BMIRT, MPLUS, and SAS PROC MCMC. This article provides an accessible overview of Bayesian methods in the context of item response theory to serve as a useful guide for practitioners in estimating and interpreting IRT models. Included is a description of the estimation procedure used by SAS PROC MCMC. Syntax is provided for estimating both dichotomous and polytomous IRT models, as well as a discussion of how to extend the syntax to accommodate more complex IRT models.
    Educational and Psychological Measurement 08/2015; 75(4):585-609. DOI:10.1177/0013164414551411
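    The article's syntax targets SAS PROC MCMC, which is not reproduced here. As a language-neutral illustration of the underlying idea, the sketch below fits a simple Rasch (1PL) model to simulated data with a hand-rolled random-walk Metropolis sampler in Python; the priors, step sizes, and chain length are arbitrary illustrative choices, not recommendations.

      import numpy as np

      rng = np.random.default_rng(7)

      # Simulate toy Rasch data: 300 persons, 8 items.
      n_p, n_i = 300, 8
      theta_true = rng.normal(0, 1, n_p)
      b_true = np.linspace(-1.5, 1.5, n_i)
      y = rng.binomial(1, 1 / (1 + np.exp(-(theta_true[:, None] - b_true[None, :]))))

      def loglik(theta, b, axis):
          # Bernoulli log-likelihood under the Rasch IRF, summed per person
          # (axis=1) or per item (axis=0); the model factorises either way.
          eta = theta[:, None] - b[None, :]
          return (y * eta - np.log1p(np.exp(eta))).sum(axis=axis)

      theta, b = np.zeros(n_p), np.zeros(n_i)
      step_t, step_b, keep = 0.5, 0.2, []
      for it in range(6000):
          # Random-walk Metropolis update of all person parameters in parallel
          # (valid because, given b, persons are conditionally independent).
          prop = theta + rng.normal(0, step_t, n_p)
          log_r = (loglik(prop, b, 1) - 0.5 * prop**2) - (loglik(theta, b, 1) - 0.5 * theta**2)
          theta = np.where(np.log(rng.uniform(size=n_p)) < log_r, prop, theta)

          # Analogous update for item difficulties; the N(0, 1) priors on both
          # parameter sets also pin down the otherwise arbitrary scale location.
          prop = b + rng.normal(0, step_b, n_i)
          log_r = (loglik(theta, prop, 0) - 0.5 * prop**2) - (loglik(theta, b, 0) - 0.5 * b**2)
          b = np.where(np.log(rng.uniform(size=n_i)) < log_r, prop, b)

          if it >= 2000:                        # discard burn-in draws
              keep.append(b.copy())

      print("true difficulties:       ", b_true.round(2))
      print("posterior mean estimates:", np.mean(keep, axis=0).round(2))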
  • ABSTRACT: In educational testing, differential item functioning (DIF) statistics must be accurately estimated to ensure the appropriate items are flagged for inspection or removal. This study showed how using the Rasch model to estimate DIF may introduce considerable bias in the results when there are large group differences in ability (impact) and the data follow a three-parameter logistic model. With large group ability differences, difficult non-DIF items appeared to favor the focal group and easy non-DIF items appeared to favor the reference group. Correspondingly, the effect sizes for DIF items were biased. These effects were mitigated when data were coded as missing for item–examinee encounters in which the person measure was considerably lower than the item location. Explanation of these results is provided by illustrating how the item response function becomes differentially distorted by guessing depending on the groups’ ability distributions. In terms of practical implications, results suggest that measurement practitioners should not trust the DIF estimates from the Rasch model when there is a large difference in ability and examinees are potentially able to answer items correctly by guessing, unless data from examinees poorly matched to the item difficulty are coded as missing.
    Educational and Psychological Measurement 08/2015; 75(4):610-633. DOI:10.1177/0013164414554082
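    A small numerical illustration of the mechanism described above, with made-up item parameters: under a 3PL model the guessing floor lifts the success probability of low-ability examinees on a hard item far above what the Rasch item response function predicts, which is why group ability differences translate into apparent DIF.

      import numpy as np

      def p_3pl(theta, a=1.0, b=1.5, c=0.2):
          # 3PL probability: guessing floor c plus (1 - c) times a 2PL curve.
          return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

      def p_rasch(theta, b=1.5):
          return 1 / (1 + np.exp(-(theta - b)))

      # A hard item (b = 1.5) seen by a low-ability and a high-ability examinee.
      for theta in (-1.0, 3.0):
          print(f"theta = {theta:+.1f}: 3PL = {p_3pl(theta):.2f}, Rasch = {p_rasch(theta):.2f}")
      # At theta = -1 guessing more than triples the success probability the Rasch
      # curve expects (about 0.26 versus 0.08), so hard items look easier in a
      # low-ability group; at theta = +3 the two curves nearly coincide. Coding
      # responses as missing when the person sits far below the item, as described
      # above, removes exactly these distorted person-item encounters.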
  • Educational and Psychological Measurement 07/2015; DOI:10.1177/0013164415596421
  • Educational and Psychological Measurement 07/2015; DOI:10.1177/0013164415594658
  • ABSTRACT: This article proposes a general parametric item response theory approach for identifying sources of misfit in response patterns that have been classified as potentially inconsistent by a global person-fit index. The approach, which is based on the weighted least squares regression of the observed responses on the model-expected responses, can be used with a variety of unidimensional and multidimensional models intended for binary, graded, and continuous responses, and consists of procedures for identifying (a) general deviation trends, (b) local inconsistencies, and (c) single response inconsistencies. A free program called REG-PERFIT that implements most of the proposed techniques has been developed, described, and made available for interested researchers. Finally, the functioning and usefulness of the proposed procedures are illustrated with an empirical study based on a statistics-anxiety scale.
    Educational and Psychological Measurement 07/2015; DOI:10.1177/0013164415594659
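    REG-PERFIT itself is not shown here, but the core idea, regressing a person's observed responses on the model-expected responses with inverse-variance weights, can be sketched in a few lines. Everything below (known item difficulties, the ability estimate, the careless-response pattern) is assumed for illustration; the program's actual weighting and diagnostics may differ.

      import numpy as np

      rng = np.random.default_rng(3)

      # Assumed-known Rasch item difficulties and person ability estimate.
      b = np.linspace(-2, 2, 30)
      theta_hat = 0.0
      p_exp = 1 / (1 + np.exp(-(theta_hat - b)))       # model-expected responses
      w = 1 / (p_exp * (1 - p_exp))                    # inverse Bernoulli-variance weights

      def wls_fit(y, x, w):
          # Weighted least squares of y on (1, x): solve (X'WX) beta = X'W y.
          X = np.column_stack([np.ones_like(x), x])
          W = np.diag(w)
          return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

      y_consistent = rng.binomial(1, p_exp).astype(float)   # follows the model
      y_careless = y_consistent.copy()
      y_careless[b < -1] = 0                                 # botches the easiest items

      for label, y in [("consistent", y_consistent), ("careless on easy items", y_careless)]:
          intercept, slope = wls_fit(y, p_exp, w)
          print(f"{label:24s} intercept = {intercept:+.2f}, slope = {slope:.2f}")
      # Under the model, regressing the observed responses on the expected ones
      # gives intercept 0 and slope 1 in expectation; misfit such as the careless
      # pattern pulls the fitted line away from that identity line.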
  • ABSTRACT: Coefficient omega and coefficient alpha are both measures of the composite reliability for a set of items. Unlike coefficient alpha, coefficient omega remains unbiased for congeneric items with uncorrelated errors. Despite this advantage, coefficient omega is not as widely used and cited in the literature as coefficient alpha. Reasons for coefficient omega’s underutilization include limited knowledge of its statistical properties. However, consistent efforts to understand the statistical properties of coefficient omega can help improve its utilization in research. Here, six approaches for estimating confidence intervals for coefficient omega with unidimensional congeneric items were evaluated through a Monte Carlo simulation. The evaluations were made under simulation conditions that mimic realistic situations investigators are likely to face in applied work, including items that are not normally distributed and small sample sizes. Overall, the normal theory bootstrap confidence interval had the best performance across all simulation conditions that included sample sizes of less than 100. However, most methods had sound coverage with sample sizes of 100 or more.
    Educational and Psychological Measurement 07/2015; DOI:10.1177/0013164415593776
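    As a minimal illustration of interval estimation for coefficient omega (a plain percentile bootstrap, not the normal theory bootstrap that performed best in the study), the sketch below simulates congeneric items, estimates omega from a one-factor fit, and bootstraps the estimate. The sample size, loadings, and use of scikit-learn's FactorAnalysis are assumptions made for the example.

      import numpy as np
      from sklearn.decomposition import FactorAnalysis

      rng = np.random.default_rng(11)

      # Simulate congeneric items: one common factor, unequal loadings, unit variances.
      n, lam_true = 150, np.array([0.8, 0.7, 0.6, 0.5, 0.4])
      factor = rng.normal(size=(n, 1))
      X = factor @ lam_true[None, :] + rng.normal(size=(n, lam_true.size)) * np.sqrt(1 - lam_true**2)

      def omega(data):
          # Coefficient omega from a one-factor fit:
          # (sum of loadings)^2 / [(sum of loadings)^2 + sum of unique variances].
          fa = FactorAnalysis(n_components=1).fit(data)
          lam, psi = fa.components_[0], fa.noise_variance_
          return lam.sum() ** 2 / (lam.sum() ** 2 + psi.sum())

      # Percentile bootstrap: resample rows, re-estimate omega, take the 2.5/97.5 quantiles.
      boot = [omega(X[rng.integers(0, n, n)]) for _ in range(500)]
      print(f"omega = {omega(X):.3f}, 95% percentile bootstrap CI = "
            f"({np.percentile(boot, 2.5):.3f}, {np.percentile(boot, 97.5):.3f})")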
  • ABSTRACT: Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing on multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The consequence is that the proficiencies of the more proficient students are increased relative to those of the less proficient. Not controlling for the guessing bias underestimates the progress of students across 7 years of schooling, which has important educational implications.
    Educational and Psychological Measurement 07/2015; DOI:10.1177/0013164415594202
  • ABSTRACT: To further understand the properties of data-generation algorithms for multivariate, nonnormal data, two Monte Carlo simulation studies comparing the Vale and Maurelli method and the Headrick fifth-order polynomial method were implemented. Combinations of skewness and kurtosis found in four published articles were run, and attention was paid specifically to the quality of the sample estimates of univariate skewness and kurtosis. In the first study, it was found that the Vale and Maurelli algorithm yielded downward-biased estimates of skewness and kurtosis (particularly in small samples) that were also highly variable. This method was also prone to generate extreme sample kurtosis values if the population kurtosis was high. The estimates obtained from Headrick’s algorithm were also biased downward, but much less so than those obtained through Vale and Maurelli, and they were much less variable. The second study reproduced the first simulation in the Curran, West, and Finch article using both the Vale and Maurelli method and the Headrick method. It was found that the chi-square values and empirical rejection rates changed depending on which data-generation method was used, sometimes sufficiently so that some of the original conclusions of the authors would no longer hold. In closing, recommendations are presented regarding the relative merits of each algorithm.
    Educational and Psychological Measurement 07/2015; 75(4):541-567. DOI:10.1177/0013164414548894
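    Vale and Maurelli's method builds on Fleishman's univariate third-order polynomial, extended to correlated variables through an intermediate correlation matrix. The sketch below covers only the univariate step, assuming the Fleishman equation system as it is commonly stated, and then shows the downward bias of sample skewness and kurtosis in small samples that the abstract describes. The target moments and sample sizes are arbitrary.

      import numpy as np
      from scipy.optimize import fsolve
      from scipy.stats import skew, kurtosis

      rng = np.random.default_rng(5)
      target_skew, target_exkurt = 2.0, 7.0     # illustrative population values

      def fleishman_eqs(coef):
          # Fleishman's power method: Y = a + bZ + cZ^2 + dZ^3 with a = -c.
          # The three equations fix the variance, skewness, and excess kurtosis of Y.
          b, c, d = coef
          return [b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1,
                  2*c*(b**2 + 24*b*d + 105*d**2 + 2) - target_skew,
                  24*(b*d + c**2*(1 + b**2 + 28*b*d)
                      + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - target_exkurt]

      b, c, d = fsolve(fleishman_eqs, x0=[1.0, 0.1, 0.1])
      assert np.allclose(fleishman_eqs([b, c, d]), 0, atol=1e-6)   # solver really converged

      def draw(n):
          z = rng.normal(size=n)
          return -c + b*z + c*z**2 + d*z**3

      # Average sample skewness / excess kurtosis over many small samples: both sit
      # below their population targets, illustrating the downward bias noted above.
      reps = np.array([[skew(x), kurtosis(x)] for x in (draw(50) for _ in range(2000))])
      print("population targets:       ", target_skew, target_exkurt)
      print("means over n = 50 samples:", reps.mean(axis=0).round(2))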
  • Educational and Psychological Measurement 06/2015; DOI:10.1177/0013164415584576
  • ABSTRACT: Latent transition analysis (LTA) was initially developed to provide a means of measuring change in dynamic latent variables. In this article, we illustrate the use of a cognitive diagnostic model, the DINA model, as the measurement model in an LTA, thereby demonstrating a means of analyzing change in cognitive skills over time. An example is presented of an instructional treatment administered to a sample of seventh-grade students in several classrooms in a Midwestern school district. The example demonstrates how hypotheses can be framed and then tested regarding the form of the change in different groups within the population. Manifest and latent groups are also defined and used to test additional hypotheses about change specific to particular subpopulations. Results suggest that the use of a DINA measurement model expands the utility of LTA to practical problems in educational measurement research.
    Educational and Psychological Measurement 06/2015; DOI:10.1177/0013164415588946
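    The DINA measurement model referred to above can be summarised in a few lines: an examinee answers item j correctly with probability 1 - s_j when the attribute profile contains every skill the Q-matrix requires for that item, and with the guessing probability g_j otherwise. The Q-matrix, slip, and guessing values below are toy numbers chosen for illustration.

      import numpy as np

      # Toy Q-matrix: 4 items by 2 attributes (1 = the item requires that attribute).
      Q = np.array([[1, 0],
                    [0, 1],
                    [1, 1],
                    [1, 1]])
      slip  = np.array([0.10, 0.15, 0.20, 0.10])   # P(incorrect | all required skills held)
      guess = np.array([0.20, 0.25, 0.15, 0.10])   # P(correct  | some required skill missing)

      def dina_prob(alpha):
          # eta_j = 1 iff the attribute profile alpha has every skill item j requires.
          eta = np.all(alpha[None, :] >= Q, axis=1).astype(float)
          return eta * (1 - slip) + (1 - eta) * guess

      # Response probabilities for each of the four possible attribute profiles;
      # these are the latent states whose transitions over time the LTA above models.
      for alpha in ([0, 0], [1, 0], [0, 1], [1, 1]):
          print(alpha, dina_prob(np.array(alpha)).round(2))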