Psychological Methods

Published by American Psychological Association
Online ISSN: 1939-1463
Publications
Table: Possible attribute patterns.
Article
Cognitive diagnosis models are constrained (multiple classification) latent class models that characterize the relationship of questionnaire responses to a set of dichotomous latent variables. Having emanated from educational measurement, several aspects of such models seem well suited to use in psychological assessment and diagnosis. This article presents the development of a new cognitive diagnosis model for use in psychological assessment, the DINO (deterministic input; noisy "or" gate) model, which, as an illustrative example, is applied to evaluate and diagnose pathological gamblers. As part of this example, a demonstration of the estimates obtained by cognitive diagnosis models is provided. Such estimates include the probability that an individual meets each of a set of dichotomous Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev. [DSM-IV-TR]; American Psychiatric Association, 2000) criteria, resulting in an estimate of the probability that an individual meets the DSM-IV-TR definition for being a pathological gambler. Furthermore, a demonstration of how the hypothesized underlying factors contributing to pathological gambling can be measured with the DINO model is presented, through use of a covariance structure model for the tetrachoric correlation matrix of the dichotomous latent variables representing DSM-IV-TR criteria.
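For readers unfamiliar with this model class, the DINO item response function can be written in its standard form (generic notation following Templin and Henson's formulation, not symbols reproduced from the article):

\omega_{ij} = 1 - \prod_{k} (1 - \alpha_{ik})^{q_{jk}}, \qquad P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\omega_{ij}} \, g_j^{1 - \omega_{ij}},

where \alpha_{ik} indicates whether respondent i possesses attribute (criterion) k, q_{jk} indicates whether item j measures attribute k, and s_j and g_j are the item's slip and guess parameters. The "or" gate means that possessing any one of the attributes an item measures raises the response probability from g_j to 1 - s_j.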
 
Article
Cluster randomized trials (CRTs) have been widely used in field experiments that treat a cluster of individuals as the unit of randomization. This study focused particularly on situations where CRTs are accompanied by a common complication, namely, treatment noncompliance or, more generally, intervention nonadherence. In CRTs, compliance may be related not only to individual characteristics but also to the environment of the clusters to which individuals belong. Therefore, analyses ignoring the connection between compliance and clustering may not provide valid results. Although randomized field experiments often suffer from both noncompliance and clustering of the data, these features have been studied as separate rather than concurrent problems. On the basis of Monte Carlo simulations, this study demonstrated how clustering and noncompliance may affect statistical inferences and how these two complications can be accounted for simultaneously. The focus was the effect of the intervention on individuals who not only were assigned to the active intervention but also abided by this assignment (the complier average causal effect). For estimation of intervention effects considering noncompliance and data clustering, an ML-EM estimation method was employed.
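As background on the estimand (a generic identity under the usual instrumental-variable assumptions of randomized assignment, exclusion restriction, and monotonicity, not a result specific to this study), the complier average causal effect relates to the intention-to-treat effect through the proportion of compliers \pi_c:

\text{ITT} = \pi_c \cdot \text{CACE}, \qquad \text{CACE} = \text{ITT} / \pi_c,

because assignment has no effect on outcomes for noncompliers under the exclusion restriction. The clustering issue studied in the article enters through the estimation of both quantities, not through this identity itself.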
 
Article
In response to N. Cliff and J. C. Caruso (1998), the author clarifies that it is the sum of the reliabilities of the components that remains invariant under rotation in reliable component analysis.
 
Article
Evidence of group matching frequently takes the form of a nonsignificant test of statistical difference. Theoretical hypotheses of no difference are also tested in this way. These practices are flawed in that null hypothesis statistical testing provides evidence against the null hypothesis, and failing to reject H0 is not evidence supportive of it. Tests of statistical equivalence are needed. This article corrects the inferential confidence interval (ICI) reduction factor introduced by W. W. Tryon (2001) and uses it to extend his discussion of statistical equivalence. This method is shown to be algebraically equivalent to D. J. Schuirmann's (1987) use of 2 one-sided t tests, a highly regarded and accepted method of testing for statistical equivalence. The ICI method provides an intuitive graphic method for inferring statistical difference as well as equivalence. Trivial difference occurs when a test of difference and a test of equivalence are both passed. Statistical indeterminacy results when both tests are failed. Hybrid confidence intervals are introduced that impose ICI limits on standard confidence intervals. These intervals are recommended as replacements for error bars because they facilitate inferences.
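To make the equivalence logic concrete, here is a minimal sketch of Schuirmann's two one-sided tests for two independent means, computed from summary statistics; it illustrates the general TOST procedure rather than the article's ICI computations, and the numbers in the example call are invented.

# Minimal sketch of Schuirmann's (1987) two one-sided tests (TOST) for
# equivalence of two independent means within bounds (-delta, +delta).
# Illustrative only; the summary statistics below are made up.
import math
from scipy import stats

def tost(mean1, mean2, sd1, sd2, n1, n2, delta, alpha=0.05):
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                        # SE of the mean difference
    df = n1 + n2 - 2
    d = mean1 - mean2
    p_lower = 1 - stats.t.cdf((d + delta) / se, df)  # H0: true difference <= -delta
    p_upper = stats.t.cdf((d - delta) / se, df)      # H0: true difference >= +delta
    return p_lower, p_upper, (p_lower < alpha) and (p_upper < alpha)

print(tost(10.2, 10.0, 2.0, 2.1, 50, 50, delta=1.0))

Equivalence is declared only if both one-sided tests reject, which is why passing an equivalence test carries evidential weight that merely failing to find a difference does not.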
 
Table: Simulated efficiency for estimators of mean correlation (RMSE and relative efficiency), by mean sample size (n = 20, 40, 80, 160), meta-analytic approach and estimand, nominal mean correlation, and number of studies (k): heterogeneous case.
Article
In 2 Monte Carlo studies of fixed- and random-effects meta-analysis for correlations, A. P. Field (2001) ostensibly evaluated Hedges-Olkin-Vevea Fisher-z and Schmidt-Hunter Pearson-r estimators and tests in 120 conditions. Some authors have cited those results as evidence not to meta-analyze Fisher-z correlations, especially with heterogeneous correlation parameters. The present attempt to replicate Field's simulations included comparisons with analytic values as well as results for efficiency and confidence-interval coverage. Field's results under homogeneity were mostly replicable, but those under heterogeneity were not: The latter exhibited bias exceeding ours by more than .17 in some conditions and, for tests of the mean correlation and homogeneity, respectively, nonnull rejection rates up to .60 lower and .65 higher. Changes to Field's observations and conclusions are recommended, and practical guidance is offered regarding simulation evidence and choices among methods. Most cautions about poor performance of Fisher-z methods are largely unfounded, especially with a more appropriate z-to-r transformation. The Appendix gives a computer program for obtaining Pearson-r moments from a normal Fisher-z distribution, which is used to demonstrate distortion due to direct z-to-r transformation of a mean Fisher-z correlation.
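The distortion mentioned in the last sentence can be illustrated numerically (a sketch of the general point, not the authors' Appendix program; the values of mu and sigma are arbitrary): if Fisher-z values are normally distributed, the implied mean Pearson correlation is E[tanh(Z)], which is not the same as tanh applied to the mean z.

# Mean Pearson r implied by a normal distribution of Fisher-z values, compared
# with direct back-transformation of the mean z. Illustrative values only.
import numpy as np
from scipy import integrate, stats

mu, sigma = 0.55, 0.30   # mean and SD of the Fisher-z distribution (assumed)
mean_r, _ = integrate.quad(lambda z: np.tanh(z) * stats.norm.pdf(z, mu, sigma),
                           -np.inf, np.inf)
print(mean_r, np.tanh(mu))   # E[tanh(Z)] typically lies below tanh(mu) for positive mu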
 
Article
P. E. Meehl and N. G. Waller's (2002) proposed method may yield unique solutions neither for model parameters nor for model lack of fit. The author argues from a naturalistic-cognitive philosophy of science that science seeks objective knowledge and that hypothesis testing is central to achieving that goal. It is also argued that P. E. Meehl and N. G. Waller's proposal blurs the distinction between hypothesis testing and explorations of the data seeking an optimal model to serve as a prospective inductive generalization. But it is noted that inductive generalizations are never unique and must be tested to eliminate those that reflect subjective aspects of the researcher's methods and points of view.
 
Article
P. E. Meehl and N. G. Waller (2002) proposed an innovative method for assessing path analysis models wherein they subjected a given model, along with a set of alternatives, to risky tests using selected elements of a sample correlation matrix. Although the authors find much common ground with the perspective underlying the Meehl-Waller approach, they suggest that there are aspects of the proposed procedure that require close examination and further development. These include the selection of only one subset of correlations to estimate parameters when multiple solutions are generally available, the fact that the risky tests may test only a subset of parameters rather than the full model of interest, and the potential for different results to be obtained from analysis of equivalent models.
 
Article
In their criticism of B. E. Wampold and R. C. Serlin's analysis of treatment effects in nested designs, M. Siemer and J. Joormann argued that providers of services should be considered a fixed factor because typically providers are neither randomly selected from a population of providers nor randomly assigned to treatments, and statistical power to detect treatment effects is greater in the fixed than in the mixed model. The authors of the present article argue that if providers are considered fixed, conclusions about the treatment must be conditioned on the specific providers in the study, and they show that in this case generalizing beyond these providers incurs inflated Type I error rates.
 
Article
The authors disagree with M. Siemer and J. Joormann's assertion that therapist should be a fixed effect in psychotherapy treatment outcome studies. If treatment is properly standardized, therapist effects can be examined in preliminary tests and the therapist term deleted from analyses if such differences approach zero. If therapist effects are anticipated and either cannot be minimized through standardization or are specifically of interest because of the nature of the research question, the study has to be planned with adequate statistical power for including therapist as a random term. Simulation studies conducted by Siemer and Joormann confounded bias due to small sample size and inconsistent estimates.
 
Article
D. J. Bauer and P. J. Curran (2003) raised some interesting issues with respect to mixture models of growth curves. Many useful lessons can be learned from their work, and more can be learned by extending the inquiry in related directions. These lessons involve the following issues: (a) what a mixture distribution looks like, (b) the meaning of the term homogeneous distribution, (c) the importance of model checking, (d) advantages and disadvantages of using mixtures and similar procedures to approximate complicated distributions, and (e) intrinsic versus nonintrinsic transformability.
 
Article
This commentary discusses the D. J. Bauer and P. J. Curran (see record 2003-09632-007) investigation of growth mixture modeling. Single-class modeling of nonnormal outcomes is compared with modeling with multiple latent trajectory classes. New statistical tests of multiple-class models are discussed. Principles for substantive investigation of growth mixture model results are presented and illustrated by an example of high school dropout predicted by low mathematics achievement development in Grades 7-10.
 
Article
D. J. Bauer and P. J. Curran (2003) cautioned that results obtained from growth mixture models may sometimes be inaccurate. The problem they addressed occurs when a growth mixture model is applied to a single, general population of individuals but findings incorrectly support the conclusion that there are 2 subpopulations. In an artificial sampling experiment, they showed that this can occur when the variables in the population have a nonnormal distribution. A realistic perspective is that although a healthy skepticism to complex statistical results is appropriate, there are no true models to discover. Consequently, the issue of model misspecification is irrelevant in practical terms. The purpose of a mathematical model is to summarize data, to formalize the dynamics of a behavioral process, and to make predictions. All of this is scientifically valuable and can be accomplished with a carefully developed model, even though the model is false.
 
Article
In their comments on the authors' article, R. C. Serlin, B. E. Wampold, and J. R. Levin and P. Crits-Christoph, X. Tu, and R. Gallop took issue with the authors' suggestion to evaluate therapy studies with nested providers with a fixed model approach. In this rejoinder, the authors comment on Serlin et al.'s critique by showing that their arguments do not apply, are based on misconceptions about the purpose and nature of statistical inference, or are based on flawed reasoning. The authors also comment on Crits-Christoph et al.'s critique by showing that the proposed approach is very similar to, but less inclusive than, their own suggestion.
 
Article
In his article, "An alternative to null-hypothesis significance tests," Killeen (2005) urged the discipline to abandon the practice of null hypothesis testing based on p_obs and to quantify the signal-to-noise characteristics of experimental outcomes with replication probabilities. He described the coefficient that he invented, p_rep, as the probability of obtaining "an effect of the same sign as that found in an original experiment" (Killeen, 2005, p. 346). The journal Psychological Science quickly came to encourage researchers to employ p_rep, rather than p_obs, in the reporting of their experimental findings. In the current article, we (a) establish that Killeen's derivation of p_rep contains an error, the result of which is that p_rep is not, in fact, the probability that Killeen set out to derive; (b) establish that p_rep is not a replication probability of any kind but, rather, is a quasi-power coefficient; and (c) suggest that Killeen has mischaracterized both the relationship between replication probabilities and statistical inference, and the kinds of claims that are licensed by knowledge of the value assumed by the replication probability that he attempted to derive.
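For context, the coefficient under discussion is typically computed from the one-tailed p value of the original study; the commonly cited normal-theory expression (stated as background, not as an endorsement of either position in this exchange) is

p_{\text{rep}} = \Phi\!\left( \frac{\Phi^{-1}(1 - p_{\text{obs}})}{\sqrt{2}} \right),

where \Phi is the standard normal distribution function and the \sqrt{2} reflects the doubled sampling variance of the difference between an original and a replicate effect estimate.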
 
Article
I. Klugkist, O. Laudy, and H. Hoijtink (2005) presented a Bayesian approach to analysis of variance models with inequality constraints. Constraints may play 2 distinct roles in data analysis. They may represent prior information that allows more precise inferences regarding parameter values, or they may describe a theory to be judged against the data. In the latter case, the authors emphasized the use of Bayes factors and posterior model probabilities to select the best theory. One difficulty is that interpretation of the posterior model probabilities depends on which other theories are included in the comparison. The posterior distribution of the parameters under an unconstrained model allows one to quantify the support provided by the data for inequality constraints without requiring the model selection framework.
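Concretely, the final point can be implemented with nothing more than draws from the unconstrained posterior (a generic sketch of the encompassing-prior logic, with illustrative parameter names):

P(\mu_1 < \mu_2 < \mu_3 \mid y) \approx \frac{1}{S} \sum_{s=1}^{S} 1\{\mu_1^{(s)} < \mu_2^{(s)} < \mu_3^{(s)}\},

and the Klugkist et al. Bayes factor of the constrained against the unconstrained model is this posterior proportion divided by the corresponding proportion under the prior. The commentary's point is that the posterior proportion alone already quantifies support for the constraint, without committing to a particular set of competing models.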
 
Article
The authors applaud A. S. Green, E. Rafaeli, N. Bolger, P. E. Shrout, and H. T. Reis's (2006) response to one-sided comparisons of paper versus electronic (plastic) diary methods and hope that it will stimulate more balanced considerations of the issues involved. The authors begin by highlighting areas of agreement and disagreement with Green et al. The authors review briefly the broader literature that has compared paper and plastic diaries, noting how recent comparisons have relied on study designs and methods that favor investigators' allegiances. The authors note some sorely needed data for the evaluation of the implications of paper versus plastic for the internal and external validity of research. To facilitate evaluation of the existing literature and assist in the design of future studies, the authors offer a balanced comparison of paper and electronic diary methods across a range of applications. Finally, the authors propose 2 study designs that offer fair comparisons of paper and plastic diary methods.
 
Article
In a recent article, A. Maydeu-Olivares and D. L. Coffman (2006) presented a random intercept factor approach for modeling idiosyncratic response styles in questionnaire data and compared this approach with competing confirmatory factor analysis models. Among the competing models was the CT-C(M-1) model (M. Eid, 2000). In an application to the Life Orientation Test (M. F. Scheier & C. S. Carver, 1985), Maydeu-Olivares and Coffman found that results obtained from the CT-C(M-1) model were difficult to interpret. In particular, Maydeu-Olivares and Coffman challenged the asymmetry of the CT-C(M-1) model. In the present article, the authors show that the difficulties faced by Maydeu-Olivares and Coffman rest upon an improper interpretation of the meaning of the latent factors. The authors' aim is to clarify the meaning of the latent variables in the CT-C(M-1) model. The authors explain how to properly interpret the results from this model and introduce an alternative restricted model that is conceptually similar to the CT-C(M-1) model and nested within it. The fit of this model is invariant across different reference methods. Finally, the authors provide guidelines as to which model should be used in which research context.
 
Article
This commentary discusses 4 issues relevant to interpretation of A. S. Green, E. Rafaeli, N. Bolger, P. E. Shrout, and H. T. Reis's (2006) article: (a) Self-reported compliance in medical settings has generally been substantially higher than verified compliance, suggesting that this is not a rare phenomenon; (b) none of the studies reported in Green et al. explicitly verified paper diary compliance; (c) the impact of participant motivation on diary compliance is unknown, and it may be difficult for researchers to accurately assess it in their own studies; and (d) without objective verification of diary compliance, analysis of the effects of noncompliance on data quality is difficult to interpret. The authors conclude that compliance in paper diaries and the effects of noncompliance on data quality are still unsettled issues.
 
Article
In this commentary, the authors discuss the implications of A. S. Green, E. Rafaeli, N. Bolger, P. E. Shrout, and H. T. Reis's (2006) diary studies with respect to memory. Researchers must take 2 issues into account when determining whether paper-and-pencil or handheld electronic diaries gather more trustworthy data. The first issue is a matter of prospective memory, and the second is a matter of reconstructive memory. The authors review the research on these issues and conclude that regardless of the type of diary researchers use, several factors can conspire to produce prompt--but inaccurate--data.
 
Article
R. D. Howell, E. Breivik, and J. B. Wilcox (2007) have presented an important and interesting analysis of formative measurement and have recommended that researchers abandon such an approach in favor of reflective measurement. The author agrees with their recommendations but disagrees with some of the bases for their conclusions. He suggests that although latent variables refer to mental states or mental events that have objective reality, to gain knowledge of the existence of these states or events requires that emphasis be placed on the nature and interpretation of the relationship between latent and manifest variables. This relationship is not a causal one but rather a kind of correspondence rule that contains theoretical, empirical, operational, and logical meanings as part of its content and structure. Implications of the above views are discussed for formative and reflective measurement.
 
Article
Rater biases are of interest to behavior genetic researchers, who often use ratings data as a basis for studying heritability. Inclusion of multiple raters for each sibling pair (M. Bartels, D. I. Boomsma, J. J. Hudziak, T. C. E. M. van Beijsterveldt, & E. J. C. G. van den Oord, see record 2007-18729-006) is a promising strategy for controlling bias variance and may yield information about sources of bias in heritability studies. D. A. Kenny's (2004) PERSON model is presented as a framework for understanding determinants of rating reliability and validity. Empirical findings on rater bias in other contexts provide a starting point for addressing the impact of rater-unique perceptions in heritability studies. However, heritability studies use distinctive rating designs that may accentuate some sources of bias, such as rater communication and contrast effects, which warrant further study.
 
Figure: Two models illustrating interpretational confounding, with causal indicators (x1, x2), latent variables, outcome variables (y1, y2), and disturbances.
Figure: Causal indicators (x1, x2), a latent variable, and disturbances, with different outcome variables (y1 or y2).
Article
R. D. Howell, E. Breivik, and J. B. Wilcox (2007) have argued that causal (formative) indicators are inherently subject to interpretational confounding. That is, they have argued that using causal (formative) indicators leads the empirical meaning of a latent variable to be other than that assigned to it by a researcher. Their critique of causal (formative) indicators rests on several claims: (a) A latent variable exists apart from the model when there are effect (reflective) indicators but not when there are causal (formative) indicators, (b) causal (formative) indicators need not have the same consequences, (c) causal (formative) indicators are inherently subject to interpretational confounding, and (d) a researcher cannot detect interpretational confounding when using causal (formative) indicators. This article shows that each claim is false. Rather, interpretational confounding is more a problem of structural misspecification of a model combined with an underidentified model that leaves these misspecifications undetected. Interpretational confounding does not occur if the model is correctly specified whether a researcher has causal (formative) or effect (reflective) indicators. It is the validity of a model not the type of indicator that determines the potential for interpretational confounding.
 
Article
Reports a clarification to "Psychometric approaches for developing commensurate measures across independent studies: Traditional and new models" by Daniel J. Bauer and Andrea M. Hussong (Psychological Methods, 2009[Jun], Vol 14[2], 101-125). In this article, the authors wrote, "To our knowledge, the multisample framework is the only available option within these [latent variable] programs that allows for the moderation of all types of parameters, and this approach requires a single categorical moderator variable to define the samples." Bengt Muthén has clarified for the authors that some programs, including Mplus and Mx, can allow for continuous moderation through the implementation of nonlinear constraints involving observed variables, further enlarging the class of MNLFA models that can be fit with these programs. (The following abstract of the original article appeared in record 2009-08072-001.) When conducting an integrative analysis of data obtained from multiple independent studies, a fundamental problem is to establish commensurate measures for the constructs of interest. Fortunately, procedures for evaluating and establishing measurement equivalence across samples are well developed for the linear factor model and commonly used item response theory models. A newly proposed moderated nonlinear factor analysis model generalizes these models and procedures, allowing for items of different scale types (continuous or discrete) and differential item functioning across levels of categorical and/or continuous variables. The potential of this new model to resolve the problem of measurement in integrative data analysis is shown via an empirical example examining changes in alcohol involvement from ages 10 to 22 years across 2 longitudinal studies. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
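A schematic of how such moderation enters the model may help (generic notation; the article's exact specification and link functions may differ): every measurement and structural parameter is allowed to be a function of observed moderators x,

\eta_i \sim N\big(\alpha(x_i), \psi(x_i)\big), \quad \alpha(x) = \alpha_0 + \alpha_1' x, \quad \psi(x) = \psi_0 \exp(\omega' x),
\nu_j(x) = \nu_{j0} + \nu_{j1}' x, \qquad \lambda_j(x) = \lambda_{j0} + \lambda_{j1}' x,

with an item-appropriate link (linear for continuous items, logistic or probit for binary items) relating each response y_{ij} to \nu_j(x_i) + \lambda_j(x_i)\eta_i. Differential item functioning then corresponds to nonzero \nu_{j1} or \lambda_{j1}.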
 
Article
This article offers reflections on the development of the Rubin causal model (RCM), which were stimulated by the impressive discussions of the RCM and Campbell's superb contributions to the practical problems of drawing causal inferences written by Will Shadish (2010) and Steve West and Felix Thoemmes (2010). It is not a rejoinder in any real sense but more of a sequence of clarifications of parts of the RCM combined with some possibly interesting personal historical comments, which I do not think can be found elsewhere. Of particular interest in the technical content, I think, are the extended discussions of the stable unit treatment value assumption, the explication of the variety of definitions of causal estimands, and the discussion of the assignment mechanism.
 
Article
In Shadish (2010) and West and Thoemmes (2010), the authors contrasted 2 approaches to causality. The first originated in the psychology literature and is associated with work by Campbell (e.g., Shadish, Cook, & Campbell, 2002), and the second has its roots in the statistics literature and is associated with work by Rubin (e.g., Rubin, 2006). In this article, I discuss some of the issues raised by Shadish and by West and Thoemmes. I focus mostly on the impact the 2 approaches have had on research in a 3rd field, economics. In economics, the ideas of both Campbell and Rubin have been very influential, with some of the methods they developed now routinely taught in graduate programs and routinely used in empirical work and other methods receiving much less attention. At the same time, economists have added to the understanding of these methods and through these extensions have further improved researchers' ability to draw causal inferences in observational studies.
 
Article
This comment offers three descriptions of p_rep that start with a frequentist account of confidence intervals, draw on R. A. Fisher's fiducial argument, and do not make Bayesian assumptions. Links are described among p_rep, p values, and the probability a confidence interval will capture the mean of a replication experiment. The descriptions suggest the criticism of Maraun and Gabriel (2010) is unjustified. Iverson, Wagenmakers, and Lee (2010) discussed p_rep in terms of Bayesian model averaging. This went usefully beyond the dichotomous decision making of significance testing, but an extension to Bayesian estimation would be welcome. Lecoutre, Lecoutre, and Poitevineau (2010) referred to and extended their substantial research based on predictive approaches. Some of the links they make among p values, confidence intervals, and p_rep parallel links described earlier, although their conceptual framework is different. The interesting p_rep experiment in Psychological Science may be coming to a close; it suggests that statistical innovation, including that proposed by Iverson et al. (2010) and Lecoutre et al. (2010), is likely to be most successful if guided by cognitive evidence and supported by resources tailored for researchers generally.
 
Article
Lecoutre, Lecoutre, and Poitevineau (2010) have provided sophisticated grounding for p_rep. Computing it precisely appears, fortunately, no more difficult than doing so approximately. Their analysis will help move predictive inference into the mainstream. Iverson, Wagenmakers, and Lee (2010) have also validated p_rep as a boundary condition in a model-averaging approach. I argue that the boundary is good; that the proper place for their assignment of subjective priors is to update personal belief; and that such assignment has no place in the evaluation of evidence, on which priors should be flat. Maraun and Gabriel (2010) have provided clarification, context, and critique of the derivation of p_rep. They concluded that prediction can never be precise without knowledge of population parameters and joint empirical distributions; that nature evolves; that a large initial effect will encourage replication attempts, which are therefore not independent of it; and that there is no substitute for a sustained program of replication attempts, all of which are prudent cautions. If statistics is to be the handmaiden of science and policy, however, rather than history, it ineluctably must speak to the future. The posterior predictive distribution is an underexploited tool in the analyst's kit that will serve that end, and p_rep is but one valid implementation of it.
 
Article
The sense that replicability is an important aspect of empirical science led Killeen (2005a) to define p_rep, the probability that a replication will result in an outcome in the same direction as that found in a current experiment. Since then, several authors have praised and criticized p_rep, culminating in the 3 articles in the current issue of Psychological Methods. In this article, Killeen's p_rep is reviewed, and the contributions of the current articles are summarized and discussed. An examination of the role of a measure of theoretical support such as p_rep in the acquisition of knowledge leads me to concur with Senn (2002) that p_rep is of little epistemological value.
 
Article
McLachlan (2011) and Vermunt (2011) each provided thoughtful replies to our original article (Steinley & Brusco, 2011). This response serves to incorporate some of their comments while simultaneously clarifying our position. We argue that greater caution against overparameterization must be taken when assuming that clusters are highly elliptical in nature. Specifically, users of mixture model clustering techniques should be wary of overreliance on fit indices, and the importance of cross-validation is highlighted. Additionally, we note that K-means clustering is part of a larger family of discrete partitioning algorithms, many of which are designed to solve problems identical to those for which mixture modeling approaches are often touted. (PsycINFO Database Record (c) 2011 APA, all rights reserved).
 
Figure: Generated S5 data set.
Figure: Generated S9(x1–x2) data set.
Article
Steinley and Brusco (2011) presented the results of a huge simulation study aimed at evaluating cluster recovery of mixture model clustering (MMC), both for situations in which the number of clusters is known and for situations in which it is unknown. They derived rather strong conclusions on the basis of this study, especially with regard to the good performance of K-means (KM) compared with MMC. I agree with the authors' conclusion that the performance of KM may be equal to MMC in certain situations, which are primarily the situations investigated by Steinley and Brusco. However, a weakness of the paper is the failure to investigate many important real-world situations where theory suggests that MMC should outperform KM. This article elaborates on the KM-MMC comparison in terms of cluster recovery and provides some additional simulation results that show that KM may be much worse than MMC. Moreover, I show that KM is equivalent to a restricted mixture model estimated by maximizing the classification likelihood and comment on Steinley and Brusco's recommendation regarding the use of mixture models for clustering.
 
Article
I discuss the recommendations and cautions in Steinley and Brusco's (2011) article on the use of finite mixture models to cluster a data set. In their article, much use is made of comparison with the K-means procedure. As noted by researchers for over 30 years, the K-means procedure can be viewed as a special case of finite mixture modeling in which the components are in equal (fixed) proportions and are taken to be normal with a common spherical covariance matrix. In this commentary, I pay particular attention to this link and to the use of normal mixture models with arbitrary component-covariance matrices.
 
Article
Reports an error in "Many tests of significance: New methods for controlling type I errors" by H. J. Keselman, Charles W. Miller and Burt Holland (Psychological Methods, 2011[Dec], Vol 16[4], 420-431). The R code for arriving at adjusted p values for one of the methods is incorrect. The specific changes that need to be made are provided in the erratum. (The following abstract of the original article appeared in record 2011-24639-001.) There have been many discussions of how Type I errors should be controlled when many hypotheses are tested (e.g., all possible comparisons of means, correlations, proportions, the coefficients in hierarchical models, etc.). By and large, researchers have adopted familywise (FWER) control, though this practice certainly is not universal. Familywise control is intended to deal with the multiplicity issue of computing many tests of significance, yet such control is conservative (that is, less powerful) compared with per test/hypothesis control. The purpose of our article is to introduce the readership, particularly those readers familiar with issues related to controlling Type I errors when many tests of significance are computed, to newer methods that provide protection from the effects of multiple testing, yet are more powerful than familywise controlling methods. Specifically, we introduce a number of procedures that control the k-FWER. These methods, say, 2-FWER instead of 1-FWER (i.e., FWER), are equivalent to specifying that the probability of 2 or more false rejections is controlled at .05, whereas FWER controls the probability of any (i.e., 1 or more) false rejections at .05. 2-FWER implicitly tolerates 1 false rejection and makes no explicit attempt to control the probability of its occurrence, unlike FWER, which tolerates no false rejections at all. More generally, k-FWER tolerates k - 1 false rejections, but controls the probability of k or more false rejections at α = .05. We demonstrate with two published data sets how more hypotheses can be rejected with k-FWER methods compared to FWER control. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
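One simple member of this family is the generalized Bonferroni procedure of Lehmann and Romano, shown below as background (it is not necessarily one of the specific procedures evaluated in the article, and the p values in the example call are invented): k-FWER control at level α is obtained by rejecting any hypothesis whose p value is at most kα/m.

# Generalized Bonferroni control of the k-FWER (Lehmann & Romano):
# with m hypotheses, reject H_i whenever p_i <= k * alpha / m.
# Background sketch only; this is not the article's R code.
def k_fwer_reject(pvals, k=2, alpha=0.05):
    m = len(pvals)
    return [p <= k * alpha / m for p in pvals]

print(k_fwer_reject([0.001, 0.004, 0.02, 0.03, 0.20], k=2))

With k = 1 this reduces to the ordinary Bonferroni rule; larger k relaxes the rejection threshold, which is why k-FWER methods can reject more hypotheses than 1-FWER control.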
 
Article
It is well documented that studies reporting statistically significant results are more likely to be published than are studies reporting nonsignificant results--a phenomenon called publication bias. Publication bias in meta-analytic reviews should be identified and reduced when possible. Ferguson and Brannick (2012) argued that the inclusion of unpublished articles is ineffective and possibly counterproductive as a means of reducing publication bias in meta-analyses. We show how idiosyncratic choices on the part of Ferguson and Brannick led to an erroneous conclusion. We demonstrate that their key finding--that publication bias was more likely when unpublished studies were included--may be an artifact of the way they assessed publication bias. We also point out how the lack of transparency about key choices and the absence of information about critical features of Ferguson and Brannick's sample and procedures might have obscured readers' ability to assess the validity of their claims. Furthermore, we demonstrate that many of the claims they made are without empirical support, even though they could have tested these claims empirically, and that these claims may be misleading. With their claim that addressing publication bias introduces subjectivity and bias into meta-analysis, they ignored a large body of evidence showing that including unpublished studies that meet the inclusion criteria of a meta-analysis decreases (rather than increases) publication bias. Rather than exclude unpublished studies, we recommend that meta-analysts code study characteristics related to methodological quality (e.g., experimental vs. nonexperimental design) and test whether these factors influence the meta-analytic results.
 
Article
Muthén and Asparouhov (2012) made a strong case for the advantages of Bayesian methodology in factor analysis and structural equation models. I show additional extensions and adaptations of their methods and show how non-Bayesians can take advantage of many (though not all) of these advantages by using interval restrictions on parameters. By keeping parameters restricted to intervals (such as loadings between -.3 and .3 to produce small loadings), frequentists using standard structural equation modeling software can do something similar to what a Bayesian does by putting prior distributions on these parameters. (PsycINFO Database Record (c) 2012 APA, all rights reserved).
 
Article
This rejoinder discusses the general comments on how to use Bayesian structural equation modeling (BSEM) wisely and how to get more people better trained in using Bayesian methods. Responses to specific comments cover how to handle sign switching, nonconvergence and nonidentification, and prior choices in latent variable models. Two new applications are included. The first one revisits the Kaplan (2009) science model by considering priors on primary parameters. The second one applies BSEM to the bifactor model that was hypothesized in the original Holzinger and Swineford (1939) study. (PsycINFO Database Record (c) 2012 APA, all rights reserved).
 
Article
Reports an error in "A mixed model approach to meta-analysis of diagnostic studies with binary test outcome" by Philipp Doebler, Heinz Holling and Dankmar Böhning (Psychological Methods, 2012[Sep], Vol 17[3], 418-436). For the article, Drs. Daming Lin of the Dalla Lana School of Public Health and George Tomlinson of Toronto General Hospital and the Dalla Lana School of Public Health noted an error in the final version of Equations 6 and 7 on page 423. Dr. Doebler, in a conversation with the Interim Editor, acknowledged the error in the printing of the equations. Dr. Doebler also checked and assured the Interim Editor that the R code that generated the substantive results for the paper was correctly coded and is identical to the R code that would result from the derivations suggested by Drs. Lin and Tomlinson; that code is provided. (The following abstract of the original article appeared in record 2012-12662-001.) We propose 2 related models for the meta-analysis of diagnostic tests. Both models are based on the bivariate normal distribution for transformed sensitivities and false-positive rates. Instead of using the logit as a transformation for these proportions, we employ the t_α family of transformations that contains the log, logit, and (approximately) the complementary log. A likelihood ratio test for the cutoff value problem is developed, and summary receiver operating characteristic (SROC) curves are discussed. Worked examples showcase the methodology. We compare the models to the hierarchical SROC model, which in contrast employs a logit transformation. Data from various meta-analyses are reanalyzed, and the reanalysis indicates a better performance of the models based on the t_α transformation. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
 
Article
Rennie (2012) made the claim that, despite their diversity, all qualitative methods are essentially hermeneutical, and he attempted to back up that claim by demonstrating that certain core steps that he called hermeneutical are contained in all of the other methods despite their self-interpretation. In this article, I demonstrate that the method I developed based upon Husserlian phenomenology cannot be so interpreted despite Rennie's effort to do so. I claim that the undertaking of a psychological investigation at large can be considered interpretive but that when the phenomenological method based upon Husserl is employed, it is descriptive. I also object to the attempt to reduce varied theoretical perspectives to the methodical steps of one of the competing theories. Reducing theoretical perspectives to core steps distorts the full value of the theoretical perspective. The last point is demonstrated by showing how the essence of the descriptive phenomenological method is missed if one follows Rennie's core steps. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
 
Article
In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)² and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First, strictly speaking, MSD should not be compared to SEM because they measure different things, have different assumptions, and capture different sources of errors. Second, the related proof and conclusions in Barchard hold only under the assumptions of equal reliabilities, homogeneous variances, and independent measurement errors. To address the limitations, we propose that MSD should be compared to the standard error of measurement of difference scores (SEMx-y) so that the comparison can be extended to the conditions when 2 tests have unequal reliabilities and score variances.
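In classical test theory notation (generic formulas rather than the article's own derivation), the proposed comparison quantity is the standard error of measurement of difference scores:

SEM_x = SD_x \sqrt{1 - \rho_{xx}}, \qquad SEM_{x-y} = \sqrt{SEM_x^2 + SEM_y^2} = \sqrt{SD_x^2 (1 - \rho_{xx}) + SD_y^2 (1 - \rho_{yy})},

assuming independent measurement errors. When the two tests have equal standard deviations and reliabilities this reduces to \sqrt{2}\,SEM, whose square is the 2(SEM)² quantity discussed by Barchard; the proposed comparison extends to unequal reliabilities and variances.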
 
Table: Parameter estimates (with standard errors) of the discrete time autoregressive cross-lagged model of anomia and authoritarianism across four time intervals.
Figure: Bivariate discrete time autoregressive cross-lagged model of authoritarianism (au) and anomia (an). Squares represent observed (manifest) variables; circles/ellipses represent latent variables. For reasons of simplicity, all measurement errors were fixed to zero in the present example (dashed circles), making it a model with only manifest variables. However, as the figure illustrates, the model can be easily extended to latent variables. The triangle to the right represents the constant 1. Its path coefficients (single-headed arrows) represent the means or intercepts of the variables in question. Path coefficients associated with dashed lines are all fixed to 1, while path coefficients associated with solid lines are freely estimated but are usually constrained to other parameters as described in the text. Double-headed arrows indicate covariances.
Article
Reports an error in "An SEM approach to continuous time modeling of panel data: Relating authoritarianism and anomia" by Manuel C. Voelkle, Johan H. L. Oud, Eldad Davidov and Peter Schmidt (Psychological Methods, 2012[Jun], Vol 17[2], 176-192). The supplemental materials link was missing. All versions of this article have been corrected. (The following abstract of the original article appeared in record 2012-09124-001.) Panel studies, in which the same subjects are repeatedly observed at multiple time points, are among the most popular longitudinal designs in psychology. Meanwhile, there exists a wide range of different methods to analyze such data, with autoregressive and cross-lagged models being 2 of the most well known representatives. Unfortunately, in these models time is only considered implicitly, making it difficult to account for unequally spaced measurement occasions or to compare parameter estimates across studies that are based on different time intervals. Stochastic differential equations offer a solution to this problem by relating the discrete time model to its underlying model in continuous time. It is the goal of the present article to introduce this approach to a broader psychological audience. A step-by-step review of the relationship between discrete and continuous time modeling is provided, and we demonstrate how continuous time parameters can be obtained via structural equation modeling. An empirical example on the relationship between authoritarianism and anomia is used to illustrate the approach. (PsycINFO Database Record (c) 2012 APA, all rights reserved).
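The key relationship exploited in the original article can be stated compactly (standard continuous-time notation, not a transcript of the article's equations): if the continuous-time model is dx(t) = (A x(t) + b) dt + G dW(t), then the discrete-time autoregressive and cross-lagged coefficient matrix for an observation interval \Delta t is the matrix exponential

A_{\Delta t} = e^{A \Delta t},

so coefficients estimated under different measurement intervals are different snapshots of the same underlying drift matrix A, which is what makes parameter estimates comparable across studies with unequal or unequally spaced intervals.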
 
Article
Reports an error in "Mediation analysis allowing for exposure-mediator interactions and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros" by Linda Valeri and Tyler J. VanderWeele (Psychological Methods, 2013[Jun], Vol 18[2], 137-150). The technical appendix was missing from the original supplemental materials. The appendix has been added to the supplemental materials. (The following abstract of the original article appeared in record 2013-03476-001.) Mediation analysis is a useful and widely employed approach to studies in the field of psychology and in the social and biomedical sciences. The contributions of this article are several-fold. First we seek to bring the developments in mediation analysis for nonlinear models within the counterfactual framework to the psychology audience in an accessible format and compare the sorts of inferences about mediation that are possible in the presence of exposure-mediator interaction when using a counterfactual versus the standard statistical approach. Second, the work by VanderWeele and Vansteelandt (2009, 2010) is extended here to allow for dichotomous mediators and count outcomes. Third, we provide SAS and SPSS macros to implement all of these mediation analysis techniques automatically, and we compare the types of inferences about mediation that are allowed by a variety of software macros. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
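For reference, with a continuous mediator and outcome and the linear models E[Y | a, m, c] = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4' c and E[M | a, c] = \beta_0 + \beta_1 a + \beta_2' c (the standard counterfactual parameterization in this literature, written in generic notation rather than copied from the article), the effects for a change in exposure from a* to a are:

CDE(m) = (\theta_1 + \theta_3 m)(a - a^*)
NDE = \big(\theta_1 + \theta_3(\beta_0 + \beta_1 a^* + \beta_2' c)\big)(a - a^*)
NIE = (\theta_2 \beta_1 + \theta_3 \beta_1 a)(a - a^*)

Setting \theta_3 = 0 recovers the familiar product-of-coefficients decomposition, which is the sense in which the counterfactual approach generalizes the standard statistical one.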
 
Article
Although primary studies often report multiple outcomes, the covariances between these outcomes are rarely reported. This leads to difficulties when combining studies in a meta-analysis. This problem was recently addressed with the introduction of robust variance estimation. This new method enables the estimation of meta-regression models with dependent effect sizes, even when the dependence structure is unknown. Although robust variance estimation has been shown to perform well when the number of studies in the meta-analysis is large, previous simulation studies suggest that the associated tests often have Type I error rates that are much larger than nominal. In this article, I introduce 6 estimators with better small sample properties and study the effectiveness of these estimators via 2 simulation studies. The results of these simulations suggest that the best estimator involves correcting both the residuals and degrees of freedom used in the robust variance estimator. These studies also suggest that the degrees of freedom depend on not only the number of studies but also the type of covariates in the meta-regression. The fact that the degrees of freedom can be small, even when the number of studies is large, suggests that these small-sample corrections should be used more generally. I conclude with an example comparing the results of a meta-regression with robust variance estimation with the results from the corrected estimator. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
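As background for the corrections being studied, the basic robust (sandwich) variance estimator for the meta-regression coefficients can be written as follows (the uncorrected form due to Hedges, Tipton, and Johnson, in generic notation; the six corrected estimators introduced in the article modify the residuals and degrees of freedom used here):

V_R = B^{-1}\Big(\sum_j X_j' W_j e_j e_j' W_j X_j\Big) B^{-1}, \qquad B = \sum_j X_j' W_j X_j,

where X_j, W_j, and e_j are the design matrix, weight matrix, and residual vector for the effect sizes contributed by study j.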
 
Article
A broad theory of scientific method is sketched that has particular relevance for the behavioral sciences. This theory of method assembles a complex of specific strategies and methods that are used in the detection of empirical phenomena and the subsequent construction of explanatory theories. A characterization of the nature of phenomena is given, and the process of their detection is briefly described in terms of a multistage model of data analysis. The construction of explanatory theories is shown to involve their generation through abductive, or explanatory, reasoning, their development through analogical modeling, and their fuller appraisal in terms of judgments of the best of competing explanations. The nature and limits of this theory of method are discussed in the light of relevant developments in scientific methodology.
 
Article
In accelerated longitudinal design, one samples multiple age cohorts and then collects longitudinal data on members of each cohort. The aim is to study age-outcome trajectories over a broad age span during a study of short duration. A threat to valid inference is the Age x Cohort interaction effect. S. W. Raudenbush and W. S. Chan (1993) developed a test for such interactions in the context of 2 cohorts by using a hierarchical model. The current article extends this approach to include any number of cohorts. Using the National Youth Survey, the authors combine data collected on 7 cohorts over 5 years to approximate change in antisocial attitudes between 11 and 21 years of age. They show how to test for cohort differences in trajectories, how to calculate the power of the test, and how to use graphical procedures to aid understanding. The approach allows unbalanced designs and the clustering of participants within families, neighborhoods, or other social units.
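In generic two-level notation (a sketch of the kind of model involved, not the authors' exact specification), the cohort test amounts to asking whether cohort membership predicts the growth parameters:

Y_{ti} = \pi_{0i} + \pi_{1i}(\text{age}_{ti} - c) + e_{ti}, \qquad \pi_{0i} = \beta_{00} + \beta_{01}\,\text{cohort}_i + u_{0i}, \qquad \pi_{1i} = \beta_{10} + \beta_{11}\,\text{cohort}_i + u_{1i},

so an Age x Cohort interaction appears as a nonzero \beta_{11} (with analogous terms for higher-order age polynomials), and a precisely estimated, near-zero \beta_{11} supports pooling the cohorts into a single accelerated trajectory.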
 
Article
The proportion of studies that use one-tailed statistical significance tests (π) in a population of studies targeted by a meta-analysis can affect the bias of the sample effect sizes (sample ESs, or ds) that are accessible to the meta-analyst. H. C. Kraemer, C. Gardner, J. O. Brooks, and J. A. Yesavage (1998) found that, assuming π = 1.0, for small studies (small Ns) the overestimation bias was large for small population ESs (δ ≤ 0.2) and reached a maximum for the smallest population ES (viz., δ = 0). The present article shows (with a minor modification of H. C. Kraemer et al.'s model) that when π = 0, the small-N bias of accessible sample ESs is relatively small for δ ≤ 0.2, and a minimum (in fact, nonexistent) for δ = 0. Implications are discussed for interpretations of meta-analyses of (a) therapy efficacy and therapy effectiveness studies, (b) comparative outcome studies, and (c) studies targeting small but important population ESs.
 
Article
Latent state-trait (LST) analysis is frequently applied in psychological research to determine the degree to which observed scores reflect stable person-specific effects, effects of situations and/or person-situation interactions, and random measurement error. Most LST applications use multiple repeatedly measured observed variables as indicators of latent trait and latent state residual factors. In practice, such indicators often show shared indicator-specific (or method) variance over time. In this article, the authors compare 4 approaches to account for such method effects in LST models and discuss the strengths and weaknesses of each approach based on theoretical considerations, simulations, and applications to actual data sets. The simulation study revealed that the LST model with indicator-specific traits (Eid, 1996) and the LST model with M - 1 correlated method factors (Eid, Schneider, & Schwenkmezger, 1999) performed well, whereas the model with M orthogonal method factors used in the early work of Steyer, Ferring, and Schmitt (1992) and the correlated uniqueness approach (Kenny, 1976) showed limitations under conditions of either low or high method-specificity. Recommendations for the choice of an appropriate model are provided.
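In its basic form (generic LST notation, not the specific variants compared in the article), each observed variable is decomposed into trait, state-residual, and error components:

Y_{it} = \alpha_{it} + \lambda_{it} \xi_i + \delta_{it} \zeta_{it} + \varepsilon_{it},

where \xi_i is the latent trait, \zeta_{it} the latent state residual, and \varepsilon_{it} measurement error. Consistency, occasion specificity, and reliability are then \lambda_{it}^2 Var(\xi_i)/Var(Y_{it}), \delta_{it}^2 Var(\zeta_{it})/Var(Y_{it}), and their sum, respectively; the four approaches compared in the article differ in how shared indicator-specific (method) variance is added to this decomposition.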
 
Article
Researchers commonly collect repeated measures on individuals nested within groups such as students within schools, patients within treatment groups, or siblings within families. Often, it is most appropriate to conceptualize such groups as dynamic entities, potentially undergoing stochastic structural and/or functional changes over time. For instance, as a student progresses through school, more senior students matriculate while more junior students enroll, administrators and teachers may turn over, and curricular changes may be introduced. What it means to be a student within that school may thus differ from 1 year to the next. This article demonstrates how to use multilevel linear models to recover time-varying group effects when analyzing repeated measures data on individuals nested within groups that evolve over time. Two examples are provided. The 1st example examines school effects on the science achievement trajectories of students, allowing for changes in school effects over time. The 2nd example concerns dynamic family effects on individual trajectories of externalizing behavior and depression. (PsycINFO Database Record (c) 2012 APA, all rights reserved).
 
Article
Longitudinal studies are necessary to examine individual change over time, with group status often being an important variable in explaining some individual differences in change. Although sample size planning for longitudinal studies has focused on statistical power, recent calls for effect sizes and their corresponding confidence intervals underscore the importance of obtaining sufficiently accurate estimates of group differences in change. We derived expressions that allow researchers to plan sample size to achieve the desired confidence interval width for group differences in change for orthogonal polynomial change parameters. The approaches developed provide the expected confidence interval width to be sufficiently narrow, with an extension that allows some specified degree of assurance (e.g., 99%) that the confidence interval will be sufficiently narrow. We make computer routines freely available, so that the methods developed can be used by researchers immediately.
 
Article
In multilevel modeling, group-level variables (L2) for assessing contextual effects are frequently generated by aggregating variables from a lower level (L1). A major problem of contextual analyses in the social sciences is that there is no error-free measurement of constructs. In the present article, 2 types of error occurring in multilevel data when estimating contextual effects are distinguished: unreliability that is due to measurement error and unreliability that is due to sampling error. The fact that studies may or may not correct for these 2 types of error can be translated into a 2 × 2 taxonomy of multilevel latent contextual models comprising 4 approaches: an uncorrected approach, partial correction approaches correcting for either measurement or sampling error (but not both), and a full correction approach that adjusts for both sources of error. It is shown mathematically and with simulated data that the uncorrected and partial correction approaches can result in substantially biased estimates of contextual effects, depending on the number of L1 individuals per group, the number of groups, the intraclass correlation, the number of indicators, and the size of the factor loadings. However, the simulation study also shows that partial correction approaches can outperform full correction approaches when the data provide only limited information in terms of the L2 construct (i.e., small number of groups, low intraclass correlation). A real-data application from educational psychology is used to illustrate the different approaches.
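The sampling-error part of this taxonomy can be made concrete with a standard result (not taken from the article): the reliability of an observed group mean, as a measure of the group-level construct, based on n_j sampled members of group j is

\lambda_j = \frac{n_j \cdot ICC}{1 + (n_j - 1) \cdot ICC},

which is why contextual-effect estimates degrade when groups are small or the intraclass correlation is low, and why the corrections compared in the article matter most in exactly those conditions.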
 
Article
In addition to evaluating a structural equation model (SEM) as a whole, often the model parameters are of interest and confidence intervals for those parameters are formed. Given a model with a good overall fit, it is entirely possible for the targeted effects of interest to have very wide confidence intervals, thus giving little information about the magnitude of the population targeted effects. With the goal of obtaining sufficiently narrow confidence intervals for the model parameters of interest, sample size planning methods for SEM are developed from the accuracy in parameter estimation approach. One method plans for the sample size so that the expected confidence interval width is sufficiently narrow. An extended procedure ensures that the obtained confidence interval will be no wider than desired, with some specified degree of assurance. A Monte Carlo simulation study was conducted that verified the effectiveness of the procedures in realistic situations. The methods developed have been implemented in the MBESS package in R so that they can be easily applied by researchers.
 
Article
The quality of tools used in binary classification is evaluated by studies that assess the accuracy of the classification. The empirical evidence is summarized in 2 × 2 contingency tables. These provide the joint frequencies between the true status of a sample and the classification made by the test. The accuracy of the test is better estimated in a meta-analysis that synthesizes the results of a set of primary studies. The true status is determined by a reference that ideally is a gold standard, which means that it is error free. However, in psychology, it is rare that all the primary studies have employed the same reference, and often they have used an imperfect reference with suboptimal accuracy instead of an actual gold standard. An imperfect reference biases both the estimates of the accuracy of the test and the empirical prevalence of the target status in the primary studies. We discuss several strategies for meta-analysis when different references are employed. Special attention is paid to the simplest case, where the meta-analyst has 1 group of primary studies using a reference that can be considered a gold standard and a 2nd group of primary studies using an imperfect reference. A procedure is recommended in which the frequencies from the primary studies with the imperfect reference are corrected prior to the meta-analysis itself. Then, a hierarchical meta-analytic model is fitted. An example with actual data from SCOFF (Sick-Control-One-Fat-Food; Hill, Reid, Morgan, & Lacey, 2010; Morgan, Reid, & Lacey, 1999) a simple but efficient test for detecting eating disorders, is described. (PsycINFO Database Record (c) 2014 APA, all rights reserved).
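One well-known building block for such corrections (offered as background; the correction procedure recommended in the article is more elaborate) is the relationship between the apparent and the true prevalence when the reference test has sensitivity Se_R and specificity Sp_R:

p_{\text{apparent}} = Se_R \,\pi + (1 - Sp_R)(1 - \pi), \qquad \hat{\pi} = \frac{p_{\text{apparent}} + Sp_R - 1}{Se_R + Sp_R - 1},

the second expression being the Rogan-Gladen correction. Adjustments of this kind, applied to the cell frequencies of the 2 × 2 tables from the imperfect-reference studies, are what allow those studies to be combined with the gold-standard studies in a single hierarchical model.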
 
Top-cited authors
Duane T. Wegener
  • The Ohio State University
David MacKinnon
  • Arizona State University
Leandre Fabrigar
  • Queen's University
Virgil Sheets
  • Indiana State University
Jeanne M Hoffman
  • University of Washington Seattle