Article

Standard Errors in Quantitative Criminology: Taking Stock and Looking Forward

... It is not hard for scientists, including criminologists, to get whatever research findings they want: evidence that a criminal justice policy or program is effective, support for a favored theory or new hypothesis, statistical significance for a surprising interaction effect (Ritchie, 2020; Sweeten, 2020). Sufficient use of questionable research practices (QRPs) (Simmons, Nelson, & Simonsohn, 2011) will do the trick. ...
... The high rate of QRP usage is disappointing because QRPs contribute to false and misleading findings. Our evidence is consistent with the conclusion that many findings in criminology are likely false positives (Sweeten, 2020; Gelman, Skardhamar, & Aaltonen, 2020; Wooditch et al., 2020). Existing efforts to make criminologists aware of the pitfalls of QRPs (see Burt, 2020) should be strengthened. ...
... From concerns over spurious influences in non-randomized studies, to model misspecification, to even poor measurement practices, scholars have long been critical of the ubiquitous usage of strong, often empirically unjustified, assumptions about data (Thomas et al. 2019). This skepticism can also be seen in the growing debate on the replication crisis, which, for the most part, has pinned the blame for the widespread elicitation of unreliable and fragile inferences on conventional testing procedures, the bulk of which rarely grapple with the uncertainties posed by sources outside of sampling variation (see Sweeten 2020; Simmons et al. 2011). Research on partial identification has sought to address these issues by challenging this overreliance on stronger assumptions, instead prioritizing a ground-up style of scholarship which seeks to maximize the credibility of inference by employing assumptions which are only as strong as necessary in order to draw informative conclusions (Manski 2003). ...
... By "weak," most scholars refer to what is a comparatively small effect size. Not only are these effects less strongly linked to the outcome variable than stronger ones, they are also more susceptible to random fluctuations in empirical data (Sweeten 2020). ...
Article
Full-text available
Objectives Test the effect of perceived likelihood of arrest on criminal behavior under a relaxed set of measurement assumptions. Specifically, responses that are commonly associated with inaccurate reporting practices (particularly the 0%, 50%, and 100% categories) can be treated as partially identified. By doing so, scholars are able to bound the effect of perceived arrest risk on criminality, which provides more credible, although less precise, estimates of β1. Scholars can use this approach not only to produce more defensible findings, on the whole, but also to gain insight into the possible threat posed by measurement misspecification. Methods Point estimates of a perceived certainty effect were elicited via quasi-Poisson regression using data derived from the Pathways to Desistance study. These estimates were subsequently bounded under progressively weaker measurement assumptions by a series of hill-climb algorithms. Results In nine out of seventeen total algorithms, the worst-case bound remained in the expected direction and was statistically significant. As long as a relatively minor level of response inaccuracy is assumed, supportive conclusions can be drawn. Conclusions Support for a certainty effect can be found under relaxed measurement assumptions, up to a point. This not only provides further support for the deterrence hypothesis, but also implies the effect might be somewhat resilient to measurement error.
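The bounding logic described in this abstract can be conveyed with a small sketch. The code below is not the authors' hill-climb procedure or their Pathways to Desistance models; it refits a Poisson regression (a stand-in for quasi-Poisson) on synthetic data under two extreme recodings of the "suspect" risk responses (0%, 50%, and 100%) and reports the range of slope estimates as a rough, non-sharp bound. The 25-point recoding width is an arbitrary assumption chosen only for illustration.

```python
# Illustrative bounding exercise on synthetic data. NOT the authors'
# hill-climb procedure: we simply refit a Poisson model under two extreme
# recodings of the "suspect" risk responses and report the resulting
# range of slope estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
risk = rng.choice([0, 10, 25, 50, 75, 90, 100], size=n).astype(float)
offending = rng.poisson(np.exp(1.0 - 0.01 * risk))   # true slope = -0.01

suspect = np.isin(risk, [0, 50, 100])                 # responses treated as possibly inaccurate

def fit_slope(risk_values):
    X = sm.add_constant(risk_values)
    return sm.GLM(offending, X, family=sm.families.Poisson()).fit().params[1]

point = fit_slope(risk)

# Two extreme recodings: shift suspect responses by 25 percentage points
# in each direction (an arbitrary illustration of "relaxed" assumptions).
lower = risk.copy(); lower[suspect] = np.clip(risk[suspect] - 25, 0, 100)
upper = risk.copy(); upper[suspect] = np.clip(risk[suspect] + 25, 0, 100)
bound = sorted([fit_slope(lower), fit_slope(upper)])

print(f"point estimate: {point:.4f}   rough bound: [{bound[0]:.4f}, {bound[1]:.4f}]")
```

A real analysis would search over all admissible recodings, as the hill-climb algorithms in the study do, rather than checking only two extremes.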
... Indeed, OSPs such as preregistration and open access to materials, data, and statistical code are increasingly common in the social sciences (Chin et al., 2021; Christensen et al., 2020). Positive and novel findings are published and cited more often (Fanelli, 2010), and lead to inflated estimates of true effect sizes (Barnes et al., 2020; Gelman et al., 2020). The pressure to produce positive and novel findings likely contributes to QRPs and even outright fabrication of data or falsification of results (John et al., 2012; Sweeten, 2020). These "researcher degrees of freedom" occur during the design, analysis, and writeup process, typically going unreported in publication. ...
... In a recent issue of The Criminologist, the American Society of Criminology's Vice President noted, "science is under attack more than ever, and we need to get our side of the street clean so that our evidence-based recommendations are generated from soundly scrutinized scholarship that was rigorously reviewed and replicated" (Dugan, 2020, p. 1). Other prominent voices in the field have made similar appeals (e.g., Burt, 2020; Kulig et al., 2017; Sweeten, 2020). With this in mind, we conducted a direct replication of Pickett and Baker (2014) following OSPs. ...
Article
Full-text available
In 2014, Pickett and Baker cast doubt on the scholarly consensus that Americans are pragmatic about criminal justice. Previous research suggested this pragmatism was evidenced by either null or positive relationships between seemingly opposite items (i.e., between dispositional and situational crime attributions and between punitiveness and rehabilitative policy support). Pickett and Baker argued that because these studies worded survey items in the same positive direction, respondents’ susceptibility to acquiescence bias led to artificially inflated positive correlations. Using a simple split-ballot experiment, they manipulated the direction of survey items and demonstrated that bidirectional survey items resulted in negative relationships between attributions and between support for punitive and rehabilitative policies. We replicated Pickett and Baker’s methodology with a nationally representative sample of American respondents supplemented by a diverse student sample. Our results were generally consistent, and, in many cases, effect sizes were stronger than those observed in the original study. Americans appear much less pragmatic when survey items are bidirectional. Yet, we suggest the use of bidirectional over unidirectional survey items trades one set of problems for another. Instead, to reduce acquiescence bias and improve overall data quality, we encourage researchers to adopt item-specific questioning.
... It is not hard for scientists, including criminologists, to get whatever research findings they want: evidence that a criminal justice policy or program is effective, support for a favored theory or new hypothesis, statistical significance for a surprising interaction effect (Ritchie 2020; Sweeten 2020). Sufficient use of questionable research practices (QRPs) (Simmons et al. 2011) will do the trick. ...
... The high rate of QRP usage is disappointing because QRPs contribute to false and misleading findings. Our evidence is consistent with the conclusion that many findings in criminology are likely false positives (Sweeten 2020; Gelman et al. 2020; Wooditch et al. 2020). Existing efforts to make criminologists aware of the pitfalls of QRPs (see Burt 2020) should be strengthened. ...
Article
Full-text available
Objectives Questionable research practices (QRPs) lead to incorrect research results and contribute to irreproducibility in science. Researchers and institutions have proposed open science practices (OSPs) to improve the detectability of QRPs and the credibility of science. We examine the prevalence of QRPs and OSPs in criminology, and researchers’ opinions of those practices. Methods We administered an anonymous survey to authors of articles published in criminology journals. Respondents self-reported their own use of 10 QRPs and 5 OSPs. They also estimated the prevalence of use by others, and reported their attitudes toward the practices. Results QRPs and OSPs are both common in quantitative criminology, about as common as they are in other fields. Criminologists who responded to our survey support using QRPs in some circumstances, but are even more supportive of using OSPs. We did not detect a significant relationship between methodological training and either QRP or OSP use. Support for QRPs is negatively and significantly associated with support for OSPs. Perceived prevalence estimates for some practices resembled a uniform distribution, suggesting criminologists have little knowledge of the proportion of researchers that engage in certain questionable practices. Conclusions Most quantitative criminologists in our sample have used QRPs, and many have used multiple QRPs. Moreover, there was substantial support for QRPs, raising questions about the validity and reproducibility of published criminological research. We found promising levels of OSP use, albeit at levels lagging what researchers endorse. The findings thus suggest that additional reforms are needed to decrease QRP use and increase the use of OSPs.
... Methodological and practical sources of the replication crisis in criminology have been discussed in depth (see Chin et al. 2023; Pridemore et al. 2018; Schumm et al. 2023; Sweeten 2020; Wooditch et al. 2020); we focus on the theoretical sources. Niemeyer et al.'s (2022) simulations examining possible contributions of weak theory to the field's burgeoning credibility crisis represented the first effort by criminologists to link longstanding theoretical concerns in criminology with false-positive research findings. ...
... Comparing the various models offers a basis for deriving an uncertainty interval for the mediating effects, which can readily be compared with the confidence intervals derived in each of the candidate models. This uncertainty perspective explicitly acknowledges the approximative character of model comparison tasks: models should be seen as approximations of the data-generating process, rather than strictly "correct" or "wrong" (Burnham & Anderson 2002; Sweeten 2020). Or as Cudeck and Henly (1991, p. 512) note: "Yet no model is completely faithful to the behavior under study. ...
Article
Full-text available
To identify potential mediating effects, researchers applying partial least squares structural equation modeling (PLS-SEM) typically contrast specific indirect and direct effects in a sequence of steps. Extending this standard procedure, we conceive mediation analysis as a type of model comparison, which facilitates quantifying the degree of the model effects’ uncertainty induced by the introduction of the mediator. By introducing a new dimension of quality assessment, the procedure offers a new means for deciding whether or not to introduce a mediator in a PLS path model, and improves the replicability of research results.
... In doing so, we attempt to be transparent about our research processes and to reduce bias (Frankenhuis & Nettle, 2018; Nosek, Ebersole, DeHaven, & Mellor, 2018; Sweeten, 2020). ...
Article
Full-text available
Background: Predominant explanations of the victim-offender overlap tend to focus on shared causes, such as (low) self-control or risky lifestyles. Such explanations bypass the possibility of a causal link between victimization and offending. We draw on evolutionary developmental psychology and criminological research to propose and test the hypothesis that victimization induces what we refer to as a short-term mindset, i.e., an orientation towards the here-and-now at the expense of considering the future, which in turn increases offending. Methods: We test this mediation hypothesis using structural equation modeling of longitudinal data from a representative sample of urban youth from the city of Zurich, Switzerland (N = 1675). Results: In line with our preregistered predictions, we find that short-term mindsets mediate the effect of victimization on offending, net of prior levels of offending and short-term mindsets, and other controls. Conclusions: We discuss implications for criminological theory and interventions.
... This assumption, in particular, is untenable in situations with multiple candidate models, since the models cannot all be correct at the same time. More likely, none of the models is anything more than a reasonable approximation (Burnham & Anderson, 2002; Sweeten, 2020). Because of this, metrics that depend on the model being correct are potentially misleading. ...
Article
Full-text available
Picking one “winner” model for researching a certain phenomenon while discarding the rest implies a confidence that may misrepresent the evidence. Multimodel inference allows researchers to more accurately represent their uncertainty about which model is “best.” But multimodel inference, with Akaike weights—weights reflecting the relative probability of each candidate model—and bootstrapping, can also be used to quantify model selection uncertainty, in the form of empirical variation in parameter estimates across models, while minimizing bias from dubious assumptions. This paper describes this approach. Results from a simulation example and an empirical study on the impact of perceived brand environmental responsibility on customer loyalty illustrate and provide support for our proposed approach.
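A minimal sketch of the Akaike-weight calculation this abstract describes, run on synthetic data rather than the paper's brand-loyalty models: fit several candidate regressions, convert AIC differences into weights, form a model-averaged estimate of the focal coefficient, and bootstrap to see how much that averaged estimate varies. This illustrates the general technique, not the authors' exact procedure.

```python
# Multimodel inference sketch on synthetic data: Akaike weights over three
# candidate OLS models, a model-averaged estimate of the focal coefficient,
# and a bootstrap to gauge model-selection uncertainty.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({"x": rng.normal(size=n), "z": rng.normal(size=n)})
df["y"] = 0.4 * df["x"] + 0.2 * df["z"] + rng.normal(size=n)

formulas = ["y ~ x", "y ~ x + z", "y ~ x * z"]

def averaged_beta_x(data):
    fits = [smf.ols(f, data=data).fit() for f in formulas]
    aic = np.array([f.aic for f in fits])
    weights = np.exp(-0.5 * (aic - aic.min()))
    weights /= weights.sum()                          # Akaike weights
    betas = np.array([f.params["x"] for f in fits])
    return weights, float(np.sum(weights * betas))

weights, avg = averaged_beta_x(df)
print("Akaike weights:", np.round(weights, 3), " model-averaged beta_x:", round(avg, 3))

# Bootstrap the whole procedure to see how much the averaged estimate moves.
boot = [averaged_beta_x(df.sample(n, replace=True, random_state=b))[1] for b in range(200)]
print("bootstrap SD of the averaged beta_x:", round(float(np.std(boot)), 3))
```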
... As more empirical research into compliance is conducted and published, it is important to be aware of problems that result in biased findings and conclusions. Across disciplines such as psychology and criminology, there has been much discussion and debate in recent years surrounding replication problems (i.e., the inability of different researchers to find the same results as a previous study), the causes of replication problems, and recommendations for addressing these problems (e.g., see Benjamin et al. 2018; Lakens et al. 2018; Open Science Collaboration 2015; Simmons et al. 2011; Sweeten, 2020; Wasserstein et al. 2019). One recommendation made by Cumming (2014: 14) is to use "meta-analytic thinking", meaning that researchers should do three things: design studies with replicability in mind (i.e., make all of their methods transparent and easy to recreate in a different setting), ensure that they are presenting their results in such a way that they can be understood and used in future research, and regularly synthesize the results from various studies using meta-analyses. ...
Chapter
This chapter discusses how Monte Carlo Simulations (MCS) can be used to improve empirical studies of compliance. MCS are a form of stochastic simulation, which aim to imitate and represent real-world processes with the use of random variables. This chapter describes three applications of MCS using compliance-related examples, including a) estimating total costs of noncompliance, b) identifying the optimal sample size for a planned study, and c) demonstrating potential bias in model estimates. Ultimately, MCS can assist the field of compliance in navigating certain problems faced by many research domains, such as replication problems.
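As an illustration of application (b), identifying a sample size for a planned study, the sketch below runs a simple Monte Carlo power analysis for a two-group comparison. The assumed effect size (d = 0.3), candidate sample sizes, and test are arbitrary choices, not values taken from the chapter.

```python
# Monte Carlo power analysis for a planned two-group study: simulate many
# studies at an assumed effect size (d = 0.3) and count how often a t-test
# rejects at alpha = .05 for each candidate sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
d, alpha, reps = 0.3, 0.05, 2000

for n_per_group in (50, 100, 175, 250):
    rejections = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(d, 1.0, n_per_group)
        if stats.ttest_ind(treated, control).pvalue < alpha:
            rejections += 1
    print(f"n per group = {n_per_group:3d}   simulated power = {rejections / reps:.2f}")
```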
... Considering the advantages of publishing, we argue that, despite the challenges, publishing in scientific journals adds value for the researcher and for science, provided that ethical and quality requirements are respected, with particular attention to the methodological limitations of the research and the interpretation of results (see Díaz Fernández, 2019; Piquero & Weisburd, 2010; Sweeten, 2020). In what follows we outline some suggestions that, in our view, can serve as a guide for early-career researchers who wish to publish. ...
Article
Full-text available
This special issue of the Boletín Criminológico gathers nine of the best contributions presented during the 2nd Meeting of Spanish Early-Career Researchers in Criminology, held at the Law School of the University of Malaga (Spain) on 13 and 14 February 2020. After presenting the nine contributions, this article discusses the challenge of publishing as an early-career researcher. In particular, we briefly review the literature on the "publish or perish" phenomenon to set out its benefits as well as its difficulties, and we conclude by outlining several recommendations for publishing successfully while preserving the ethical and methodological principles of our discipline.
... In severe cases, biased estimates can result in inaccurate significance tests and confidence intervals. When biased estimates are published, they can result in biased meta-analyses, inaccurate power analyses for future studies, and more broadly in reduced legitimacy of social scientific research (Ioannidis 2005; Sweeten and Pickett 2017). We argued that, although MCS have been increasingly used by criminologists in the past decade (e.g., Barnes et al. 2014; Hester and Hartmann 2017; Hipp and Kim 2017; Levin et al. 2017; Spelman 2013; Zimmerman 2016), they could be used by a broader body of researchers for a variety of purposes not normally seen in the criminal justice literature. ...
Article
Full-text available
Objectives When biased coefficient and standard error estimates are published, they can result in inaccurate findings which might motivate ineffective—or harmful—policy choices and reduce the legitimacy of social scientific research. In this paper, we demonstrate how Monte Carlo simulations (MCS) can be used to evaluate potential bias in estimates. Methods We define estimation bias and provide an overview of MCS, which involves three steps. First, data are generated according to model parameters and assumptions. Second, the data are analyzed and estimates are saved. Third, these two steps are repeated 2500 times to yield a distribution of estimates to compare with the original estimates. In our first example of using MCS to evaluate potential bias, we use data from a previous project to estimate an OLS regression model and then assess the consistency of estimates. In our second example, we evaluate published regression model estimates. In the third example, we employ experimental methodology with MCS to show how a correlation estimate would vary if there was a moderating effect of a third variable. Results Standard error estimates in the first example exhibited severe bias due to violated assumptions. The second example showed model estimates were largely unbiased. The third example showed that the strength of a moderating effect is positively related to correlation estimate bias. Conclusions Although MCS have been increasingly used by criminologists, they could be used by a broader body of researchers, instructors, and policymakers to assess and ensure the reliability of reported findings.
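The three-step procedure described in this abstract can be sketched as follows, using synthetic data and a deliberately violated assumption (heteroskedastic errors) rather than the authors' models: generate data under assumed parameters, estimate the model and store the results, repeat 2,500 times, and compare the spread of the estimates with the standard error the model reports.

```python
# Three-step Monte Carlo check of standard error bias on synthetic data:
# (1) generate data under assumed parameters, (2) estimate and store the
# results, (3) repeat 2,500 times and compare the spread of the estimates
# with the standard error the model reports. Errors are heteroskedastic
# here, so the naive OLS standard error understates the true variability.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, beta, reps = 500, 0.5, 2500

def simulate_once():
    x = rng.normal(size=n)
    e = rng.normal(scale=1 + 2 * np.abs(x))          # variance grows with |x|
    y = beta * x + e
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    return fit.params[1], fit.bse[1]

draws = np.array([simulate_once() for _ in range(reps)])
print("mean slope estimate  :", round(draws[:, 0].mean(), 3))   # close to 0.5
print("empirical SD of slope:", round(draws[:, 0].std(), 3))
print("mean reported OLS SE :", round(draws[:, 1].mean(), 3))   # smaller -> biased SE
```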
Article
Full-text available
To examine if police expressions of solidarity with protesters improve public opinion toward the police, we embedded a picture- and information-based experiment in a YouGov survey (N = 1,150), wherein respondents were randomly exposed to police expressions of solidarity with protesters. We also randomized whether the pictured officers were wearing riot gear. We find little evidence that expressions of solidarity or riot gear significantly affect public affinity for the police or support for accountability reforms in policing. Past studies show that outside of the context of protests, officers’ behavior toward civilians has asymmetric effects, such that positive actions matter less than negative ones. Our findings suggest that this may be true within the protest context as well.
Article
Full-text available
Objectives This study aims to assess the evidential value of the knowledge base in criminology after accounting for the presence of potential Type I errors. Methods The present study examines the distribution of 1248 p-values (that inform 84 statistically significant outcomes across 26 systematic reviews) in meta-analyses on the topic of crime and justice published by the Campbell Collaboration (CC) using p-curve analysis. Results The distribution of all CC p-values has a significant cluster of p-values immediately below 0.05, which is indicative of p-hacking. Evidential value (right skewed p-curves) is detected in most meta-analytic topic areas but not motivational interviewing (substance use outcome), sex offender treatment (sexual/general recidivism), police legitimacy (procedural justice), street-level drug law enforcement (total crime), and treatment effectiveness in secure corrections (juvenile recidivism). Conclusions More studies, especially carefully designed and implemented randomized experiments with sufficiently large sample sizes, are needed before we are able to affirm the presence of evidential value and replicability of studies in all CC topic areas with confidence.
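A much-simplified version of the right-skew idea behind p-curve analysis is sketched below on made-up p-values: among statistically significant results, low p-values (below .025) should outnumber high ones (.025 to .05) if a true effect is present. This crude binomial check is only an illustration; the study above uses the full p-curve method.

```python
# Crude right-skew check in the spirit of p-curve: among significant
# p-values, are values below .025 over-represented relative to a flat
# (null) curve? The p-values here are invented for illustration.
from scipy import stats

p_values = [0.001, 0.002, 0.004, 0.012, 0.018, 0.021, 0.033, 0.041, 0.046, 0.049]
significant = [p for p in p_values if p < 0.05]
low = sum(p < 0.025 for p in significant)

result = stats.binomtest(low, n=len(significant), p=0.5, alternative="greater")
print(f"{low}/{len(significant)} significant p-values fall below .025 "
      f"(binomial p = {result.pvalue:.3f})")
```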
Article
Full-text available
This article summarizes key points made in a session at the American Society of Criminology meeting in Philadelphia in November 2017, entitled “The replication issue in science and its relevance for criminology”, organized by Friedrich Lösel and Robert F. Boruch. In turn, this session was inspired by Friedrich Lösel’s (2018) article in this journal, based on his 2015 Joan McCord Award Lecture of the Academy of Experimental Criminology. In the present article, Friedrich Lösel introduces the topic of replication in criminology and summarizes his main arguments. Then, six leading criminologists present short papers on this topic. Robert F. Boruch points out the instability in social systems, David P. Farrington argues that systematic reviews are important, and Denise C. Gottfredson calls attention to the heterogeneity in conclusions across different studies. Lorraine Mazerolle reviews attempts to replicate experiments in procedural justice, Lawrence W. Sherman draws attention to enthusiasm bias in criminal justice experiments, and David Weisburd discusses the logic of null hypothesis significance testing and multi-center trials. Finally, some developments since November 2017 in research on replication in criminology are discussed.
Article
Full-text available
Objectives. This cohort study explores the prevalence and effect of suspected outcome reporting bias in clinical trials on substance use disorders. Methods. Protocols on the ClinicalTrials.gov registry are compared with the corresponding trial reports for 95 clinical trials across 3,162 outcomes. Variation in average effect size is examined by completeness and accuracy of reporting using ordinary least squares regression with robust standard errors (Eicker-Huber-White sandwich estimator). Results. Trial reports are frequently incomplete and inconsistent with their protocol. The most common biased reporting practices are adding outcomes that were not pre-specified on the protocol, insufficiently pre-specifying outcomes, and omitting outcomes that were pre-specified on the protocol. There is a linear trend between the number of different biased reporting practices the trialist(s) engaged in and mean study-level Cohen’s d (+0.214 with each additional type of biased reporting practice). Trials with omitted pre-specified outcomes have a significantly higher Cohen’s d on average when compared to trials that did not omit such outcomes (+0.315). Added outcomes have a Cohen’s d that is 0.385 higher in comparison to reported outcomes that were pre-specified on the protocol. Conclusions. The magnitude of outcome reporting bias raises considerable concern regarding inflated Type I error rates. Implications for clinical trials on substance abuse and, more generally, for randomized experiments in criminology are discussed.
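For readers unfamiliar with the estimator named in the Methods, the sketch below fits an OLS regression with Eicker-Huber-White (sandwich) robust standard errors in statsmodels. The data and variable names are synthetic placeholders, not the registry data analyzed in the study.

```python
# OLS with Eicker-Huber-White (sandwich) robust standard errors in
# statsmodels. Synthetic data; "n_bias_types" and "effect_size" are
# placeholder variables, not the registry data from the study.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
n_bias_types = rng.integers(0, 4, size=n).astype(float)   # biased reporting practices per trial
effect_size = 0.2 + 0.2 * n_bias_types + rng.normal(scale=0.5 + 0.3 * n_bias_types)

X = sm.add_constant(n_bias_types)
robust_fit = sm.OLS(effect_size, X).fit(cov_type="HC1")   # sandwich estimator
print(robust_fit.summary())
```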
Article
Full-text available
Introduction A key issue is how to interpret t-statistics when publication bias is present. In this paper we propose a set of rough rules of thumb to help readers interpret t-values in published results under publication bias. Unlike most previous methods that utilize collections of studies, our approach evaluates the strength of evidence under publication bias when there is only a single study. Methods We first re-interpret t-statistics in a one-tailed hypothesis test in terms of their associated p-values when there is extreme publication bias, that is, when no null findings are published. We then consider the consequences of different degrees of publication bias. We show that under even moderate levels of publication bias, adjusting one’s p-values to ensure Type I error rates of either 0.05 or 0.01 results in far higher t-values than those in a conventional t-statistics table. Under a conservative assumption that publication bias occurs 20 percent of the time, with a one-tailed test at a significance level of 0.05, a t-value equal to or greater than 2.311 is needed. For a two-tailed test the appropriate standard would be equal to or above 2.766. Both cutoffs are far higher than the traditional ones of 1.645 and 1.96. To achieve a p-value less than 0.01, the adjusted t-values would be 2.865 (one-tail) and 3.254 (two-tail), as opposed to the traditional values 2.326 (one-tail) and 2.576 (two-tail). We illustrate our approach by applying it to evaluate the hypothesis tests in recent issues of Criminology and Journal of Quantitative Criminology (JQC). Conclusion Under publication bias much higher t-values are needed to restore the intended p-value. By comparing the observed test scores with the adjusted critical values, this paper provides a rough rule of thumb for readers to evaluate the degree to which a reported positive result in a single publication reflects a true positive effect. Further measures to increase the reporting of robust null findings are needed to ameliorate the issue of publication bias.
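One way to read the 20-percent scenario is as a mixture in which a fifth of published studies come from a regime where only significant null results reach print. Under that reading (our assumption, not necessarily the authors' exact derivation), a short numerical search recovers a one-tailed 0.05 cutoff close to the 2.311 quoted above.

```python
# Toy reading of "publication bias occurs 20 percent of the time": assume
# 20% of published studies come from a regime in which only significant
# null results are printed, 80% from an unbiased regime. Find the one-tailed
# critical value that keeps the Type I error among published results at .05.
# This is our assumption for illustration, not necessarily the authors' model.
import numpy as np
from scipy import stats

bias_share, alpha = 0.20, 0.05
grid = np.linspace(1.645, 4.0, 20000)

tail = stats.norm.sf(grid)                              # P(Z > c) under the null
published_tail = (1 - bias_share) * tail + bias_share * tail / alpha
critical = grid[np.argmax(published_tail <= alpha)]
print(f"adjusted one-tailed critical value ~ {critical:.3f}")  # close to the 2.311 quoted above
```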
Article
Full-text available
Objective Criminologists have long questioned how fragile our statistical inferences are to unobserved bias when testing criminological theories. This study demonstrates that sensitivity analyses offer a statistical approach to help assess such concerns with two empirical examples—delinquent peer influence and school commitment. Methods Data from the Gang Resistance Education and Training are used with models that: (1) account for theoretically relevant controls; (2) incorporate lagged dependent variables; and (3) account for fixed effects. We use generalized sensitivity analysis (Harada in ISA: Stata module to perform Imbens’ (2003) sensitivity analysis, 2012; Imbens in Am Econ Rev 93(2):126–132, 2003) to estimate the size of unobserved heterogeneity necessary to render delinquent peer influence and school commitment statistically non-significant and substantively weak and compare these estimates to covariates in order to gauge the likely existence of such bias. Results Unobserved bias would need to be unreasonably large to render the peer effect statistically non-significant for violence and substance use, though less so to reduce it to a weak effect. The observed effect of school commitment on delinquency is much more fragile to unobserved heterogeneity. Conclusion Questions over the sensitivity of inferences plague criminology. This paper demonstrates the utility of sensitivity analysis for criminological theory testing in determining the robustness of estimated effects.
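The sensitivity-analysis logic can be conveyed with a crude stand-in that is much simpler than the Imbens (2003) generalized sensitivity analysis used in the article: apply the omitted-variable-bias formula to a synthetic "peer effect" and ask how strongly a hypothetical unobserved confounder would have to be related to both exposure (delta) and outcome (gamma) before the adjusted estimate falls inside its 95% margin of zero. All values below are invented for illustration.

```python
# Crude sensitivity sketch (NOT Imbens' generalized sensitivity analysis):
# use the omitted-variable-bias formula to ask how strong a hypothetical
# unobserved confounder must be -- delta (association with the exposure)
# and gamma (association with the outcome) -- before the adjusted estimate
# is no longer distinguishable from zero. All numbers are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1500
peers = rng.normal(size=n)
delinquency = 0.15 * peers + rng.normal(size=n)

fit = sm.OLS(delinquency, sm.add_constant(peers)).fit()
beta, se = fit.params[1], fit.bse[1]
print(f"observed peer-effect estimate: {beta:.3f} (SE {se:.3f})")

for delta in (0.1, 0.2, 0.3):
    for gamma in (0.1, 0.3, 0.5):
        adjusted = beta - delta * gamma               # omitted-variable-bias adjustment
        fragile = abs(adjusted) < 1.96 * se
        flag = "  <- no longer significant" if fragile else ""
        print(f"delta = {delta:.1f}, gamma = {gamma:.1f}: adjusted beta = {adjusted:.3f}{flag}")
```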
Article
Full-text available
Objective We address several issues concerning standard error bias in pooled time-series cross-section regressions. These include autocorrelation, problems with unit root tests, nonstationarity in levels regressions, and problems with clustered standard errors. Methods We conduct unit root tests for crimes and other variables. We use Monte Carlo procedures to illustrate the standard error biases caused by the above issues in pooled studies. We replicate prior research that uses clustered standard errors with difference-in-differences regressions and only a small number of policy changes. Results Standard error biases in the presence of autocorrelation are substantial when standard errors are not clustered. Importantly, clustering greatly mitigates bias resulting from the use of nonstationary variables in levels regressions, although in some circumstances clustering can fail to correct for standard error biases due to other reasons. The “small number of policy changes” problem can cause extreme standard error bias, but this can be corrected using “placebo laws”. Other biases are caused by weighting regressions, having too few units, and having dissimilar autocorrelation coefficients across units. Conclusions With clustering, researchers can usually conduct regressions in levels even with nonstationary variables. They should, however, be leery of pitfalls caused by clustering, especially when conducting difference-in-differences analyses.
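A minimal sketch of the clustering remedy discussed in this abstract: a synthetic panel of 50 units observed for 30 years, with both the regressor and the errors following an AR(1) process within each unit, so the naive OLS standard error is too small while the cluster-robust standard error widens appropriately. The panel structure and parameter values are assumptions chosen only for illustration, not the crime data analyzed in the article.

```python
# Synthetic pooled time-series cross-section panel (50 units x 30 years)
# with AR(1) persistence in both the regressor and the error, so the naive
# OLS standard error is too small. Clustering by unit widens it.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
rows = []
for unit in range(50):
    x, e = 0.0, 0.0
    for year in range(30):
        x = 0.8 * x + rng.normal()                    # persistent regressor
        e = 0.8 * e + rng.normal()                    # autocorrelated error
        rows.append({"unit": unit, "x": x, "crime": 0.2 * x + e})
panel = pd.DataFrame(rows)

naive = smf.ols("crime ~ x", data=panel).fit()
clustered = smf.ols("crime ~ x", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["unit"]})
print(f"naive SE: {naive.bse['x']:.4f}   cluster-robust SE: {clustered.bse['x']:.4f}")
```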
Article
Full-text available
Objectives Simple calculations seem to show that larger studies should have higher statistical power, but empirical meta-analyses of published work in criminology have found zero or weak correlations between sample size and estimated statistical power. This is “Weisburd’s paradox” and has been attributed by Weisburd et al. (in Crime Justice 17:337–379, 1993) to a difficulty in maintaining quality control as studies get larger, and attributed by Nelson et al. (in J Exp Criminol 11:141–163, 2015) to a negative correlation between sample sizes and the underlying sizes of the effects being measured. We argue against the necessity of both these explanations, instead suggesting that the apparent Weisburd paradox might be explainable as an artifact of systematic overestimation inherent in post-hoc power calculations, a bias that is large with small N. Methods We discuss Weisburd’s paradox in light of the concepts of type S and type M errors, and re-examine the publications used in previous studies of the so-called paradox. Results We suggest that the apparent Weisburd paradox might be explainable as an artifact of systematic overestimation inherent in post-hoc power calculations, a bias that is large with small N. Conclusions Speaking more generally, we recommend abandoning the use of statistical power as a measure of the strength of a study, because implicit in the definition of power is the bad idea of statistical significance as a research goal.
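The type S / type M framing can be made concrete with a Gelman-and-Carlin-style "retrodesign" simulation: for an assumed true effect and standard error, compute power, the probability that a significant estimate has the wrong sign (type S), and the average factor by which significant estimates exaggerate the true effect (type M). The true effect and standard errors below are arbitrary; this is an illustration of the framework the abstract invokes, not a reanalysis of the studies it examines.

```python
# Retrodesign-style simulation (after Gelman and Carlin): for an assumed
# true effect and standard error, compute power, the probability that a
# significant estimate has the wrong sign (type S), and the average
# exaggeration of significant estimates (type M). Values are illustrative.
import numpy as np
from scipy import stats

def retrodesign(true_effect, se, alpha=0.05, sims=100_000, seed=8):
    rng = np.random.default_rng(seed)
    z_crit = stats.norm.isf(alpha / 2)
    estimates = rng.normal(true_effect, se, sims)
    significant = np.abs(estimates) > z_crit * se
    power = significant.mean()
    type_s = (estimates[significant] < 0).mean() if significant.any() else float("nan")
    type_m = np.abs(estimates[significant]).mean() / true_effect
    return power, type_s, type_m

for se in (0.5, 1.0, 2.0):                            # larger SE ~ smaller study
    power, type_s, type_m = retrodesign(true_effect=1.0, se=se)
    print(f"SE = {se:.1f}: power = {power:.2f}, type S = {type_s:.3f}, exaggeration = x{type_m:.1f}")
```

With small studies (large standard errors), power is low and the estimates that do clear the significance bar substantially exaggerate the true effect, which is the mechanism the abstract offers for the apparent paradox.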
Article
Full-text available
We propose to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.
Article
Full-text available
Objectives Conventional statistical modeling in criminology assumes proper model specification. Very strong and unrebutted criticisms have existed for decades. Some respond that although the criticisms are correct, there is no alternative for observational data. In this paper, we provide an alternative. Methods We draw on work in econometrics and statistics from several decades ago, updated with the most recent thinking, to provide a way to properly work with misspecified models. Results We show how asymptotically unbiased regression estimates can be obtained along with valid standard errors. Conventional statistical inference can follow. Conclusions If one is prepared to work with explicit approximations of a “true” model, defensible analyses can be obtained. The alternative is working with models about which all of the usual criticisms hold.
Article
Full-text available
Data fraud and selective reporting both present serious threats to the credibility of science. However, there remains considerable disagreement among scientists about how best to sanction data fraud, and about the ethicality of selective reporting. The public is arguably the largest stakeholder in the reproducibility of science; research is primarily paid for with public funds, and flawed science threatens the public’s welfare. Members of the public are able to make meaningful judgments about the morality of different behaviors using moral intuitions. Legal scholars emphasize that to maintain legitimacy, social control policies must be developed with some consideration given to the public’s moral intuitions. Although there is a large literature on popular attitudes toward science, there is no existing evidence about public opinion on data fraud or selective reporting. We conducted two studies—a survey experiment with a nationwide convenience sample (N = 821), and a follow-up survey with a representative sample of U.S. adults (N = 964)—to explore community members’ judgments about the morality of data fraud and selective reporting in science. The findings show that community members make a moral distinction between data fraud and selective reporting, but overwhelmingly judge both behaviors to be morally wrong. Community members believe that scientists who commit data fraud or selective reporting should be fired and banned from receiving funding. For data fraud, most Americans support criminal penalties. Results from an ordered logistic regression analysis reveal few demographic and no significant partisan differences in punitiveness toward data fraud.
Article
Full-text available
Improving the reliability and efficiency of scientific research will increase the credibility of the published scientific literature and accelerate discovery. Here we argue for the adoption of measures to optimize key elements of the scientific process: methods, reporting and dissemination, reproducibility, evaluation and incentives. There is some evidence from both simulations and empirical studies supporting the likely effectiveness of these measures, but their broad adoption by researchers, institutions, funders and journals will require iterative evaluation and improvement. We discuss the goals of these measures, and how they can be implemented, in the hope that this will facilitate action toward improving the transparency, reproducibility and efficiency of scientific research.
Article
Full-text available
Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so, and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting.
Article
Full-text available
The data includes measures collected for the two experiments reported in “False-Positive Psychology” [1] where listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2). These data are useful because they illustrate inflations of false positive rates due to flexibility in data collection, analysis, and reporting of results. Data are useful for educational purposes.
Article
Full-text available
Empirically analyzing empirical evidence. One of the central goals in any scientific endeavor is to understand causality. Experiments that seek to demonstrate a cause/effect relation most often manipulate the postulated causal factor. Aarts et al. describe the replication of 100 experiments reported in papers published in 2008 in three high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study. Science, this issue 10.1126/science.aac4716
Article
Full-text available
Transparency, openness, and reproducibility are readily recognized as vital features of science (1, 2). When asked, most scientists embrace these features as disciplinary norms and values (3). Therefore, one might expect that these valued features would be routine in daily practice. Yet, a growing body of evidence suggests that this is not the case (4–6).
Article
Full-text available
Although researchers acknowledge the importance of replication in building scientific knowledge, replication studies seem to be published infrequently. The present study examines the extent to which replications are conducted in criminology. We conduct a content analysis of the five most influential journals in criminology. We also compare the replication rate in criminology with that in the social sciences and natural sciences. The results show that replication research is rarely published in these disciplines. In criminology journals in particular, replication studies constitute just over 2 percent of the articles published between 2006 and 2010. Further, those replication studies that were published in criminology journals in that period tended to conflict with the original studies. These findings call into question the utility of empirical results published in criminology journals for developing theory and policy. Strategies for promoting replication research in criminology are suggested.
Article
Full-text available
A decade ago, the Society for Prevention Research (SPR) endorsed a set of standards for evidence related to research on prevention interventions. These standards (Flay et al., Prevention Science 6:151-175, 2005) were intended in part to increase consistency in reviews of prevention research that often generated disparate lists of effective interventions due to the application of different standards for what was considered to be necessary to demonstrate effectiveness. In 2013, SPR's Board of Directors decided that the field has progressed sufficiently to warrant a review and, if necessary, publication of "the next generation" of standards of evidence. The Board convened a committee to review and update the standards. This article reports on the results of this committee's deliberations, summarizing changes made to the earlier standards and explaining the rationale for each change. The SPR Board of Directors endorses "The Standards of Evidence for Efficacy, Effectiveness, and Scale-up Research in Prevention Science: Next Generation."
Article
Full-text available
Recent controversies have questioned the quality of scientific practice in the field of psychology, but these concerns are often based on anecdotes and seemingly isolated cases. To gain a broader perspective, this article applies an objective test for excess success to a large set of articles published in the journal Psychological Science between 2009 and 2012. When empirical studies succeed at a rate much higher than is appropriate for the estimated effects and sample sizes, readers should suspect that unsuccessful findings have been suppressed, the experiments or analyses were improper, or the theory does not properly account for the data. In total, problems appeared for 82% (36 out of 44) of the articles in Psychological Science that had four or more experiments and could be analyzed.
Article
Full-text available
The issue of a published literature not representative of the population of research is most often discussed in terms of entire studies being suppressed. However, alternative sources of publication bias are questionable research practices (QRPs) that entail post hoc alterations of hypotheses to support data or post hoc alterations of data to support hypotheses. Using general strain theory as an explanatory framework, we outline the means, motives, and opportunities for researchers to better their chances of publication independent of rigor and relevance. We then assess the frequency of QRPs in management research by tracking differences between dissertations and their resulting journal publications. Our primary finding is that from dissertation to journal article, the ratio of supported to unsupported hypotheses more than doubled (.82 to 1.00 versus 1.94 to 1.00). The rise in predictive accuracy resulted from the dropping of statistically non-significant hypotheses, the addition of statistically significant hypotheses, the reversing of predicted direction of hypotheses, and alterations to data. We conclude with recommendations to help mitigate the problem of an unrepresentative literature that we label, the Chrysalis Effect.
Article
Full-text available
Null hypothesis significance testing uses the seemingly arbitrary probability of .05 as a means of objectively determining if a tested effect is reliable. Within recent psychological articles, research has found an over-representation of p values around this cut-off. The present study examined whether this over-representation is a product of recent pressure to publish or if it has existed throughout psychological research. Articles published in 1965 and 2005 from two prominent psychology journals were examined. Like previous research, the frequency of p values at and just below .05 was greater than expected compared to p frequencies in other ranges. While this over-representation was found for values published in both 1965 and 2005, it was much greater in 2005. Additionally, p values close to but over .05 were more likely to be rounded down to, or incorrectly reported as, significant in 2005 compared to 1965. Modern statistical software and an increased pressure to publish may explain this pattern. The problem may be alleviated by reduced reliance on p values and increased reporting of confidence intervals and effect sizes.
Article
Full-text available
Because scientists tend to report only studies (publication bias) or analyses (p-hacking) that “work,” readers must ask, “Are these effects true, or do they merely reflect selective reporting?” We introduce p-curve as a way to answer this question. P-curve is the distribution of statistically significant p values for a set of studies (ps < .05). Because only true effects are expected to generate right-skewed p-curves—containing more low (.01s) than high (.04s) significant p values—only right-skewed p-curves are diagnostic of evidential value. By telling us whether we can rule out selective reporting as the sole explanation for a set of findings, p-curve offers a solution to the age-old inferential problems caused by file-drawers of failed studies and analyses.
Article
Full-text available
Interest in randomized experiments with criminal justice subjects has grown, in recognition that experiments are much better suited for identifying and isolating program effects than are quasi-experimental or nonexperimental research designs. Relatively little attention, however, has been paid to methodological issues. Using the statistical concept of power (the likelihood that a test will lead to the rejection of a hypothesis of no effect), a survey examines the design sensitivity of experiments on sanctions. Contrary to conventional wisdom advocating large sample designs, little relationship is found in practice between sample size and statistical power. Difficulty in maintaining the integrity of treatments and the homogeneity of samples or treatments employed offsets the design advantages of larger investigations.
Article
Full-text available
The present article suggests a possible way to reduce the file drawer problem in scientific research (Rosenthal, 1978, 1979), that is, the tendency for “nonsignificant” results to remain hidden in scientists’ file drawers because both authors and journals strongly prefer statistically significant results. We argue that peer-reviewed journals based on the principle of rigorous evaluation of research proposals before results are known would address this problem successfully. Even a single journal adopting a result-blind evaluation policy would remedy the persisting problem of publication bias more efficiently than other tools and techniques suggested so far. We also propose an ideal editorial policy for such a journal and discuss pragmatic implications and potential problems associated with this policy. Moreover, we argue that such a journal would be a valuable addition to the scientific publication outlets, because it supports a scientific culture encouraging the publication of well-designed and technically sound empirical research irrespective of the results obtained. Finally, we argue that such a journal would be attractive for scientists, publishers, and research agencies.
Article
Full-text available
Most of the methods we use in criminology to infer relationships are based on mean values of distributions. This essay explores the historical origins of this issue and some counterproductive consequences: relying too heavily on sampling as a means of ensuring "statistical significance"; ignoring the implicit assumptions of regression modeling; and assuming that all data sets reflect a single mode of behavior for the entire population under study. The essay concludes by suggesting that we no longer "make do" with the standard methodologies used to study criminology and criminal justice, and recommends developing categories that more accurately reflect behavior and groupings than the ones we currently use; looking at alternative sources of data, including qualitative data such as narrative accounts; and developing alternative methods to extract and analyze the data from such sources.
Article
Full-text available
The number of retracted scholarly articles has risen precipitously in recent years. Past surveys of the retracted literature each limited their scope to articles in PubMed, though many retracted articles are not indexed in PubMed. To understand the scope and characteristics of retracted articles across the full spectrum of scholarly disciplines, we surveyed 42 of the largest bibliographic databases for major scholarly fields and publisher websites to identify retracted articles. This study examines various trends among them. We found 4,449 scholarly publications retracted from 1928 to 2011. Unlike Math, Physics, Engineering and Social Sciences, the percentages of retractions in Medicine, Life Science and Chemistry exceeded their percentages among Web of Science (WoS) records. Retractions due to alleged publishing misconduct (47%) outnumbered those due to alleged research misconduct (20%) or questionable data/interpretations (42%). This total exceeds 100% since multiple justifications were listed in some retraction notices. Retraction/WoS record ratios vary among author affiliation countries. Though widespread, only minuscule percentages of publications for individual years, countries, journals, or disciplines have been retracted. Fifteen prolific individuals accounted for more than half of all retractions due to alleged research misconduct, and strongly influenced all retraction characteristics. The number of articles retracted per year increased by a factor of 19.06 from 2001 to 2010, though excluding repeat offenders and adjusting for growth of the published literature decreases it to a factor of 11.36. Retracted articles occur across the full spectrum of scholarly disciplines. Most retracted articles do not contain flawed data; and the authors of most retracted articles have not been accused of research misconduct. Despite recent increases, the proportion of published scholarly literature affected by retraction remains very small. Articles and editorials discussing retractions, or their relation to research integrity, should always consider individual cases in these broad contexts. However, better mechanisms are still needed for raising researchers' awareness of the retracted literature in their field.
Article
Full-text available
Publication bias, including prejudice against the null hypothesis, and other biasing filters may operate on researchers as well as journal editors and reviewers. A survey asked 33 psychology researchers to describe the fate of 159 studies approved by their departmental human subjects review committee. About two thirds of completed studies did not result in published summaries. About half of the unpublished studies fell out of the process for reasons other than methodological quality. Among these, lack of interest and aims that did not include publication were cited more often than nonsignificant results as the reasons why publication was not pursued. However, significant findings were more likely than nonsignificant findings to be submitted for meeting presentation or publication. These results indicate attention needs to be paid to improving how psychological scientists communicate, especially to the creation of prospective research registers.
Article
Full-text available
In null hypothesis significance testing (NHST), p values are judged relative to an arbitrary threshold for significance (.05). The present work examined whether that standard influences the distribution of p values reported in the psychology literature. We examined a large subset of papers from three highly regarded journals. Distributions of p were found to be similar across the different journals. Moreover, p values were much more common immediately below .05 than would be expected based on the number of p values occurring in other ranges. This prevalence of p values just below the arbitrary criterion for significance was observed in all three journals. We discuss potential sources of this pattern, including publication bias and researcher degrees of freedom.
Article
Full-text available
Null Hypothesis Significance Testing (NHST) has been a mainstay of the social sciences for empirically examining hypothesized relationships, and the main approach for establishing the importance of empirical results. NHST is the foundation of classical or frequentist statistics. The approach is designed to test the probability of generating the observed data if no relationship exists between the dependent and independent variables of interest, recognizing that the results will vary from sample to sample. This paper is intended to evaluate the state of the criminological and criminal justice literature with respect to the correct application of NHST. We apply a modified version of the instrument used in two reviews of the economics literature by McCloskey and Ziliak to code 82 articles in criminology and criminal justice. We have selected three sources of papers: Criminology, Justice Quarterly, and a recent review of experiments in criminal justice by Farrington and Welsh. We find that most researchers provide the basic information necessary to understand effect sizes and analytical significance in tables which include descriptive statistics and some standardized measure of size (e.g., betas, odds ratios). On the other hand, few of the articles mention statistical power and even fewer discuss the standards by which a finding would be considered large or small. Moreover, less than half of the articles distinguish between analytical significance and statistical significance, and most articles used the term ‘significance’ in ambiguous ways.
Article
This article considers a practice in scientific communication termed HARKing (Hypothesizing After the Results are Known). HARKing is defined as presenting a post hoc hypothesis (i.e., one based on or informed by one's results) in one's research report as if it were, in fact, an a priori hypothesis. Several forms of HARKing are identified, and survey data are presented that suggest that at least some forms of HARKing are widely practiced and widely seen as inappropriate. I identify several reasons why scientists might HARK. Then I discuss several reasons why scientists ought not to HARK. It is conceded that the question of whether HARKing's costs exceed its benefits is a complex one that ought to be addressed through research, open discussion, and debate. To help stimulate such discussion (and for those such as myself who suspect that HARKing's costs do exceed its benefits), I conclude the article with some suggestions for deterring HARKing.
Article
Research misconduct is harmful because it threatens public health and public safety, and also undermines public confidence in science. Efforts to eradicate ongoing and prevent future misconduct are numerous and varied, yet the question of “what works” remains largely unanswered. To shed light on this issue, this study used data from both mail and online surveys administered to a stratified random sample of tenured and tenure-track faculty members (N = 613) in the social, natural, and applied sciences at America’s top 100 research universities. Participants were asked to gauge the effectiveness of various intervention strategies: formal sanctions (professional and legal), informal sanctions (peers), prevention efforts (ethics and professional training), and reducing the pressures associated with working in research-intensive units. Results indicated that (1) formal sanctions received the highest level of support, (2) female scholars and researchers working in the applied sciences favored formal sanctions, and (3) a nontrivial portion of the sample supported an integrated approach that combined elements of different strategies. A key takeaway for university administrators is that a multifaceted approach to dealing with the problem of research misconduct, which prominently features enhanced formal sanctions, will be met with the support of university faculty.
Article
Little research has investigated the conditions that lead to research misconduct. To develop effective intervention/prevention strategies, this void must be filled. This study administered a mixed-mode survey (i.e. mail and online) to a stratified random sample of tenured and tenure-track faculty in the natural, social, and applied sciences (N = 613) during the 2016–17 academic year. The sample includes scholars from 100 universities in the United States. Participants were asked about the extent to which they believe a variety of known criminogenic factors contribute to research misconduct in their field. Descriptive results show that professional strains and stressors (e.g. pressure to secure external funds and publish in top-tier journals) are most widely perceived to cause misconduct, followed by the low probability of detecting misbehavior. Results from the MANOVA model show that this pattern of perceived causes remains the same for scholars across scientific fields. Implications for future research are discussed.
Article
A crisis of confidence has struck the behavioral and social sciences. A key factor driving the crisis is the low levels of statistical power in many studies. Low power is problematic because it leads to increased rates of false-negative results, inflated false-discovery rates, and over-estimates of effect sizes. To determine whether these issues impact criminology, we computed estimates of statistical power by drawing 322 mean effect sizes and 271 average sample sizes from 81 meta-analyses. The results indicated criminological studies, on average, have a moderate level of power (mean = 0.605), but there is variability. This variability is observed across general studies as well as those designed to test interventions. Studies using macro-level data tend to have lower power than studies using individual-level data. To avoid a crisis of confidence, criminologists must not ignore statistical power and should be skeptical of large effects found in studies with small samples.
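The kind of power estimate described here can be approximated for a two-group comparison from a standardized mean difference and a per-group sample size. A minimal sketch using a normal approximation; the effect size and sample sizes below are illustrative choices, not values drawn from the meta-analyses:

```python
from scipy.stats import norm

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Normal approximation to the power of a two-sided test of a
    standardized mean difference d with n_per_group cases per arm."""
    z_crit = norm.ppf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5   # noncentrality of the z statistic
    return 1 - norm.cdf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

# Illustrative values: a modest effect with two different sample sizes.
print(f"d = 0.3, n = 100 per group: power ~ {approx_power_two_sample(0.3, 100):.2f}")
print(f"d = 0.3, n = 400 per group: power ~ {approx_power_two_sample(0.3, 400):.2f}")
```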
Article
Disparities in historical and contemporary punishment of Blacks have been well documented. Racial threat has been proffered as a theoretical explanation for this phenomenon. In an effort to understand the factors that influence punishment and racial divides in America, we draw on racial threat theory and prior scholarship to test three hypotheses. First, Black punitive sentiment among Whites will be greater among those who reside in areas where lynching was more common. Second, heightened Black punitive sentiment among Whites in areas with more pronounced legacies of lynching will be partially mediated by Whites’ perceptions of Blacks’ criminality and of Black‐on‐White violence in these areas. Third, the impact of lynching on Black punitive sentiment will be amplified by Whites’ perceptions of Blacks as criminals and as threatening more generally. We find partial support for these hypotheses. The results indicate that lynchings are associated with punitive sentiment toward Black offenders, and these relationships are partially mediated by perceptions of Blacks as criminals and as threats to Whites. In addition, the effects of lynchings on Black punitiveness are amplified among White respondents who view Blacks as a threat to Whites. These results highlight the salience of historical context for understanding contemporary views about punishment.
Article
Replication is a hallmark of science. In recent years, some medical sciences and behavioral sciences struggled with what came to be known as replication crises. As a field, criminology has yet to address formally the threats to our evidence base that might be posed by large-scale and systematic replication attempts, although it is likely we would face challenges similar to those experienced by other disciplines. In this review, we outline the basics of replication, summarize reproducibility problems found in other fields, undertake an original analysis of the amount and nature of replication studies appearing in criminology journals, and consider how criminology can begin to assess more formally the robustness of our knowledge through encouraging a culture of replication.
Article
Across the medical and social sciences, new discussions about replication have led to transformations in research practice. Sociologists, however, have been largely absent from these discussions. The goals of this review are to introduce sociologists to these developments, synthesize insights from science studies about replication in general, and detail the specific issues regarding replication that occur in sociology. The first half of the article argues that a sociologically sophisticated understanding of replication must address both the ways that replication rules and conventions evolved within an epistemic culture and how those cultures are shaped by specific research challenges. The second half outlines the four main dimensions of replicability in quantitative sociology—verifiability, robustness, repeatability, and generalizability—and discusses the specific ambiguities of interpretation that can arise in each. We conclude by advocating some commonsense changes to promote replication while acknowledging the epistemic diversity of our field.
Article
There needs to be a balance between maintaining the strictest statistical controls and allowing researchers some flexibility to pursue analysis of unexpected trends observed in a study beyond the limits of pre-registered primary analysis. Given a particular data set, it can seem entirely appropriate to look at the data and construct reasonable rules for data exclusion, coding, and analysis that can lead to statistical significance. In such a case, researchers need to perform only one test, but that test is conditional on the data. If data are gathered with no preconceptions at all, statistical significance can obviously be obtained even from pure noise by the simple means of repeatedly performing comparisons, excluding data in different ways, examining different interactions, controlling for different predictors, and so forth.
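That claim can be demonstrated with a small simulation: generate data containing no true effects, then test many data-dependent subgroup-by-outcome combinations. A minimal sketch (every variable below is pure noise by construction):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
treat = rng.integers(0, 2, size=n)       # arbitrary binary 'treatment'
outcomes = rng.normal(size=(n, 5))       # five unrelated 'outcomes'
subgroups = rng.integers(0, 3, size=n)   # three arbitrary 'subgroups'

p_values = []
for k in range(outcomes.shape[1]):
    for g in range(3):
        in_group = subgroups == g
        a = outcomes[in_group & (treat == 1), k]
        b = outcomes[in_group & (treat == 0), k]
        p_values.append(stats.ttest_ind(a, b).pvalue)

print(f"{len(p_values)} noise-only comparisons, smallest p = {min(p_values):.3f}")
print(f"comparisons with p < .05: {sum(p < 0.05 for p in p_values)}")
# Although every true effect is zero, repeated slicing of the data will often
# turn up at least one comparison that crosses the .05 threshold.
```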
Article
Despite great attention to the quality of research methods in individual studies, if publication decisions of journals are a function of the statistical significance of research findings, the published literature as a whole may not produce accurate measures of true effects. This article examines the two most prominent sociology journals (the American Sociological Review and the American Journal of Sociology) and another important though less influential journal (The Sociological Quarterly) for evidence of publication bias. The effect of the .05 significance level on the pattern of published findings is examined using a "caliper" test, and the hypothesis of no publication bias can be rejected at approximately the 1 in 10 million level. Findings suggest that some of the results reported in leading sociology journals may be misleading and inaccurate due to publication bias. Some reasons for publication bias and proposed reforms to reduce its impact on research are also discussed.
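The caliper test amounts to counting test statistics just above versus just below the critical value; absent publication bias the split should be close to even, so an exact binomial test applies. A minimal sketch, assuming counts have already been tallied from coded z statistics (the counts below are invented; `binomtest` requires a recent version of SciPy):

```python
from scipy.stats import binomtest

# Hypothetical tally from coded articles: z statistics in a narrow caliper
# around the 5% two-tailed critical value of 1.96.
just_above = 74   # z in (1.96, 2.16]
just_below = 26   # z in [1.76, 1.96)

result = binomtest(just_above, n=just_above + just_below, p=0.5,
                   alternative='greater')
print(f"P(a split at least this lopsided | no bias) = {result.pvalue:.2e}")
```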
Article
In this article, the authors examine common practices of reporting statistically nonsignificant findings in criminal justice evaluation studies. They find that criminal justice evaluators often make formal errors in the reporting of statistically nonsignificant results. Instead of simply concluding that the results were not statistically significant, or that there is not enough evidence to support an effect of treatment, they often mistakenly accept the null hypothesis and state that the intervention had no impact or did not work. The authors propose that researchers define a second null hypothesis that sets a minimal threshold for program effectiveness. In an illustration of this approach, they find that more than half of the studies that had no statistically significant finding for a traditional, no difference null hypothesis evidenced a statistically significant result in the case of a minimal worthwhile treatment effect null hypothesis.
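The proposed second null hypothesis amounts to testing the estimated effect against a minimal worthwhile threshold rather than against zero. A minimal sketch for a difference in means, with simulated data and an assumed threshold of 0.3 standard deviations (the one-sided option requires a recent version of SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treated = rng.normal(0.05, 1.0, size=150)   # simulated program group (tiny effect)
control = rng.normal(0.00, 1.0, size=150)   # simulated comparison group

# Traditional null: no difference between groups (effect = 0), two-sided.
_, p_traditional = stats.ttest_ind(treated, control)

# Second null: the effect is at least as large as a minimal worthwhile
# treatment effect (here assumed to be 0.3 SD).  Rejecting it supports the
# statement "the program did not achieve a worthwhile effect," rather than
# the erroneous "the program had no effect."
mwe = 0.3
_, p_minimal = stats.ttest_ind(treated - mwe, control, alternative='less')

print(f"traditional null (effect = 0):        p = {p_traditional:.3f}")
print(f"minimal-effect null (effect >= {mwe}): p = {p_minimal:.3f}")
```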
Article
There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published. Such research being unknown to other investigators may be repeated independently until eventually by chance a significant result occurs—an “error of the first kind”—and is published. Significant results published in these fields are seldom verified by independent replication. The possibility thus arises that the literature of such a field consists in substantial part of false conclusions resulting from errors of the first kind in statistical tests of significance.
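The risk described here follows directly from the Type I error rate: if a true null effect is tested independently k times, the chance that at least one attempt reaches significance at level α is 1 - (1 - α)^k. A minimal sketch of the arithmetic:

```python
alpha = 0.05
for k in (1, 3, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} independent tests of a true null: "
          f"P(at least one 'significant' result) = {p_any:.2f}")
```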
Article
Differentiates between mathematical and scientific methods. The differences between scientific intuition and mathematical results have been attributed to the fact that scientific generalization is broader than mathematical description. While scientific methods deal with samples which are representative of the total whole, the mathematical methods measure the differences between the particular samples observed. Science begins with description but ends in generalization. Mathematical measures are too high and may need to be discounted in arriving at a scientific conclusion.
Article
Although replications are vital to scientific progress, psychologists rarely engage in systematic replication efforts. In this article, we consider psychologists' narrative approach to scientific publications as an underlying reason for this neglect and propose an incentive structure for replications within psychology. First, researchers need accessible outlets for publishing replications. To accomplish this, psychology journals could publish replication reports in files that are electronically linked to reports of the original research. Second, replications should get cited. This can be achieved by cociting replications along with original research reports. Third, replications should become a valued collaborative effort. This can be realized by incorporating replications in teaching programs and by stimulating adversarial collaborations. The proposed incentive structure for replications can be developed in a relatively simple and cost-effective manner. By promoting replications, this incentive structure may greatly enhance the dependability of psychology's knowledge base.
Article
Research on social inequality in punishment has focused for a long time on the complex relationship among race, ethnicity, and criminal sentencing, with a particular interest in the theoretical importance that group threat plays in the exercise of social control in society. Prior research typically relies on aggregate measures of group threat and focuses on racial rather than on ethnic group composition. The current study uses data from a nationally representative sample of U.S. residents to investigate the influence of more proximate and diverse measures of ethnic group threat, examining public support for the judicial use of ethnic considerations in sentencing. Findings indicate that both aggregate and perceptual measures of threat influence popular support for ethnic disparity in punishment and that individual perceptions of criminal and economic threat are particularly important. Moreover, we find that perceived threat is conditioned by aggregate group threat contexts. Findings are discussed in relation to the growing Hispanic population in the rapidly changing demographic structure of U.S. society.
Article
Replication is fundamental to science, so statistical analysis should give information about replication. Because p values dominate statistical analysis in psychology, it is important to ask what p says about replication. The answer to this question is "Surprisingly little." In one simulation of 25 repetitions of a typical experiment, p varied from <.001 to .76, thus illustrating that p is a very unreliable measure. This article shows that, if an initial experiment results in two-tailed p = .05, there is an 80% chance the one-tailed p value from a replication will fall in the interval (.00008, .44), a 10% chance that p < .00008, and fully a 10% chance that p > .44. Remarkably, the interval (termed a p interval) is this wide however large the sample size. p is so unreliable and gives such dramatically vague information that it is a poor basis for inference. Confidence intervals, however, give much better information about replication. Researchers should minimize the role of p by using confidence intervals and model-fitting techniques and by adopting meta-analytic thinking.
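The 80% interval quoted above can be reproduced from the standard normal distribution under one common reading of the setup: a two-tailed initial p of .05 corresponds to z of about 1.96, and, treating the true effect as unknown, the replication z differs from the initial z by a normally distributed amount with standard deviation √2. These assumptions are ours for illustration, not necessarily the article's exact derivation:

```python
from math import sqrt
from scipy.stats import norm

z_initial = norm.ppf(1 - 0.05 / 2)   # z for an initial two-tailed p = .05 (~1.96)
spread = sqrt(2)                     # initial and replication z estimate the same effect

# 10th and 90th percentiles of the replication z statistic.
z_lo = z_initial - norm.ppf(0.90) * spread
z_hi = z_initial + norm.ppf(0.90) * spread

# One-tailed p values of the replication at those percentiles
# (the smaller z gives the larger p).
p_hi = 1 - norm.cdf(z_lo)
p_lo = 1 - norm.cdf(z_hi)
print(f"80% p interval for the replication: ({p_lo:.5f}, {p_hi:.2f})")
```

Run as written, this yields an interval of roughly (.00008, .44), matching the figures reported in the abstract.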
Article
The promise of experimental criminology is finding ways to reduce harm from crime and injustice. The problem of experimental criminology is that so few experiments produce evidence of big effects from the interventions they test. One solution to this problem may be concentrating scarce resources for experiments on the “power few”: the small percentage of places, victims, offenders, police officers or other units in any distribution of crime or injustice which produces the greatest amount of harm. By increasing the homogeneity and base rates of the samples enrolled in each experiment, the power few hypothesis predicts increased statistical power to detect program effects. With greater investment of resources, and possibly less variant responses to greater dosages of intervention—especially interventions of support, as distinct from punishment—we may also increase our chances of finding politically acceptable interventions that will work.
Article
Confirmatory bias is the tendency to emphasize and believe experiences which support one's views and to ignore or discredit those which do not. The effects of this tendency have been repeatedly documented in clinical research. However, its ramifications for the behavior of scientists have yet to be adequately explored. For example, although publication is a critical element in determining the contribution and impact of scientific findings, little research attention has been devoted to the variables operative in journal review policies. In the present study, 75 journal reviewers were asked to referee manuscripts which described identical experimental procedures but which reported positive, negative, mixed, or no results. In addition to showing poor interrater agreement, reviewers were strongly biased against manuscripts which reported results contrary to their theoretical perspective. The implications of these findings for epistemology and the peer review system are briefly addressed.