Why I Asked the Editors of Criminology to Retract
Johnson, Stewart, Pickett, and Gertz (2011)
Justin T. Pickett
School of Criminal Justice
University at Albany, SUNY
ABSTRACT
My coauthors and I were informed about data irregularities in Johnson, Stewart, Pickett, and
Gertz (2011), and in my coauthors' other articles. Subsequently, I examined my limited files and
found evidence that we: 1) included hundreds of duplicates, 2) underreported the number of
counties, and 3) somehow added another 316 respondents right before publication (and over a
year after the survey was conducted) without changing nearly any of the reported statistics
(means, standard deviations, regression coefficients). The survey company confirmed that it sent
us only 500 respondents, not the 1,184 reported in the article. I obtained and reanalyzed those
data. This report presents the findings from my reanalysis, which suggest that the sample was not
just duplicated. The data were also altered, intentionally or unintentionally, in other ways, and
those alterations produced the article’s main findings. Additionally, we misreported data
characteristics as well as aspects of our analysis and findings, and we failed to report the use of
imputation for missing data. The following eight findings emerged from my reanalysis:
1) The article reports 1,184 respondents, but actually there are 500.
2) The article reports 91 counties, but actually there are 326.
3) The article describes respondents that differ substantially from those in the data.
4) The article reports two significant interaction effects, but actually there are none.
5) The article reports the effect of Hispanic growth is significant and positive, but actually it
is non-significant and negative.
6) The article reports many other findings that do not exist in the data.
7) The standard errors are stable in our published article, but not in the actual data or in
articles published by other authors using similar modeling techniques with large samples.
8) Although never mentioned in the article, 208 of the 500 respondents in the data (or 42%)
have imputed values.
*Direct correspondence to Justin T. Pickett, School of Criminal Justice, University at Albany,
SUNY, 135 Western Avenue, Albany, NY 12222; e-mail: jpickett@albany.edu.
BACKGROUND
On May 5, 2019, my coauthors and I received an email that identified data irregularities
in our study, Johnson, Stewart, Pickett, and Gertz (2011), and in four other articles by my
coauthors. I asked my coauthors to send me the full data for Johnson et al. (2011), but
encountered difficulties getting them. Consequently, I examined the limited files I already had,
and discovered 500 unique respondents and 500 duplicates, as well as several other problems,
like inexplicable changes in sample size from manuscript draft (N = 868) to published article (N
= 1,184) that did not affect means, standard deviations, or regression coefficients. I sent an email
to my coauthors on June 6 that listed these issues and provided my files (see Appendix A). One
of my coauthors, Dr. Gertz, then contacted the former director of the Research Network, who
confirmed that the survey he ran for us included only 500 respondents. At that point, Dr. Stewart
sent me a copy of the data for our article (N = 500).
In the article, we claim to have 1,184 respondents nested in 91 counties. In the actual
data, there are only 500 respondents, and they are nested in 326 counties. Dr. Stewart
acknowledged that both the sample size and county number reported in the article were wrong.
He said the explanations for the differences were that: 1) he accidentally doubled the sample, and
2) he created 91 “county clusters” for the analysis by grouping together the 326 counties. The
published article never mentions county clusters, or grouping together counties. It is also unclear
how the sample of 500 grew to 1,184 in our article, and then to 1,379 in Dr. Stewart’s later
Social Problems article (Stewart et al., 2015), which uses the same data (with the same 54.8%
response rate, and same $62,700 mean family income). Duplicating the 500 respondents, as Dr.
Stewart said he did, would lead to a sample size of only 1,000.
I should note that my coauthors are working on an erratum. Dr. Stewart now says there
were two surveys conducted for our study, one with 500 respondents and one with 425, and that
the results for the combined sample (N = 925) are similar to those in the published article.
However, I am uncomfortable with the new results for four reasons. First, I have not seen them.
Dr. Stewart has not sent me the data for the second sample, and although he has sent Stata output
for the combined sample to the lead author, Dr. Johnson, he has asked him not to share it.
Second, the published article reports 1,184 respondents, not 925. Third, our published article lists
only one survey company, the Research Network, and one survey. Fourth, Dr. Stewart has
refused to tell me who conducted the second survey, and Dr. Johnson has said he does not know
who conducted it. This lack of transparency and accountability is why I have decided not to wait
for my coauthors to finish their reanalysis before asking for a retraction.
Before I discovered the duplicates in our data, Dr. Stewart had never mentioned a second
survey or any survey company besides the Research Network. The email I have from the
Research Network says it conducted one survey for Dr. Stewart (N = 500), sent those data to him
in January 2008, and then, at his request, sent the county identifiers for 420 of those same
respondents to him in May 2008. It did not conduct a second survey. Therefore, the remainder of
this report focuses on the findings from my reanalysis of the data I can account for (N = 500),
which are from the sample that our article describes (albeit inaccurately) and that Dr. Stewart
initially said he doubled accidentally. I did ask Dr. Stewart for the full sample (with duplicates
included) of 1,184 respondents that he initially said he used in his original analysis for our
article. Unfortunately, he doesn’t have it, because he saved over it after dropping the duplicates.
REANALYSIS FINDINGS
The evidence suggests the data were altered in ways besides duplication. The descriptive
statistics in the published article should match those in the data, even if the 500 respondents were
accidentally doubled. Doubling the sample would increase the sample size, but it would not change
its composition. However, the descriptive statistics in the published article differ substantially
from the actual data. The outcome variable in our analysis is public support for the use of
defendants’ ethnicity in sentencing decisions. The distribution of the outcome variable by
respondents’ race is shown in Figure 1 of the published article (Johnson et al., 2011: 419). Here
is how it compares to the actual data:
Figure 1. Percentage of Respondents Supporting the Use of Ethnicity in Sentencing, by Respondent Race: Published Article vs. Actual Data
Published article: White 27%, Black 3%, Hispanic 1%. Actual data: White 38%, Black 38%, Hispanic 17%.
Doubling the sample and then adding another 184 respondents (to get the reported sample size of 1,184) cannot explain these discrepancies. For example, even adding 184 Black respondents who ALL oppose ethnicity-based sentencing would reduce the percentage of Black respondents supporting it from 38% to 13%, not to the article's 3%.
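To spell out the arithmetic behind that bound (assuming, per Table 1 below, that roughly 10% of a doubled sample of 1,000 would be Black): 1,000 × .10 = 100 Black respondents, of whom 38% (38 respondents) support ethnicity-based sentencing; adding 184 Black respondents who all oppose it yields 38 / (100 + 184) = 13.4%, which rounds to 13%.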
And this is not the only noteworthy distributional difference. The table below compares all of the descriptive statistics in the published article to those in the actual data.
Table 1. Descriptive Statistics: Published Article vs. Actual Data

                                     Published Article       Actual Data      p-value for
Variables                              Mean        SD       Mean        SD     Difference
Use of ethnicity in punishment          .31        .46       .37        .48       .007
Hispanic criminal threat               4.93       1.66      2.85       2.67       .000
Hispanic economic threat               1.72       1.13      1.64       1.13       .137
Hispanic political threat              4.38       1.41      1.95       1.59       .000
Percent Hispanic                        .12        .11      9.52      11.61       .000
Hispanic growth                         .26       1.53      3.18       3.11       .000
Homicide rate (per 100,000)            3.96       4.37      3.92       4.25       .819
Concentrated disadvantage              1.09       1.53      1.40        .94       .000
Percent Republican                    53.04      13.02     52.56      13.07       .415
Percent Black                           .10        .14     11.63      11.98       .000
Population structure                   5.39        .70      5.11       1.00       .000
White                                   .86        .41       .85        .36       .606
Black                                   .10        .33       .10        .30      1.000
Hispanic                                .04        .22       .05        .21       .494
Age                                   47.12      19.72     46.41      16.98       .352
Male                                    .47        .50       .46        .50       .591
Married                                 .59        .31       .61        .49       .275
Education level (college graduate)      .42        .31       .42        .49       .902
Family income                       $62,700    $14,210   $61,196    $22,593       .137
Employed                                .46        .50       .55        .50       .000
Political conservative                  .43        .31       .70        .46       .000
Own home                                .78        .33       .78        .41       .829
Southwest                               .17        .41       .16        .37       .721
Northeast                               .15        .35       .15        .36       .802
Midwest                                 .24        .43       .24        .43       .917
West                                    .17        .38       .17        .38       .812
South                                   .44        .39       .43        .50       .787
General punitive attitudes             6.84       2.16      4.69       3.57       .000
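A note on auditing these p-values: they can be closely reproduced by treating each comparison as a two-tailed one-sample t-test of the actual data (n = 500) against the published mean. That test choice is an assumption, and rounding of the published means makes the reproduction approximate, but a minimal sketch illustrates it:

    # Auditing Table 1's p-values from summary statistics alone.
    # Assumes a two-tailed one-sample t-test against the published mean;
    # rounding of the published means makes the match approximate.
    import math
    from scipy import stats

    n = 500
    rows = {  # variable: (published mean, actual mean, actual SD)
        "Use of ethnicity in punishment": (0.31, 0.37, 0.48),
        "Age": (47.12, 46.41, 16.98),
        "Employed": (0.46, 0.55, 0.50),
    }
    for name, (m_pub, m_act, sd_act) in rows.items():
        t = (m_act - m_pub) / (sd_act / math.sqrt(n))
        p = 2 * stats.t.sf(abs(t), df=n - 1)
        print(f"{name}: t = {t:.2f}, p = {p:.3f}")  # e.g., Age: p = .350 vs. .352 reported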
The means for 11 of the 28 variables (or 39%) differ significantly between the article and
data. For example, the published article claims that 43% of respondents are political
conservatives. In the actual data, 70% are political conservatives. Again, even doubling the
sample to 1,000, and then adding 184 liberals, would only drop the percentage of conservatives
in the sample to 59%, not to the 43% reported in the article. Similarly, the mean for Hispanic
criminal threat is almost two points higher (mean = 4.93 vs. 2.85), and the mean for general
punitive attitudes is over two points higher (mean = 6.84 vs. 4.69), in the published article than
in the actual data. Even doubling the sample and then adding 184 respondents with the highest
possible value (a value of 9) on these two variables would only increase their means to 3.81 and
5.36, respectively; both still a point lower than in the article.
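The arithmetic behind these bounds: political conservatives, (1,000 × .70) / 1,184 = .591, or 59%; Hispanic criminal threat, (1,000 × 2.85 + 184 × 9) / 1,184 = 3.81; general punitive attitudes, (1,000 × 4.69 + 184 × 9) / 1,184 = 5.36.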
Accidentally doubling the sample of 500 respondents would leave the regression coefficients unscathed; they would be identical if all respondents were duplicated.
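To see this point concretely, here is a minimal simulation (simulated data, not the study data; numpy and statsmodels assumed available): stacking an exact copy of a dataset on itself returns identical logistic regression coefficients, while the naive standard errors shrink by a factor of about the square root of two.

    # Exact duplication: identical coefficients, smaller naive standard errors.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 500
    X = sm.add_constant(rng.normal(size=(n, 2)))
    p = 1 / (1 + np.exp(-(X @ np.array([-0.5, 0.3, 0.6]))))
    y = rng.binomial(1, p)

    fit_once = sm.Logit(y, X).fit(disp=0)
    fit_doubled = sm.Logit(np.tile(y, 2), np.vstack([X, X])).fit(disp=0)

    print(np.allclose(fit_once.params, fit_doubled.params))  # True
    print(fit_once.bse / fit_doubled.bse)                    # ~1.414 for every term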
However, the regression results in the published article differ substantially from those in the actual data. Most
notably, the main findings in the article, the interaction effects of perceived Hispanic threat (criminal and economic) with Hispanic growth, do not emerge with the actual data. Those
findings are reported in Table 3 of Johnson et al. (2011: 423). In the actual data, none of the
coefficients for the interaction terms are statistically significant and they are all much smaller.
This is shown in the table below.
Table 2. Interaction Effects: Published Article vs. Actual Data

                                Published Article             Actual Data
Variables                        b        SE     Exp(b)       b        SE     Exp(b)
Perceived Hispanic threat
  Criminal threat              .183*     .079    1.201      .212***   .048    1.236
  Economic threat              .272*     .111    1.312      .632***   .124    1.881
  Political threat             .008      .116    1.009     -.045      .072     .956
Aggregate Hispanic threat
  Percent Hispanic            -.089      .766     .815      .032*     .015    1.032
  Hispanic growth              .334**    .127    1.396     -.121      .064     .886
Interactions
  Criminal * His. Growth       .126*     .051    1.134      .026      .017    1.026
  Economic * His. Growth       .175*     .073    1.191      .015      .055    1.015
  Political * His. Growth     -.101      .087     .904      .007      .022    1.007
Intercept                      .848***   .119               .776***   .121
*p < .05; **p < .01; ***p < .001 (two-tailed).
The main effect of Hispanic growth also fails to replicate in the actual data; indeed, the
coefficient is in the opposite direction. This is the case even when the interaction terms are
removed from the model. In the published article, the main effect of Hispanic growth is shown in Model 2 of Table 2, and is positive and highly statistically significant (b = .288, p < .01). In the
actual data, the coefficient is negative and non-significant. The table below compares the
estimates in Johnson et al. (2011: 420) to those from the actual data. The differences are striking,
extending to many other variables besides Hispanic growth. For example, the coefficient for
Black in the published article is negative and significant (b = -.628, p < .05), but it is positive
and significant in the actual data (b = .949, p < .01).
Table 3. Model 2 in Published Article vs. Actual Data

                                Published Article             Actual Data
Variables                        b        SE     Exp(b)       b        SE     Exp(b)
Percent Hispanic              -.104      .662     .901      .021      .020    1.021
Hispanic growth                .288**    .102    1.334     -.070      .053     .933
Homicide rate (per 100,000)   -.013      .033     .987     -.031      .030     .970
Concentrated disadvantage      .051      .086    1.052      .033      .144    1.034
Percent Republican             .001      .005    1.000      .009      .009    1.009
Percent Black                 -.038      .173     .963      .003      .012    1.003
Population structure          -.051      .174     .950      .068      .126    1.070
Black                         -.628*     .248     .534      .949**    .361    2.582
Hispanic                      -.982**    .359     .375     -.752      .547     .472
Age                            .016**    .005    1.016      .004      .006    1.004
Male                           .164**    .056    1.178     -.111      .199     .895
Married                        .208      .201    1.231      .425*     .210    1.529
Education level               -.099      .198     .906      .215      .226    1.240
Family income                 -.026      .061     .974      .062      .096    1.064
Employed                      -.191*     .082     .826     -.058      .202     .944
Political conservative         .367*     .146    1.443      .360      .218    1.434
Own home                      -.137      .246     .872      .158      .281    1.171
Southwest                     -.111      .274     .895      .721      .369    2.057
Northeast                     -.183      .281     .832      .522      .334    1.686
Midwest                        .054      .230    1.055      .102      .291    1.107
West                           .082      .264    1.085     -.350      .362     .705
General punitive attitudes     .191**    .062    1.210      .175***   .031    1.191
Intercept                      .858***   .105               .640***   .097
*p < .05; **p < .01; ***p < .001 (two-tailed).
Other issues with the data are also concerning. For example, one of the
irregularities raised in the email we received was the high degree of stability in standard errors
across the three models in Table 2 of our published article (Johnson et al., 2011: 420-421). In the
article, 21 regressors are included in all three models and 19 of them (or 90%) have standard
errors that are perfectly stable. In the actual data, not a single regressor has a standard error that
is stable across the models. As the table below shows, this is the case regardless of whether the
models are estimated using logistic regression (with clustered standard errors) or multilevel
modeling. In the table, the stable standard errors are those that repeat identically across Models 1-3.
Table 4. Standard-Error Stability: Published Article vs. Actual Data

                          Published Article        Actual Data: Logistic     Actual Data: Multilevel
Variables                  M1      M2      M3       M1      M2      M3        M1      M2      M3
Criminal threat            --      --    .084       --      --    .048        --      --    .049
Economic threat            --      --    .111       --      --    .121        --      --    .117
Political threat           --      --    .107       --      --    .073        --      --    .073
Percent Hispanic           --    .662    .669       --    .020    .016        --    .018    .019
Hispanic growth            --    .102    .101       --    .053    .053        --    .057    .064
Homicide rate            .033    .033    .033     .030    .030    .029      .028    .028    .031
Concentrated dis.        .086    .086    .086     .134    .144    .148      .129    .146    .158
Percent Repub.           .005    .005    .005     .009    .009    .011      .010    .010    .011
Percent Black            .173    .173    .173     .012    .012    .013      .012    .013    .014
Population structure     .173    .174    .185     .119    .126    .141      .118    .125    .144
Black                    .248    .248    .248     .359    .361    .389      .364    .365    .411
Hispanic                 .359    .359    .359     .540    .547    .518      .604    .607    .666
Age                      .005    .005    .005     .006    .006    .007      .006    .006    .007
Male                     .056    .056    .056     .200    .199    .227      .202    .204    .228
Married                  .201    .201    .201     .206    .210    .266      .226    .228    .255
Education level          .198    .198    .198     .225    .226    .247      .218    .218    .243
Family income            .061    .061    .061     .096    .096    .104      .096    .097    .108
Employed                 .082    .082    .082     .201    .202    .228      .213    .214    .237
Political conservative   .146    .146    .146     .217    .218    .260      .228    .229    .264
Own home                 .246    .246    .246     .277    .281    .313      .278    .279    .316
Southwest                .274    .274    .274     .287    .369    .369      .310    .377    .425
Northeast                .281    .281    .281     .327    .334    .384      .341    .348    .387
Midwest                  .230    .230    .230     .287    .291    .332      .293    .300    .333
West                     .264    .264    .264     .360    .362    .379      .366    .369    .413
General punitive         .062    .062    .062     .031    .031    .035      .033    .033    .036
Intercept                .102    .105    .119     .097    .097    .116      .101    .102    .117
NOTES: Cells are standard errors from Models 1-3 (M1-M3); -- indicates the variable is not included in that model. In the published article, the stable standard errors repeat identically across all three models.
The observed differences in standard-error stability between our published article and the
actual data are so startling that I searched for other articles to compare. I found several published
by other prominent scholars in top journals that, in design and analysis, are comparable to ours.
Specifically, they all have large samples, include a series of multilevel regression models
examining one outcome variable (stepwise, building from a baseline model), and report standard
errors to the third decimal place:
Hagan, Shedd, and Payne (2005), American Sociological Review
Kirk (2008), Demography
Kirk and Matsuda (2011), Criminology
Sampson, Morenoff, and Raudenbush (2005), American Journal of Public Health
Slocum, Taylor, Brick, and Esbensen (2010), Criminology
Xie, Lauritsen, and Heimer (2012), Criminology
I have included the relevant regression tables from all of these articles in Appendix B of this
report. None of the articles exhibits the degree of standard-error stability we report in Johnson et
al. (2011). Instead, they all are consistent with the actual data we have, and show that standard
errors normally vary across stepwise multilevel models, the main exception being standard errors
with two leading zeros (e.g., SE = .005).
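The statistical reason is straightforward: adding regressors to a model changes the residual variation and the collinearity structure, so the standard errors of variables already in the model should shift. A minimal simulation of a two-step model makes the point (simulated data, not from any of these studies; numpy and statsmodels assumed available):

    # Standard errors of retained regressors normally change across
    # stepwise models, especially when an added regressor is correlated
    # with one already included.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1000
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)  # correlated with x1
    y = (0.4 * x1 + 0.4 * x2 + rng.logistic(size=n) > 0).astype(int)

    m1 = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)
    m2 = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
    print(f"SE of x1 in Model 1: {m1.bse[1]:.3f}")
    print(f"SE of x1 in Model 2: {m2.bse[1]:.3f}")  # shifts once x2 enters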
Another issue with our data is item-level missing values. In the file Dr. Stewart sent me, all 500
respondents have complete data on every variable. In most surveys, a substantial number of
respondents have missing values on some of the variables (e.g., income). Closer inspection of the
data reveals that mean imputation was used to impute missing values for 208 respondents (or
42% of the sample). This is problematic because in our published article we never mention
imputation, much less inform readers that we used mean imputation specifically. Dropping cases
with imputed values results in findings that look even less like those reported in the published
article (e.g., the negative coefficient for Hispanic growth becomes larger and statistically
significant [b = -.202, p = .028], opposite the article's positive and significant coefficient).
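Mean imputation leaves a recognizable signature in otherwise "complete" data: the imputed cells sit exactly at the variable's mean, a value that is rarely attainable for integer-coded survey items, and filling missing cells with the observed mean leaves the column mean unchanged. A minimal detection sketch (the file name is hypothetical; pandas assumed available):

    # Flag values sitting exactly at the column mean -- the signature of
    # mean imputation in integer-coded survey data.
    import pandas as pd

    df = pd.read_stata("johnson_et_al_2011.dta")  # hypothetical file name
    for col in df.select_dtypes("number").columns:
        m = df[col].mean()
        if m != round(m):  # a non-integer mean cannot be a genuine response
            n_at_mean = int((df[col] == m).sum())
            if n_at_mean:
                print(f"{col}: {n_at_mean} cases exactly at the mean ({m:.4f})")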
CONCLUSION
There is only one possible conclusion from reanalyzing the data I have: the sample was
not just duplicated in the analysis for the published article; the data were also altered, whether
intentionally or unintentionally, and those alterations produced the article’s main findings. It may
be that appending the data I have and the data Dr. Stewart has for the second sample of 425
respondents would change this conclusion. Unfortunately, the data for the second survey are not
forthcoming; neither is an answer about who conducted that survey. Regardless, our published
article did not report a second survey, or a sample of 925; it reported one survey of 1,184
respondents by the Research Network.
Three other things are incontrovertible. First, we omitted important information that must be reported to journal referees and readers, like the use of imputation for item missing data. Second, we misreported data characteristics, like the number of counties: there were at least 326 counties, not 91. Third, if Dr. Stewart grouped counties into county clusters, then we consistently misreported our measurement of variables, analysis, and findings (emphasis added): "we matched respondents to the 91 counties where they resided to assess objective measures of aggregate threat characteristics as well as county-level controls" (p. 412); "model 1 of table 2 provides baseline estimates for the effects of county and demographic controls on the dependent variable" (p. 418); "we also estimated a series of interactions between Hispanic population growth and county-level controls" (p. 423); "both criminal and economic ethnic threat measures became stronger as the Hispanic growth rate increased in the county" (p. 428). It is also unclear how exactly Dr. Stewart grouped the counties. He has not provided the code for generating county clusters or any cluster-level data.[1]
My coauthors are working on an erratum, and Dr. Stewart’s reanalysis is ongoing.
However, given 1) the number and severity of the data discrepancies, 2) the lack of transparency
about the second sample, which was not reported in the published article, and 3) the misreporting
of data characteristics, methodology, and findings, I believe we should retract our study
altogether. I have asked the editors of Criminology to do so.
[1] There seems to be little reason to use county clusters in our study. It is not desirable to group together counties, because doing so "throws away geographic detail and creates meaningless socio-political entities" (Hagen et al., 2013: 770). Typically, researchers only group together counties in historical studies that examine data over many decades or across centuries, and even in those studies they only create county clusters for the specific counties whose boundaries changed during the period examined (e.g., King et al., 2009; Messner et al., 2005).
REFERENCES
Hagan, John, Carla Shedd, and Monique R. Payne. 2005. "Race, Ethnicity, and Youth Perceptions of Criminal Injustice." American Sociological Review 70:381-407.
Hagen, Ryan, Kinga Makovi, and Peter Bearman. 2013. "The Influence of Political Dynamics on Southern Lynch Mob Formation and Lethality." Social Forces 92:757-787.
Johnson, Brian D., Eric A. Stewart, Justin Pickett, and Marc Gertz. 2011. "Ethnic Threat and Social Control: Examining Public Support for Judicial Use of Ethnicity in Punishment." Criminology 49:401-441.
King, Ryan D., Steven F. Messner, and Robert D. Baller. 2009. "Contemporary Hate Crimes, Law Enforcement, and the Legacy of Racial Violence." American Sociological Review 74:291-315.
Kirk, David S. 2008. "The Neighborhood Context of Racial and Ethnic Disparities in Arrest." Demography 45:55-77.
Kirk, David S., and Mauri Matsuda. 2011. "Legal Cynicism, Collective Efficacy, and the Ecology of Arrest." Criminology 49:443-472.
Messner, Steven F., Robert D. Baller, and Matthew P. Zevenbergen. 2005. "The Legacy of Lynching and Southern Homicide." American Sociological Review 70:633-655.
Sampson, Robert J., Jeffrey D. Morenoff, and Stephen Raudenbush. 2005. "Social Anatomy of Racial and Ethnic Disparities in Violence." American Journal of Public Health 95:224-232.
Slocum, Lee A., Terrance J. Taylor, Bradley T. Brick, and Finn-Aage Esbensen. 2010. "Neighborhood Structural Characteristics, Individual-Level Attitudes, and Youths' Crime Reporting Intentions." Criminology 48:1063-1100.
Stewart, Eric A., Ramiro Martinez, Jr., Eric P. Baumer, and Marc Gertz. 2015. "The Social Context of Latino Threat and Punitive Latino Sentiment." Social Problems 62:69-92.
Xie, Min, Janet L. Lauritsen, and Karen Heimer. 2012. "Intimate Partner Violence in U.S. Metropolitan Areas: The Contextual Influences of Police and Social Services." Criminology 50:961-992.
APPENDIX A: “FILES AND CONCERNS” EMAIL
Brian, Eric and Marc,
I have spent the day going back through all of my records for our 2011 Criminology article. I
located files and emails that, without additional data, I have difficulty attributing to any benign
explanation.
Here is the background: the data for our 2011 paper were collected in early 2008, and the paper
was written in 2009. In fall 2009, after the analysis was finished and the paper was written, we
sent it out for feedback from colleagues. One of them suggested that we control for county
political climate. In response, Eric sent me a limited version of the data, so that I could add in
county voting percentages. I was a graduate student at the time. I have attached the limited data
Eric sent me (justin_voting_data.xls), as well as the data I sent back (Voting Data.xls). Three
things concern me.
The first, and the one that worries me the least, is that there are 1,000 respondents in the data, not
1,184 as we say in the published article. Eric sent me this data in December 2009, the same
month we sent the manuscript to Criminology. Therefore, the sample size should match what we
report in the article.
What is more concerning is that half of these 1,000 respondents are duplicates. That is, the data
include two exact copies of the same 500 respondents. If you open the data set and sort on
“case,” you can see that cases 501 to 1,000 are exact copies of cases 1 to 500. To make it easier
to see, I have included another excel sheet (Case_comparison.xls) with these cases placed next to
each other. This suggests someone may have copied and pasted the rows in the original data, and
then given them new case numbers (501 to 1,000).
Just as concerning, the 500 unique respondents in the data live in 256 counties. In our published
article, however, we say that the sample includes 1,184 respondents nested within 91 counties.
Obviously, that is a big difference. We are claiming in the article that there is an average of 13
respondents per county, when the data actually include fewer than two per county. I also looked
at other studies that have used data from the Research Network. Consistent with our data (but not
with our article), they have reported fewer than two respondents per county, on average (Pickett
et al., 2014; Stupi et al., 2014).
I initially thought this must be the wrong data, even though Eric sent it to me for our paper. Then
I looked at the mean (53.04) and standard deviation (13.02) for the variable I collected
(Rep_2004). They both are identical to those we report in the published paper. Thus, the
variable’s descriptives match the published statistics, even though the data are nothing like what
we claim, either in terms of the number of respondents or the number of counties.
This led me to dig back through all of the emails I have from Eric in my old FSU account to see
if I missed something. I found two emails that worry me. The first is his response when I
apparently asked him (on September 11, 2010) about the large number of counties. He said:
“To answer your question, I created an aggregate shape file that resulted in 91 usable counties.
Additionally, I also had to fix the errors from the data Jake and Gertz gave me for the counties.
The counties and codes were wrong. Once I got things clean, I was better able to use the
data. Anyway that is how I got the 91 counties.”
At the time, as a graduate student, I found this explanation convincing. Now, I do not, for several
reasons. First, a shapefile wouldn’t change the number of counties. Second, although the FIPS
codes (Full_FIPS) included in the data Eric sent were wrong (and missing for 365 respondents),
the county names (county_n) matched both the city names (city) and the core-based statistical
areas (cbsa_tit). When I collected the voting information, I used all of these variables to: 1)
double-check the counties, and then 2) distinguish counties with the same name in different
states (Adams, CO vs. Adams, NE, and Harrison, MS vs. Harrison, WV). Once I did this, the
total number of counties increased from 256 to 292. Now, maybe these county names were
wrong also, and Eric fixed them later that month (December 2009) before we submitted the
paper. If so, the percent Republican (mean, SD) would necessarily have changed when he fixed
them. You cannot go from 292 counties to 91 without some other numbers also changing.
In another email, Eric sent me the 2009 ASC presentation for our paper (attached). I opened it
and looked at the data and findings. In it we say we have a total of 868 Americans, not the 1,000
in the data, or the 1,184 reported in the published article. But many of the descriptive statistics,
regression coefficients, and standard errors are identical to those in the published article. I also
found a draft of the paper from a few months before we sent the manuscript to Criminology (also
attached). It also lists 868 respondents, not 1,184. Again, most of the descriptive statistics,
regression coefficients, and standard errors match those in the published article. I counted. There
are 293 statistics in the published article, and 258 (88%) are perfectly identical to those in the
unpublished draft, even though the article has 316 additional respondents. Where did these
respondents come from? Our published article doesn’t say anything about imputation, or adding
a later survey. Why didn’t the inclusion of 316 new respondents have a larger impact on the
statistics (means, standard deviations, regression coefficients, and standard errors)?
I remain hopeful that one of you will share a copy of the data with me that clears up these issues.
They appear quite serious to me. Without additional data and a convincing explanation, I will
have to retract my name from the article.
Best,
Justin
APPENDIX B: MULTILEVEL REGRESSION TABLES FROM
ARTICLES PUBLISHED BY OTHER AUTHORS