Correctional Offender Management Profiles for Alternative Sanctions (COMPAS)

Abstract

This chapter describes the technical design, validation, guiding values, and purposes of the COMPAS system. COMPAS is used at sequential stages of criminal justice, including pretrial and community corrections, probation, jail, prisons, and parole. Its goals include accurate risk assessment, comprehensive needs assessment, public safety, institutional safety, fairness and racial equity, and ease of use. For incarcerated offenders, it supports decisions regarding internal management, treatment, and referrals to community programs as alternatives to incarceration. Periodic re-validation, re-norming, and calibration studies are conducted. The selection of COMPAS risk and need factors is based on a scientific foundation of explanatory theories of crime, meta-analytic research, and current evidence on predictive validity. Many output reports are designed to guide staff in applying the Risk-Need-Responsivity (RNR) principles. A large body of research, including both internal validation studies and external studies by independent research teams in diverse states, supports the predictive validity and scale reliability of COMPAS. In addition to the adult core version of COMPAS, customized versions have been developed for youth, women, and long-term incarcerated prisoners to support internal management in jails and prisons, reentry planning, and post-release arrangements. Current developments involve advanced machine learning (ML) tools for risk prediction and a theory-guided internal classification (IC) to address specific responsivity among incarcerated prisoners.
Criminal Justice and Behavior (http://cjb.sagepub.com)
2009; 36; 21
DOI: 10.1177/0093854808326545
Tim Brennan, William Dieterich and Beate Ehret
Evaluating the Predictive Validity of the COMPAS Risk and Needs Assessment System
The online version of this article can be found at: http://cjb.sagepub.com/cgi/content/abstract/36/1/21
Published by: http://www.sagepublications.com
On behalf of: International Association for Correctional and Forensic Psychology
Downloaded from http://cjb.sagepub.com at DENVER UNIV on December 12, 2008
EVALUATING THE PREDICTIVE
VALIDITY OF THE COMPAS RISK AND
NEEDS ASSESSMENT SYSTEM
TIM BRENNAN
WILLIAM DIETERICH
BEATE EHRET
Northpointe Institute for Public Management Inc.
This study examines the statistical validation of a recently developed, fourth-generation (4G) risk–need assessment system
(Correctional Offender Management Profiling for Alternative Sanctions; COMPAS) that incorporates a range of theoretically
relevant criminogenic factors and key factors emerging from meta-analytic studies of recidivism. COMPAS’s automated scoring
provides decision support for correctional agencies for placement decisions, offender management, and treatment planning.
The article describes the basic features of COMPAS and then examines the predictive validity of the COMPAS risk scales by
fitting Cox proportional hazards models to recidivism outcomes in a sample of presentence investigation and probation intake
cases (N=2,328). Results indicate that the predictive validities for the COMPAS recidivism risk model, as assessed by the area
under the receiver operating characteristic curve (AUC), equal or exceed similar 4G instruments. The AUCs ranged from .66
to .80 for diverse offender subpopulations across three outcome criteria, with a majority of these exceeding .70.
Keywords: COMPAS; predictive validity; survival analysis; risk assessment; probation; area under the curve (AUC); criminogenic needs
In a recent review of the state of the art in correctional assessment, Andrews, Bonta, and
Wormith (2006) identified Correctional Offender Management Profiling for Alternative
Sanctions (COMPAS; Northpointe Institute for Public Management, 1996) as an example
of an emerging fourth-generation (4G) approach to correctional assessment. They also
noted that the available 4G approaches, with the exception of the Level of Service/Case
Management Inventory (LS/CMI; Andrews, Bonta, & Wormith, 2004), are relatively new
and that validation evidence is still required for these newer approaches. This article
assesses several key aspects of scale reliability and validity for the COMPAS system.
TRENDS IN CORRECTIONAL ASSESSMENT
The past three decades in correctional practice have seen a progression from first-generation
(1G) to currently emerging 4G assessment approaches (Andrews et al., 2006; Blanchette &
Brown, 2006; Bonta, 1996; Clements, 1996). These developments occurred as successive
generations of assessment and classification methods addressed the more obvious weaknesses
of prior phases. These phases and their main characteristics are described below.
The 1G approach relied on clinical and professional judgment in the absence of any
explicit or objective scoring rules. It dominated corrections for several decades and remains
preferred by many correctional decision makers (Boothby & Clements, 2000; Wormith,
2001). Its weaknesses include excessive subjectivity, inconsistency, bias and potential
stereotyping, legal vulnerability, and lower predictive validity than structured objective
methods (Brennan, 1987; Grove & Meehl, 1996; Hastie & Dawes, 2001).
CRIMINAL JUSTICE AND BEHAVIOR, Vol. 36 No. 1, January 2009, 21-40
DOI: 10.1177/0093854808326545
© 2009 International Association for Correctional and Forensic Psychology
Second-generation (2G) assessments adopted an empirical approach that mainly relied
on simple additive point scales, often with only a few standardized factors (e.g., Austin,
1983; S. D. Gottfredson, 1987; Hoffman, 1994). These mostly reflected Dawes’s (1979)
description of “improper” linear models (p. 571) because the selected factors and weightings
were often established by common sense or professional consensus rather than by statistical
methods. These methods primarily focused on risk prediction, brevity, and
efficiency. The main criticisms included lack of theoretical background, limited coverage
of risk and need factors, neglect of dynamic (changeable) risk factors, lack of treatment
implications, weak explanatory value, and questionable relevance for female offenders
(Blanchette & Brown, 2006; Jones, 1996). However, as noted by Dawes, these linear
models are often surprisingly effective in terms of predictive validity and generally outperform
professional judgment or the opinions of trained experts (Grove & Meehl, 1996;
Hastie & Dawes, 2001; Mossman, 1994).
Third-generation (3G) assessments of the late 1970s and 1980s introduced a more
explicit, empirically based, and theory-guided approach and a broader selection of criminogenic
factors. In addition, some of these factors were designed to be dynamically sensitive
to change. The Level of Service Inventory–Revised (LSI-R; Andrews & Bonta, 1995)
exemplified these trends and perhaps has become the most widely used risk and need
assessment in corrections. However, 3G methods, including the LSI-R, eventually were
criticized for a narrow theoretical focus (mainly social learning theory), a failure to address
gender sensitivity, a dominant focus on risk, and failure to assess offender strengths or protective
factors as emphasized in the “good lives” model (Andrews et al., 2006; Blanchette
& Brown, 2006; Bloom, 2000; Reisig, Holtfreter, & Morash, 2006; Ward & Stewart, 2003).
Regarding 4G assessments, Andrews et al. (2006) identified several instruments as representing
this category, including the Correctional Assessment and Intervention System
(National Council on Crime and Delinquency, 2006), LS/CMI, and COMPAS. Several general
features appear to characterize 4G approaches. These include (a) a broader selection of
explanatory theories, (b) a broader range of risk and need factors (content validity), (c) incorporation
of the strengths or resiliency perspective, (d) more advanced statistical modeling,
(e) seamless integration of the need or risk domain with the agency management information
system and criminal justice databases, and (f) Web-based implementation of assessment
technology. Such integration allows users to track offenders from intake to case
closure to support sequential case management monitoring, information feedback, and
decision making. COMPAS has incorporated all of these features; interested readers may
obtain full details in Brennan, Dieterich, and Oliver (2007).
The goals of this article are threefold. First, it describes the general design features and
technical overview of the COMPAS system. Second, it assesses the reliabilities of the
COMPAS scales for both male and female offenders. Third, it assesses the predictive validity
of the COMPAS scales for both males and females.
BASIC DESIGN FEATURES OF COMPAS
COMPAS is an automated decision-support software package that integrates risk and
needs assessment with several other domains, including sentencing decisions, treatment
and case management, and recidivism outcomes. Documentation of the full software functionality
is available at www.northpointeinc.com, and detailed information on the various
risk and need scales is provided in the appendix. Beyond the integration of separate databases,
the following design features of COMPAS further advance and support evidence-based
practice (EBP) in criminal justice agencies.
THEORY-GUIDED ASSESSMENT
Ideally, explanatory theories of criminality should guide the selection of scale content of
an assessment system. Criminologists have long lamented the lack of theory-guided assessments
(Bonta, 2002; Clements, 1996; Jones, 1996). Thus, 4G systems have a strong emphasis
on theory-guided assessment. In contrast to the LSI-R, which was designed primarily
around a social learning explanation (Andrews & Bonta, 1998), COMPAS broadens the
theoretical coverage to include key constructs from low self-control theory, strain theory or
social exclusion, social control theory (bonding), routine activities–opportunity theory, subcultural
or social learning theories, and a strengths or good lives perspective.
BROADBAND COMPREHENSIVE ASSESSMENT
Second, 4G approaches introduce a broader comprehensive coverage of criminogenic
factors to match the theoretical and explanatory complexity of criminal behavior and to
provide sufficient explanatory information to guide case interpretation and intervention
planning. Thus, COMPAS includes both theoretically relevant factors and the critical eight
criminogenic predictive factors that emerged from recent meta-analytic studies (Andrews
et al., 2006; Gendreau, Little, & Goggin, 1996; Lipsey & Derzon, 1998; Lösel, 1995). The
2G approaches generally reflected the opposite trend by minimizing and simplifying
assessment to reduce workload burden on staff, which, not unexpectedly, resulted in
extreme poverty of explanatory components and an almost total lack of treatment guidance
(Austin, 1983; Glaser, 1987; Palmer, 1992).
INTEGRATION OF THE STRENGTH OR RESILIENCY PERSPECTIVE
The strength-based or good lives approach (Andrews et al., 2006; Ward & Brown, 2004)
is a natural extension of the shift toward more comprehensive assessment. In their review,
Andrews et al. (2006) suggested that measures of strengths and well-being are “highly relevant”
(p. 23) for correctional assessments. To address this issue, COMPAS includes a
number of strength and protective factors that have shown empirical support for reducing
risk and for protecting offenders from the full impact of criminogenic needs. These
include job and educational skills, history of successful employment, adequate finances,
safe housing, family bonds, social and emotional support, noncriminal parents and friends,
and so on.
MORE ADVANCED STATISTICAL MODELS
4G assessments, in contrast to earlier approaches, are beginning to use more advanced
statistical methods for predictive modeling and classification. Although Burgess-type,
equally weighted, linear models have performed reasonably well (S. D. Gottfredson, 1987;
Mossman, 1994), powerful multivariate, model-averaging, mixed-model ensemble methods
and artificial intelligence are now entering correctional assessment approaches. For example,
the COMPAS risk and classification models use logistic regression, survival analysis, and
bootstrap classification methods in a broad repertoire of prediction and classification proce-
dures (Brennan, Breitenbach, & Dieterich, 2008). The present article specifically examines its
predictive models for recidivism based on survival analysis (Cox regression).
INTEGRATION WITH CRIMINAL JUSTICE DATABASES TO FACILITATE EBP
Another feature of 4G methods, including COMPAS, is the seamless integration of the
risk and needs domain with separate domains of sentencing decisions, institutional processing
and placement decisions, case management decisions, treatments given (type and
amount), and various outcomes (across time). This integration provides support for correctional
agencies in implementing EBP studies (see Andrews et al., 2006; Brennan, Wells, &
Alexander, 2004).
The COMPAS system includes two additional design features of some note: a treatment-explanatory
classification to support staff with specific responsivity decisions and gender-sensitive
calibration, described below.
TREATMENT-EXPLANATORY CLASSIFICATION TO ADDRESS SPECIFIC RESPONSIVITY
Andrews et al. (2006) suggested that specific responsivity of offenders is the least
explored of their risk–need–responsivity principles. Yet specific responsivity is a critical
and recurrent challenge for treatment providers in matching individual offenders to appropriate
treatment regimes (Brennan, 2008a; Meier, 2002; Millon & Davis, 1997; Warren &
Hindelang, 1979). COMPAS addresses specific responsivity and client–treatment matching
using two well-known approaches. First, it provides a person-centered assessment chart of
decile scores for each risk and need scale. Second, following the lineage of Marguerite
Warren, Ted Palmer, and others (also see Harris & Jones, 1999; Van Voorhis, 1994), COMPAS
provides a treatment-relevant typology that integrates risk and need. This explanatory
typology identifies and demarcates several specific pathways that may guide differential
targeting and programming for the diverse offender types that follow each particular pathway.
Although the present article does not address this treatment-relevant taxonomy,
detailed descriptions of these pattern-seeking methods are presented in Brennan et al.
(2008) and Brennan (2008b).
GENDER-SENSITIVE ASSESSMENT
A major criticism of 2G and 3G approaches is that they largely based their assessment
and classification methods on predominantly male samples and then mechanically applied
these to female offenders (Blanchette & Brown, 2006; Bloom, 2000; Brennan, 2008b; Farr,
2000; Hannah-Moffatt & Shaw, 2001). However, compelling arguments are now advanced
for systematic validation of instruments on separate female and male offender samples
(Hardyman & Van Voorhis, 2004; Holtfreter & Cupp, 2007). COMPAS addresses this issue,
first by using separate samples of males and females to develop gender-specific calibrations
of all risk and need factors and second by evaluating its predictive and classification models
on separate male and female samples (see below). Plans are also under way to incorporate
additional gender-specific factors from recent research on gender-sensitive risk and need
factors into COMPAS (Salisbury, Van Voorhis, & Wright, 2006).
METHOD
PARTICIPANTS
Participants were 2,328 individuals who were assessed with COMPAS as part of their
processing at entry into probation agencies. All individuals with complete data who were
administered the full COMPAS assessment were included. The sample represented about
15% of all COMPAS assessments conducted at these agencies during this period.
Individuals were excluded if they were missing data (25%) or if they were not administered
the full COMPAS (60%). Women composed 19% of the sample. The ethnic composition of
the sample was 76% White, 15% African American, 7% Latino, and 2% Other. The average
age of participants was 31.9 years (range =18.0 to 69.7). In the sample overall, 45% of the
presenting offenses were misdemeanors, 48% nonassaultive felonies, and 7% assaultive
felonies. The median number of prior arrests was 3 (range =0 to 57). Among the probation
cases, 9% were split-sentence cases (jail and probation).
ADMINISTRATION
The assessments were conducted by local probation officers between January 2001 and
December 2004 at 18 county-level probation agencies in an eastern state. Interviews were
conducted at the point of presentence investigation (PSI) or at probation intake (approximately
50% each). Staff and supervisors take a 2-day COMPAS training program that covers
relevant interview techniques, response categories, item meanings, and quality
assurance issues. Official criminal records are used to complete the current offense and
criminal history sections of COMPAS prior to the interviews. The interviews typically
require approximately 45 to 60 min, depending on the extent of probing.
MEASURES
COMPAS scales. This study examined the predictive validity of all the COMPAS base
scales (listed in Table 1) and also the main Recidivism Risk Scale. The Recidivism Risk
Scale is a regression model that has been used in COMPAS since 2000. This regression
model was trained to predict new offenses in a probation sample. The system transforms the linear
predictor from the regression model into a recidivism risk decile score by referencing an
appropriate COMPAS norm group. For the current
analyses, a gender-specific composite norm group was used. The composite norm group
(n =7,381) was constructed from COMPAS assessment data collected in prison (34%), jail
(14%), and probation (53%). The set of base scales included criminal involvement, history
of noncompliance, history of violence, current violence, criminal associates, substance
abuse, financial problems, vocational or educational problems, family criminality, social
environment, leisure, residential instability, social isolation, criminal attitudes, and criminal
personality.
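The transformation of a linear predictor into a decile score by reference to a norm group can be sketched as a percentile lookup. This is a minimal illustration and not Northpointe's scoring code. The function name `decile_score`, the "at or below" percentile rule, and the equal 10% bands are all assumptions. The `norm_group` argument stands for whichever norm group applies, such as the gender-specific composite norm group described above.

```python
from bisect import bisect_right
from typing import Sequence


def decile_score(linear_predictor: float, norm_group: Sequence[float]) -> int:
    """Map a raw linear predictor to a 1-10 decile against a norm group.

    Hypothetical rule (an assumption, not the published COMPAS procedure):
    find the share of norm-group cases scoring at or below this case and
    report which 10% band of the norm distribution that share falls in.
    """
    n = len(norm_group)
    if n == 0:
        raise ValueError("norm group must not be empty")
    ordered = sorted(norm_group)
    # Number of norm-group cases whose score is <= the raw score.
    rank = bisect_right(ordered, linear_predictor)
    # Integer ceiling of 10 * rank / n avoids floating-point edge effects.
    decile = (rank * 10 + n - 1) // n
    # A score below every norm case still receives the lowest decile (1).
    return max(1, min(10, decile))
```

For example, with a toy norm group of the values 1 to 100, a raw score of 50 maps to decile 5 and a raw score of 51 maps to decile 6. The actual norm-group distributions are not reported in this article.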
Dependent variables: Three outcomes for survival analyses. We matched COMPAS
assessment data with computerized official criminal history records and constructed multiple-record
survival data sets using assessment and event dates in the criminal history data.
These included crime dates, arrest dates, dispositions, disposition dates, sentence type, and
sentence length. The three outcomes selected as dependent variables for this study included
(a) an arrest for any offense, (b) an arrest for a person offense, and (c) an arrest for a felony
offense.
We defined an offense as a fingerprintable arrest involving a charge and filing for any
uniform crime reporting (UCR) code. A person offense is a fingerprintable arrest involving
a charge and filing for any UCR code for murder, voluntary manslaughter, forcible rape,
robbery, aggravated assault, simple assault, burglary (with weapon of an occupied
dwelling), dangerous weapons, sex offenses, extortion, arson, and kidnap. This category
includes misdemeanor and felony offenses.
ANALYSIS
We fitted separate cause-specific Cox proportional hazards models to each recidivism
outcome (Kalbfleisch & Prentice, 2002). Analysis time is the number of days from the COMPAS
assessment date to first failure or end of study, whichever occurred first. As mentioned,
the assessments were conducted between January 2001 and December 2004. The end of
study is the date of the recidivism outcomes computer match (March 3, 2006). We determined
the failure time point from the offense date associated with the recidivism outcome
of interest. Cases remained in the risk set and contributed information to the analyses until
the point of failure or end of study, whichever occurred first. The models controlled for
intermittent periods of incarceration during the follow-up by removing the case from the
risk set during these intermittent gaps. We also removed split-sentence cases from the risk
set during the time they were in jail, and we removed PSI cases that received a subsequent
jail or prison sentence from the risk set during the incarceration period. In the felony offense
model, 433 cases had at least one such gap, with an average gap length of 206 days (range =
1 to 1,440 days). The median time at risk in the felony offense model was 759 days (range =
1 to 1,722 days).
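Removing a case from the risk set during incarceration, as described above, corresponds to the counting-process (start–stop) data layout commonly used for Cox models. The sketch below shows one way to split a single case's follow-up into at-risk spells. The `Interval` type, the day-based inputs, and the (start, stop] convention are illustrative assumptions, not the authors' data-preparation code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    start: int    # at-risk spell begins just after this day (exclusive)
    stop: int     # at-risk spell ends on this day (inclusive)
    event: bool   # True if the first failure occurs on `stop`


def at_risk_intervals(end_of_study: int,
                      failure_day: Optional[int],
                      gaps: List[Tuple[int, int]]) -> List[Interval]:
    """Split one case's follow-up (days since assessment) into at-risk spells.

    `gaps` are hypothetical (start, end) incarceration periods during which
    the case is removed from the risk set. Follow-up stops at the first
    failure or at the end of study, whichever comes first.
    """
    stop_day = failure_day if failure_day is not None else end_of_study
    spells: List[Interval] = []
    cursor = 0
    for gap_start, gap_end in sorted(gaps):
        if gap_start >= stop_day:
            break  # gaps after the end of follow-up do not matter
        if gap_start > cursor:
            spells.append(Interval(cursor, gap_start, False))
        cursor = max(cursor, gap_end)
    if cursor >= stop_day:
        if failure_day is not None:
            raise ValueError("failure recorded while the case was not at risk")
        return spells  # censored while incarcerated
    spells.append(Interval(cursor, stop_day, failure_day is not None))
    return spells


def time_at_risk(spells: List[Interval]) -> int:
    """Total days in the risk set, excluding incarceration gaps."""
    return sum(s.stop - s.start for s in spells)
```

Consider a case with a jail gap from day 10 to day 20 and a first arrest on day 50. It contributes two spells, (0, 10] and (20, 50], and only the second ends in an event. That gives 40 days at risk.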
TABLE 1: Alpha Coefficients and Their Differences Between Women and Men, With Pointwise 95%
Confidence Intervals (CIs)
                           α                            Difference  95% CI
Scale                      Total    Women    Men       (W − M)     Lower    Upper
Criminal Involvement       .87      .85      .87       –.02        –.04     .01
History of Noncompliance   .68      .62      .67       –.04        –.11     .02
History of Violence        .73      .70      .73       –.03        –.07     .02
Current Violence           .59      .62      .59       .03         –.03     .09
Criminal Associates        .68      .68      .68       .00         –.05     .05
Substance Abuse            .79      .81      .78       .03         .00      .06
Financial Problems         .73      .75      .72       .03         –.02     .07
Vocational or Educational  .71      .73      .71       .02         –.02     .06
Family Criminality         .63      .64      .63       .02         –.04     .07
Social Environment         .74      .70      .74       –.04        –.09     .01
Leisure                    .82      .81      .83       –.02        –.05     .01
Residential Instability    .65      .61      .65       –.05        –.11     .01
Social Isolation           .81      .82      .81       .01         –.02     .04
Criminal Attitudes         .82      .82      .82       .01         –.02     .03
Criminal Personality       .76      .76      .76       .00         –.03     .04
Note. The difference in alphas is significant if the 95% confidence interval does not include zero.
First, we fitted a series of univariable Cox survival models in which the hazard for each
of the three recidivism outcomes was regressed on each COMPAS base scale. These
models were fitted in three partitions of the sample: the full sample, men only, and women
only. Next, two multivariate models were fitted to each recidivism outcome in each partition.
Model I included all the COMPAS base scales. Model II included all the COMPAS
base scales plus age at first arrest. Finally, a model that included only the Recidivism Risk
Scale was fitted to each recidivism outcome in each partition.
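The univariable models above regress the hazard on one base scale at a time. To show what such a fit computes, the sketch below maximizes the Cox partial likelihood for a single covariate by Newton–Raphson. It uses the start–stop risk-set convention and Breslow-style handling of tied failure times. This is a teaching sketch under those assumptions. The function names are hypothetical, and the article does not say which software or tie-handling method the authors used.

```python
import math
from typing import List, Tuple

# (start, stop, event, x): an at-risk spell (start, stop] plus the covariate.
Record = Tuple[float, float, bool, float]


def cox_score_info(beta: float, records: List[Record]) -> Tuple[float, float]:
    """Score and information of the Breslow partial log-likelihood at beta."""
    event_times = sorted({stop for _, stop, event, _ in records if event})
    score = info = 0.0
    for t in event_times:
        # Weighted sums over the risk set: spells with start < t <= stop.
        s0 = s1 = s2 = 0.0
        for start, stop, _, x in records:
            if start < t <= stop:
                w = math.exp(beta * x)
                s0 += w
                s1 += w * x
                s2 += w * x * x
        mean = s1 / s0
        var = s2 / s0 - mean * mean
        # Each failure at t contributes x minus the weighted risk-set mean.
        for _, stop, event, x in records:
            if event and stop == t:
                score += x - mean
                info += var
    return score, info


def fit_cox_univariable(records: List[Record], tol: float = 1e-10,
                        max_iter: int = 50) -> Tuple[float, float]:
    """Return (coefficient, standard error) via Newton-Raphson from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, records)
        if info <= 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    else:
        raise RuntimeError("Newton-Raphson did not converge")
    _, info = cox_score_info(beta, records)
    return beta, 1.0 / math.sqrt(info)
```

The value `math.exp(beta)` is the hazard ratio for a one-point increase in the covariate. The exp(coeff) columns of Tables 3 and 4 are read this way.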
To gauge the predictive utility of the above two COMPAS multivariate base scale models
and the Recidivism Risk Scale model, we estimated the area under the receiver operating
characteristic curve (AUC) for all three models. For survival models, the most relevant
measure of predictive discrimination is the concordance index, which generalizes the AUC
to censored survival data (for a binary outcome without censoring, it equals the AUC). It is
defined as the probability that the predictor values and survival times for a pair of randomly
selected cases are concordant. A pair is concordant if the case with the higher predictor value
has the shorter survival time. The index is computed over all pairs of nonmissing observations
whose survival times can be ordered (that is, pairs in which the shorter time is an observed
failure) and equals the proportion of these usable pairs that are concordant
(Harrell, Califf, Pryor, Lee, & Rosati, 1982).
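The pair-counting definition of the concordance index translates directly into code. The sketch below follows the description above with two simplifying assumptions: it skips pairs with tied survival times, and it scores ties in the predictor as half-concordant. The function name and arguments are illustrative. In this study's setting, `risk` would be a model's predicted risk and `times` and `events` the follow-up data.

```python
from itertools import combinations
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index: higher risk should mean earlier failure."""
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # Let `a` be the case with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0   # higher predictor, shorter survival
        elif risk[a] == risk[b]:
            concordant += 0.5   # tied predictor values count half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of .5 corresponds to chance ordering, and 1.0 means that every usable pair is ordered correctly.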
RESULTS
RELIABILITY
Table 1 provides reliability coefficients (Cronbach’s α) to indicate the internal consistency
of the core COMPAS scales both for the total sample and by gender. Alpha is the
most widely used measure of internal consistency of summative scales. By convention,
alpha coefficients of .70 or higher indicate satisfactory reliability. The table indicates that a
large majority of these alphas are in the satisfactory range of close to or above .70 with only
a few exceptions (current violence, family criminality, and residential instability), and these
are close to an acceptable range. The alpha coefficients did not differ significantly between
women and men: every 95% confidence interval for the difference includes zero.
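Cronbach's α compares the summed item variances with the variance of the total scale score. For a scale with $k$ items, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$, where $\sigma_X^2$ is the variance of the total score. Below is a minimal sketch using sample variances. The input layout, with one list of responses per item, is an assumption made for illustration.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(items: List[Sequence[float]]) -> float:
    """Cronbach's alpha; items[k][i] is person i's response to item k."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total scale score for each person (sum across items).
    totals = [sum(responses) for responses in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))
```

Identical items yield α = 1. Items that covary weakly push the ratio of summed item variance to total variance toward 1, so α falls toward 0.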
PREDICTIVE VALIDITY
Table 2 provides a summary of the survival experience of men and women for the three
outcomes examined in the study. For each offense type, the table shows the number of failures
that occurred during each year and the estimated survivor function at the end of each
year. The survivor function S(t) is the probability of surviving (remaining free of the outcome)
beyond time t, so it can be read as the cumulative proportion surviving over time. Note that
only the first 4 years of the follow-up are shown.
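The Kaplan–Meier product-limit estimator produces survivor-function estimates of the kind shown in Table 2. The article does not state which estimator produced Table 2, so treat the sketch below as an illustration under that assumption. It handles ordinary right censoring only. Unlike the study's models, it does not remove cases from the risk set during incarceration gaps.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t) at each distinct failure time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = leaving = 0
        # Group every case whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            failures += int(data[i][1])
            leaving += 1
            i += 1
        if failures:
            # Censored cases tied at t are still counted as at risk at t.
            surv *= 1.0 - failures / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve
```

Each step multiplies the running survival by the conditional probability of surviving that failure time. This is why S(t) can only stay level or fall across the rows of Table 2.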
Table 3 shows the results from univariable Cox regressions of the hazard for a new
felony offense on each COMPAS base scale. The results indicate that all except three of
these base scales are statistically significant predictors of felony recidivism. The scales that did
not attain significance were current violence, financial problems, and residential instability.
Note that substance abuse has a negative effect on felony recidivism. This may have
resulted from the selected outcome variable (felony recidivism) for this analysis. Drug
offenders (overall) may be at lower risk for new arrests for serious felony offenses.
Table 4 shows the results of the multivariate Cox regression of the hazard of a felony
offense on the COMPAS base scales. In this model, which combines all COMPAS base
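scales, the significant predictors of felony recidivism are history of noncompliance, criminal
associates, substance abuse, financial problems, vocational or educational, and high
crime (social) environment. As often occurs when multiple predictor variables are used and
when some are correlated, the signs of certain parameters can take unexpected directions;
for example, financial problems has a small, nonsignificant positive coefficient in the
univariable model (Table 3) but a significant negative coefficient in this multivariable model.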
Table 5 shows the results of the Cox regression of the hazard for a new felony offense
on the levels of the Recidivism Risk Scale. The hazard of a new felony offense for cases
that score high on the Recidivism Risk Scale is 5.66 times the hazard of cases that score
low (p < .001). The hazard of the high-risk group relative to the medium-risk group
is 1.84 (p < .001).
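The hazard ratios in the preceding paragraph follow from the Table 5 coefficients. Exponentiating a coefficient gives the hazard ratio against the low-risk reference category. The high-versus-medium ratio is the exponentiated difference of the two coefficients, which equals the ratio of the two hazard ratios. The check below uses the rounded coefficients, so it reproduces the published values only to about two decimal places. The `wald_ci` helper shows the usual log-scale Wald interval, $\exp(\beta \pm 1.96\,\mathrm{SE}(\beta))$, applied for illustration to the history of noncompliance row of Table 3. It is not a recomputation of any published interval.

```python
import math
from typing import Tuple

# Coefficients from Table 5 (reference category: low risk).
coef = {"medium": 1.12, "high": 1.73}

hazard_ratio = {level: math.exp(b) for level, b in coef.items()}
# High vs. medium: exp(b_high - b_medium) equals HR_high / HR_medium.
hr_high_vs_medium = math.exp(coef["high"] - coef["medium"])


def wald_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """95% Wald interval for a hazard ratio, built on the log-hazard scale."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


# History of noncompliance, univariable model (Table 3): coeff 0.148, SE 0.020.
low, high = wald_ci(0.148, 0.020)
print(f"medium {hazard_ratio['medium']:.2f}, high {hazard_ratio['high']:.2f}, "
      f"high vs. medium {hr_high_vs_medium:.2f}, CI {low:.2f} to {high:.2f}")
```

The high-versus-medium ratio comes out at 1.84. The two ratios against the low-risk group land within about 0.02 of the published 3.07 and 5.66 because the coefficients are rounded.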
TABLE 2: Number Failing Each Year and Survivor Function Through the End of Each Year for Any
Offense, Offenses Against Persons, and Felony Offenses for Men and Women
                  Women (n = 449)                Men (n = 1,879)
Offense Type      Number Failing   Survivor      Number Failing   Survivor
                  Each Year        Function      Each Year        Function
Any
  1st year        64               .85           308              .81
  2nd year        32               .76           180              .69
  3rd year        14               .70           63               .62
  4th year        10               .58           41               .52
Person
  1st year        11               .97           114              .93
  2nd year        10               .95           90               .87
  3rd year        5                .92           33               .83
  4th year        2                .89           23               .78
Felony
  1st year        16               .96           134              .92
  2nd year        12               .93           95               .85
  3rd year        7                .89           31               .82
  4th year        1                .88           20               .77
Note. The description of the survival experience is limited to the first 4 years of the follow-up.
TABLE 3: Results From Univariable Cox Proportional Hazards Models Regressing the Hazard for a
Felony Offense on Each COMPAS Base Scale
                             coeff    exp(coeff)  SE(coeff)  p Value
Criminal involvement         0.033    1.03        0.013      .008
History of noncompliance     0.148    1.16        0.020      .000
History of violence          0.108    1.11        0.018      .000
Current violence             0.101    1.11        0.052      .052
Criminal associates          0.148    1.16        0.022      .000
Substance abuse              –0.091   0.91        0.023      .000
Financial problems           0.009    1.01        0.024      .691
Vocational or educational    0.118    1.13        0.014      .000
Criminal attitudes           0.057    1.06        0.009      .000
Family criminality           0.100    1.11        0.036      .006
Social environment           0.257    1.29        0.032      .000
Leisure                      0.073    1.08        0.016      .000
Residential instability      0.016    1.02        0.014      .277
Criminal personality         0.048    1.05        0.007      .000
Social isolation             0.029    1.03        0.010      .003
Figure 1 shows a plot of the Nelson–Aalen (Aalen, 1978; Nelson, 1972) estimator of the
cumulative hazard function $\hat{H}(t)$ within levels of the Recidivism Risk Scale. The estimator
is defined as
$$\hat{H}(t) = \sum_{j:\, t_j \le t} \frac{d_j}{n_j}, \qquad (1)$$
where $n_j$ stands for the number of cases in the risk set just before time $t_j$ and $d_j$ is the
number of failures at time $t_j$. The cumulative hazard is the expected number of failures for
an individual as a function of time, if failure were a repeatable process. Figure 1 also shows
the size of the risk set ($n_j$) at 180-day intervals in each level of the Recidivism Risk Scale.
Although the maximum follow-up time in the data is 1,887 days, the time axis in the plot
is truncated to 1,620 days because the risk set is small and the estimates less precise beyond
this point.
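The Nelson–Aalen estimator defined above translates directly into code. The sketch below accepts the same (start, stop] at-risk spells used for the Cox models. As a result, $n_j$ counts only cases that are actually in the risk set just before $t_j$, so cases in a jail gap are excluded. The names and input layout are illustrative assumptions. The pointwise confidence bands plotted in Figure 1 are not computed here.

```python
from typing import List, Sequence, Tuple

# (start, stop, event): an at-risk spell (start, stop] ending in failure or not.
Spell = Tuple[float, float, bool]


def nelson_aalen(spells: Sequence[Spell]) -> List[Tuple[float, float]]:
    """Cumulative hazard H(t): sum over failure times t_j <= t of d_j / n_j."""
    failure_times = sorted({stop for _, stop, event in spells if event})
    cumulative = 0.0
    curve: List[Tuple[float, float]] = []
    for t in failure_times:
        # n_j: spells in the risk set just before t_j (start < t_j <= stop).
        n_j = sum(1 for start, stop, _ in spells if start < t <= stop)
        # d_j: failures observed at t_j.
        d_j = sum(1 for _, stop, event in spells if event and stop == t)
        cumulative += d_j / n_j
        curve.append((t, cumulative))
    return curve
```

Applying the function separately to the low-, medium-, and high-risk groups would produce three step curves analogous to those in Figure 1.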
TABLE 4: Results From Multivariable Cox Proportional Hazards Model Regressing the Hazard for a
Felony Offense on the COMPAS Base Scales
                             coeff    exp(coeff)  SE(coeff)  p Value
Criminal involvement         0.003    1.00        0.018      .854
History of noncompliance     0.124    1.13        0.032      .000
History of violence          0.026    1.03        0.024      .276
Current violence             0.001    1.00        0.057      .982
Criminal associates          0.072    1.07        0.028      .009
Substance abuse              –0.134   0.87        0.027      .000
Financial problems           –0.059   0.94        0.026      .025
Vocational or educational    0.081    1.08        0.017      .000
Criminal attitudes           0.014    1.01        0.011      .212
Family criminality           –0.001   1.00        0.040      .978
Social environment           0.117    1.12        0.037      .002
Leisure                      0.015    1.01        0.021      .476
Residential instability      –0.002   1.00        0.015      .880
Criminal personality         0.007    1.01        0.011      .555
Social isolation             –0.003   1.00        0.012      .771
TABLE 5: Results From Cox Proportional Hazards Model Regressing the Hazard for a Felony Offense on
the Recidivism Risk Scale
               coeff   exp(coeff)  SE(coeff)  p Value   95% Confidence Interval
Medium risk    1.12    3.07        0.473      <.001     2.27, 4.15
High risk      1.73    5.66        0.843      <.001     4.22, 7.58
Note. The reference category is the low risk level of the Recidivism Risk Scale.
Finally, we assessed the discriminatory power of the three models for predicting the
three criteria of interest (general offenses, offenses against persons, and felony offenses) in
each of the three partitions of the sample. As described previously, Model I includes all
COMPAS base scales, Model II adds age at first arrest to these COMPAS base scales, and
Model III represents the COMPAS Recidivism Risk Scale. Again, because we are fitting
survival models, we estimate Harrell’s concordance index (Harrell et al., 1982) and interpret
it as the AUC. A rule of thumb according to several recent articles is that AUCs of .70
or above typically indicate satisfactory predictive accuracy, and measures between .60 and
.70 suggest low to moderate predictive accuracy (Aos & Barnoski, 2003; Jones, 1996;
Quinsey, Harris, Rice, & Cormier, 1998).
Table 6 presents the model-specific AUCs for all of these analyses. The AUC values
range from .66 to .80, with a majority being above .70, which suggests satisfactory predictive
validities of these COMPAS risk models for all three recidivism outcomes. The AUCs
for person and felony offenses are a little higher than those for the broader and less precisely
defined recidivism category “any offense” (which includes both misdemeanor and
felony offenses). The AUCs for predicting person offenses range from .71 to .80, with high
AUC values for women (.76 to .80). However, for women, although the sample size was
449, there were only 29 women with a person offense, which reduces statistical power and
suggests caution regarding this result.
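The AUCs discussed here summarize how well each model separates recidivists from nonrecidivists over a follow-up period that varies from person to person. For right-censored time-to-event data, a standard generalization of the AUC is Harrell's concordance index (Harrell, Califf, Pryor, Lee, and Rosati, 1982, appears in the reference list); this section does not state exactly which estimator produced Table 6, so the sketch below is illustrative only. It assumes higher risk scores should go with earlier failures and, for brevity, skips pairs with tied follow-up times; the follow-up data are hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored follow-up data.

    A pair (i, j) is usable when person i has an observed event and person j
    is still event-free at that time (times[j] > times[i]). The pair is
    concordant when i also has the higher risk score; tied scores count 1/2.
    Pairs with tied times are skipped to keep the sketch short.
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored person can only be the later member of a pair
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical follow-up (days), felony indicator, and risk scores for six people.
days = [120, 400, 365, 900, 730, 1000]
felony = [1, 1, 0, 1, 0, 0]
score = [9, 6, 7, 8, 3, 2]
print(f"C = {harrell_c(days, felony, score):.3f}")
```

Because censored people only ever appear as the later member of a pair, the index uses their partial follow-up without treating them as confirmed nonrecidivists.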
[Figure 1: Nelson-Aalen Plot of the Cumulative Hazard for a Felony Offense in Each Level of the Recidivism Risk Scale, With Pointwise 95% Confidence Bands. Y-axis: cumulative hazard (0 to .6); x-axis: days to first felony offense (0 to 1,620); separate curves for low, medium, and high risk, with numbers at risk shown below the plot.]
Note. CI = confidence interval.
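Figure 1 plots Nelson-Aalen estimates (Aalen, 1978, is in the reference list), which accumulate, at each observed event time t, the number of events d_t divided by the number still at risk n_t: H(t) = Σ d_t / n_t. A minimal sketch for one risk level follows. The pointwise bands use the untransformed normal approximation with variance Σ d_t / n_t², which is one common choice and an assumption here, since the section does not say how the bands in Figure 1 were built; the data are toy values.

```python
import math
from collections import Counter


def nelson_aalen(times, events, z_crit=1.96):
    """Nelson-Aalen cumulative hazard with pointwise normal-approximation bands.

    times  -- follow-up time per person (e.g., days to first felony or to censoring)
    events -- 1 if the outcome occurred at that time, 0 if the person was censored
    Returns (t, H(t), lower, upper) at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    H, var, curve = 0.0, 0.0, []
    for t in sorted(event_counts):
        n_at_risk = sum(1 for s in times if s >= t)  # still under observation just before t
        d = event_counts[t]
        H += d / n_at_risk                           # hazard increment d_t / n_t
        var += d / n_at_risk ** 2                    # variance increment d_t / n_t^2
        half_width = z_crit * math.sqrt(var)
        curve.append((t, H, max(0.0, H - half_width), H + half_width))  # lower band clipped at 0
    return curve


# Toy data for one risk level: five people, three observed felonies.
for t, h, lo, hi in nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(f"t={t}: H={h:.2f} band=({lo:.2f}, {hi:.2f})")
```

Calling the function separately on the low-, medium-, and high-risk groups yields three curves of the kind shown in the figure.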
TABLE 6: Area Under the Curve Values for COMPAS Risk Models Predicting Any Offense, Offenses Against Persons, and Felony Offenses

                      Total Sample (N = 2,328)   Women (n = 449)          Men (n = 1,879)
Model                 Any   Person   Felony      Any   Person   Felony    Any   Person   Felony
COMPAS I              .66   .72      .70         .69   .78      .68       .67   .71      .71
COMPAS II             .68   .73      .72         .72   .80      .69       .68   .72      .73
Recidivism Risk III   .68   .71      .70         .65   .76      .66       .68   .70      .71

Note. COMPAS Model I includes COMPAS base scales; COMPAS II adds age at first arrest to Model I.
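The summary figures quoted for Table 6 (overall range, number of cells at or above .70, range for person offenses) can be recomputed from the table itself. A short check over values transcribed by hand:

```python
# AUCs from Table 6, one list per model; columns follow the table order:
# total (any, person, felony), women (any, person, felony), men (any, person, felony).
TABLE6 = {
    "COMPAS I":            [.66, .72, .70, .69, .78, .68, .67, .71, .71],
    "COMPAS II":           [.68, .73, .72, .72, .80, .69, .68, .72, .73],
    "Recidivism Risk III": [.68, .71, .70, .65, .76, .66, .68, .70, .71],
}
PERSON_COLS = (1, 4, 7)  # person-offense columns for total, women, men

cells = [auc for row in TABLE6.values() for auc in row]
person = [row[i] for row in TABLE6.values() for i in PERSON_COLS]

print("cells:", len(cells))
print("range:", min(cells), "to", max(cells))
print("cells at or above .70:", sum(auc >= 0.70 for auc in cells))
print("person-offense range:", min(person), "to", max(person))
```

Run on the printed table, this gives 27 cells ranging from .65 to .80, 16 of them at or above .70, and person-offense AUCs from .70 to .80.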
Last, we examined the predictive accuracy of each of the models for African American
men and White men. We do not report results for African American women because the
effective sample sizes for most of the outcomes were too small to calculate separate AUCs
for that group. Table 7 presents the results from each model for the outcomes any arrest,
person offense arrest, and felony arrest for African American and White men. The AUCs
for African American men range from .64 to .73. As was the case in the full sample, the
highest AUCs are obtained for the felony offense and person offense arrest outcomes. The
AUC results for White men are quite similar to the results for African American men,
except that White men have somewhat higher AUCs on the COMPAS base scale models.
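Because Table 7 reports only rounded AUCs without standard errors, the group comparison in this paragraph can be tabulated but not formally tested. A quick transcription of the table's values, computing White minus African American differences for each model and outcome:

```python
# AUCs from Table 7 (men only), in the order any, person, felony.
WHITE = {
    "COMPAS I": (.69, .74, .73),
    "COMPAS II": (.71, .75, .75),
    "Recidivism Risk III": (.69, .71, .71),
}
AFRICAN_AMERICAN = {
    "COMPAS I": (.64, .69, .69),
    "COMPAS II": (.66, .71, .72),
    "Recidivism Risk III": (.67, .72, .73),
}
OUTCOMES = ("any", "person", "felony")


def gaps(model):
    """White minus African American AUC per outcome, rounded to table precision."""
    return [round(w - b, 2) for w, b in zip(WHITE[model], AFRICAN_AMERICAN[model])]


for model in WHITE:
    print(model, dict(zip(OUTCOMES, gaps(model))))
```

The base-scale models (COMPAS I and II) favor White men by .03 to .05, whereas the Recidivism Risk III model differs by at most .02 in either direction.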
DISCUSSION AND CONCLUSIONS
This study examined the reliability (internal consistency) and predictive validity of the COMPAS risk and needs scales on a large sample of PSI and probation cases. A first general conclusion is that a majority of these scales reach levels of internal consistency and predictive validity that are within generally acceptable ranges. Second, the separate univariable analyses show that a majority of the specific COMPAS risk and need base scales were significantly associated with felony recidivism (Table 3). Third, results from survival analyses demonstrated that the predictive power of the three models tested was comparable to, and in some cases higher than, that of similar risk prediction instruments in this field. We now examine the specific findings in more detail.
Regarding internal consistency, most of the scales have alpha coefficients equal to or greater than .70; the three exceptions were all close to acceptable levels. These satisfactory results were replicated in both the male and female subsamples. We found no significant differences in alpha levels between the male and female subsamples, suggesting that the scales are equally reliable for men and women.
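The alpha coefficients discussed here follow the standard formula α = k/(k − 1) × (1 − Σ s²_item / s²_total), where k is the number of items, s²_item the variance of each item, and s²_total the variance of the summed scale score. Since the article does not provide item-level data, the minimal sketch below uses made-up scores and the Python standard library.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for respondents x items scores (list of equal-length lists)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))                       # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical 4-item scale answered by five respondents (not COMPAS data).
scores = [
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
    [2, 1, 2, 2],
    [5, 4, 4, 5],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f} ({'meets' if alpha >= 0.70 else 'below'} the .70 benchmark)")
```

Sample and population variances give the same alpha, because the n − 1 factors cancel in the ratio.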
Regarding predictive validity, we must first place the present results in the context of recent research on the accuracy of offender recidivism prediction. An important point is that the AUC has become the preferred measure of accuracy largely because it is independent of base rates and selection ratios, which allows clearer comparisons across different predictive instruments and studies (Flores, Lowenkamp, Smith, & Latessa, 2006; Quinsey et al., 1998). Another contextual issue concerns the interpretation of AUC levels. As noted previously, AUCs in the .50s are considered to indicate little or no predictive accuracy, those in the .60s are considered weak, those approaching or above .70 are moderate, and those in the .80s are strong (Tape, 2003). However, various authors appear to use
TABLE 7: Area Under the Curve Values for COMPAS Risk Models Predicting Any Offense, Offenses Against Persons, and Felony Offenses for White and African American Men

                      White Men (n = 1,412)      African American Men (n = 296)
Model                 Any   Person   Felony      Any   Person   Felony
COMPAS I              .69   .74      .73         .64   .69      .69
COMPAS II             .71   .75      .75         .66   .71      .72
Recidivism Risk III   .69   .71      .71         .67   .72      .73
different standards. For example, Flores et al. (2006) described their achieved AUC of .689 for the LSI-R as valid and robust and as moderate to large. Similarly, Kroner, Stadtland, Eidt, and Nedopil (2007) described an AUC of .703 as representing high predictive accuracy in a study of the Violence Risk Appraisal Guide (VRAG). The relatively recent adoption of the AUC in offender risk assessment may explain these variations in evaluative statements. Finally, as more studies in this area report AUCs, it appears that the accuracy levels achieved by most current instruments, across a variety of samples and outcome variables, generally fall in the range of .65 to .75, with only a few exceptions (see below).
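The base-rate independence of the AUC follows from its definition as a pairwise probability: the chance that a randomly chosen recidivist receives a higher risk score than a randomly chosen nonrecidivist. The minimal sketch below uses hypothetical scores to show that replicating the nonrecidivist group, which lowers the base rate, leaves the AUC unchanged; the labeling function simply encodes the verbal bands attributed to Tape (2003) above, with a hard .70 cutoff as a simplification.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random recidivist outscores a random nonrecidivist; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def verbal_band(a):
    """Verbal labels as summarized in the text; the text's bands stop at the .80s."""
    if a >= 0.80:
        return "strong"
    if a >= 0.70:
        return "moderate"
    if a >= 0.60:
        return "weak"
    return "little or none"


# Hypothetical risk scores (not COMPAS data).
recidivists = [7, 5, 9, 6]
nonrecidivists = [3, 5, 2, 6, 4]
a1 = auc(recidivists, nonrecidivists)
a2 = auc(recidivists, nonrecidivists * 3)  # triple the nonrecidivists: base rate falls, AUC does not
print(a1, a2, verbal_band(a1))
```

The pairwise loop is quadratic in sample size, which is fine for illustration; rank-based formulas give the same value more efficiently on large samples.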
The present study of COMPAS models produced AUCs mostly in the range of .70 to .80. Specifically, 16 of the 27 cells in Table 6 reached .70 or above, and the remaining 11 fell in the .65 to .69 range. Note in particular the predictive accuracy of COMPAS for person offenses, for which all 9 cells (total sample, men, and women, each under three models) had AUCs between .70 and .80. Thus, we may conclude that COMPAS predictive accuracies are similar to or slightly higher than AUCs obtained by other major instruments in this field (e.g., Barbaree, Seto, Langton, & Peacock, 2001; Barnoski & Aos, 2003; Dahle, 2006; Flores et al., 2006; Grann, Belfrage, & Tengstrom, 2000; Quinsey et al., 1998). Furthermore, the present findings are in the same range as those of the initial validation studies of the COMPAS recidivism risk model for probationers, which produced AUCs of .72 and .74 over a 24-month outcome period (Brennan, Dieterich, & Oliver, 2004).
The AUCs of the other main instruments often used for offender risk prediction may further help to contextualize the above findings. Perhaps the best known instruments are the VRAG (Quinsey et al., 1998), the LSI-R (Andrews & Bonta, 1995), and the Psychopathy Checklist–Revised (PCL-R; Hare, 1991). The AUC values for these instruments in recent studies are quite varied according to the specific populations, outcome periods, and dependent variables used in specific studies, as illustrated below.
VRAG
Quinsey et al. (1998) found an AUC of .76 in a large-scale, multiyear recidivism study.
Barbaree et al. (2001) reported AUCs of .69 in predicting serious reoffending and .77 when
predicting any reoffense for sex offenders. Kroner et al. (2007) obtained an AUC of .703 in
a study of reoffending among mentally ill offenders.
LSI-R
The recent review by Andrews et al. (2006) did not provide AUCs for the LSI-R.
However, Barnoski and Aos (2003) found AUCs of .64 to .66 for the LSI-R in predicting
felony and violent recidivism among Washington State prisoners. Flores et al. (2006) found
an AUC of .689 using the LSI-R to predict reincarceration among federal probationers.
Dahle (2006) reported an AUC of .65 using the LSI-R to predict violent recidivism.
Barnoski (2006) reported an AUC of .65 using the LSI-R to predict felony sex recidivism.
PCL-R
AUC levels again varied across studies. For example, in a Swedish study of mentally ill violent offenders, Grann et al. (2000) found AUC levels of .64 to .75 based on various follow-up time frames. Barbaree et al. (2001) reported AUCs of .61, .65, and .71 for the PCL-R in predicting various recidivism outcomes among sex offenders.
The above findings clearly do not exhaust the full range of studies in this area. As more
studies report AUCs for specific instruments, varying populations, outcome variables, and
time frames, it may become possible to identify which instruments perform well in these
varying conditions.
The present study has several strengths and limitations. Strengths include the large sample (N = 2,328), a multiyear outcome period, and an examination of AUCs for different COMPAS models, different offense categories, and both genders. The survival analyses also show not only that the models discriminated significantly between recidivists and nonrecidivists but also that the timing of failure events can be predicted using either the COMPAS base scales model or the overall recidivism risk model. However, the present study did not systematically address variations in predictive accuracy across offender subgroups defined by age, ethnicity and race, level of addiction, length of follow-up, and so on. Several large-scale studies of COMPAS that will allow such detailed examination are in progress. The following issues may still require further research.
PREDICTIVE ACCURACY BY OUTCOME OFFENSE
The present study found modest differences in AUCs for COMPAS risk models across outcomes, with generally stronger results for new offenses against persons and for felony offenses than for the broader category of any new offense. This may stem from the more precise definitions of the first two outcomes, as opposed to the less precise “any” new offense, which included both misdemeanors and felonies across a wide range of offenses. Barnoski and Drake (2007) found similar results when they examined the validity of a static risk scale for predicting three outcome categories (violent, property and violent, and felony).
PREDICTIVE ACCURACY BY GENDER
The present study found that the basic COMPAS recidivism model predicts recidivism about equally well for men and women, again similar to Barnoski and Drake (2007), who found only small differences by gender. We recognize that there are substantial concerns about this topic, and a search for gender-sensitive risk factors is currently under way. COMPAS is being augmented by additional risk and need factors of specific importance for female offenders (Blanchette & Brown, 2006; Brennan, 2008b; Salisbury et al., 2006).
PREDICTIVE ACCURACY BY ETHNICITY
The present study found that the COMPAS recidivism models performed about equally well for African American and White men at predicting the arrest outcomes. We are aware of only one previous study that examined the predictive accuracy of COMPAS for different ethnic groups, and that study reported much weaker results for African American men (Fass, Heilbrun, DeMatteo, & Fretz, 2008). In predicting rearrest within 1 year of release, Fass et al. (2008) reported AUCs for the COMPAS Recidivism Risk Scale of .81 for Whites, .67 for Hispanics, .48 for African Americans, and .53 for the total sample assessed with COMPAS (N = 276). However, their study has at least one major weakness that renders its findings unreliable: their small overall sample size and base rates resulted in extremely small effective sample sizes for the ethnic groups (African American = 36, Hispanic = 4, White = 1), which all but guarantees unreliable estimates.
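The effective-sample-size objection can be made concrete with a standard approximation to the standard error of an AUC, whose precision depends on the numbers of people with and without the outcome rather than on the total sample. The formula used below is Hanley and McNeil's (1982); it is not used in the article and is introduced here only for illustration, and the group sizes are hypothetical rather than taken from this study or from Fass et al. (2008).

```python
import math


def hanley_mcneil_se(a, n_pos, n_neg):
    """Approximate standard error of an AUC `a` (Hanley & McNeil, 1982).

    n_pos -- number of people with the outcome (e.g., rearrested)
    n_neg -- number without it
    """
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a)
           + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(var)


# Hypothetical group sizes, chosen only to show the trend.
for n_pos, n_neg in [(4, 30), (36, 100), (300, 1000)]:
    se = hanley_mcneil_se(0.70, n_pos, n_neg)
    lower, upper = max(0.0, 0.70 - 1.96 * se), min(1.0, 0.70 + 1.96 * se)
    print(f"outcomes={n_pos:4d} non-outcomes={n_neg:5d} SE={se:.3f} 95% CI=({lower:.2f}, {upper:.2f})")
```

With only a handful of outcome cases, the approximate interval around an AUC of .70 extends below .50, so an observed AUC near or even below chance would not be surprising in such a group.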
PREDICTIVE ACCURACY ACROSS DIVERSE CRIMINAL JUSTICE POPULATIONS
The present study did not include other specific criminal justice populations from prison,
parole, community corrections, or jails. However, a current, large-scale, parole reentry
study is evaluating predictive performance for parolees released to the community (Zhang,
Farabee, & Roberts, 2007). Preliminary findings reported AUCs of .67 rising to .72 with
only minor adjustments to the COMPAS recidivism risk model.
In conclusion, given that instrument validation is an ongoing process, we acknowledge
that numerous further tests and models could be applied to examine the predictive validity
of COMPAS risk models. The present results, however, are encouraging and suggest that
the COMPAS risk models reach levels of reliability, predictive validity, and generalizability that are at least equal to those of other major instruments in offender risk assessment.
APPENDIX
Scale Content and Selection: Theoretical and Empirical Background
This appendix describes the item content and research background for each Correctional Offender
Management Profiling for Alternative Sanctions (COMPAS) scale listed in Table 1. Full details of
psychometric properties, theoretical justification, and supportive empirical studies for each scale are
given in Brennan, Dieterich, and Oliver (2007). We now describe the background of the COMPAS
scales, along with the main items loading on each scale and factor loadings (in parentheses).
Criminal Involvement. This scale includes items pertaining to number of prior arrests and convictions, frequency of incarceration, and criminal justice involvements. The highest loading item is
total number of prior arrests (.52). Past criminal involvement has been consistently supported by
meta-analytic studies as a major risk factor for predicting ongoing criminal behavior (Andrews &
Bonta, 1998; Gendreau, Little, & Goggin, 1996).
History of Violence. This scale includes official history items reflecting prior arrests and convictions for violent felonies, use of weapons, infractions for fighting, and so on. The highest loading
items are the number of prior assaultive felony convictions (.47) and frequency of injury to victims
(.40). The research literature has indicated that the likelihood of future violence appears to increase
with each instance of a prior violent incident (Farrington, 1991; Lipsey & Derzon, 1998; Parker &
Asher, 1987).
History of Noncompliance. This scale includes official items reflecting failures to appear, failures of
drug tests, failures to comply with sentencing conditions, revocations for technical reasons, and so
forth. High-loading items include the number of probation revocations (.56) and prior failures to appear
(.37). Repeated noncompliance with criminal justice conditions and treatment regimes has emerged as
a predictor of both violent and general recidivism (Stalans, Yarnold, Seng, Olsen, & Repp, 2004).
Criminal Associates. This scale assesses associations with others who are involved in drugs, criminal activity, and gangs. High-loading items include friends who have been arrested (.41) and friends
who have been gang members (.43). This construct is of central theoretical importance in both social
learning and subcultural theories of crime (Cullen & Agnew, 2003; Elliott, Huizinga, & Ageton,
1985). Meta-analytic research has consistently shown that having antisocial associates is a major risk
factor for recidivism (Gendreau et al., 1996).
Substance Abuse. The central items in this scale reflect the influence of alcohol or drugs on the
current offense (.40), perceived benefit of substance abuse treatment (.41), and prior substance abuse
treatment (.37). Drug use has consistently emerged as a significant risk factor for general criminal
and violent behavior (National Institute of Justice, 1999; National Research Council, 1993) and is a
major risk factor in meta-analytic studies (Gendreau et al., 1996).
Financial Problems and Poverty. This scale includes items such as worry about financial survival (.53), problems paying bills (.52), and not enough money to get by (.52). Although poverty has shown only modest predictive power in meta-analytic studies (Gendreau et al., 1996), decades of research have shown reliable associations between poverty and high crime rates and related risk factors such as unstable residence, family disruption, single-parent families, community disorganization, and substandard housing (National Research Council, 1993; Sampson & Lauritsen, 1994). Theoretically, poverty is a key factor in strain or social marginalization and subcultural theories (Cullen & Agnew, 2003).
Occupational and Educational Resources or Human Capital. This scale includes items reflecting levels of educational and vocational or occupational success, such as job skills (.43), current unemployment (.52), low wages (.49), and employment history (.39). A social achievement (human capital) scale was selected for both empirical and theoretical reasons (Coleman, 1990; Gendreau et al., 1996; Hagen, 1998). It is central to strain theory because people with lower social capital have fewer life chances and more restricted opportunities than do those with greater capital. The scale is dynamic because human capital can be built or destroyed. Job loss or dropping out of high school may lower economic and social opportunities, whereas completing job skills training or obtaining a GED may increase these chances.
Family Crime. The COMPAS scale of family criminality includes items assessing the criminality
and drug use of mother, father, and siblings. The highest loading items include parent ever jailed
(.55), parent has had drug problems (.43), and mother ever arrested (.40). Many empirical studies
have linked delinquency and adult crime to antisocial families (Farrington, Jolliffe, Loeber,
Stouthamer-Loeber, & Kalb, 2001; Farrington & West, 1993; Lykken, 1995). Social learning theory
has also linked deviant behavior to violent or criminal family role models and ineffective parenting
(Lykken, 1995). Heritability theories have included gene-based evolutionary theories (Ellis & Walsh,
1997) and biosocial theories of sociopathy (Mealey, 1995).
High Crime Neighborhood. This scale assesses levels of crime (.44), gang activity (.40), and drug
activity (.39) in the person’s neighborhood. Living in a high-crime neighborhood is an established
correlate of both delinquency and adult crime (Sampson & Lauritsen, 1994; Thornberry, Huizinga,
& Loeber, 1995). It plays a role in social disorganization, social learning, and subcultural theories
(Cullen & Agnew, 2003; Sampson, Raudenbush, & Earls, 1997).
Boredom and Lack of Constructive Leisure Activities (Aimlessness). This scale includes items
from two closely linked themes: boredom proneness and lack of engaging leisure activities.
Dominant items include often bored (.46), nothing to do (.47), restless with current activities (.47),
and scattered attention (.37). Although conceptually different, items from these two themes all load
on a single factor. Theoretically, an absence of constructive leisure activities partially reflects weak
engagement bonds of early social control theory (Hirschi, 1969), and a similar concept (idle hands)
enters routine activities theory (Osgood et al., 1996). Finally, restlessness, distractibility, and attention problems enter M. R. Gottfredson and Hirschi’s (1990) low self-control theory and Hare’s
(1991) related construct of psychopathy.
Residential Instability. The present scale includes items assessing the number of recent moves (.39), homelessness (.33), and absence of a verifiable address (.34). The background literature indicates that transience is often associated with poverty, poor housing, social disorganization, and crime (McNaughton, 2007; National Research Council, 1993). Transience and homelessness may weaken social ties and have been associated with family breakup, social exclusion, and stressful life events (Marris, 1987). Theoretically, residential instability plays a role in both social control theory (weakening or attenuating social bonds) and strain theory (poverty, personal stress, marginalization). We note that personal stress or distress has emerged as a risk factor with modest predictive validity for recidivism (Gendreau et al., 1996).
Social Isolation Versus Social Support. This scale captures social isolation at one pole and social
supports at the other. It includes items indicating self-reported loneliness (.33), absence of friends
(.40), feeling left out of things (.33), and no close or best friend (.37). Social support theory suggests
that even in high-risk environments social supports may mediate or buffer the criminogenic effects
of economic and social strain (Bennett & Morabito, 2005; Estroff, Zimmer, Lachicotte, & Benoit,
1994; National Research Council, 1993; Stevenson, 1998). In addition, at prison release and reentry
to society, prisoners with stronger social and family supports are found to have lower recidivism
(Solomon, Johnson, Travis, & McBride, 2004). Theoretically, this factor enters both strain theory
(buffering strain) and social control theory (reflecting social bonds).
Criminal Attitude. This scale assesses antisocial attitudes using items that may justify, excuse, or minimize damage caused by the offender’s crime. Prominent items include the law does not help average people (.32), minor offenses such as drug use don’t hurt anyone (.24), and things stolen from rich people won’t be missed (.36). Antisocial attitudes have emerged in meta-analytic studies as a major risk factor (Andrews, Bonta, & Wormith, 2006; Gendreau et al., 1996). There is less agreement on the particular attitudes that are most useful or predictive (e.g., minimizing the damage of offenses, tolerance for law violation, etc.; Walters, 1995). In the absence of consensus, COMPAS uses a higher order scale with items adapted from Bandura, Barbaranelli, Caprara, and Pastorelli (1996).
Antisocial Personality. This scale addresses impulsivity, absence of guilt, selfish narcissism, dominance, risk taking, and anger or hostility. Representative items include short temper (.39), often does things without thinking (.30), and seen as cold and callous (.32). It is not designed to cover all personality subfactors comprehensively; rather, it is a short, broadband scale using only the first principal component from a larger battery of antisocial personality items adapted from Eysenck and Eysenck (1978) and Bandura et al. (1996). Important subfactors are impulsivity, risk taking, restlessness or boredom, absence of guilt (callousness), selfish narcissism, interpersonal dominance, and anger or hostility (Bandura et al., 1996; Cooke, Forth, & Hare, 1996; Lilienfeld & Andrews, 1996; Marcus, 2003; Widiger & Lynam, 1998). Empirical support for antisocial personality (variously measured) is found in Gendreau et al. (1996), Bandura et al. (1996), Blackburn and Coid (1998), and Quinsey et al. (1998). Theoretically, antisocial personality plays an important role in theories of antisocial dispositions (Farrington, 2003; M. R. Gottfredson & Hirschi, 1990; Hare, 1991).
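The Antisocial Personality scale is described as the first principal component of a larger item battery. As a minimal sketch of that idea, assuming an item correlation matrix is available (the matrix below is made up, not COMPAS data), power iteration recovers the leading eigenvector, and scaling it by the square root of its eigenvalue gives component loadings. The loadings reported in this appendix may come from a different factoring procedure, which the section does not specify.

```python
import math


def first_principal_component(corr, iters=1000, tol=1e-12):
    """Leading eigenvalue and loadings of a symmetric correlation matrix via power iteration."""
    k = len(corr)
    v = [1.0] * k  # start vector; with all-positive correlations the result is all-positive
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        done = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if done:
            break
    # Rayleigh quotient v'Rv gives the eigenvalue; eigenvalue / k is the share of item variance.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


# Hypothetical correlations among three antisocial-personality items.
R = [
    [1.0, 0.4, 0.3],
    [0.4, 1.0, 0.35],
    [0.3, 0.35, 1.0],
]
eig, loads = first_principal_component(R)
print(f"eigenvalue={eig:.3f}, variance explained={eig / len(R):.1%}")
print("loadings:", [round(x, 2) for x in loads])
```

A scale score built this way weights each item by its component coefficient; with broadly similar loadings, a simple sum of standardized items gives nearly the same ranking of offenders.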
REFERENCES
Aalen, O. O. (1978). Nonparametric inference for a family of counting processes. Annals of Statistics, 6, 701-726.
Andrews, D. A., & Bonta, J. (1995). The Level of Service Inventory–Revised. Toronto: Multi-Health Systems.
Andrews, D. A., & Bonta, J. (1998). The psychology of criminal conduct (2nd ed.). Cincinnati, OH: Anderson.
Andrews, D. A., Bonta, J., & Wormith, J. S. (2004). The Level of Service/Case Management Inventory (LS/CMI). Toronto:
Multi-Health Systems.
Andrews, D. A., Bonta, J., & Wormith, J. S. (2006). The recent past and near future of risk and/or needs assessment. Crime & Delinquency, 52, 7-27.
Aos, S., & Barnoski, R. (2003). Washington’s offender accountability act: An analysis of the Department of Corrections’ risk assessment (Document No. 03-12-1202). Olympia: Washington State Institute for Public Policy.
Austin, J. (1983). Assessing the new generation of prison classification models. Crime & Delinquency, 29, 561-576.
Bandura, A., Barbaranelli, C., Caprara, G. V., & Pastorelli, C. (1996). Mechanisms of moral disengagement in the exercise
of moral agency. Journal of Personality and Social Psychology, 71, 364-374.
Barbaree, H. E., Seto, M., Langton, C. M., & Peacock, E. J. (2001). Evaluating the predictive accuracy of six risk assessment
instruments for adult sex offenders. Criminal Justice and Behavior, 28, 490-521.
Barnoski, R. (2006). Sex offender sentencing in Washington State: Predicting recidivism based on the LSI-R. Retrieved
January 20, 2008 from http://www.wsipp.wa.gov/rtpfiles/06-02-1201.pdf
Barnoski, R., & Aos, S. (2003). Washington’s offender accountability act: An analysis of the Department of Corrections’ risk
assessment (Document No. 03-12-1202). Olympia: Washington State Institute for Public Policy.
Barnoski, R., & Drake, E. K. (2007). Washington’s Offender Accountability Act: Department of Corrections’ static risk
assessment. Olympia: Washington State Institute for Public Policy.
Bennett, R. R., & Morabito, M. S. (2005, November). Institutional social support and crime: A cross-national investigation.
Paper presented at the annual meeting of the American Society of Criminology, Toronto, Canada.
Blackburn, R., & Coid, J. W. (1998). Psychopathy and the dimensions of personality in violent offenders. Personality and
Individual Differences, 25, 129-145.
Blanchette, K., & Brown, S. L. (2006). The assessment and treatment of women offenders: An integrative perspective. New
York: John Wiley.
Bloom, B. (2000). Beyond recidivism: Perspectives on evaluation of programs for female offenders in community corrections. In M. McMahon (Ed.), Assessment to assistance: Programs for women in community corrections (pp. 107-138). Lanham, MD: American Correctional Association.
Bonta, J. (1996). Risk-needs assessment and treatment. In A. T. Harland (Ed.), Choosing correctional options that work:
Defining the demand and evaluating the supply (pp. 18-32). Thousand Oaks, CA: Sage.
Bonta, J. (2002). Risk needs assessment: Guidelines for selection and use. Criminal Justice and Behavior, 29, 355-379.
Boothby, J. L., & Clements, C. B. (2000). A national survey of correctional psychologists. Criminal Justice and Behavior, 27,
715-731.
Brennan, T. (1987). Classification: An overview of selected methodological issues. In D. M. Gottfredson & M. Tonry (Eds.),
Prediction and classification: Criminal justice decision making (pp. 201-248). Chicago: University of Chicago Press.
Brennan, T. (2008a). Explanatory diversity among female delinquents: Examining taxonomic heterogeneity. In R. Zaplin
(Ed.), Female crime and delinquency: Critical perspectives and effective interventions (pp. 197-232). Boston: Jones and
Bartlett.
Brennan, T. (2008b). Institutional assessment and classification of female offenders: From robust beauty to person-centered assessment. In R. Zaplin (Ed.), Female crime and delinquency: Critical perspectives and effective interventions (pp. 283-322). Boston: Jones and Bartlett.
Brennan, T., Breitenbach, M., & Dieterich, W. (2008). Towards an explanatory taxonomy of adolescent delinquents:
Identifying several social-psychological profiles. Journal of Quantitative Criminology, 24, 179-203.
Brennan, T., Dieterich, W., & Oliver, W. (2004). The COMPAS scales: Normative data for males and females. Community
and incarcerated samples. Traverse City, MI: Northpointe Institute for Public Management.
Brennan, T., Dieterich, W., & Oliver, W. (2007). COMPAS: Correctional offender management for alternative sanctioning.
Technical manual and psychometric report (V. 5.01). Traverse City, MI: Northpointe Institute for Public Management.
Brennan, T., Wells, D., & Alexander, J. (2004). Enhancing prison classification systems: The emerging role of management
information systems. Washington, DC: U.S. Department of Justice, National Institute of Corrections.
Clements, C. B. (1996). Offender classification: Two decades of progress. Criminal Justice and Behavior, 23, 121-143.
Coleman, J. S. (1990). Foundations of social theory. Cambridge, MA: Harvard University Press.
Cooke, D. J., Forth, A. E., & Hare, R. D. (1996). Psychopathy: Theory, research and implications for society. Dordrecht,
Netherlands: NATO Science Series.
Cullen, F., & Agnew, R. (2003). Criminological theory: Past to present. Los Angeles: Roxbury.
Dahle, K. P. (2006). Strengths and limitations of actuarial prediction of criminal re-offence in a German prison sample: A
comparative study of LSI-R, HCR-20 and PCL-R. International Journal of Law and Psychiatry, 29(5), 431-442.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision models. American Psychologist, 34, 571-582.
Elliott, D. S., Huizinga, D., & Ageton, S. S. (1985). Explaining delinquency and drug use. Beverly Hills, CA: Sage.
Ellis, L., & Walsh, A. (1997). Gene-based evolutionary theories in criminology. Criminology, 35, 229-276.
Estroff, S., Zimmer, C., Lachicotte, W., & Benoit, J. (1994). The influence of social networks and social support on violence
by persons with serious mental illness. Hospital and Community Psychiatry, 45, 669-679.
Eysenck, S., & Eysenck, H. (1978). Impulsiveness and venturesomeness: Their position in a dimensional system of personality description. Psychological Reports, 43, 1247-1255.
Farr, K. A. (2000). Classification for female inmates: Moving forward. Crime & Delinquency, 46, 3-17.
Farrington, D. (1991). Childhood aggression and adult violence: Early precursors and later life outcomes. In D. Pepler &
K. Rubin (Eds.), The development and treatment of childhood aggression (pp. 5-29). Hillsdale, NJ: Lawrence Erlbaum.
Farrington, D. P. (2003). Developmental and life course criminology: Key theoretical and empirical issues. Criminology, 41,
221-255.
Farrington, D. P., Jolliffe, D., Loeber, R., Stouthamer-Loeber, M., & Kalb, L. M. (2001). The concentration of offenders in
families, and family criminality in the prediction of boys’ delinquency. Journal of Adolescence, 24, 579-596.
Farrington, D. P., & West, D. J. (1993). Criminal, penal, and life histories of chronic offenders: Risk and protective factors
and early identification. Criminal Behaviour and Mental Health, 3, 492-523.
Fass, T. L., Heilbrun, K., DeMatteo, D., & Fretz, R. (2008). The LSI-R and the COMPAS: Validation on two risk-needs tools. Criminal Justice and Behavior, 35, 1095-1108.
Flores, A. W., Lowenkamp, C. T., Smith, P., & Latessa, E. J. (2006). Validating the Level of Service Inventory–Revised on a
sample of federal probationers. Federal Probation, 70(2), 44-48.
Gendreau, P., Little, T., & Goggin, C. (1996). A meta-analysis of the predictors of adult offender recidivism: What works!
Criminology, 34, 575-607.
Glaser, D. (1987). Classification for risk. In D. M. Gottfredson & M. Tonry (Eds.), Prediction and classification: Criminal
justice decision making (pp. 249-292). Chicago: University of Chicago Press.
Gottfredson, M. R., & Hirschi, T. (1990). A general theory of crime. Stanford, CA: Stanford University Press.
Gottfredson, S. D. (1987). Prediction: An overview of selected methodological issues. In D. M. Gottfredson & M. Tonry
(Eds.), Prediction and classification: Criminal justice decision making (pp. 21-52). Chicago: University of Chicago Press.
Grann, M., Belfrage, H., & Tengstrom, A. (2000). Actuarial assessment of risk for violence: Predictive validity of the VRAG
and the historical part of the HCR-20. Criminal Justice and Behavior, 27, 97-114.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical,
algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy and Law, 2, 293-323.
Hagen, J. (1998). Life course capitalization and adolescent behavioral development. In R. Jessor (Ed.), New perspectives on
adolescent risk behavior. Cambridge, MA: Cambridge University Press.
Hannah-Moffatt, K., & Shaw, M. (2001). Taking risks: Incorporating gender and culture into classification and assessment
of federally sentenced women in Canada. Ottawa, Ontario: Status of Women Canada.
Hardyman, P., & Van Voorhis, P. (2004). Developing gender-specific classification systems for women offenders. Washington,
DC: U.S. Department of Justice, National Institute of Corrections.
Hare, R. D. (1991). The Hare Psychopathy Checklist–Revised. Toronto: Multi-Health Systems.
Harrell, F. E., Califf, R. M., Pryor, D. B., Lee, K. L., & Rosati, R. A. (1982). Evaluating the yield of medical tests. Journal
of the American Medical Association, 247, 2543-2546.
Harris, P. W., & Jones, P. R. (1999). Differentiating delinquent youths for program planning and evaluation. Criminal Justice
and Behavior, 26, 403-434.
Hastie, R., & Dawes, R. M. (2001). Rational choice in an uncertain world: The psychology of judgment and decision-making.
Thousand Oaks, CA: Sage.
Hirschi, T. (1969). Causes of delinquency. Berkeley: University of California Press.
Hoffman, P. B. (1994). Twenty years of operational use of a risk prediction instrument: The United States Parole
Commission’s Salient Factor Score. Journal of Criminal Justice, 22, 477-494.
Holtfreter, K., & Cupp, R. (2007). Gender and risk assessment: The empirical status of the LSI-R for women. Journal of
Contemporary Criminal Justice, 23, 363-382.
Jones, P. R. (1996). Risk prediction in criminal justice. In A. T. Harland (Ed.), Choosing correctional options that work
(pp. 33-68). Thousand Oaks, CA: Sage.
Kalbfleisch, J. D., & Prentice, R. L. (2002). The statistical analysis of failure time data (2nd ed.). New York: John Wiley.
Kroner, D. G., Stadtland, M., Eidt, M., & Nedopil, N. (2007). The validity of the Violence Risk Appraisal Guide (VRAG) in
predicting criminal recidivism. Criminal Behaviour and Mental Health, 17, 89-100.
Lilienfeld, S. O., & Andrews, B. P. (1996). Development and preliminary validation of a self-report measure of psychopathic
personality traits in a non-criminal population. Journal of Personality Assessment, 66, 488-524.
Lipsey, M. W., & Derzon, J. H. (1998). Predictors of violent or serious delinquency in adolescence and early adulthood: A
synthesis of longitudinal research. In R. Loeber & D. P. Farrington (Eds.), Serious and violent juvenile offenders: Risk
factors and successful interventions (pp. 86-105). Thousand Oaks, CA: Sage.
Lösel, F. (1995). The efficacy of correctional treatment: A review and synthesis of meta-evaluations. In J. McGuire (Ed.),
What works: Reducing re-offending (pp. 79-111). Chichester, UK: Wiley.
Lykken, D. T. (1995). The antisocial personalities. Mahwah, NJ: Lawrence Erlbaum.
Marcus, B. (2003). An empirical examination of the construct validity of two alternative self-control measures. Educational
and Psychological Measurement, 63, 674-706.
Marris, P. (1987). Loss and change. New York: Pantheon.
McNaughton, C. (2007, September). Life on the edge: Substance abuse and homelessness—Escape, resistance or deviance.
Paper presented at the annual meeting of the British Society of Criminology, Glasgow, UK.
Mealey, L. (1995). The sociobiology of sociopathy: An integrated evolutionary model. Behavioral and Brain Sciences, 18,
523-599.
Meier, S. (2002). Bridging case conceptualization, assessment and intervention. Thousand Oaks, CA: Sage.
Millon, T., & Davis, R. D. (1997). The place of assessment in clinical science. In T. Millon (Ed.), The Millon inventories:
Clinical and personality assessment (pp. 3-22). New York: Guilford.
Mossman, D. (1994). Assessing predictions of violence: Being accurate about accuracy. Journal of Consulting and Clinical
Psychology, 62, 783-792.
National Council on Crime and Delinquency. (2006). Correctional Assessment and Intervention System. Retrieved March 28,
2007, from http://www.nccd-crc.org/nccd/n_index_main.html
National Institute of Justice. (1999). Annual report on drug use among adult and juvenile arrestees (1998). Arrestee Drug
Abuse Monitoring Program (ADAM). Washington, DC: Author.
National Research Council. (1993). Understanding and preventing violence (A. J. Reiss & J. A. Roth, Eds.). Washington, DC:
National Academy Press.
Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics, 14, 945-965.
Northpointe Institute for Public Management. (1996). COMPAS [Computer software]. Traverse City, MI: Author.
Osgood, D. W., Wilson, J. K., O’Malley, P. M., Bachman, J. G., & Johnston, L. D. (1996). Routine activities and
individual deviant behavior. American Sociological Review, 61, 635-655.
Palmer, T. (1992). The re-emergence of correctional intervention. Newbury Park, CA: Sage.
Parker, J., & Asher, S. (1987). Peer relations and later personal adjustment: Are low-accepted children at risk? Psychological
Bulletin, 102, 357-389.
Quinsey, V. L., Harris, G. T., Rice, M. E., & Cormier, C. A. (1998). Violent offenders: Appraising and managing risk.
Washington, DC: American Psychological Association.
Reisig, M. D., Holtfreter, K., & Morash, M. (2006). Assessing recidivism risk across female pathways to crime. Justice
Quarterly, 23, 384-405.
Salisbury, E. J., Van Voorhis, P., & Wright, E. (2006, November). Construction and validation of a gender responsive
risk/needs instrument for women offenders in Missouri and Maui. Paper presented at the annual conference of the
American Society of Criminology, Los Angeles.
Sampson, R. J., & Lauritsen, J. (1994). Deviant lifestyles, proximity to crime, and the offender-victim link in personal
violence. Journal of Research in Crime and Delinquency, 27, 7-40.
Sampson, R. J., Raudenbush, S., & Earls, F. (1997). Neighborhoods and violent crime: A multilevel study of collective
efficacy. Science, 277, 918-924.
Solomon, A., Johnson, K. D., Travis, J., & McBride, E. (2004). From prison to work: The employment dimensions of
prisoner reentry. Washington, DC: Urban Institute.
Stalans, L. J., Yarnold, P. R., Seng, M., Olsen, D., & Repp, M. (2004). Identifying three types of violent offenders and
predicting their recidivism and performance while on probation: A classification tree analysis. Law and Human Behavior, 28,
253-271.
Stevenson, H. C. (1998). Raising safe villages: Cultural-ecological factors that influence the emotional adjustment of
adolescents. Journal of Black Psychology, 24, 44-59.
Tape, T. G. (2003). Interpreting diagnostic tests: The area under the ROC curve. Unpublished report, University of Nebraska
Medical Center, Omaha.
Thornberry, T., Huizinga, D., & Loeber, R. (Eds.). (1995). The prevention of serious delinquency and violence: Implications
from the program of research on the causes and correlates of delinquency. Sourcebook on serious, violent and chronic
juvenile offenders. Thousand Oaks, CA: Sage.
Van Voorhis, P. (1994). Psychological classification of the adult male prison inmate. Albany: State University of New York
Press.
Walters, G. D. (1995). The Psychological Inventory of Criminal Thinking Styles: Part I. Reliability and preliminary validity.
Criminal Justice and Behavior, 22, 307-325.
Ward, T., & Brown, M. (2004). The good lives model and conceptual issues in offender rehabilitation. Psychology, Crime and
Law, 10, 243-257.
Ward, T., & Stewart, C. (2003). Criminogenic needs and human needs: A theoretical model. Psychology, Crime and Law, 9,
125-143.
Warren, M. Q., & Hindelang, M. J. (1979). Differential explanation of offender behavior. In H. Toch (Ed.), Psychology of
crime and criminal justice (pp. 166-182). Prospect Heights, IL: Waveland Press.
Widiger, T. A., & Lynam, D. R. (1998). Psychopathy and the five-factor model of personality. In T. Millon, E. Simonsen,
M. Birkett-Smith, & R. Davis (Eds.), Psychopathy: Antisocial, criminal and violent behavior (pp. 171-186). New York:
Guilford.
Wormith, J. S. (2001, July). Assessing offender assessment: Contributing to effective correctional treatment. The ICCA
Journal, pp. 12-23.
Zhang, S., Farabee, D., & Roberts, R. (2007, October). Predicting parolee risk of recidivism. Paper presented at the 66th
semi-annual meeting of the Association for Criminal Justice Research, Sacramento, CA.
Tim Brennan, PhD, is a senior research scientist at Northpointe Institute. His main research interests include risk
assessment, pattern recognition, classification, and machine learning in the context of crime and delinquency.
William Dieterich, PhD, is director of research at Northpointe Institute. His research interests include developing and test-
ing prognostic models for use in criminal justice agencies.
Beate Ehret, PhD, is a research analyst at Northpointe Institute. Her research interests include gender and crime, juvenile
delinquency, and evidence-based practice in criminal justice.
... AI use gives these systems the ability to predict outcomes and distinguishes them from automation systems. In the case of the COMPAS algorithm [18], the objective was to predict whether an offender would reoffend after being released. This (AI) system provided advice to a human judge, who made the decision after being presented with the AI advice. ...
... Then an advisor is employed to advise the judge on their decision, after which the judge can give a final decision. In the case of the COMPAS algorithm [18], the judge would first have to make their own assessment. Only then would the advice of the COMPAS system be presented to the judge, who then makes the final decision. ...
Preprint
Artificial intelligence (AI) systems have become an indispensable component of modern technology. However, research on human behavioral responses to these systems, i.e., human reliance on AI advice (AI reliance), is lagging behind. Current shortcomings in the literature include unclear influences on AI reliance, a lack of external validity, conflicting approaches to measuring reliance, and disregard for changes in reliance over time. Promising avenues for future research include reliance on generative AI output and reliance in multi-user situations. In conclusion, we present a morphological box that serves as a guide for research on AI reliance.
... This algorithm allows for an objective assessment of risks, keeping significantly fewer people in custody while not posing a threat to society [32]. In 2016, the Wisconsin Supreme Court allowed the use of the Compas (Correctional Offender Management Profiling for Alternative Sanctions) risk assessment algorithm for recidivism in judicial decision-making [33]. However, it turned out that this algorithm has cognitive biases regarding race and ethnicity [34]. ...
Article
Problem statement. Artificial intelligence (AI) has become one of the greatest achievements of modern technological progress and the foundation for the creation of electronic justice. Many advanced countries around the world are already using it to optimize their judicial systems. Ukrainian justice is at the initial stage of digital transformation and requires the introduction of the latest information technologies (IT). An urgent scientific problem is to analyze the advantages and challenges of applying AI technologies to increase the efficiency, transparency and accessibility of justice. The purpose of the work is to study the possibilities of implementing artificial intelligence in justice and to identify key prospects and challenges associated with the use of AI algorithms in the judicial system of Ukraine in the context of its integration with the European Union. Methods. The work uses the comparison method – to analyze the level of development and efficiency of justice systems and to assess the availability of legal remedies in different countries of the world; the method of systematic literature review – to analyze the literature on the effectiveness of implementing AI tools in the judicial system; the method of legal expert analysis – to analyze the legislative norms on security, confidentiality and ethical use of data science tools in the legal field; the method of system dynamics – to study the possible consequences of introducing new technologies into the justice system; the formal-logical method – to analyze the legal framework of the EU and Ukraine regarding the use of artificial intelligence. Results. It was found that AI technologies can simplify access to justice and increase its transparency and efficiency by automating routine processes, analyzing large data sets and supporting decision-making. The paper argues that AI algorithms carry risks of bias and discrimination.
The necessity of balancing technological progress with respect for ethical norms and human rights has been substantiated. The current state of implementation of e-justice and AI in Ukraine has been analyzed. It is proposed to implement effective mechanisms for regulating digital transformation in the legal system of Ukraine, as provided for by EU legislation. Conclusions. The analysis of existing scientific approaches to defining the concept of "artificial intelligence" and the tasks solved by AI-based systems in justice has been carried out. The features of information and legal support and international experience in the use of AI in the judicial systems of the world, in particular in the EU, have been studied. It has been established that even progressive states use AI algorithms for information support of court proceedings cautiously and only partially, owing to the lack of a legislative framework and existing risks of bias and non-compliance with human rights. It is noted that the integration of AI tools into the judicial system of Ukraine, taking into account European experience, should become a priority of the digital transformation of justice. The use of AI provides undeniable advantages for increasing the efficiency and accessibility of judicial proceedings. However, there are risks that its conclusions may be biased or discriminatory. For the effective and safe use of artificial intelligence in the judicial system, it is necessary to develop a legislative framework for its regulation.
... Another source of bias can arise when algorithms are deployed as decision aids without taking into account whether the distribution of errors is equitable and fair, or whether there are disparate impacts on individuals. For example, Northpointe's COMPAS (Correctional Offender Management Profiling for Alternative Sanctions; T. Brennan & Dieterich, 2017) algorithm derives individuals' recidivism risk scores from questionnaires given to them when they are incarcerated. COMPAS does not include an overt race variable, and its developers claimed that it was unbiased because its error rate in predicting recidivism for Black parolees was the same as for White parolees (39% for both). ...
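The claim in this excerpt hinges on which error rate is being compared. The sketch below is a minimal illustration with made-up counts for two hypothetical groups, A and B (these are not COMPAS data), showing that two groups can share the same overall error rate while their false-positive and false-negative rates differ sharply. The `Confusion` class and all counts are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one group of assessed people (hypothetical)."""
    tp: int  # rated high risk and did reoffend
    fp: int  # rated high risk but did not reoffend
    fn: int  # rated low risk but did reoffend
    tn: int  # rated low risk and did not reoffend

    def error_rate(self) -> float:
        """Share of all classifications that were wrong."""
        return (self.fp + self.fn) / (self.tp + self.fp + self.fn + self.tn)

    def false_positive_rate(self) -> float:
        """Among people who did not reoffend, the share rated high risk."""
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        """Among people who did reoffend, the share rated low risk."""
        return self.fn / (self.fn + self.tp)


# Hypothetical counts chosen so that both groups have a 40% overall error rate.
GROUPS = {
    "A": Confusion(tp=30, fp=30, fn=10, tn=30),
    "B": Confusion(tp=20, fp=10, fn=30, tn=40),
}

for name, cm in GROUPS.items():
    print(
        f"group {name}: error={cm.error_rate():.2f} "
        f"FPR={cm.false_positive_rate():.2f} FNR={cm.false_negative_rate():.2f}"
    )
```

In this toy case group A absorbs more false positives and group B more false negatives, so equal overall error rates do not by themselves show that errors are distributed equitably, which is the concern the excerpt raises.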
Article
How might data analytic tools support intake decisions? When faced with a request for post-conviction assistance, innocence organizations’ intake staff must determine (1) whether the applicant can be shown to be factually innocent, and (2) whether the organization has the resources to help. These difficult categorization decisions are often made with incomplete information (Weintraub, 2022). We explore data from the National Registry of Exonerations (NRE; 4/26/2023, N = 3,284 exonerations) to inform such decisions, using patterns of features associated with successful prior cases. We first reproduce Berube et al. (2023)’s latent class analysis, identifying four underlying categories across cases. We then apply a second technique to increase transparency, decision tree analysis (WEKA, Frank et al., 2013). Decision trees can decompose complex patterns of data into ordered flows of variables, with the potential to guide intermediate steps that could be tailored to the particular organization’s limitations, areas of expertise, and resources.
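The decision-tree step described here can be illustrated by showing how a tree picks its first split. The sketch below is a simplified, ID3-style illustration that uses plain information gain on categorical features. WEKA's J48 (its C4.5 implementation) uses gain ratio and pruning, so this is not the authors' exact procedure. The feature names, rows, and labels are hypothetical and are not drawn from the NRE data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from partitioning the rows on one categorical feature."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_split(rows, labels, features):
    """Return the feature with the highest information gain (the root split)."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical cases: two binary features and an outcome label.
rows = [
    {"feature_a": 1, "feature_b": 0},
    {"feature_a": 1, "feature_b": 1},
    {"feature_a": 1, "feature_b": 0},
    {"feature_a": 0, "feature_b": 1},
    {"feature_a": 0, "feature_b": 0},
    {"feature_a": 0, "feature_b": 1},
]
labels = ["accept", "accept", "decline", "decline", "decline", "decline"]

for f in ("feature_a", "feature_b"):
    print(f, round(information_gain(rows, labels, f), 3))
print("root split:", best_split(rows, labels, ["feature_a", "feature_b"]))
```

Applying the same choice recursively within each partition yields the kind of ordered flow of variables the abstract describes.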
Research Proposal
This study includes multiple investigations regarding the use of artificial intelligence (AI) in crime prediction and its broader implications in criminology, illustrating significant advances and challenges. Neural networks used for analyzing historical data achieve 81% precision, while LSTM networks predict crime occurrences within a 75%-90% range, showing AI's ability to improve public security in urban settings such as Mexico by leveraging spatial and temporal data analysis. Artificial intelligence is being used in the justice sector to improve legal advice, decision-making, and collaborative crime-fighting efforts through applications such as facial recognition and predictive surveillance. Ethical considerations, such as privacy and discrimination prevention, are essential for maintaining appropriate artificial intelligence use. Although AI can detect patterns in large datasets and is increasingly used for crime prediction and prevention via data mining, machine learning, and deep learning, the field is still in its early stages and requires access to larger datasets and more demanding model training. In Abu Dhabi, a study using a multi-linear regression model and data from 316 police department employees emphasizes the importance of predictive policing, specialized training, and collaborative learning in crime prevention. Also, a primer for criminologists explores AI's dual role in crime (as a tool, a target, and potentially a self-driving agent) while emphasizing AI's beneficial impact on law enforcement through advanced detection and predictive policing. These studies suggest that criminologists should play a more active role in implementing AI in crime prediction and prevention strategies, moving away from traditional statistical models and toward more advanced AI-driven approaches.
Article
Recently, with the widespread use of deep neural networks (DNNs) in high-stakes decision-making systems (such as fraud detection and prison sentencing), concerns have arisen about the fairness of DNNs in terms of the potential negative impact they may have on individuals and society. Therefore, fairness testing has become an important research topic in DNN testing. At the same time, neural network coverage criteria (such as criteria based on neuronal activation) are considered adequacy measures for DNN white-box testing, and it is implicitly assumed that improving coverage enhances the quality of test suites. Nevertheless, the correlation between DNN fairness (a test property) and coverage criteria (a test method) has not been adequately explored. To address this issue, we conducted a systematic empirical study on seven coverage criteria, six fairness metrics, three fairness testing techniques, and five bias mitigation methods on five DNN models and nine fairness datasets to assess the correlation between coverage criteria and DNN fairness. Our study yielded the following findings: 1) some coverage and fairness metrics changed significantly as the size of the test suite increased; 2) the statistical correlation between coverage criteria and DNN fairness is limited; 3) after bias mitigation for improving the fairness of DNNs, the change pattern in coverage criteria is different; and 4) models debiased by different bias mitigation methods have a lower correlation between coverage and fairness than the original models. Our findings cast doubt on the validity of coverage criteria with respect to DNN fairness (i.e., increasing coverage may even have a negative impact on the fairness of DNNs). Therefore, we warn DNN testers against blindly pursuing higher coverage at the cost of test properties of DNNs (such as fairness).
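The two kinds of quantity whose correlation this abstract examines can be made concrete. The sketch below computes a simple activation-based neuron-coverage score (the fraction of neurons whose activation exceeds a threshold for at least one test input) and one common group-fairness metric, statistical parity difference. This is an assumed, simplified formulation: it omits the per-layer activation scaling some published coverage definitions apply, and it is not necessarily one of the seven criteria or six metrics used in the study. All activations, predictions, and group labels are hypothetical.

```python
def neuron_coverage(activations, threshold=0.5):
    """Fraction of neurons activated above `threshold` by at least one input.

    `activations` holds one list per test input, each with one value per neuron.
    """
    n_neurons = len(activations[0])
    covered = {
        j
        for per_input in activations
        for j, value in enumerate(per_input)
        if value > threshold
    }
    return len(covered) / n_neurons


def positive_rate(preds, groups, group):
    """Share of members of `group` who received a positive prediction (1)."""
    member_preds = [p for p, g in zip(preds, groups) if g == group]
    return sum(member_preds) / len(member_preds)


def statistical_parity_difference(preds, groups, unprivileged, privileged):
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged); 0 means parity."""
    return positive_rate(preds, groups, unprivileged) - positive_rate(
        preds, groups, privileged
    )


# Hypothetical test suite: 3 inputs, each with activations for 4 neurons.
activations = [
    [0.1, 0.9, 0.0, 0.3],
    [0.6, 0.2, 0.0, 0.4],
    [0.2, 0.1, 0.0, 0.45],
]
# Hypothetical binary predictions for 8 individuals in groups "u" and "p".
preds = [1, 0, 0, 0, 1, 1, 0, 1]
groups = ["u", "u", "u", "u", "p", "p", "p", "p"]

print("neuron coverage:", neuron_coverage(activations))
print("SPD:", statistical_parity_difference(preds, groups, "u", "p"))
```

Coverage depends only on which inputs are in the suite, while the parity metric depends on how predictions split across groups. Nothing forces the two to move together, which is consistent with the limited correlation the study reports.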
Chapter
The purpose of this chapter is to provide an overview of the relationship of the Five-Factor Model (FFM) to personality disorder. The FFM has traditionally been viewed as a dimensional model of normal personality structure. However, it should probably be viewed as a dimensional model of general personality structure, including maladaptive as well as adaptive personality traits. Discussed herein is the empirical support for the coverage of personality disorders within the FFM; the ability of the FFM to explain the convergence and divergence among personality disorder scales; the relationship of the FFM to the DSM-5 dimensional trait model; the empirical support for maladaptivity within both poles of each FFM domain (focusing in particular on agreeableness, conscientiousness, and openness); and the development of scales for the assessment of maladaptive variants of the FFM.
Chapter
The complexity and mystery of the brain has, moreover, led to a culture that rewards intuition, and has thus convinced each neurosurgeon that [his or her] own experience is as valid as anyone else's. (Gladwell, 1996, p. 39) Gladwell's basic argument is that neurosurgeons often rely on their clinical judgment and ignore procedures and methods based on scientific research. Sound familiar? Like neurosurgeons, therapists typically receive little systematic feedback about the effects of their work. As a result, theories about what process leads to effective outcomes have proliferated, often with little empirical support. The approach described here attempts to present a balance between questioning and supporting clinical judgments by making those thoughts, decisions, and judgments explicit and checking them against other kinds of data. All practitioners at least implicitly think about their clients in terms of the causes of their problem(s) and the desired outcomes of the intervention. Some practitioners also collect qualitative data (i.e., case notes) and quantitative data (e.g., outcome assessments) as part of their record-keeping. The approach described here explicitly combines these conceptualization, assessment, and analytic components, teaching practitioners how to produce a tentative model of their client's process and outcome and then collect data on the key elements of the model. On the basis of an analysis of that data, they revise the model and intervention and continue to collect additional data. This iterative process could (and often does) continue until termination, but in practice it stops when the practitioner finds that the intervention is sufficiently effective. This integrative approach differs from traditional methods of teaching counseling strategies and psychotherapy in its emphasis on the inclusion of assessment data, particularly idiographic assessment.
Most counseling texts provide information related to counseling process and intervention; a few explicitly relate this information to client conceptualization; but none thoroughly and systematically connect conceptualization and intervention with assessment. Assessment tends to be taught in separate courses on measurement, intelligence testing, and personality testing. The extent to which assessment is integrated into the therapeutic process often depends upon the judgment and background of a supervisor in a practicum setting. In fact, counseling and psychotherapy appears to be lagging behind comparable fields such as medicine and education in terms of using assessment data as feedback in the intervention process. What are the likely benefits of such an approach? By closely linking case conceptualization and assessment data with intervention decisions, practitioners can: employ an effective method for thinking about, organizing, and focusing on the key elements of counseling process and outcome; be introduced to a standard of practice that moves beyond an eclectic, flying-by-the-seat-of-your-pants approach to therapy; increase the likelihood that their work will be effective; better understand why counseling is ineffective in some cases. Graduate training is an appropriate setting in which to teach students how to integrate conceptualization, assessment, and analysis with intervention. These procedures add structure to what is often an ambiguous learning experience in counseling courses and practica. This integrative learning should particularly serve students well in future practice when they are asked to conduct regular outcome assessments or face difficult and complex clients. In addition, conceptualization, assessment, and analysis each present dilemmas that can be strengthened or resolved by consulting one or more of the other domains. 
With conceptualization, for example, the temptation is to believe that what we think about something actually constitutes the whole of that thing in actual situations. Collecting and analyzing data about specific client conceptualizations tempers overconfidence by providing corrective feedback. Similarly, students often treat assessment data as face valid, requiring analysis of actual data (e.g., correlation with other measures) to raise their awareness about the limitations of assessment information from any one source. The book currently consists of 6 chapters. Chapter 1 provides an introduction to the key elements of conceptualization, assessment, and analysis as well as a rationale for their integration. Chapter 2 guides the student practitioner through the steps necessary to select (a) process elements related to client etiology and intervention and (b) outcome elements for multiple and selected problems. In Chapter 3, the focus shifts to how to assess the process and outcome elements in a model (e.g., progress notes, outcome assessments). Discussion includes an overview of assessment methods (including idiographic and behavioral assessment), construct explication, psychometric principles, guidelines for method selection, the use of baselines, and examples. Chapter 4 presents basic graphical, qualitative, and statistical analytic techniques, including time series plots, variability, correlation, and content analysis. Chapter 5 discusses common problems in conceptualization, idiographic assessment, and analysis, while Chapter 6 summarizes the book and examines the future of these approaches in clinical settings. Finally, appendices include forms to assist in model development, for recording qualitative and quantitative assessments, several outcome scales in the public domain, and class assignments designed to assist in learning this material.
Portions of this book are based on Meier (1999, 1994): more detailed descriptions of teaching methods can be found in Meier (1999) and of measurement and assessment issues in Meier (1994). This book should not substitute as a sole text for case conceptualization, assessment, analysis, or intervention, nor should students who read this text believe that they fully understand the application of this material. Because the clinical world is always more complex than anything you read about it, this book is only a part of what students need to work in that world. These materials are intended as an extension or sequel to be used in counseling and practica classes where the instructor wishes to bring scientific methods into the discussion of clinical issues. This integration can be surprisingly difficult for students attempting to apply this material in actual practice. Even students enrolled in scientist-practitioner programs may see practica classes as "easy" and nontechnical; for example, one student complained to me in a practicum where I taught this material that "I didn't think this class was about research." Even in training programs that believe they are scientist-practitioner in orientation, a strong split often exists between what constitutes research and practice courses.
Article
The authors respond to a recent ProPublica article claiming that the widely used risk assessment tool COMPAS is biased against black defendants. They conclude that ProPublica's report was based on faulty statistics and data analysis and failed to show that the COMPAS itself is racially biased, let alone that other risk instruments are biased.
Article
We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree‐based models are briefly described.
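The constrained estimator described in this abstract has an equivalent penalized (Lagrangian) form. For a suitable λ ≥ 0 it gives the same solution: minimize (1/2n)·Σ(y_i − x_i·β)² + λ·Σ|β_j|. The sketch below solves that form by cyclic coordinate descent with soft-thresholding. This is a widely used solver but not the procedure of the original paper. It assumes centred data and fits no intercept. The toy design matrix is hypothetical and was given orthogonal columns so that the exact-zero behaviour is easy to see.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1.

    X is a list of n rows with p features; y and the columns of X are assumed
    centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # (1/n) * sum of squares of each column, used to rescale each update.
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                beta[j] = 0.0
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Hypothetical centred data with orthogonal columns: y = 2*x1 + 0.1*x2.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]

print(lasso_coordinate_descent(X, y, lam=0.0))  # close to least squares
print(lasso_coordinate_descent(X, y, lam=0.5))  # small coefficient set to exactly 0
```

With λ = 0 the result approximates ordinary least squares (about 2.0 and 0.1). With λ = 0.5 the first coefficient shrinks to about 1.5 and the second becomes exactly 0. This reflects the property the abstract highlights: the constraint tends to produce coefficients that are exactly zero.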
Book
Psychopathy is a very important concept for those working in the field of criminal justice - investigators, prosecutors, and those who have to evaluate, manage and treat offenders. In Psychopathy: Theory, Research and Implications for Society, detailed, empirically based contributions by the world's leading researchers describe the relevance of the construct to practical and policy issues, examining its relevance to such topics as treatment, risk management and recidivism. The use of the concept in a range of populations is discussed, including juveniles, children, and the mentally disordered, as well as across cultures. The major strength of the volume is that the validity of the psychopathy construct is enhanced by the extensive empirical support: contributors explore topics including the genetic, biological, affective, interpersonal and information processing models that underpin the disorder. Audience: All those dealing with offenders - psychologists, psychiatrists, lawyers, judges, prison administrators and those who formulate policy in the criminal justice system.