Chapter 18. Evaluating an Acoustic Gunshot Detection System in Cincinnati
In: The Study of Crime and Place: A Methods Handbook, edited by Elizabeth R. Groff and Cory P.
Haberman. Philadelphia: Temple University Press.
Professor of Criminal Justice Studies
Southern Illinois University Edwardsville
Peck Hall box 1455
Edwardsville, IL 62026
Acoustic Gunshot Detection Systems (AGDS) rely on sensors to detect and triangulate distinct
soundwaves produced by gunfire. Notifications can be sent to dispatch centers, Mobile Computer
Terminals, and even officer cell phones (see figure 18.1 below).
-Figure 18.1 about here-
Evaluation research on AGDS remains limited, with most studies focusing on the accuracy of incident
identification and response (Watkins et al., 2002; Choi, Librett & Collins, 2014), rather than AGDS’
efficacy in reducing violent crime (Lawrence, La Vigne & Thompson, 2019; Mares & Blackburn, 2012;
2020; Ratcliffe et al., 2019). Nonetheless, there may be multiple reasons why AGDS could reduce gun
violence. First, a faster and more accurate response could increase arrests and thus provide targeted
deterrence. Second, assuming AGDS increases police presence in gunfire hotspots, AGDS responses may
deliver a form of hotspot policing, providing a more general deterrent by increasing guardianship (Mares
& Blackburn, 2020). Available evidence on the efficacy of AGDS is minimal. Ratcliffe et al. (2019)
reviewed an acoustic system connected to cameras in Philadelphia and found no impact on crime.
Similar results are found by Mares and Blackburn (2012; 2020) for a ShotSpotter system in St. Louis.
Lawrence, La Vigne and Thompson (2019) are slightly more positive and suggest that the benefits of
AGDS may outweigh the cost.
AGDS is typically implemented across contiguous high gun crime areas, meaning that
Randomized Controlled Trials (RCTs) are often not feasible. The sample evaluation below therefore
details a place-based quasi-experimental approach that best reflects current agency implementation
practices.
Assumptions and requirements of evaluation
The presumption in this chapter is that you are conducting a retrospective quasi-experimental study as
treatment assignment is often non-random and evaluations typically occur well after deployment. In
such cases you: (1) assess the locations of AGDS treatment; (2) compare AGDS and traditional shots fired
data from an agency’s calls-for-service database (hereafter referred to as CAD); (3) build a Difference-in-
Difference (DID) model to examine changes in firearm-involved crime data. For the sake of brevity, we
assume that you know how to work with raw law enforcement data and have mastered basic data skills
such as recoding, aggregating and reorganizing data (see supplement for some tips).
Evaluations can be done independently or in collaboration with an agency. Either way you will
need calls for service data (CAD) or vendor AGDS notifications to determine the actual locations of
treatment. You also need Records Management System (RMS) data/incident data, ideally allowing you
to parse out incidents occurring with firearms. Finally, you want access to publicly available maps
(TIGER/Line street maps, for example) as well as census data (neighborhood, block or tract) in order to locate
treatment and control areas. In most cases you will use 2-3 software packages to complete the task
from beginning to end. First, you will have to condition/clean/prep your CAD/AGDS and RMS/incident
data so it can be read in both spatial and analytical software; you may use whatever software package
you are comfortable with. Second, you will need spatial software to geocode point data and spatially
join them to polygons; here we use ArcGIS Pro (v. 2.6) for this purpose. Finally, you need a statistical
package to analyze the data. Given that you are working with count data, you will most likely use a
mixed effects Poisson/negative binomial model, or in some cases, a simpler interrupted time-series
model. For the analytical portion we use Stata (v.16), but most functions can be implemented in other
Evaluating AGDS in Cincinnati, OH
For the current illustration we use data from Cincinnati. Cincinnati initially implemented ShotSpotter’s
AGDS system in late 2017 in neighborhoods northeast of downtown. Expansions occurred in August
2019 and May 2020. Cincinnati is an ideal candidate for evaluation because the city has a relatively new
AGDS system initially placed in a small number of high gun crime areas. The latter is important as it
means treatment areas do not cover all high gun crime areas, allowing you to find control areas without
AGDS. Cincinnati developed a robust Standard Operating Procedure (SOP) that guides the response to
AGDS incidents. The SOP stipulates, for example, that AGDS notifications must be responded to with the
highest priority and officers are to search a 100-foot area around the AGDS location and contact nearby
residents. Cincinnati’s CAD data includes separate codes for AGDS incidents (coded: “SPOTS”) and
regular shots fired incidents (coded: “SHOTS”) and its RMS data allows us to disaggregate gun-involved
crimes. This combined set of factors allows us to demonstrate an extensive evaluation of AGDS by
following these steps:
1. Get your data ready (and Geocode)
2. Determine actual AGDS coverage area
3. Compare shots fired against AGDS calls for service
4. Evaluate crime impact
Step 1. Getting data ready, some general comments
For this project we will use ‘cleaned’ crime incident and CAD incident level data. Crime incident data is
primarily useful for exploring the potential impact of AGDS on crime levels, whereas CAD data can be
used to measure reporting changes in shots fired and to determine the AGDS coverage area. To achieve
the best resolution, we use the included anonymized address data, recoding address numbers by
replacing “XX” with “50”. For example, we recoded “12XX W Main st” to “1250 W Main st”. This will
ensure our data points are roughly at the center of a street segment and accurate to within 1/2 of a
street segment. We do not recode “XX” with “00” because that will place incidents at the ends of street
segments and could potentially impact positional accuracy by a whole street segment length1.
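The recode described above can be scripted in any package. As one illustration, here is a minimal Python sketch; the function name and regular expression are ours, not part of the Cincinnati data pipeline:

```python
import re

def recode_anonymized_address(address: str) -> str:
    """Replace the anonymized 'XX' in a leading house number with '50',
    placing the point roughly at the center of its street segment."""
    return re.sub(r"^(\d+)XX\b", lambda m: m.group(1) + "50", address)

print(recode_anonymized_address("12XX W Main st"))  # 1250 W Main st
```

Addresses without an anonymized house number pass through unchanged, so the function can be applied to an entire address column.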
To facilitate analysis and data aggregation it is good practice during your data cleaning process
to create dummy codes for your crimes/incidents of interest. This data can later be aggregated to the
preferred space and time levels. The recoding itself can be done in any major software package (see
supplement for some additional tips).
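As a brief illustration of the dummy-coding step, a Python/pandas sketch follows; the column names are hypothetical and would differ in your own RMS extract:

```python
import pandas as pd

# Hypothetical incident-level extract; column names are illustrative only.
incidents = pd.DataFrame({
    "offense": ["AGG ASSAULT", "ROBBERY", "AGG ASSAULT", "THEFT"],
    "firearm": [True, False, True, False],
})

# Dummy-code the incidents of interest; the dummies can later be summed
# when aggregating to the preferred space and time levels.
incidents["gunaggass"] = ((incidents["offense"] == "AGG ASSAULT")
                          & incidents["firearm"]).astype(int)
print(incidents["gunaggass"].tolist())  # [1, 0, 1, 0]
```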
Step 2. Determine AGDS coverage areas – Use file cadshots.csv in ArcGIS
Determining the coverage area of AGDS can be problematic. Do not rely on formal coverage maps.
Environmental factors (weather, topography and architecture) can greatly impact the accuracy of such
maps; therefore, it is always a good idea to verify the exact coverage. Knowing where these incidents occur will
assist you in deciding on the spatial and temporal level of aggregation.
Mapping acoustic incident data should occur once you have cleaned your data and created dummy
codes for your incidents of interest. Acoustic data can be derived from two sources: (1) data collected
directly from a vendor (i.e. ShotSpotter) or (2) call for service data (i.e. CAD). AGDS calls for service data
are typically mixed in with other calls and will often provide an indication of response times, which
we will use in the next step. Geocode CAD data in ArcGIS using the provided address locator (see
supplement for more detailed instructions). If you do this with another agency’s data, parse your file
down to only the incidents you want as geocoding an entire multiyear CAD file may take a while.
-Figure 18.2 about here-
ShotSpotter incidents in Cincinnati are concentrated in a few areas. You could manually draw a polygon
around mapped incidents, but this will make it difficult to establish control neighborhoods by using
demographic and socio-economic indicators. The supplement includes layers for census tracts and
neighborhoods (SNAs). By toggling those layers on and off you can manually ‘eyeball’ the best fit. Census
tracts appear to fit most efficiently around ShotSpotter areas. Our focus here is on ensuring a
substantial number of residential streets fall within AGDS coverage areas (admittedly this can be
subjective). Especially if you are working in partnership with an agency it may help to consult some of
their experts. Next, create a list of the census tracts and their approximate coverage of the census tract
as in table 1 below; you will use this list to create/recode binary variables indicating if a specific census
tract belongs to the treatment group(s).
-Table 18.1 about here-
Table 18.1 above shows how AGDS incidents fit in census tracts. For some agencies, the process may be
easier. Chicago, for instance, implemented AGDS by police districts. Make sure to spatially join the
census tracts to the geocoded CAD data (both SHOTS and SPOTS) and create a new variable indicating if
a census tract belongs to a treatment or control location; this can be done in ArcGIS or any other
software package. Finally, repeat geocoding for your RMS/incident data and spatially join them to
census tracts as well (you will need these in step 4).
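The treatment recode itself is a simple membership check. A Python sketch follows, using hypothetical tract IDs; the actual list would come from your own version of table 18.1:

```python
import pandas as pd

# Hypothetical treatment tract IDs; substitute the tracts you
# identified when building your version of table 18.1.
treatment_2017 = {"026500", "026800", "027100"}

tracts = pd.DataFrame({"tract": ["026500", "026800", "027100", "030200"]})
# Binary indicator: 1 if the tract belongs to the 2017 treatment group
tracts["sssaug2017"] = tracts["tract"].isin(treatment_2017).astype(int)
print(tracts["sssaug2017"].tolist())  # [1, 1, 1, 0]
```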
Step 3. Response time evaluation - use file cadshots.dta in Stata
You can now explore how response times may differ between ‘shots fired’ calls by residents (SHOTS) and
those produced by AGDS (SPOTS). In the case of Cincinnati, the SOP dictates that police are to respond
faster and do a more extensive search for AGDS incidents. The analysis can be done easily in any major
statistical package.
Here we only focus on the active 2017 treatment areas (“sssaug2017” identifies these cases in
the cadshots.dta data set). In addition, you must select on the treatment time periods (from August
2017). First, we examine by call type the descriptives for the four time variables of interest:
“timetodispatch” (how long it takes between receiving a call and a police unit starting to respond),
“disptoarrive” (how long the unit takes to arrive on scene), “investigative” (how long the unit spent
on the ground) and “totaltime”.
Response time data tend to be skewed and/or kurtotic, as a few calls take a disproportionately
large or short amount of time. Such non-normal properties may be an outcome of reporting errors, for
example, an officer forgetting to close out a call for service. It is, however, reasonable to assume that
such reporting oversights are random across different types of calls for service. There are numerous
tests that can be used to assess normality; here we choose two (skew/kurtosis and Shapiro-Wilk) and
find that all variables exhibit significant non-normal features. Independent t-tests are a less
desirable choice as they require limited skew and kurtosis. To formally test response time differences
between AGDS and shots fired calls we thus perform a non-parametric Mann-Whitney U test/Wilcoxon
Rank Sum Test. Stata code for the 2017 areas looks as follows:
ranksum disptoarrive if (year>=2017 & month>=8 | year>=2018) & sssaug2017==1, by(shotspotter)
This syntax instructs the software to examine the time from dispatch to arrival (effectively, how long
officers took to get to the scene once they received dispatch information) for the experimental period
(“year>=2017 & month>=8” for the part of 2017, or “| year>=2018” for the remainder) and cases
occurring in the 2017 treatment area (“sssaug2017==1”) using the AGDS binary as a cut factor
(“by(shotspotter)”). Please note that this only works with two types of calls for service in the data (in
this case SHOTS and SPOTS); if you create a data set with more call types, you will have to add additional
selection criteria.
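Outside Stata, the same normality screen and rank-sum comparison can be sketched in Python with SciPy. The data below are simulated skewed durations, not the Cincinnati response times:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated skewed response times in minutes (lognormal, like many
# duration measures); real values would come from the CAD data.
shots = rng.lognormal(mean=1.5, sigma=0.6, size=200)  # citizen calls
spots = rng.lognormal(mean=1.2, sigma=0.6, size=200)  # AGDS calls

# Shapiro-Wilk rejects normality for skewed duration data...
_, p_norm = stats.shapiro(shots)
# ...so compare groups with the non-parametric Mann-Whitney U /
# Wilcoxon rank-sum test instead of an independent t-test.
_, p_diff = stats.mannwhitneyu(spots, shots, alternative="two-sided")
print(f"Shapiro-Wilk p = {p_norm:.4f}; Mann-Whitney p = {p_diff:.4f}")
```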
-Table 18.2 about here-
Once done for all four variables, table 18.2 shows that AGDS dispatches occur significantly faster in
the 2017 treatment areas. While travel time is marginally shorter, investigative time is significantly
longer for AGDS calls, bumping up total time as well. The faster response time (combining dispatch and
travel) could thus serve to increase deterrence and raise levels of arrests, but this conclusion is, of
course, speculative. The additional time officers spend in AGDS incident locations may also be a positive
for the potential impact of AGDS, increasing the time officers are visible by about 6 minutes over
traditional calls. In effect, we may conclude that the implementation of AGDS achieves a decrease in
response time compared to traditional shots fired calls, while increasing the time officers are visible in
gunshot hotspots. These results indicate that officers appear to have followed the SOP and show that
implementation has been at least somewhat successful.
It is important to note, however, that these results only compare AGDS against shots fired since
the experiment began; it is possible that as part of a larger strategy to reduce gun violence, officers may
also have been instructed to respond faster to all shots fired calls for service, so it is important to discuss
with the agency if any other changes occurred alongside AGDS deployment.
Step 4. Crime impact evaluation, a Difference-In-Difference approach - use file impacteval.dta in Stata
By aggregating both crime and CAD incident counts by month and census tract and merging the data, we
create a panel time-series dataset (see supplement for additional guidance on aggregation and selecting
control neighborhoods). The file “impacteval.dta” was created by aggregating geocoded CAD and crime
incident data between October 2014 and February 2020. Statistical power tests typically assume
randomization as well as independence of observations; both are likely violated here: AGDS
locations are typically spatially connected, and temporal autocorrelation is commonly present in crime
data. For cross-sectional studies, a more general guideline is to have about 30 independent treatment
and control locations each (see Rosenfeld, Deckard & Blackburn, 2014). This bar is difficult to achieve in
AGDS experiments as treatment locations are often spatially connected, meaning a more granular
spatial scale does not create more independent observations. In cases with a low and/or dependent
sample size, such as our current data, researchers should look beyond p-levels and use alternate ways to
tease out whether results are meaningful and robust (examples are provided in the supplement).
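The aggregation step itself can be done in any package. As a hedged illustration, a Python/pandas sketch on hypothetical records (the column names are ours):

```python
import pandas as pd

# Hypothetical geocoded CAD records after the spatial join to tracts.
cad = pd.DataFrame({
    "tract": ["026500", "026500", "026500", "027100"],
    "date": pd.to_datetime(["2018-01-03", "2018-01-20",
                            "2018-02-05", "2018-01-11"]),
    "shots_dummy": [1, 1, 1, 1],
})

# Collapse incident-level dummies into a tract-by-month panel, the
# unit of analysis for the DID models.
panel = (cad.assign(month=cad["date"].dt.to_period("M"))
            .groupby(["tract", "month"], as_index=False)["shots_dummy"]
            .sum()
            .rename(columns={"shots_dummy": "shots"}))
print(panel["shots"].tolist())  # [2, 1, 1]
```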
Typically, small spatial and temporal (month x census tract) units have a Poisson or negative
binomial distribution, meaning you will use count models for your analysis (Allison, 2009). Compounding
matters, you may want to use a variety of different standard errors to account
for heteroskedasticity and/or autocorrelation. While numerous software packages can handle
Difference-in-Difference (DID) models, not all can handle violations of the assumptions of independence.
One of the key choices in impact evaluations is to select the most appropriate model design.
Generally, fixed-effects, random-effects and mixed-effects models should all be examined for their fit.
The preference for model choice should ultimately be based on Likelihood Ratio tests. In our
experience, those results favor mixed-effects models with random effects for the spatial unit (tract) and
fixed effects for the independent/control factors. The comparison of different modeling strategies,
however, serves the additional purpose of assessing the robustness of our results. Essentially, if your results are
stable across approaches (different distributional and error specifications and different types of models),
this may lend credibility to findings below what some would say are arbitrary p-thresholds to begin with
(Wasserstein, Schirm & Lazar, 2019).
Let us run through a few models while assuming that mixed-effects models are best and using
the multilevel “menbreg” command in Stata. As a general practice it is advisable to include fixed effects
for year and month, especially if you do not use the exact same number of pre- and post-observations
(which may erase the need for a seasonality component)2. You additionally may want to include factor
variables that denote the number of weekend days in a month, the total number of days in a month (if
your data cover leap years, or if you do not include month fixed-effects), and holidays. If you really want
to press for pre- and post-experimental equivalency you can additionally include weather data (temperature
and precipitation) as well as policing factors (here we use directed patrols as an additional control).
Effectively, we examine two hypotheses to explore the efficacy of AGDS: (1) AGDS reduces
random gunfire (as measured by citizen reported gunshots) and (2) AGDS reduces gun-involved crimes
(as measured by aggravated assault with firearm).
We first examine the likely impact of AGDS on reported shots fired, to assess whether AGDS may be
able to reduce random gunfire, using the code:
menbreg shots i.year i.month i.dotm i.weekendday dirpatrol PRCP TMAX i.full2017##i.exptime2017 if
(full2017==1 | controlfull2017==1)& (timeselector>=56 & timeselector<=104 )|| tract:, irr
The above command runs a negative binomial mixed-effect model (menbreg) in which we assign the
census tract as a random effect (“|| tract:”) and all others as fixed effects.
• The dependent variable in this model “shots” is a count of the number of shots fired reported by
citizens by census tract, by month.
• We use binary controls for year and month, the number of days in a month and the number of
weekend days in a month. In Stata, the “i.” specification designates the variable is a
dummy/factor variable and the software will factorize the variables automatically.
• We additionally control for the level of directed patrols, as well as precipitation (PRCP) and
maximum temperature (TMAX).
• The interaction specification “##” runs both the experimental dummies separately and as an
interaction effect (running “i.full2017 i.exptime2017 i.full2017#i.exptime2017” produces the
same result). The double hashtag method produces a ‘Difference-in-Difference’ coefficient that
allows inference on the likely impact of the treatment controlling for prior trends and control
groups at the same time (see Abadie, 2005 for a primer on DID estimation).
• In the “if” statement we restrict analysis to data for areas that are part of the experimental
measurement (treatment and control tracts). In addition, the statement restricts the time
period to two years before and after implementation using a sequential month counter
(“timeselector”).
• The “irr” specification at the end returns coefficients as incidence-rate ratios, allowing the
interpretation to approximate percent change in the base rate.
-Table 18.3 about here-
A few things are key to examine in your output.
• At the bottom of the output table (see table 18.3 above) you find the “LR test vs. nbinomial
model”, indicating whether the mixed-effects model offers an improvement over the fixed-effects-only
model; results indicate significant improvement (p < .001).
• The log likelihood at the top of the model, together with the model’s significance (Prob > chi2),
indicates whether there is significant improvement over a constant-only model, that is, whether the
independent/control variables improve the overall prediction. Please note that when specifying
alternative standard error forms in Stata (i.e. robust), the LR test may not display; in such cases you
are encouraged to additionally run normal standard error models to determine if the mixed-effects
models are necessary. You additionally may want to formally compare Poisson to negative binomial
models (see supplement for code examples) to determine which is most appropriate.
The results of our coefficients indicate no significant difference between control and treatment sites
(i.full2017, p=.320) and no significant difference between pre- and post-experimental periods for the
entire sample (i.exptime2017, p=.154), but do show a significant difference for the experimental sites
since the start of the treatment, compared to control areas over the same period
(i.full2017#i.exptime2017, p < .001). What this DID coefficient tells us is that reports of shots fired have
decreased by approximately 45% (1 - .5469), controlling for before/after effects as well as control
sites, and that this finding is significant (p < .001). As prior work has indicated (Mares & Blackburn, 2020)
a reduction in shots fired is not necessarily evidence that gunfire is reduced, rather reporting by citizens
may simply be replaced by AGDS notifications. Just the same, a reduction of about 45% is a substantial
change and indicates at the very least a marked change in reporting behavior since implementation.
If you want to know how much the overall (AGDS plus shots fired) response to gunshots has
changed, you can replace the dependent variable “shots” with variable “gunfire”. This might be another
useful indicator of measuring the attention AGDS tracts receive from law enforcement and can
reasonably be regarded as a proxy for additional hotspot patrols.
-Table 18.4 about here-
Arguably, the clearest evidence for treatment effects is if violent crimes committed with a
firearm are reduced. Replacing dependent variable “shots” with dependent variable “gunaggass” (sum
of all gun-involved aggravated assaults) gives a more valid picture of crime changes in the AGDS census
tracts (see table 18.4 above). A likelihood ratio test indicates that this is most accurately modeled using
a Poisson model. Here we observe that serious gun violence is about 46% lower compared to the pre-
treatment period and compared to control tracts simultaneously (p=.001).
To explore if this result is likely a valid indicator of the impact of AGDS implementation we
perform a battery of additional robustness tests (see supplement) and explore, for example, if the
results can be linked to any one single census tract by dropping each census tract individually from the
analysis (this is especially important given the small number of tracts in the analysis). We also must
examine different model specifications and error structures. Virtually all alternative specifications
present substantially similar results, giving us greater confidence in the finding that the implementation
of AGDS in Cincinnati coincides with a reduction in aggravated assaults with firearms. Results favor
treatment impacts in aggravated assaults with firearms, but no impact in other crime categories
(assaults without firearms, robberies, etc.). In sum, the reductions in assaults with firearms appear
unique to this category and hold under a variety of modeling constraints and sensitivity checks.
Limitations and conclusions
The described evaluation approach is not a true RCT, but rather an example of a quasi-experimental
place-based evaluation, which means it suffers from some validity issues. The way most AGDS are
implemented means that quasi-experimental designs are the best-case scenario. Additionally, it is unknown if
the presence of AGDS may have led to a differential treatment of experimental neighborhoods.
Certainly, the presence of new technology may have inspired police officers to take a broader proactive
approach to gun violence, and while the models control for directed patrols, this is difficult to measure
entirely. This may be compounded by the spatial clustering of treatment areas as these share
overlapping features (police activity, economic and social integration). In addition, variations of the
models, including distributional and error assumptions, show substantively similar results, indicating
robustness against model violations and giving greater confidence in the key results (King & Roberts, 2015).
Nonetheless, problems remain. Assignment of treatment is not random, which could lead to
‘stacking the deck’ if surging neighborhoods are prioritized in treatment. Additionally, treatment
typically does not cease, meaning there are no true post-experimental observations. Furthermore, as
AGDS expands, it often absorbs former control sites, creating additional measurement issues. It is also
important to note that the findings capture both the technology and the police response to AGDS
incidents, meaning observed effects could stem from either the technology or the specific police
response to these incidents.
The evaluation of AGDS experiments is quite involved and requires the combination of multiple
data sources, software packages and skills. Delineating the covered areas and finding equivalent control
areas are critical steps in developing a data set that allows you to measure the likely impact AGDS has on
crime levels, but there is some subjectivity in how to accomplish this. Once you have a data set your
next step is to model multiple variations to explore the likely impact. Given the particular, non-
randomized, nature of typical AGDS implementations (but see Ratcliffe et al. 2019) and the limited
availability of formal evaluations, little guidance is currently available as to expectations of typical
treatment effects. Whereas similar research in St. Louis (Mares & Blackburn, 2020) found no impact on
crime levels, the current example shows some more encouraging initial results. It is difficult to say why
such differences occur until more systematic research becomes available, allowing for comparisons of the
implementation process across agencies.
(1). Incident data often include geographic indicators. Latitude/longitude data you collect directly from
agencies are likely accurate, but if you use ‘Open Data’, location data are often randomized. In
the case of Cincinnati, visual inspection of the plotted CAD data reveals a block randomization pattern
that appears to span several streets, substantially reducing positional accuracy. This can be seen in point
densities around high-volume addresses and in the random distances of points from street segments.
(2). There are several potential benefits to using time binaries in the analysis. First, they may reduce
autocorrelation; you can check this by running a linear model that includes a lag term. If the lag term is
significant before the dummies are included but not after, that is a good sign autocorrelation is not a
major concern with the dummies included; you may also explore the Durbin-Watson statistic, but this is
perhaps more questionable given the nature of count data. Second, especially in shorter experiments,
temporal dummies increase pre/post equivalency.
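The lag-term check in footnote 2 can be sketched in Python on simulated seasonal counts; the data and model here are illustrative only:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
# Simulated monthly counts with a seasonal cycle, which induces
# apparent autocorrelation in the raw series.
t = np.arange(120)
df = pd.DataFrame({"y": rng.poisson(6 + 3 * np.sin(2 * np.pi * t / 12)),
                   "month": t % 12})
df["y_lag"] = df["y"].shift(1)
df = df.dropna()

# Lag term alone: significant, because it soaks up the seasonal cycle
p_without = smf.ols("y ~ y_lag", data=df).fit().pvalues["y_lag"]
# Lag term plus month dummies: weakens once seasonality is modeled
p_with = smf.ols("y ~ y_lag + C(month)", data=df).fit().pvalues["y_lag"]
print(f"lag-term p without dummies: {p_without:.4f}, with: {p_with:.4f}")
```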
Abadie, A. (2005). Semiparametric difference-in-differences estimators. The Review of Economic Studies,
72, 1–19.
Allison, P. D. (2009). Fixed effects regression models. Thousand Oaks: Sage.
Lawrence, D. S., La Vigne, N. G. & Thompson, P. S. (2019). Evaluation of gunshot detection technology to
aid in the reduction of firearms violence. National Institute of Justice, NCJ 254283.
King, G., & Roberts, M. E. (2015). How robust standard errors expose methodological problems they do
not fix, and what to do about it. Political Analysis, 23, 159–179.
Mares, D., & Blackburn, E. (2012). Evaluating the effectiveness of an acoustic gunshot location system in
St. Louis, MO. Policing: A Journal of Policy and Practice, 6, 26–42.
Mares, D., & Blackburn, E. (2020). Acoustic gunshot detection systems: a quasi-experimental evaluation in
St. Louis, MO. Journal of Experimental Criminology, DOI:10.1007/s11292-019-09405-x.
Ratcliffe, J. H., Lattanzio, M., Kikuchi, G., & Thomas, K. (2019). A partially randomized field experiment
on the effect of an acoustic gunshot detection system on police incident reports. Journal of Experimental
Criminology, 15, 67–76.
Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond “p < 0.05”. The
American Statistician, 73, 1–19. DOI: 10.1080/00031305.2019.1583913.
Rosenfeld, R., Deckard, M., & Blackburn, E. (2014). The effects of directed patrol and self-initiated
enforcement on firearm violence. Criminology, 52, 428–449.
Watkins, C., Mazerolle, L. G., Rogan, D., & Frank, J. (2002). Technological approaches to controlling
random gunfire: results of a gunshot detection system field test. Policing, 25, 345–370.
FIGURES AND TABLES
Figure 18.2 ShotSpotter incidents and Census Tracts in the 2017 Treatment area.
Table 18.1 ShotSpotter census tracts (2017 treatment areas), grouped by approximate AGDS coverage of
the tract (~2/3 and up; ~1/4 to ~2/3). Year of coverage in brackets.
Table 18.2. Descriptives and Mann-Whitney U-test for Time spent on AGDS and Shots fired. Mean time in
minutes, standard deviations in parentheses. *** = p<.001; n.s.= not significant
Table 18.3 Stata output for “Shots Fired” changes in 2017 experimental tracts (time binaries excluded for
brevity).
Mixed-effects nbinomial regression. Number of obs = 784; Group variable: tract; Number of groups = 16;
Integration method: mvaghermite; Integration pts. = 7; Wald chi2(24) = 121.45; Log likelihood =
-1726.0039; Prob > chi2 = 0.0000.
Notes: Estimates are transformed only in the first equation. _cons estimates baseline incidence rate
(conditional on zero random effects). LR test vs. nbinomial model: chibar2(01) = 255.20, Prob >=
chibar2 = 0.0000.
Table 18.4 Stata output for “Aggravated Assaults with Firearm” (time binaries and model diagnostics
excluded for brevity).