Bayesian Hierarchical Spatial Models: Implementing the Besag York Mollié Model in Stan

Mitzi Morris (a), Katherine Wheeler-Martin (b), Dan Simpson (c), Stephen J. Mooney (d), Andrew Gelman (e), Charles DiMaggio (b,*)
(a) Institute for Social and Economic Research and Policy, Columbia University, New York, NY, USA.
(b) Department of Surgery, New York University School of Medicine, New York, NY, USA.
(c) Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.
(d) Department of Epidemiology, University of Washington, Seattle, WA, USA.
(e) Department of Statistics, Columbia University, New York, NY, USA.
Abstract
This report presents a new implementation of the Besag-York-Mollié (BYM) model in Stan, a
probabilistic programming platform which does full Bayesian inference using Hamiltonian Monte
Carlo (HMC).
We review the spatial auto-correlation models used for areal data and disease risk mapping, and
describe the corresponding Stan implementations. We also present a case study using Stan to fit a
BYM model for motor vehicle crashes injuring school-age pedestrians in New York City from
2005–2014 localized to census tracts.
Stan efficiently fit our multivariable BYM model having a large number of observations (n=2095
census tracts) with small outcome counts < 10 in the majority of tracts. Our findings reinforced
that neighborhood income and social fragmentation are significant correlates of school-age
pedestrian injuries. We also observed that nationally-available census tract estimates of commuting
methods may serve as a useful indicator of underlying pedestrian densities.
Keywords
Bayesian inference; Intrinsic Conditional Auto-Regressive model; Besag-York-Mollié model;
probabilistic programming; Stan; pedestrian injuries
*Corresponding Author.
Mitzi Morris: Methodology, Software, Validation, Formal Analysis, Visualization, Writing – Original Draft, Writing – Review and Editing.
Katherine Wheeler-Martin: Data Curation, Formal Analysis, Visualization, Writing – Original Draft, Writing – Review and Editing.
Daniel Simpson: Methodology, Software, Writing – Review and Editing.
Stephen Mooney: Methodology, Writing – Review and Editing.
Andrew Gelman: Conceptualization, Methodology, Writing – Review and Editing.
Charles DiMaggio: Conceptualization, Funding Acquisition, Methodology, Supervision.
Published in final edited form as: Spat Spatiotemporal Epidemiol. 2019 November; 31: 100301. doi:10.1016/j.sste.2019.100301.
HHS Public Access author manuscript; available in PMC 2020 November 01.
Introduction
Spatial auto-correlation is the tendency for adjacent areas to share similar characteristics.
Conditional Auto-Regressive (CAR) and Intrinsic Conditional Auto-Regressive (ICAR)
models, first introduced by Besag (1974), account for this by pooling information from
neighboring regions. The BYM model (Besag, York, and Mollié 1991) is a lognormal
Poisson model which includes both an ICAR component for spatial auto-correlation and an
ordinary random-effects component for non-spatial heterogeneity. Because either component
of the BYM model can account for most or all of the individual-level variance, it is difficult
to fit using MCMC methods. In this report we present an implementation of the BYM2
model (Riebler et al. 2016), a reparameterization of the BYM model, in Stan, a probabilistic
programming platform which does full Bayesian inference using Hamiltonian Monte Carlo
(HMC). For models with complex posteriors, such as the BYM model, Stan’s No-U-Turn
Sampler (NUTS) provides better and more robust estimates than samplers which use
Gibbs or Metropolis algorithms.
Part one of the paper reviews spatial modeling concepts and introduces the Stan language,
tools, and workflow. First, in section Models, we review the specification of the CAR and
ICAR models and show why it is much faster to compute the log probability density of the
ICAR model instead of the CAR model. Then we review the original formulation of the
BYM model and present the BYM2 model, a reparameterization of the BYM model where
all parameters have clear interpretations and the choice of hyperpriors is straightforward.
The Stan Programs section is an introduction to both the Stan language and the R
package rstan. As a first Stan program, we implement the ICAR model. The expressive
power of the Stan language allows for a straightforward translation from the mathematical
model to a Stan program. Using Stan’s vectorized operations, the joint specification of the
ICAR model in Stan corresponds directly to its mathematical formulation over the pairwise
differences between neighboring regions. To validate this model, we fit the areal map over
2095 New York City census tracts with rstan and use the R package ggplot2 to show how the
model recovers the spatial structure present in the data. The second Stan program
implements the BYM2 model.
Finally, in the Case Study section, we present a full, substantive example of a Stan spatial
analytic model using the BYM2 model to fit New York City motor vehicle crash data. The
study aims to map the geographic distribution of school-age pedestrian injuries at the census
tract level from 2005–2014, as well as explore sociodemographic factors associated with
their occurrence at the community level.
Models
Conditional Autoregressive Models
Areal data consists of a finite set of regions with well-defined boundaries, each of which has
a single measurement aggregated from its population. Counts of rare events in small-
population regions are noisy; removing this noise allows the underlying phenomena of
interest to be seen more clearly. Conditional autoregressive (CAR) models smooth noisy estimates by pooling information from neighboring regions. Given a set of $N$ regions, the binary neighbor relationship (written $i \sim j$ where $i \neq j$) is 1 if regions $n_i$ and $n_j$ are neighbors and is otherwise 0. For CAR models, the neighbor relationship is symmetric but not reflexive; if $i \sim j$ then $j \sim i$, but a region is not its own neighbor.
Spatial interactions between pairs of units $i$ and $j$ can be modeled conditionally as a normal random variable $\phi$, which is an $N$-length vector $\phi = (\phi_1, \ldots, \phi_n)^T$. In the full conditional distribution, each $\phi_i$ is conditional on the sum of the weighted values of its neighbors ($w_{ij}\phi_j$) and has unknown variance $\sigma^2$:
$$\phi_i \mid \phi_j, j \neq i \sim N\left(\sum_{j=1}^{n} w_{ij}\phi_j,\; \sigma^2\right).$$
Specification of the global, or joint, distribution via the local specification of the conditional distributions of the individual random variables defines a Gaussian Markov random field (GMRF). Besag (1974) proved that the corresponding joint specification of $\phi$ is a multivariate normal random variable centered at 0. The variance of $\phi$ is specified as a precision matrix $Q$, which is simply the inverse of the covariance matrix $\Sigma$, i.e., $\Sigma = Q^{-1}$, so that
$$\phi \sim N(0, Q^{-1}).$$
For the multivariate normal random variable $\phi$, the precision matrix $Q$ is constructed from two matrices which describe the neighborhood structure of the $N$ regions: the diagonal matrix $D$ and the adjacency matrix $A$. The diagonal matrix is an $N \times N$ matrix where each diagonal entry $n_{ii}$ contains the number of neighbors of region $n_i$ and all off-diagonal entries are zero. The adjacency matrix is an $N \times N$ matrix where entry $n_{ij}$ is 1 if regions $n_i$ and $n_j$ are neighbors and 0 otherwise, and all diagonal entries $n_{ii}$ are zero.
The adjacency matrix encodes the neighborhood graph. If any region in the map can be reached from any other region via a series of neighboring regions, then the map is a single, fully connected component. The number of components of a neighborhood graph ranges from 1 to $N$; the latter occurs when all regions are islands. To see how this works, we construct a simple example using a map over 4 regions ($n_1, n_2, n_3, n_4$) consisting of a single component with neighbor relations $1 \sim 2$, $2 \sim 3$, $3 \sim 4$.
The adjacency matrix A is:
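$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
with rows and columns ordered $n_1, n_2, n_3, n_4$.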
The diagonal matrix D is:
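$$D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
with rows and columns ordered $n_1, n_2, n_3, n_4$.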
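Both matrices can be built directly from the list of neighbor pairs. The following base R sketch does this for the 4-region example; the variable names (`node1`, `node2`, `A`, `D`) are chosen for illustration only.

```r
# Neighbor pairs of the 4-region example: 1 ~ 2, 2 ~ 3, 3 ~ 4
N <- 4
node1 <- c(1, 2, 3)
node2 <- c(2, 3, 4)

# Adjacency matrix A: 1 where two regions are neighbors, 0 elsewhere.
# The relationship is symmetric, so each pair is entered in both directions.
A <- matrix(0, nrow = N, ncol = N)
A[cbind(node1, node2)] <- 1
A[cbind(node2, node1)] <- 1

# Diagonal matrix D: number of neighbors of each region on the diagonal
D <- diag(rowSums(A))

print(A)
print(D)
```

Indexing with a two-column matrix (`cbind(node1, node2)`) sets one entry per neighbor pair, so no explicit loop over pairs is needed.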
To give the multivariate normal random variable $\phi$ a proper joint probability density, the precision matrix $Q$ must be symmetric and positive definite. For the CAR model, $Q$ is defined as
$$Q = D(I - \alpha D^{-1} A) = D - \alpha A$$
where $I$ is the identity matrix and $0 < \alpha < 1$. The term $\alpha$ is the CAR model parameter which controls the amount of spatial dependence, where $\alpha = 0$ implies spatial independence. Scaling $A$ by $\alpha$ makes the quantity $D - \alpha A$ positive definite. Because the neighbor relationship $i \sim j$ is symmetric by definition for CAR models, both $A$ and $Q$ are symmetric. For the above example, when $\alpha = 0.5$, $D - \alpha A$ is:
$$\begin{pmatrix} 1 & -0.5 & 0 & 0 \\ -0.5 & 2 & -0.5 & 0 \\ 0 & -0.5 & 2 & -0.5 \\ 0 & 0 & -0.5 & 1 \end{pmatrix}$$
with rows and columns ordered $n_1, n_2, n_3, n_4$.
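As a numerical check on the example, the base R sketch below forms $D - \alpha A$ for $\alpha = 0.5$, which reproduces the matrix shown above, and confirms that it is symmetric with strictly positive eigenvalues, i.e., positive definite. Variable names are illustrative.

```r
# Adjacency matrix of the 4-region example (1 ~ 2, 2 ~ 3, 3 ~ 4)
A <- rbind(c(0, 1, 0, 0),
           c(1, 0, 1, 0),
           c(0, 1, 0, 1),
           c(0, 0, 1, 0))
D <- diag(rowSums(A))  # number of neighbors on the diagonal

alpha <- 0.5
Q <- D - alpha * A     # CAR precision matrix for this alpha

print(Q)
print(isSymmetric(Q))

# A symmetric matrix is positive definite when all eigenvalues are > 0
eigenvalues <- eigen(Q, symmetric = TRUE)$values
print(eigenvalues)
```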
The log probability density of $\phi$ is, up to an additive constant,
$$\frac{n}{2}\log(\det(Q)) - \frac{1}{2}\phi^T Q \phi$$
where $n$ is the number of components in the neighborhood graph. Computing the determinant of $Q$ requires on the order of $N^3$ operations; e.g., when $N = 100$, $\det(Q)$ takes about a million operations, and when $N = 1000$ it takes about a billion operations. For a large number of regions $N$, this is computationally expensive for an MCMC sampler, as the sampler recomputes the probability density of $\phi$ for each new proposal.
Intrinsic Conditional Autoregressive Models
The intrinsic conditional autoregressive (ICAR) model sets $\alpha$ to 1, effectively eliminating $\alpha$ from the model, so that the quantity $D - \alpha A$ simplifies to $D - A$. For the above example, $D - A$ is
$$\begin{pmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{pmatrix}$$
with rows and columns ordered $n_1, n_2, n_3, n_4$.
Now the value of the determinant of $Q$ is 0. The ICAR prior is improper, but the posterior is proper once data are included.
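The zero determinant can be verified for the example: every row of $D - A$ sums to zero, so the constant vector lies in the null space of $Q$ and the matrix cannot be inverted. A base R sketch with illustrative names:

```r
# Adjacency matrix of the 4-region example (1 ~ 2, 2 ~ 3, 3 ~ 4)
A <- rbind(c(0, 1, 0, 0),
           c(1, 0, 1, 0),
           c(0, 1, 0, 1),
           c(0, 0, 1, 0))
Q <- diag(rowSums(A)) - A   # ICAR precision matrix D - A

# Each row sums to zero, so Q %*% rep(1, 4) is the zero vector
row_sums <- rowSums(Q)
print(row_sums)

# The determinant is zero (up to floating point error)
det_Q <- det(Q)
print(det_Q)
```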
MCMC samplers compute the log probability up to an additive constant (equivalently, the density up to a proportionality constant). When computing the log probability density of the ICAR model, the term $\frac{n}{2}\log(\det(Q))$ is constant and therefore drops out of the calculation. This reduces the number of operations needed to compute the log density from order $N^3$ to order $N^2$, making it possible to fit datasets for large areal maps with an MCMC sampler running on a modern laptop computer in only a few hours, instead of many days.
In the ICAR model, each $\phi_i$ is normally distributed with a mean equal to the average of its neighbors. Its variance decreases as the number of neighbors, denoted $d_i$, increases. The conditional specification of the ICAR model is:
$$p(\phi_i \mid \phi_j,\, j \sim i) = N\left(\frac{\sum_{j \sim i} \phi_j}{d_i},\; \frac{\sigma_i^2}{d_i}\right)$$
where $\sigma_i^2$ is the unknown variance.
The joint specification of the ICAR random vector $\phi$ centered at 0 with common variance 1 rewrites to the pairwise difference formulation:
$$p(\phi) \propto \exp\left\{-\frac{1}{2}\sum_{i \sim j}(\phi_i - \phi_j)^2\right\}.$$
Writing the joint density as the pairwise difference makes it easy to reason about the behavior of this model: each $(\phi_i - \phi_j)^2$ contributes a penalty term based on the distance between the values of neighboring regions; minimizing this term results in spatial smoothing. The pairwise difference is non-identifiable; any constant added to $\phi$ washes out of the term $\phi_i - \phi_j$. Adding the constraint $\sum_{i=1}^{N}\phi_i = 0$ centers this model. With this constraint the log probability density is defined because the domain of integration is restricted to the set of parameters summing to 0.
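Two properties of the pairwise difference form can be checked numerically on the 4-region example: the sum of squared neighbor differences equals the quadratic form $\phi^T (D - A) \phi$, and adding a constant to every element of $\phi$ leaves it unchanged. The base R sketch below uses an arbitrary illustrative vector `phi`; all names and values are assumptions made for the example.

```r
# Edges of the 4-region example, one entry per neighbor pair with i < j
node1 <- c(1, 2, 3)
node2 <- c(2, 3, 4)

A <- rbind(c(0, 1, 0, 0),
           c(1, 0, 1, 0),
           c(0, 1, 0, 1),
           c(0, 0, 1, 0))
Q <- diag(rowSums(A)) - A          # ICAR precision matrix D - A

phi <- c(0.3, -0.1, 0.4, -0.6)     # illustrative values that sum to zero

# Sum over neighbor pairs of squared differences
pairwise <- sum((phi[node1] - phi[node2])^2)

# The same quantity written as a quadratic form in Q
quad_form <- as.numeric(t(phi) %*% Q %*% phi)

# Shifting every element by a constant does not change the penalty
shifted <- phi + 10
pairwise_shifted <- sum((shifted[node1] - shifted[node2])^2)

print(c(pairwise, quad_form, pairwise_shifted))
```

The shift invariance is the non-identifiability described above; the sum-to-zero constraint selects one representative from each family of shifted vectors.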
The Besag-York-Mollié Model
The BYM model is a lognormal Poisson model developed for disease risk mapping which includes both an ICAR component for spatial smoothing and an ordinary random effects component for non-spatial heterogeneity. The Poisson regression is used to estimate the unknown log relative risk $\eta_i$ for zone $i$ ($i = 1, 2, \ldots, n$), given $y_i$, the observed number of cases. The BYM model specifies:
$$\eta_i = \mu + x_i\beta + \phi_i + \theta_i$$
where:
• $\mu$ is the overall risk level, i.e., the fixed intercept.
• $x$ is the matrix of explanatory spatial covariates such that $x_i$ is the vector of covariates for areal unit $i$, and $\beta$ is the vector of regression coefficients which are constant across all regions, i.e., fixed effects.
• $\phi$ is an ICAR spatial component.
• $\theta$ is an ordinary random effects component for non-spatial heterogeneity.
The BYM model uses both spatial and non-spatial error terms to account for over-dispersion not modelled by the Poisson variates. When the observed variance isn’t fully explained by the spatial structure of the data, an ordinary random effects component will account for the rest. However, this model becomes difficult to fit because either component can account for most or all of the individual-level variance. Without any hyperpriors on $\phi$ and $\theta$, the sampler will be forced to explore many extreme posterior probability distributions; it will run very slowly or fail to fit the data altogether. Riebler et al. (2016) provide an excellent summary of the underlying problem as well as a survey of the subsequent refinements to the parameterization and choice of priors for this model.
In order to fit the BYM model to their data using a custom Gibbs sampler, Besag, York, and Mollié (1991) use gamma hyperpriors on the precision parameters $\tau_\phi$ and $\tau_\theta$, with carefully chosen parameter values for each. Subsequent versions of this model use constraints designed to create a “fair” prior which places equal emphasis on both spatial and non-spatial variance, based on the formula from Bernardinelli, Clayton, and Montomoli (1995):
$$\mathrm{sd}(\theta_i) = \frac{1}{\sqrt{\tau_\theta}} \approx \frac{1}{0.7\sqrt{\bar{m}\,\tau_\phi}} \approx \mathrm{sd}(\phi_i)$$
where $\bar{m}$ is the average number of neighbors across all regions in the dataset. Because the values used for the gamma hyperprior on $\tau_\theta$ depend on the value of $\bar{m}$, the choice of hyperpriors is dependent on the dataset being analyzed and therefore must be reevaluated for each new dataset.
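To make the dependence on the dataset concrete, the balance condition can be rearranged. This is a sketch that assumes the approximation is read as $1/\sqrt{\tau_\theta} \approx 1/(0.7\sqrt{\bar{m}\,\tau_\phi})$, with $\tau_\theta$ the precision of the non-spatial component and $\tau_\phi$ the precision of the ICAR component; the 4-region map from the Models section is used purely as an illustration.

```latex
\frac{1}{\sqrt{\tau_\theta}} \approx \frac{1}{0.7\sqrt{\bar{m}\,\tau_\phi}}
\quad\Longrightarrow\quad
\tau_\theta \approx 0.49\,\bar{m}\,\tau_\phi .
% Illustration with the 4-region example (neighbor counts 1, 2, 2, 1):
\bar{m} = \frac{1 + 2 + 2 + 1}{4} = 1.5
\quad\Longrightarrow\quad
\tau_\theta \approx 0.735\,\tau_\phi .
```

A map with a different average number of neighbors gives a different multiplier, which is why the hyperprior has to be revisited for every new neighborhood graph.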
The BYM2 Model
The BYM2 model (Riebler et al. 2016) follows the Penalized Complexity framework (Simpson et al. 2017), which favors models where the parameters have clear interpretations, allowing for assignment of sensible hyperparameters to each. Like the BYM model, the BYM2 model includes both spatial and non-spatial error terms, and like the alternative model of Leroux, Lei, and Breslow (2000), it places a single scale parameter $\sigma$ on the combined components and a mixing parameter $\rho$ for the amount of spatial/non-spatial variation. In order for $\sigma$ to legitimately be the standard deviation of the combined components, it is critical that for each $i$, $\mathrm{Var}(\phi_i) \approx \mathrm{Var}(\theta_i) \approx 1$. This is done by adding a scaling factor $s$ to the model which scales the proportion of variance $\rho$.
Because the scaling factor $s$ depends on the dataset, it comes into the model as data. Riebler et al. recommend scaling the model so the geometric mean of these variances is 1. This scaling factor can be computed from the neighborhood graph in the transformed data block of the Stan program, but here we compute this value using R’s INLA::inla.scale.model function and pass it into the Stan model as data.
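The idea behind the scaling factor can be illustrated in base R on the 4-region example. This sketch is not the INLA routine the authors call. It assumes that, for a fully connected graph, the marginal variances of the sum-to-zero ICAR component are the diagonal of the generalized (Moore-Penrose) inverse of $Q = D - A$, and it takes $s$ to be their geometric mean. All names are illustrative, and the value need not agree with INLA's output to full numerical precision.

```r
# ICAR precision matrix D - A for the 4-region example (1 ~ 2, 2 ~ 3, 3 ~ 4)
A <- rbind(c(0, 1, 0, 0),
           c(1, 0, 1, 0),
           c(0, 1, 0, 1),
           c(0, 0, 1, 0))
Q <- diag(rowSums(A)) - A

# Generalized inverse via the eigendecomposition, dropping the zero
# eigenvalue that corresponds to the constant vector
e <- eigen(Q, symmetric = TRUE)
keep <- e$values > 1e-8
V <- e$vectors[, keep, drop = FALSE]
Q_pinv <- V %*% diag(1 / e$values[keep], nrow = sum(keep)) %*% t(V)

# Marginal variances of the constrained ICAR component
marginal_var <- diag(Q_pinv)

# Scaling factor: geometric mean of the marginal variances
s <- exp(mean(log(marginal_var)))

print(marginal_var)
print(s)
```

Dividing the variances by `s` gives values whose geometric mean is 1, which is the property the scaling is meant to deliver.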
In the BYM2 model, the original BYM model’s combination of components $\phi + \theta$ is rewritten as
$$\left(\sqrt{\rho/s}\,\phi^* + \sqrt{1-\rho}\,\theta^*\right)\sigma$$
where:
• $\rho \in [0, 1]$ models how much of the variance comes from the spatially correlated error terms and how much comes from the independent error terms
• $\phi^*$ is the ICAR model
• $\theta^* \sim N(0, n)$, where $n$ is the number of connected subgraphs. When the neighborhood graph is fully connected, $\theta^* \sim N(0, 1)$.
• $s$ is the scaling factor computed from the neighborhood graph such that $\mathrm{Var}(\phi_i) \approx 1$.
• $\sigma \geq 0$ is the overall standard deviation for the combined error terms
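The role of the mixing parameter is easiest to see at its two extremes. The base R sketch below writes the combined term as a function; the function name `bym2_convolve` and all input values are hypothetical and chosen only for illustration.

```r
# Combined BYM2 error term: (sqrt(rho / s) * phi_star + sqrt(1 - rho) * theta_star) * sigma
# bym2_convolve is an illustrative helper name, not part of any package.
bym2_convolve <- function(phi_star, theta_star, rho, s, sigma) {
  stopifnot(rho >= 0, rho <= 1, s > 0, sigma >= 0)
  (sqrt(rho / s) * phi_star + sqrt(1 - rho) * theta_star) * sigma
}

phi_star   <- c(0.3, -0.1, 0.4, -0.6)   # illustrative spatial component
theta_star <- c(-1.2, 0.5, 0.1, 0.8)    # illustrative non-spatial component
s     <- 0.57                           # illustrative scaling factor
sigma <- 2                              # illustrative overall standard deviation

# rho = 0: all variation is attributed to the non-spatial component
only_nonspatial <- bym2_convolve(phi_star, theta_star, rho = 0, s = s, sigma = sigma)

# rho = 1: all variation is attributed to the (scaled) spatial component
only_spatial <- bym2_convolve(phi_star, theta_star, rho = 1, s = s, sigma = sigma)

print(only_nonspatial)
print(only_spatial)
```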
For BYM2 models over neighborhood graphs which are not fully connected (i.e., $n > 1$), each connected subgraph has its own variance and must be scaled accordingly. The Stan programming language is powerful enough to allow for disconnected subgraphs and island regions; however, the indexing required to keep track of each subgraph increases the complexity of the code. Therefore, in this paper we present a Stan program for fully connected neighborhood graphs.
Stan Programs
Stan is a highly expressive, general-purpose probabilistic programming language for the specification
of statistical models. A Stan program computes the joint log probability density of a set of
continuous parameters up to an additive constant. Full Bayesian inference is carried out
using Stan’s No U-Turn Sampler (NUTS) which uses Hamiltonian Monte Carlo (HMC) to
obtain a set of draws from the posterior. HMC samplers are more efficient and robust than
Gibbs and Metropolis samplers (Hoffman and Gelman 2014), allowing for better estimates
of models with complex posteriors such as the BYM model.
First Stan program: icar.stan
A Stan program consists of a set of named program blocks which occur in a fixed order. Stan is an imperative programming language; thus the variable declarations and statements in program blocks and user-defined functions are executed in program order. Stan is a strongly typed language, i.e., variable declarations specify the variable type and all operations must respect the declared variable type. Variables must be declared before they can be referenced. Data variables are declared in the data and transformed data blocks. Parameter variables are declared in the parameters and transformed parameters blocks. Declarations and statements are terminated with a semicolon (;). Comments begin with a pair of forward slash characters (//) and continue through to the end of the line.
Listing 1 presents a Stan program which computes the ICAR spatial random variable $\phi$ given a set of neighboring regions.
Listing 1: Program icar.stan
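A sketch of a program consistent with the line-by-line description that follows is given here, held as an R character vector with one element per program line so that it can be written to disk from R. It is reconstructed from that description rather than copied from the authors' file. The `array[...]` declaration syntax of recent Stan releases is an assumption; a 2019-era program would have used the older `int node1[N_edges]` form. The names `icar_stan_lines` and `stan_file` are illustrative.

```r
# Reconstructed sketch of icar.stan, one element per program line,
# so that element k corresponds to "line k" in the description.
icar_stan_lines <- c(
  "functions {",
  "  real icar_normal_lpdf(vector phi, int N, array[] int node1, array[] int node2) {",
  "    return -0.5 * dot_self(phi[node1] - phi[node2])",
  "      + normal_lpdf(sum(phi) | 0, 0.001 * N);",
  "  }",
  "}",
  "data {",
  "  int<lower=0> N;",
  "  int<lower=0> N_edges;",
  "  array[N_edges] int<lower=1, upper=N> node1;",
  "  array[N_edges] int<lower=1, upper=N> node2;",
  "}",
  "parameters {",
  "  vector[N] phi;",
  "}",
  "model {",
  "  phi ~ icar_normal(N, node1, node2);",
  "}"
)

# Write the program to a temporary file (an illustrative location)
stan_file <- file.path(tempdir(), "icar.stan")
writeLines(icar_stan_lines, stan_file)
cat(icar_stan_lines, sep = "\n")
```

With this layout, lines 2 to 5 hold the function definition, lines 7 to 12 the data block, lines 13 to 15 the parameters block, and lines 16 to 18 the model block.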
Lines 2–5 of icar.stan define a custom distribution function icar_normal_lpdf for an ICAR random variable phi. This function computes the ICAR prior as the pairwise difference of neighboring elements of phi and enforces the sum-to-zero constraint. The function name ends in _lpdf, which signals that this function defines a log probability density function. An _lpdf function has return type real, and its first argument is either of type real or a container of reals such as a vector or array. The function icar_normal_lpdf takes the following arguments:
• the spatial random variable phi
• N, the number of areal regions
• integer array node1
• integer array node2
Together node1 and node2 encode the neighbor relationships as a graph edgeset: node1 holds the set of indexes corresponding to $\phi_i$ and node2 holds the indexes corresponding to $\phi_j$, where $i < j$. To see how this works, in the example in the previous section, there are 4 regions labeled 1 through 4 and 3 edges:
          node1   node2
edge 1      1       2
edge 2      2       3
edge 3      3       4
Encoding the neighbor relations as an edgeset requires less memory than specifying a full N
× N adjacency matrix when the adjacency matrix is sparse. In our small example, the
adjacency matrix has 16 elements. The edgeset requires scalar variables N, and N_edges,
and 2 parallel arrays of indices for a total of 8 elements. In general, for a neighborhood of N
regions where the average number of neighbors for a region is K, the space required to store
an edgeset is N × K, where K ≤ N. As N increases, K usually remains constant, thus the
edgeset encoding is more efficient.
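Going from an adjacency matrix to an edgeset takes one line in base R if each neighbor pair is kept only once, with the smaller index first. The sketch below does this for the 4-region example and reproduces the element counts quoted above; the variable names mirror the Stan data variables, but the code itself is only an illustration.

```r
# Adjacency matrix of the 4-region example (1 ~ 2, 2 ~ 3, 3 ~ 4)
A <- rbind(c(0, 1, 0, 0),
           c(1, 0, 1, 0),
           c(0, 1, 0, 1),
           c(0, 0, 1, 0))

# Keep only entries above the diagonal so that each pair appears once (i < j)
edges <- which(upper.tri(A) & A == 1, arr.ind = TRUE)
node1 <- edges[, "row"]
node2 <- edges[, "col"]

N <- nrow(A)
N_edges <- nrow(edges)

# Storage: full matrix versus two scalars plus two index arrays
matrix_elements  <- length(A)
edgeset_elements <- 2 + length(node1) + length(node2)

print(cbind(node1, node2))
print(c(matrix_elements, edgeset_elements))
```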
Because Stan provides vectorized operations as well as multi-index expressions, line 3 of the body of function icar_normal_lpdf
-0.5 * dot_self(phi[node1] - phi[node2])
is the direct translation, on the log scale, of the pairwise difference formula
$$p(\phi) \propto \exp\left\{-\frac{1}{2}\sum_{i \sim j}(\phi_i - \phi_j)^2\right\}.$$
The entries in arrays node1 and node2 are indexes for phi. The expressions phi[node1] and phi[node2] are multiple indexing expressions; each evaluates to a vector of length N_edges whose entries are values of phi at the indices in node1 and node2. Vector subtraction yields the vector of pairwise differences. The Stan math library function dot_self computes the dot product of this vector with itself; the result is the sum of the squares of the pairwise differences.
The expression on line 4 enforces the sum-to-zero constraint on phi:
normal_lpdf(sum(phi) | 0, 0.001 * N);
Since the random vector phi sums to zero, it follows that the mean of phi must also be zero, but instead of requiring the mean to be exactly zero, this constraint “soft-centers” the mean by keeping it very close to zero. This expression calls Stan’s implementation of the normal log probability density. The calling syntax for a probability density function follows probability function notation, so that a vertical bar is used to separate the outcome from the parameters of the distribution. The straightforward specification of this constraint is:
normal_lpdf(mean(phi) | 0, 0.001);
The mean is the sum of the vector elements divided by the vector length, and division is a relatively expensive operation. By multiplying the location and scale parameters by the vector length, we remove the division operation from the formula.
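The equivalence of the two forms can be checked with R's `dnorm`, which plays the role of Stan's normal_lpdf in this sketch. The two log densities differ only by the constant $\log N$, which does not depend on phi and therefore has no effect on sampling. The vector `phi` below is an arbitrary, slightly off-center illustration.

```r
# Illustrative vector whose mean is close to, but not exactly, zero
phi <- c(0.3, -0.1, 0.4, -0.6) + 0.002
N <- length(phi)

# Soft constraint on the sum, with the scale multiplied by N
lp_sum <- dnorm(sum(phi), mean = 0, sd = 0.001 * N, log = TRUE)

# Soft constraint on the mean, with the original scale
lp_mean <- dnorm(mean(phi), mean = 0, sd = 0.001, log = TRUE)

# The difference is the constant log(N), whatever the values in phi
difference <- lp_mean - lp_sum
print(c(lp_sum, lp_mean, difference, log(N)))
```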
For ICAR models the neighborhood structure comes into the model as data. Data variables
are declared in the data block, (lines 7–12). The variables N and N_edges specify the size
and range limits on the edgeset arrays node1 and node2, therefore the former are declared
before the latter. Constraints on the range of allowed values for a variable follow the variable
type name in the variable declaration. Because the variables N and N_edges hold size
information they are constrained to be greater than or equal to 0. The edgeset arrays are
indexes over the N areal regions, therefore these are constrained to be between 1 and N.
These constraints are enforced when the data is read in during model instantiation.
The ICAR spatial random variable $\phi$ is declared in the parameters block (lines 13–15), and the model block (lines 16–18) computes the log probability density up to an additive constant.
The model block computes the total log probability density by specifying the distribution of phi (line 17) using a sampling statement. A sampling statement specifies that the expression on the left-hand side of the symbol ~ is distributed according to the log probability density function named on the right-hand side (or log probability mass function for discrete distributions). Despite the name, this statement doesn’t actually perform sampling; it is functionally equivalent to incrementing the total log probability density by the value returned by calling the corresponding log probability density function with the left-hand side expression as the first argument. E.g., line 17 is equivalent to incrementing the total log probability density by the value returned by icar_normal_lpdf(phi | N, node1, node2).
Fitting Stan Models to Data with RStan
We use the R package rstan to fit the ICAR model to the map of areal regions used in the
case study in the following section. This package contains functions to compile and fit Stan
models, generate reports, and save and reload model fits. The function stan compiles a Stan
program to C++, instantiates the compiled model together with the inputs specified in the
data block of the Stan program, and then runs the HMC sampler to produce a set of draws
from the target log probability density specified by the model block.
The spatial data in our example consists of the neighborhood graph over the New York City 2010 census tracts. After downloading the geographic data files from the US Census Bureau, we used the R package spdep to get a list of all neighbors for the census tracts in the case study (n=2095). The neighborhood graph was edited to create a fully connected graph. Finally, we transformed the list into the set of inputs matching the data variables in program icar.stan: integer variables N and N_edges, and integer array variables node1 and node2, the graph edgeset. The file bym2_nyc_data.R contains these inputs.
The script fit_icar_nyc.R sets up the R environment, loads the data, fits the model to the
data, and provides diagnostics and a summary of the resulting sample.
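# fit model icar.stan to NYC census tracts neighborhood map
library(rstan)
rstan_options(auto_write = TRUE)             # cache the compiled model on disk
options(mc.cores = parallel::detectCores())  # run the chains in parallel
# load N, N_edges, node1, node2 into the workspace
source("bym2_nyc.data.R")
# rstan matches list elements to Stan data variables by name
icar_nyc_stanfit <- stan("icar.stan",
  data = list(N = N, N_edges = N_edges, node1 = node1, node2 = node2),
  control = list(max_treedepth = 15))
check_hmc_diagnostics(icar_nyc_stanfit)
print(icar_nyc_stanfit, probs = c(0.25, 0.75), digits_summary = 1)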
The stan function returns a stanfit object which contains both the posterior draws and the sampler diagnostic values produced by each chain, which are used to detect problems with the model fit. The arguments to the stan function specify the Stan program file, the list of data variables, and controls to the sampler. The default settings use the NUTS HMC sampler to run 4 chains for 2000 iterations each, where the first 1000 iterations are warmup and the last 1000 iterations are saved as output, producing a sample consisting of 4000 draws from the posterior.
The check_hmc_diagnostics function checks that the sampler was able to fully and effectively explore the joint distribution specified by the model. It reports on
• divergences, which signal that the HMC sampler cannot adequately explore all regions of the posterior, resulting in a biased sample. Increasing the sampler’s adapt_delta control can sometimes resolve this problem; otherwise it may be necessary to reparameterize the model. See Betancourt (2017).
• treedepth: iterations which exceed the maximum treedepth result in slow sampling; to resolve this, increase the treedepth via the sampler’s max_treedepth control. In this example, it was necessary to increase the max_treedepth control above 12 in order to eliminate these warnings.
• E_BFMI: the Bayesian Fraction of Missing Information for each chain. Low values can sometimes be resolved by increasing the number of warmup iterations; otherwise it may be necessary to reparameterize the model.
The print function returns a set of summary statistics, described in the RStan vignette
Accessing the contents of a stanfit object.
The summary is a matrix with rows corresponding to parameters and columns to
the various summary quantities. These include the posterior mean, the posterior
standard deviation, and various quantiles computed from the draws. … For models
fit using MCMC, also included in the summary are the Monte Carlo standard error
(se_mean), the effective sample size (n_eff), and the R-hat statistic (Rhat).
Here we call the print function with optional arguments probs and digits_summary. The
probs argument specifies that only the 0.25 and 0.75 quantile estimates should be displayed.
In this example, 1 digit of precision is sufficient to check the summaries for all elements of
parameter phi. This call returns the following.
Inference for Stan model: icar.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean  se_mean   sd    25%   75%   n_eff  Rhat
phi[1]       0.0    0.0     0.8  -0.5   0.6   3354    1
phi[2]       0.0    0.0     0.8  -0.5   0.5   3130    1
...
phi[2094]    0.0    0.0     1.3  -0.9   0.9   2110    1
phi[2095]    0.0    0.0     1.3  -0.8   0.9   2058    1
Although the summary information is designed to be read from left to right, the column
which should always be checked first is the rightmost column labeled Rhat. The R-hat
statistic is a measure of convergence. When a chain fails to converge, the draws returned by
the sampler are not a sample from the posterior distribution and cannot be used for
estimation. All R-hat values should be extremely close to 1 and values greater than 1.1 are
an indication that one or more of the chains have failed to converge during warmup. All
values in the Rhat column of the summary are 1, indicating that the chains have converged.
The second column from the right is labeled n_eff. The number of effective samples ($N_{eff}$) is the number of independent samples with the same estimation power as the $N$ autocorrelated samples. An MCMC sampler produces an estimate of the mean. The error in that estimate depends on the number of effective samples $N_{eff}$. The column se_mean is the Monte Carlo standard error (MCSE), which is proportional to $1/\sqrt{N_{eff}}$ instead of $1/\sqrt{N}$. It is computed as sd / sqrt(n_eff). As $N_{eff}$ increases, the MCSE approaches 0 and the estimated parameter mean approaches the true mean. Conversely, when $N_{eff}$ is low, so is the precision of the estimate. In this example, all $N_{eff}$ values are above 1200 with median value 2598, which is sufficient to estimate all parameters with reasonable precision.
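The relationship between the sd, n_eff, and se_mean columns can be reproduced by hand. The base R sketch below uses the values printed for phi[1] in the summary above (sd 0.8, n_eff 3354); the variable names are illustrative.

```r
# Values taken from the phi[1] row of the printed summary
sd_phi1    <- 0.8
n_eff_phi1 <- 3354

# Monte Carlo standard error of the posterior mean
mcse_phi1 <- sd_phi1 / sqrt(n_eff_phi1)
print(mcse_phi1)

# At one digit of precision this rounds to 0, matching the se_mean column
print(round(mcse_phi1, 1))
```

Because the printed sd is itself rounded to one digit, the hand-computed value is only approximate, but it shows why every se_mean entry appears as 0.0 at this precision.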
Since the R-hat and $N_{eff}$ statistics indicate that this is a valid sample with a sufficient number of independent draws to estimate all parameters, the next step is to check the estimates for all parameters in the model. The parameter phi is a multivariate normal random variable centered at zero with precision matrix $Q$. The print summary column labeled mean shows that the estimated mean of every element of the vector phi is zero, as expected for a correct implementation of the ICAR model.
The spatial structure implied by the ICAR prior phi is encoded in its covariance matrix Σ.
Since the mean of each $\phi_i$ depends on that of its neighbors, we expect to see high
covariance between neighboring regions and we expect covariance between non-neighboring
regions to be close to zero. To check this we extract the set of draws for the vector phi from
the fitted ICAR model using RStan’s extract function. Instead of working with the
covariance matrix we use the correlation matrix, which standardizes the range of values to
[−1, 1]. We use the R package ggplot2 to plot the results.
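The step from posterior draws to a correlation matrix can be sketched in a few lines of Python. This is only an illustration on synthetic draws, not the RStan and ggplot2 workflow used for the figures; the function name correlation_matrix and the three-element stand-in for phi are hypothetical.

```python
import math
import random

def correlation_matrix(draws):
    """Pearson correlations between the columns of a draws-by-parameters table."""
    n_draws = len(draws)
    n_par = len(draws[0])
    means = [sum(row[j] for row in draws) / n_draws for j in range(n_par)]
    centered = [[row[j] - means[j] for j in range(n_par)] for row in draws]
    # Sums of squares and cross-products of the centered columns.
    sscp = [[sum(r[i] * r[j] for r in centered) for j in range(n_par)]
            for i in range(n_par)]
    return [[sscp[i][j] / math.sqrt(sscp[i][i] * sscp[j][j])
             for j in range(n_par)] for i in range(n_par)]

# Synthetic stand-in for draws of a 3-element phi: elements 0 and 1 share a
# common component (as neighboring regions would); element 2 is independent.
rng = random.Random(1)
draws = []
for _ in range(2000):
    shared = rng.gauss(0, 1)
    draws.append([shared + 0.3 * rng.gauss(0, 1),
                  shared + 0.3 * rng.gauss(0, 1),
                  rng.gauss(0, 1)])

corr = correlation_matrix(draws)
for row in corr:
    print(" ".join(f"{value:6.2f}" for value in row))
```

In this toy setting the first two elements are strongly correlated and the third is close to uncorrelated with both, which is the qualitative pattern expected for neighboring versus non-neighboring regions.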
The elements of phi are ordered by a numeric ID which consists of a borough code followed
by the census tract ID. The borough codes impose the following order on the five boroughs
of New York City: Manhattan, Bronx, Brooklyn, Queens, Staten Island. The borough code
ordering doesn’t correspond to the neighborhood graph over the boroughs; i.e., the Bronx is
not adjacent to Brooklyn, and Queens doesn’t share a border with Staten Island. By plotting the
correlation matrix in input order without further clustering we expect to see that within a
borough, adjacent elements $n$ and $n+1$ are likely to be neighbors and should be positively
correlated, while across boroughs elements are unlikely to be neighbors and should only be
weakly correlated or anti-correlated. Because this matrix is symmetric, we only show
correlations for the upper triangular matrix.
In this plot, black lines mark the divisions between the five boroughs. Bright red indicates
strong correlation, white indicates no correlation, dark violet indicates anti-correlation, and
pale red or violet indicates weak correlation or anti-correlation, respectively. The overall
pattern shows high correlation (bright red) within each borough and weak anti-correlation
(pale violet) between boroughs.
The boroughs of Staten Island and Queens have no common border. Zooming in on the
upper right corner of the plot, which covers the census tracts of Staten Island and part of
Queens, shows the same pattern of high correlation within each borough and weak
anti-correlation between the two boroughs.
These diagnostics and plots validate the Stan fit of the ICAR prior over the New York City
neighborhood graph.
Second Stan program: bym2.stan
The Stan program bym2.stan implements the BYM2 model for a fully connected
neighborhood graph. The log probability density is a Poisson GLM with a fixed intercept
and vector of coefficients together with a combined random effects component consisting of
an ICAR model for spatial smoothing and an ordinary random effects component for
non-spatial heterogeneity. This combined random effects component is scaled by a parameter
for the overall standard deviation. The Poisson regression is specified in the model block as
y ~ poisson_log(log_E + beta0 + x * betas + convolved_re * sigma);
The combined random effects component is specified as:
$$\sqrt{\rho / s}\,\phi^* + \sqrt{1 - \rho}\,\theta^*$$
This is coded as the variable convolved_re which is declared and defined in the transformed
parameters block:
convolved_re = sqrt(rho / scaling_factor) * phi
+ sqrt(1 - rho) * theta;
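To make the role of rho concrete, here is a small Python sketch of the same weighted combination on toy inputs. It mirrors the formula above but is not the Stan code itself; the numeric values of phi, theta, and the scaling factor are hypothetical.

```python
import math

def convolved_re(phi, theta, rho, scaling_factor):
    """Combine spatial (phi) and non-spatial (theta) effects as in BYM2."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if scaling_factor <= 0.0:
        raise ValueError("scaling_factor must be positive")
    w_spatial = math.sqrt(rho / scaling_factor)  # weight on the ICAR term
    w_iid = math.sqrt(1.0 - rho)                 # weight on the unstructured term
    return [w_spatial * p + w_iid * t for p, t in zip(phi, theta)]

# Toy values for three regions (hypothetical, not taken from the case study).
phi = [0.5, -0.2, -0.3]
theta = [1.0, 0.0, -1.0]

print(convolved_re(phi, theta, rho=0.0, scaling_factor=0.7))  # theta only
print(convolved_re(phi, theta, rho=1.0, scaling_factor=0.7))  # rescaled phi only
print(convolved_re(phi, theta, rho=0.5, scaling_factor=0.7))  # mixture of both
```

At rho = 0 the combined effect reduces to the unstructured term, and at rho = 1 it reduces to the rescaled spatial term, which is why rho can be read as the proportion of variance that is spatially structured.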
In a Stan program, the model block contains the specification of the likelihood and priors.
The parameters block is a declarations-only block: parameters and any bounds on them are
declared here, while soft constraints, such as the sum-to-zero constraint on phi, enter
through the model block. The transformed parameters block uses the set
of proposed (unconstrained) parameters to compute derived values; these derived values are
used to compute the likelihood. The declarative nature of a Stan program makes it easy to
see the role that every variable plays in the model; however, for complicated models,
implementation logic is spread across the program blocks, as is the case here.
The complete implementation of the BYM2 model is shown in Listing 2.
Listing 2: Program bym2.stan
The functions block (lines 1–6) contains the definition of the function icar_normal_lpdf
which computes the ICAR prior, as in the program icar.stan.
The data block (lines 7–17) contains the definitions for the four variables which specify
neighborhood structure, as well as the data and outcomes from the disease mapping study,
i.e., the observed counts per region, the population (offset), the number of regions, the
dimension of the vector of covariates, and the design matrix.
The transformed data block (lines 18–20) puts the offset term on the log scale. The data and
transformed data blocks are executed only once, when Stan instantiates the model together with
the data. This program specifies the scaling factor as data; however, it is possible to compute
this directly from the neighborhood graph, in which case this variable would be declared in
the transformed data block, along with the statements required to compute its value.
The ensemble of the parameters block (lines 21–28), transformed parameters block (lines
29–33), and model block (lines 34–42), specifies the model parameters and the likelihood
and priors. For every step of the sampler, the statements in the transformed parameters and
model block are computed in order using the set of proposed (unconstrained) parameters.
The total log probability density is incremented by the sampling statements in the model
block.
The generated quantities block (lines 43–56) computes additional quantities of interest. This
block is executed once per iteration, at the point where the sampler proposal has been
accepted. The quantities of interest are computed using the (unconstrained) values of all
parameters and transformed parameters for that draw. Here we use the generated quantities
block to generate two quantities of interest based on the parameters for that draw:
mu: the estimate of the input y (lines 44–45)
y_rep: an estimate of new data y (lines 46–55).
On line 54 we use Stan’s poisson_log_rng function to generate a new observation y_rep
based on the data and estimated parameters for that draw. Lines 47–51 guard against
potential numerical problems which may occur during warmup. We use the generated y_rep
values and the bayesplot package for R to carry out posterior predictive checks (PPC)
(Gabry et al. 2017). PPC is a model checking procedure in which a model fitted to the
current data is used to generate new data. From Gabry et al.:
The idea behind posterior predictive checking is simple: if a model is a good fit we
should be able to use it to generate data that resemble the data we observed.
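The idea of generating replicated data from log-rates can be sketched in Python as follows. This is an illustration only: the Poisson sampler is a simple textbook method standing in for poisson_log_rng, and the guard (returning -1 when a log-rate exceeds a threshold) is our assumption about what such a numerical check might look like, not a transcription of Listing 2. The threshold of 6.0 is chosen only to keep the simple sampler within its valid range.

```python
import math
import random

def poisson_rng(rate, rng):
    """Draw one Poisson variate by Knuth's multiplication method (small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_y_rep(eta, rng, max_eta=6.0):
    """Simulate one replicated data set from the log-rates eta.

    If any log-rate exceeds max_eta, return -1 for every region instead of
    sampling (a hypothetical guard against numerical overflow).
    """
    if max(eta) > max_eta:
        return [-1] * len(eta)
    return [poisson_rng(math.exp(e), rng) for e in eta]

rng = random.Random(2019)
eta = [math.log(2.0), math.log(6.0), math.log(12.0)]  # hypothetical log-rates
y_rep = [simulate_y_rep(eta, rng) for _ in range(4000)]
means = [sum(draw[i] for draw in y_rep) / len(y_rep) for i in range(len(eta))]
print([round(m, 2) for m in means])
```

Averaged over many replicates, the simulated counts for each region should sit close to the corresponding rate, here 2, 6, and 12.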
In the case study section we fit the BYM2 model with the New York City motor vehicle
crash data, a dataset consisting of areal observations for 2095 census tracts and then use the
PPC model checking procedure to evaluate the fit.
Comparison of Stan to ICAR Model Implementations in BUGS and INLA
We compared the Stan implementation of the ICAR prior to the corresponding models in
BUGS and INLA using the Scotland lip cancer dataset, first discussed in Clayton and Kaldor
(1987), which is available from the R INLA package as dataset Scotland. For this
comparison we used a simpler model than the model we recommend in this paper, as that
model (including the prior specification) cannot be easily fit in BUGS.
In order to compare estimates, we implemented approximately the same model (see footnote 1),
using the same likelihood and priors for all platforms, by reducing the BYM model to a simpler model
consisting of a hierarchical Poisson regression with an ICAR component for spatial
correlation, specified as:
$$\eta_i = \mu + x_i \beta + \phi_i$$
where:
$\mu$ is the overall risk level, i.e., the fixed intercept.
$x$ is the matrix of explanatory spatial covariates such that $x_i$ is the vector of
covariates for areal unit $i$, and $\beta$ is the vector of regression coefficients, which are
constant across all regions, i.e., fixed effects.
$\phi$ is an ICAR spatial component.
The programs, data, and scripts used to run these comparisons are included in the
supplemental materials for this paper. We ran the same number of chains and iterations for
both BUGS and Stan. Table 1 presents a comparison of the running times as reported by R’s
system.time command. Column “User” gives the CPU time spent by the R session, column
“System” gives the CPU time spent by the operating system on behalf of the R process, and
column “Elapsed” gives the wall clock time taken to run the process. On this small dataset,
Stan used less CPU time than BUGS and INLA, but INLA had the shortest elapsed time.
Table 2 presents the estimates for the slope, intercept, and ICAR component. BUGS and
INLA compute the precision tau, which is the inverse variance, i.e., $1/\sigma^2$, while Stan
computes the standard deviation sigma, which is the square root of the variance ($\sigma^2$);
therefore we use Stan’s generated quantities block to compute tau via the statement
real tau = sigma^-2;. All three systems arrived at approximately the same estimates for the slope and
intercept parameters of the linear regression as well as for all elements of phi.
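One reason to compute tau inside the generated quantities block is that the transformation is then applied to every draw, so the reported mean of tau is the mean of the transformed draws rather than a transformation of the mean of sigma. The Python sketch below illustrates the difference on a handful of hypothetical sigma draws; the function name precision_draws is ours.

```python
def precision_draws(sigma_draws):
    """Convert standard-deviation draws to precision draws, tau = sigma^-2."""
    if any(s <= 0 for s in sigma_draws):
        raise ValueError("sigma draws must be positive")
    return [s ** -2 for s in sigma_draws]

sigma_draws = [0.6, 0.7, 0.8, 0.9, 1.0]  # hypothetical posterior draws
tau_draws = precision_draws(sigma_draws)

mean_sigma = sum(sigma_draws) / len(sigma_draws)
mean_tau = sum(tau_draws) / len(tau_draws)
print(f"mean of tau draws:     {mean_tau:.3f}")
print(f"1 / (mean of sigma)^2: {mean_sigma ** -2:.3f}")
```

Because the map from sigma to tau is convex, the mean of the per-draw tau values is larger than the value obtained by transforming the mean of sigma, so the two summaries should not be used interchangeably when comparing against BUGS or INLA output.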
Stan Model Extension and Expansion
Because Stan is an expressive probabilistic programming language, it is fairly
straightforward to modify the implementations presented in this paper to deal with more
complicated data. For example, the BYM2 model can be extended to disconnected graphs
using the ideas in Freni-Sterrantino (2018). Furthermore, it is a one line change to the code
to replace the Poisson assumption on the data with other observation processes. The
structure of Stan makes it possible for even every data point to have a different likelihood.
Moreover, the BYM2 model can be nested as a component in any hierarchical or non-
hierarchical Bayesian model. The only real restriction is that model parameters must be
continuous variables, although some models, such as finite mixture models, which can be
written with discrete random variables can also be expressed without them, (see the chapter
on “Latent Discrete Parameters” in the Stan User’s Guide (2019)).
1. The three models implement the sum-to-zero constraint differently. INLA and Stan use a soft centering approach, while BUGS
subtracts the mean from each sample. The procedure in BUGS is known to be mathematically incorrect.
Case Study: Youth Pedestrian Injuries in NYC, 2005–2014
In New York City (NYC) pedestrians account for approximately half of all traffic fatalities
(Fung and Conderino 2017). Small-area spatiotemporal modeling using Bayesian models
such as the Besag-York-Mollié (BYM) model can be a useful tool to explore areas of high
risk for pedestrian crashes and to evaluate the joint role of sociodemographic and
traffic-related risk factors (DiMaggio 2015). This case study focuses on school-age pedestrian
crashes using ten years of data, from 2005 to 2014. We used the Stan platform to fit the
BYM2 model to a dataset consisting of census tract counts of school-age pedestrian crashes,
exploring the effects of commuting patterns, vehicular traffic density, social fragmentation,
and income.
Methods
Measures—We obtained motor vehicle collision data from the New York City Department
of Transportation for the ten most recent years of data available at the time of request (2005–
2014). Within this dataset, we identified collisions involving school age children 5–18 years
of age as pedestrians. We then assigned each crash to the census tract in which it occurred,
using boundaries from the 2010 United States Census.
We obtained 2010 US Census counts of youths aged 5–18 in each census tract from the US
Census Bureau (“American Factfinder,” n.d.). We also obtained the Census Bureau’s
American Community Survey (ACS) five-year estimates of median household income and
the percentage of commuters who traveled to work by means other than a private vehicle
(i.e., by walking, bicycling, or using public transportation) for each tract for 2010–2014. We
constructed an index of social fragmentation based on the work of Peter Congdon (Congdon
2012), as described in our previous study (DiMaggio 2015), using updated ACS estimates of
vacant housing units, householders living alone, non-owner occupied housing units, and
population having moved within the previous year. We standardized each of these metrics
with a mean of zero and added them together as a single index. Finally, we obtained street
level annual average daily traffic (AADT) data from the New York State Department of
Transportation on the New York Open Data portal (“Annual Average Daily Traffic (Aadt):
Beginning 1977,” n.d.). We created a spatial overlay of streets and census tracts to assign
each census tract the maximum AADT value of its underlying streets in 2015.
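The construction of the social fragmentation index (standardize each component, then sum) can be sketched in Python as follows. The numbers are made up for four tracts, the dictionary keys are hypothetical labels, and dividing by the sample standard deviation is our assumption about what "standardized" means here.

```python
import statistics

def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample std. dev."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def fragmentation_index(metrics):
    """Sum the standardized metrics tract by tract."""
    columns = [standardize(column) for column in metrics.values()]
    return [sum(tract_values) for tract_values in zip(*columns)]

# Hypothetical percentages for four tracts; not the study data.
metrics = {
    "vacant_units": [5.0, 12.0, 8.0, 3.0],
    "living_alone": [20.0, 35.0, 28.0, 15.0],
    "non_owner_occupied": [40.0, 80.0, 65.0, 30.0],
    "moved_last_year": [10.0, 22.0, 15.0, 9.0],
}
index = fragmentation_index(metrics)
print([round(v, 2) for v in index])
```

Because each component is centered before summing, the resulting index is centered at zero across tracts, and the tract that is highest on every component receives the highest index value.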
We used the spdep::poly2nb R function to assign adjacency between census tracts, allowing
water boundaries. We manually added contiguity between the Rockaway peninsula and the
rest of Queens (which are separated by Jamaica Bay, a large body of water) to obtain a fully
connected map. We excluded parks, cemeteries, and any other census tracts in which
five or fewer children between ages 5 and 18 resided.
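The adjacency bookkeeping described above can be illustrated with a short Python sketch: check whether a neighbor structure is fully connected, add a contiguity by hand, and flatten the result into two parallel arrays of edge endpoints. The five-region map is hypothetical, and the names node1 and node2 are illustrative labels for the edge arrays rather than names taken from the text.

```python
from collections import deque

def to_edge_list(neighbors):
    """Turn 1-based neighbor sets into two parallel arrays of edge endpoints,
    listing each undirected edge once with the smaller index first."""
    node1, node2 = [], []
    for i, adjacent in sorted(neighbors.items()):
        for j in sorted(adjacent):
            if i < j:
                node1.append(i)
                node2.append(j)
    return node1, node2

def is_connected(neighbors):
    """Breadth-first search: True if every region is reachable from the first."""
    start = next(iter(neighbors))
    seen, queue = {start}, deque([start])
    while queue:
        for j in neighbors[queue.popleft()]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(neighbors)

def add_edge(neighbors, i, j):
    """Manually add a symmetric adjacency between regions i and j."""
    neighbors[i].add(j)
    neighbors[j].add(i)

# Hypothetical 5-region map: regions 4 and 5 are cut off by water.
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
print(is_connected(neighbors))  # the map has two components at this point
add_edge(neighbors, 3, 4)       # manual contiguity across the water
print(is_connected(neighbors))
print(to_edge_list(neighbors))
```

A connectivity check of this kind is a useful safeguard before fitting, since the BYM2 program described here assumes a fully connected neighborhood graph.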
Analysis—We computed descriptive statistics and applied the BYM2 model in Stan to
create smoothed estimates of youth pedestrian crash rates while quantifying the effects of
pedestrian and public transit commute methods, traffic density, income, and social
fragmentation. We log transformed both traffic counts and income in order to normalize
their distributions, as the model initially failed to converge with the non-transformed data.
The specification of the model was as follows, where the unit of analysis is census tracts:
y = count of school age pedestrians ages 5–18 injured in traffic crashes.
x1 (“pct_commute”) = percent commuters using means other than private vehicle
(i.e. walking, bicycling, or public transit).
x2 (“log_income”) = log of median household income.
x3 (“std_frag_index”) = standardized index of social fragmentation (vacancy,
rentals, living alone, recently moved).
x4 (“log_aadt”) = log of maximum AADT value in each tract in 2015.
an offset term for the youth population ages 5–18 in each census tract.
the BYM2 convolved random effects term, comprising parameters rho, phi, and
theta, the proportion of spatial variance, the spatial ICAR term, and the non-
spatial vector of normal random variates, respectively.
sigma, the overall standard deviation of the convolved random effects term.
Results
Descriptive Statistics—From 2005–2014 there were 17,529 crashes (1,753 per year, on
average) injuring school age pedestrians in NYC (Figure 3) of which 17,193 (98.1%)
occurred in populated census tracts. Crash counts per census tract ranged from 0 to 57, with a
median of 6 (Table 3), and followed a right-skewed, Poisson-like distribution (Figure 4).
Table 3 summarizes the distribution of pedestrian injuries (ages 5–18 yrs) and
sociodemographic measures by NYC census tracts having youth population > 5 (n=2095).
Median household income ranged in these tracts from $9,000 to $232,000 in 2010–2014,
with a median of $53,890. The proportion of workers who traveled to work by means other
than a private vehicle (e.g. walking, bicycling, or taking public transportation) ranged from
10% to 100% and was heavily skewed toward high values, with most census tracts having >50% of
workers commuting by walking, bicycling, or using public transportation. By definition,
social fragmentation was centered around zero. The census tract maximums of annual
average daily traffic volumes (AADT) per underlying segment within each tract ranged from
800 to 277,000 vehicles per day with most tracts having maximum AADT counts below
50,000 vehicles. Histograms and maps of each of these measures are included in a
supplemental appendix.
Model Results—To fit the BYM2 model to the New York City pedestrian crash data using
RStan we ran 4 chains of 2000 iterations each where the first 1000 draws were warmup and
the last 1000 draws were saved as output for a total of 4000 draws from the posterior.
Running the 4 chains in parallel on a MacBook Pro laptop computer with a dual-core 3.1
GHz processor and 16GB of memory took about 21 minutes, as measured by R’s proc.time
function. The reported elapsed time (time from start to finish) was 1296 seconds and the user
processing time (total time across all threads) was 4839 seconds; thus running the 4 chains
sequentially would take roughly 4 times as long (4839/1296 ≈ 3.7). The RStan function
check_hmc_diagnostics found no problems encountered by the sampler and the R-hat values
for all parameters were extremely close to 1.0, indicating that the model had successfully converged.
To carry out posterior predictive checking on the fitted model we first obtain the generated
quantity y_rep from the stanfit object, and then use Stan’s bayesplot (R) package to generate
a visual comparison of the data y and the simulated new data y_rep using the
ppc_dens_overlay function.
The dark line is the distribution of the observed outcomes y and each of the 50 lighter lines
is the kernel density estimate of one of the rows in y_rep. The lighter overlays follow
the distribution of y, with a tendency towards the mean value. To further investigate, we use
the ppc_stat function, which plots the distribution of a test statistic computed on y_rep
against the same statistic computed on the data y. The default test statistic is the mean, so we
run the command as ppc_stat(y, y_rep). As with the density plot, the test statistic plot
indicates that the BYM2 model fits the data.
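The comparison that a test-statistic check performs can be written out in a few lines of Python. This sketch is not the bayesplot implementation; the function name ppc_stat_summary and the small data sets are hypothetical, and it reports numbers rather than drawing a plot.

```python
import statistics

def ppc_stat_summary(y, y_rep, stat=statistics.fmean):
    """Compare a test statistic of y with its distribution over y_rep rows.

    Returns the observed statistic, the replicated statistics, and the share
    of replicates whose statistic is at least as large as the observed one.
    """
    observed = stat(y)
    replicated = [stat(row) for row in y_rep]
    share_ge = sum(r >= observed for r in replicated) / len(replicated)
    return observed, replicated, share_ge

# Hypothetical counts for five tracts and four replicated data sets.
y = [0, 3, 6, 9, 12]
y_rep = [
    [1, 2, 7, 8, 11],
    [0, 4, 5, 10, 13],
    [2, 3, 6, 9, 10],
    [0, 2, 6, 8, 12],
]
observed, replicated, share_ge = ppc_stat_summary(y, y_rep)
print(observed, replicated, share_ge)
```

When the model fits, the observed statistic falls well inside the spread of the replicated statistics, so the share of replicates above it is neither close to 0 nor close to 1. Passing a different function as stat (for example max) checks other features of the data.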
Table 4 shows the summary of the parameter estimates from the fitted model. Because the
commute data is recorded as a proportion between 0 and 1 and not on the scale 0 to 100, it is
necessary to divide the “pct_commute” regression coefficient by 100 in order to properly
interpret its contribution. Thus for every percentage point increase in population commuting
by means other than a private vehicle, there was a 0.5% increase (exp(0.005) ≈ 1.005) in the
expected count of youth pedestrian injuries, controlling for income, vehicular traffic, social
fragmentation, and population. The credible interval ranged from a 0.2% (exp(0.002) ≈ 1.002)
to a 0.9% (exp(0.009) ≈ 1.009) increase in pedestrian injuries per percentage point increase
in commuters using means other than a private vehicle. There was a 1.2% decrease in youth
pedestrian injuries per 10% increase in median household income. Social fragmentation was
also significantly associated with youth pedestrian injuries, with an approximately 10%
increase (exp(0.1) ≈ 1.1) in youth pedestrian injuries per standard
deviation increase in the combined index (i.e. vacancy, non-owner occupied housing, recent
moves, and householder living alone), controlling for other model covariates. The credible
interval for the effect of daily traffic included zero in our fully adjusted model after
controlling for social fragmentation and pedestrian/bicyclist/public transit commute rates.
The parameter sigma, the overall standard deviation of the combined random effects term,
was exp(0.8) ≈ 2.2, indicating substantial overall variance. Nearly half of that variance,
parameter rho, was spatially structured (exp(0.4) − 1 ≈ 49%).
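The conversion from a log-scale Poisson regression coefficient to a percent change in the expected count can be checked with a few lines of Python. The helper name percent_change is ours; the inputs are the log-scale values quoted in the paragraph above, not values read from Table 4.

```python
import math

def percent_change(coefficient, delta=1.0):
    """Percent change in the expected count when a covariate rises by delta,
    for a coefficient on the log scale of a Poisson regression."""
    return 100.0 * (math.exp(coefficient * delta) - 1.0)

# Log-scale values quoted in the text above.
for label, value in [("pct_commute, per percentage point", 0.005),
                     ("pct_commute, lower bound", 0.002),
                     ("pct_commute, upper bound", 0.009),
                     ("std_frag_index, per standard deviation", 0.1)]:
    print(f"{label}: {percent_change(value):+.1f}%")
```

For small coefficients the percent change is close to 100 times the coefficient itself, which is why 0.005 corresponds to roughly a 0.5% increase; the approximation loosens as the coefficient grows, so 0.1 corresponds to about 10.5% rather than exactly 10%.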
From the fitted model we also obtain the quantity of interest mu, which is the estimate of
school-age pedestrian injuries for each populated census tract. Because BYM models
contain both a spatial and non-spatial random effects component, they are able to account
for almost all of the over-dispersion not modelled by the Poisson variates (Figure 7). The
overall burden of youth pedestrian injuries was most heavily concentrated in the Bronx,
northern Manhattan/Harlem, and central Brooklyn, as well as some pockets of Queens and
Staten Island (Figure 8).
Discussion of the Case Study—NYC has embraced initiatives such as the national Safe
Routes to School program (“Safe Routes,” n.d.) and Vision Zero (“Vision Zero: Traffic
Safety by Sweden,” n.d.) in order to build on progress to date, recognizing that no traffic
fatality should be considered acceptable. Our analysis explored spatial associations with
traffic crashes injuring school age pedestrians in NYC, while illustrating the utility of Stan
for computationally-intensive hierarchical modeling.
We found that income and social fragmentation were significant predictors of risk; that is,
accounting for traffic and pedestrian-dominant travel, there was an effect of transience in
low-income communities having high levels of rentals, vacancies, relocations, and residents
living alone. All of these data are publicly available and frequently updated via the
American Community Survey for census tracts throughout the US, which may provide
communities with a useful tool to help identify areas of increased risk for pedestrian injury
in the absence of other readily available data.
Spatial correlation, that is the tendency of higher rates of injury to cluster around other areas
with high injury rates, played a moderate role in risk prediction for youth pedestrian injuries
in NYC. About half of the random variance in our model was attributable to spatial
correlation, accounting for commute method, income, traffic, and social fragmentation.
Accordingly, the fitted map demonstrated gradual spatial smoothing, lending stability to the
visualization of areas of high risk for a relatively rare outcome at the fine spatial scale of
census tracts.
Despite the complexity of our model and the large number of samples, the model
successfully converged in Stan in about 21 minutes on an ordinary laptop computer. Moreover, as
described in the model development portion of this manuscript, we believe the use of
Hamiltonian Monte Carlo simulation provides an improved method of sampling the
posterior distribution compared with Gibbs and similar random-walk style Markov chain
sampling.
Notably, the neighborhoods of central Brooklyn, the Bronx and northern Manhattan are
more predominant in our map of youth pedestrian injuries compared with maps of total
pedestrian injuries, which are relatively more concentrated in central and lower Manhattan
(Viola, Roe, and Shin 2010). One primary reason for the differences between youth and all-
age pedestrian injury maps is most likely the influence of daytime commuter population
influx in central and lower Manhattan, as noted by Viola et al. (2010). Areas having high
frequencies of youth pedestrian injuries also tend to overlap with areas having the largest
youth populations in NYC.
Because our analysis included a population offset term, we excluded large parks such as
Central Park in Manhattan, Prospect Park in Brooklyn, and Van Cortlandt Park in the Bronx,
even though such parks have both vehicular and pedestrian travel. Census tracts, moreover,
are defined for the purposes of counting residents, and their boundaries do not necessarily
have etiologic relevance to the study of pedestrian injuries (i.e. there can be diverse road
types and traffic patterns within a census tract). In our previous work, for example, we have
demonstrated the utility of virtual street audits to identify specific features of the built
environment associated with pedestrian crashes at smaller spatial scales (Mooney et al.
2016). Future work should explore the impacts of infrastructure and other safety
interventions on localized pedestrian crash rates, capitalizing on the efficiency of BYM
modeling in Stan.
Conclusion of the Case Study—Stan proved to be an efficient and precise platform to
build a hierarchical spatial model for youth pedestrian injuries in NYC. We confirmed prior
findings that neighborhoods with higher social fragmentation and lower median incomes are
disproportionately affected by pedestrian injuries. Our findings also demonstrate that the
proportion of workers commuting to work by walking, bicycling, and public transit is
correlated with youth pedestrian risk. This nationally and publicly available metric may
serve as a useful surrogate index of pedestrian density in the absence of other readily
available data. Finally, the performance and results obtained using Stan demonstrate its
utility and strength for future spatial and spatiotemporal epidemiologic research, especially
with large datasets.
Disclaimer—This report utilizes information which was originally compiled by the New
York City Department of Transportation (DOT) for governmental purposes; the information
has subsequently been stratified and aggregated for analysis by the authors of this
manuscript. DOT and the City of New York make no representation as to the accuracy or
usefulness of the information provided by this application or the information’s suitability for
any purpose and disclaim any liability for omissions or errors that may be contained therein.
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
References
“American Factfinder.” n.d. US Census Bureau. https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml.
“Annual Average Daily Traffic (Aadt): Beginning 1977.” n.d. New York State Department of
Transportation, Highway Data Services Bureau. http://data.ny.gov/Transportation/Annual-Average-Daily-Traffic-AADT-by-Roadway-Segme/8e88-2p29.
Bernardinelli L, Clayton D, and Montomoli C. 1995 “Bayesian Estimates of Disease Maps: How
Important Are Priors?” Statistics in Medicine 14 (21–22). Wiley Online Library: 2411–31.
[PubMed: 8711278]
Besag Julian. 1974 “Spatial Interaction and the Statistical Analysis of Lattice Systems” Journal of the
Royal Statistical Society. Series B (Methodological). JSTOR, 192–236.
Besag J, York J, and Mollié A. 1991 “Bayesian Image Restoration with Two Applications in Spatial
Statistics.” Ann Inst Stat Math 43: 1–59. https://doi.org/10.1007.
Betancourt Michael. 2017 “Diagnosing Biased Inference with Divergences.” 2017 https://betanalpha.github.io/assets/case_studies/identifying_mixture_models.html.
Clayton David, and Kaldor John. 1987 “Empirical Bayes Estimates of Age-Standardized Relative
Risks for Use in Disease Mapping” Biometrics. JSTOR, 671–81.
Congdon Peter. 2012 “Assessing the Impact of Socioeconomic Variables on Small Area Variations in
Suicide Outcomes in England” International Journal of Environmental Research and Public Health
10 (1). Multidisciplinary Digital Publishing Institute: 158–77. [PubMed: 23271304]
DiMaggio Charles. 2015 “Small-Area Spatiotemporal Analysis of Pedestrian and Bicyclist Injuries in
New York City” Epidemiology 26 (2). LWW: 247–54. [PubMed: 25643104]
Freni-Sterrantino Anna, Ventrucci Massimo, and Rue Håvard. 2018 “A Note on Intrinsic Conditional
Autoregressive Models for Disconnected Graphs” Spatial and Spatio-Temporal Epidemiology 26.
Elsevier: 25–34.
Fung L, and Conderino S. 2017 “Pedestrian Fatalities in New York City.” New York City Department
of Health and Mental Hygiene, Epi Data Brief (86). 2017 http://www1.nyc.gov/assets/doh/downloads/pdf/epi/databrief86.pdf.
Gabry Jonah, Simpson Daniel, Vehtari Aki, Betancourt Michael, and Gelman Andrew. 2017
“Visualization in Bayesian Workflow.” arXiv Preprint arXiv:1709.01449.
Hoffman Matthew D, and Andrew Gelman. 2014 “The No-U-Turn Sampler: Adaptively Setting Path
Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15 (1): 1593–1623.
Leroux Brian G, Xingye Lei, and Norman Breslow. 2000 “Estimation of Disease Rates in Small Areas:
A New Mixed Model for Spatial Dependence” In Statistical Models in Epidemiology, the
Environment, and Clinical Trials, 179–91. Springer.
Mooney Stephen J, DiMaggio Charles J, Lovasi Gina S, Neckerman Kathryn M, Bader Michael DM,
Teitler Julien O, Sheehan Daniel M, Jack Darby W, and Rundle Andrew G. 2016 “Use of Google
Street View to Assess Environmental Contributions to Pedestrian Injury” American Journal of
Public Health 106 (3). American Public Health Association: 462–69. [PubMed: 26794155]
Riebler Andrea, Sørbye Sigrunn H, Simpson Daniel, and Rue Håvard. 2016 “An Intuitive Bayesian
Spatial Model for Disease Mapping That Accounts for Scaling” Statistical Methods in Medical
Research 25 (4). SAGE Publications Sage UK: London, England: 1145–65. [PubMed: 27566770]
“Safe Routes.” n.d. National Center for Safe Routes to School. http://www.saferoutesinfo.org/.
Simpson Daniel, Rue Håvard, Riebler Andrea, Martins Thiago G, Sørbye Sigrunn H, and others. 2017
“Penalising Model Component Complexity: A Principled, Practical Approach to Constructing
Priors” Statistical Science 32 (1). Institute of Mathematical Statistics: 1–28.
Stan Development Team. 2019 “Stan User’s Guide.” 2019 https://mc-stan.org/docs/stan-users-guide/index.html.
Viola Rob, Roe Matthew, and Shin Hyeon-Shic. 2010 “New York City Pedestrian Safety Study &
Action Plan.” New York (NY). Dept. of Transportation.
“Vision Zero: Traffic Safety by Sweden.” n.d. http://www.visionzeroinitiative.com/.
Figure 1:
Correlation Matrix for NYC Census Tracts ordered by Borough, Tract ID
Figure 2:
Correlations Between Queens and Staten Island Census Tracts
Figure 3:
School age pedestrians injured in traffic crashes, NYC 2005–2014
Figure 4:
Histogram of school-age pedestrian injury counts per census tract, NYC 2005–2014
Figure 5:
Posterior Predictive Check, density overlay
Figure 6:
Posterior Predictive Check, test statistic
Figure 7: Per-tract injuries, actual counts vs. fitted BYM2 model estimates
Figure 8: Fitted BYM2 model estimated counts of school-age pedestrian crash injuries, 2005–2014
Table 1: Comparison of running times

        User    System   Elapsed
Stan    0.46    0.17     11.73
BUGS    8.53    0.74     15.36
INLA    0.81    0.68      1.65
Table 2: Comparison of parameter estimates

         Mean                     Lower 5% CI              Upper 95% CI
         Stan    BUGS    INLA     Stan    BUGS    INLA     Stan    BUGS    INLA
beta0    −0.20   −0.20   −0.20    −0.41   −0.40   −0.40    −0.01    0.00    0.00
beta1     0.35    0.34    0.35     0.14    0.12    0.13     0.55    0.56    0.56
tau       1.81    1.78    1.80     1.01    1.04    1.03     2.88    2.78    2.86
phi1      1.17    1.17    1.18     0.68    0.69    0.70     1.65    1.63    1.66
phi2      1.11    1.12    1.11     0.81    0.81    0.80     1.41    1.42    1.42
phi3      1.02    1.03    1.05     0.55    0.56    0.57     1.46    1.47    1.49
phi55    −0.57   −0.56   −0.55    −1.10   −1.12   −1.08    −0.09   −0.08   −0.06
phi56    −0.47   −0.48   −0.46    −1.01   −1.00   −0.99     0.03    0.02    0.04
Table 3: Distribution of measures by census tract

Measure                                             Median     Min       Mean       Max
Youth pedestrian injuries, 2005–14                  6          0         8.2        57
Population ages 5–18 years, 2010                    510        6         596.4      3,315
Med. household income in USD, 2010–14               $53,890    $9,327    $58,497    $232,266
Pct. commute by walk/cycle/public trans, 2010–14    73.9       9.7       69.8       100
Standardized social fragmentation index             −0.1       −6.7      0          18.7
Traffic Volume (AADT), 2015                         19,178     843       37,248     276,476
Table 4: Parameter estimates from BYM2 Model

                  mean    se_mean   sd     2.5%    97.5%   N_eff   R.hat
intercept         −3.5    0         0.5    −4.5    −2.5    1255    1
commute            0.5    0         0.2     0.2     0.9     777    1
log income        −0.1    0         0.0    −0.2     0.0    1204    1
std frag index     0.1    0         0.0     0.0     0.1    1527    1
log traffic        0.0    0         0.0     0.0     0.0    2551    1
rho                0.4    0         0.1     0.3     0.5     219    1
sigma              0.8    0         0.0     0.8     0.9     301    1
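
The regression coefficients in Table 4 are easier to read as multiplicative effects on the expected injury count. The short Python sketch below exponentiates three rows of the table. It assumes the BYM2 model uses a count likelihood with a log link, which is the usual setup for disease mapping but is not stated in the table itself. The dictionary `table4` and the helper `to_rate_ratio` are hypothetical names introduced here for illustration. Because the table is rounded to one decimal place, the results are rough, and the exponentiated posterior mean is only an approximation to the posterior mean of the rate ratio (the interval bounds transform exactly).

```python
import math

# Posterior mean, 2.5% and 97.5% bounds copied from Table 4 (log scale).
# Values are rounded to one decimal in the source, so results are approximate.
table4 = {
    "commute": (0.5, 0.2, 0.9),
    "log income": (-0.1, -0.2, 0.0),
    "std frag index": (0.1, 0.0, 0.1),
}


def to_rate_ratio(mean, lower, upper):
    """Exponentiate log-scale summaries (assumes a log-link count model)."""
    return tuple(round(math.exp(v), 2) for v in (mean, lower, upper))


rate_ratios = {name: to_rate_ratio(*vals) for name, vals in table4.items()}

for name, (rr, lo, hi) in rate_ratios.items():
    # A ratio above 1 means a higher expected count per one-unit increase
    # in the covariate, on whatever scale the model used for it.
    print(f"{name}: rate ratio {rr} ({lo}, {hi})")
```

Under these assumptions, a one-unit increase in the commute covariate corresponds to roughly 1.65 times the expected count, with an interval of about 1.22 to 2.46.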