Content uploaded by Robert Lund
Author content
All content in this area was uploaded by Robert Lund on Apr 22, 2018
Content may be subject to copyright.
A Large Scale Spatio-temporal Binomial
Regression Model for Estimating
Seroprevalence Trends
Stella Watson Self1, Christopher McMahan1,∗, D. Andrew Brown1,
Robert Lund1, Jenna Gettings2, and Michael Yabsley2,3
1Department of Mathematical Sciences
Clemson University
Clemson, SC 29634-0975
2Southeastern Cooperative Wildlife Disease Study
Department of Population Health
The University of Georgia
Athens, GA 30602
3Warnell School of Forestry and Natural Resources
The University of Georgia
Athens, GA 30602
∗mcmaha2@clemson.edu
Abstract
This paper develops a large-scale Bayesian spatio-temporal binomial regression model for the
purpose of investigating regional trends in antibody prevalence to Borrelia burgdorferi, the
causative agent of Lyme disease. The proposed model uses Gaussian predictive processes to
estimate the spatially varying trends and a conditional autoregressive model to account for
spatio-temporal dependence. Careful consideration is made to develop a novel framework
that is scalable to large spatio-temporal data. The proposed model is used to analyze ap-
proximately 16 million Borrelia burgdorferi test results collected on dogs located throughout
the conterminous United States over a sixty month period. This analysis identifies several
regions of increasing canine risk. Specifically, this analysis reveals evidence that Lyme dis-
ease is getting worse in some endemic regions and that it could potentially be spreading to
other non-endemic areas. Further, given the zoonotic nature of this vector-borne disease,
this analysis could potentially reveal areas of increasing human risk.
Keywords: Borrelia burgdorferi, CAR model, chromatic sampling, Gaussian predictive pro-
cesses, Lyme disease
1. Introduction
Lyme disease is a vector-borne disease that impacts both humans and several other mam-
malian species, with domestic dogs being particularly sensitive to infection (Little et al.,
2010). Disease occurs as a result of infection by Borrelia burgdorferi, a spirochetal bacterium
that is transmitted by ticks. Incidence of disease in humans is considered to be emerging,
with a growing number of high incidence counties (Adams, 2017). Humans and dogs are in-
fected by the same vectors (Little et al., 2010), and so, unsurprisingly, the risks of exposure
for both are closely related. In fact, dogs are often considered to be sentinels for the regional
risk of Lyme disease in humans (Mead et al., 2011).
Dogs are tested regularly for exposure to B. burgdorferi as part of their annual wellness
examinations. Commonly, veterinarians use a serologic test that detects antibodies against
the C6 peptide that is present in the blood of infected animals. The presence of C6 is
indicative of an intermediate or late-term infection, and is often detectable 3 to 6 weeks
after exposure (Wagner et al., 2012). Among dogs that are infected, only approximately
5% develop any clinical signs of Lyme disease (Levy and Magnarelli, 1992). This practice of
routine testing provides a unique opportunity to measure the seroprevalence of B. burgdorferi
within a relatively healthy canine population visiting veterinary clinics.
Monitoring seroprevalence is useful for many reasons, despite the low incidence of disease.
Directly, it provides an estimate for the risk of exposure within a region, allowing veteri-
narians to make accurate preventative care and testing recommendations. Indirectly, the
seroprevalence of B. burgdorferi can identify the approximate range of the Ixodes spp. tick
vectors. This is especially valuable because Ixodes spp. are capable of transmitting several other
pathogens, including Anaplasma, Ehrlichia muris eauclairensis, and Babesia microti (Nelder et al., 2016),
several of which are zoonotic. The shared vector allows dogs to serve as sentinels for human
risk. Therefore, modeling trends in canine seroprevalence should inform changing risk of
exposure to B. burgdorferi in humans.
The goal of this paper is to identify US regions that are experiencing an increase in canine
seroprevalence of B. burgdorferi, and by proxy identify regions where the risk of human
exposure could also be increasing. The data analyzed here contain 16,571,562 serologic test
results for B. burgdorferi conducted on domestic dogs in the conterminous United States (US)
from January 2012 through December 2016, aggregated by county and month. Figure 1 displays
the raw prevalence estimates after aggregating over all sixty months in the study; i.e., the
proportion of positive tests. There are 3,109 distinct US counties and county-equivalent
regions in the conterminous US, not all of which report test data every month. Data were
reported from 69,876 county-month pairs. As our goal is to determine where seroprevalence
is increasing, our model must have a temporal trend component that is spatially varying. To
facilitate more reliable inference, the strong positive spatio-temporal dependence of the tests
is also taken into account. The size of this data set and its large spatio-temporal support
motivates some of our methodological choices.
Gaussian processes (GPs) are popular geostatistical modeling tools due to their flexibility
and ability to quantify uncertainty in nonparametric regressions (O’Hagan, 1978; Neal, 1998).
Overviews of GP modeling can be found in Cressie (1993), Rasmussen and Williams (2006),
Cressie and Wikle (2011), and Gelfand and Schliep (2016). Banerjee et al. (2015) discuss
Bayesian aspects of GPs. Objective prior specification for GP models is studied in Berger
et al. (2001). GPs have become standard tools in a wide variety of applications, including
oceanography (Jona-Lasinio et al., 2012), water quality analysis (Zhang and El-Shaarawi,
2009), image classification (Morales-Álvarez et al., 2017), neuroimaging (Lazar, 2008), and
computer experiments (Santner et al., 2003). GPs are also used to model disease prevalence,
including dengue fever (Johnson et al., 2017), malaria (Andrade-Pacheco et al., 2015), and
influenza (Senanayake et al., 2016). Gelfand et al. (2003) used GPs to allow linear model
coefficients to vary smoothly over space, an approach used here to localize regional trends
in B. burgdorferi seroprevalence.
Gaussian process modifications and algorithms for analyzing big spatial data sets have
received significant attention in the recent literature, including fixed rank kriging (Cressie
and Johannesson, 2008) and LatticeKrig (Nychka et al., 2015) approaches. Both methods
employ basis function expansions of spatial random effects to reduce the dimension of the
associated covariance matrices. Katzfuss (2017) took a similar approach by applying basis
functions to a succession of refined resolutions. Spatial partitioning (e.g., Sang et al., 2011;
Heaton et al., 2017a) can be used to split regions into smaller, more manageable sub-regions
with computation being accelerated via a conditional independence assumption. Covariance
tapering (Furrer et al., 2006) uses a covariance function with compact support to induce
sparsity. Nearest neighbor processes (Datta et al., 2016) achieve computational efficiency by
conditioning on a subset of nearby observations. A similar idea was used by Gramacy and
Apley (2015) to find the largest number of neighbors computationally feasible for predic-
tion, optimally chosen by minimizing prediction variance. Heaton et al. (2017b) provide an
overview and comparison of these procedures and others. The approach used here involves
Gaussian predictive processes (GPPs) (Banerjee et al., 2008), which are discussed further in
Section 2.
The most common approach for modeling spatially dependent areal data involves Gaus-
sian Markov random fields (GMRFs; Rue and Held, 2005), with Gaussian conditional autore-
gressive (CAR) models (Banerjee et al., 2015) being particularly popular. As special cases of
Markov random fields (Besag, 1974), GMRFs are collections of jointly distributed Gaussian
random variables satisfying a Markov dependence structure quantified through a precision
matrix. GMRFs are extended to flexible degrees of smoothness in Brezger et al. (2007) and
Yue and Speckman (2010). Brown et al. (2017a) adjust the CAR precision matrix to build a
unified model for independent and dependent cases and study neighborhood structures other
than those based on physical adjacency. GMRF and GP connections are explored in Rue
and Tjelmeland (2002), Song et al. (2008), and Lindgren et al. (2011). CAR models are now
standard in disease mapping problems (e.g., Waller et al., 1997).
To achieve our goals, we develop a large scale spatio-temporal binomial regression model
that has both GPP and CAR components. The former is used to capture spatially varying
trends by treating the trend coefficient as a non-parametric surface over the spatial domain
of interest, while the latter accounts for spatio-temporal correlation. Through data augmen-
tation steps and the use of a novel sampling strategy, we establish a modeling framework that
is computationally scalable to large non-Gaussian spatio-temporal data sets. In particular,
straightforward Gibbs sampling is facilitated via a data augmentation step involving latent
Pólya-Gamma random variables. To avoid computationally expensive matrix calculations,
we use a chromatic sampling strategy in the Gibbs sampler. Our proposed methodology easily
handles missing data. The finite sample properties of our proposed approach are studied
via simulation before our B. burgdorferi seroprevalence analysis.
The remainder of this paper is organized as follows: Section 2 describes the model and
our GPP and CAR structures. Section 3 discusses model fitting procedures, emphasizing
computational tractability with large scale spatio-temporal data. Section 4 presents a simu-
lation study supporting our proposed approach, and Section 5 analyzes the canine serology
data described above. We offer concluding remarks in Section 6.
2. Modeling Methods
Let $Y_{st}$ denote the number of cases (e.g., positive B. burgdorferi tests) observed in $n_{st}$ tests
in region $s$ at time $t$, for $s = 1, \ldots, S$ and $t = 1, \ldots, T$. We let
$\mathbf{Y}_s = (Y_{s1}, \ldots, Y_{sT})'$, $\mathbf{Y} = (\mathbf{Y}_1', \ldots, \mathbf{Y}_S')' \in \mathbb{R}^{ST}$,
$\mathbf{n}_s = (n_{s1}, \ldots, n_{sT})'$, and $\mathbf{n} = (\mathbf{n}_1', \ldots, \mathbf{n}_S')' \in \mathbb{N}^{ST}$. In addition
to the disease surveillance data, covariates $Z_{stq}$ and $X_{stp}$, for $q = 1, \ldots, Q$ and $p = 1, \ldots, P$,
are assumed to be available. The $Z_{stq}$ are covariates whose associated effects are constant
over the study area, while the $X_{stp}$ are covariates whose associated effects vary by region.
To relate the observed test data to the available covariates, a Bayesian generalized
linear mixed model (McCullagh and Nelder, 1989; Diggle et al., 1998; Banerjee et al.,
2015) is adopted. The general model for our data is a binomial regression:
$Y_{st} \mid n_{st}, p_{st} \sim \text{Binomial}(n_{st}, p_{st})$ with
\[
\nu_{st} := g^{-1}(p_{st}) = \mathbf{Z}_{st}'\boldsymbol{\delta} + \mathbf{X}_{st}'\boldsymbol{\beta}(\boldsymbol{\ell}_s) + \xi_{st}; \quad s = 1, \ldots, S; \; t = 1, \ldots, T, \tag{1}
\]
where $g : \mathbb{R} \to (0, 1)$ is a known link function (e.g., logistic) relating the linear predictor $\nu_{st}$
to the prevalence $p_{st}$, $\mathbf{Z}_{st} = (1, Z_{st1}, \ldots, Z_{stQ})' \in \mathbb{R}^{Q+1}$, $\mathbf{X}_{st} = (X_{st1}, \ldots, X_{stP})' \in \mathbb{R}^{P}$,
$\boldsymbol{\delta} = (\delta_0, \ldots, \delta_Q)'$ are global regression coefficients,
$\boldsymbol{\beta}(\cdot) = (\beta_1(\cdot), \ldots, \beta_P(\cdot))'$ are spatially varying
regression coefficients, $\boldsymbol{\ell}_s = (\ell_{s1}, \ell_{s2})'$ is a vector of spatial coordinates (e.g., latitude and
longitude) that identify the centroid of region $s$, and $\xi_{st}$ is a spatio-temporal random effect.
Following Gelfand et al. (2003), the spatially varying regression coefficients are regarded as
unknown surfaces over the study region. To model these unknown surfaces while maintaining
computational tractability, we use GPPs.
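A Gaussian process is a stochastic process whose finite dimensional distributions are
multivariate normal. A GP $\beta_p(\cdot) \mid \boldsymbol{\theta}_p \sim \mathcal{GP}(\mu_p(\cdot), C(\cdot, \cdot; \boldsymbol{\theta}_p))$ is uniquely determined by
its mean and covariance, $\mu_p(\boldsymbol{\ell}_s) := E[\beta_p(\boldsymbol{\ell}_s)]$ and
$C(\boldsymbol{\ell}_s, \boldsymbol{\ell}_{s'}; \boldsymbol{\theta}_p) := \text{Cov}(\beta_p(\boldsymbol{\ell}_s), \beta_p(\boldsymbol{\ell}_{s'})) = \sigma_p^2 \rho_p(\boldsymbol{\ell}_s, \boldsymbol{\ell}_{s'}; \boldsymbol{\theta}_p)$,
where $\rho_p(\cdot, \cdot; \boldsymbol{\theta}_p)$ is a correlation function depending on the parameter vector
$\boldsymbol{\theta}_p$. For smoothing and interpolation, it is often sufficient to take a constant mean
(Bayarri et al., 2007). In our case, we a priori posit that $\mu_p(\cdot) \equiv 0$ for all $p$. Thus,
$\boldsymbol{\beta}_p = (\beta_p(\boldsymbol{\ell}_1), \ldots, \beta_p(\boldsymbol{\ell}_S))'$, $S \in \mathbb{N}$, follows a multivariate normal distribution with mean $\mathbf{0}$
and covariance matrix $\mathbf{C}_p = \sigma_p^2 \mathbf{R}_p$, where $(\mathbf{R}_p)_{ss'} = \rho_p(\boldsymbol{\ell}_s, \boldsymbol{\ell}_{s'}; \boldsymbol{\theta}_p)$. In general, the covariance
matrix inversions and factorizations associated with estimating posterior GPs are $O(S^3)$ in
computational complexity and, in an MCMC context, these operations are repeated thousands
of times. Thus, as $S$ grows large, GPs quickly become computationally prohibitive.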
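To reduce the dimension of the problem, the Gaussian predictive process (GPP; Banerjee
et al., 2008) considers a “parent” process based on a strategically chosen set of knots, and then
interpolates the process to the points of interest via kriging. Let $\{\boldsymbol{\ell}_1^*, \ldots, \boldsymbol{\ell}_{S_p^*}^*\}$ denote the knot
set with $S_p^* \ll S$. Define $\boldsymbol{\beta}_p^* = (\beta_p(\boldsymbol{\ell}_1^*), \ldots, \beta_p(\boldsymbol{\ell}_{S_p^*}^*))'$ and note that
$\boldsymbol{\beta}_p^* \mid \sigma_p^2, \boldsymbol{\theta}_p \overset{\text{ind}}{\sim} N(\mathbf{0}, \mathbf{C}_p^*)$, for
all $p$, where $\mathbf{C}_p^* = \sigma_p^2 \mathbf{R}_p^*$ and $(\mathbf{R}_p^*)_{ss'} = \rho_p(\boldsymbol{\ell}_s^*, \boldsymbol{\ell}_{s'}^*; \boldsymbol{\theta}_p)$. The joint distribution of $\boldsymbol{\beta}_p$ and $\boldsymbol{\beta}_p^*$ is
again multivariate normal:
\[
\begin{pmatrix} \boldsymbol{\beta}_p \\ \boldsymbol{\beta}_p^* \end{pmatrix} \Bigm| \sigma_p^2, \boldsymbol{\theta}_p \sim N\left( \mathbf{0},\; \sigma_p^2 \begin{pmatrix} \mathbf{R}_p & \widetilde{\mathbf{R}}_p^* \\ \widetilde{\mathbf{R}}_p^{*\prime} & \mathbf{R}_p^* \end{pmatrix} \right), \tag{2}
\]
where $\widetilde{\mathbf{R}}_p^*$ is an $S \times S_p^*$ matrix with the $(s, s')$th element being $\rho_p(\boldsymbol{\ell}_s, \boldsymbol{\ell}_{s'}^*; \boldsymbol{\theta}_p)$. Exploiting this
relationship, the Gaussian predictive process simply replaces $\boldsymbol{\beta}_p$ with
$\widetilde{\boldsymbol{\beta}}_p := E(\boldsymbol{\beta}_p \mid \boldsymbol{\beta}_p^*; \boldsymbol{\theta}_p) = \widetilde{\mathbf{R}}_p^* (\mathbf{R}_p^*)^{-1} \boldsymbol{\beta}_p^*$.
When $S_p^*$ is not large, $(\mathbf{R}_p^*)^{-1}$ can be quickly computed. For more on GPPs,
see Banerjee et al. (2008).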
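The interpolation step of the predictive process can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the correlation function is the power form $\theta^{d^2}$ that the paper uses later in its simulation study, while the knot grid, the observation locations, and the function names `power_corr` and `gpp_interpolate` are hypothetical and chosen for the example.

```python
import numpy as np

def power_corr(a, b, theta):
    """Correlation theta ** (squared Euclidean distance) between rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return theta ** d2

def gpp_interpolate(locs, knots, beta_star, theta):
    """Predictive-process surface: R_tilde (R_star)^{-1} beta_star."""
    r_star = power_corr(knots, knots, theta)   # S* x S* knot correlation matrix
    r_tilde = power_corr(locs, knots, theta)   # S x S* cross-correlation matrix
    # Solve the small S* x S* system rather than forming an explicit inverse.
    return r_tilde @ np.linalg.solve(r_star, beta_star)

rng = np.random.default_rng(0)
# Hypothetical 3 x 3 knot grid and 50 scattered locations (illustration only).
knots = np.array([[x, y] for x in (0.0, 1.0, 2.0) for y in (0.0, 1.0, 2.0)])
locs = rng.uniform(0.0, 2.0, size=(50, 2))
theta, sigma2 = 0.6, 1.5

# Draw the parent process at the knots, then interpolate to the locations.
c_star = sigma2 * power_corr(knots, knots, theta)
beta_star = rng.multivariate_normal(np.zeros(len(knots)), c_star)
tilde_beta = gpp_interpolate(locs, knots, beta_star, theta)
```

Only the small knot matrix is ever factorized, which is the source of the computational savings: the cost is driven by $S_p^*$ rather than $S$. Evaluating the sketch at the knots themselves returns the parent process values unchanged.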
Fully specifying a GPP requires selecting knot locations. Banerjee et al. (2008) discuss
several methods of knot selection, including placing them on a regular grid, selecting them
at random from the observation locations, and methods which place more knots in areas
with more observations. Finley et al. (2009) suggest choosing knot locations to minimize the
conditional variance at observed data locations; Guhaniyogi et al. (2011) propose an adaptive
knot selection strategy where the knot locations are treated as a point process. Following
Eidsvik et al. (2012), our knots are chosen via K-means clustering with $S_p^*$ clusters; i.e.,
using K-means clustering we partition the $S$ counties into $S_p^*$ clusters based on the $\boldsymbol{\ell}_s$. The
knot locations are subsequently taken to be the centroids of the $S_p^*$ clusters. For further
details on K-means clustering see Hartigan and Wong (1979).
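A minimal sketch of this knot-selection step is given below. It uses plain Lloyd iterations rather than the Hartigan and Wong (1979) algorithm cited above, and the function name `kmeans_knots`, the random centroids standing in for county coordinates, and the choice of 10 knots are all assumptions made for illustration.

```python
import numpy as np

def kmeans_knots(locs, n_knots, n_iter=100, seed=0):
    """Lloyd-style K-means; returns cluster centroids (the knots) and labels."""
    rng = np.random.default_rng(seed)
    # Initialize the centers at randomly chosen observation locations.
    centers = locs[rng.choice(len(locs), size=n_knots, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((locs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(n_knots):
            members = locs[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the returned centers.
    d2 = ((locs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return centers, d2.argmin(axis=1)

rng = np.random.default_rng(0)
locs = rng.uniform(0.0, 1.0, size=(300, 2))  # stand-in for county centroids
knots, labels = kmeans_knots(locs, n_knots=10)
```

Because the centroids are averages of observed locations, this approach tends to place more knots where regions are densely packed, which suits county maps with small counties in some areas and large ones in others.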
We account for the spatio-temporal dependence that exists in the data by allowing a
spatial GMRF to evolve over time. One way of doing so is presented in Waller et al. (1997),
who allow the variance of the CAR model to depend on time. However, we use a first-order
vector autoregression with errors following a GMRF; i.e.,
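\[
\boldsymbol{\xi}_t = \zeta \boldsymbol{\xi}_{t-1} + \boldsymbol{\phi}_t, \tag{3}
\]
where $\boldsymbol{\xi}_t = (\xi_{1t}, \ldots, \xi_{St})'$, $\zeta \in (-1, 1)$ is a temporal correlation parameter, and we assume
$\boldsymbol{\xi}_0 = \mathbf{0}$ without loss of generality. We take $\boldsymbol{\phi}_t$ to be independent and identically distributed
as a proper intrinsically autoregressive model (Besag and Kooperberg, 1995); i.e.,
$\boldsymbol{\phi}_t \sim N(\mathbf{0}, \tau^2(\mathbf{D} - \omega\mathbf{W})^{-1})$, where $\tau^2 > 0$ and $\omega \in (0, 1)$ is a so-called ‘propriety parameter’ that
ensures the precision matrix is non-singular (Banerjee et al., 2015). The neighborhood matrix
$\mathbf{W} \in \mathbb{R}^{S \times S}$ is such that $(\mathbf{W})_{ss'}$ is equal to 1 if and only if location $s$ is adjacent to location
$s'$, $s \neq s'$, and 0 otherwise, and $\mathbf{D} = \text{diag}\left(\sum_{j=1}^{S} (\mathbf{W})_{sj},\; s = 1, \ldots, S\right)$. To avoid confounding
with the intercept, we impose the standard sum-to-zero constraint (i.e., $\sum_{t=1}^{T} \sum_{s=1}^{S} \xi_{st} = 0$).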
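To make the structure in (3) concrete, the sketch below builds the precision matrix $(\mathbf{D} - \omega\mathbf{W})/\tau^2$ on a small regular grid and simulates the autoregression forward in time. It is a sketch under assumptions: neighbors are cells sharing an edge, the parameter values are borrowed from the paper's simulation study, the sum-to-zero constraint is not imposed, and the function names are hypothetical.

```python
import numpy as np

def grid_adjacency(n_rows, n_cols):
    """Binary adjacency matrix W; cells sharing an edge are neighbors."""
    S = n_rows * n_cols
    W = np.zeros((S, S))
    for r in range(n_rows):
        for c in range(n_cols):
            s = r * n_cols + c
            if r + 1 < n_rows:
                W[s, s + n_cols] = W[s + n_cols, s] = 1.0
            if c + 1 < n_cols:
                W[s, s + 1] = W[s + 1, s] = 1.0
    return W

def simulate_car_ar1(W, T, zeta, tau2, omega, rng):
    """Simulate xi_t = zeta * xi_{t-1} + phi_t with phi_t ~ N(0, tau2 (D - omega W)^{-1})."""
    D = np.diag(W.sum(axis=1))
    Q = (D - omega * W) / tau2        # precision matrix of phi_t
    L = np.linalg.cholesky(Q)         # Q = L L'
    S = W.shape[0]
    xi = np.zeros((T + 1, S))         # row 0 holds xi_0 = 0
    for t in range(1, T + 1):
        z = rng.standard_normal(S)
        phi = np.linalg.solve(L.T, z)  # Cov(phi) = (L L')^{-1} = Q^{-1}
        xi[t] = zeta * xi[t - 1] + phi
    return xi[1:], Q

rng = np.random.default_rng(0)
W = grid_adjacency(13, 13)
xi, Q = simulate_car_ar1(W, T=60, zeta=0.9, tau2=0.005, omega=0.9, rng=rng)
```

With $\omega < 1$ the matrix $\mathbf{D} - \omega\mathbf{W}$ is strictly diagonally dominant, so the Cholesky factorization succeeds. A dense factorization is acceptable for 169 regions but is exactly the operation that becomes expensive at the scale of the full county map.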
We complete the proposed model by specifying prior distributions on the regression coef-
ficients and the variance and correlation parameters. In the absence of strong prior informa-
tion, the hyperparameters are chosen so that the prior distributions are vague. A Gaussian
prior is taken on the global regression coefficients and inverse Gamma (IG) priors on the
variance components for conditional conjugacy. Likewise, a truncated Gaussian prior whose
support is confined to $(-1, 1)$ is specified for $\zeta$, again for conditional conjugacy. We take a
$\text{Beta}(\alpha_\omega, \upsilon_\omega)$ prior on $\omega$ and concentrate it close to one, since previous empirical work has
shown that $\omega \approx 1$ is necessary to induce noticeable spatial association (Banerjee et al., 2015).
These specifications lead to the following hierarchy:
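\[
\begin{aligned}
Y_{st} \mid n_{st}, \nu_{st} &\overset{\text{indep.}}{\sim} \text{Binomial}(n_{st}, p_{st} = g(\nu_{st})), && s = 1, \ldots, S;\; t = 1, \ldots, T; \\
\boldsymbol{\beta}_p^* \mid \sigma_p^2, \boldsymbol{\theta}_p &\overset{\text{indep.}}{\sim} N(\mathbf{0}, \sigma_p^2 \mathbf{R}_p^*(\boldsymbol{\theta}_p)), && p = 1, \ldots, P; \\
\sigma_p^2 &\overset{\text{indep.}}{\sim} \text{IG}(\alpha_{\sigma_p^2}, \upsilon_{\sigma_p^2}), && p = 1, \ldots, P; \\
\boldsymbol{\theta}_p &\overset{\text{i.i.d.}}{\sim} \pi(\boldsymbol{\theta}_p), && p = 1, \ldots, P; \\
\boldsymbol{\delta} &\sim N(\mathbf{0}, \sigma_\delta^2 \mathbf{I}), && \sigma_\delta^2 > 0; \\
\boldsymbol{\xi}_t \mid \boldsymbol{\xi}_{t-1}, \tau^2, \omega, \zeta &\sim N\left(\zeta \boldsymbol{\xi}_{t-1}, \tau^2(\mathbf{D} - \omega\mathbf{W})^{-1}\right), && t = 1, \ldots, T; \\
\tau^2 &\sim \text{IG}(\alpha_{\tau^2}, \upsilon_{\tau^2}), && \alpha_{\tau^2}, \upsilon_{\tau^2} > 0; \\
\omega &\sim \text{Beta}(\alpha_\omega, \upsilon_\omega), && \alpha_\omega, \upsilon_\omega > 0; \\
\zeta &\sim \text{Truncated Normal}(0, \sigma_\zeta^2, -1, 1), && \sigma_\zeta^2 > 0,
\end{aligned}
\tag{4}
\]
where $\nu_{st} = \mathbf{Z}_{st}'\boldsymbol{\delta} + \mathbf{X}_{st}'\widetilde{\boldsymbol{\beta}}(\boldsymbol{\ell}_s) + \xi_{st}$,
$\widetilde{\boldsymbol{\beta}}(\boldsymbol{\ell}_s) = (\widetilde{\beta}_1(\boldsymbol{\ell}_s), \ldots, \widetilde{\beta}_P(\boldsymbol{\ell}_s))'$, and each coefficient in $\widetilde{\boldsymbol{\beta}}(\boldsymbol{\ell}_s)$
is obtained from the $P$ predictive processes via $\widetilde{\boldsymbol{\beta}}_p = \widetilde{\mathbf{R}}_p^* (\mathbf{R}_p^*)^{-1} \boldsymbol{\beta}_p^*$, and $\boldsymbol{\xi}_0 = \mathbf{0}$. Appropriate
(identical) priors for $\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_P$ depend on the selected correlation function in the GPP model.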
3. Posterior Sampling
3.1 Data Augmentation
We assume conditional independence given the covariate effects and spatio-temporal effects
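and observe that $\mathbf{Y}$ depends on the regression coefficients and random effects only through
$\boldsymbol{\nu} = (\nu_{11}, \ldots, \nu_{1T}, \nu_{21}, \ldots, \nu_{ST})'$. Hence, the likelihood can be expressed as
\[
f(\mathbf{Y} \mid \boldsymbol{\nu}) \propto \prod_{t=1}^{T} \prod_{s=1}^{S} g(\nu_{st})^{Y_{st}} \{1 - g(\nu_{st})\}^{n_{st} - Y_{st}}. \tag{5}
\]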
To develop a posterior sampling algorithm, we take g(·) to be the logistic link function.
Other link functions are possible and can be implemented following Albert and Chib (1993)
or Gamerman (1997). Metropolis-Hastings steps (Metropolis et al., 1953; Hastings, 1970) can
be used either componentwise or in blocks, but such samplers can be difficult to tune in high
dimensions. To facilitate the derivation of a Gibbs sampler for the regression coefficients and
spatio-temporal random effects, we use a data augmentation scheme that leads to sampling
these parameters from Gaussian full conditional distributions.
Our data augmentation approach follows that of Polson et al. (2013). This scheme relies
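on the fact that
$\exp(\nu)^a \{1 + \exp(\nu)\}^{-b} = 2^{-b} \exp(\kappa\nu) \int_0^\infty \exp(-\psi\nu^2/2)\, p(\psi \mid b, 0)\, d\psi$, where
$a \in \mathbb{R}$, $b \in \mathbb{R}^+$, $\kappa = a - b/2$, and $p(\cdot \mid b, 0)$ is the probability density function of a Pólya-Gamma
random variable with parameters $b$ and 0. Thus, under the logistic link, (5) can be
written as
\[
\begin{aligned}
f(\mathbf{Y} \mid \boldsymbol{\nu}) &\propto \prod_{t=1}^{T} \prod_{s=1}^{S} \exp(\kappa_{st}\nu_{st}) \int_0^\infty \exp(-\psi_{st}\nu_{st}^2/2)\, p(\psi_{st} \mid n_{st}, 0)\, d\psi_{st} \\
&\propto \prod_{t=1}^{T} \prod_{s=1}^{S} \int_0^\infty f(Y_{st}, \psi_{st} \mid \nu_{st})\, d\psi_{st},
\end{aligned}
\]
where $\kappa_{st} = Y_{st} - n_{st}/2$. By introducing the $\psi_{st}$ as latent random variables to be sampled
via MCMC, we obtain
\[
f(\mathbf{Y}, \boldsymbol{\psi} \mid \boldsymbol{\nu}) \propto \exp(-\boldsymbol{\nu}'\mathbf{D}_{\psi}\boldsymbol{\nu}/2 + \boldsymbol{\kappa}'\boldsymbol{\nu}) \prod_{t=1}^{T} \prod_{s=1}^{S} p(\psi_{st} \mid n_{st}, 0),
\]
where $\boldsymbol{\psi} = (\psi_{11}, \ldots, \psi_{1T}, \psi_{21}, \ldots, \psi_{ST})'$, $\mathbf{D}_{\psi} = \text{diag}(\boldsymbol{\psi})$, and $\boldsymbol{\kappa} = (\kappa_{11}, \ldots, \kappa_{1T}, \kappa_{21}, \ldots, \kappa_{ST})'$.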
We see, then, that data augmentation yields a Gaussian density in $\boldsymbol{\nu}$ up to a normalizing
constant. Consequently, the full conditional distributions for most of the parameters are of
a known form and are easy to sample from; i.e., the full conditional distribution of $\psi_{st}$ is
Pólya-Gamma, $\boldsymbol{\beta}_p^*$ is multivariate normal, $\boldsymbol{\delta}$ is multivariate normal, $\sigma_p^2$ is inverse gamma,
$\tau^2$ is inverse gamma, and $\zeta$ is truncated normal. The Supplementary Material provides the
complete set of full conditional distributions.
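As an illustration of the latent-variable update, the sketch below draws approximate Pólya-Gamma variates. It assumes the standard result of Polson et al. (2013) that the full conditional of $\psi_{st}$ is $\text{PG}(n_{st}, \nu_{st})$, and it uses their infinite sum-of-gammas representation truncated at a finite number of terms. The truncation, the function names, and the example values of $n_{st}$ and $\nu_{st}$ are assumptions for this example; this is not the exact sampler a production implementation would use.

```python
import numpy as np

def sample_pg_truncated(b, c, size, rng, n_terms=200):
    """Approximate PG(b, c) draws from the truncated series
    (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Gamma(b, 1)."""
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2)
    g = rng.gamma(shape=b, scale=1.0, size=(size, n_terms))
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def pg_mean(b, c):
    """Closed-form mean of PG(b, c) for c != 0."""
    return b / (2.0 * c) * np.tanh(c / 2.0)

rng = np.random.default_rng(1)
n_st, nu_st = 10, 1.0   # hypothetical test count and linear predictor
draws = sample_pg_truncated(n_st, nu_st, size=20000, rng=rng)
```

Truncating the series biases the draws slightly downward; comparing the sample mean of `draws` with `pg_mean(n_st, nu_st)` is a quick way to check that the number of terms is adequate for the values of $n_{st}$ at hand.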
Given the data augmentation, a posterior sampling algorithm involving Gibbs steps for
the aforementioned parameters can be constructed in the usual manner. Metropolis-Hastings
steps are used to sample $\boldsymbol{\theta}_p$ and $\omega$. Under the considered data augmentation scheme, the
full conditional distribution of $\boldsymbol{\xi}_t$ is multivariate normal. However, sampling this parameter
is computationally expensive due to its high dimension. To facilitate more efficient repeated
updates of $\boldsymbol{\xi}_t$, we employ chromatic sampling, which is described next.
3.2 Chromatic Sampling
The full conditional distributions of $\boldsymbol{\xi}_t$, $t = 1, \ldots, T$, are each multivariate normal. Block
sampling from these full conditionals is reasonable when the number of spatial units is
relatively small (Furrer and Sain, 2010), but becomes unwieldy as $S$ increases due to the
associated Cholesky factorizations and memory requirements. As an alternative, we propose to
use so-called chromatic sampling (Gonzalez et al., 2011; Brown et al., 2017b). The chromatic
sampler exploits the Markov structure of the CAR model to parallelize single-site updates,
thereby avoiding time consuming matrix calculations such as Cholesky factorizations. Under
chromatic sampling, the computing time scales approximately linearly in $S$ and $T$.
Let $\{\mathcal{A}_1, \ldots, \mathcal{A}_K\}$ be a partition in which $\mathcal{A}_k$ is an index set identifying a collection of
spatial regions that are not adjacent to one another; i.e., for all $s, s' \in \mathcal{A}_k$, $W_{ss'} = 0$. A
greedy algorithm for finding such a partition on an irregular lattice is given by Brown et al.
(2017b). For a vector $\mathbf{a} = (a_1, \ldots, a_S)'$ and an index set $\mathcal{C}$, define $\mathbf{a}(\mathcal{C}) := (a_s : s \in \mathcal{C})'$.
The Markov property of the CAR model implies that the elements of $\boldsymbol{\xi}_t(\mathcal{A}_k)$, given $\boldsymbol{\xi}_t(\mathcal{A}_k^c)$,
are conditionally independent. Therefore, by conditioning on $\boldsymbol{\xi}_t(\mathcal{A}_k^c)$, the elements of $\boldsymbol{\xi}_t(\mathcal{A}_k)$
can be sampled from their univariate full conditional distributions in parallel (or through
‘vectorized’ calculations). This approach can handle an extremely large number of spatial
regions (e.g., $S > 100{,}000$) when they are sparsely connected. For further details, see Brown
et al. (2017b), who compare block sampling to chromatic sampling for GMRFs.
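The partition into mutually non-adjacent index sets can be illustrated with a generic greedy graph coloring. The sketch below is not claimed to be the algorithm of Brown et al. (2017b); it is a textbook first-fit coloring, and the grid neighborhood (cells sharing an edge) and the function names are assumptions for the example.

```python
def greedy_coloring(neighbors):
    """First-fit coloring. neighbors maps node -> list of adjacent nodes."""
    color = {}
    for node in sorted(neighbors):
        used = {color[m] for m in neighbors[node] if m in color}
        c = 0
        while c in used:   # smallest color not used by an already-colored neighbor
            c += 1
        color[node] = c
    return color

def grid_neighbors(n_rows, n_cols):
    """Neighbor lists for a regular grid; cells sharing an edge are adjacent."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            s = r * n_cols + c
            adj = []
            if r > 0:
                adj.append(s - n_cols)
            if r + 1 < n_rows:
                adj.append(s + n_cols)
            if c > 0:
                adj.append(s - 1)
            if c + 1 < n_cols:
                adj.append(s + 1)
            nbrs[s] = adj
    return nbrs

nbrs = grid_neighbors(13, 13)
color = greedy_coloring(nbrs)

# Group region indices by color: each group plays the role of one index set A_k.
classes = {}
for node, c in color.items():
    classes.setdefault(c, []).append(node)
```

Within one Gibbs sweep, a sampler would loop over the color classes and update every region in a class at once, since no two regions in the same class are neighbors. On this grid the coloring reduces to a checkerboard, so each sweep needs only two vectorized updates per time point.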
3.3 A Note on Missing Data
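In our application, data are not reported at all location-time pairs. To capture this effect, let
$\mathcal{R}$ be the set of ordered pairs $(s, t)$ for which data are available. The augmented likelihood
is then
\[
f(\mathbf{Y}(\mathcal{R}), \boldsymbol{\psi}(\mathcal{R}) \mid \boldsymbol{\nu}(\mathcal{R})) \propto \exp\left(-\boldsymbol{\nu}(\mathcal{R})'\mathbf{D}_{\boldsymbol{\psi}(\mathcal{R})}\boldsymbol{\nu}(\mathcal{R})/2 + \boldsymbol{\kappa}(\mathcal{R})'\boldsymbol{\nu}(\mathcal{R})\right) \prod_{(s,t) \in \mathcal{R}} p(\psi_{st} \mid n_{st}, 0),
\]
where $\boldsymbol{\nu}(\mathcal{R}) = \mathbf{Z}(\mathcal{R})\boldsymbol{\delta} + \mathbf{X}(\mathcal{R})\widetilde{\mathbf{b}} + \mathbf{I}(\mathcal{R})\boldsymbol{\xi}$ and we use the convention that $\mathbf{A}(\mathcal{R})$ is the
matrix formed by retaining the rows of $\mathbf{A}$ whose indices are in $\mathcal{R}$. Here
$\mathbf{Z} = (\mathbf{Z}_1' \cdots \mathbf{Z}_S')' \in \mathbb{R}^{ST \times (Q+1)}$ with $\mathbf{Z}_s = (\mathbf{Z}_{s1} \cdots \mathbf{Z}_{sT})'$. Similarly,
$\mathbf{X} = \bigoplus_{s=1}^{S} \mathbf{X}_s \in \mathbb{R}^{ST \times SP}$ with $\mathbf{X}_s = (\mathbf{X}_{s1}, \ldots, \mathbf{X}_{sT})'$, $\mathbf{I}$ is the identity matrix, and
$\widetilde{\mathbf{b}} = (\widetilde{\boldsymbol{\beta}}'(\boldsymbol{\ell}_1), \ldots, \widetilde{\boldsymbol{\beta}}'(\boldsymbol{\ell}_S))' \in \mathbb{R}^{SP}$. Since $\boldsymbol{\xi} \in \mathbb{R}^{ST}$
is the vector of spatial random effects over all locations within the study region for all
time points, we obtain a well-defined full conditional distribution for $\boldsymbol{\xi}$, provided that the
prior on $\boldsymbol{\xi}$ is proper. This representation of the joint density allows the model to be extended
to the entire study region by imputing the missing effects via posterior realizations.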
4. A Simulation Study
In this section, we study how well the proposed method estimates model coefficients and
how GPP knot selection influences results via simulation. Data are generated on a regularly
spaced $13 \times 13$ grid over 60 time points by drawing
$Y_{st} \mid n_{st}, p_{st} \overset{\text{indep.}}{\sim} \text{Binomial}(n_{st}, p_{st})$, where
\[
g^{-1}(p_{st}) = \delta_0 + \widetilde{\beta}_1(\boldsymbol{\ell}_s)\, t/60 + \xi_{st}, \quad s = 1, \ldots, 13^2;\; t = 1, \ldots, 60,
\]
and $g(\cdot)$ is the logistic link. The test counts $n_{st}$ are randomly sampled from a discrete uniform
distribution ranging from 100 to 200. The random effects $\xi_{st}$ are generated from the CAR
model defined in Section 2, with $\zeta = 0.9$, $\tau^2 = 0.005$, $\omega = 0.9$, and the neighborhood matrix
$\mathbf{W}$ set so that two areas are neighbors if and only if they share a common edge or corner.
The true intercept is $\delta_0 = -1$ and the surface $\widetilde{\beta}_1(\cdot)$ at each study location is generated from
a GPP model. In particular, a realization of the parent process is first simulated on a $5 \times 5$
grid of equally spaced knots. The parent process has $\mu_1(\boldsymbol{\ell}_s^*) \equiv 1$ and
$\rho(\boldsymbol{\ell}_s^*, \boldsymbol{\ell}_{s'}^*; \theta_1) = \theta_1^{d_{ss'}^2}$,
where $d_{ss'}$ is the Euclidean distance between $\boldsymbol{\ell}_s^*$ and $\boldsymbol{\ell}_{s'}^*$, $\theta_1 = 0.6$ and $\sigma_1^2 = 1.5$. The resulting
true surface $\widetilde{\beta}_1(\cdot)$ is depicted in Figure 2. Using this surface, 500 independent data sets are
generated from the assumed data generating model.
We fit our model to each of the 500 data sets using three separate knot set configurations.
The first configuration uses the same knots as those used to generate the true surface,
representing an ideal situation. The other two configurations take $4 \times 4$ and $7 \times 7$ grids
of equally spaced knots. For the model (4) priors, we take
$\alpha_{\sigma_1^2} = \upsilon_{\sigma_1^2} = \alpha_{\tau^2} = \upsilon_{\tau^2} = 2$,
$\sigma_\delta^2 = 1000$, $\alpha_\omega = 900$, $\upsilon_\omega = 100$, and $\sigma_\zeta^2 = 10$. In the GPP, the correlation function is taken
to be $\rho(\boldsymbol{\ell}_s, \boldsymbol{\ell}_{s'}; \theta_1) = \theta_1^{d_{s,s'}^2}$, the same as the true GPP, and we specify a Uniform(0,1) prior
on $\theta_1$. For each data set, we retain 5,000 MCMC iterates after a burn-in of 5,000 samples.
Convergence of the chains was assessed via trace plots.
Figure 3 displays a summary of the simulation results for the temporal trend parameter
$\widetilde{\beta}_1(\cdot)$. This summary includes a spatial depiction of the arithmetic average of the 500 point
estimates, as well as empirical bias and mean squared error, where for each simulated data
set a point estimate of $\widetilde{\beta}_1(\cdot)$ was obtained as the mean of the 5,000 retained MCMC iterates.
The results suggest that our model estimates the spatially varying regression coefficient well;
i.e., the mean estimates show little bias. The variability of the estimators tends to increase
near the region’s edges. This boundary effect is expected and is common to non-parametric
smoothers. Further, little difference between the estimates obtained under the three different
knot configurations is seen, demonstrating that the methods can recover the true coefficient
surface across the entire study region (assuming the model is correct up to choice of knots).
5. Lyme Analysis
5.1 Background
Our data consist of 16,571,562 test results from domestic dogs living throughout the conterminous
United States from January 2012 through December 2016. The data were provided by
IDEXX Laboratories, Inc. to the Companion Animal Parasite Council (CAPC), who made
them available online at https://www.capcvet.org. The data are aggregated by month
and county, resulting in 69,876 county-month pairs reporting data.
In general, the spatial distribution of a vector-borne disease is strongly influenced by the
environment and the vector’s hosts, leading to correlated data (Legendre, 1993). Indeed, a
strong spatial correlation is seen in these data, as indicated by Figure 1 and a Moran’s I
statistic of 0.378 (p-value ≈0). Such data are also positively temporally correlated. Figure
4 displays raw county-level seroprevalence estimates over all months in the respective years
of 2012 and 2016. These figures suggest regions where a significant increase in seroprevalence is
expected, including western Pennsylvania, Virginia, West Virginia, Minnesota, and Iowa.
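For readers unfamiliar with the Moran's I statistic quoted above, the following sketch computes it for a toy example. The paper does not state which spatial weights were used, so the binary adjacency weights here are an assumption, as are the four-region chain and the function name `morans_i`; no p-value is computed.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I = (N / sum(W)) * (z' W z) / (z' z), with z the centered values."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Toy neighborhood: four regions arranged in a chain 0 - 1 - 2 - 3.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0

alternating = morans_i([1.0, -1.0, 1.0, -1.0], W)  # neighbors always differ
gradient = morans_i([1.0, 2.0, 3.0, 4.0], W)       # neighbors are similar
```

Values near zero indicate little spatial association, positive values indicate that neighboring regions tend to have similar values, and negative values indicate that neighbors tend to differ. In the toy example the alternating pattern gives a negative statistic and the smooth gradient a positive one.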
5.2 Model Building and Seasonality
Given the seasonality of Ixodes spp. activity, seasonality could manifest itself in B. burgdorferi
seroprevalence. To investigate this, the model
\[
\nu_{st} = \delta_0 + \widetilde{\beta}_1(\boldsymbol{\ell}_s) I_1(t) + \widetilde{\beta}_2(\boldsymbol{\ell}_s) I_2(t) + \widetilde{\beta}_3(\boldsymbol{\ell}_s) I_3(t) + \widetilde{\beta}_4(\boldsymbol{\ell}_s)\, t + \xi_{st} \tag{6}
\]
is posited, where $t$ denotes time (rescaled to the unit interval) and $I_p(t)$ is a seasonal indicator,
for $p = 1, 2, 3$. Seasons are defined as follows: winter (December-February), spring (March-May),
summer (June-August), and fall (September-November), where winter is regarded as the
baseline. This model allows for spatially varying seasonal effects and spatially varying trend
effects. While covariates such as county level temperatures and precipitations are available,
these are not used in the regression since our goal is to quantify any trends, not determine
specific drivers of any trends.
The model in (6) was fit with the prior specifications and correlation function described
in Section 4. Two specifications for the GPP model are considered, using 50 and 100 knots,
respectively. In both cases, knot placement for all GPP models is done by K-means clustering
as described in Section 2. For sampling, 30,000 MCMC iterates are generated, with the
last 10,000 retained for inference. Convergence of the MCMC chains was assessed using
trace plots. We stress the computational scalability of this approach. This model consists of
four a priori independent coefficient surfaces, each with 3109 spatial locations, and 186,540
spatio-temporal random effects.
Two primary findings arise. First, there are no appreciable differences between the es-
timates using 50 and 100 knots. As both specifications are computationally feasible, all
subsequent analyses use 100 knots. Second, there is evidence of seasonality in the location
parameters, but these appear constant across space. Thus, the simpler model
\[
\nu_{st} = \delta_0 + \delta_1 I_1(t) + \delta_2 I_2(t) + \delta_3 I_3(t) + \widetilde{\beta}_1(\boldsymbol{\ell}_s)\, t + \xi_{st} \tag{7}
\]
was fit. Credible intervals at level 95% indicate that the model can be further reduced to
\[
\nu_{st} = \delta_0 + \delta_1 I_1^*(t) + \widetilde{\beta}_1(\boldsymbol{\ell}_s)\, t + \xi_{st}, \tag{8}
\]
where $I_1^*(t)$ is a seasonal indicator that equals one if $t$ is between March and November,
and zero otherwise. Approximate 95% credible intervals for $\delta_0$ and $\delta_1$ are $[-3.95, -3.82]$ and
$[-0.20, -0.10]$, respectively.
For further insight, the model in (8) is compared against the nonseasonal model
\[
\nu_{st} = \delta_0 + \widetilde{\beta}_1(\boldsymbol{\ell}_s)\, t + \xi_{st}. \tag{9}
\]
For this model, an approximate 95% credible interval for $\delta_0$ is $[-4.08, -4.03]$. Figure 5
displays the estimates of $\widetilde{\beta}_1(\cdot)$ from both models. Very similar large-scale patterns in the
estimated trends are seen; hence, while seasonality exists in the location parameters, its
effect on trends seems negligible.
The spatial trend $\widetilde{\beta}_1(\cdot)$ is a large-scale regional trend with low spatial frequency, and
is not intended to explain local (county-level) trends. While regional trends are useful for
estimating behavior in areas reporting little data, it may be desirable to separate local
heterogeneity in the trends and provide a county level assessment. Our proposed modeling
framework can accomplish this. Specifically, let $\upsilon_s^{(g)}$, for posterior sample realization
$g = 1, \ldots, G$, be the slope estimate obtained at county $s$ by fitting a simple linear regression to
$\{(t, \nu_{st}^{(g)}) : t = 1, \ldots, T\}$. Then $\upsilon_s^{(g)}$ can be regarded as a Monte Carlo realization of the
county level trend. Using the $\upsilon_s^{(g)}$ as a random sample, point estimation and inference for
county level trends proceeds in the usual way.
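The county-level trend calculation can be sketched as follows. The array of posterior draws is synthetic (a hypothetical stand-in for real MCMC output), and the function name `county_trends`, the array layout, and the example slopes are assumptions for illustration.

```python
import numpy as np

def county_trends(nu_draws):
    """nu_draws has shape (G, S, T): G posterior draws of nu_st.
    Returns a (G, S) array of least-squares slopes of nu_st on t = 1, ..., T."""
    T = nu_draws.shape[-1]
    t = np.arange(1, T + 1, dtype=float)
    tc = t - t.mean()
    # OLS slope = sum_t (t - tbar) * nu_t / sum_t (t - tbar)^2.
    return nu_draws @ tc / (tc @ tc)

rng = np.random.default_rng(2)
G, S, T = 200, 4, 60
true_slopes = np.array([0.00, 0.01, 0.02, -0.01])   # hypothetical county trends
t = np.arange(1, T + 1)
# Synthetic "posterior draws": a linear trend plus noise for each county.
nu_draws = (-4.0 + true_slopes[None, :, None] * t[None, None, :]
            + 0.05 * rng.standard_normal((G, S, T)))

ups = county_trends(nu_draws)                        # upsilon_s^(g), shape (G, S)
post_mean = ups.mean(axis=0)                         # point estimate per county
lower, upper = np.percentile(ups, [2.5, 97.5], axis=0)  # equal-tailed 95% interval
significant_increase = lower > 0
```

Because the slopes are computed draw by draw, their spread across the $G$ draws carries the posterior uncertainty in $\nu_{st}$ through to the county-level trend, which is what makes the equal-tailed interval a reasonable basis for flagging counties with increasing trends.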
5.3 Results
Figure 5 displays the estimated posterior mean of the regional trend $\widetilde{\beta}_1$. The regional rate of
change in B. burgdorferi seroprevalence between January 2012 and December 2016 is positive
in all states that are currently recognized as having high human Lyme disease incidence
(Centers for Disease Control and Prevention, 2017), including portions of the Northeast
and the Upper Midwest. The regional rate of increase varies spatially, with high incidence
regions generally exhibiting the greatest changes. These regions include Maine, south to
West Virginia and Virginia, and northern parts of Minnesota and Wisconsin.
Figure 6 displays estimated posterior means of the county-level trend $\upsilon_s$. Significantly
increasing local trends are seen in much of the Northeast, extending southwards through
West Virginia and Virginia and into North Carolina and Tennessee. This conclusion is not
surprising as this region entails localities where Lyme disease has been reported in increasing
incidence. Also seen are increasing local trends in parts of northwestern Minnesota, north-
ern Wisconsin, and southeastern Iowa. In the Great Lakes region, increasing trends are
observed in eastern Ohio, Indiana, and western Michigan. Of note, in much of eastern New
England, which is the region where Lyme disease first emerged in people, the prevalence appears to
be remaining stable, albeit high. Figure 7 graphically depicts counties where local trends
are significantly positive, using approximate 95% equal-tailed credible intervals to assess
significance. This graphic further supports the above statements.
6. Discussion
This paper develops a computationally feasible binomial regression model for a large spatio-
temporal data set that can identify localized trends in canine seroprevalence. Our novel
approach combines several recent advances in large-scale spatial modeling and MCMC sam-
pling. The end product is a flexible, scalable methodology for modern spatio-temporal data.
Our approach was used to identify regions of the U.S. experiencing increasing canine risk
for B. burgdorferi infection. Since human and canine risk are similar, these regions are likely
experiencing increasing human exposure as well. And while human Lyme disease data may be
private and in many regions scarce due to lack of testing, our canine seroprevalence data con-
sist of over 16 million spatio-temporally referenced test results. The size of the domain (3109
spatial locations and 60 time points) creates computational challenges. While monthly and
county-level aggregation reduces the size of the response vector from 16,581,562 test results
to 69,876 county-month pairs, a binomial response in an MCMC context typically requires
sampling via Metropolis-Hastings steps, which can be difficult to tune in extremely high
dimensions (over 180,000, in our case). Under the logistic link, a recently proposed
Pólya-Gamma data augmentation was used to facilitate direct Gibbs sampling on full conditional
distributions. Gaussian predictive processes were used to model smoothly varying, high-
dimensional coefficients through a low-dimensional representation. Local spatio-temporal
heterogeneity was modeled by random effects following a time-varying Gaussian CAR distri-
bution. Chromatic sampling was used on GMRFs to construct an efficient MCMC algorithm.
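To make the data augmentation step concrete, the sketch below runs a Gibbs sampler for a plain binomial logistic regression using the Pólya-Gamma scheme of Polson et al. (2013). It is a simplified illustration, not the model fitted in this paper: it omits the predictive processes, the CAR random effects, and chromatic sampling, and it approximates Pólya-Gamma draws by truncating their infinite gamma-sum representation at an arbitrarily chosen 200 terms. All function names, the prior variance, and the simulated data are assumptions.

```python
import numpy as np

def sample_pg_approx(b, c, rng, n_terms=200):
    """Approximate draws from PG(b, c) via a truncated sum of gamma variables.

    Truncation is an assumption made for brevity; an exact Polya-Gamma
    sampler would be preferable in practice.
    """
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b[:, None], scale=1.0, size=(b.size, n_terms))
    denom = (k - 0.5) ** 2 + (c[:, None] / (2.0 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def gibbs_binomial_logit(X, y, n, n_iter=400, burn=100, prior_var=100.0, seed=0):
    """Gibbs sampler for y_i ~ Binomial(n_i, expit(x_i' beta)), beta ~ N(0, prior_var * I)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    beta = np.zeros(p)
    kappa = y - n / 2.0                      # "centered" responses
    prior_prec = np.eye(p) / prior_var
    kept = []
    for it in range(n_iter):
        # Step 1: latent variables, omega_i | beta ~ PG(n_i, x_i' beta).
        omega = sample_pg_approx(n, X @ beta, rng)
        # Step 2: beta | omega is Gaussian with precision X' Omega X + prior precision.
        prec = X.T @ (omega[:, None] * X) + prior_prec
        chol = np.linalg.cholesky(prec)      # prec = chol @ chol.T
        mean = np.linalg.solve(prec, X.T @ kappa)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(p))
        if it >= burn:
            kept.append(beta)
    return np.array(kept)

# Simulated data (hypothetical): 300 units with 50 tests each.
rng = np.random.default_rng(1)
n_units = 300
X = np.column_stack([np.ones(n_units), rng.normal(size=n_units)])
true_beta = np.array([-1.0, 0.5])
n = np.full(n_units, 50)
y = rng.binomial(n, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

draws = gibbs_binomial_logit(X, y, n)
posterior_mean = draws.mean(axis=0)
```

Because both full conditionals are standard distributions, no Metropolis-Hastings proposal needs tuning, which is the property the paragraph above relies on in high dimensions.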
The motivation for this study is the rise in reported Lyme disease cases in the United
States (Adams, 2017) and, in particular, rising incidence in states not traditionally considered
to be endemic, such as West Virginia. Our results suggest that 1) canine seroprevalence is
rising in conjunction with reports of human cases (Kugeler et al., 2015; Hendricks and Mark-
Carew, 2017; Centers for Disease Control and Prevention, 2017), 2) rates are increasing
most in areas where the pathogen has recently encroached, and 3) seroprevalence in dogs is
rising outside of the states considered to be high incidence for humans (Centers for Disease
Control and Prevention, 2017) (suggesting that risk may be increasing for humans in those
areas). Several studies have recognized increasing risk in low-incidence areas, as measured by
human disease incidence, tick density, and presence of the pathogen. These areas include
Illinois (Herrmann et al., 2014), Iowa (Lingren et al., 2005), North Dakota (Russart et al.,
2014), Ohio (Wang et al., 2014), and Michigan (Lantos et al., 2017). We also observe
significant increases in canine seroprevalence in several states that have not yet reported
significant human incidence. Given the proximity of these areas to recognized high-incidence
states, it is reasonable to propose that canine seroprevalence is more sensitive to changes in
risk of exposure and thus may be used as an early warning system for changes in human
risk.
Examining local trends, as opposed to regional effects, shows that some adjacent counties
are exhibiting trends in opposite directions. To fully understand this heterogeneity, further
ecological analyses are needed. Possible factors to consider include the presence of urban
centers, degree of forestation or other habitat factors, tick populations, reservoir presence
and densities, vaccination, and preventative medication use in dogs. The latter are likely
driven by socioeconomic factors whereas other factors are related to climate or changing
habitats. Areas with significantly positive trends include the Appalachian mountains from
upstate New York to North Carolina, the Upper Midwest, and Iowa. The West Virginia,
western Pennsylvania, and eastern Ohio regions can be viewed as a leading edge of rising
seroprevalence in Lyme’s westward expansion. This is supported by evidence of increased
reports of ticks in these regions (Eisen et al., 2016).
Our approach makes several simplifying assumptions. We treat the link function in the
GLM as known, and it might be poorly specified. As this can induce bias in the estimates
of the covariate effects (Neuhaus, 1999), relaxing this assumption could be fruitful. We also
assumed that the spatially varying coefficients follow independent and identically distributed
Gaussian processes. A more flexible approach would allow these coefficients to be correlated
through a multivariate GP (Ver Hoef and Barry, 1998), but these are more difficult to use
and challenges remain in their development (e.g., Fricker et al., 2013). The observed sero-
prevalences suggest that smoothness of the random effects may change by region, suggesting
that a heteroskedastic GP might be more appropriate (Binois et al., 2016). Further, GMRFs
are known to oversmooth salient features (Smith and Fahrmeir, 2007) and do not directly
correspond to any valid covariance function in a GP. However, approximating GPs with
GMRFs via stochastic PDEs to maintain computational feasibility (Lindgren et al., 2011)
could be promising for our application. Beyond these statistical challenges, future
applications of our model include human Lyme disease data and heartworm disease, ehrlichiosis,
and anaplasmosis in canines. The ecological, entomological, and environmental implications
of the canine Lyme seroprevalence analysis presented in this work are the subject of ongoing
research.
7. Supplementary Material
The supplementary material for this article includes the full conditional distributions required
to develop the proposed posterior sampling procedure.
Acknowledgements:
The authors thank IDEXX Laboratories, Inc. for their data contribution. This mate-
rial is based upon work partially supported by the National Science Foundation Grants
DMS-1127914, DMS-1407480, CMMI-1563435, and EEC-1744497, and National Institutes
of Health Grant R01 AI121351. JRG is supported by The Boehringer Ingelheim Vetmedica-
CAPC Infectious Disease Postdoctoral Fellowship.
References
Adams, D. A. (2017). Summary of notifiable infectious diseases and conditions - United
States, 2015. Morbidity and Mortality Weekly Report, 64.
Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response
data. Journal of the American Statistical Association, 88(422):669–679.
Andrade-Pacheco, R., Mubangizi, M., Quinn, J., and Lawrence, N. (2015). Monitoring
short term changes in infectious disease in Uganda with Gaussian processes. In Douzal-
Chouakria, A., Vilar, J. A., and Marteau, P.-F., editors, Advanced Analysis and Learning
on Temporal Data, pages 95–110. Springer.
Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2015). Hierarchical Modeling and Analysis
for Spatial Data. Chapman and Hall/CRC, Boca Raton, 2nd edition.
Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). Gaussian predictive
process models for large spatial datasets. Journal of the Royal Statistical Society: Series
B (Methodological), 70(4):825–48.
Bayarri, M. J., Berger, J. O., Paulo, R., Sacks, J., Cafeo, J. A., Cavendish, J., Lin, C.-
H., and Tu, J. (2007). A framework for validation of computer models. Technometrics,
49(2):138–154.
Berger, J. O., de Oliveira, V., and Sanso, B. (2001). Objective Bayesian analysis of spatially
correlated data. Journal of the American Statistical Association, 96(456):1361–1374.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal
of the Royal Statistical Society: Series B (Methodological), 36(2):192–236.
Besag, J. and Kooperberg, C. (1995). On conditional and intrinsic autoregressions.
Biometrika, 82:733–746.
Binois, M., Gramacy, R. B., and Ludkovski, M. (2016). Practical heteroskedastic Gaussian
process modeling for large simulation experiments. arXiv preprint 1611.05902.
Brezger, A., Fahrmeir, L., and Hennerfeind, A. (2007). Adaptive Gaussian Markov random
fields with applications in human brain mapping. Journal of the Royal Statistical Society:
Series C (Applied Statistics), 56(3):327–345.
Brown, D. A., Datta, G. S., and Lazar, N. A. (2017a). A Bayesian generalized CAR model
for correlated signal detection. Statistica Sinica, 27:1125–1153.
Brown, D. A., McMahan, C. S., and Watson, S. C. (2017b). Sampling strategies for fast
updating of Gaussian Markov random fields. arXiv preprint 1702.05518.
Centers for Disease Control and Prevention (2017). Data and statistics: Lyme disease.
Cressie, N. (1993). Statistics for Spatial Data. Wiley Press.
Cressie, N. and Johannesson, G. (2008). Fixed rank kriging for very large spatial datasets.
Journal of the Royal Statistical Society: Series B (Methodological), 70:209–226.
Cressie, N. and Wikle, C. K. (2011). Statistics for Spatio-temporal Data. Wiley Press.
Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016). Hierarchical nearest-
neighbor Gaussian process models for large geostatistical datasets. Journal of the American
Statistical Association, 111:800–812.
Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1998). Model-based geostatistics. Journal of
the Royal Statistical Society: Series C (Applied Statistics), 47(3):299–350.
Eidsvik, J., Finley, A. O., Banerjee, S., and Rue, H. (2012). Approximate Bayesian inference
for large spatial datasets using predictive process models. Computational Statistics and
Data Analysis, 56(6):1362 – 1380.
Eisen, R. J., Eisen, L., and Beard, C. B. (2016). County-scale distribution of Ixodes scapularis
and Ixodes pacificus (Acari : Ixodidae) in the continental United States. Journal of Medical
Entomology, 53(January):349–386.
Finley, A. O., Sang, H., Banerjee, S., and Gelfand, A. E. (2009). Improving the perfor-
mance of predictive process modeling for large datasets. Computational Statistics and
Data Analysis, 53(8):2873 – 2884.
Fricker, T. E., Oakley, J. E., and Urban, N. M. (2013). Multivariate Gaussian process
emulators with nonseparable covariance structures. Technometrics, 55(1):47–56.
Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance tapering for interpolation of
large spatial datasets. Journal of Computational and Graphical Statistics, 15:502–523.
Furrer, R. and Sain, S. R. (2010). spam: A sparse matrix R package with emphasis on MCMC
methods for Gaussian Markov random fields. Journal of Statistical Software, 36(10):1–25.
Gamerman, D. (1997). Sampling from the posterior distribution in generalized linear mixed
models. Statistics and Computing, 7(1):57–68.
Gelfand, A. E., Kim, H. J., Sirmans, C. F., and Banerjee, S. (2003). Spatial modeling with
spatially varying coefficient processes. Journal of the American Statistical Association,
98(462):387–396.
Gelfand, A. E. and Schliep, E. M. (2016). Spatial statistics and Gaussian processes: A
beautiful marriage. Spatial Statistics, 18(Part A):86 – 104.
Gonzalez, J., Low, Y., Gretton, A., and Guestrin, C. (2011). Parallel Gibbs sampling:
From colored fields to thin junction trees. In Gordon, G., Dunson, D., and Dudík, M.,
editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence
and Statistics, volume 15 of Proceedings of Machine Learning Research, pages 324–332,
Fort Lauderdale, FL, USA. PMLR.
Gramacy, R. and Apley, D. (2015). Local Gaussian process approximation for large computer
experiments. Journal of Computational and Graphical Statistics, 24:561–578.
Guhaniyogi, R., Finley, A. O., Banerjee, S., and Gelfand, A. E. (2011). Adaptive Gaussian
predictive process models for large spatial datasets. Environmetrics, 22(8):997–1007.
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm.
Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108.
Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their appli-
cation. Biometrika, 57:97–109.
Heaton, M. J., Christensen, W. F., and Terres, M. A. (2017a). Nonstationary Gaussian
process models using spatial hierarchical clustering from finite differences. Technometrics,
59:93–101.
Heaton, M. J., Datta, A., Finley, A., Furrer, R., Guhaniyogi, R., Gerber, R., Gramacy, R.,
et al. (2017b). Methods for analyzing large spatial data: A review and comparison. arXiv
preprint 1710.05013.
Hendricks, B. and Mark-Carew, M. (2017). Using exploratory data analysis to identify
and predict patterns of human Lyme disease case clustering within a multistate region,
2010–2014. Spatial and Spatio-temporal Epidemiology, 20:35–43.
Herrmann, J. A., Dahm, N. M., Ruiz, M. O., and Brown, W. M. (2014). Temporal and spatial
distribution of tick-borne disease cases among humans and canines in Illinois (2000–2009).
Environmental Health Insights, 8(Supplement 2):15.
Johnson, L. R., Gramacy, R. B., Cohen, J., Mordecai, E. A., Murdock, C., Rohr, J., Ryan,
S. J., Stewart-Ibarra, A. M., and Weikel, D. (2017). Phenomenological forecasting of
disease incidence using heteroskedastic Gaussian processes: A dengue case study. Annals
of Applied Statistics, In Press.
Jona-Lasinio, G., Gelfand, A., and Jona-Lasinio, M. (2012). Spatial analysis for wave direction
data using wrapped Gaussian processes. Annals of Applied Statistics, 6(4):1478–1498.
Katzfuss, M. (2017). A multi-resolution approximation for massive spatial datasets. Journal
of the American Statistical Association, 112:201–214.
Kugeler, K. J., Farley, G. M., Forrester, J. D., and Mead, P. S. (2015). Geographic distribu-
tion and expansion of human Lyme disease, United States. Emerging Infectious Diseases,
21(8):1455–1457.
Lantos, P. M., Tsao, J., Nigrovic, L. E., Auwaerter, P. G., Fowler, V. G., Ruffin, F., Foster,
E., and Hickling, G. (2017). Geographic expansion of Lyme disease in Michigan, 2000–
2014. In Open Forum Infectious Diseases, volume 4. Oxford University Press.
Lazar, N. A. (2008). The Statistical Analysis of Functional MRI Data. Springer Science +
Business Media, LLC, New York.
Legendre, P. (1993). Spatial autocorrelation: Trouble or new paradigm? Ecology,
74(6):1659–1673.
Levy, S. and Magnarelli, L. (1992). Relationship between development of antibodies to
Borrelia burgdorferi in dogs and the subsequent development of limb/joint borreliosis.
Journal of the American Veterinary Medical Association, 200(3):344–347.
Lindgren, F., Rue, H., and Lindström, J. (2011). An explicit link between Gaussian fields
and Gaussian Markov random fields: The stochastic partial differential equation approach.
Journal of the Royal Statistical Society: Series B (Methodological), 73(4):423–498.
Lingren, M., Rowley, W., Thompson, C., and Gilchrist, M. (2005). Geographic distribution
of ticks (Acari: Ixodidae) in Iowa with emphasis on Ixodes scapularis and their infection
with Borrelia burgdorferi. Vector-Borne and Zoonotic Diseases, 5(3):219–226.
Little, S. E., Heise, S. R., Blagburn, B. L., Callister, S. M., and Mead, P. S. (2010). Lyme
borreliosis in dogs and humans in the USA. Trends in Parasitology, 26(4):213–218.
Mead, P., Goel, R., and Kugeler, K. (2011). Canine serology as adjunct to human Lyme
disease surveillance. Emerging Infectious Diseases, 17(9):1710.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953).
Equation of state calculations by fast computing machines. Journal of Chemical Physics,
21:1087–1091.
Morales-Álvarez, P., Pérez-Suay, A., Molina, R., and Camps-Valls, G. (2017). Remote sensing
image classification with large-scale Gaussian processes. IEEE Transactions on Geoscience
and Remote Sensing, PP(99):1–12.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and
Hall/CRC, London, 2nd edition.
Neal, R. M. (1998). Regression and classification using Gaussian process priors. In Bernardo,
J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M., editors, Bayesian Statistics 6.
Oxford University Press, New York.
Nelder, M. P., Russell, C. B., Sheehan, N. J., Sander, B., Moore, S., Li, Y., Johnson, S.,
Patel, S. N., and Sider, D. (2016). Human pathogens associated with the blacklegged tick
Ixodes scapularis: A systematic review. Parasites and Vectors, 9(1):265.
Neuhaus, J. M. (1999). Bias and efficiency loss due to misclassified responses in binary
regression. Biometrika, 86(4):843–855.
Nychka, D., Bandyopadhyay, S., Hammerling, D., Lindgren, F., and Sain, S. (2015). A
multiresolution Gaussian process model for the analysis of large spatial datasets. Journal
of Computational and Graphical Statistics, 24:579–599.
O’Hagan, A. (1978). Curve fitting and optimal design for prediction. Journal of the Royal
Statistical Society: Series B (Methodological), 40:1–42.
Polson, N. G., Scott, J. G., and Windle, J. (2013). Bayesian inference for logistic models
using Pólya-Gamma latent variables. Journal of the American Statistical Association,
108(504):1339–1349.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning.
The MIT Press.
Rue, H. and Held, L. (2005). Gaussian Markov random fields: theory and applications.
Chapman and Hall/CRC.
Rue, H. and Tjelmeland, H. (2002). Fitting Gaussian Markov random fields to Gaussian fields.
Scandinavian Journal of Statistics, 29:31–49.
Russart, N. M., Dougherty, M. W., and Vaughan, J. A. (2014). Survey of ticks (Acari:
Ixodidae) and tick-borne pathogens in North Dakota. Journal of Medical Entomology,
51(5):1087–1090.
Sang, H., Jun, M., and Huang, J. Z. (2011). Covariance approximation for large multivariate
spatial datasets with an application to multiple climate model errors. Annals of Applied
Statistics, 5(4):2519–2548.
Santner, T. J., Williams, B. J., and Notz, W. I. (2003). The Design and Analysis of Computer
Experiments. Springer-Verlag, New York.
Senanayake, R., O’Callaghan, S., and Ramos, F. (2016). Predicting spatio-temporal propa-
gation of seasonal influenza using variational Gaussian process regression. In Proceedings
of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pages 3901–3907.
AAAI Press.
Smith, M. and Fahrmeir, L. (2007). Spatial Bayesian variable selection with application to
functional magnetic resonance imaging. Journal of the American Statistical Association,
102(478):417–431.
Song, H.-R., Fuentes, M., and Ghosh, S. (2008). A comparative study of Gaussian geo-
statistical models and Gaussian Markov random fields. Journal of Multivariate Analysis,
99:1681–1697.
Ver Hoef, J. M. and Barry, R. P. (1998). Constructing and fitting models for cokriging and
multivariable spatial prediction. Journal of Statistical Planning and Inference, 69:275–294.
Wagner, B., Freer, H., Rollins, A., Garcia-Tapia, D., Erb, H. N., Earnhart, C., Marconi, R.,
and Meeus, P. (2012). Antibodies to Borrelia burgdorferi OspA, OspC, OspF, and C6
antigens as markers for early and late infection in dogs. Clinical and Vaccine Immunology,
19(4):527–535.
Waller, L. A., Carlin, B. P., Xia, H., and Gelfand, A. E. (1997). Hierarchical spatio-temporal
mapping of disease rates. Journal of the American Statistical Association, 92(438):607–
617.
Wang, P., Glowacki, M. N., Hoet, A. E., Needham, G. R., Smith, K. A., Gary, R. E., and
Li, X. (2014). Emergence of Ixodes scapularis and Borrelia burgdorferi, the Lyme disease
vector and agent, in Ohio. Frontiers in Cellular and Infection Microbiology, 4.
Yue, Y. and Speckman, P. L. (2010). Nonstationary spatial Gaussian Markov random fields.
Journal of Computational and Graphical Statistics, 19(1):96–116.
Zhang, H. and El-Shaarawi, A. (2009). On spatial skew-Gaussian processes and applications.
Environmetrics, 21(1):33–47.
Figure 1: Observed seroprevalence of B. burgdorferi, aggregated over January 2012 to De-
cember 2016. White counties are those that did not report any test results.
Figure 2: The true $\tilde{\beta}_1$ surface used to generate 500 independent data sets in the simulation
example.
Figure 3: Summary of the posterior estimates of $\tilde{\beta}_1$ obtained in the simulation example.
Presented results include the sample mean of the posterior estimates (top row), empirical
bias (middle row), and empirical mean squared error (bottom row). From left to right the
columns correspond to the use of a 4 × 4, 5 × 5, and 7 × 7 grid of knots.
Figure 4: Raw reported canine seroprevalences in 2012 (top) and 2016 (bottom). White
counties did not report any tests.
Figure 5: Estimate of the regional trend $\tilde{\beta}_1$ from the seasonal model (8) (top) and nonseasonal
model (9) (bottom) used to analyze the seroprevalence data.
Figure 6: County-level trends. The top graphic displays the posterior mean estimate of υs
from model (8), and the bottom from model (9).
[Map legends: ten classes each, with shared breaks at -1.69, -1.24, -0.85, -0.55, -0.30, -0.05,
0.27, 0.68, and 1.26; the overall ranges are -3.88 to 2.77 and -3.90 to 2.86.]
Figure 7: Counties where υs was significantly positive at the 95% confidence level. The top
graphic corresponds to model (8), and bottom to model (9).
[Map legend: Not Significantly Increasing; Significantly Increasing.]