Dynamic Population Models with Temporal Preferential Sampling to Infer Phenology

Michael R. Schwob, Mevin B. Hooten, Travis McDevitt-Galles

Abstract

To study population dynamics, ecologists and wildlife biologists typically use relative abundance data, which may be subject to temporal preferential sampling. Temporal preferential sampling occurs when the times at which observations are made and the latent process of interest are conditionally dependent. To account for preferential sampling, we specify a Bayesian hierarchical abundance model that considers the dependence between observation times and the ecological process of interest. The proposed model improves relative abundance estimates during periods of infrequent observation and accounts for temporal preferential sampling in discrete time. Additionally, our model facilitates posterior inference for population growth rates and mechanistic phenometrics. We apply our model to analyze both simulated data and mosquito count data collected by the National Ecological Observatory Network. In the second case study, we characterize the population growth rate and relative abundance of several mosquito species in the Aedes genus. Supplementary materials accompanying this paper appear on-line.
Key words: abundance, ignorability, mechanistic modeling, phenometrics, time series
arXiv:2212.05180v2 [stat.ME] 13 Dec 2022
1 Introduction
The timing of ecological events is a fundamental question in the field of phenology (Forrest
& Miller-Rushing 2010). Conventional approaches to studying phenology involve summary
statistics, or phenometrics, that capture the timing of cyclical biological events (Demidova
et al. 2021). Phenometrics for population dynamics may include the first or last date of
positive population growth (Anderson et al. 2013, Bewick et al. 2016). By contrast, ecologists
commonly use hierarchical temporal models to learn about how and why population size
changes over time (Potts & Elith 2006). We seek to make both phenological and ecological
inference on population dynamics using modern hierarchical Bayesian models that allow for
embedded mechanistic and observation processes. We specify the model in such a way to
extract relevant phenometrics and associated uncertainties as derived quantities based on our
defined mechanistic process (i.e., the latent population growth rate).
Ecologists working in temperate environments tend to design field studies that align with
the growing season (Pau et al. 2011). The resulting data sets are collected using preferential
sampling designs, which may induce nonignorable missingness (Zachmann et al. 2022). In the
presence of temporal preferential sampling, inference during periods of infrequent observation
will be biased (Monteiro et al. 2019). We adapt preferential sampling approaches for the
discrete temporal domain to improve posterior inference on the abundance process during
periods of infrequent observation.
Remedies for spatial preferential sampling have been more popular than those for temporal
preferential sampling in ecology (Watson 2021). However, phenological studies may be subject
to temporal preferential sampling, including frequent and short gaps (Zhang et al. 2009),
fewer and longer gaps (Roleček et al. 2007), and asynchronous and inconsistent gaps in data
collection (Cole et al. 2012). Records of population dynamics have infrequent and long gaps
in observation because the data are collected seasonally. However, the existing remedies to
temporal preferential sampling focus solely on frequent and short gaps in observations (Karcher
et al. 2016, Monteiro et al. 2019, 2020). Furthermore, existing temporal preferential sampling
remedies are only applicable in the continuous-time domain, whereas most abundance studies
collect counts for discrete-time population dynamics. In studies that use hidden Markov
models, temporal preferential sampling is referred to as state-dependent missingness. Several
recent algorithms accommodate state-dependent missingness in observations of homogeneous
Markov chains (Speekenbrink & Visser 2021, Hoskovec et al. 2022); however, the existence of
seasonal variation in population dynamics precludes the use of these algorithms. Therefore,
existing solutions to temporal preferential sampling and state-dependent missingness are not
applicable to phenological or abundance data sets. We account for discrete-time temporal
preferential sampling in the form of a probit regression that is suitable for many phenological
and abundance data sets that exhibit seasonal variation.
We propose a hierarchical Bayesian abundance model in section 2 that characterizes
population dynamics using relative abundance data from irregular surveys. We specify
the process model to characterize the population growth rate and infer phenometrics, and
we modify the data model to account for temporal preferential sampling. In section 3, we
demonstrate the proposed model along with its non-preferential variant on simulated data and
a mosquito case study with data collected by the National Ecological Observatory Network
(NEON). The data collection associated with the NEON study involved temporal preferential
sampling. We seek to determine if posterior inference resulting from a preferential and a
non-preferential model differs under the NEON sampling protocol. We also show that the
bias varies when interpolating population dynamics during infrequently observed periods. In
section 4, we conclude with a discussion of mechanistic modeling and temporal preferential
sampling for phenology and population dynamics.
2 Methods
We let $T$ denote the number of days throughout the study period and $J$ denote the number of
species in the study. We also let $\mathbf{y}(t) \equiv (y_1(t), \ldots, y_J(t))'$ and
$\boldsymbol{\lambda}(t) \equiv (\lambda_1(t), \ldots, \lambda_J(t))'$ denote the observed and true
abundance of all species on day $t \in \{1, \ldots, T\}$ and aim to recover the latent abundance
process $\boldsymbol{\Lambda} \equiv (\boldsymbol{\lambda}(1), \ldots, \boldsymbol{\lambda}(T))$ using
temporal state-space modeling.
The specifications of the observation distribution $[\mathbf{y}(t) \mid \boldsymbol{\lambda}(t)]$ and the
process distribution $[\boldsymbol{\lambda}(t) \mid \boldsymbol{\lambda}(t-1)]$ are fundamental to obtaining
interpretable inference in temporal state-space models. We denote the data model as
$\mathbf{y}(t) = H(\boldsymbol{\lambda}(t); \boldsymbol{\Theta}_d; \boldsymbol{\epsilon}(t))$, where $H$ is an
observation function that maps the latent process $\boldsymbol{\lambda}(t)$ to the data with noise process
$\boldsymbol{\epsilon}(t)$. Similarly, we can write the latent process evolution model as
$\boldsymbol{\lambda}(t) = M(\boldsymbol{\lambda}(t-1); \boldsymbol{\Theta}_e; \boldsymbol{\eta}(t))$, where $M$
is the evolution operator and $\boldsymbol{\eta}(t)$ is a noise process (Wikle & Hooten 2010). In general,
the noise processes may be Gaussian or non-Gaussian. We assume a first-order Markov process for the
evolution model, which is common in ecological studies of population dynamics (Usher 1979).
2.1 The Process Model
We model abundance dynamically and focus on specifications involving mechanistic population
models. A common mechanistic approach to modeling population dynamics is to specify the
latent process evolution as
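$$M(\boldsymbol{\lambda}(t-1); \boldsymbol{\Theta}_e; \boldsymbol{\eta}(t)) = \exp\left(\mathbf{A} \log \boldsymbol{\lambda}(t-1) + \mathbf{B}\mathbf{x}(t) + \boldsymbol{\eta}(t)\right), \tag{1}$$
where $\boldsymbol{\Theta}_e \equiv \{\mathbf{A}, \mathbf{B}\}$, $\boldsymbol{\eta}(t) \sim \text{N}(\mathbf{0}, \sigma^2\mathbf{I})$,
and $\mathbf{x}(t) \equiv (x_1(t), \ldots, x_p(t))'$ are species-independent covariates of interest at time
$t$ (Ross et al. 2015). In the multivariate setting, we let $\exp(\cdot)$ and $\log(\cdot)$ be element-wise
operations over vectors. We let $\mathbf{A} \equiv \text{diag}(\alpha_1, \ldots, \alpha_J)$ and $\mathbf{I}$ be
the $J \times J$ identity matrix. Species $j$ experiences density dependence when $\alpha_j < 1$, positive
association with population density when $\alpha_j > 1$, and density independence when $\alpha_j = 1$
(Dennis et al. 2006). We let $\mathbf{B} \equiv (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J)'$, where
$\boldsymbol{\beta}_j \equiv (\beta_{j,1}, \ldots, \beta_{j,p})$ are the time-invariant coefficients of
covariates $\mathbf{x}$ for species $j$. The process model in (1) can be written as
$$\log \boldsymbol{\lambda}(t) \sim \text{N}\left(\mathbf{A} \log \boldsymbol{\lambda}(t-1) + \mathbf{B}\mathbf{x}(t), \sigma^2\mathbf{I}\right), \tag{2}$$
where the species-level parameters $\boldsymbol{\beta}_j$ convey how the abundance of species $j$ responds to a
change in environment and resources.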
Most phenology studies focus on species with population dynamics that experience a high
degree of seasonality. We modify the process model in (2) for species with population dynamics
that are intrinsically tied to the environment and, therefore, seasonal. The more seasonality
influences population dynamics, the more the dynamic is centered on environmental trends.
We account for the seasonality of population dynamics by anchoring the log-abundance using
the environmental trend Bx(t) such that
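$$\log \boldsymbol{\lambda}(t) - \mathbf{B}\mathbf{x}(t) \sim \text{N}\left(\mathbf{A}\left(\log \boldsymbol{\lambda}(t-1) - \mathbf{B}\mathbf{x}(t-1)\right), \sigma^2\mathbf{I}\right), \tag{3}$$
where $|\alpha_j| < 1$ because the population is assumed to be density dependent (Royama 2012);
this provides a trend-stationary and autoregressive stochastic model for population dynamics
that are heavily influenced by the environment (Shumway & Stoffer 2000). Thus, our process
model for log-abundance can be expressed as
$$\log \boldsymbol{\lambda}(t) \sim \text{N}\left(\mathbf{B}\mathbf{x}(t) - \mathbf{A}\mathbf{B}\mathbf{x}(t-1) + \mathbf{A} \log \boldsymbol{\lambda}(t-1), \sigma^2\mathbf{I}\right), \tag{4}$$
for $t = 2, \ldots, T$. When $t = 1$, we assume
$\log \boldsymbol{\lambda}(1) \sim \text{N}(\boldsymbol{\mu}_1, \sigma_1^2\mathbf{I})$. Although trend-stationary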
stochastic processes are commonly found in econometric studies, they are underutilized in
population dynamic studies for species with seasonal abundance processes. When applicable,
accounting for trend stationarity improves the stability and performance of MCMC algorithms
(McCulloch & Tsay 1994).
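The recursion in (4) can be made concrete with a short simulation. The following is a minimal sketch for a single species ($J = 1$), not the authors' implementation (which the text says was written in Julia). The function name, the sinusoidal covariate standing in for GDD, and the initial-state values `mu1` and `sigma2_1` are illustrative assumptions.

```python
import math
import random


def simulate_log_abundance(x, beta, alpha, sigma2, mu1, sigma2_1, seed=0):
    """Simulate log-abundance for one species under the process model in (4).

    x        : list of covariate vectors x(t), one per day
    beta     : coefficients for the species (same length as each x(t))
    alpha    : autoregressive (density dependence) parameter, |alpha| < 1
    sigma2   : process variance for t >= 2
    mu1, sigma2_1 : mean and variance of the initial state log lambda(1)
    """
    rng = random.Random(seed)
    # Environmental trend B x(t) for every day.
    trend = [sum(b * v for b, v in zip(beta, xt)) for xt in x]
    log_lam = [rng.gauss(mu1, math.sqrt(sigma2_1))]
    for t in range(1, len(x)):
        # Mean in (4): Bx(t) - A Bx(t-1) + A log lambda(t-1).
        mean = trend[t] + alpha * (log_lam[t - 1] - trend[t - 1])
        log_lam.append(rng.gauss(mean, math.sqrt(sigma2)))
    return log_lam


# Hypothetical covariates: an intercept and a sinusoid standing in for GDD.
T = 365
x = [(1.0, 1.0 + math.sin(2 * math.pi * t / 365)) for t in range(T)]
log_lam = simulate_log_abundance(x, beta=(0.1, 0.3), alpha=0.98,
                                 sigma2=0.03, mu1=0.4, sigma2_1=0.03)
lam = [math.exp(v) for v in log_lam]
```

Setting both variances to zero and starting on the trend returns the trend itself, which is one way to see that the process is anchored to $\mathbf{B}\mathbf{x}(t)$.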
The Gompertz form of density dependence has performed well in various population
dynamic studies (Dennis et al. 2006, Knape & de Valpine 2012). Our process model implies
stochastic Gompertz population growth, which can be written as the heterogeneous Malthusian
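growth function $\boldsymbol{\lambda}(t) = (\mathbf{I} + \text{diag}(\mathbf{g}(t)))\boldsymbol{\lambda}(t-1)$,
where $\mathbf{g}(t) \equiv (g_1(t), \ldots, g_J(t))'$. The species-level per capita growth rate on day $t$
can be written as
$$g_j(t) = \exp\left(\boldsymbol{\beta}_j(\mathbf{x}(t) - \alpha_j\mathbf{x}(t-1))\right)\lambda_j(t-1)^{\alpha_j - 1} e^{\epsilon_{j,t}} - 1, \tag{5}$$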
where $\epsilon_{j,t} \sim \text{N}(0, \sigma^2)$. We consider the posterior median of the log-normally
distributed random variable $e^{\epsilon_{j,t}}$, which assumes an absolute loss function; we assume this
loss function because it is less sensitive to skewness resulting from exponentiating the log-abundance
process. Additionally, the posterior median is more efficient and robust than the posterior mean for
log-normally distributed random variables with $\sigma > 0.546$, which is true in our case studies
(Zellner 1971, Rao & D'Cunha 2016). Thus, our derived phenometric is
$$\psi \equiv \min\left\{t \in \{1, \ldots, T\} : \exp\left(\boldsymbol{\beta}_j(\mathbf{x}(t) - \alpha_j\mathbf{x}(t-1))\right)\lambda_j(t-1)^{\alpha_j - 1} - 1 > 0\right\}, \tag{6}$$
which is the first day that the posterior median of $g_j(t)$ is positive. We use posterior densities
to quantify uncertainty for the derived phenometric $\psi$, which is not possible using conventional
phenological approaches. We present the derivation of $g_j(t)$ and its median in Appendix A.
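As an illustration of how the median growth rate and the phenometric in (6) could be evaluated for one set of parameter values, consider the sketch below. The function names are hypothetical, and applying the second function to each MCMC draw to build a posterior for $\psi$ is our reading of the text rather than a documented procedure.

```python
import math


def median_growth_rate(x_t, x_prev, lam_prev, beta, alpha):
    """Median of g_j(t) in (5); the noise factor exp(eps) has median 1."""
    env = sum(b * (xt - alpha * xp) for b, xt, xp in zip(beta, x_t, x_prev))
    return math.exp(env) * lam_prev ** (alpha - 1.0) - 1.0


def first_positive_growth_day(x, lam, beta, alpha):
    """Phenometric in (6): first day (1-indexed) with positive median growth.

    Growth on day t needs day t-1, so the search starts on day 2.
    Returns None when no day has positive median growth.
    """
    for t in range(1, len(x)):
        if median_growth_rate(x[t], x[t - 1], lam[t - 1], beta, alpha) > 0:
            return t + 1
    return None


# Toy example: the second covariate switches on at day 3.
x = [(1.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
lam = [1.0, 1.0, 1.0]
psi = first_positive_growth_day(x, lam, beta=(0.0, 0.5), alpha=0.5)
```

In this toy example the growth rate is exactly zero on day 2 and positive on day 3, so `psi` is 3.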
2.2 The Data Model
For the observation function H, we specify
$$y_j(t) \sim \text{Pois}(\lambda_j(t) \cdot \omega(t)), \tag{7}$$
where $\omega(t)$ captures heterogeneity in sampling effort. The conditional Poisson data model is
a common choice in temporal abundance modeling (Hooten & Hefley 2019). For invertebrate
trapping studies, we specify $\omega(t) = w(t) \cdot h(t)$, where $w(t)$ is the proportion of the trap
sampled and $h(t)$ is the sampling duration of the trap standardized to a single day length in
hours. If sampling effort was constant across traps, we set $\omega(t) = 1, \forall t$.
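The effort-adjusted data model in (7) can be sketched as follows. This is an illustration only: the 24-hour standardization of $h(t)$ is an assumption (the text says only that duration is standardized to a single day length in hours), and the Poisson sampler uses Knuth's multiplication method because the Python standard library has no Poisson generator.

```python
import math
import random


def sampling_effort(proportion_sampled, trap_hours, day_length_hours=24.0):
    """omega(t) = w(t) * h(t); the 24-hour day length is an assumed default."""
    return proportion_sampled * (trap_hours / day_length_hours)


def rpois(rate, rng):
    """Draw one Poisson variate with Knuth's method (suited to modest rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_count(lam_j, omega, rng):
    """y_j(t) ~ Pois(lambda_j(t) * omega(t)) as in (7)."""
    return rpois(lam_j * omega, rng)


rng = random.Random(1)
omega = sampling_effort(proportion_sampled=0.5, trap_hours=48.0)  # equals 1.0
y = simulate_count(lam_j=3.0, omega=omega, rng=rng)
```

A trap left out for two days with half of its contents sorted yields the same effort as a fully sorted one-day trap under this assumed standardization.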
To account for temporal preferential sampling, we introduce new notation and indices for
time. We let $N$ denote the number of days within the study period and $n$ denote the number
of observed days. Further, we let $t_i$ denote the $i$th day of the study and $\tilde{t}_k$ denote the
$k$th observed day of the study. We define $\mathcal{T}$ as the set of observed days, such that
$$\mathcal{T} \equiv \{\tilde{t}_1, \tilde{t}_2, \ldots, \tilde{t}_n\} \subseteq \{t_1, t_2, \ldots, t_N\}. \tag{8}$$
In the event that every day is observed, temporal preferential sampling is not present and
$\mathcal{T} = \{t_1, t_2, \ldots, t_N\}$. We let $\boldsymbol{\tau} \equiv (\tau_1, \tau_2, \ldots, \tau_N)'$ be a
binary vector of length $N$ such that
$$\tau_i = \begin{cases} 1, & t_i \in \mathcal{T} \\ 0, & t_i \notin \mathcal{T} \end{cases}. \tag{9}$$
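Finally, we let $\boldsymbol{\lambda}(t_i)$ denote the abundance processes for all species at time $t_i$.
The abundance processes exist throughout the entire study regardless of which days were observed.
We express the joint distribution of the underlying process $\boldsymbol{\Lambda}$, observation indicators
$\boldsymbol{\tau}$, and the observations of the underlying process
$\mathbf{Y} \equiv (\mathbf{y}(1), \ldots, \mathbf{y}(n))$ as
$$[\boldsymbol{\Lambda}, \boldsymbol{\tau}, \mathbf{Y}] = [\mathbf{Y} \mid \boldsymbol{\tau}, \boldsymbol{\Lambda}][\boldsymbol{\tau} \mid \boldsymbol{\Lambda}][\boldsymbol{\Lambda}], \tag{10}$$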
where “$[\cdot \mid \cdot]$” represents the conditional probability density or distribution function
(Gelfand & Smith 1990). We condition $\boldsymbol{\tau}$ on $\boldsymbol{\Lambda}$ because the observation of
days depends on the process of interest. The observation time model is usually specified as a
discretized continuous-time inhomogeneous Poisson process, which is ideal for continuous time series
with many short gaps of missing data (Karcher et al. 2016, Monteiro et al. 2019). However, it is more
common that there are fewer, longer gaps of missing data in discrete-time population dynamic studies.
Therefore, we specify $[\boldsymbol{\tau} \mid \boldsymbol{\Lambda}]$ for discrete-time abundance counts that
experience long, seasonal gaps in data collection. We consider the following distribution of observed
days conditioned on the process of interest:
$$[\tau_i \mid \boldsymbol{\theta}, \boldsymbol{\lambda}(t_i), \tilde{\lambda}] \equiv \text{Bern}(p(t_i)) \tag{11}$$
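with the probit link function
$$p(t_i) = \Phi\left(\theta_0 + \theta_1 \mathbb{1}\{\boldsymbol{\lambda}(t_i)'\mathbf{1} \geq \tilde{\lambda}\}\right), \tag{12}$$
where $\Phi(\mu)$ is the standard normal cumulative distribution function evaluated at $\mu$ and
$\boldsymbol{\theta} \equiv (\theta_0, \theta_1)'$ are probit regression coefficients. The indicator function
$\mathbb{1}\{\cdot\}$ evaluates to 1 when the number of observed mosquitoes exceeds the threshold
$\tilde{\lambda}$, and 0 otherwise. We specify a prior for the threshold as
$\tilde{\lambda} \sim \text{Ga}(\alpha_\lambda, \beta_\lambda)$.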
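To make the observation-time model in (11)-(12) concrete, the sketch below evaluates $p(t_i)$ and simulates the indicators $\tau_i$ for a known abundance series. The function names and parameter values are hypothetical, and treating the threshold comparison as "greater than or equal to" is an assumption about the relation inside the indicator.

```python
import random
from statistics import NormalDist


def observation_probability(lam_t, theta0, theta1, threshold):
    """p(t_i) in (12); lam_t holds the abundance of every species on day t_i."""
    # Assumed comparison: total abundance across species meets the threshold.
    indicator = 1.0 if sum(lam_t) >= threshold else 0.0
    return NormalDist().cdf(theta0 + theta1 * indicator)


def simulate_tau(lam, theta0, theta1, threshold, seed=0):
    """Draw tau_i ~ Bern(p(t_i)) independently for each day, as in (11)."""
    rng = random.Random(seed)
    return [
        1 if rng.random() < observation_probability(l, theta0, theta1, threshold) else 0
        for l in lam
    ]


# Hypothetical values: days at or above the threshold are observed more often.
lam = [[2.0], [4.0], [12.0], [20.0], [9.0]]
tau = simulate_tau(lam, theta0=-1.0, theta1=2.0, threshold=10.0)
```

With $\theta_0 = -1$ and $\theta_1 = 2$, a day below the threshold is observed with probability $\Phi(-1) \approx 0.16$ and a day at or above it with probability $\Phi(1) \approx 0.84$.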
Probit regression is a common regression technique for analyzing binary ecological data
(Trexler & Travis 1993, Hooten & Hefley 2019). We consider probit regression because we can
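induce dependence of the binary vector $\boldsymbol{\tau}$ on the abundance process using data augmentation
(Albert & Chib 1993). Thus, we introduce the latent variables $z(t_i)$ and model
$[\tau_i \mid \boldsymbol{\theta}, \boldsymbol{\lambda}(t_i), \tilde{\lambda}]$ using
$$\tau_i = \begin{cases} 1, & z(t_i) > 0 \\ 0, & z(t_i) \leq 0 \end{cases}, \tag{13}$$
$$z(t_i) \sim \text{N}\left(\theta_0 + \theta_1 \mathbb{1}\{\boldsymbol{\lambda}(t_i)'\mathbf{1} \geq \tilde{\lambda}\}, 1\right), \tag{14}$$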
which implies the same model as (11)-(12); however, the specification in (13)-(14) results in
conjugate updates for both the latent variables $z(t_i)$ and $\boldsymbol{\theta}$ assuming a multivariate
normal prior on $\boldsymbol{\theta}$. The observations $\mathbf{Y}$ are no longer assumed to be conditionally
independent because we explicitly define the data model to account for the dependence of
$\boldsymbol{\tau}$ on $\boldsymbol{\Lambda}$. To complete the data model, we specify
$y_j(\tilde{t}_k) \sim \text{Pois}(\lambda_j(\tilde{t}_k) \cdot \omega(\tilde{t}_k))$ based on the observations
in set $\mathcal{T}$.
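The data-augmentation step of Albert & Chib (1993) draws each $z(t_i)$ from a unit-variance normal truncated according to $\tau_i$. A minimal sketch of that full-conditional draw using the inverse-CDF method is given below. The function name is hypothetical, and this simple approach is numerically reliable only when the mean is moderate in magnitude; it is not presented as the sampler used in the paper.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_z(tau_i, mean, rng):
    """Draw z(t_i) ~ N(mean, 1), truncated to z > 0 if tau_i = 1 and z <= 0 otherwise."""
    p_nonpositive = _STD_NORMAL.cdf(-mean)  # P(z <= 0) before truncation
    u = rng.random()
    if tau_i == 1:
        q = p_nonpositive + u * (1.0 - p_nonpositive)  # uniform on (P(z <= 0), 1)
    else:
        q = u * p_nonpositive                          # uniform on (0, P(z <= 0))
    # inv_cdf needs 0 < q < 1, so clamp away from the endpoints.
    q = min(max(q, 1e-12), 1.0 - 1e-12)
    return mean + _STD_NORMAL.inv_cdf(q)


rng = random.Random(42)
# Mean theta_0 + theta_1 * indicator, here with hypothetical values -1 + 2 * 1.
z_observed = sample_z(tau_i=1, mean=1.0, rng=rng)
z_missing = sample_z(tau_i=0, mean=1.0, rng=rng)
```

Given the sampled $z(t_i)$, the coefficients $\boldsymbol{\theta}$ follow a standard Gaussian linear-regression update, which is the conjugacy the text refers to.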
Tests for temporal preferential sampling are underdeveloped (Watson 2021). However, $\theta_1$
in (12) provides a convenient model-based quantity that can be used to assess discrete-time
temporal preferential sampling. When sampling mechanisms favor periods of higher abundance, we
expect $\theta_1 > 0$. Thus, we may test for the presence of temporal preferential
sampling using posterior inference on $\theta_1$.
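One simple way to carry out such a check is to ask whether an equal-tailed credible interval for $\theta_1$ excludes zero. The sketch below assumes a list of posterior draws is available; the function names and the nearest-rank quantile rule are illustrative choices, not a procedure prescribed in the text.

```python
def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws (nearest-rank rule)."""
    ordered = sorted(draws)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def detects_preferential_sampling(theta1_draws, level=0.95):
    """True when the credible interval for theta_1 excludes zero."""
    lower, upper = equal_tailed_interval(theta1_draws, level)
    return lower > 0.0 or upper < 0.0


# Hypothetical draws centered near zero: no preferential sampling is flagged.
draws = [-0.5 + 0.001 * i for i in range(1001)]
flag = detects_preferential_sampling(draws)
```

An interval such as (-0.53, 0.46) contains zero and would therefore not be flagged under this rule.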
The full Bayesian hierarchical model is provided in Appendix B. We implemented the
model in Julia to reduce computation time (Bezanson et al. 2017); alternatively, it could be
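fit using standard MCMC software. We used conventional, conjugate priors for $\boldsymbol{\theta}$,
$\boldsymbol{\mu}_\beta$, $\boldsymbol{\Sigma}_\beta$, $\boldsymbol{\alpha}$, $\sigma^2$, and $\tilde{\lambda}$ to
facilitate computation.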
3 Application
We analyzed two abundance data sets subject to temporal preferential sampling: A simulated
data set and a terrestrial site observed by the National Ecological Observatory Network. To
understand the effect of temporal preferential sampling on posterior inference, we compared
the proposed preferential model to its non-preferential variant, which does not contain the
preferential component in (13)-(14). The discrepancy between the resulting inference indicates
the effects of neglecting the presence of temporal preferential sampling. We implemented
both case studies using MCMC algorithms and 500,000 iterations with a burn-in of 200,000
iterations.
3.1 Simulated Abundance
We simulated a log-abundance process for a single species using the process model in (4). With
$J = 1$, we let $A = 0.98$ be a scalar, $\mathbf{B} = (\beta_1, \beta_2) = (0.1, 0.3)$ be a row vector, and
$\sigma^2 = 0.03$. For the environmental covariates $\mathbf{x}$, we used an intercept and growing degree
day (GDD) estimates at Harvard Forest in Massachusetts, USA, from 2014-2016. We quantified GDD
using estimated temperature from the Oregon State PRISM Climate data with a baseline
temperature of 10°C and a cutoff temperature of 30°C (Neteler et al. 2011, PRISM Climate
Group 2019). We simulated the sequence of fully observed counts
$\mathbf{y} \equiv (y(t_1), \ldots, y(t_N))'$ using a Poisson distribution with abundance as the intensity;
we let $\omega(\tilde{t}_k) = 1$ because we assume homogeneous sampling effort across traps.
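For readers unfamiliar with growing degree days, the sketch below shows one common convention for a daily GDD value with a 10°C baseline and a 30°C cutoff. The text states only the two temperatures, so the averaging rule (clamp the daily minimum and maximum to the interval, average, and subtract the baseline) is an assumption, and the function name is hypothetical.

```python
def growing_degree_days(t_min, t_max, base=10.0, cutoff=30.0):
    """Daily GDD under an assumed min/max averaging rule with clamping."""
    # Clamp both daily extremes to [base, cutoff] before averaging.
    low = min(max(t_min, base), cutoff)
    high = min(max(t_max, base), cutoff)
    return (low + high) / 2.0 - base


# Hypothetical daily minimum and maximum temperatures in degrees Celsius.
cool_day = growing_degree_days(0.0, 8.0)    # 0.0: never above the baseline
mild_day = growing_degree_days(5.0, 25.0)   # 7.5
hot_day = growing_degree_days(20.0, 40.0)   # 15.0: maximum capped at the cutoff
```

Other conventions exist (for example, capping only the maximum), so the values here are illustrative rather than a reproduction of the covariate used in the simulation.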
We considered three common sampling mechanisms found in abundance studies to determine
which counts in $\mathbf{y}$ were retained for analysis: random sampling, where each day
in the study was observed with equal probability (0.3); preferential switch sampling, where
each day on which abundance exceeded a threshold (15) was observed with equal probability (0.3)
and each day on which abundance did not exceed that threshold was unobserved; and logistic
sampling, where the probability of observing day $t_i$ was
$\text{logit}^{-1}(-10 + 0.4 \cdot \lambda(t_i))$. We fit the
preferential and non-preferential models to the retained observed counts to determine the
effect of ignoring temporal preferential sampling on posterior inference. Observations and
posterior estimates for abundance under the three scenarios are shown in Figure 1.
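The three sampling mechanisms can be sketched as functions that return the observation indicators $\tau_i$ for a known abundance series. The function names are hypothetical, and the negative intercept in the logistic mechanism is an assumption about a sign that may not be legible in every copy of the text; a positive intercept would make every day almost certain to be observed.

```python
import math
import random


def random_sampling(lam, rng, prob=0.3):
    """Observe every day with the same probability, regardless of abundance."""
    return [1 if rng.random() < prob else 0 for _ in lam]


def switch_sampling(lam, rng, threshold=15.0, prob=0.3):
    """Observe only days whose abundance exceeds the threshold, each with prob."""
    return [1 if (l > threshold and rng.random() < prob) else 0 for l in lam]


def logistic_sampling(lam, rng, intercept=-10.0, slope=0.4):
    """Observe day t_i with probability inverse-logit(intercept + slope * lambda)."""
    tau = []
    for l in lam:
        prob = 1.0 / (1.0 + math.exp(-(intercept + slope * l)))
        tau.append(1 if rng.random() < prob else 0)
    return tau


rng = random.Random(3)
lam = [5.0, 12.0, 18.0, 30.0, 40.0]  # hypothetical abundances
tau_random = random_sampling(lam, rng)
tau_switch = switch_sampling(lam, rng)
tau_logistic = logistic_sampling(lam, rng)
```

Under the assumed logistic coefficients, the observation probability is 0.5 at an abundance of 25 and approaches 1 well above it, which concentrates observations in high-abundance periods.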
To quantify the error for each model in each scenario, we considered root-mean-squared
error (RMSE) for the abundance process $\{\lambda(t_i)\}$, which is conventional in ecological model
comparisons for species richness (Mouillot & Lepretre 1999). Table 2 contains the RMSE for
the preferential and non-preferential models for the three sampling scenarios.
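For reference, the RMSE between a posterior point estimate and the simulated truth can be computed as in the sketch below. The second function, which restricts the comparison to unobserved days, is an illustrative addition for examining periods of infrequent observation and is not described in the text.

```python
import math


def rmse(estimate, truth):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(e - t) ** 2 for e, t in zip(estimate, truth)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_unobserved(estimate, truth, tau):
    """RMSE restricted to days with tau_i = 0 (hypothetical helper)."""
    pairs = [(e, t) for e, t, obs in zip(estimate, truth, tau) if obs == 0]
    if not pairs:
        raise ValueError("every day was observed")
    return rmse([e for e, _ in pairs], [t for _, t in pairs])


# Toy values: the estimate is exact on the observed day and off by 2 otherwise.
overall = rmse([1.0, 5.0], [1.0, 3.0])
gaps_only = rmse_unobserved([1.0, 5.0], [1.0, 3.0], tau=[1, 0])
```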
With random sampling, we did not expect the competing models to provide significantly
different results because the sampling mechanism did not engage in temporal preferential
sampling. Empirically, the addition of the preferential data model in (13)-(14) did not
improve posterior inference because the missingness in the data was ignorable. The posterior
estimates for the parameters governing the log-abundance process were recovered by both
models, as depicted in Figure 3. The 95% posterior credible interval for $\theta_1$ was (-0.53, 0.46).
Because this interval contains zero, temporal preferential sampling is not detected under the
preferential model.
Under preferential switch sampling, abundance was overestimated using the non-preferential
model during periods of infrequent observation, whereas the preferential model resulted in
more accurate estimates during such periods. Additionally, the RMSE for the preferential
model was lower than that for the non-preferential model. Both models adequately recovered
$\sigma^2$. The posterior median for $\alpha$ under the preferential model was closer to the true value
than under the non-preferential model. Finally, the 95% posterior credible intervals for $\beta_1$
and $\beta_2$ contained the true value under only the preferential model.
For the logistic sampling scenario, the preferential model outperformed the non-preferential
model because the observed days were highly concentrated during summers when abundance
was notably high. During periods of infrequent observation, the estimated abundance from the
preferential model was much closer to the truth than the estimates from the non-preferential
model. Both models accurately recovered $\sigma^2$. Additionally, the posterior medians for the
remaining parameters were close to the true values under the preferential model. The true
values for $\alpha$ and $\beta_2$ were recovered under the non-preferential model; however, the 95%
credible interval for $\beta_1$ did not contain the true value.
In the presence of temporal preferential sampling, the preferential model resulted in
more precise estimates for the abundance process throughout the entire study period, as
evidenced by the narrower 95% credible intervals in Figure 1. The abundance process was
recovered accurately and precisely under the preferential model during periods of infrequent
observation, whereas the non-preferential model tended to overestimate abundance during
these periods. Additionally, the true value for each parameter in the log-abundance process
was captured under the preferential model, whereas $\beta_1$ and $\beta_2$ were poorly estimated under
the non-preferential model for both preferential sampling scenarios. Thus, we obtained
inadequate posterior inference via the non-preferential model in the presence of temporal
preferential sampling.
3.2 Mosquito Abundance
We analyzed mosquito count data collected by NEON. The NEON mosquito monitoring
program followed a standardized sampling protocol conducted across a broad geographical
range and aimed to collect adult mosquito activity data for phenological modeling (Hoekman
et al. 2016, National Ecological Observatory Network 2021). NEON collected data throughout
each year with two distinct sampling approaches based on levels of mosquito activity: An
“off-season” with infrequent sampling during periods of low abundance, and a “field season,”
where sampling intensity increased concurrently with abundance. Because the decision to
start and stop consistent data collection was motivated by the abundance process, NEON
engaged in temporal preferential sampling. We sought to determine if posterior inference
resulting from the preferential and non-preferential model differed under the NEON sampling
protocol.
We expressed the environmental covariates as a convolution to account for latency in the
reproductive and larval stages of the mosquito life cycle:
$$\mathbf{x}(t) = \int \mathbf{G}(\phi, t, \tilde{\tau})\, \tilde{\mathbf{x}}(\tilde{\tau})\, d\tilde{\tau}, \tag{15}$$
where $\tilde{\mathbf{x}}(\tilde{\tau})$ represents the original continuous-time environmental variables,
$\mathbf{x}(t)$ is a smoothed version of $\tilde{\mathbf{x}}(\tilde{\tau})$ resulting from the convolution, and
$\mathbf{G}$ is a diagonal matrix of basis functions $g_l(\phi, t, \tilde{\tau})$ for $l = 1, \ldots, p$
associated with each covariate. The convolution in (15) is generalized
to account for temporal lag in population dynamics (Aukema et al. 2008, Chi & Zhu 2008).
We constructed the convolution in the form of a backward moving average to account for the
lagged effect of environmental conditions on the mosquito life cycle. If the basis functions are
specified as
$$g_l(\phi, t, \tilde{\tau}) = \begin{cases} \dfrac{1}{\phi}, & t - \phi < \tilde{\tau} < t \\ 0, & \text{otherwise} \end{cases}, \tag{16}$$
then the population growth rate is a function of the average environmental conditions over the
past $\phi$ days. The parameter $\phi$ can be specified using expert knowledge of mosquito biology
or estimated in the model when enough data exist to identify it in addition to other model
parameters. We used a 14-day backward moving average of GDD at each site throughout the
study period because GDD has been reported to influence mosquito dynamics (Field et al.
2019); the 14-day time frame was chosen to account for the stages in the mosquito life cycle
preceding sexual maturity (Crans 2004).
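A discrete-time analogue of the backward moving average implied by (15)-(16) is sketched below for a daily covariate series. Including the current day in the window and shortening the window at the start of the series are both assumptions made for illustration; the text does not specify how the first 13 days were handled.

```python
def backward_moving_average(series, window=14):
    """Average each day's value with the preceding days, up to `window` days in total."""
    smoothed = []
    for t in range(len(series)):
        start = max(0, t - window + 1)  # shorter window near the start (assumed)
        chunk = series[start:t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical daily GDD values smoothed with a 2-day window for readability.
gdd = [1.0, 2.0, 3.0, 4.0]
smoothed = backward_moving_average(gdd, window=2)  # [1.0, 1.5, 2.5, 3.5]
```

With `window=14`, each smoothed value summarizes the two weeks of conditions leading up to that day, matching the lag motivated by the mosquito life cycle.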
We restricted analysis to the Aedes genus, in which several species are vectors for dengue
fever, yellow fever, and the Zika virus (Suwanmanee & Luplertlop 2017). Of the NEON
terrestrial sites where the Aedes genus was present, we analyzed counts at the University
of Notre Dame Environmental Research Center (UNDE) because the counts were obtained
daily during the field season. UNDE observed counts for three species in the Aedes genus:
Aedes canadensis, Aedes excrucians, and Aedes punctor. We analyzed counts from 2016 to
2019 in this study.
We considered an absolute loss function when providing inference because it is robust
to skewness that may result when exponentiating the log-abundance process. Therefore, we
reported posterior medians for the abundance process of Aedes punctor in Figure 4; the
analogous plots for Aedes canadensis and Aedes excrucians can be found in Supplemental
Figures 1 and 2. We also reported posterior interquartile ranges (IQR) to accompany the
posterior point estimates. As expected, some observed counts exceed the posterior IQR due
to the inflated counts resulting from extended trapping duration. Similarly, we computed
population growth rates by considering the posterior median of $\mathbf{g}(t)$ in (5), which depends
on $\boldsymbol{\alpha}$ and $\mathbf{B}$. The estimated growth rates are also provided with each species'
abundance process.
We inferred the first day in which a species has positive population growth by computing
the derived phenometric $\psi$. The posterior estimates of $\psi$ are depicted in Supplemental
Figure 3 for 2017-2019 alongside the same phenometric derived using conventional summary
statistics (Jones & Daehler 2018). Many phenological approaches cannot provide uncertainty
quantification, whereas our mechanistic hierarchical model formally allowed us to quantify
uncertainty with $\psi$ via posterior densities.
The posterior inference provided by the preferential and non-preferential models matched
throughout the entire study period for each species. Neither model overestimated the relatively
low mosquito abundance during the winter, a behavior evident in the non-preferential model
in the simulated case study. The resulting inference from both models was similar because
NEON consistently obtained enough zero counts preceding the annual rise and following
the annual decline in mosquito populations; fitting these zero-counts to the non-preferential
model resulted in adequate estimates for $\boldsymbol{\alpha}$ and $\mathbf{B}$. Although NEON explicitly engaged in
temporal preferential sampling, their sampling mechanism mitigated the effect that temporal
preferential sampling had on posterior inference by sampling infrequently during the off-season
until a mosquito is detected.
We also analyzed a subset of the UNDE data, where observations were removed for
days in which zero mosquitoes were detected; we refer to this subset as the second scenario,
which resembles many abundance studies that only experience positive counts (Turchin 2013).
The posterior inference under the competing models significantly differed under the second
scenario, as seen in Figure 5. The first observed counts of Aedes punctor were positive in
2016, 2017, and 2019. Consequently, abundance was overestimated via the non-preferential
model during the springs of 2016, 2017, and 2019. However, the resulting inference from the
preferential model was similar to that provided in Figure 4 and better coincides with the
removed data and scientific knowledge of the abundance process of mosquitoes; this indicates
that more robust inference is obtained under the preferential model for population dynamics
at the rise and decline of abundance.
In the second scenario, the posterior inference for the phenometric $\psi$ substantially changed
for the non-preferential model. As depicted in Figure 6, the non-preferential model more
heavily weighted the beginning of the year, whereas the preferential model provided similar
inference as in the first scenario (see Supplemental Figure 3). The inference provided by the
preferential model aligns with scientific knowledge of mosquito population dynamics under
both scenarios. Posterior estimates for $\boldsymbol{\alpha}$ and $\mathbf{B}$ were inadequate under the non-preferential
model in the second scenario, which significantly affected the ability to obtain robust estimates
for the phenometric.
4 Discussion
We augmented a hierarchical abundance model to account for temporal preferential sampling
and to obtain phenometrics. Temporal preferential sampling is common in abundance studies,
and our extension of the model improved abundance estimates during periods of infrequent
observation using the readily available data $\boldsymbol{\tau}$. Consequently, the derived phenometrics are
greatly improved when accounting for preferential sampling. We derived population growth
rates explicitly and inferred phenometrics by relating the latent abundance process to the
stochastic Gompertz population growth function in a way that accounts for preferential
sampling. The phenometrics can be compared to existing phenological studies to confirm
or present new results. In addition, our model formally allowed us to quantify uncertainty
associated with ψ, whereas most phenological approaches lack uncertainty quantification in
the relevant phenometrics.
Our first case study revealed the advantages of accounting for temporal preferential
sampling in the abundance model under three sampling scenarios. Under preferential switch
sampling and logistic sampling, fitting the preferential model resulted in more precision when
predicting unobserved abundances. In data sets with fewer, longer gaps of missing data, the
proposed preferential model will outperform its non-preferential variant.
In the mosquito case study, we found that enough zero-abundance counts were recorded
in the UNDE data for the non-preferential model to estimate all model parameters and,
consequently, abundance. Thus, the NEON sampling mechanism did not affect posterior
inference on abundance. However, the non-preferential model overestimated the growth rate
during periods of infrequent observation when analyzing the subset of NEON data that only
included positive counts. The second scenario is realistic because many abundance studies
observe strictly positive counts. For the second scenario, the posterior inference under the
non-preferential model was imprecise during the mosquito off-season, indicating that temporal
preferential sampling should be accounted for when working with abundance data that are
mostly positive.
During abundance study design, ecologists and wildlife biologists may apply the proposed
temporal preferential sampling methods to obtain more accurate inference with less data.
Researchers may allocate fewer resources to collect data at the start and end of the off-season
and more resources during the field season when studying species with population dynamics
that depend on the environment. Additionally, smaller ecological surveys may find the
preferential model useful if they lack the resources to collect data consistently during an
off-season.
Temporal preferential sampling has been accounted for exclusively in the continuous-time
domain. Applications include monitoring air pollution (Shaddick & Zidek 2014), clinical
trials (Monteiro et al. 2019), and neuroimaging (Poeppel 2003). By contrast, we account
for preferential sampling in discrete time to align with data from many abundance studies.
When explicitly accounting for the dependence of observations on the process of interest, we
obtain more robust inference during periods of infrequent observation.
The field of temporal preferential sampling is rapidly developing, and there is much work
to be done regarding its theory and methodology. Additionally, the implementation of models
that account for preferential sampling may be improved. For example, the development of
algorithms that induce conjugacy for the abundance process may increase the efficiency of
MCMC implementations (e.g., Bradley et al. 2018). Recursive and distributed computing
approaches would increase the computational efficiency of temporal preferential sampling
models (e.g., Hooten et al. 2021). If we seek to fit a hidden Markov model while accounting for
preferential sampling, then the use of particle Gibbs filters or forward-backward algorithms
may significantly improve computational efficiency (Beal et al. 2001, Tripuraneni et al. 2015).
Finally, interspecies interactions can be accounted for in a multivariate version of our model to
improve species conservation, network analysis, and community ecology.
Acknowledgements
Data were collected by the National Ecological Observatory Network and the Oregon State
PRISM Climate Group. The National Ecological Observatory Network is a program sponsored
by the National Science Foundation (NSF) and operated under cooperative agreement by
Battelle. This material is based in part upon work supported by the NSF through the NEON
Program. Any use of trade, firm, or product names is for descriptive purposes only and does
not imply endorsement by the U.S. Government.
Funding
This research was supported by the NSF Graduate Research Fellowship Program.
Disclosure Statement
The authors report that there are no competing interests to declare.
Data Availability
The datasets analyzed during this study are publicly available from the National Ecological
Observatory Network (DOI: dp1.10043.001) and PRISM climate group (https://prism.oregonstate.edu/).
Code Availability
We will provide a GitHub link containing the code for publication.
Appendix A
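We solve for the species-level per capita growth rate $g_j(t)$ in (5) and the posterior median of
$g_j(t)$ in (6). From (4), we can write the abundance of species $j$ on day $t$ as
\begin{align}
\lambda_j(t) &= \exp\left\{\boldsymbol{\beta}_j\left(\mathbf{x}(t) - \alpha_j \mathbf{x}(t-1)\right) + \alpha_j \log \lambda_j(t-1) + \varepsilon_{j,t}\right\} \tag{17} \\
&= \exp\left\{\boldsymbol{\beta}_j\left(\mathbf{x}(t) - \alpha_j \mathbf{x}(t-1)\right)\right\} \lambda_j(t-1)^{\alpha_j} e^{\varepsilon_{j,t}}, \tag{18}
\end{align}
where $\varepsilon_{j,t} \sim \mathrm{N}(0, \sigma^2)$. If we consider the heterogeneous Malthusian growth function
$\lambda_j(t) = (1 + g_j(t))\lambda_j(t-1)$, we can solve for $g_j(t)$ to obtain
\begin{align}
g_j(t) &= \frac{\lambda_j(t)}{\lambda_j(t-1)} - 1 \tag{19} \\
&= \frac{\exp\left\{\boldsymbol{\beta}_j\left(\mathbf{x}(t) - \alpha_j \mathbf{x}(t-1)\right)\right\} \lambda_j(t-1)^{\alpha_j} e^{\varepsilon_{j,t}}}{\lambda_j(t-1)} - 1 \tag{20} \\
&= \exp\left\{\boldsymbol{\beta}_j\left(\mathbf{x}(t) - \alpha_j \mathbf{x}(t-1)\right)\right\} \lambda_j(t-1)^{\alpha_j - 1} e^{\varepsilon_{j,t}} - 1. \tag{21}
\end{align}
Finally, $\exp(\varepsilon_{j,t})$ is log-normally distributed with parameters $\mu = 0$ and $\sigma^2$. Thus,
$M(\exp(\varepsilon_{j,t})) = \exp(\mu) = 1$ and
\begin{align}
M(g_j(t)) = \exp\left\{\boldsymbol{\beta}_j\left(\mathbf{x}(t) - \alpha_j \mathbf{x}(t-1)\right)\right\} \lambda_j(t-1)^{\alpha_j - 1} - 1, \tag{22}
\end{align}
which is the expression in (6). We let $M(\cdot)$ denote the median of a random variable.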
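As a numerical sanity check on (22), the following Python sketch draws $\varepsilon_{j,t}$, builds $g_j(t)$ from (17) and (19), and compares the Monte Carlo median with the closed form. It is an illustration only: the function names (`growth_rate_median`, `simulate_growth_rates`) and all parameter values are hypothetical and are not taken from the case studies, and $\boldsymbol{\beta}_j$ is treated as a row vector so that $\boldsymbol{\beta}_j(\mathbf{x}(t) - \alpha_j \mathbf{x}(t-1))$ is a scalar.

```python
import numpy as np


def growth_rate_median(beta, x_t, x_prev, alpha, lam_prev):
    """Closed-form median of g_j(t), following expression (22)."""
    drift = float(np.dot(beta, x_t - alpha * x_prev))
    return np.exp(drift) * lam_prev ** (alpha - 1.0) - 1.0


def simulate_growth_rates(beta, x_t, x_prev, alpha, lam_prev, sigma, n_draws, rng):
    """Draw g_j(t) = lambda_j(t) / lambda_j(t-1) - 1 using (17) and (19)."""
    eps = rng.normal(0.0, sigma, size=n_draws)  # eps_{j,t} ~ N(0, sigma^2)
    drift = float(np.dot(beta, x_t - alpha * x_prev))
    lam_t = np.exp(drift + alpha * np.log(lam_prev) + eps)
    return lam_t / lam_prev - 1.0


# Hypothetical values chosen only for illustration.
rng = np.random.default_rng(1)
beta = np.array([0.3, -0.1])    # covariate effects for one species
x_t = np.array([1.0, 0.5])      # covariates on day t
x_prev = np.array([1.0, 0.2])   # covariates on day t - 1
alpha, lam_prev, sigma = 0.8, 12.0, 0.5

draws = simulate_growth_rates(beta, x_t, x_prev, alpha, lam_prev, sigma, 200_000, rng)
closed_form = growth_rate_median(beta, x_t, x_prev, alpha, lam_prev)
empirical = float(np.median(draws))
print(f"closed-form median: {closed_form:.4f}, Monte Carlo median: {empirical:.4f}")
```

Because $\exp(\varepsilon_{j,t})$ is right-skewed, the mean of the simulated growth rates is larger than their median, so the two summaries should not be used interchangeably.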
Appendix B
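The full Bayesian hierarchical model that we used in our case studies is
\begin{align}
y_j(\tilde{t}_k) &\sim \mathrm{Pois}\left(\lambda_j(\tilde{t}_k) \cdot \omega(\tilde{t}_k)\right), \tag{23} \\
\tau_i &= \begin{cases} 1, & z(t_i) > 0 \\ 0, & z(t_i) \le 0 \end{cases}, \tag{24} \\
z(t_i) &\sim \mathrm{N}\left(\theta_0 + \theta_1 1_{\{\boldsymbol{\lambda}(t_i)'\mathbf{1} \ge \tilde{\lambda}\}}, 1\right), \tag{25} \\
\log \boldsymbol{\lambda}(t_i) &\sim \mathrm{N}\left(\mathbf{B}\mathbf{x}(t_i) - \mathbf{A}\mathbf{B}\mathbf{x}(t_{i-1}) + \mathbf{A} \log \boldsymbol{\lambda}(t_{i-1}), \sigma^2 \mathbf{I}\right), \quad i = 2, \ldots, N, \tag{26} \\
\log \boldsymbol{\lambda}(t_1) &\sim \mathrm{N}(\boldsymbol{\mu}_1, \sigma_1^2 \mathbf{I}), \tag{27} \\
\boldsymbol{\beta}_j' &\sim \mathrm{N}(\boldsymbol{\mu}_\beta, \boldsymbol{\Sigma}_\beta), \tag{28} \\
\boldsymbol{\mu}_\beta &\sim \mathrm{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0), \tag{29} \\
\boldsymbol{\Sigma}_\beta &\sim \mathrm{IW}(\boldsymbol{\Psi}, \nu), \tag{30} \\
\boldsymbol{\alpha} &\sim \mathrm{N}(\boldsymbol{\mu}_\alpha, \sigma_\alpha^2 \mathbf{I}), \tag{31} \\
\boldsymbol{\theta} &\sim \mathrm{N}(\boldsymbol{\mu}_\theta, \boldsymbol{\Sigma}_\theta), \tag{32} \\
\sigma^2 &\sim \mathrm{IG}(q, r), \tag{33} \\
\tilde{\lambda} &\sim \mathrm{Ga}(\alpha_\lambda, \beta_\lambda), \tag{34}
\end{align}
for $k = 1, \ldots, n$, $j = 1, \ldots, J$, and $i = 1, \ldots, N$ with the exception of (26). We let
$\mathbf{A} \equiv \mathrm{diag}(\boldsymbol{\alpha})$, $\mathbf{B} \equiv (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J)'$, and $\boldsymbol{\beta}_j \equiv (\beta_{j,1}, \ldots, \beta_{j,p})$.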
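To make the data-generating structure of (23)-(27) concrete, the Python sketch below forward-simulates abundances, observation indicators, and counts for fixed parameter values. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `simulate_preferential_counts` and every parameter value are hypothetical, the effort term $\omega$ is set to 1, the indicator in (25) is assumed to switch on when total abundance is at least $\tilde{\lambda}$, and the priors in (28)-(34) are not sampled.

```python
import numpy as np


def simulate_preferential_counts(X, B, alpha, sigma, theta, lam_tilde,
                                 mu1, sigma1, omega=1.0, rng=None):
    """Forward-simulate (23)-(27) for fixed parameters (hypothetical helper).

    X is N x p (covariates by day), B is J x p (one row per species),
    alpha has length J, and theta = (theta_0, theta_1).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, J = X.shape[0], B.shape[0]
    A = np.diag(alpha)

    log_lam = np.empty((N, J))
    log_lam[0] = rng.normal(mu1, sigma1, size=J)                    # (27)
    for i in range(1, N):
        mean = B @ X[i] - A @ B @ X[i - 1] + A @ log_lam[i - 1]     # (26)
        log_lam[i] = rng.normal(mean, sigma)
    lam = np.exp(log_lam)

    # Assumed direction of the indicator in (25): total abundance >= threshold.
    above = (lam.sum(axis=1) >= lam_tilde).astype(float)
    z = rng.normal(theta[0] + theta[1] * above, 1.0)                # (25)
    tau = (z > 0).astype(int)                                       # (24)

    counts = rng.poisson(lam * omega)                               # (23)
    # Counts exist only on days that were actually sampled (tau = 1).
    y = np.where(tau[:, None] == 1, counts, np.nan)
    return lam, tau, y


# Hypothetical setup: two species, an intercept, and one seasonal covariate.
rng = np.random.default_rng(42)
N = 365
days = np.arange(N)
X = np.column_stack([np.ones(N), np.sin(2 * np.pi * days / 365 - np.pi / 2)])
B = np.array([[1.0, 0.8], [0.5, 1.2]])
alpha = np.array([0.8, 0.7])
theta = np.array([-1.0, 2.5])
lam_tilde = 5.0

lam, tau, y = simulate_preferential_counts(
    X, B, alpha, sigma=0.2, theta=theta, lam_tilde=lam_tilde,
    mu1=0.0, sigma1=0.5, rng=rng)
above = lam.sum(axis=1) >= lam_tilde
print("sampling rate above threshold:", tau[above].mean())
print("sampling rate below threshold:", tau[~above].mean())
```

With a positive $\theta_1$, days on which total abundance exceeds the threshold are sampled far more often than other days, which is the dependence between observation times and the latent process that the preferential model is built to capture.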
References
Albert, J. H. & Chib, S. (1993), ‘Bayesian analysis of binary and polychotomous response
data’, Journal of the American Statistical Association 88(422), 669–679.
Anderson, J. J., Gurarie, E., Bracis, C., Burke, B. J. & Laidre, K. L. (2013), ‘Modeling
climate change impacts on phenology and population dynamics of migratory marine species’,
Ecological Modelling 264, 83–97.
Aukema, B. H., Carroll, A. L., Zheng, Y., Zhu, J., Raffa, K. F., Dan Moore, R., Stahl,
K. & Taylor, S. W. (2008), ‘Movement of outbreak populations of mountain pine beetle:
influences of spatiotemporal patterns and climate’, Ecography 31(3), 348–358.
Beal, M., Ghahramani, Z. & Rasmussen, C. (2001), ‘The infinite hidden Markov model’,
Advances in Neural Information Processing Systems 14, 577–584.
Bewick, S., Cantrell, R. S., Cosner, C. & Fagan, W. F. (2016), ‘How resource phenology
affects consumer population dynamics’, The American Naturalist 187(2), 151–166.
Bezanson, J., Edelman, A., Karpinski, S. & Shah, V. B. (2017), ‘Julia: A fresh approach to
numerical computing’, SIAM Review 59(1), 65–98.
Bradley, J. R., Holan, S. H. & Wikle, C. K. (2018), ‘Computationally efficient multivariate
spatio-temporal models for high-dimensional count-valued data (with discussion)’, Bayesian
Analysis 13(1), 253–310.
Chi, G. & Zhu, J. (2008), ‘Spatial regression models for demographic analysis’, Population
Research and Policy Review 27(1), 17–42.
Cole, H., Henson, S., Martin, A. & Yool, A. (2012), ‘Mind the gap: The impact of missing data
on the calculation of phytoplankton phenology metrics’, Journal of Geophysical Research:
Oceans 117(C8).
Crans, W. J. (2004), ‘A classification system for mosquito life cycles: Life cycle types for
mosquitoes of the northeastern United States’, Journal of Vector Ecology 29, 1–10.
Demidova, A. V., Druzhinina, O. V., Masina, O. N. & Petrov, A. A. (2021), ‘Synthesis and
computer study of population dynamics controlled models using methods of numerical
optimization, stochastization and machine learning’, Mathematics 24, 3303.
Dennis, B., Ponciano, J. M., Lele, S. R., Taper, M. L. & Staples, D. F. (2006), ‘Estimating
density dependence, process noise, and observation error’, Ecological Monographs 76(3),
323–341.
Field, E. N., Tokarz, R. E. & Smith, R. C. (2019), ‘Satellite imaging and long-term mosquito
surveillance implicate the influence of rapid urbanization on Culex vector populations’,
Insects 10(9), 269.
Forrest, J. & Miller-Rushing, A. J. (2010), ‘Toward a synthetic understanding of the role of
phenology in ecology and evolution’, Philosophical Transactions of the Royal Society B:
Biological Sciences 365(1555), 3101–3112.
Gelfand, A. E. & Smith, A. F. (1990), ‘Sampling-based approaches to calculating marginal
densities’, Journal of the American Statistical Association 85(410), 398–409.
Hoekman, D., Springer, Y. P., Gibson, C., Barker, C., Barrera, R., Blackmore, M., Bradshaw,
W., Foley, D., Ginsberg, H., Hayden, M., Holzapfel, C. M., Juliano, S. A., Kramer, L. D.,
LaDeau, S. L., Livdahl, T. P., Moore, C. G., Nasci, R. S., Reisen, W. K. & Savage, H. M.
(2016), ‘Design for mosquito abundance, diversity, and phenology sampling within the
National Ecological Observatory Network’, Ecosphere 7(5), e01320.
Hooten, M. B. & Hefley, T. J. (2019), Bringing Bayesian Models to Life, CRC Press.
Hooten, M. B., Johnson, D. S. & Brost, B. M. (2021), ‘Making recursive Bayesian inference
accessible’, The American Statistician 75(2), 185–194.
Hoskovec, L., Koslovsky, M. D., Koehler, K., Good, N., Peel, J. L., Volckens, J. & Wilson, A.
(2022), ‘Infinite hidden Markov models for multiple multivariate time series with missing
data’, arXiv preprint arXiv:2204.06610 .
Jones, C. A. & Daehler, C. C. (2018), ‘Herbarium specimens can reveal impacts of climate
change on plant phenology: A review of methods and applications’, PeerJ 6, e4576.
Karcher, M. D., Palacios, J. A., Bedford, T., Suchard, M. A. & Minin, V. N. (2016),
‘Quantifying and mitigating the effect of preferential sampling on phylodynamic inference’,
PLoS Computational Biology 12(3), e1004789.
Knape, J. & de Valpine, P. (2012), ‘Are patterns of density dependence in the global population
dynamics database driven by uncertainty about population abundance?’, Ecology Letters
15(1), 17–23.
McCulloch, R. E. & Tsay, R. S. (1994), ‘Bayesian inference of trend and difference-stationarity’,
Econometric Theory 10(3-4), 596–608.
Monteiro, A. A. F. O., Menezes, R. & Silva, M. E. (2019), ‘Modelling preferential sampling
in time’, Sociedad de Estadística e Investigación Operativa (SEIO) .
Monteiro, A. A. F. O., Menezes, R. & Silva, M. E. (2020), ‘Modelling irregularly spaced time
series under preferential sampling’, Instituto Nacional de Estatística (INE) .
Mouillot, D. & Lepretre, A. (1999), ‘A comparison of species diversity estimators’, Researches
on Population Ecology 41(2), 203–215.
National Ecological Observatory Network (2021), ‘Mosquitoes sampled from CO2 traps
(dp1.10043.001), release-2021.’.
URL: https://data.neonscience.org
Neteler, M., Roiz, D., Rocchini, D., Castellani, C. & Rizzoli, A. (2011), ‘Terra and Aqua
satellites track tiger mosquito invasion: modelling the potential distribution of Aedes
albopictus in north-eastern Italy’, International Journal of Health Geographics 10(1), 1–14.
Pau, S., Wolkovich, E. M., Cook, B. I., Davies, T. J., Kraft, N. J., Bolmgren, K., Betancourt,
J. L. & Cleland, E. E. (2011), ‘Predicting phenology by integrating ecology, evolution and
climate science’, Global Change Biology 17(12), 3633–3643.
Poeppel, D. (2003), ‘The analysis of speech in different temporal integration windows: Cerebral
lateralization as asymmetric sampling in time’, Speech Communication 41(1), 245–255.
Potts, J. M. & Elith, J. (2006), ‘Comparing species abundance models’, Ecological Modelling
199(2), 153–163.
PRISM Climate Group (2019), ‘PRISM gridded climate data’.
URL: https://prism.oregonstate.edu
Rao, K. A. & D’Cunha, J. G. (2016), ‘Bayesian inference for median of the lognormal
distribution’, Journal of Modern Applied Statistical Methods 15(2), 32.
Roleček, J., Chytrý, M., Hájek, M., Lvončík, S. & Tichý, L. (2007), ‘Sampling design in
large-scale vegetation studies: Do not sacrifice ecological thinking to statistical purism!’,
Folia Geobotanica 42(2), 199–208.
Ross, B. E., Hooten, M. B., DeVink, J.-M. & Koons, D. N. (2015), ‘Combined effects
of climate, predation, and density dependence on greater and lesser scaup population
dynamics’, Ecological Applications 25(6), 1606–1617.
Royama, T. (2012), Analytical Population Dynamics, Vol. 10, Springer Science & Business
Media.
Shaddick, G. & Zidek, J. V. (2014), ‘A case study in preferential sampling: Long term
monitoring of air pollution in the UK’, Spatial Statistics 9, 51–65.
Shumway, R. H. & Stoffer, D. S. (2000), Time Series Analysis and its Applications, Vol. 3,
Springer.
Speekenbrink, M. & Visser, I. (2021), ‘Ignorable and non-ignorable missing data in hidden
Markov models’, arXiv preprint arXiv:2109.02770 .
Suwanmanee, S. & Luplertlop, N. (2017), ‘Dengue and Zika viruses: Lessons learned from the
similarities between these Aedes mosquito-vectored arboviruses’, Journal of Microbiology
55(2), 81–89.
Trexler, J. C. & Travis, J. (1993), ‘Nontraditional regression analyses’, Ecology 74(6),
1629–1637.
Tripuraneni, N., Gu, S. S., Ge, H. & Ghahramani, Z. (2015), ‘Particle Gibbs for infinite
hidden Markov models’, Advances in Neural Information Processing Systems 28.
Turchin, P. (2013), Complex Population Dynamics, Princeton University Press.
Usher, M. B. (1979), ‘Markovian approaches to ecological succession’, The Journal of Animal
Ecology 48, 413–426.
Watson, J. (2021), ‘A perceptron for detecting the preferential sampling of locations and
times chosen to monitor a spatio-temporal process’, Spatial Statistics 43, 100500.
Wikle, C. K. & Hooten, M. B. (2010), ‘A general science-based framework for dynamical
spatio-temporal models’, Test 19(3), 417–451.
Zachmann, L. J., Borgman, E. M., Witwicki, D. L., Swan, M. C., McIntyre, C. & Hobbs,
N. T. (2022), ‘Bayesian models for analysis of inventory and monitoring data with non-
ignorable missingness’, Journal of Agricultural, Biological and Environmental Statistics
27(1), 125–148.
Zellner, A. (1971), ‘Bayesian and non-Bayesian analysis of the log-normal distribution and
log-normal regression’, Journal of the American Statistical Association 66(334), 327–330.
Zhang, X., Friedl, M. A. & Schaaf, C. B. (2009), ‘Sensitivity of vegetation phenology
detection to the temporal resolution of satellite data’, International Journal of Remote
Sensing 30(8), 2061–2074.
Figure 1: Abundance estimates under three sampling mechanisms. In (a)-(c), the true
abundance and the observed abundance are depicted as black lines and dots, respectively.
The red lines and polygons depict the posterior means and 95% credible intervals under the
non-preferential model. The blue lines and polygons depict the posterior means and 95%
credible intervals under the preferential model.
Table 2: Root-mean-squared errors for the two competing models under three sampling
scenarios.
Sampling Mechanism     Non-preferential Model     Preferential Model
Random                 4.393                      4.332
Preferential Switch    12.127                     8.016
Logistic               8.711                      2.578
Figure 3: Posterior estimates under the preferential and non-preferential model for random,
preferential switch, and logistic sampling. True values are indicated by horizontal black lines.
Panels: (a) $\alpha$, (b) $\sigma^2$, (c) $\beta_1$, (d) $\beta_2$.
Figure 4: Top: Posterior IQR and median for the abundance process of Aedes punctor at
the UNDE site from 2016-2019. Black dots denote observed abundances, lines denote the
posterior medians, and polygons denote the posterior IQRs. Bottom: Derived posterior
means for the growth rate of Aedes punctor at the UNDE site from 2016-2019. The blue
lines and polygons correspond to inference provided by the preferential model, whereas the
red corresponds to the non-preferential model.
Figure 5: Top: Posterior IQR and median for the abundance process of Aedes punctor at
the UNDE site when removing zero-count observations. Full black dots denote observed
abundances, black circles denote removed zero-count observations, lines denote the posterior
medians, and polygons denote the posterior IQRs. Bottom: Derived posterior means for the
growth rate of Aedes punctor at the UNDE site when removing zero-count observations.
Figure 6: Posterior distributions for the first day in which each species experiences population
growth, $\psi$, for 2017-2019 when removing zero-count observations. Black lines depict the
phenometric derived via summary statistics.