# Zero-state Markov switching count-data models: an empirical assessment.

**ABSTRACT** In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: one is a zero-accident count state, in which accident probabilities are so low that they cannot be statistically distinguished from zero, and the other is a normal-count state, in which counts can be non-negative integers generated by some counting process, for example, a Poisson or negative binomial. While zero-inflated models have come under some criticism with regard to accident-frequency applications, one fact is undeniable: in many applications they provide a statistically superior fit to the data. The Markov switching approach we propose seeks to overcome some of the criticism associated with the zero-accident state of the zero-inflated model by allowing individual roadway segments to switch between zero and normal-count states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero-accident or normal-count state), whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.


arXiv:0811.3639v2 [stat.AP] 2 Aug 2009


Nataliya V. Malyshkina∗ and Fred L. Mannering

School of Civil Engineering, 550 Stadium Mall Drive, Purdue University, West Lafayette, IN 47907, United States

Abstract

In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: one of the states is a zero-accident count state, in which accident probabilities are so low that they cannot be statistically distinguished from zero, and the other state is a normal count state, in which counts can be non-negative integers that are generated by some counting process, for example, a Poisson or negative binomial. In contrast to zero-inflated models, Markov switching models allow specific roadway segments to switch between the two states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero or count state), whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.

Key words: Accident frequency count data models; zero-inflated models; negative binomial; Markov switching; Bayesian; MCMC

∗Corresponding author.

Email addresses: nmalyshk@purdue.edu (Nataliya V. Malyshkina), flm@ecn.purdue.edu (Fred L. Mannering).

Preprint submitted to Accident Analysis and Prevention, 2 August 2009


## 1 Introduction

The preponderance of zeros observed in many count-data applications has led researchers to consider the possibility that two states exist: one state that is a “zero” state (where all counts are zero) and another that is a normal-count state that includes zeros and positive integers. This two-state assumption has led to the development of zero-inflated Poisson models and, to account for possible overdispersion in the normal-count state, zero-inflated negative binomial models. These zero-inflated models have been applied in a number of fields of study. For example, Lambert (1992) used a zero-inflated Poisson model to study manufacturing defects. Lambert argued that unobserved changes in the process caused manufacturing defects to move randomly between a state that was near perfect (the zero state, where defects were extremely rare) and an imperfect state where defects were possible but not inevitable (the normal-count state). Lambert's empirical assessment demonstrated that the zero-inflated modeling approach fit the data much better than the standard Poisson. In other work, van den Broek (1995) applied the zero-inflated Poisson to the frequency of urinary tract infections in men diagnosed with the human immunodeficiency virus (HIV). In this case, it was postulated that a zero-infection state existed for a portion of the patient population and that this state generated a large number of zeros in the frequency data, which was supported by the statistical findings. Also, Böhning et al. (1999) successfully applied the zero-inflated Poisson to study the frequency of dental decay in Portugal.

The frequency of vehicle accidents on a section of highway or at an intersection (over some time period) often exhibits excess zeros. Similar to the literature discussed above, the excess of zeros observed in the data could potentially be explained by the existence of a two-state process for accident data generation (Shankar et al., 1997; Carson and Mannering, 2001; Lee and Mannering, 2002). In this case, roadway segments can belong to one of two states: a zero-accident state (where zero accidents are expected) and a normal-count state, in which accidents can happen and accident frequencies are generated by some given counting process (Poisson or negative binomial). To account for this two-state phenomenon, zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models have been used in a number of roadway safety studies (Miaou, 1994; Shankar et al., 1997; Washington et al., 2003). These models explicitly account for the existence of the two states of accident data generation and allow modeling of the probabilities of being in these states.

The application of ZIP and ZINB models was an empirical advance in the statistical modeling of accident frequencies. However, although zero-inflated models have become popular in a number of fields, they suffer from two important drawbacks. First, these models do not deal directly with the states of roadway segments; instead, they consider the probabilities of being in these states. As a result, zero-inflated models do not allow a direct statistical estimation of whether individual roadway segments are in the zero or normal-count state. For example, suppose a given roadway segment has zero accidents observed over a given time interval. This segment could truly be in the zero-accident state, or it may be in the normal-count state and just happened to have zero accidents over the considered time interval (Shankar et al., 1997). Distinguishing between these two possibilities is not straightforward in zero-inflated models. The second drawback of zero-inflated models is that, although they allow roadway segments to be in different states during different observation periods, they do not explicitly consider switching by the roadway segments between the states over time. This switching is important from a theoretical point of view because it is unreasonable to expect any roadway segment to be in the zero-accident state all the time and to have a long-term mean accident frequency equal to zero (Lord et al., 2005).

In this study, we propose two-state Markov switching count-data models that consider the zero-accident state and the normal-count state of roadway safety. Similar to zero-inflated models, Markov switching models are intended to explain the preponderance of zeros observed in accident count data. However, in contrast to zero-inflated models, Markov switching models allow a direct statistical estimation of the states that roadway segments are in at specific points in time and explicitly consider changes in these states over time.

## 2 Model specification

Two-state Markov switching count-data models of accident frequencies were first presented in Malyshkina et al. (2009). Although there are several major differences between Malyshkina et al. (2009) and this study, many ideas and statistical estimation methods developed in that paper apply here as well. In that paper, two states were assumed to exist, but both were true count states (i.e., a zero-count state did not exist). In the current paper, we take a different approach and consider the case where one of the states is a zero state and the other is a true count state, and individual roadway segments move between these two states over time. This differs from Malyshkina et al. (2009), whose model assumes two true-count states and that all roadway segments are in the same state at the same time.

To specify this model, we note that Markov switching models are parametric and can be fully specified by a likelihood function $f(Y|\Theta,\mathcal{M})$, which is the conditional probability distribution of the vector of all observations $Y$, given the vector of all parameters $\Theta$ of model $\mathcal{M}$. In our study, we observe the number of accidents $A_{t,n}$ that occur on the $n$th roadway segment during time period $t$. Thus $Y = \{A_{t,n}\}$ includes all accidents observed on all roadway segments over all time periods. Here $n = 1,2,\ldots,N$ and $t = 1,2,\ldots,T$, where $N$ is the total number of roadway segments observed (assumed to be constant over time) and $T$ is the total number of time periods. Model $\mathcal{M} = \{M, X_{t,n}\}$ includes the model's name $M$ (for example, $M$ = “ZIP” or “ZINB”) and the vector $X_{t,n}$ of all roadway segment characteristic variables (segment length, curve characteristics, grades, pavement properties, and so on).

To define the likelihood function, we introduce an unobserved (latent) state variable $s_{t,n}$, which determines the state of the $n$th roadway segment during time period $t$. Without loss of generality, it is assumed that the state variable $s_{t,n}$ can take on the following two values: $s_{t,n} = 0$ corresponds to the zero-accident state, and $s_{t,n} = 1$ corresponds to the normal-count state ($n = 1,2,\ldots,N$ and $t = 1,2,\ldots,T$). It is further assumed that, for each roadway segment $n$, the state variable $s_{t,n}$ follows a stationary two-state Markov chain process in time,¹ which can be specified by time-independent transition probabilities as

$$P(s_{t+1,n}=1 \mid s_{t,n}=0) = p^{(n)}_{0\to1}, \qquad P(s_{t+1,n}=0 \mid s_{t,n}=1) = p^{(n)}_{1\to0}. \tag{1}$$

Here, for example, $P(s_{t+1,n}=1 \mid s_{t,n}=0)$ is the conditional probability of $s_{t+1,n}=1$ at time $t+1$, given that $s_{t,n}=0$ at time $t$. Transition probabilities $p^{(n)}_{0\to1}$ and $p^{(n)}_{1\to0}$ are unknown parameters to be estimated from accident data ($n = 1,2,\ldots,N$). The stationary unconditional probabilities of states $s_{t,n}=0$ and $s_{t,n}=1$ are

$$\bar p^{(n)}_0 = \frac{p^{(n)}_{1\to0}}{p^{(n)}_{0\to1}+p^{(n)}_{1\to0}} \quad\text{and}\quad \bar p^{(n)}_1 = \frac{p^{(n)}_{0\to1}}{p^{(n)}_{0\to1}+p^{(n)}_{1\to0}},$$

respectively.² If $p^{(n)}_{0\to1} < p^{(n)}_{1\to0}$, then $\bar p^{(n)}_0 > \bar p^{(n)}_1$ and, on average, state $s_{t,n}=0$ occurs more frequently than state $s_{t,n}=1$ for roadway segment $n$. If $p^{(n)}_{0\to1} > p^{(n)}_{1\to0}$, then state $s_{t,n}=1$ occurs more frequently for segment $n$.³
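The stationary probabilities depend only on the two transition probabilities of a segment. The Python sketch below computes them and checks them against a simulated state path for a single roadway segment; the function names and the simulation itself are illustrative assumptions, not part of the authors' MATLAB code.

```python
import random


def stationary_probs(p01, p10):
    """Return (p0_bar, p1_bar) for a two-state Markov chain with
    P(s_{t+1}=1 | s_t=0) = p01 and P(s_{t+1}=0 | s_t=1) = p10."""
    total = p01 + p10
    if total <= 0.0:
        raise ValueError("p01 + p10 must be positive")
    return p10 / total, p01 / total


def simulate_states(p01, p10, n_steps, s0=0, seed=None):
    """Simulate one segment's state path (0 = zero-accident, 1 = normal-count)."""
    rng = random.Random(seed)
    states = [s0]
    for _ in range(n_steps - 1):
        u = rng.random()
        if states[-1] == 0:
            states.append(1 if u < p01 else 0)  # switch 0 -> 1 with probability p01
        else:
            states.append(0 if u < p10 else 1)  # switch 1 -> 0 with probability p10
    return states
```

For example, with $p_{0\to1} = 0.2$ and $p_{1\to0} = 0.6$ (hypothetical values), `stationary_probs` gives $\bar p_0 = 0.75$, and a long simulated path spends close to 75% of its periods in the zero-accident state, as expected when $p_{0\to1} < p_{1\to0}$.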

¹ The Markov property means that the probability distribution of $s_{t+1,n}$ depends only on the value $s_{t,n}$ at time $t$, but not on the previous history $s_{t-1,n}, s_{t-2,n}, \ldots$. Stationarity of $\{s_{t,n}\}$ is in the statistical sense.

² These can be found from the stationarity conditions $\bar p^{(n)}_0 = [1-p^{(n)}_{0\to1}]\,\bar p^{(n)}_0 + p^{(n)}_{1\to0}\,\bar p^{(n)}_1$, $\bar p^{(n)}_1 = p^{(n)}_{0\to1}\,\bar p^{(n)}_0 + [1-p^{(n)}_{1\to0}]\,\bar p^{(n)}_1$ and $\bar p^{(n)}_0 + \bar p^{(n)}_1 = 1$.

³ Here, Eq. (1) is a significant departure from Malyshkina et al. (2009) in that individual roadway segments can be in different states at the same time (i.e., the state variable is subscripted by roadway segment $n$). Also, in contrast to Malyshkina et al. (2009), here we do not restrict state $s_{t,n}=0$ to be more frequent than state $s_{t,n}=1$.

Fig. 1. Graphical demonstration of a two-state Markov switching model. [Diagram: state $s=0$ (counts from the zero-accident distribution $I(A_{t,n})$) and state $s=1$ (counts from $NB(A_{t,n})$), with the segment moving between them over time according to $p^{(n)}_{0\to1}$, $p^{(n)}_{1\to0}$, $p^{(n)}_{0\to0} = 1-p^{(n)}_{0\to1}$ and $p^{(n)}_{1\to1} = 1-p^{(n)}_{1\to0}$.]

Next, consider a two-state Markov switching negative binomial (MSNB) model that assumes a negative binomial (NB) data-generating process in the normal-count state $s_{t,n}=1$. With this, the probability of $A_{t,n}$ accidents occurring on roadway segment $n$ during time period $t$ is

$$P^{(A)}_{t,n} = \begin{cases} I(A_{t,n}) & \text{if } s_{t,n}=0, \\ NB(A_{t,n}) & \text{if } s_{t,n}=1, \end{cases} \tag{2}$$

$$I(A_{t,n}) = \begin{cases} 1 & \text{if } A_{t,n}=0, \\ 0 & \text{if } A_{t,n}>0, \end{cases} \tag{3}$$

$$NB(A_{t,n}) = \frac{\Gamma(A_{t,n}+1/\alpha)}{\Gamma(1/\alpha)\,A_{t,n}!}\left(\frac{1}{1+\alpha\lambda_{t,n}}\right)^{1/\alpha}\left(\frac{\alpha\lambda_{t,n}}{1+\alpha\lambda_{t,n}}\right)^{A_{t,n}}, \tag{4}$$

$$\lambda_{t,n} = \exp(\beta' X_{t,n}), \qquad t = 1,2,\ldots,T, \quad n = 1,2,\ldots,N. \tag{5}$$

Here, Eq. (3) is the probability mass function that reflects the fact that accidents never happen in the zero-accident state $s_{t,n}=0$.⁴ Eq. (4) is the standard negative binomial probability mass function, $\Gamma(\cdot)$ is the gamma function, and a prime denotes transpose (so $\beta'$ is the transpose of $\beta$). Parameter vector $\beta$ and the over-dispersion parameter $\alpha \ge 0$ are unknown estimable model parameters.⁵ Scalars $\lambda_{t,n}$ are the accident rates in the normal-count state. We set the first component of $X_{t,n}$ to unity, and, therefore, the first component of $\beta$ is the intercept.
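Eqs. (2)-(5) translate almost line by line into code. A minimal Python sketch, assuming $\alpha > 0$ (the $\alpha \to 0$ Poisson limit is not handled) and working with log-gamma values to avoid overflow; the function names are hypothetical.

```python
import math


def accident_rate(beta, x):
    """Eq. (5): lambda = exp(beta' x); x[0] = 1 so beta[0] is the intercept."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))


def zero_state_pmf(a):
    """Eq. (3): all probability mass at zero accidents."""
    return 1.0 if a == 0 else 0.0


def nb_pmf(a, lam, alpha):
    """Eq. (4): negative binomial pmf with mean lam and over-dispersion alpha > 0."""
    if alpha <= 0.0:
        raise ValueError("this sketch assumes alpha > 0")
    r = 1.0 / alpha
    log_p = (math.lgamma(a + r) - math.lgamma(r) - math.lgamma(a + 1)
             - r * math.log1p(alpha * lam)
             + a * (math.log(alpha * lam) - math.log1p(alpha * lam)))
    return math.exp(log_p)


def msnb_pmf(a, state, lam, alpha):
    """Eq. (2): probability of a accidents given the segment's current state."""
    return zero_state_pmf(a) if state == 0 else nb_pmf(a, lam, alpha)
```

As a quick check, with $\lambda = 2$ and $\alpha = 0.5$ the NB probabilities sum to one, with mean $\lambda$ and variance $\lambda + \alpha\lambda^2 = 4$, while `msnb_pmf(3, 0, 2.0, 0.5)` is exactly zero because no accidents can occur in state 0.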

A two-state Markov switching model of accident frequencies is illustrated graphically in Figure 1. In the two states $s=0$ and $s=1$ shown in the figure, the accident frequency data are generated by two different processes, shown by the circles (for state $s=0$) and the diamonds (for $s=1$). In this study, we assume that accident frequency is generated according to the zero-accident distribution $I(A_{t,n})$ in state $s=0$, and according to the standard negative binomial distribution $NB(A_{t,n})$ in state $s=1$ (these two distributions are outlined by the boxes in Figure 1). The state variable $s_{t,n}$ follows a Markov process over time, with transition probabilities $p^{(n)}_{0\to0}$, $p^{(n)}_{0\to1}$, $p^{(n)}_{1\to0}$ and $p^{(n)}_{1\to1}$, as shown in Figure 1.

⁴ Although Eq. (3) formally assumes $s_{t,n}=0$ to be a zero-accident state, in which accidents never happen, this state can be viewed as an approximation for a nearly safe state, in which the average accident rate is negligible ($\lambda_{t,n} \ll 1$) and accidents are extremely rare (over the considered time period).

⁵ To ensure that $\alpha$ is non-negative, we estimate its logarithm instead of $\alpha$ itself.

If accident events are assumed to be independent, the likelihood function is

$$f(Y|\Theta,\mathcal{M}) = \prod_{t=1}^{T}\prod_{n=1}^{N} P^{(A)}_{t,n}. \tag{6}$$

Here, because the state variables $s_{t,n}$ are unobservable, the vector of all estimable parameters $\Theta$ must include all states, in addition to all model parameters ($\beta$-s, $\alpha$) and transition probabilities. Thus, $\Theta = [\beta', \alpha, p^{(1)}_{0\to1}, \ldots, p^{(N)}_{0\to1}, p^{(1)}_{1\to0}, \ldots, p^{(N)}_{1\to0}, S']'$, where the vector $S = [(s_{1,1},\ldots,s_{T,1}),\ldots,(s_{1,N},\ldots,s_{T,N})]'$ has length $T \times N$ and contains all state values.

Eqs. (1)-(6) define the two-state Markov switching negative binomial (MSNB) model considered here. Note that in this model the estimable state variables $s_{t,n}$ explicitly specify the states of all roadway segments $n = 1,2,\ldots,N$ during all time periods $t = 1,2,\ldots,T$.
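Because $\Theta$ contains the states $S$, the logarithm of Eq. (6) can be evaluated for any candidate state configuration. A Python sketch, assuming counts, rates and states are stored as $T \times N$ nested lists (a layout chosen only for illustration; this is not the authors' code):

```python
import math


def nb_logpmf(a, lam, alpha):
    """Logarithm of Eq. (4), for alpha > 0."""
    r = 1.0 / alpha
    return (math.lgamma(a + r) - math.lgamma(r) - math.lgamma(a + 1)
            - r * math.log1p(alpha * lam)
            + a * (math.log(alpha * lam) - math.log1p(alpha * lam)))


def msnb_loglik(counts, rates, states, alpha):
    """ln f(Y | Theta, M) from Eq. (6), given all states s_{t,n}.

    counts[t][n] = A_{t,n}, rates[t][n] = lambda_{t,n}, states[t][n] = s_{t,n}.
    """
    total = 0.0
    for a_row, lam_row, s_row in zip(counts, rates, states):
        for a, lam, s in zip(a_row, lam_row, s_row):
            if s == 0:
                if a > 0:
                    return -math.inf  # Eq. (3): accidents are impossible in state 0
                # a == 0 in state 0 contributes ln 1 = 0
            else:
                total += nb_logpmf(a, lam, alpha)
    return total
```

Any configuration that places a segment-period with a positive count in state 0 has zero likelihood, so such a segment-period can only be assigned to the normal-count state; zero-count segment-periods can be in either state.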

In this study, in addition to the MSNB model, we also consider standard zero-inflated negative binomial (ZINB) models. In this case, the probability of $A_{t,n}$ accidents occurring is (Washington et al., 2003)

$$P^{(A)}_{t,n} = q_{t,n}\,I(A_{t,n}) + (1-q_{t,n})\,NB(A_{t,n}), \tag{7}$$

$$q_{t,n} = \frac{1}{1+e^{-\tau\log\lambda_{t,n}}}, \tag{8}$$

$$q_{t,n} = \frac{1}{1+e^{-\gamma' X_{t,n}}}, \tag{9}$$

where we use two different specifications for the probability $q_{t,n}$ that the $n$th roadway segment is in the zero-accident state during time period $t$. The right-hand side of Eq. (7) is a mixture of the zero-accident distribution $I(A_{t,n})$ given by Eq. (3) and the negative binomial distribution $NB(A_{t,n})$ given by Eq. (4). Scalar $\tau$ and vector $\gamma$ are estimable model parameters. Accident rate $\lambda_{t,n}$ is given by Eq. (5). We call “ZINB-τ” the model specified by Eqs. (7) and (8), and “ZINB-γ” the model specified by Eqs. (7) and (9). Note that $q_{t,n}$ depends on the estimable model parameters and gives the probability of being in the zero-accident state $s_{t,n}=0$, but it is not an estimable parameter by itself and does not explicitly specify the state value $s_{t,n}$.
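The two ZINB variants differ only in how $q_{t,n}$ is computed. A Python sketch of Eqs. (7)-(9), reusing the negative binomial form of Eq. (4); function names are hypothetical, and no guard against overflow of the exponential for extreme arguments is included.

```python
import math


def nb_pmf(a, lam, alpha):
    """Eq. (4), for alpha > 0."""
    r = 1.0 / alpha
    return math.exp(math.lgamma(a + r) - math.lgamma(r) - math.lgamma(a + 1)
                    - r * math.log1p(alpha * lam)
                    + a * (math.log(alpha * lam) - math.log1p(alpha * lam)))


def q_tau(lam, tau):
    """Eq. (8): ZINB-tau zero-state probability, tied to the accident rate."""
    return 1.0 / (1.0 + math.exp(-tau * math.log(lam)))


def q_gamma(gamma, x):
    """Eq. (9): ZINB-gamma zero-state probability with its own coefficients."""
    return 1.0 / (1.0 + math.exp(-sum(g * xi for g, xi in zip(gamma, x))))


def zinb_pmf(a, q, lam, alpha):
    """Eq. (7): mixture of the zero-accident mass I(A) and NB(A)."""
    zero_mass = 1.0 if a == 0 else 0.0
    return q * zero_mass + (1.0 - q) * nb_pmf(a, lam, alpha)
```

Nothing here returns a state value: `q` is only a probability, which is exactly the contrast with the explicit states $s_{t,n}$ of the MSNB model.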


## 3 Model estimation methods

Statistical estimation of Markov switching models is complicated by the unobservability of the state variables $s_{t,n}$.⁶ As a result, the traditional maximum likelihood estimation (MLE) procedure is of very limited use for Markov switching models. Instead, a Bayesian inference approach is used. Given a model $\mathcal{M}$ with likelihood function $f(Y|\Theta,\mathcal{M})$, the Bayes formula is

$$f(\Theta|Y,\mathcal{M}) = \frac{f(Y,\Theta|\mathcal{M})}{f(Y|\mathcal{M})} = \frac{f(Y|\Theta,\mathcal{M})\,\pi(\Theta|\mathcal{M})}{\int f(Y,\Theta|\mathcal{M})\,d\Theta}. \tag{10}$$

Here $f(\Theta|Y,\mathcal{M})$ is the posterior probability distribution of model parameters $\Theta$ conditional on the observed data $Y$ and model $\mathcal{M}$. Function $f(Y,\Theta|\mathcal{M})$ is the joint probability distribution of $Y$ and $\Theta$ given model $\mathcal{M}$. Function $f(Y|\mathcal{M})$ is the marginal likelihood function, i.e., the probability distribution of data $Y$ given model $\mathcal{M}$. Function $\pi(\Theta|\mathcal{M})$ is the prior probability distribution of parameters that reflects prior knowledge about $\Theta$. The intuition behind Eq. (10) is straightforward: given model $\mathcal{M}$, the posterior distribution accounts for both the observations $Y$ and our prior knowledge of $\Theta$.

In our study (and in most practical studies), the direct application of Eq. (10) is not feasible because the parameter vector $\Theta$ contains too many components, making integration over $\Theta$ in Eq. (10) extremely difficult. However, the posterior distribution $f(\Theta|Y,\mathcal{M})$ in Eq. (10) is known up to its normalization constant, $f(\Theta|Y,\mathcal{M}) \propto f(Y|\Theta,\mathcal{M})\,\pi(\Theta|\mathcal{M})$. As a result, we use Markov chain Monte Carlo (MCMC) simulations, which provide a convenient and practical computational methodology for sampling from a probability distribution known up to a constant (the posterior distribution in our case). Given a large enough posterior sample of parameter vector $\Theta$, any posterior expectation and variance can be found and Bayesian inference can be readily applied. Readers interested in details are referred to Malyshkina (2008), where we comprehensively describe our choice of the prior distribution $\pi(\Theta|\mathcal{M})$ and the MCMC simulation algorithm.⁷ We used the MATLAB language for programming and running the MCMC simulations.
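The property that makes MCMC usable here is that it needs the posterior only up to its normalization constant. As a generic illustration (a random-walk Metropolis sampler on a one-parameter toy posterior, not the sampler described in Malyshkina, 2008), the acceptance step below uses only $f(Y|\Theta,\mathcal{M})\,\pi(\Theta|\mathcal{M})$, so $\int f(Y,\Theta|\mathcal{M})\,d\Theta$ is never computed. The toy model (Poisson counts with a flat prior on the log-rate) and the data are assumptions chosen because the posterior mean of the rate is then known exactly, $\sum_i y_i/n$.

```python
import math
import random


def log_post_unnorm(theta, counts):
    """ln[f(Y|theta) pi(theta)] up to an additive constant: Poisson counts with
    rate exp(theta) and a flat prior on theta (toy model, not the MSNB posterior)."""
    return theta * sum(counts) - len(counts) * math.exp(theta)


def metropolis(log_target, theta0, step, n_draws, burn_in, seed=0):
    """Random-walk Metropolis; returns (kept draws, acceptance rate)."""
    rng = random.Random(seed)
    theta, lp = theta0, log_target(theta0)
    draws, accepted = [], 0
    for i in range(burn_in + n_draws):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(theta));
        # the unknown normalization constant cancels in this ratio.
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            theta, lp = proposal, lp_prop
            accepted += 1
        if i >= burn_in:
            draws.append(theta)
    return draws, accepted / (burn_in + n_draws)


counts = [2, 3, 1, 4, 2, 3, 2, 3, 1, 4, 3, 2, 2, 3, 4, 2, 3, 1, 2, 3]  # illustrative data
draws, acc_rate = metropolis(lambda th: log_post_unnorm(th, counts), 0.0, 0.3, 20_000, 2_000, seed=42)
post_mean_rate = sum(math.exp(th) for th in draws) / len(draws)  # exact posterior mean: 50 / 20 = 2.5
```

In the MSNB model the same accept/reject logic would have to cover $\beta$, $\alpha$, the transition probabilities and all $T \times N$ states; how the authors organize those updates is described in Malyshkina (2008), not here.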

For comparison of different models we use a formal Bayesian approach. Let there be two models $\mathcal{M}_1$ and $\mathcal{M}_2$ with parameter vectors $\Theta_1$ and $\Theta_2$, respectively. Assuming that we have equal preferences for these models, their prior probabilities are $\pi(\mathcal{M}_1) = \pi(\mathcal{M}_2) = 1/2$. In this case, the ratio of the models' posterior probabilities, $P(\mathcal{M}_1|Y)$ and $P(\mathcal{M}_2|Y)$, is equal to the Bayes factor. The latter is defined as the ratio of the models' marginal likelihoods (see Kass and Raftery, 1995). Thus, we have

$$\frac{P(\mathcal{M}_2|Y)}{P(\mathcal{M}_1|Y)} = \frac{f(\mathcal{M}_2,Y)/f(Y)}{f(\mathcal{M}_1,Y)/f(Y)} = \frac{f(Y|\mathcal{M}_2)\,\pi(\mathcal{M}_2)}{f(Y|\mathcal{M}_1)\,\pi(\mathcal{M}_1)} = \frac{f(Y|\mathcal{M}_2)}{f(Y|\mathcal{M}_1)}, \tag{11}$$

where $f(\mathcal{M}_1,Y)$ and $f(\mathcal{M}_2,Y)$ are the joint distributions of the models and the data, and $f(Y)$ is the unconditional distribution of the data. As in Malyshkina et al. (2009), to calculate the marginal likelihoods $f(Y|\mathcal{M}_1)$ and $f(Y|\mathcal{M}_2)$, we use the harmonic mean formula $f(Y|\mathcal{M})^{-1} = E\left[f(Y|\Theta,\mathcal{M})^{-1} \mid Y\right]$, where $E(\ldots|Y)$ denotes the posterior expectation calculated by using the posterior distribution. If the ratio in Eq. (11) is larger than one, then model $\mathcal{M}_2$ is favored; if the ratio is less than one, then model $\mathcal{M}_1$ is favored. An advantage of Bayes factors is that they have an inherent penalty for including too many parameters in the model and guard against overfitting.

⁶ Below we will have five time periods ($T = 5$) and 335 roadway segments ($N = 335$). In this case, there are $2^{TN} = 2^{1675}$ possible combinations for the value of the vector $S = [(s_{1,1},\ldots,s_{T,1}),\ldots,(s_{1,N},\ldots,s_{T,N})]'$.

⁷ Our priors for $\alpha$, $\beta$-s, $p_{0\to1}$ and $p_{1\to0}$ are flat or nearly flat, while the prior for the states $S$ reflects the Markov process property specified by Eq. (1).
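The harmonic mean formula replaces the posterior expectation with an average over MCMC draws of $\Theta$. Because log-likelihoods of accident data are large negative numbers (the maximum log-likelihoods reported in Section 4 lie between about $-2050$ and $-2500$), evaluating $f(Y|\Theta,\mathcal{M})^{-1}$ directly would overflow, so the Python sketch below stays on the log scale with a log-sum-exp. The input is assumed to be a list of $\ln f(Y|\Theta^{(k)},\mathcal{M})$ values from posterior draws; the second helper converts a difference of log marginal likelihoods into $P(\mathcal{M}_2|Y)$ under the equal priors of Eq. (11). Names are hypothetical and this is not the authors' MATLAB implementation.

```python
import math


def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_marginal_harmonic_mean(loglik_draws):
    """ln f(Y|M) via f(Y|M)^(-1) = E[f(Y|Theta,M)^(-1) | Y], with the expectation
    estimated by averaging over posterior draws Theta^(1), ..., Theta^(K)."""
    k = len(loglik_draws)
    log_mean_inverse = log_sum_exp([-ll for ll in loglik_draws]) - math.log(k)
    return -log_mean_inverse


def posterior_prob_m2(log_ml_1, log_ml_2):
    """P(M2|Y) when pi(M1) = pi(M2) = 1/2, from the ratio in Eq. (11)."""
    d = log_ml_2 - log_ml_1
    if d >= 0.0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # d < 0: avoid overflow of exp(-d)
    return e / (1.0 + e)
```

For instance, a log-marginal-likelihood gain of 335.69 makes `posterior_prob_m2(0.0, 335.69)` indistinguishable from 1 in double precision.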

To evaluate the performance of model $\{\mathcal{M},\Theta\}$ in fitting the observed data $Y$, we carry out a $\chi^2$ goodness-of-fit test (Maher and Summersgill, 1996; Cowan, 1998; Wood, 2002; Press et al., 2007). We perform this test by Monte Carlo simulations to find the distribution of the $\chi^2$ quantity, which measures the discrepancy between the observations and the model predictions (Cowan, 1998). This distribution is then used to find the goodness-of-fit p-value, which is the probability that $\chi^2$ exceeds its observed value under the hypothesis that the model is true (the observed value of $\chi^2$ is calculated by using the observed data $Y$). For additional details, see Malyshkina (2008).
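The Monte Carlo p-value procedure can be sketched as follows. The exact $\chi^2$ quantity used in the paper is defined in Malyshkina (2008), not here, so this sketch assumes a Pearson-type statistic $\sum_i (y_i-\mu_i)^2/v_i$ with model means $\mu_i$ and variances $v_i$, and uses independent Poisson counts purely as a stand-in data generator; both choices, and all names, are illustrative assumptions.

```python
import math
import random


def chi2_stat(obs, means, variances):
    """Pearson-type discrepancy between observations and model predictions."""
    return sum((y - m) ** 2 / v for y, m, v in zip(obs, means, variances))


def mc_chi2_pvalue(obs, means, variances, simulate, n_sims=2000, seed=0):
    """Fraction of data sets simulated under the model whose chi^2 is at
    least the observed chi^2 (the goodness-of-fit p-value)."""
    rng = random.Random(seed)
    chi2_obs = chi2_stat(obs, means, variances)
    exceed = sum(
        chi2_stat(simulate(rng), means, variances) >= chi2_obs for _ in range(n_sims)
    )
    return exceed / n_sims


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


LAMBDAS = [2.0] * 50  # stand-in model: Poisson means (so variances equal means)


def simulate(rng):
    return [poisson_draw(rng, lam) for lam in LAMBDAS]
```

Observations equal to the model means give $\chi^2 = 0$ and hence a p-value of 1.0, while data far from the model (say, 50 accidents on every segment) give a p-value of 0.0.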

## 4 Empirical results

Data are used from 5769 accidents that were observed on 335 interstate highway segments in Indiana in 1995-1999. We use annual time periods, $t = 1,2,3,4,T$, with $T = 5$ in total.⁸ Thus, for each roadway segment $n = 1,2,\ldots,N = 335$, the state $s_{t,n}$ can change every year. Four types of accident frequency models are estimated:

(1) First, for the purpose of explanatory variable selection, we estimate an auxiliary standard negative binomial (NB) model, which is not reported here. We estimate this model by maximum likelihood estimation (MLE). To obtain a standard NB model, we choose explanatory variables and their dummies by using the Akaike Information Criterion (AIC)⁹ and the 5% statistical significance level for the two-tailed t-test (for details on our variable selection methods, see Malyshkina, 2006). To make a comparison of explanatory variable effects in different models straightforward, in all other models described below we use only those explanatory variables that enter the standard NB model.¹⁰

(2) We estimate the standard ZINB-τ model, specified by Eqs. (6)–(8). First, we estimate this model by maximum likelihood estimation (MLE) and use the 5% statistical significance level for evaluation of the statistical significance of each β-parameter. Second, we estimate the same ZINB-τ model by the Bayesian inference approach and MCMC simulations. As expected, the Bayesian-MCMC estimation results turned out to be similar to the MLE estimation results for the ZINB-τ model.

(3) We estimate the standard ZINB-γ model, specified by Eqs. (6), (7) and (9). First, we estimate this model by MLE and use the 5% statistical significance level for evaluation of the statistical significance of each β-parameter. Second, we estimate the same ZINB-γ model by the Bayesian inference approach and MCMC simulations. The Bayesian-MCMC and the MLE estimation results for the ZINB-γ model turned out to be similar.

(4) We estimate the two-state Markov switching negative binomial (MSNB) model, specified by Eqs. (1)-(6), by the Bayesian-MCMC methods. We consecutively construct and use 60%, 85% and 95% Bayesian credible intervals for evaluation of the statistical significance of each β-parameter in the MSNB model. As a result, in the final MSNB model some components of β are restricted to zero.¹¹ No restriction is imposed on the over-dispersion parameter α, which turns out to be significant anyway.

⁸ We also considered quarterly time periods and obtained qualitatively similar results (not reported here).
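Two quantities drive the variable selection in steps (1) and (4): AIC, defined in footnote 9 as $2K - 2LL$, and equal-tailed Bayesian credible intervals. A minimal Python sketch of both, assuming posterior draws of a β-parameter are available as a list; the linear interpolation between order statistics is our choice for illustration, not necessarily the one used in the paper.

```python
import math


def aic(n_params, loglik):
    """AIC = 2K - 2LL; smaller values are preferred."""
    return 2.0 * n_params - 2.0 * loglik


def credible_interval(draws, a):
    """Equal-tailed (1 - a) interval: posterior mass a/2 below and a/2 above."""
    xs = sorted(draws)
    n = len(xs)

    def quantile(q):
        pos = q * (n - 1)  # linear interpolation between order statistics
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return xs[lo] * (1.0 - frac) + xs[hi] * frac

    return quantile(a / 2.0), quantile(1.0 - a / 2.0)


def excludes_zero(draws, a):
    """True if the (1 - a) credible interval does not contain zero."""
    lo, hi = credible_interval(draws, a)
    return not (lo <= 0.0 <= hi)
```

Calling `excludes_zero` with `a` = 0.40, 0.15 and 0.05 in turn corresponds to the 60%, 85% and 95% intervals of step (4); a β-parameter whose interval contains zero would be a candidate for restriction to zero.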

The model estimation results for accident frequencies are given in Table 1. Continuous model parameters, $\beta$-s and $\alpha$, are given together with their 95% confidence intervals (if MLE) or 95% credible intervals (if Bayesian-MCMC), which are reported next to the corresponding parameter estimates in Table 1.¹² Table 2 gives summary statistics of all roadway segment characteristic variables $X_{t,n}$ (except the intercept).

⁹ Minimization of $\mathrm{AIC} = 2K - 2LL$, where $K$ is the number of free continuous model parameters and $LL$ is the log-likelihood, ensures an optimal choice of explanatory variables in a model and avoids overfitting (Tsay, 2002; Washington et al., 2003).

¹⁰ A formal Bayesian approach to model variable selection is based on evaluation of a model's marginal likelihood and the Bayes factor (11). Unfortunately, because MCMC simulations are computationally expensive, evaluation of marginal likelihoods for a large number of trial models is not feasible in our study.

¹¹ A β-parameter is restricted to zero if it is statistically insignificant. A $1-a$ credible interval is chosen in such a way that the posterior probabilities of being below and above it are both equal to $a/2$ (we use significance levels $a$ = 40%, 15%, 5%).

¹² Note that MLE assumes asymptotic normality of the estimates, resulting in confidence intervals being symmetric around the means (a 95% confidence interval is ±1.96 standard deviations around the mean). In contrast, Bayesian estimation does not require this assumption, and posterior distributions of parameters and Bayesian credible intervals are usually non-symmetric.

The estimation results show that the MSNB model is strongly favored by the empirical data, as compared to the standard ZINB models. Indeed, from Table 1 we see that the MSNB model provides considerable improvements, of 335.69 and 263.12, in the logarithm of the marginal likelihood of the data as compared to the ZINB-τ and ZINB-γ models, respectively.¹³ Thus, from Eq. (11), we find that, given the accident data, the posterior probability of the MSNB model is larger than the probabilities of the ZINB-τ and ZINB-γ models by factors of $e^{335.69}$ and $e^{263.12}$, respectively.¹⁴

Let us now consider the maximum likelihood estimation (MLE) of the standard ZINB-τ and ZINB-γ models and a hypothetical MLE of the MSNB model. Referring to Table 1, the MLE gave maximum log-likelihood values of −2502.67 and −2426.54 for the ZINB-τ and ZINB-γ models, respectively. The maximum log-likelihood value observed during our MCMC simulations for the MSNB model is −2049.45. A hypothetical MLE, at its convergence, would give an MSNB log-likelihood value no smaller than this observed value. Therefore, the MSNB model, if estimated by MLE, would provide very large improvements, of at least 453.22 and 377.09, in the maximum log-likelihood value over the ZINB-τ and ZINB-γ models. These improvements would come with no increase, or with a decrease, in the number of free continuous model parameters ($\beta$-s, $\alpha$, $\tau$, $\gamma$-s) that enter the likelihood function.

¹³ We use the harmonic mean formula to calculate the values and the 95% confidence intervals of the log-marginal-likelihoods given in Table 1. The confidence intervals are calculated by bootstrap simulations. For details, see Malyshkina et al. (2009) or Malyshkina (2008).

¹⁴ There are other frequently used model comparison criteria, for example, the deviance information criterion, $\mathrm{DIC} = 2E[D(\Theta)|Y] - D(E[\Theta|Y])$, where the deviance is $D(\Theta) \equiv -2\ln f(Y|\Theta,\mathcal{M})$ (Robert, 2001). Models with smaller DIC are favored over models with larger DIC. We find DIC values of 5037.3, 4891.4 and 4261.5 for the ZINB-τ, ZINB-γ and MSNB models, respectively. This means that the MSNB model is favored over the standard ZINB models. However, DIC is theoretically based on the assumption of asymptotic multivariate normality of the posterior distribution, in which case DIC reduces to AIC (Spiegelhalter et al., 2002). As a result, we prefer to rely on a mathematically rigorous and formal Bayes factor approach to model selection, as given by Eq. (11).
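The arithmetic behind these comparisons is easy to verify. The Python sketch below recomputes the log-likelihood gains from the reported maxima, expresses the posterior odds $e^{335.69}$ and $e^{263.12}$ as powers of ten, and writes the DIC formula of footnote 14 as a function of deviance draws; the helper names are hypothetical, and only the numeric inputs come from the text.

```python
import math


def log10_odds(delta_log_ml):
    """Posterior odds exp(delta) from Eq. (11), expressed as a power of ten."""
    return delta_log_ml / math.log(10.0)


def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = 2 E[D(Theta)|Y] - D(E[Theta|Y]), with D = -2 ln f(Y|Theta, M)."""
    mean_d = sum(deviance_draws) / len(deviance_draws)
    return 2.0 * mean_d - deviance_at_posterior_mean


# Maximum log-likelihoods reported in the text.
max_ll = {"ZINB-tau": -2502.67, "ZINB-gamma": -2426.54, "MSNB": -2049.45}
ll_gain = {m: max_ll["MSNB"] - max_ll[m] for m in ("ZINB-tau", "ZINB-gamma")}
# ll_gain is approximately {"ZINB-tau": 453.22, "ZINB-gamma": 377.09}

# Log-marginal-likelihood improvements reported in the text.
odds_pow10 = {"ZINB-tau": log10_odds(335.69), "ZINB-gamma": log10_odds(263.12)}
# roughly 10**145.8 and 10**114.3
```

In other words, the reported Bayes factors correspond to posterior odds of roughly $10^{146}$ and $10^{114}$ in favor of the MSNB model.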

Table 1

Estimation results for models of accident frequency (the superscript and subscript numbers to the right of individual

parameter estimates are 95% confidence/credible intervals – see text for further explanation)

Variable

ZINB-τa

ZINB-γb

MSNBc

by MLE by MCMCby MLEby MCMCby MCMC

β- and α-parameters in Eq. (5)

Intercept (constant term)−15.0−12.5

−17.5

−15.2−13.0

−17.4

−11.6−8.32

−14.8

−11.6−8.29

−14.6

−17.3−13.0

−21.3

Accident occurring on interstates I-70 or I-164 (dummy)−.683−.570

−.797

−.685−.575

−.794

−.715−.602

−.829

−.715−.593

−.836

−.734−.617

−.850

Pavement quality index (PQI) averageᵈ

−.0122−.0189

−.00550

−.0122−.00562

−.0188

.791.829

−.0140−.00627

−.0217

.929.978

−.0143−.00643

−.0221

.939.993

−.0163−.00850

−.0240

.887.929

Logarithm of road segment length (in miles).791.832

.751

.754

.880

.886

.845

Number of ramps on the viewing side per lane per mile.226.300

.153

.227.306

.149

.298.387

.209

.304.394

.214

.317.404

.230

Number of lanes on a roadway––––1.192.04

.386

Median configuration is depressed (dummy).184.288

.0795

.183.282

.0839

.201.319

.0820

.202.325

.0781

–

Median barrier presence (dummy)−1.43−1.22

−1.64

−1.43−1.14

−1.72

––−1.69−1.00

−2.46

Width of the interior shoulder is less than 5 feet (dummy).323.443

.202

.323.434

.211

.435.572

.297

.437.569

.307

.374.505

.243

Outside shoulder width (in feet)−.0480−.0196

−.0764

−.0478−.0207

−.0749

−.0532−.0176

−.0887

−.0532−.020

−.0867

−.0537−.0214

−.0862

Outside barrier is absent (dummy)––−.245−.117

−1.93−3.21

× 10−5

1.521.88

−.373

−.245−.101

−1.91−3.16

× 10−5

1.521.86

−.389

−.264−.124

−3.78−2.02

× 10−5

1.952.34

−.403

Average annual daily traffic (AADT)

−4.07−3.17

× 10−5

1.892.17

−4.97

−4.14−3.31

× 10−5

1.912.16

−5.04

−6.50

−5.83

−5.26

Logarithm of average annual daily traffic

1.611.671.151.15 1.49

Number of bridges per mile––––−.0214−.00164

−.0428

−.106−.0289

−.183

1.291.90

Maximum of reciprocal values of horizontal curve radii (in 1/mile)−.140−.0710

−.209

1.231.84

−.141−.0734

−.208

1.231.82

−.134−.0559

−.213

1.321.96

−.138−.0593

−.217

1.321.96

Percentage of single unit trucks (daily average)

.624

.646

.693

.691

.688

Number of changes per vertical profile along a roadway segment.0555.0930

.0180

.0562.0903

.0226

–––

Over-dispersion parameter α in NB models.144.183

.105

.150.192

.114

.130.168

.0925

.142.185

.105

.114.147

.0847


Table 1

(Continued)

Variable

ZINB-τᵃ

ZINB-γᵇ

MSNBᶜ

by MLE, by MCMC, by MLE, by MCMC, by MCMC

τ- and γ-parameters in Eqs. (8) and (9)

The model parameter τ in Eq. (8)−1.72−1.45

−2.00

−1.73−1.50

−1.98

–––

Intercept (constant term)––23.141.3

4.99

26.547.0

10.9

–

Logarithm of road segment length (in miles)––−1.34−.942

−1.73

−1.4−1.03

−1.83

4.165.20

10.517.4

× 10−5

−3.28−1.59

–

Median barrier presence (dummy)––3.974.86

9.2315.1

× 10−5

−2.88−.901

3.083.27

–

Average annual daily traffic (AADT)––

3.355.72

–

Logarithm of average annual daily traffic––

−4.86

−5.57

–

Mean accident rate ($\lambda_{t,n}$ for NB), averaged over all values of $X_{t,n}$

– 3.38 – 3.42 3.88

Standard deviation of accident rate ($\sqrt{\lambda_{t,n}(1 + \alpha\lambda_{t,n})}$ for NB), averaged over all values of explanatory variables $X_{t,n}$

– 2.14 – 2.15 2.13

Total number of free model parameters (β-s, γ-s, α and τ)

16 16 19 19 16

Posterior average of the log-likelihood (LL)–−2510.68−2506.13

−2517.12

−−−2436.34−2431.12

−2443.54

−2124.82−2096.30

−2153.91

Max(LL): estimated maximum value of log-likelihood (LL) for MLE; maximum observed value of LL for Bayesian-MCMC

−2502.67

(MLE)

−2503.21

(observed)

−2519.90−2516.95

−2426.54

(MLE)

−2427.41

(observed)

−2447.33−2443.93

−2049.45

(observed)

−2184.21−2186.70

Logarithm of marginal likelihood of data (ln[f(Y|M)])–

−2521.59

–

−2448.86

−2169.56

Goodness-of-fit p-value – 0.005 – 0.177 0.191

Maximum of the potential scale reduction factors (PSRF)ᵉ

– 1.01006 – 1.02200 1.02117

Multivariate potential scale reduction factor (MPSRF)ᵉ

– 1.01023 – 1.02302 1.02189

ᵃ Standard (conventional) ZINB-τ model estimated by maximum likelihood estimation (MLE) and Markov chain Monte Carlo (MCMC) simulations.

ᵇ Standard ZINB-γ model estimated by maximum likelihood estimation (MLE) and Markov chain Monte Carlo (MCMC) simulations.

ᶜ Two-state Markov switching negative binomial (MSNB) model where all reported parameters are for the normal-count state s = 1.

ᵈ The pavement quality index (PQI) is a composite measure of overall pavement quality evaluated on a 0 to 100 scale.

ᵉ PSRF/MPSRF are calculated separately/jointly for all continuous model parameters. PSRF and MPSRF are close to 1 for converged MCMC chains.
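Assuming Eq. (11) is the usual Bayes factor, i.e. the ratio f(Y|M₁)/f(Y|M₂) of marginal likelihoods (the equation itself is not reproduced in this section), the model comparison can be read off the ln[f(Y|M)] row of Table 1. The snippet below only performs that subtraction; the dictionary keys are illustrative labels.

```python
import math

# ln f(Y|M) row of Table 1 (harmonic-mean estimates from the MCMC output).
log_ml = {"ZINB-tau": -2521.59, "ZINB-gamma": -2448.86, "MSNB": -2169.56}

def log_bayes_factor(m1, m2):
    """ln B_12 = ln f(Y|M1) - ln f(Y|M2); positive values favor M1."""
    return log_ml[m1] - log_ml[m2]

for rival in ("ZINB-tau", "ZINB-gamma"):
    lb = log_bayes_factor("MSNB", rival)
    # Reported on the log10 scale as well, for readability.
    print(f"MSNB vs {rival}: ln B = {lb:.2f}, log10 B = {lb / math.log(10):.1f}")
```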


Table 2

Summary statistics of roadway segment characteristic variables

| Variable | Mean | Standard deviation | Minimum | Median | Maximum |
|---|---|---|---|---|---|
| Accident occurring on interstates I-70 or I-164 (dummy) | .155 | .363 | 0 | 0 | 1.00 |
| Pavement quality index (PQI) averageᵃ | 88.6 | 5.96 | 69.0 | 90.3 | 98.5 |
| Logarithm of road segment length (in miles) | −.901 | 1.22 | −4.71 | −1.03 | 2.44 |
| Number of ramps on the viewing side per lane per mile | .138 | .408 | 0 | 0 | 3.27 |
| Number of lanes on a roadway | 2.09 | .286 | 2.00 | 2.00 | 3.00 |
| Median configuration is depressed (dummy) | .630 | .484 | 0 | 1.00 | 1.00 |
| Median barrier presence (dummy) | .161 | .368 | 0 | 0 | 1 |
| Width of the interior shoulder is less than 5 feet (dummy) | .696 | .461 | 0 | 1.00 | 1.00 |
| Outside shoulder width (in feet) | 11.3 | 1.74 | 6.20 | 11.2 | 21.8 |
| Outside barrier absence (dummy) | .830 | .376 | 0 | 1.00 | 1.00 |
| Average annual daily traffic (AADT) | 3.03 × 10⁴ | 2.89 × 10⁴ | .944 × 10⁴ | 1.65 × 10⁴ | 14.3 × 10⁴ |
| Logarithm of average annual daily traffic | 10.0 | .623 | 9.15 | 9.71 | 11.9 |
| Number of bridges per mile | 1.76 | 8.14 | 0 | 0 | 124 |
| Maximum of reciprocal values of horizontal curve radii (in 1/mile) | .650 | .632 | 0 | .589 | 2.26 |
| Percentage of single unit trucks (daily average) | .0859 | .0678 | .00975 | .0683 | .322 |
| Number of changes per vertical profile along a roadway segment | .522 | .908 | 0 | 0 | 6.00 |

ᵃ The pavement quality index (PQI) is a composite measure of overall pavement quality evaluated on a 0 to 100 scale.


To evaluate the goodness-of-fit for a model, we use the posterior (or MLE) estimates of all continuous model parameters (β-s, α, $p^{(n)}_{0\to1}$, $p^{(n)}_{1\to0}$) and generate $10^4$ artificial data sets under the hypothesis that the model is true.15 We find the distribution of $\chi^2$ and calculate the goodness-of-fit p-value for the observed value of $\chi^2$. For details, see Malyshkina et al. (2009). The resulting p-values for our models are given in Table 1. For the ZINB-γ and MSNB models the p-values are sufficiently large, around 20%, which indicates that these models fit the data reasonably well. At the same time, for the ZINB-τ model the goodness-of-fit p-value is only around 0.5%, which indicates a much poorer fit.16
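The simulation-based p-value described above can be sketched as a parametric bootstrap. The exact $\chi^2$ statistic of Malyshkina et al. (2009) is not reproduced here, so the Pearson statistic over count bins (0, 1, 2, 3, and 4 or more accidents) is an assumption of this sketch. For brevity it also uses a plain NB model with fixed rates; for the ZINB and MSNB models the simulation step would additionally draw the zero-accident or normal-count state (footnote 15 notes that the MSNB states are generated from $p^{(n)}_{0\to1}$ and $p^{(n)}_{1\to0}$). All function names and the synthetic rates are hypothetical.

```python
import math
import random

def nb_pmf(y, lam, alpha):
    """NB probability of count y, with mean lam and variance lam * (1 + alpha * lam)."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + lam)) + y * math.log(lam / (r + lam)))
    return math.exp(log_p)

def nb_sample(lam, alpha, rng):
    """Draw an NB count as a Poisson-gamma mixture (the gamma factor has mean 1)."""
    rate = lam * rng.gammavariate(1.0 / alpha, alpha)
    # Knuth's multiplication method for the Poisson draw; adequate for small rates.
    threshold, k, p = math.exp(-rate), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def expected_bins(rates, alpha, n_bins):
    """Expected number of segments with 0, 1, ..., n_bins-2 and >= n_bins-1 accidents."""
    expected = [0.0] * n_bins
    for lam in rates:
        probs = [nb_pmf(k, lam, alpha) for k in range(n_bins - 1)]
        for k, pk in enumerate(probs):
            expected[k] += pk
        expected[-1] += 1.0 - sum(probs)
    return expected

def chi_square(counts, expected):
    """Pearson chi-square of binned observed counts against expected bin totals."""
    observed = [0] * len(expected)
    for y in counts:
        observed[min(y, len(expected) - 1)] += 1
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def gof_p_value(counts, rates, alpha, n_sim=1000, n_bins=5, seed=0):
    """Share of data sets simulated from the fitted model whose chi-square is
    at least as large as the observed one (parametric bootstrap p-value)."""
    rng = random.Random(seed)
    expected = expected_bins(rates, alpha, n_bins)
    chi_obs = chi_square(counts, expected)
    exceed = 0
    for _ in range(n_sim):
        fake = [nb_sample(lam, alpha, rng) for lam in rates]
        if chi_square(fake, expected) >= chi_obs:
            exceed += 1
    return exceed / n_sim

rng = random.Random(3)
rates = [rng.uniform(0.5, 6.0) for _ in range(300)]   # hypothetical fitted rates
data = [nb_sample(lam, 0.114, rng) for lam in rates]  # data drawn from the model
print(gof_p_value(data, rates, 0.114, n_sim=500))
print(gof_p_value([0] * 300, rates, 0.114, n_sim=500))  # all-zero data fit badly
```

With data generated from the assumed model the p-value is typically moderate, while an excess of zeros that the NB model cannot reproduce drives it toward zero, which is the pattern footnote 16 reports for the plain NB model.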

The estimation results also show that the over-dispersion parameter α is higher for the ZINB-τ and ZINB-γ models than for the MSNB model (see Table 1). This suggests that the over-dispersed volatility of accident frequencies, which is often observed in empirical data, could be partly due to latent switching between the states of roadway safety.
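To see what the difference in α means in practice, recall that the NB variance is λ(1 + αλ), as in the standard-deviation row of Table 1. The sketch below evaluates it at a single illustrative rate (λ = 3.5 is hypothetical, not taken from the data) using the posterior means of α from the MCMC columns of Table 1; Table 1 instead averages over all values of the explanatory variables, so these numbers are only indicative.

```python
# Posterior means of alpha from the "by MCMC" columns of Table 1.
alpha_mcmc = {"ZINB-tau": 0.150, "ZINB-gamma": 0.142, "MSNB": 0.114}

def nb_variance(lam, alpha):
    """Variance of an NB count with mean lam: lam * (1 + alpha * lam)."""
    return lam * (1.0 + alpha * lam)

lam = 3.5  # hypothetical accident rate, chosen for illustration only
for model, alpha in alpha_mcmc.items():
    var = nb_variance(lam, alpha)
    excess = var - lam  # variance beyond the Poisson level (variance = mean)
    print(f"{model}: variance {var:.3f}, excess over Poisson {excess:.3f}")
```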

Now, refer to Figure 2, made for the case of the MSNB model. The four plots in this figure show five-year time series of the posterior probabilities $P(s_{t,n}=1\mid Y)$ of the normal-count state for four selected roadway segments. These plots represent the following four categories of roadway segments:

(1) For roadway segments from the first category we have $P(s_{t,n}=1\mid Y) = 1$ for all $t = 1,2,3,4,5$. Thus, we can say with absolute certainty that these segments were always in the normal-count state $s_{t,n} = 1$ during the considered five-year time interval. A roadway segment belongs to this category if and only if it had at least one accident during each year ($t = 1,2,3,4,5$). An example of such a roadway segment is given in the top-left plot in Figure 2. For this segment the posterior expectation of the long-term unconditional probability $\bar{p}_1$ of being in the normal-count state is large, $E(\bar{p}_1\mid Y) = 0.750$.

(2) For roadway segments from the second category $P(s_{t,n}=1\mid Y) \ll 1$ for all $t = 1,2,3,4,5$. Thus, we can say with a high degree of certainty that these segments were always in the zero-accident state $s_{t,n} = 0$ during the considered five-year time interval. A roadway segment $n$ belongs to this category if it had no accidents observed over the five-year interval even though the accident rates given by Eq. (5) were large, $\lambda_{t,n} \gg 1$ for all $t = 1,2,3,4,5$. Clearly, such a segment would be unlikely to have zero accidents

15 Note that the state values S are generated by using $p^{(n)}_{0\to1}$ and $p^{(n)}_{1\to0}$.

16 It is worth mentioning that for the auxiliary standard negative binomial (NB) model, which we do not report here, the goodness-of-fit p-value was also very poor, ≈ 0.3%. This is an expected result because of the preponderance of zeros in the data, which is not accounted for in the NB model.


[Figure 2: four panels, each plotting $P(S_t=1\mid Y)$ (vertical axis, 0 to 1) against date (1995 to 1999), for segment #1, $E(\bar{p}_1\mid Y)=0.750$; segment #54, $E(\bar{p}_1\mid Y)=0.260$; segment #274, $E(\bar{p}_1\mid Y)=0.496$; and segment #37, $E(\bar{p}_1\mid Y)=0.510$.]

Fig. 2. Five-year time series of the posterior probabilities $P(s_{t,n}=1\mid Y)$ of the normal-count state $s_{t,n}=1$ for four selected roadway segments ($t = 1,2,3,4,5$).

observed if it were not in the zero-accident state all the time.17 An example of such a roadway segment is given in the top-right plot in Figure 2. For this segment $E(\bar{p}_1\mid Y) = 0.260$ is small.

(3) For roadway segments from the third category $P(s_{t,n}=1\mid Y)$ is neither one nor close to zero for all $t = 1,2,3,4,5$.18 For these segments we cannot determine with high certainty which states they were in during years $t = 1,2,3,4,5$. A roadway segment $n$ belongs to this category if it had no accidents observed over the considered five-year time interval and the accident rates were not large, $\lambda_{t,n} \lesssim 1$ for all $t = 1,2,3,4,5$. In fact, when $\lambda_{t,n} \ll 1$, the posterior probabilities of the two states are close to one-half, $P(s_{t,n}=1\mid Y) \approx P(s_{t,n}=0\mid Y) \approx 0.5$, and no inference about the value of the state variable $s_{t,n}$ can be made. In this case of small accident frequencies, the observation of zero accidents is perfectly consistent with both states, $s_{t,n}=0$ and $s_{t,n}=1$. An example

17 Note that the zero-accident state may exist due to under-reporting of minor, low-severity accidents (Shankar et al., 1997).

18 If there were no Markov switching, which introduces time-dependence of states via Eqs. (1), then, assuming non-informative priors $\pi(s_{t,n}=0) = \pi(s_{t,n}=1) = 1/2$ for states $s_{t,n}$, the posterior probabilities $P(s_{t,n}=1\mid Y)$ would be either exactly equal to 1 (when $A_{t,n} > 0$) or necessarily below 1/2 (when $A_{t,n} = 0$). In other words, we would have $P(s_{t,n}=1\mid Y) \notin [0.5, 1)$ for any $t$ and $n$. Even with Markov switching present, in this study we have never found any $P(s_{t,n}=1\mid Y)$ close to, but not equal to, 1 (see the top plot in Figure 3).
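The reasoning in footnote 18, and the limiting cases described in categories (2) and (3), can be checked with a short calculation for a single segment-year without Markov switching. The sketch assumes, as in footnote 18, a prior of 1/2 on each state, and takes the normal-count state to be negative binomial with mean λ and over-dispersion α, so that the probability of zero accidents is (1 + αλ)^(−1/α); the function name is hypothetical.

```python
def prob_normal_state(count, lam, alpha, prior=0.5):
    """P(s = 1 | A = count) for one segment-year with no Markov switching.

    State s = 0 can only produce zero accidents; state s = 1 produces NB counts
    with mean lam and over-dispersion alpha (an assumption of this sketch).
    """
    if count > 0:
        return 1.0  # a positive count rules out the zero-accident state
    p_zero = (1.0 + alpha * lam) ** (-1.0 / alpha)  # NB probability of A = 0
    return prior * p_zero / (prior * p_zero + (1.0 - prior))

# alpha = 0.114 is the MSNB posterior mean from Table 1.
for lam in (0.01, 1.0, 10.0):
    print(lam, round(prob_normal_state(0, lam, alpha=0.114), 4))
```

For a zero count the result is p₀/(p₀ + 1) with p₀ < 1, which is always below 1/2, approaches 1/2 as λ → 0, and approaches 0 for large λ.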


- Available from Fred L. Mannering · Jul 25, 2014
- Available from arxiv.org