Zero-state Markov switching count-data models: An empirical assessment

School of Civil Engineering, 550 Stadium Mall Drive, Purdue University, West Lafayette, IN 47907, USA.
Accident Analysis and Prevention (Impact Factor: 1.65). 01/2010; 42(1):122-130. DOI: 10.1016/j.aap.2009.07.012
Source: PubMed

ABSTRACT: In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: a zero-accident count state, in which accident probabilities are so low that they cannot be statistically distinguished from zero, and a normal-count state, in which counts are non-negative integers generated by some counting process, for example, a Poisson or negative binomial process. While zero-inflated models have come under some criticism with regard to accident-frequency applications, one fact is undeniable: in many applications they provide a statistically superior fit to the data. The Markov switching approach we propose seeks to overcome some of the criticism associated with the zero-accident state of the zero-inflated model by allowing individual roadway segments to switch between zero and normal-count states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero-accident or normal-count state), whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.
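
As a rough illustration of the two model families contrasted in the abstract, the Python sketch below evaluates a zero-inflated negative binomial probability and runs a simple forward filter for a two-state Markov switching count series on one roadway segment. It is a minimal sketch under stated assumptions, not the authors' estimation procedure: the abstract reports Bayesian inference, whereas here the parameters are simply fixed by hand. The function names, the constant mean, the overdispersion parameter alpha, the transition probabilities p01 and p10, and the use of the stationary distribution for the first period are all assumptions introduced for illustration.

```python
import math


def nb_pmf(y, mean, alpha):
    """Negative binomial pmf with variance mean + alpha * mean**2 (assumed parameterization)."""
    r = 1.0 / alpha
    p = r / (r + mean)
    log_pmf = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)


def zinb_pmf(y, mean, alpha, q_zero):
    """Zero-inflated NB: each observation is in the zero state with probability q_zero."""
    base = (1.0 - q_zero) * nb_pmf(y, mean, alpha)
    return q_zero + base if y == 0 else base


def filter_zero_state(counts, mean, alpha, p01, p10):
    """Forward filter for a two-state Markov switching count series.

    State 0 is the zero-accident state (count is 0 with probability 1);
    state 1 is the normal-count state (negative binomial counts).
    p01 = P(next state is 1 | current state is 0)
    p10 = P(next state is 0 | current state is 1)
    Returns (likelihood, [P(state_t = 0 | counts up to t)]).
    """
    prior0 = p10 / (p01 + p10)  # assumed: start from the stationary distribution
    likelihood = 1.0
    probs = []
    for y in counts:
        joint0 = prior0 * (1.0 if y == 0 else 0.0)
        joint1 = (1.0 - prior0) * nb_pmf(y, mean, alpha)
        step = joint0 + joint1          # P(y_t | earlier counts)
        likelihood *= step
        post0 = joint0 / step           # filtered zero-state probability
        probs.append(post0)
        # Propagate the state probability one period ahead.
        prior0 = post0 * (1.0 - p01) + (1.0 - post0) * p10
    return likelihood, probs


if __name__ == "__main__":
    # Hypothetical five-year count series for one segment.
    counts = [0, 0, 2, 0, 1]
    lik, probs = filter_zero_state(counts, mean=1.5, alpha=0.5, p01=0.2, p10=0.3)
    print("likelihood:", lik)
    print("filtered zero-state probabilities:", probs)
```

In this sketch, setting p01 = 1 - q and p10 = q makes the next state independent of the current one, and the filter's likelihood then equals the product of zinb_pmf values with q_zero = q. The filtered probabilities give a per-period estimate of whether the segment is in the zero-accident state, which is the kind of state inference the abstract attributes to the Markov switching approach.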

  • ABSTRACT: Time-constant assumptions in discrete-response heterogeneity models can often be violated. To address this, a time-varying heterogeneity approach to model unobserved heterogeneity in ordered response data is considered. A Markov switching random parameters structure (which accounts for heterogeneity across observations) is proposed to accommodate both time-varying and time-constant (cross-sectional) unobserved heterogeneity in an ordered discrete-response probability model. A data augmented Markov Chain Monte Carlo algorithm for non-linear model estimation is developed to facilitate model estimation. The performance of the cross-sectional heterogeneity model and the time-varying heterogeneity model is examined with vehicle crash-injury severity data. The time-varying heterogeneity model (Markov switching random parameters ordered probit) is found to provide the best overall model fit. Two roadway safety states are shown to exist and roadway segments transition between these two states according to Markov transition probabilities. The results demonstrate considerable promise for Markov switching models in a wide variety of applications.
    Transportation Research Part B Methodological 09/2014; 67:109–128. DOI:10.1016/j.trb.2014.04.007 · 2.94 Impact Factor
  • ABSTRACT: In crash frequency studies, correlated multivariate data are often obtained for each roadway entity longitudinally. Multivariate models are a potentially useful method for analysis, since they can account for the correlation among the specific crash types. However, one issue that arises with such correlated multivariate data is that the number of zero counts increases when crash counts are divided into many categories. This paper describes a multivariate zero-inflated Poisson (MZIP) regression model as an alternative methodology for modeling multivariate crash count data by severity. The Bayesian method is employed to estimate the model parameters. Using this Bayesian MZIP model, we can take into account correlations that exist among different severity levels. Our new method can also cope with excess zeros in the data, which is a common phenomenon found in practice. The proposed model is applied to the multivariate crash counts obtained from intersections in Tennessee for five years. The results reveal that, compared to the univariate ZIP models and multivariate Poisson-lognormal (MVPLN) models, the MZIP models provide the best statistical fit and have the smallest estimation bias. Apart from the improvement in goodness of fit, the results of the MZIP models show promise toward the goal of obtaining more accurate estimates by accounting for excess zeros in correlated count data.
    Safety Science 12/2014; 70:63–69. DOI:10.1016/j.ssci.2014.05.006 · 1.67 Impact Factor
  • ABSTRACT: The analysis of highway-crash data has long been used as a basis for influencing highway and vehicle designs, as well as directing and implementing a wide variety of regulatory policies aimed at improving safety. Over time, there has been a steady improvement in statistical methodologies that have enabled safety researchers to extract more information from crash databases to guide a wide array of safety design and policy improvements. In spite of the progress made over the years, important methodological barriers remain in the statistical analysis of crash data and this, along with the availability of many new data sources, presents safety researchers with formidable future challenges, but also exciting future opportunities. This paper provides guidance in defining these challenges and opportunities by first reviewing the evolution of methodological applications and available data in highway-accident research. Based on this review, fruitful directions for future methodological developments are identified and the role that new data sources will play in defining these directions is discussed. It is shown that new methodologies that address complex issues relating to unobserved heterogeneity, endogeneity, risk compensation, spatial and temporal correlations, and more, have the potential to significantly expand our understanding of the many factors that affect the likelihood and severity (in terms of personal injury) of highway crashes. This in turn can lead to more effective safety countermeasures that can substantially reduce highway-related injuries and fatalities.
    01/2013; DOI:10.1016/j.amar.2013.09.001
