Application of the dynamic spatial ordered probit model:
ABSTRACT The evolution of land development in urban area has been of great interest to policy-makers and planners. Due to the complexity of the land development process, no existing studies are considered sophisticated enough. This research uses the dynamic spatial ordered probit (DSOP) model to analyse Austin's land use intensity patterns over a 4-point panel. The observational units are 300 m × 300 m grid cells derived from satellite images. The sample contains 2,771 such grid cells, spread among 57 zip code regions. The marginal effects of control variables suggest that increases in travel times to central business district (CBD) substantially reduce land development intensity. More important, temporal and spatial autocorrelation effects are significantly positive, showing the superiority of the DSOP model. The derived parameters are used to predict future land development patterns, along with associated uncertainty in each grid cell's prediction. Copyright (c) 2009 the author(s). Journal compilation (c) 2009 RSAI.
APPLICATION OF THE DYNAMIC SPATIAL ORDERED PROBIT MODEL:
PATTERNS OF OZONE CONCENTRATION IN AUSTIN, TEXAS
Xiaokun (Cara) Wang
Department of Civil and Environmental Engineering
Lewisburg, PA 17837, USA
Kara M. Kockelman
Associate Professor & William J. Murray Jr. Fellow
Department of Civil, Architectural and Environmental Engineering
The University of Texas at Austin
6.9 ECJ, Austin, TX 78712-1076
To be presented at the 88th Annual Meeting of the Transportation Research Board and under
review for publication in Transportation Research Record
While a wide variety of transportation data sets involve discrete values scattered across space
and time, few techniques presently exist to properly analyze such data. A new dynamic spatial
ordered probit model (DSOP) is described here, and its use is demonstrated for a case of ozone
concentration categories. Using outputs of photochemical models for the Austin, Texas region
over a 24-hour period, the model parameters were estimated using Bayesian techniques, and
results illuminate key relationships, many of which are intuitive but generally obscured by
complex upstream model systems. Relying on 132 4 km x 4 km surface grid cells as
observational units, values are found to exhibit strong patterns of temporal autocorrelation, but
appear strikingly random in a spatial context (after controlling for local land cover, transportation,
and temperature conditions). While transportation and land cover conditions appear to influence
ozone levels, their effects are not as instantaneous, nor as practically significant as the impact of
temperature. The DSOP model proposed here is able to accommodate the unusual dynamics and
spatial evolution of ordered response categories inherent in the ozone data.
KEY WORDS: spatial autocorrelation, dynamic model, ordered probit, Bayesian estimation,
In the study of urban systems, many variables of interest are discrete and ordered in nature.
Many also exhibit temporal and spatial dependencies. For example, pavement surface
deterioration levels, air pollutant concentration classes, and standard-of-living indices are often
described using ordered categories. Such variables also are influenced by various site-specific
factors subject to spatial and temporal autocorrelation (across observations in space and time).
To understand such phenomena and quantify the effects of influential factors, rigorous statistical
methods are needed.
Over the years, various studies have attempted to recognize spatial and temporal
autocorrelations in data analysis. For tackling spatial autocorrelation, two major methods are
spatial filtering (e.g., Nelson and Hellerstein , Wear and Bolstad , and Munroe et al.
) and specification of a spatial autoregressive (SAR) process (e.g., Anselin , Anselin
and Bera , and Anselin ). For recognizing temporal autocorrelation, time series
analysis is widely accepted as a reliable approach. However, few have considered the effects of
such autocorrelations in discrete response data analysis. The limited set of published studies in
this area focus on binary choice settings, and none recognizes both temporal and spatial
autocorrelation.
For these reasons, the objective of this study is to illustrate the specification and applicability of
the dynamic spatial ordered probit (DSOP) model, a new and powerful approach to spatial data
analysis with temporal autocorrelation, as illustrated here using data on ozone concentration
levels. The following section motivates this topic, for the case of air quality.
The Importance of Ozone
As a gas in the stratosphere that protects Earth from harmful ultraviolet rays, the ozone layer
shields living things. In the troposphere, however, ozone is a powerful oxidizer, harming lung
tissue and other materials. Under the National Ambient Air Quality Standards (NAAQS), all
Metropolitan Statistical Areas (MSAs) in the United States are required to develop strategies for
attaining the standards and accommodating future growth. Thus, planners and policy makers must
understand the spatial distribution of air pollutants, like ozone. Currently, most studies on ozone
concentration projection are based on the modeling of photochemical processes. Though such an
approach is more insightful than statistical modeling, it is not very convenient for
sensitivity analysis, and is not very flexible for adding new variables of interest. In contrast, a
rigorous statistical model can be expected to make it easier to understand different factors’
impacts on ozone concentration.
Ozone concentration is usually expressed as a continuous value. For example, the California one-
hour ozone standard is set at 0.09 parts per million (ppm) and the eight-hour average ozone
standard is 0.070 ppm (BAAQMD, 2005). The U.S. standard was recently reduced to 0.075 ppm
(EPA, 2008), and many regions around the U.S. are very anxious to avoid non-attainment status.
Continuous variables are often made categorical, in order to convey key information more
directly to policy makers and the public. This is common in the case of air quality forecast
reports for public consumption, which are often indexed as low, moderate and potentially
dangerous concentrations. (See, for example, Athanasiadis et al., 2007.)
Of course, many factors can and do influence ozone concentration levels through complex
chemical and physical processes. For example, Niemeier et al. (2006) found that for most regions
in the Northern Hemisphere, road traffic intensity is closely associated with local ozone
concentrations. They surmised that, if traffic-related emissions per capita in south Asia hit U.S.
levels, that continent’s surface ozone concentrations would increase by 50 to 100%. Wang et al.
(2005) concluded that transportation sources are the main contributor to ozone concentrations,
averaging roughly twice the effect of industrial emissions. Friedman et al. (2001) studied
changes in commuting behaviors during the 1996 Summer Olympic Games in Atlanta and noted
how decreased traffic densities were associated with a prolonged reduction in ozone pollution.
Land coverage development and intensity are also important determinants. And, of course, even
if the land is not developed for human use, its features need to be classified for calculation of
biogenic emissions. These are naturally occurring emissions from vegetation, which can be a
strong function of tree type. For example, live oak trees are high emitters of isoprene, a highly
reactive, volatile organic compound (VOC) that is a precursor to ozone. In areas such as eastern
Texas, where this species is common, biogenic emissions of VOCs dominate the area’s
emissions inventory (Wiedinmyer, 1999). Another reason for requiring such land coverage
information is the calculation of dry deposition rates. Dry deposition refers to the accumulation
of particles and gases as they come into contact with soil, water or vegetation on the earth's
surfaces. Allen (2002) suggests that during ozone season in Texas, dry deposition is the most
important physical removal mechanism for air pollutants. Dry deposition rates for specific
pollutants are typically computed according to land cover type. McDonald-Buller et al. (2001)
investigated the sensitivity of dry deposition and ozone mixing ratios as a function of land cover
classification and noted the importance of establishing accurate, internally consistent land cover
data for air quality modeling. Thus, changes to both developed and undeveloped land cover type
can significantly alter the magnitude and spatial distribution of ozone.
Of course, many other factors also play a role. For example, Guldmann and Kim (2001) suggest
that, in addition to land development and transportation characteristics, pollution measurements,
meteorological factors and socioeconomic data can and do influence ozone concentrations. Loibl
et al. (1994) show how relative altitude and time of day are influential. Pont and Fontan (2001)
suggest that though local reduction in traffic is important, advection (see footnote 1) of ozone is also critical to
Obviously, ignoring any of these relevant factors introduces uncertainty in model estimation and
prediction. Such variables, if unobserved, can generate both temporal and spatial autocorrelations
in model error terms. For example, meteorological factors (such as local wind speeds, rainfall,
relative humidity, and temperature), precursors of ozone, and pollution control policies all
exhibit positive temporal and spatial dependencies (see, for example, Lin, 2007, and Hancock,
1994). Therefore, it is reasonable to incorporate temporally and spatially lagged terms and
neighborhood effects in the model specification.
In summary, ozone concentration levels are related to numerous factors. Among them,
transportation conditions and land use/land cover information appear critical for urbanized
regions. A statistically rigorous analysis of ozone concentration categories can be achieved via
application of an ordered discrete choice model with a temporally lagged term and spatial
autocorrelation in error terms. The following sections describe key features of such a model, and
its application to the case of data from Austin, Texas.
1 Advection refers to the transport of something from one region to another. Ozone’s advection is predominantly
horizontal, following weather system patterns (Noguchi et al., 2006).
MODEL SPECIFICATION AND ESTIMATION
Wang (2007) and Wang and Kockelman (2008a) explain the dynamic spatial ordered probit (DSOP)
model’s specification and estimation process in detail; and Wang and Kockelman (2008b) use
the DSOP model to analyze land development intensity levels over time (for purposes of
anticipating land use change). This section simply summarizes the specification, to show how the
model incorporates spatial, temporal and discrete features of the dataset.
The model starts with specification of the latent (unobserved) response variable $U_{it}$, where the
subscripts indicate individual i (i = 1, …, N) in period t (t = 1, 2, …, T). Each individual is observed
T times, making the total number of observations NT. Each latent variable $U_{it}$ is a function of
the unobserved variable from the previous period, $U_{it-1}$, and other explanatory variables $X_{it}$.
Therefore, the specification is as follows:

$$U_{it} = \lambda U_{it-1} + X_{it}'\beta + \theta_i + \varepsilon_{it}$$

where $\lambda$ is the temporal autocorrelation coefficient and $X_{it}$ is a $Q \times 1$ vector of explanatory
variables. $\beta$ is the set of corresponding parameters. The remaining (uncontrolled/latent)
information is composed of two parts: $\theta_i$, which captures the individual-specific random
components for individual i, and the time-variant individual effect $\varepsilon_{it}$, which is allowed to be
heteroscedastic with variance $\upsilon_i$.
Furthermore, $\theta_i$ values can exhibit spatial autocorrelation, so that

$$\theta_i = \rho \sum_{j=1}^{N} w_{ij}\theta_j + u_i$$

where $w_{ij}$ is an exogenous indicator of contiguity (1 for contiguous and 0 otherwise), $\rho$ is
the spatial coefficient, and $u_i$ (which is iid normal with zero mean and variance $\sigma^2$) stands for the
part of the individual-specific effect that is not influenced by its neighbors. The vector of regional
effects is thus a function of the weight matrix $W$, with $w_{ij}$ as its elements.
The observed response variable, $y_{it}$, is a censored form of the unobserved response variable:

$$y_{it} = s \quad \text{if } \gamma_{s-1} < U_{it} < \gamma_s, \quad \text{for } s = 1, \ldots, S$$

That is, the possible outcomes have integer values between 1 and S, which are
determined by the value of the latent variable $U_{it}$ and the unknown boundaries $\gamma_s$
(with $\gamma_0 = -\infty$ and $\gamma_S = +\infty$).
In the ordered probit setting, the likelihood function can be easily derived as follows:

$$L(y \mid U, \gamma) = \prod_{i=1}^{N} \prod_{t=1}^{T} \sum_{s=1}^{S} \delta(y_{it} = s) \cdot \delta(\gamma_{s-1} < U_{it} < \gamma_s)$$

where $\delta(A)$ is an indicator function that equals 1 when event A is true and 0 otherwise.
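The data-generating process described in this section can be illustrated with a short simulation. This is a minimal sketch, not the authors' estimation code: the function name `simulate_dsop`, its arguments, the zero starting value for the latent variable, and the fixed-point solution of the spatial equation are all assumptions made for illustration, and a single variance `upsilon` is used for every individual.

```python
import random

def simulate_dsop(X, W, beta, lam, rho, sigma2, upsilon, cuts, seed=0, n_iter=200):
    """Simulate latent values U[i][t] and ordered responses y[i][t].

    X[i][t] is a list of Q covariates, W is an N x N 0/1 contiguity matrix,
    and cuts holds the S-1 finite thresholds gamma_1 < ... < gamma_{S-1}.
    """
    rng = random.Random(seed)
    n, t_max = len(X), len(X[0])

    # Individual effects: theta_i = rho * sum_j w_ij * theta_j + u_i.
    # Solved here by fixed-point iteration, which converges when |rho| times
    # the largest row sum of W is below 1 (an assumption of this sketch).
    u = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    theta = u[:]
    for _ in range(n_iter):
        theta = [rho * sum(W[i][j] * theta[j] for j in range(n)) + u[i]
                 for i in range(n)]

    U = [[0.0] * t_max for _ in range(n)]
    y = [[0] * t_max for _ in range(n)]
    for i in range(n):
        prev = 0.0  # assumption: the latent value before period 1 is zero
        for t in range(t_max):
            xb = sum(x * b for x, b in zip(X[i][t], beta))
            eps = rng.gauss(0.0, upsilon ** 0.5)
            U[i][t] = lam * prev + xb + theta[i] + eps
            # Censoring: y = s when gamma_{s-1} < U < gamma_s.
            y[i][t] = 1 + sum(U[i][t] > g for g in cuts)
            prev = U[i][t]
    return U, y, theta

# Toy example: 3 cells in a row, 4 periods, one covariate, S = 3 categories.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[[1.0] for _ in range(4)] for _ in range(3)]
U, y, theta = simulate_dsop(X, W, beta=[0.5], lam=0.6, rho=0.2,
                            sigma2=0.1, upsilon=0.2, cuts=[0.8, 1.4])
```

Setting both variances to zero makes the recursion deterministic, which is a convenient way to check the temporal lag and the censoring rule by hand.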
As explained in Wang (2007) and Wang and Kockelman (2008a), Bayesian MCMC methods are
used to estimate all unknown parameters, providing valuable distribution information for all
estimators (rather than simply means and standard deviations, as in the case of classical
This model recognizes regional effects, spatial heterogeneity, spatial autocorrelation, and
temporal autocorrelation in a latent setting with ordered categorical responses. The general
framework can reduce to several simpler specifications, for cases of special interest − such as
when the dataset exhibits no temporal autocorrelation (i.e., individuals’ current responses do not
rely on prior states) and responses are homoskedastic.
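One such reduction can be written out as a sketch. It assumes the general latent equation has the form $U_{it} = \lambda U_{it-1} + X_{it}'\beta + \theta_i + \varepsilon_{it}$ with $\varepsilon_{it} \sim N(0, \upsilon_i)$, which is a reading of the specification summarized in this section and not a quotation of it. Setting $\lambda = 0$ and $\upsilon_i = \upsilon$ gives

$$U_{it} = X_{it}'\beta + \theta_i + \varepsilon_{it}, \qquad \varepsilon_{it} \sim N(0, \upsilon)$$

which retains the spatially correlated individual effects $\theta_i$ but has no dynamics. If the individual effects are dropped as well, what remains is the standard ordered probit, with category probabilities

$$\Pr(y_{it} = s) = \Phi\!\left(\frac{\gamma_s - X_{it}'\beta}{\sqrt{\upsilon}}\right) - \Phi\!\left(\frac{\gamma_{s-1} - X_{it}'\beta}{\sqrt{\upsilon}}\right)$$

where $\Phi$ denotes the standard normal cumulative distribution function.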
Due to a somewhat limited sample size and no obvious arguments for heteroskedastic tendencies
across cell ozone levels, a single variance is used (υi = υ).
Ozone concentration levels were derived from continuous values originally prepared for an EPA
project, and provided by Dr. Elena McDonald-Buller at the University of Texas at Austin
(CAPCO et al., 2004). Using ENVIRON’s CAMx photochemical model, many emissions
inventories and a variety of behavioral assumptions, the researchers developed hourly ozone
concentration estimates for a high-ozone episode, using meteorological data for the
September 13-20, 1999 period.
In the Capital Area Planning Council (CAPCO) study, there are three levels of spatial resolution,
and the finest is 4 km. The domain at this resolution covers 360 km x 432 km (i.e., 90 x 108 grid
cells) and includes all major urban centers within southern Texas and the Texas Gulf Coast. In
this study, hourly data for just one day (September 13, 1999) were selected, and available
transportation and land cover information (derived by Wang 2007) limited the scope to the
Austin region, a 44 km x 48 km study area containing 132 4 km x 4 km grid cells. Thus, the
resulting dataset is a 132 (N) × 24 (T) panel with values indicating ozone concentrations in parts
per million (ppm).
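The grid dimensions above, and the kind of contiguity matrix the model needs for these cells, can be sketched as follows. The paper does not state the contiguity rule or the cell ordering, so rook contiguity (shared edges only), row-major numbering, and the 11-row by 12-column orientation are assumptions of this example, as are the function names.

```python
# Sketch: grid bookkeeping and a 0/1 contiguity matrix for the study area.
# Assumptions (not from the paper): rook contiguity, row-major cell
# numbering, and an 11-row by 12-column layout of the 44 km x 48 km area.

CELL_KM = 4
HOURS = 24

def grid_shape(height_km, width_km, cell_km=CELL_KM):
    """Number of rows and columns of square cells covering a rectangle."""
    return height_km // cell_km, width_km // cell_km

def rook_contiguity(n_rows, n_cols):
    """Return an N x N matrix with w_ij = 1 when cells i and j share an edge."""
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1
    return w

rows, cols = grid_shape(44, 48)   # Austin study area
n_cells = rows * cols             # N
n_obs = n_cells * HOURS           # N x T panel entries
W = rook_contiguity(rows, cols)
```

A queen rule (shared edges or corners) would only change the list of neighbor offsets.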
The rule for defining ozone concentration levels should be flexible and adaptable to the user’s
needs. In addition, every category needs to contain enough observations so that each is well
represented. Here, the values (in ppm) were categorized into 5 groups: values below 0.035 are assigned
Level 1, values between 0.035 and 0.04 are Level 2, those between 0.04 and 0.045 are Level 3,
those between 0.045 and 0.05 are Level 4, and those above 0.05 are categorized as Level 5 (see footnote 2).
Figure 1 illustrates the continuous ozone concentration values and their corresponding levels
using data between 4 and 5pm on September 13, 1999 as an example. Table 1 shows the
changing trend of ozone concentration levels during the 24 hours: the levels are higher during
daytime, especially in the afternoon, and lowest at night and in the early morning.
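The categorization rule can be expressed compactly in code. The sketch below uses the four boundaries given in the text; the function names are hypothetical, and the treatment of a value falling exactly on a boundary (assigned to the higher level here) is an assumption, since the text does not say how ties are handled.

```python
from bisect import bisect_right

# Category boundaries in ppm, as listed in the text.
THRESHOLDS_PPM = [0.035, 0.040, 0.045, 0.050]

def ozone_level(ppm):
    """Map an ozone concentration in ppm to an ordered level from 1 to 5."""
    # Assumption: a value exactly on a boundary goes to the higher level.
    return bisect_right(THRESHOLDS_PPM, ppm) + 1

def level_counts(values):
    """Count observations per level, to check each category is well represented."""
    counts = {level: 0 for level in range(1, 6)}
    for v in values:
        counts[ozone_level(v)] += 1
    return counts
```

Because the boundaries live in one list, a user with different needs can change the number of categories or the cut points without touching the mapping logic.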
Temperature information for Austin’s neighborhoods comes from the same EPA project datasets,
provided by Dr. McDonald-Buller. Table 2 illustrates the distribution and changes in
temperatures over the 132 cells and 24 hours.
As noted earlier, local traffic and land use/land cover conditions may influence local ozone
concentrations. Ideally, traffic counts and VMT by hour by cell would be available for use. Such
variables were not readily available (by time of day or for all network links), so the total length of
street centerlines (per grid cell) was used as a proxy for local VMT levels.
Land cover type influences ozone concentration because it contributes to both ozone generation
(biogenic or anthropogenic) and deposition. Residential, commercial, transportation and
industrial land (i.e., “developed” lands) may be categorized together, since they mainly
2 While the current non-attainment threshold is 0.08 ppm, the sample data do not contain such high concentrations.