
Giancarlos Troncoso Parady, Kiyoshi Takami, Noboru Harata 1

THE BUILT ENVIRONMENT-TRAVEL BEHAVIOR CONNECTION: A PROPENSITY SCORE APPROACH UNDER A CONTINUOUS TREATMENT REGIME

Giancarlos Troncoso Parady

(Corresponding Author)

Ph.D. Candidate

Department of Urban Engineering, Graduate School of Engineering

The University of Tokyo

7-3-1 Hongo, Bunkyo-Ku, Tokyo, Japan 113-8656

gtroncoso@ut.t.u-tokyo.ac.jp

+81-3-5841-6234

Kiyoshi Takami, Ph.D.

Assistant Professor

Department of Urban Engineering, Graduate School of Engineering

The University of Tokyo

7-3-1 Hongo, Bunkyo-Ku, Tokyo, Japan 113-8656

takami@ut.t.u-tokyo.ac.jp

Department of Urban Engineering, School of Engineering

+81-3-5841-6234

Noboru Harata, Ph.D.

Professor

Department of Urban Engineering, Graduate School of Engineering

The University of Tokyo

7-3-1 Hongo, Bunkyo-Ku, Tokyo, Japan 113-8656

nhara@ut.t.u-tokyo.ac.jp

+81-3-5841-6233

Word count: 5243 (Without counting tables and figures)

Number of tables: 3

Number of figures: 3


ABSTRACT

In recent years, the compact city concept has become a paradigm of sustainable urban development under the premise that mixed-use, high-density cities can significantly reduce automobile dependency and promote the use of alternative modes. This claim, however, hinges on the existence of a true causal mechanism between the built environment and travel behavior. This study tackles the causality problem using a propensity score approach, but differs from previous studies in that it relaxes the binary treatment assumption (i.e. urban vs. suburban) and assumes a continuous treatment of urbanization level, estimated as a latent variable. Methodologically, the propensity score stratification method used is successful in mitigating residential self-selection bias in estimates of the effect of the built environment on non-work trip frequency and travel distance. Relative to the estimates stratified on the propensity score, the direct regression estimates are overestimated by 6% to 36%. Findings suggest the existence of a causal mode substitution mechanism from car to non-motorized modes given positive increases in the latent score of urbanization level.


1. INTRODUCTION

The connection between the built environment and travel behavior has been the object of a considerable number of studies in the past twenty years. As concepts such as Smart Growth, Compact Cities and New Urbanism permeate the sustainability discourse, the validity of the argument that high-density, compact and mixed-use cities might reduce car use and promote the use of alternative modes hinges on the existence of a true causal mechanism between the built environment and travel behavior. Of particular importance to establishing this causal mechanism is the issue of residential self-selection, whereby individuals choose their residential location in part to meet their transport preferences (1); failure to control for self-selection might therefore result in biased and inconsistent estimators of the true effect of interest.

This study uses a propensity score approach to mitigate self-selection bias, yet differs from

previous studies in that it generalizes the binary treatment regime of urban vs. suburban neighborhoods.

Instead, a continuous treatment regime of urbanization level is assumed, thus accounting for more

variability in the built environment characteristics.

The rest of the paper is structured as follows. Section 2 discusses the residential self-selection problem in the planning literature from a program evaluation perspective. Section 3 summarizes the general characteristics of the study, while Section 4 details the properties of the propensity score and elaborates on the propensity score generalization used in this study. Sections 5 and 6 present the estimation results for the latent variable for urbanization level and the estimated treatment effects, respectively, followed by a discussion of findings in Section 7. Finally, Section 8 presents the general conclusions of the analysis.

2. THE RESIDENTIAL SELF-SELECTION PROBLEM FROM A PROGRAM EVALUATION PERSPECTIVE

The establishment of a causal relation between the built environment and travel behavior has been widely debated among researchers. Although a large number of studies have found statistically significant associations between built environment features and some dimensions of travel behavior, establishing a causal relationship hinges on stronger conditions that might be hard to meet outside ideal experimental conditions. In particular, the non-random allocation of the treatment of interest (in this case, built environment characteristics) might compromise the validity of results. In the context of the built environment-travel behavior connection, non-random treatment assignment is most likely the result of households self-selecting into neighborhoods that match their transport preferences.

In the absence of true experimental studies, researchers have attempted to establish causality through either natural experiment studies or cross-sectional studies using econometric methods. (A review of all existing approaches is beyond the scope of this article and has been documented elsewhere in the literature; interested readers are referred to Cao et al. (2) for a detailed review.) The present article focuses on the latter group since, for cost and other implementation reasons, existing data are largely cross-sectional.


In the program evaluation literature several methods have been developed to address the

causality problem in the presence of non-random treatment allocation, out of which two main

approaches are highlighted: sample selection models and propensity score models.

The first step for both approaches is to estimate the conditional probability of receiving treatment given a vector of covariates; in the case of a binary treatment, $P(X) = \Pr(z = 1 \mid X)$, where z is a binary treatment that takes the value 1 when the individual is treated and 0 otherwise, and X is a vector of conditioning covariates. For binary treatments, this probability is usually estimated via a binary probit or logit model.
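As an illustration of this first step, the sketch below fits a binary logit propensity model by Newton-Raphson using NumPy. It is a minimal sketch on simulated data, assuming a single hypothetical covariate with true coefficients of -0.5 (intercept) and 1.2 (slope); the function name `logit_propensity` is hypothetical, and in applied work a standard statistical package would normally be used instead.

```python
import numpy as np


def logit_propensity(X, z, n_iter=25):
    """Estimate P(X) = Pr(z = 1 | X) with a binary logit fitted by Newton-Raphson.

    X: (n, k) covariate matrix without an intercept column (one is added here).
    z: (n,) binary treatment indicator (1 = treated, 0 = untreated).
    Returns the fitted propensity scores and the coefficient vector.
    """
    Xc = np.column_stack([np.ones(len(X)), X])  # prepend an intercept
    beta = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xc @ beta))           # current fitted probabilities
        grad = Xc.T @ (z - p)                          # score of the log-likelihood
        hess = Xc.T @ (Xc * (p * (1.0 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hess, grad)      # Newton-Raphson update
    return 1.0 / (1.0 + np.exp(-Xc @ beta)), beta


# Hypothetical simulated data: one covariate that raises the odds of treatment.
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 1))
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x[:, 0])))
z = (rng.uniform(size=2000) < true_p).astype(float)

pscore, beta_hat = logit_propensity(x, z)
print(beta_hat)  # should be close to the simulated values (-0.5, 1.2)
```

The fitted scores `pscore` are the P(X) values on which the matching and stratification methods discussed next operate.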

In sample selection models (Heckman’s sample selection), a selectivity correction term (the inverse Mills ratio) is computed from the estimated treatment equation and introduced into the regression of the outcome variable of interest on all covariates to correct for selectivity bias (3). Using this approach, Zhou and Kockelman (4) estimated household vehicle miles traveled (VMT) given a binary treatment of urban vs. suburban residential location and found that 90% of the difference in VMT was attributable to the treatment itself, with self-selection accounting for only 10%. A similar study by Cao (5) estimated that self-selection accounted for 19% of the observed VMT.
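The correction term in Heckman's two-step procedure can be sketched as follows, assuming a probit first stage and the usual treatment-effects formulation in which treated and untreated units receive different correction terms. The helper names are hypothetical, and only the Python standard library is used.

```python
import math


def std_normal_pdf(v):
    """Standard normal density phi(v)."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(v):
    """Standard normal distribution function Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def selection_correction(index, treated):
    """Selectivity correction term from a probit treatment equation (hypothetical helper).

    index: fitted probit index (X * gamma) for one individual.
    treated: True if the individual received the treatment (z = 1).
    The returned value enters the outcome regression as an extra regressor.
    """
    pdf, cdf = std_normal_pdf(index), std_normal_cdf(index)
    if treated:
        return pdf / cdf           # inverse Mills ratio for z = 1
    return -pdf / (1.0 - cdf)      # corresponding term for z = 0


lam_treated = selection_correction(0.0, True)
lam_untreated = selection_correction(0.0, False)
print(lam_treated, lam_untreated)
```

The coefficient estimated on this term in the outcome regression captures the covariance between the unobservables of the treatment and outcome equations, which is how the procedure separates self-selection from the treatment effect.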

Like sample selection models, propensity score models start from the estimation of treatment probability. Propensity score matching consists of matching treated and untreated units with similar treatment probability scores; differences in outcomes between matched pairs are then averaged to estimate the average treatment effect (ATE) (6). Alternatively, as Rosenbaum and Rubin (6) show, the sample can be stratified by estimated treatment probability and the outcomes of subgroups compared. Using propensity score matching with different binary treatments (i.e. urban vs. suburban, urban vs. exurban, etc.), Cao et al. (7) found a positive association between vehicle miles driven and distance from the city center, with the impact of self-selection ranging from 0.05% to 52%, depending on the treatments considered. Concerning walking behavior, Boer et al. (8) estimated treatment effects for several built environment features and found that higher levels of business diversity and more four-way intersections were on average associated with more walking. Through propensity score stratification, Cao (9) found that residents living in neo-traditional neighborhoods tend to walk more than their suburban counterparts; furthermore, he found that failing to account for self-selection might result in overestimating the effect of the built environment on walking frequency by 64% and 17% for utilitarian and recreational trips respectively.
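The matching estimator described above can be sketched as one-to-one nearest-neighbour matching on the propensity score, with replacement. This is a minimal illustration assuming precomputed scores; every unit is matched to the closest unit in the opposite group so that an ATE (rather than only the effect on the treated) is obtained. The function name and the toy data are hypothetical.

```python
import numpy as np


def nn_match_ate(pscore, z, y):
    """ATE by 1-nearest-neighbour propensity score matching, with replacement.

    Each unit is paired with the closest-scoring unit of the other group, whose
    outcome stands in for the unit's missing potential outcome; the ATE is the
    mean of the resulting unit-level differences.
    """
    pscore, z, y = map(np.asarray, (pscore, z, y))
    treated, control = np.where(z == 1)[0], np.where(z == 0)[0]
    y1 = y.astype(float)   # potential outcome under treatment (observed for treated)
    y0 = y.astype(float)   # potential outcome without treatment (observed for untreated)
    for i in treated:      # impute y0 for treated units from the nearest untreated unit
        j = control[np.argmin(np.abs(pscore[control] - pscore[i]))]
        y0[i] = y[j]
    for i in control:      # impute y1 for untreated units from the nearest treated unit
        j = treated[np.argmin(np.abs(pscore[treated] - pscore[i]))]
        y1[i] = y[j]
    return float(np.mean(y1 - y0))


# Hypothetical toy data: each treated unit has an untreated twin with a similar score.
ps = [0.20, 0.40, 0.60, 0.80, 0.21, 0.41, 0.61, 0.81]
treat = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [5, 6, 7, 8, 3, 4, 5, 6]
ate = nn_match_ate(ps, treat, outcome)
print(ate)
```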

Although the studies discussed so far demonstrate the potential of propensity score methods to reduce selection bias, in most of them the built environment is polarized into a binary treatment (usually urban vs. suburban), in itself a rather strong assumption that ignores the spectrum of variability in how “urban” or how “suburban” a neighborhood might be. In other words, a binary treatment considers all neighborhoods within each treatment class identical in their built environment features, thus making the estimates insensitive to variations in neighborhood composition.

As an aside, multi-valued treatment approaches have been proposed as extensions of the binary treatment approach by Lee (10), in the form of a two-stage multinomial logit-OLS model, and by Imbens (11), whose approach can also be applied to ordinal multi-level treatments. While these approaches are certainly promising improvements, to the best of the authors’ knowledge they have yet to be operationalized in the transportation field.

3. GENERAL CHARACTERISTICS OF THE STUDY

To address the identified gap in the literature, this analysis uses a propensity score approach under a continuous treatment regime. The study therefore consists of two main parts: (i) the estimation of a continuous treatment variable for urbanization level, and (ii) the estimation of its effect on travel behavior. Furthermore, given a continuous treatment, and considering that under the Gauss-Markov assumptions the OLS estimator is the best linear unbiased estimator, the performance of the direct OLS estimates is tested against the propensity score estimates.

Data from an online survey conducted in the city of Hiroshima, Japan, were used for the analysis. The survey was conducted in March 2013 through Rakuten Research, a company affiliated with Rakuten Market, the largest internet shopping site in Japan, with over 2.3 million monitors across the country. The sample consisted of 600 individuals selected through stratified random sampling from the monitor list to match the population distribution of the 8 wards that compose the metropolitan area and the overall age distribution of the city.

Certainly, some issues with web surveys might compromise the external validity of results, particularly coverage error, which stems from the exclusion of people who (i) do not have access to the internet or (ii) lack the digital literacy needed to adequately answer the questionnaire (12). Regarding internet access, the Ministry of Internal Affairs and Communications of Japan (MIC) estimated in its Communications Usage Trend Survey a penetration rate of 79.1% for 2011, with 90% penetration in the 13-49 age cohort and lower rates of 73%, 60% and 42% for the 60-64, 65-69 and 70-79 cohorts respectively (13). Regarding digital literacy, the same survey estimates that 60% of internet users used the internet for online purchases or trade of merchandise, although it noted a gap between users aged 49 and under and users above that threshold. This suggests that, although internet diffusion is rather high, the representativeness of the sample is limited to some extent, especially for the older cohorts. In any case, even if the sample is not perfectly representative of the population of interest, the main contribution of this study is methodological; hence, the present data were considered valid for the analysis in question.

Regarding the data gathered in the survey, as illustrated in Table 1, information was collected on general household characteristics as well as individual characteristics of the respondent, such as car use habit as measured by the response frequency index (14), an indicator variable for job location in the city center, a set of indicator variables for the type of area where the respondent grew up or spent most of their childhood (i.e. large metropolitan area, suburbs of a large metropolitan area, regional city, etc.), and three measures of individual attitudes and preferences estimated via confirmatory factor analysis following Kitamura et al. (15).


General sample characteristics were compared against population characteristics from the 2010 national census to check the representativeness of the sample. Overall, the sample mean values approximate the census mean values on the evaluated criteria; however, women are slightly under-represented in the sample, by 4 percentage points on average, with a larger margin observed for women over 60 years old. Although online surveys are generally expected to be biased towards the young, the average age in the sample is approximately five years higher than the population average for the city of Hiroshima. Sample households are also slightly larger, with a sample average of 2.66 against a population average of 2.29. Finally, the largest difference from the population values is in household income: consistent with findings from the literature, higher-income households tend to be over-represented in web-survey samples (12). Compared against the Private Income Statistical Survey for 2011 (16), income groups under US$4,000/year are underrepresented by 44%, while the remaining income cohorts are overrepresented. Other variables of interest, however, such as the number of vehicles in the household and the ratio of home-owners in the sample, did not exhibit significant differences from the population values.

The socio-demographic composition data of all the districts of Hiroshima city, used to estimate

the continuous urbanization level indicator (see Section 5) were gathered from the 2005 national census,

as the GIS data for the 2010 census was not available at the time of writing. Finally, geo-referenced

land use data was gathered from the TelPOINT pack geo-referenced phonebook, developed by ZENRIN.

The dependent variables considered in this study were non-work trip frequency by car, non-work trip frequency by non-motorized modes, and total travel distance for non-work trips. Respondents were asked to state the number of home-based trips (excluding return trips) taken during an average week by purpose, and the most frequently used mode for that type of trip. Trip frequencies lower than once a week were set to zero. Travel distances were calculated from the stated trip time given average speeds by mode. Although estimation errors are likely given the way travel distances are calculated, when formulating the questionnaire it was assumed that individuals are more likely to report travel time accurately than actual travel distance.
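The distance calculation described above amounts to multiplying stated trip time by an average speed for the mode used. The sketch below assumes hypothetical speeds (the values used in the study are not reported here), one-way trip times, and the rule that frequencies below once a week count as zero; the function name is likewise hypothetical.

```python
# Hypothetical average speeds in km/h; the study's actual values are not given in the text.
AVG_SPEED_KMH = {"walk": 4.0, "bicycle": 12.0, "car": 30.0}


def weekly_distance_km(trips):
    """Total weekly distance from stated trips.

    trips: iterable of (weekly_trip_count, one_way_minutes, mode) tuples.
    Frequencies below once a week are treated as zero, as in the survey coding.
    """
    total = 0.0
    for count, minutes, mode in trips:
        if count < 1:          # less than once a week -> set to zero
            continue
        total += count * (minutes / 60.0) * AVG_SPEED_KMH[mode]
    return total


# Hypothetical respondent: 3 car trips of 20 min, 2 walk trips of 15 min,
# and one car trip made only every other week (dropped by the zero rule).
dist = weekly_distance_km([(3, 20, "car"), (2, 15, "walk"), (0.5, 30, "car")])
print(dist)
```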

4. PROPENSITY SCORE AND CAUSAL INFERENCE

The propensity score, defined as the conditional probability of treatment given observed covariates, was proposed by Rosenbaum and Rubin (6) as a way to remove bias due to observed covariates. Acting as a balancing score in a non-randomized treatment assignment context, the propensity score makes inherently different groups comparable; its main advantage is that a potentially large set of covariates X can be balanced using a single scalar function. Given a binary treatment z, stratifying on the propensity score balances X, so that, conditional on the propensity score $P(X) = \Pr(z = 1 \mid X)$, the distribution of X is the same for the treated and untreated groups; that is, conditional on P(X), X and z are independent:

$$X \perp z \mid P(X). \qquad (1)$$

The main assumption behind the propensity score approach is the strong ignorability of treatment assignment: given equation (1), the potential treatment outcomes $(r_0, r_1)$ are independent of treatment assignment given P(X):

$$(r_0, r_1) \perp z \mid P(X). \qquad (2)$$

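Given equations (1) and (2), the expected difference between the outcomes of treated and untreated units with the same value of P(X) is the average treatment effect (ATE) at that value of P(X):

$$E[r_1 - r_0 \mid P(X)] = E[r_1 \mid z = 1, P(X)] - E[r_0 \mid z = 0, P(X)].$$

Rosenbaum and Rubin (17) further show that subclassification on the propensity score into five strata can remove over 90% of the bias due to observed covariates.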
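Subclassification can be sketched as follows for a binary treatment: units are split into quintiles of the propensity score, treated-minus-untreated mean differences are computed within each quintile, and these differences are averaged with weights proportional to stratum size. The simulated data, the constant true effect of 2, and the function name are assumptions made for illustration; the true propensity score is used to keep the sketch short.

```python
import numpy as np


def subclassification_ate(pscore, z, y, n_strata=5):
    """ATE by subclassification on the propensity score (binary treatment)."""
    pscore, z, y = (np.asarray(a, dtype=float) for a in (pscore, z, y))
    edges = np.quantile(pscore, np.linspace(0.0, 1.0, n_strata + 1))
    # Interior cut points assign every unit to a stratum 0 .. n_strata - 1.
    strata = np.searchsorted(edges[1:-1], pscore, side="right")
    ate = 0.0
    for s in range(n_strata):
        in_s = strata == s
        treated, control = in_s & (z == 1), in_s & (z == 0)
        if not treated.any() or not control.any():
            raise ValueError(f"stratum {s} lacks treated or untreated units")
        diff = y[treated].mean() - y[control].mean()  # within-stratum effect
        ate += diff * in_s.sum() / len(y)             # weight n_j / N
    return ate


# Hypothetical confounded data: x raises both the treatment probability and the outcome.
rng = np.random.default_rng(1)
x = rng.normal(size=5000)
p = 1.0 / (1.0 + np.exp(-1.5 * x))
z = (rng.uniform(size=5000) < p).astype(float)
y = 2.0 * z + 3.0 * x + rng.normal(size=5000)

naive = y[z == 1].mean() - y[z == 0].mean()
stratified = subclassification_ate(p, z, y)
print(naive, stratified)  # the stratified estimate lies much closer to the true value 2
```

Some residual bias remains because units within a quintile still differ slightly in x; finer stratification reduces it further at the cost of fewer units per stratum.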

The analysis presented in this article follows the generalization of the propensity score method proposed by Imai and van Dyk (18), which allows for arbitrary treatment regimes $T^A$. This section draws heavily on Imai and van Dyk to summarize the method used; readers are referred to the original article for a more in-depth explanation.

Following this generalization, under a continuous treatment regime the distribution of the treatment $T^A$ given a vector of covariates X is modeled as $p_\psi(T^A \mid X)$, where the propensity score function is Gaussian, $T^A \mid X \sim N(X^\top\beta, \sigma^2)$, parameterized by $\psi = (\beta, \sigma^2)$ and $\theta = X^\top\beta$; because $\sigma^2$ does not vary with X, the propensity score function is characterized solely by the scalar θ. In practice, θ is estimated through a linear regression of the treatment variable on all covariates X, so that $\hat{\theta} = X^\top\hat{\beta}$ estimates $E(T^A \mid X) = X^\top\beta$; that is, the propensity score is uniquely characterized by the conditional mean function of the regression.

Imai and van Dyk (18) also demonstrated that equations (1) and (2) can be extended to show that, even for non-binary treatments, the propensity score serves as a balancing score:

$$X \perp T^A \mid P(X), \qquad (3)$$

and that the distribution of the outcome given a potential treatment $t^P$, $Y(t^P)$, is independent of treatment assignment given P(X):

$$Y(t^P) \perp T^A \mid P(X), \qquad (4)$$

for any $t^P \in \mathcal{T}$, where $\mathcal{T}$ is the set of potential treatment values. Thus, by averaging over the distribution of P(X), the distribution of the outcome of interest can be obtained:

$$p\{Y(t^P)\} = \int p\{Y(t^P) \mid T^A = t^P, P(X)\}\, p\{P(X)\}\, dP(X). \qquad (5)$$

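This integral can then be approximated parametrically by modeling $p\{Y(t^P) \mid T^A = t^P, \phi\}$ within strata defined on the propensity score θ, where φ parameterizes the distribution. Thus, the distribution of $Y(t^P)$ can be approximated as the weighted average of the within-stratum outcome distributions:

$$p\{Y(t^P)\} \approx \sum_{j=1}^{J} W_j\, p\{Y(t^P) \mid T^A = t^P, \hat{\phi}_j\}, \qquad (6)$$

where $\hat{\phi}_j$ is the within-stratum estimate of the unknown parameter φ in stratum j, and $W_j$ is the relative weight of stratum j. $\hat{\phi}_j = (\hat{\alpha}_j, \hat{\delta}_j, \hat{\gamma}_j)$ can then be estimated from the regression

$$Y_i = \alpha_j + \delta_j T^A_i + X_i^\top\gamma_j + \varepsilon_i, \quad i \in \text{stratum } j, \qquad (7)$$

where the covariates X are included to control for the variability of θ within strata. The average treatment effect is then a function of $\hat{\phi}_j$; in this case, it is the weighted average of the within-stratum treatment coefficients of the regression of the outcome variable $Y(t^P)$ on $t^P$ and all covariates, $\hat{\delta} = \sum_{j=1}^{J} (n_j/N)\,\hat{\delta}_j$, where the weights are given by the relative sample size $n_j/N$.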

Imai and van Dyk (18) verified through simulation and empirical analysis that stratification on the propensity score reduces the bias due to observed covariates by 16-95%, suggesting superior performance over direct, non-stratified estimation of the treatment effect.

Although the estimation might look cumbersome, in practice the procedure is rather simple. First, the propensity score function is estimated through an OLS regression of the treatment variable on all covariates X, where the conditional mean function $E(T^A \mid X) = X^\top\beta$ characterizes the propensity score. Using the score estimate $\hat{\theta} = X^\top\hat{\beta}$, that is, the regression fitted values, the sample is then stratified into strata of approximately equal size, and the outcome variables of interest are regressed on the treatment and the same covariates X within each stratum. The average treatment effect is simply the weighted average of the within-stratum treatment coefficients.
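The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the data are simulated with a hypothetical true treatment effect of -0.3, the treatment and outcome are assumed to be already on the scale used in the regressions, and strata are formed with `np.array_split` over units sorted by the fitted score.

```python
import numpy as np


def ols(y, X):
    """OLS with an intercept; returns coefficients [const, X columns...] and fitted values."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef, Xc @ coef


def stratified_effect(t, X, y, n_strata=5):
    """Weighted within-stratum treatment coefficient for a continuous treatment t."""
    _, theta_hat = ols(t, X)                    # step 1: propensity score = fitted values
    order = np.argsort(theta_hat)
    strata = np.array_split(order, n_strata)    # step 2: roughly equal-sized strata
    effect = 0.0
    for idx in strata:
        # step 3: regress the outcome on the treatment and the same covariates
        coef, _ = ols(y[idx], np.column_stack([t[idx], X[idx]]))
        effect += coef[1] * len(idx) / len(y)   # step 4: coef[1] is the slope on t, weight n_j / N
    return effect


# Hypothetical simulated data with two confounders of the treatment and outcome.
rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 2))
t = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
y = -0.3 * t + X @ np.array([1.0, -0.5]) + rng.normal(size=n)

for j in (3, 5, 10):
    print(j, round(stratified_effect(t, X, y, n_strata=j), 3))
```

Varying `n_strata` mirrors the three, five and ten strata sensitivity check described in Section 6.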

5. URBANIZATION LEVEL AS A CONTINUOUS TREATMENT REGIME

The latent variable for urbanization level was estimated using confirmatory factor analysis (CFA). Five indicators were specified to load on the latent variable urbanization level: population density, average housing area per person, ratio of households living in multifamily residences within the district, ratio of renter households within the district, and density of commercial facilities. Of the 460 districts (Chōchōmoku in Japanese) that constitute the Hiroshima metropolitan area, the effective sample size was 400 districts. Districts with a population or housing area of zero were excluded from the sample, as these areas are uninhabited. Industrial parks in the port area were also excluded from the analysis. The model was estimated with Mplus 6, developed by Muthén & Muthén, using the maximum likelihood estimator, which allows goodness-of-fit indices to be calculated to evaluate the estimated factor solution. Acceptable goodness-of-fit thresholds followed the values recommended by Hu and Bentler (19): root mean square error of approximation RMSEA (≤ 0.06), standardized root mean square residual SRMR (≤ 0.08), comparative fit index CFI (≥ 0.95) and Tucker-Lewis index TLI (≥ 0.95).

Goodness-of-fit indices suggest an acceptable model fit. The chi-square statistic with two degrees of freedom was 1.813, yielding a p-value of 0.40; the lack of significance indicates that the model-implied variance-covariance matrix is not statistically different from the observed matrix, suggesting that the model adequately reproduces the observed variation in the data. RMSEA was 0.000, with a CFit value (probability that RMSEA ≤ 0.05) of 0.684. CFI and TLI were 1.000 and 1.001 respectively, while SRMR was 0.008. No modification index was above the 3.84 threshold (the 5% critical value of a chi-square distribution with one degree of freedom). Completely standardized parameter estimates are illustrated in Figure 1.
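Two of the reported statistics can be checked by hand. For two degrees of freedom the chi-square upper-tail probability reduces to exp(-x/2), and a common form of the RMSEA point estimate is sqrt(max(χ² - df, 0) / (df (N - 1))), which is zero whenever χ² < df. The sketch below applies these formulas with N = 400 districts and compares the reported indices against the Hu and Bentler thresholds; the function names are hypothetical, and whether Mplus uses N or N - 1 is an assumption that does not matter here because the numerator is zero.

```python
import math


def chi2_pvalue_df2(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees of freedom.

    For df = 2 the chi-square survival function reduces to exp(-x / 2).
    """
    return math.exp(-chi2 / 2.0)


def rmsea(chi2, df, n):
    """RMSEA point estimate (one common form, assumed here, using n - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def meets_hu_bentler(rmsea_v, srmr, cfi, tli):
    """Check the Hu and Bentler cut-offs used in the text."""
    return rmsea_v <= 0.06 and srmr <= 0.08 and cfi >= 0.95 and tli >= 0.95


p = chi2_pvalue_df2(1.813)
r = rmsea(1.813, 2, 400)
ok = meets_hu_bentler(r, 0.008, 1.000, 1.001)
print(round(p, 2), r, ok)  # 0.4 0.0 True
```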

All estimated parameters were statistically significant at the 1% level. Factor loadings suggest that all indicators are strongly related to the latent factor urbanization level (with explained variances ranging from 0.52 to 0.85). The factor is positively correlated with all indicators except average housing area per person, which is negatively correlated, illustrating the tradeoff between accessibility and housing area that households face when choosing a residential location.

Figure 2 illustrates the geographical distribution of urbanization in the city. Urbanized areas are mostly concentrated in flat areas, while the hilly areas that surround the city are less urbanized and in most cases sparsely populated. Furthermore, the monocentric nature of the city is evident in the distribution of the highly urbanized areas, which are concentrated near the harbor and gradually disperse away from it.

6. TREATMENT EFFECT ESTIMATION

As explained in Section 4, the propensity score function $\hat{\theta}$ for the continuous treatment variable urbanization level is estimated through an OLS regression. The covariates considered for the estimation of the propensity score are summarized in Table 1. Covariate selection was based on both findings from the literature and theoretical considerations. Furthermore, variables that differed considerably from the population distribution, as discussed in Section 3, were also introduced as covariates. Estimation results are presented in Table 2. The R-squared of the final model was 0.37, suggesting an acceptable model fit.

It is important to note that, as this is a prediction model, the object of interest of this regression is not the individual coefficient of each explanatory variable but the scalar estimate $\hat{\theta}$, which, following the balancing score property described in equation (3), balances all the covariates thought to affect treatment allocation. This warrants including in the final model variables that are theoretically relevant even if multicollinearity renders them statistically insignificant.

To verify the balance of covariates given the estimated propensity score $\hat{\theta}$, as suggested by Imai and van Dyk (18), each covariate was regressed on the original treatment variable; the same regressions were then run a second time, this time conditioning on $\hat{\theta}$. OLS was used for continuous covariates, while a binary logit was used for dummy covariates. As Figure 3 illustrates, without controlling for $\hat{\theta}$ most covariates are strongly correlated with the treatment, but once the propensity score is conditioned on, this correlation is considerably reduced, as evidenced by the drop in the t-statistic for each covariate.
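The balance check can be sketched as follows for continuous covariates (the logit fits used for dummy covariates are omitted). Each covariate is regressed on the treatment, then on the treatment plus the estimated score, and the t-statistic on the treatment is compared. The data and function names are hypothetical. In this simulated sketch, where each covariate also enters the OLS propensity regression, the conditioned t-statistic collapses to essentially zero, because the OLS residual of the treatment is orthogonal to every regressor of that first-stage regression.

```python
import numpy as np


def treatment_tstat(cov, t, extra=None):
    """t-statistic on t in an OLS regression of a covariate on t (and an optional control)."""
    cols = [np.ones(len(t)), t] + ([] if extra is None else [extra])
    Xc = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(Xc, cov, rcond=None)
    resid = cov - Xc @ coef
    n, k = Xc.shape
    sigma2 = resid @ resid / (n - k)                        # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(Xc.T @ Xc)[1, 1])   # standard error of the t coefficient
    return coef[1] / se


# Hypothetical data: two covariates drive a continuous treatment.
rng = np.random.default_rng(3)
n = 1000
X = rng.normal(size=(n, 2))
t = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
design = np.column_stack([np.ones(n), X])
theta_hat = design @ np.linalg.lstsq(design, t, rcond=None)[0]  # OLS propensity score

for k in range(X.shape[1]):
    before = treatment_tstat(X[:, k], t)
    after = treatment_tstat(X[:, k], t, extra=theta_hat)
    print(f"covariate {k}: t = {before:.1f} before, {after:.1f} after conditioning")
```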

Having verified that the estimate $\hat{\theta}$ balances the observed covariates, the average treatment effect of urbanization level on the outcome variables of interest was estimated, the outcomes being non-work car trip frequency, non-work non-motorized trip frequency, and their respective travel distances; all variables were introduced in the models in log form. As Table 3 illustrates, the sample is stratified on $\hat{\theta}$ into roughly equal subclasses j, and the effect estimates are compared not only against the non-stratified estimates but also against the estimates without covariates. The sensitivity of the estimates to stratification is tested by estimating treatment effects with different numbers of strata j; the sample is thus stratified into three, five and ten strata.

In general, all models support the hypothesis that higher urbanization levels have a negative effect on car trip frequency and car travel distances, and a positive effect on non-motorized trip frequency and distances. However, the propensity score models, in particular those controlling for all covariates, yield smaller effect magnitudes, suggesting that direct estimation (both with and without covariate control) tends to overestimate the real effect of the built environment on trip frequency and travel distances for both modes.

In the case of car trip frequency, while the naïve estimate (no stratification, no covariates)

stands at -0.36, that is, a 36% reduction in car trips given a standardized unit increase in urbanization

level, stratified full model estimates (all covariates) range from -0.155 in the three strata estimate to


-0.128 in the ten strata estimate. In a similar manner, for the non-motorized trip frequency model, the

naïve estimate stands at 0.37, suggesting a 37% increase in walking and biking trips given a one unit

increase in the urbanization level index, while stratified full model effects range from 0.25 to 0.22, for

the three strata and the ten strata estimates respectively. Likewise, considerable reductions are observed

in the case of traveled distances.
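Because the outcomes enter the models in log form, a treatment coefficient β translates into a proportional change in the outcome. Assuming the outcome is modeled as ln(y) and the treatment is the standardized urbanization score, reading 100·β as a percentage change is a first-order approximation; the exact change implied by the model for a one-unit increase is 100·(exp(β) - 1). The short sketch below applies both readings to the coefficients quoted above; it does not address how zero trip counts were handled before taking logs, which is not described in the text, and the function name is hypothetical.

```python
import math


def pct_change(beta, delta=1.0):
    """Exact percentage change in y implied by a log-linear coefficient beta
    for a change of `delta` units in the treatment (assumes the outcome is ln(y))."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


# Coefficients quoted in the text: naive, 3-strata and 10-strata full-model estimates.
quoted = {
    "car, naive": -0.36, "car, 3 strata": -0.155, "car, 10 strata": -0.128,
    "non-motorized, naive": 0.37, "non-motorized, 3 strata": 0.25,
    "non-motorized, 10 strata": 0.22,
}
for label, b in quoted.items():
    print(f"{label}: approx {100 * b:.1f}%, exact {pct_change(b):.1f}%")
```

The two readings diverge as |β| grows, so the approximation matters most for the naïve estimates.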

When the stratified full models are compared against the non-stratified, all-covariates models, it appears that although direct estimation with covariates does reduce bias, stratifying on the propensity score performs better. Compared with the ten-strata full models, the direct regression (all covariates) estimates are overestimated by 36% and 6% in the car and non-motorized trip frequency cases respectively, and by 8% and 20% in the car and non-motorized travel distance cases.
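The overestimation percentages can be read as the excess magnitude of a direct estimate relative to the stratified one. The sketch assumes that definition (the formula is not stated in the text) and, because the direct all-covariates coefficients appear only in Table 3, illustrates it with the naïve and ten-strata car trip frequency estimates quoted earlier in this section; the function name is hypothetical.

```python
def relative_overestimate(direct, stratified):
    """Percentage by which |direct| exceeds |stratified| (assumed definition)."""
    return 100.0 * (abs(direct) - abs(stratified)) / abs(stratified)


# Naive vs. ten-strata full-model estimates for car trip frequency, quoted in the text.
print(relative_overestimate(-0.36, -0.128))
```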

Certainly, given that both the true propensity score and the true population parameter for the treatment of interest are unknown, misspecification is a non-trivial issue. In spite of including a diverse range of covariates in the estimation of the propensity score function and verifying the balance of covariates after conditioning on $\hat{\theta}$, model misspecification is still possible. Results suggest, however, that propensity score models are rather robust to model misspecification, particularly as the number of strata j increases; that is, the range between the no-covariate and full-covariate estimates within each stratification scheme narrows as the number of strata increases, the exception being the non-motorized travel distance models. This exception might nevertheless be a result of aggregating walk trips and bicycle trips, although this assertion needs to be verified through further analysis.

7. DISCUSSION OF FINDINGS

Regarding the implications of the estimated results, the empirical findings suggest the existence of a mode substitution mechanism between car and non-motorized modes as a result of increases in urbanization level, as measured by the estimated latent variable. Higher urbanization levels were associated not only with fewer car trips and shorter car travel distances but also with more frequent non-motorized trips and longer non-motorized travel distances. These findings support the arguments of compact city advocates that such cities can reduce car dependency and promote travel by alternative modes. Methodologically, stratification on the propensity score was shown to reduce estimation bias compared with both the naïve and the direct regression estimates; furthermore, the treatment effect estimates are rather robust to misspecification, particularly as the number of strata increases.

It is worth emphasizing again the importance of the strong ignorability assumption. The assumption that the distribution of treatment outcomes is independent of treatment assignment given the propensity score is crucial to the unbiasedness of the estimates; nevertheless, in practice it is impossible to know how well the estimated function approximates the true population function. That said, variables widely cited in the literature as relevant to residential location were introduced into the propensity score model; hence, it is assumed that the estimated function is a good approximation of the true unknown function.


In terms of the overall contribution to the field, the presented methodology helps overcome some of the limitations of existing program evaluation approaches in the transportation literature, particularly the binary treatment assumption, which usually polarizes the built environment into two extremes, urban or suburban, disregarding the large variability in district characteristics within cities. A continuous urbanization level treatment, such as the one developed in this article, thus allows for a more precise understanding of the effect of the built environment on travel behavior at all levels of the urbanization spectrum, without the need to arbitrarily draw a line between “urban” and “suburban”, a line to which binary treatment models might be highly sensitive. Moreover, the CFA estimation of a latent variable score for urbanization level allows goodness-of-fit statistics to be calculated to evaluate the estimated solution, providing statistical support for the proposed index. Methodologically, there is still room for improvement. Subject to data availability, other, better-fitting model specifications are possible. Furthermore, using a regular spatial unit such as a rectangular grid instead of existing demarcations (e.g. census tracts, districts) might make results less prone to the modifiable areal unit problem.

8. CONCLUSIONS

This study evaluated the built environment-travel behavior connection using a propensity score approach under a continuous treatment regime, thus overcoming the limitations of constraining variations in the built environment to a binary treatment. To do so, a latent variable for urbanization level was estimated and used as the treatment of interest. A substitution effect from car to non-motorized modes was observed as the urbanization level increases. The implemented propensity score stratification approach was also successful in mitigating self-selection bias: compared against the ten-strata weighted estimates, the direct regression treatment effects are overestimated by 36% and 6% in the car and non-motorized trip frequency models, respectively, and by 8% and 20% in the car and non-motorized traveled distance models. These findings provide evidence supporting a causal effect of the built environment on some dimensions of travel behavior, namely trip frequency and traveled distance for car and non-motorized modes.
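The percentages above can be reproduced from Table 3. The sketch below assumes they compare the "All covariates" rows and measure how much larger, in absolute value, the direct (no stratification) estimate is than the ten-strata weighted estimate; the function name is illustrative.

```python
# "All covariates" coefficients from Table 3: (no stratification, 10 strata).
estimates = {
    "car trip frequency": (-0.174, -0.128),
    "non-motorized trip frequency": (0.234, 0.220),
    "car traveled distance": (-0.366, -0.339),
    "non-motorized traveled distance": (0.243, 0.201),
}


def overestimation_pct(direct, weighted):
    """Percentage by which the direct estimate exceeds the weighted one in magnitude."""
    return 100.0 * (abs(direct) - abs(weighted)) / abs(weighted)


for model, (direct, weighted) in estimates.items():
    print(f"{model}: {overestimation_pct(direct, weighted):.1f}%")
```

Rounded to whole percentages this gives 36%, 6%, 8% and 21%; the last value is one point above the 20% reported, possibly because unrounded coefficients were used in the original calculation.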

ACKNOWLEDGEMENTS

All spatial data used for the analysis presented in this article were provided by the Center for Spatial Information Science (CSIS) of The University of Tokyo (CSIS joint research No. 479).

This study was supported by JSPS KAKENHI Grant No. 23246091.

REFERENCES

(1) M. Boarnet and R. Crane. The influence of land use on travel behavior: specification and

estimation strategies. Transportation Research Part A, vol. 35, 2001, pp. 823-845.

(2) X. Cao, P. Mokhtarian and S. Handy. Examining the impacts of residential self-selection on travel behavior. Transport Reviews, vol. 29, no. 3, 2009, pp. 359-395.


(3) J. Heckman. Sample selection bias as a specification error. Econometrica, vol. 47, no. 1, 1979, pp. 153-162.

(4) B. Zhou and K. Kockelman. Self-selection in home choice: Use of treatment effects in evaluating the relationship between the built environment and travel behavior. In Transportation Research Record: Journal of the Transportation Research Board, No. 2077, Transportation Research Board of the National Academies, Washington DC, 2008, pp. 54-61.

(5) X. Cao. Disentangling the influence of neighborhood type and self-selection on driving behavior:

an application of sample selection model. Transportation, vol. 36, 2009, pp. 207-222.

(6) P. Rosenbaum and D. Rubin. The central role of the propensity score in observational studies for

causal effects. Biometrika, vol. 70, no. 1, 1983, pp. 41-55.

(7) X. Cao, Z. Xu and Y. Fan. Exploring the connections among residential location, self-selection, and driving: Propensity score matching with multiple treatments. Transportation Research Part A, vol. 44, 2010, pp. 797-805.

(8) R. Boer, Y. Zheng, A. Overton, G. Ridgeway and D. Cohen. Neighborhood design and walking

trips in ten U.S. metropolitan areas. American Journal of Preventive Medicine, vol. 32, no. 4, 2007,

pp. 298-304.

(9) X. Cao. Exploring causal effects of neighborhood type on walking behavior using stratification of

propensity score. Environment and Planning A, vol. 42, 2010, pp. 487-504.

(10) L.-F. Lee. Generalized econometric models with selectivity. Econometrica, vol. 51, no. 2, 1983,

pp. 507-512.

(11) G. Imbens. The role of the propensity score in estimating dose-response functions. Biometrika, vol.

87, 2000, pp. 706-710.

(12) M. P. Couper. Web surveys: A review of issues and approaches. The Public Opinion Quarterly,

vol. 64, no. 4, 2000, pp. 464-494.

(13) Ministry of Internal Affairs and Communications (MIC). Communications usage trend survey in

2011 compiled. Tokyo, Japan, 2012.

(14) B. Verplanken, H. Aarts, A. van Knippenberg and C. van Knippenberg. Attitude versus general habit: Antecedents of travel mode choice. Journal of Applied Social Psychology, vol. 24, no. 11, 1994, pp. 285-300.

(15) R. Kitamura, P. Mokhtarian and L. Laidet. A micro-analysis of land use and travel in five

neighborhoods in the San Francisco Bay Area. Transportation, vol. 24, 1997, pp. 125-158.

(16) National Tax Agency. Heisei 23 nenbun minkan kyuuyo jittai chousa (Private income statistical

survey for 2011). Tokyo, Japan, 2012.

(17) P. Rosenbaum and D. Rubin. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, vol. 79, no. 387, 1984, pp. 516-524.

(18) K. Imai and D. A. van Dyk. Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association, vol. 99, no. 467, 2004, pp. 854-866.

(19) L. Hu and P. Bentler. Cutoff criteria for fit indexes in covariance structure analysis: Conventional

criteria versus new alternatives. Structural Equation Modeling, vol. 6, 1999, pp. 1-55.


List of table titles and figure captions

TABLE 1. Descriptive Statistics of Covariates

FIGURE 1. Path Diagram of “Urbanization Level” Latent Variable Estimation

FIGURE 2. Urbanization Level Map of Hiroshima City.

TABLE 2. Propensity Score OLS Estimation Results

FIGURE 3. Standard Normal Quantile Plots of t-Statistics

TABLE 3. Estimation Results of Causal Effects of Urbanization Level on Travel Behavior


TABLE 1. Descriptive Statistics of Covariates

| Variable | Mean | Std. Dev. | Min. | Max. |
|---|---|---|---|---|
| Household Characteristics | | | | |
| Household size | 2.667 | 1.231 | 1 | 7 |
| Single household | 0.173 | 0.379 | 0 | 1 |
| Pre-school children | 0.127 | 0.333 | 0 | 1 |
| Members over 65 years in household | 0.205 | 0.404 | 0 | 1 |
| Owner of detached house | 0.198 | 0.399 | 0 | 1 |
| Owner of apartment | 0.302 | 0.459 | 0 | 1 |
| Number of drivers in household | 2.420 | 0.892 | 1 | 5 |
| Income (<US$ 40,000) | 0.326 | 0.469 | 0 | 1 |
| Income (US$ 40,001-60,000) | 0.283 | 0.451 | 0 | 1 |
| Income (US$ 60,001-80,000) | 0.164 | 0.371 | 0 | 1 |
| Income (US$ 80,001-100,000) | 0.117 | 0.322 | 0 | 1 |
| Income (>US$ 100,001) | 0.108 | 0.211 | 0 | 1 |
| Number of bikes in household | 2.393 | 1.200 | 1 | 9 |
| Number of cars in household | 1.162 | 0.759 | 0 | 5 |
| Individual Characteristics | | | | |
| Gender | 0.522 | 0.500 | 0 | 1 |
| Age | 46.835 | 14.487 | 20 | 79 |
| Worker | 0.532 | 0.499 | 0 | 1 |
| Car habit (response frequency index) | 5.745 | 3.214 | 0 | 10 |
| Grew up in a large metropolitan area (Tokyo, Osaka, etc.) | 0.015 | 0.122 | 0 | 1 |
| Grew up in the suburbs of a large metropolitan area | 0.047 | 0.211 | 0 | 1 |
| Grew up in a regional city (Hiroshima, Fukuoka, etc.) | 0.287 | 0.453 | 0 | 1 |
| Grew up in the suburbs of a regional city | 0.393 | 0.489 | 0 | 1 |
| Grew up in a small city | 0.143 | 0.351 | 0 | 1 |
| Grew up in a village | 0.065 | 0.247 | 0 | 1 |
| Grew up in the remote countryside | 0.050 | 0.218 | 0 | 1 |
| Attitudes: Car lover* | 0.000 | 1.000 | -2.912 | 1.632 |
| Attitudes: Pro-transit and non-motorized modes* | 0.000 | 1.000 | -4.971 | 2.206 |
| Attitudes: Suburban* | 0.000 | 1.000 | -2.069 | 3.026 |
| Works in city center (Naka, Minami and Higashi wards) | 0.412 | 0.493 | 0 | 1 |
| Dependent Variables** | | | | |
| Non-work car trip frequency | 2.4366 | 3.3815 | 0 | 23 |
| Non-work non-motorized trip frequency | 3.4066 | 3.941 | 0 | 20 |
| Non-work car traveled distances | 15.583 | 25.224 | 0 | 228 |
| Non-work non-motorized traveled distances | 6.073 | 10.067 | 0 | 92.93 |

*Three-factor CFA solution using a maximum likelihood estimator. Full estimation results are available upon request.

**Dependent variables were log-transformed for model estimation.


FIGURE 1. Path Diagram of “Urbanization Level” Latent Variable Estimation

[Path diagram: the latent factor URBANIZATION LEVEL measured by five indicators.]

| Indicator | Other value shown at indicator | Standardized loading | Explained variance |
|---|---|---|---|
| Ratio of households in multifamily residences | 0.281 | 0.906 | (0.821) |
| Log of average m² per person in household | 0.781 | -0.922 | (0.850) |
| Ratio of renter households | 0.342 | 0.841 | (0.707) |
| Log of density of commercial facilities | 0.139 | 0.844 | (0.712) |
| Log of population density | 0.327 | 0.727 | (0.529) |

Additional values printed in the diagram, whose connecting paths are not recoverable from the extracted text: 0.429, 0.159, -0.204.

Chi-square test of model fit (d.f.): 1.813 (2); p-value: 0.4040; RMSEA (90% C.I.): 0.000 (0.000, 0.096)

Probability RMSEA ≤ .05: 0.684; CFI: 1.000; TLI: 1.001; SRMR: 0.004

Values in parentheses are the proportion of each indicator's variance explained by the factor.

All parameter estimates are significant at the p < 0.01 level, except “Log of Average m² per Person in Household”, which is significant at the p < 0.05 level.
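Two internal consistency checks on Figure 1 can be sketched as follows, assuming the second value printed at each indicator is its standardized loading. In a one-factor standardized solution an indicator's explained variance equals its squared loading, and a chi-square statistic with 2 degrees of freedom has the closed-form p-value exp(-χ²/2). The RMSEA function uses one common form of the formula; the sample size is not reported in the figure, so the value passed below is hypothetical (any sample size gives zero here because χ² is below its degrees of freedom).

```python
import math

# (standardized loading, explained variance) per indicator, as printed in Figure 1.
indicators = {
    "Ratio of households in multifamily residences": (0.906, 0.821),
    "Log of average m2 per person in household": (-0.922, 0.850),
    "Ratio of renter households": (0.841, 0.707),
    "Log of density of commercial facilities": (0.844, 0.712),
    "Log of population density": (0.727, 0.529),
}

for name, (loading, explained) in indicators.items():
    # One-factor standardized solution: explained variance = loading squared.
    print(f"{name}: loading^2 = {loading ** 2:.3f}, reported {explained:.3f}")


def chi2_pvalue_df2(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 2 d.f."""
    # With 2 d.f. the chi-square distribution is exponential with mean 2.
    return math.exp(-chi2 / 2.0)


def rmsea(chi2, df, n):
    """One common RMSEA form: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


print(f"p-value: {chi2_pvalue_df2(1.813):.4f}")
print(f"RMSEA: {rmsea(1.813, 2, n=100):.3f}")  # n = 100 is hypothetical
```

The squared loadings match the reported explained variances to three decimals, and the p-value agrees with the reported 0.4040 to within the rounding of χ².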


FIGURE 2. Urbanization Level Map of Hiroshima City.


TABLE 2. Propensity Score OLS Estimation Results

| Variable | Coefficient | S.E. | t stat. | P value |
|---|---|---|---|---|
| Constant* | 0.854 | 0.188 | 4.546 | 0.000 |
| Age | 0.000 | 0.003 | -0.144 | 0.886 |
| Male* | -0.120 | 0.058 | -2.053 | 0.040 |
| Household size* | -0.109 | 0.033 | -3.276 | 0.001 |
| Single household | 0.065 | 0.097 | 0.665 | 0.506 |
| Children in preschool/elementary school* | 0.125 | 0.075 | 1.680 | 0.093 |
| Members over 65 in HH* | -0.163 | 0.092 | -1.768 | 0.077 |
| Number of cars* | -0.105 | 0.050 | -2.071 | 0.038 |
| Number of bicycles* | 0.085 | 0.022 | 3.945 | 0.000 |
| Middle income (US$40,000-100,000) | 0.051 | 0.067 | 0.750 | 0.453 |
| High income (>US$100,001)* | 0.190 | 0.096 | 1.971 | 0.049 |
| Car habit* | -0.049 | 0.012 | -4.260 | 0.000 |
| Grew up in large city or regional city | -0.089 | 0.087 | -1.017 | 0.309 |
| Grew up in large or regional city suburbs | -0.126 | 0.083 | -1.522 | 0.128 |
| Grew up in the countryside or village* | -0.349 | 0.114 | -3.065 | 0.002 |
| House owner* | -0.285 | 0.062 | -4.610 | 0.000 |
| Car lover | 0.002 | 0.032 | 0.052 | 0.958 |
| Pro-transit and non-motorized modes | 0.036 | 0.031 | 1.167 | 0.243 |
| Suburban preference* | -0.140 | 0.029 | -4.873 | 0.000 |
| Job located in city center* | 0.268 | 0.059 | 4.538 | 0.000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Standard deviation (dependent variable) | 0.78 | Sum of squares (residuals) | 195.500 |
| Number of observations | 517 | Standard error of e | 0.627 |
| Parameters | 20 | R² | 0.370 |
| Degrees of freedom | 497 | Adjusted R² | 0.346 |
| F[19, 497] (prob.) | 15.38 (0.00) | | |

*Significant at least at the p < 0.10 level.
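The summary statistics of Table 2 are mutually consistent, as a short calculation with the standard OLS formulas shows. The sketch uses only the reported R², residual sum of squares, number of observations and number of parameters (including the constant).

```python
import math

n, k = 517, 20   # observations and estimated parameters (incl. constant)
r2 = 0.370       # R-squared reported in Table 2
ssr = 195.5      # residual sum of squares reported in Table 2

df_resid = n - k                                  # residual degrees of freedom
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid        # adjusted R-squared
f_stat = (r2 / (k - 1)) / ((1 - r2) / df_resid)   # overall F test, F[k-1, n-k]
se_e = math.sqrt(ssr / df_resid)                  # standard error of the regression

print(df_resid, round(adj_r2, 3), round(f_stat, 2), round(se_e, 3))
```

The recomputed F statistic (about 15.36) differs slightly from the reported 15.38, presumably because R² is rounded to three decimals here.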


FIGURE 3. Standard Normal Quantile Plots of t-Statistics

[Figure: two panels plotting quantiles of t-statistics (vertical axis) against standard normal quantiles (horizontal axis). Panel (a): without controlling for θ. Panel (b): after controlling for θ.]
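A standard normal quantile plot such as Figure 3 pairs sorted t-statistics with standard normal quantiles; when covariates are balanced, the points should fall roughly along the 45-degree line. The sketch below computes that pairing, assuming one balance t-statistic per covariate and the (i - 0.5)/n plotting position, which is one common convention and not necessarily the one used for the figure. The t-statistics are hypothetical placeholders.

```python
from statistics import NormalDist


def normal_qq_points(t_stats):
    """Pair sorted t-statistics with standard normal quantiles.

    Uses the (i - 0.5) / n plotting position, one common convention.
    """
    ordered = sorted(t_stats)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), t)
            for i, t in enumerate(ordered, start=1)]


# Hypothetical balance t-statistics, one per covariate.
for q, t in normal_qq_points([-4.2, 0.3, 1.1, -0.8, 2.5]):
    print(f"{q:6.2f}  {t:6.2f}")
```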


TABLE 3. Estimation Results of Causal Effects of Urbanization Level on Travel Behavior

| Model | Covariates | No stratification | 3 strata | 5 strata | 10 strata |
|---|---|---|---|---|---|
| Car trip frequency | No covariates | -0.362 (0.042) | -0.204 (0.094) | -0.182 (0.121) | -0.148 (0.053) |
| | All covariates | -0.174 (0.048) | -0.155 (0.084) | -0.146 (0.110) | -0.128 (0.050) |
| NMM trip frequency | No covariates | 0.376 (0.046) | 0.289 (0.109) | 0.248 (0.149) | 0.199 (0.064) |
| | All covariates | 0.234 (0.059) | 0.255 (0.100) | 0.214 (0.129) | 0.220 (0.059) |
| Car traveled distance | No covariates | -0.753 (0.067) | -0.444 (0.155) | -0.388 (0.198) | -0.342 (0.087) |
| | All covariates | -0.366 (0.076) | -0.361 (0.141) | -0.329 (0.180) | -0.339 (0.084) |
| NMM traveled distance | No covariates | 0.436 (0.056) | 0.322 (0.130) | 0.283 (0.175) | 0.317 (0.075) |
| | All covariates | 0.243 (0.071) | 0.256 (0.119) | 0.216 (0.145) | 0.201 (0.070) |

NMM: non-motorized modes.

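Variance for the weighted coefficients (values in parentheses) was calculated as $\mathrm{Var}(\hat{\beta}) = \sum_{j} W_j^{2}\,\mathrm{Var}(\hat{\beta}_j)$, where $\hat{\beta}_j$ is the coefficient estimated in stratum $j$ and $W_j$ is the weight of stratum $j$, $W_j = N_j / N$, with $N_j$ the number of observations in stratum $j$ and $N$ the total number of observations.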
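The stratified estimates in Table 3 rest on subclassification on the propensity score. The sketch below illustrates that procedure under stated assumptions and is not the authors' code: the data are simulated with a single hypothetical confounder, the propensity score is the fitted value of an OLS regression of the treatment on the covariate, units are split into equal-sized strata by that score, and within each stratum the outcome is regressed on the treatment alone (as in the "No covariates" rows). Stratum estimates are combined with weights W_j = N_j/N and variance sum_j W_j² Var(beta_j), the standard form for combining independent stratum estimates. All function and variable names are illustrative.

```python
import random
import statistics


def ols_slope(x, y):
    """Simple OLS slope of y on x and its conventional variance estimate."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid_ss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, resid_ss / (len(x) - 2) / sxx


def stratified_effect(theta, t, y, n_strata):
    """Weighted average of within-stratum slopes of y on t.

    Strata are equal-sized groups of units sorted by the propensity score
    theta; weights are W_j = N_j / N and the variance is sum_j W_j^2 Var_j.
    """
    order = sorted(range(len(theta)), key=lambda i: theta[i])
    n = len(order)
    beta, var = 0.0, 0.0
    for j in range(n_strata):
        idx = order[j * n // n_strata:(j + 1) * n // n_strata]
        b_j, v_j = ols_slope([t[i] for i in idx], [y[i] for i in idx])
        w_j = len(idx) / n
        beta += w_j * b_j
        var += w_j ** 2 * v_j
    return beta, var


# Hypothetical simulated data: confounder x drives both treatment and outcome.
random.seed(1)
N, TRUE_EFFECT = 5000, -0.15
x = [random.gauss(0, 1) for _ in range(N)]
t = [xi + random.gauss(0, 0.5) for xi in x]
y = [TRUE_EFFECT * ti - 0.5 * xi + random.gauss(0, 1) for ti, xi in zip(t, x)]

# Propensity score for a continuous treatment: fitted value of the treatment
# regressed on the covariate (a single covariate in this sketch).
b_tx, _ = ols_slope(x, t)
a_tx = statistics.fmean(t) - b_tx * statistics.fmean(x)
theta = [a_tx + b_tx * xi for xi in x]

naive, _ = ols_slope(t, y)
strat, strat_var = stratified_effect(theta, t, y, n_strata=10)
print(f"naive: {naive:.3f}  10 strata: {strat:.3f} (variance {strat_var:.5f})")
```

With this simulated confounding the naive slope is strongly biased, while the ten-strata estimate lies much closer to the true effect; coarse strata, especially the wide outer ones, can still leave some residual bias.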