All content in this area was uploaded by Yang Yang on Apr 02, 2015
MODELING TOURISTS’ LENGTH OF STAY: DOES ONE MODEL FIT ALL?
YANG YANG* AND HONG-LEI ZHANG†
* School of Tourism and Hospitality Management, Temple University,
Philadelphia, PA 19122, United States
Phone: (01) 215-204-8701; Fax: (01) 215-204-8705
Email: yangy@temple.edu
†Department of Land Resources and Tourism Sciences, Nanjing University,
Nanjing, 210093, China
E-mail: zhanghonglei@nju.edu.cn
Corresponding Author: Dr. Hong-Lei Zhang
Acknowledgement: This research was financially supported by National Natural Science Foundation of
China (No. 41301134) and Grants from Ministry of Education in China Project of Humanities and Social
Sciences (No. 13YJC790193).
Please cite as:
Yang, Y. and Zhang, H-L. (2015). Modeling tourists’ length of stay: Does one model fit all? Tourism
Analysis, 20(1), 13-23.
Abstract: Examining the individual heterogeneity of tourists is fundamental to providing
insights on tourist market segmentation, targeting potential markets and niches, and
proposing marketing strategy. However, most past studies failed to incorporate this individual
heterogeneity in an integral way. This study utilizes a latent class duration model to
investigate the latent segments of tourists regarding the preference of length of stay (LOS) in
a destination. The study unveils a substantial amount of latent heterogeneity across the
sample, and our empirical results identify two latent classes of tourists, namely,
short-duration and long-duration tourists. These classes exhibit distinct LOS preferences, and
information sources and travel partners have no significant influences in predicting the LOS
of short-duration tourists. Therefore, the “one-fit-all” solution from the conventional duration
model could be misleading, and this highlighted heterogeneity provides destination marketing
organizations (DMOs) with the incentive to segment the tourists and offer specific tourism
products and bundles.
Keywords: length of stay; latent class; duration model; individual heterogeneity
1. Introduction
Length of stay (LOS) is an essential index for measuring the level of tourists’ consumption
and an important indicator for monitoring the growth of a tourism area (Archer & Shea,
1975). A large body of literature has been devoted to modeling LOS data of tourists (Alegre,
Mateo, & Pou, 2011; Barros & Machado, 2010; Gokovali, Bahar, & Kozak, 2007; Thrane,
2012; Yang, Wong, & Zhang, 2011). The analysis and investigation of the tourists’ LOS
preferences is important for several reasons. First, by understanding the determinants of the
LOS, tourism administrative units and organizations can allocate the necessary resources to
cater to tourists’ needs more efficiently. Second, the LOS modeling is beneficial for market
segmentation. Destination marketing organizations (DMOs) can therefore target more
specific marketing efforts toward a particular market and design featured tourism products
and services that increase market share. Third, private tourism sectors can increase revenue
after understanding factors associated with a longer LOS, as projected by the empirical model,
such as discrete choice model (Alegre & Pou, 2006, 2007; Yang, et al., 2011) and duration
model (Barros, Correia, & Crouch, 2008; Martínez-Garcia & Raya, 2008). Finally, by
targeting tourists with a predicted higher LOS at a destination, neighboring tourist
destinations could benefit from spillover effects due to the increased possibility of tourists
undertaking multi-destination tours to nearby areas (Yang & Wong, 2012).
Different econometric methods have been used to model tourists’ LOS. By assuming different
data generation processes of LOS, there are generally four types of models adopted in
tourism research: linear regression model (Fleischer & Pizam, 2002), count data model
(Alegre, et al., 2011; Salmasi, Celidoni, & Procidano, 2012), discrete choice model (Alegre &
Pou, 2006, 2007; Yang, et al., 2011), and duration model (survival analysis) (Barros, et al.,
2008; Barros & Machado, 2010; Martínez-Garcia & Raya, 2008). Among these four, the
duration model has been heavily applied over the last decade, and it is able to formulate a
survival function and a hazard function as well as incorporate censored observations and
time-varying covariates (Gokovali, et al., 2007).
In the conventional LOS model, it is assumed that sampled tourists are homogeneous, and a
single model is good enough to explain their duration within a particular destination.
However, understanding the individual heterogeneity of tourists is fundamental to providing
insights on market segmentation of tourists, targeting potential markets, and proposing
tourism destination marketing strategies. As suggested by Allenby and Rossi (1998), one of
the greatest challenges in marketing is to understand the diversity of consumers’ preferences
and provide differentiated products to market segments and niches with distinct preferences.
In the past literature, the individual heterogeneity of customers could be captured either by
assuming a continuous variation or a discrete variation across different customers. In
particular, as a method to incorporate the discrete individual heterogeneity, the latent class
methodology assumes that a sample of observations arises from a number of underlying
classes and regards the overall sample as a mixture of different segments (Wedel & DeSarbo,
1994).
Although the latent class modeling strategy has been applied in tourism studies (Alegre, et al.,
2011; Mazanec & Strasser, 2007; Wu, Zhang, & Fujiwara, 2011), to the best of our
knowledge, no known research has utilized the latent class model to look into tourists’ LOS
under the framework of the duration model (survival analysis). Unlike the conventional
model, which estimates a single set of coefficients across all observations, a latent class
duration model is able to unveil the individual heterogeneity by estimating different sets of
regression coefficients for different segments simultaneously. More importantly, the latent
class model provides a formal statistical procedure to identify latent segments, allowing us to
recognize and characterize various preference groups. A single global duration model
might mask the individual heterogeneity of tourists, especially when consumer preferences
and sensitivities become more diverse (Allenby & Rossi, 1998), and may therefore lead to
misleading marketing implications. To
fill this research gap, the paper applies the latent class duration model to identify the number
of classes with homogeneous LOS preferences and tourists’ memberships to each class.
Hence, the study represents the first application of the latent class duration model in tourists’
LOS studies and sheds light on the segmentation of tourists with regard to their LOS
preferences.
This paper is organized as follows: Section 2 reviews the literature that applies various
duration models in modeling tourists’ LOS data. Section 3 details the specifications of the
models and describes the data set used in this study. In Section 4, the empirical results of the
models are presented and explained. Section 5 concludes the paper with implications.
2. Modeling Length of Stay of Tourists
LOS data can be modeled under a range of data generation mechanisms. Four major types of
micro-econometric models have been applied for modeling tourists’ LOS; these are linear
regression model (Fleischer & Pizam, 2002), count data model (Alegre, et al., 2011), discrete
choice model (Alegre & Pou, 2006, 2007), and duration model (Barros, et al., 2008; Barros &
Machado, 2010; Martínez-Garcia & Raya, 2008). By regarding LOS as continuous data, a
linear regression model assumes a normality of LOS distribution and a simple linear
relationship between the expectation of LOS and explanatory variables. Because LOS can be
treated as a number of days or months, the count data model, which assumes a specific
discrete distribution of the dependent variable as a count number, is a natural alternative for
modeling tourists’ LOS (Hellström, 2006; Hellström & Nordström, 2008). By treating all
possible durations as a choice set, a discrete choice model explains tourists’
choice among a set of durations. Finally, in a duration model, the time it takes before
tourists leave a destination is usually specified as the dependent variable (Barros, et al., 2008;
Gokovali, et al., 2007; Gomes de Menezes, Moniz, & Cabral Vieira, 2008; Martínez-Garcia
& Raya, 2008; Thrane, 2012). The duration model is particularly powerful in modeling
duration data, as the LOS data are often skewed, censored, or truncated (Cleves, Gould, &
Gutierrez, 2003).
In micro-econometrics and bio-statistics, duration models are specifically used to model the
time elapsed before the occurrence of certain events. Generally, there are two
parameterizations of the duration model: the proportional hazards (PH) model and the
accelerated failure-time (AFT) model. In the PH-metric model, the hazard rate is the
dependent variable, which denotes the instantaneous rate at which a tourist leaves the destination in
the next infinitesimal time period, conditional on the tourist having stayed
there up to that moment. In contrast, the AFT-metric model treats the logarithm
of LOS as the dependent variable and focuses on the factors associated with it.
To estimate duration models, both parametric and semi-parametric methods have been
proposed. In the parametric model, one can specify a particular distribution for the hazard and
related functions, such as the Weibull model and the Gamma model (De Menezes & Moniz,
2011), whereas in the semi-parametric model, such as the Cox model, prior information on
the baseline distribution is not necessary (De Menezes, Moniz, & Cabral Vieira, 2008).
Some specific types of duration models are very similar to the other three types of LOS
models. The AFT-metric model that assumes a log-normal distribution of LOS is
statistically equivalent to a linear regression on the logarithm of LOS: with uncensored data,
its maximum likelihood estimation (MLE) yields the same coefficient estimates as
ordinary least squares. Moreover, some semi-parametric duration models can be
estimated under the count data model framework (Hilbe, 2011) or the ordered discrete choice
model framework (Greene & Hensher, 2010).
Considering the individual heterogeneity of tourists, various duration models with
unobservable heterogeneity have been introduced in LOS studies by specifying a prior
distribution of the individual effect (frailty) and estimating the combined model with a
mixture distribution (Barros, Butler, & Correia, 2010; Barros, et al., 2008). Barros, et al.
(2010) argued that overlooking this heterogeneity results in inconsistent estimates in the
duration model. In general, the application of the frailty duration model improves the overall
goodness-of-fit and provides information on the extent of the heterogeneity over the sampled
tourists. However, this modeling strategy restricts heterogeneity to the model intercept, and
the heterogeneity of slope coefficients, which is arguably of greater interest, remains unexplored.
A major debate over the use of duration models centers on the applicability of hazard
function in the context of tourists’ LOS decision making. Thrane (2012) argued that because
tourists generally determine their durations before their trips, the hazard rate should be
constant over the duration, and the analysis based on the hazard rate tends to be meaningless.
However, in various AFT-metric models, it is not necessary to use such concepts as hazard
rate and survival function to understand the model and interpret the results.
3. Methodology and Data
3.1. Model specification
Because the interpretation of the estimated coefficients is more intuitive in AFT-metric
duration models (Cleves, et al., 2003), we focus on various AFT-metric duration models. By
assuming different distributions of error terms, we estimated a series of AFT-metric models
with latent classes, including the exponential model, the Weibull model, the log-normal
model, the log-logistic model, and the gamma model. After that, we estimated a probit model
to explain the membership of individual tourists to each identified latent class.
A general AFT-metric duration model is specified as follows:
$$\ln y_i = \mathbf{x}_i \boldsymbol{\beta} + u_i \qquad (1)$$
where $i$ indexes the observation; $y_i$ is the LOS of observation $i$; $\mathbf{x}_i$ is a vector of explanatory
variables; and $\boldsymbol{\beta}$ is a vector of coefficients. In particular, $u_i$ denotes the error term of the
model. The difference between the AFT-metric duration model and the typical regression
model is that the error term of the former is not necessarily normal and involves one or more
shape parameters. If the error term follows a normal distribution with a fixed variance, the
model is labeled as the log-normal model, and if it follows a logistic distribution, the model
becomes a log-logistic model. Moreover, in the exponential model, the error term follows a
standard Gumbel (extreme value) distribution, whereas in the Weibull model, the error term
follows a Gumbel distribution with a particular shape parameter.
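To make the AFT parameterization concrete, the following is a minimal Python sketch of the log-logistic case, in which the error term of Equation 1 follows a logistic distribution. The function names, the scale parameter `sigma`, and the numeric values are illustrative assumptions and are not taken from the estimated models in this paper.

```python
import math

def loglogistic_survival(t, xb, sigma):
    """S(t) = P(LOS > t) when ln y = xb + sigma * e, with e standard logistic."""
    z = (math.log(t) - xb) / sigma
    return 1.0 / (1.0 + math.exp(z))

def loglogistic_hazard(t, xb, sigma):
    """h(t) = f(t) / S(t) for the same log-logistic AFT model."""
    z = (math.log(t) - xb) / sigma
    return math.exp(z) / (sigma * t * (1.0 + math.exp(z)))

def median_los(xb):
    """The median LOS is exp(xb), because the logistic error has median zero."""
    return math.exp(xb)

# Hypothetical linear index and scale parameter (assumptions for illustration).
xb = 1.0
sigma = 0.3

# Adding a coefficient of 0.358 to the linear index rescales the median LOS
# by exp(0.358), which is the accelerated failure-time property.
ratio = median_los(xb + 0.358) / median_los(xb)
```

In this sketch, covariates shift the linear index `xb` and therefore stretch or compress the time scale, which is why AFT coefficients can be read as proportional changes in LOS without reference to the hazard function.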
In this study, the latent class modeling strategy is introduced to unveil the potential
heterogeneity in factors determining tourists’ LOS. The regression coefficients in the model
are specified to be the same for observations within the same latent class while varying across
different classes. Equation 1 then becomes
$$\ln y_{i|(j)} = \mathbf{x}_i \boldsymbol{\beta}^{j} + u_{i|(j)} \qquad (2)$$
where $j$ indexes the latent class, $j = 1, \ldots, J$, and $\boldsymbol{\beta}^{j}$ varies across different classes.
Therefore, the factors are assumed to contribute to tourists’ LOS for different classes in a
different way. The latent class model specifies the density of the dependent variable, lny, as a
linear combination of J different densities. To estimate the proposed latent class duration
model, the total density becomes
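$$f(\ln y_i \mid \mathbf{x}_i, \boldsymbol{\beta}, \boldsymbol{\pi}) = \sum_{j=1}^{J} \pi_i^{j} f^{j}(\ln y_i \mid \mathbf{x}_i, \boldsymbol{\beta}^{j}), \quad 0 \le \pi_i^{j} \le 1, \quad \sum_{j=1}^{J} \pi_i^{j} = 1 \qquad (3)$$
where $i$ indexes the observation and $j$ indexes the latent class, $j = 1, \ldots, J$.
$f^{j}(\ln y_i \mid \mathbf{x}_i, \boldsymbol{\beta}^{j})$ is the density of the $j$th class (component), and $\pi_i^{j}$ is the probability of belonging to the $j$th class for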
observation i. To determine the empirical optimal value of J (the total number of classes) in
the latent class model, a common way is to compare the information criteria associated with
different J values from MLE (Bhatnagar & Ghose, 2004; Clark, Etilé, Postel-Vinay, Senik, &
Van der Straeten, 2005). However, according to Swait and Sweeney (2000), the selection of
models should also be based on judgment, experience, and statistical considerations. In
general, lower values of information criteria measures characterize optimal solutions. In this
paper, we report four of these, namely, the Akaike Information Criterion (AIC), the ‘finite
sample’ version of AIC (FSAIC), the Bayes Information Criterion (BIC), and the Hannan and
Quinn Information Criterion (HQIC). They are specified as follows:
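$$\mathrm{AIC} = -2\ln L + 2K \qquad (4)$$
$$\mathrm{FSAIC} = \mathrm{AIC} + \frac{2K(K+1)}{N-K-1} \qquad (5)$$
$$\mathrm{BIC} = -2\ln L + K\ln N \qquad (6)$$
$$\mathrm{HQIC} = -2\ln L + 2K\ln(\ln N) \qquad (7)$$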
where lnL is the log likelihood value, K is the number of parameters, and N is the sample
size.
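As a hedged illustration of Equations 4 to 7, the four criteria can be computed from a fitted model's log likelihood as in the Python sketch below. The function name and the example log likelihood and parameter count are hypothetical; only the sample size of 27,709 comes from the data described in this paper.

```python
import math

def information_criteria(log_lik, k, n):
    """Return AIC, FSAIC, BIC, and HQIC for log likelihood log_lik,
    k estimated parameters, and n observations."""
    aic = -2.0 * log_lik + 2.0 * k
    fsaic = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * log_lik + k * math.log(n)
    hqic = -2.0 * log_lik + 2.0 * k * math.log(math.log(n))
    return {"AIC": aic, "FSAIC": fsaic, "BIC": bic, "HQIC": hqic}

# Hypothetical log likelihood and parameter count (not results from this study).
criteria = information_criteria(log_lik=-1000.0, k=10, n=27709)
```

Because the penalty terms differ, the criteria need not rank candidate models identically, so reporting all four gives a more robust basis for choosing the distribution and the number of classes.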
In past literature, several types of factors were found to determine tourists’ LOS. The most
used factors are tourists’ social-demographic variables, such as age, income, level of
education, and frequency of travel (Barros, et al., 2010; Barros & Machado, 2010; Gokovali,
et al., 2007; Martínez-Garcia & Raya, 2008). Trip-characteristic-related factors are another
important group of determinants, including distance to destination, motivation, party size,
package tour, transport, daily cost, information source, activities participated, accommodation
type, and past visit (Barros, et al., 2008; Barros & Machado, 2010; Machado, 2010).
According to the previous literature, we set the length of stay of a tourist to be a function of
tourists’ age, motivation, past visit to the destination, travel distance from home, education
level, number of attractions visited, information source, accommodation type, whether the
tourist comes from the same city as the destination city, and type of travel partners.
3.2. Data description
To estimate the proposed model, we use the data from a province-wide domestic tourist
survey in Jiangsu Province of China. Jiangsu, located in the Yangtze River Delta, is one of the
most developed regions in China, with a GDP per capita of 7,945 USD in 2010. Seeking to
enjoy the high-quality infrastructure and exceptional tourist attractions, 350 million tourists
visited Jiangsu in 2010, bringing in a total revenue of 468.5 billion RMB Yuan. The Jiangsu
domestic tourist survey is conducted by the Jiangsu Tourism Administration, and the
questionnaires are distributed to domestic tourists at scenic spots and hotels around 13 cities
in Jiangsu. In this survey, various questions cover individual social-demographic information,
trip characteristics, and trip satisfaction. To the best of our knowledge, this survey is one of the
most comprehensive domestic tourist surveys in China, considering its sample size, the scope
and variety of questions, and the heterogeneity of surveyed tourists. The variables we are
interested in are described in Table 1. The table demonstrates that in the overall sample of
27,709 observations, 62% of tourists were aged between 25 and 44 years, and 42% visited the
cities with the purpose of sightseeing. In terms of traveling partners, 34% of tourists traveled
with friends and relatives, and 29% traveled alone. A further examination of the correlation
matrix of these independent variables indicates that most pairwise Pearson correlation
coefficients are below 0.3. Only two exceed 0.5, and they are the correlation coefficient
between info1 and partner3 (0.546) and between info3 and partner1 (0.557). This suggests
that tourists traveling with colleagues usually obtain tourism information from their work
affiliation, whereas those organized by travel agencies resort to those agencies to collect the
information related to their tours.
(Please place Table 1 about here)
In particular, we investigate the distribution of the LOS. As shown in Figure 1, the data are
heavily right skewed, as more than 70% of tourists choose two- or three-day stays in Jiangsu
destinations, and very few tourists spend more than seven days.
(Please place Figure 1 about here)
4. Empirical Results
At the outset, we have to choose the underlying distribution of the duration model and the
number of latent classes. Table 2 presents the values of information criteria for different
models. Different information criteria measures, such as AIC, FSAIC, BIC, and HQIC, are
particularly useful in comparing different latent class model solutions based on their model fit
and parsimony (Magidson & Vermunt, 2004). To select the best-fit underlying distribution of
the duration model, we compare the measures of different duration models. The results
suggest that the log-logistic model consistently outperforms others, no matter how many
latent classes are specified. Moreover, we compare these information criteria measures to
determine the optimal number of latent classes. Table 2 shows that adding a third or fourth
class does not decrease the information criteria measures. Therefore, as highlighted by the
lowest values of AIC, FSAIC, BIC, and HQIC, a log-logistic model with two latent classes
is selected, and among all alternatives, this model offers the optimal balance between
goodness-of-fit, parsimony, and explanatory power. Figure 2 demonstrates the distribution of
LOS within each latent class. For latent class 1, LOS ranges from one to six days, and all
one-day tourists belong to this class. For latent class 2, LOS ranges from two to twenty-one
days, and the average LOS is much larger than that of latent class 1. Therefore, we labeled latent
class 1 as “short-duration tourists” and latent class 2 as “long-duration tourists”. The result of
these two unveiled latent classes is similar to the findings from Alegre, et al. (2011), who also
recognized two latent classes with different durations based on count data models.
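Sorting tourists into latent classes is commonly done with posterior membership probabilities derived from the mixture density in Equation 3. The Python sketch below illustrates this for a two-class log-logistic mixture. All parameter values are hypothetical, and the class probabilities are held constant across observations for simplicity, which is an assumption rather than the specification estimated in this paper.

```python
import math

def class_density(ln_y, xb, sigma):
    """Density of ln y when ln y = xb + sigma * e, with e standard logistic."""
    z = (ln_y - xb) / sigma
    return math.exp(-z) / (sigma * (1.0 + math.exp(-z)) ** 2)

def mixture_density(ln_y, xbs, sigmas, priors):
    """Total density: class densities weighted by class probabilities."""
    return sum(p * class_density(ln_y, xb, s)
               for p, xb, s in zip(priors, xbs, sigmas))

def posterior_membership(ln_y, xbs, sigmas, priors):
    """Posterior class probabilities for one observation via Bayes' rule."""
    parts = [p * class_density(ln_y, xb, s)
             for p, xb, s in zip(priors, xbs, sigmas)]
    total = sum(parts)
    return [part / total for part in parts]

# Hypothetical parameters: class 1 centered on 2 days, class 2 on 5 days.
xbs = [math.log(2.0), math.log(5.0)]
sigmas = [0.2, 0.4]
priors = [0.6, 0.4]

post_one_day = posterior_membership(math.log(1.0), xbs, sigmas, priors)
post_ten_days = posterior_membership(math.log(10.0), xbs, sigmas, priors)
```

Under these assumed parameters, a one-day stay is attributed mostly to the first class and a ten-day stay mostly to the second, mirroring the short-duration and long-duration labels.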
(Please place Table 2 about here)
(Please place Figure 2 about here)
Table 3 presents the estimation results for these two latent classes. In the first column for
latent class 1, short-duration tourists, several variables are estimated to be statistically
significant. The effect of the explanatory variable $x_j$ in the AFT-metric duration model is to
change the LOS by a factor of $\exp(x_j\beta_j)$, so that switching a dummy variable from 0 to 1
multiplies the LOS by $\exp(\beta_j)$. The estimated coefficient of age4, which is -0.032,
indicates that tourists aged 65 years and above stay for 3.1% less time than tourists aged
between 25 and 44 years, which is set as the reference category. In terms of motivations, the
results indicate that business tourists have the longest LOS, followed by those with other
purposes, then followed by sightseers. The coefficient of motivation4 is estimated to be 0.358, suggesting that
business tourists tend to stay 43.0% longer than tourists with a vacation motivation (reference
category) in latent class 1. The negative and statistically significant coefficient of motivation3
notes that VFR tourists have the shortest stay in latent class 1. This result is contradictory to
findings from previous literature (Hsu & Kang, 2007; Sung, Morrison, Hong, & O’Leary,
2001), highlighting the different LOS preferences of this tourist segment. Moreover, distance,
attraction, and hnr are estimated to be statistically significant and positive, and these results
indicate that tourists who traveled longer to the destination, visited more attractions, and
stayed in hotels are more likely to stay longer. However, several other explanatory variables
are found to be insignificant for latent class 1, such as pastvisit, educate, and samecity, as
well as grouped variables of information sources and travel partners.
(Please place Table 3 about here)
Column 2 in Table 3 provides the estimation results for latent class 2, long-duration tourists.
These estimates are substantially different from those for latent class 1, suggesting the
noticeable heterogeneity in LOS preference between short-duration and long-duration tourists.
First, in a set of age dummies, age3 is statistically significant and estimated to be -0.016,
implying that tourists aged between 45 and 64 years stay for 1.59% less time than those aged
between 25 and 44 years. Second, although all motivation variables are significant as in the
latent class 1 estimates, the estimates of these variables flipped the sign in latent class 2. The
results show that sightseeing, business, and other-motivation tourists are likely to stay shorter
than vacationing tourists, whereas VFR tourists stay longer. The coefficient of motivation4 is estimated to be
-0.027, suggesting that business tourists tend to stay for 2.7% less time than tourists with a
vacation motivation in latent class 2. Third, the grouped variables of information sources and
travel partners become statistically significant for latent class 2. We find that tourists
collecting information from friends and relatives, affiliations, and other sources have a longer
LOS than those obtaining information from travel agencies; furthermore, tourists traveling
alone have a shorter LOS than those traveling with colleagues. Finally, variables pastvisit,
educate and samecity are estimated to be statistically significant only in the latent class 2
model, suggesting that tourists with more past visits, higher levels of education, and
residency in the destination city are associated with a longer LOS for long-duration tourists.
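The percentage effects quoted for both latent classes follow from the AFT interpretation of a dummy-variable coefficient: the LOS changes by exp(β) - 1 relative to the reference category. A minimal Python sketch of this arithmetic, using the four coefficients quoted in the text, is given below; the function and variable names are illustrative.

```python
import math

def percent_change(beta):
    """Percentage change in LOS implied by an AFT dummy coefficient."""
    return (math.exp(beta) - 1.0) * 100.0

# Coefficients quoted in the text for the two latent classes.
quoted = {
    "age4, latent class 1": -0.032,
    "motivation4, latent class 1": 0.358,
    "age3, latent class 2": -0.016,
    "motivation4, latent class 2": -0.027,
}

effects = {name: percent_change(beta) for name, beta in quoted.items()}
```

For small coefficients the percentage change is close to 100 times the coefficient itself, whereas for larger ones, such as 0.358, the exponential transformation matters noticeably.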
We further fit a probit model to understand the membership of the two latent classes. Column
3 in Table 3 presents these estimates, and the dependent variable is whether the observation
belongs to latent class 2. The results show that, compared to latent class 1, latent class 2
consists of fewer older tourists, fewer frequent travelers to the destination, more long-haul
tourists, and more well-educated tourists. In terms of motivation, there are more VFR,
business, and other-motivation tourists than vacationing tourists, but there are fewer
sightseeing tourists in latent class 2. Moreover, for those belonging to latent class 2, fewer
tourists collect information from work affiliation or media, and fewer travel with family and
friends or alone.
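For readers less familiar with the probit specification, the membership probability is the standard normal distribution function evaluated at a linear index of tourist characteristics. The Python sketch below shows the calculation. The coefficient values are hypothetical and are not the Table 3 estimates; only their signs are chosen to match the directions reported above.

```python
import math

def standard_normal_cdf(x):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_membership(z, gamma):
    """P(latent class 2 | z) under a probit model with coefficient vector gamma."""
    index = sum(zk * gk for zk, gk in zip(z, gamma))
    return standard_normal_cdf(index)

# Hypothetical coefficients: intercept, long-haul dummy, traveling-alone dummy.
gamma = [-0.2, 0.5, -0.3]

p_long_haul = probit_membership([1, 1, 0], gamma)
p_alone = probit_membership([1, 0, 1], gamma)
```

A positive coefficient raises the probability of belonging to the long-duration class, and a negative coefficient lowers it, which is how the signs in Column 3 of Table 3 are read.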
We also fit an ordinary log-logistic model without considering latent heterogeneity. Note
that this log-logistic model can also be regarded as a frailty model, namely a Weibull
model with an exponential heterogeneity term. Column 4 in Table 3 presents the estimation
results of this model. The results are quite different from the estimates of either latent class
(Columns 1 and 2 in Table 3). For example, in the global model, VFR tourists (motivation3)
are found to have the longest duration, which is consistent with the estimate of latent class 2,
whereas business and other-motivation tourists (motivation4 and motivation5) are found to
stay longer than vacationing tourists, which is similar to the estimates of latent class 1.
Therefore, our results suggest that a model that does not consider latent heterogeneity masks
the substantial individual heterogeneity, especially the heterogeneity of slope coefficients.
5. Conclusion
To account for latent heterogeneity in the tourists’ LOS model, we employed a latent class
modeling strategy that allows for multiple segments varying in model estimates. In the
estimated model, slope coefficients are different across segments. We highlighted a
substantial amount of latent heterogeneity across the sample, and our empirical results
unveiled two latent classes of tourists, namely, short-duration and long-duration tourists.
The two classes exhibit distinct LOS preferences. The estimation results suggested that information
source and travel partners have no significant influences in predicting the LOS of
short-duration tourists, whereas for long-duration tourists, those who obtain information from
friends and relatives, work affiliations, and other sources and who do not travel alone are
likely to stay longer.
We observed significant differences regarding the LOS preferences of tourists. Therefore, the
“one-fit-all” solution from the global duration model could be misleading. Tourists differ in
determining their LOS, and this diversity provides DMOs with the incentive to segment the
tourists and offer specific tourism products. Therefore, a well-rounded understanding and
analysis of individual heterogeneity in LOS modeling enables practitioners to identify the
proper segments and consider product differentiation to maximize revenue. To increase this
revenue, specific schedules and activities should be provided to cater to the needs of these
tourist segments.
Our results show that long-haul tourists are more likely to stay longer. Hence, we recommend
the improvement of visitor information centers in major transportation hubs where long-haul
travelers can be found, such as airports and railway stations. Furthermore, as suggested by the
results of latent class duration model, to encourage tourists to stay longer in Jiangsu, DMOs
should increase marketing efforts targeting hotel guests for the short-duration segment (latent
class 1) and VFR tourists for the long-duration segment (latent class 2). To further lengthen
the stay of long-duration VFR tourists, feasible undertakings include offering all-inclusive
package discounts for groups, providing multi-day tickets of major attractions with little
additional charge, and giving free tickets to local residents when traveling with a group of
friends or relatives. Moreover, our results suggested that keeping a high level of satisfaction
is important. As highlighted in the latent class 2 estimates, those tourists obtaining
information from relatives and friends are more likely to stay longer, and the word-of-mouth
effect heavily relies on past visitors’ satisfaction (Aktaş, Çevirgen, & Toker, 2010). Therefore,
it is important to consistently improve service quality to guarantee a high level of satisfaction.
In this paper, we did not correct for possible sample selection (Barros & Machado, 2010) or
consider simultaneous decision making between LOS and other trip characteristics (Machado,
2010). Moreover, our research is based on data from domestic tourists in China, which cover
a relatively narrow range of LOS values and might be distinct from data from Western
tourists in terms of duration patterns. Therefore, we believe that future studies should
incorporate more sophisticated duration models with latent classes and apply this modeling
framework with other LOS datasets.
References
Aktaş, A., Çevirgen, A., & Toker, B. (2010). Tourists' satisfaction and behavioral intentions on destination
attributes: An empirical study in Alanya. Tourism Analysis, 15(2), 243-252.
Alegre, J., Mateo, S., & Pou, L. (2011). A latent class approach to tourists' length of stay. Tourism Management,
32(3), 555-563.
Alegre, J., & Pou, L. (2006). The length of stay in the demand for tourism. Tourism Management, 27(6),
1343-1355.
Alegre, J., & Pou, L. (2007). Microeconomic determinants of the duration of stay of tourists. In Á. Matias, P.
Nijkamp & P. Neto (Eds.), Advances in Modern Tourism Research (pp. 181-206). Heidelberg:
Physica-Verlag.
Allenby, G. M., & Rossi, P. E. (1998). Marketing models of consumer heterogeneity. Journal of Econometrics,
89(1–2), 57-78.
Archer, B. H., & Shea, S. (1975). Length of stay problems in tourist research. Journal of Travel Research, 13(3),
8-10.
Barros, C. P., Butler, R., & Correia, A. (2010). The length of stay of golf tourism: A survival analysis. Tourism
Management, 31(1), 13-21.
Barros, C. P., Correia, A., & Crouch, G. (2008). Determinants of the length of stay in Latin American tourism
destinations. Tourism Analysis, 13(4), 329-340.
Barros, C. P., & Machado, L. P. (2010). The length of stay in tourism. Annals of Tourism Research, 37(3),
692-706.
Bhatnagar, A., & Ghose, S. (2004). A latent class segmentation analysis of e-shoppers. Journal of Business
Research, 57(7), 758-767.
Clark, A., Etilé, F., Postel-Vinay, F., Senik, C., & Van der Straeten, K. (2005). Heterogeneity in reported
well-being: Evidence from twelve European countries. The Economic Journal, 115(502), C118-C132.
Cleves, M., Gould, W., & Gutierrez, R. (2003). An Introduction to Survival Analysis Using Stata (Revised ed.).
College Station, TX: Stata Press.
De Menezes, A. G., & Moniz, A. (2011). Determinants of length of stay: A parametric survival analysis. Tourism
Analysis, 16(5), 509-524.
De Menezes, A. G., Moniz, A., & Cabral Vieira, J. (2008). The determinants of length of stay of tourists in the
Azores. Tourism Economics, 14(1), 205-222.
Fleischer, A., & Pizam, A. (2002). Tourism constraints among Israeli seniors. Annals of Tourism Research, 29(1),
106-123.
Gokovali, U., Bahar, O., & Kozak, M. (2007). Determinants of length of stay: A practical use of survival
analysis. Tourism Management, 28(3), 736-746.
Gomes de Menezes, A., Moniz, A., & Cabral Vieira, J. (2008). The determinants of length of stay of tourists in
the Azores. Tourism Economics, 14(1), 205-222.
Greene, W. H., & Hensher, D. A. (2010). Modeling Ordered Choices: A Primer. Cambridge: Cambridge
University Press.
Hellström, J. (2006). A bivariate count data model for household tourism demand. Journal of Applied
Econometrics, 21(2), 213-226.
Hellström, J., & Nordström, J. (2008). A count data model with endogenous household specific censoring: The
number of nights to stay. Empirical Economics, 35(1), 179-192.
Hilbe, J. (2011). Negative Binomial Regression (2nd ed.). Cambridge; New York: Cambridge University Press.
Hsu, C. H. C., & Kang, S. K. (2007). CHAID-based segmentation: International visitors' trip characteristics and
perceptions. Journal of Travel Research, 46(2), 207-216.
Machado, L. P. (2010). Does destination image influence the length of stay in a tourism destination? Tourism
Economics, 16(2), 443-456.
Magidson, J., & Vermunt, J. K. (2004). Latent class models. In D. Kaplan (Ed.), The Sage handbook of
quantitative methodology for the social sciences (pp. 175-198). Thousand Oaks, CA: Sage
Publications.
Martínez-Garcia, E., & Raya, J. M. (2008). Length of stay for low-cost tourism. Tourism Management, 29(6),
1064-1075.
Mazanec, J. A., & Strasser, H. (2007). Perceptions-based analysis of tourism products and service providers.
Journal of Travel Research, 45(4), 387-401.
Salmasi, L., Celidoni, M., & Procidano, I. (2012). Length of stay: Price and income semi-elasticities at different
destinations in Italy. International Journal of Tourism Research, 14(6), 515-530.
Sung, H. H., Morrison, A. M., Hong, G.-S., & O’Leary, J. T. (2001). The effects of household and trip
characteristics on trip types: A consumer behavioural approach for segmenting the US domestic leisure
travel market. Journal of Hospitality and Tourism Research, 25(1), 46-68.
Swait, J., & Sweeney, J. C. (2000). Perceived value and its impact on choice behavior in a retail setting. Journal
of Retailing and Consumer Services, 7(2), 77-88.
Thrane, C. (2012). Analyzing tourists' length of stay at destinations with survival models: A constructive critique
based on a case study. Tourism Management, 33(1), 126-132.
Wedel, M., & DeSarbo, W. S. (1994). A review of recent developments in latent class regression models. In
R. E. Bagozzi (Ed.), Advanced Methods of Marketing Research (pp. 352-388). Cambridge, MA:
Blackwell.
Wu, L., Zhang, J., & Fujiwara, A. (2011). Representing tourists’ heterogeneous choices of destination and travel
party with an integrated latent class and nested logit model. Tourism Management, 32(6), 1407-1413.
Yang, Y., & Wong, K. (2012). A spatial econometric approach to model spillover effects in tourism flows.
Journal of Travel Research, 51(6), 768-778.
Yang, Y., Wong, K. K. F., & Zhang, J. (2011). Determinants of length of stay for domestic tourists: Case study of
Yixing. Asia Pacific Journal of Tourism Research, 16(6), 619-633.
Table 1. Description of Independent Variables
Categorical variables

| Variable | Description | Frequency | Percentage |
|---|---|---|---|
| age1 | age 15-24 | 3762 | 13.64% |
| age2 | age 25-44 | 17153 | 62.17% |
| age3 | age 45-64 | 6012 | 21.79% |
| age4 | age 65 and above | 663 | 2.40% |
| motivation1 | vacation motivation | 5422 | 19.65% |
| motivation2 | sightseeing motivation | 11677 | 42.32% |
| motivation3 | VFR motivation | 2863 | 10.38% |
| motivation4 | business motivation | 3231 | 11.71% |
| motivation5 | other motivations | 4397 | 15.94% |
| info1 | info from agency | 4572 | 16.57% |
| info2 | info from friends and relatives | 7339 | 26.60% |
| info3 | info from affiliations | 4928 | 17.86% |
| info4 | info from media | 4866 | 17.64% |
| info5 | other info sources | 5885 | 21.33% |
| samecity | indicator for tourists whose residence is in the destination city | 1134 | 4.11% |
| partner1 | with colleagues | 6705 | 24.30% |
| partner2 | with friends and relatives | 9240 | 33.49% |
| partner3 | with travel agency | 3604 | 13.06% |
| partner4 | alone | 8041 | 29.14% |

Continuous variables

| Variable | Description | Mean | S.D. |
|---|---|---|---|
| pastvisit | 1=no past visit; 2=1-2 past visits; 3=3-4 past visits; 4=5 and more past visits | 1.954 | 1.040 |
| distance | distance from residence (in 1,000 km) | 0.472 | 0.480 |
| education | education level: 1=college and above; 2=associate diploma; 3=senior high school/secondary vocational school; 4=junior high school; 5=elementary school and below | 4.061 | 0.877 |
| attraction | number of attractions visited | 2.241 | 1.552 |
| hnr | proportion of nights in hotel | 0.486 | 0.497 |

Observations: 27590
Table 2. Goodness-of-fit Measures of Different Models
| AFT-metric model | Number of latent classes | Log-likelihood | AIC | FSAIC | BIC | HQIC |
|---|---|---|---|---|---|---|
| Log-logistic | 1 | -12546.70 | 0.911 | 0.911 | 0.918 | 0.913 |
| Log-logistic | 2 | -10052.36 | 0.732 | 0.732 | 0.745 | 0.736 |
| Log-logistic | 3 | -11537.63 | 0.841 | 0.841 | 0.862 | 0.848 |
| Log-logistic | 4 | -11863.41 | 0.867 | 0.867 | 0.894 | 0.875 |
| Log-normal | 1 | -12704.30 | 0.923 | 0.923 | 0.929 | 0.925 |
| Log-normal | 2 | -10519.55 | 0.766 | 0.766 | 0.779 | 0.770 |
| Log-normal | 3 | -12248.79 | 0.893 | 0.893 | 0.913 | 0.899 |
| Log-normal | 4 | -12545.16 | 0.916 | 0.916 | 0.943 | 0.925 |
| Weibull | 1 | -15549.35 | 1.129 | 1.129 | 1.135 | 1.131 |
| Weibull | 2 | -12859.10 | 0.935 | 0.935 | 0.949 | 0.940 |
| Weibull | 3 | -12121.51 | 0.884 | 0.884 | 0.904 | 0.890 |
| Weibull | 4 | -12306.43 | 0.899 | 0.899 | 0.926 | 0.907 |
| Gamma | 1 | -12747.82 | 0.926 | 0.926 | 0.933 | 0.928 |
| Exponential | 1 | -29634.27 | 2.150 | 2.150 | 2.156 | 2.152 |
| General-F | 1 | -12512.71 | 0.909 | 0.909 | 0.916 | 0.911 |
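The information criteria in Table 2 appear to be reported per observation, that is, divided by the sample size. The following is a minimal sketch of that arithmetic for the one-class log-logistic row, not the authors' own code. The function name is hypothetical, and the parameter count k = 22 is an assumption obtained by counting the 20 covariates, the constant, and sigma listed in Table 3.

```python
import math


def per_observation_criteria(log_lik, k, n):
    """Return (AIC, BIC, HQIC), each divided by the sample size n."""
    deviance = -2.0 * log_lik
    aic = (deviance + 2 * k) / n
    bic = (deviance + k * math.log(n)) / n
    hqic = (deviance + 2 * k * math.log(math.log(n))) / n
    return aic, bic, hqic


N_OBS = 27590          # sample size reported in Table 1
K_PARAMS = 22          # assumption: 20 covariates + constant + sigma
LOG_LIK = -12546.70    # one-class log-logistic model, Table 2

aic, bic, hqic = per_observation_criteria(LOG_LIK, K_PARAMS, N_OBS)
print(round(aic, 3), round(bic, 3), round(hqic, 3))
```

Under these assumptions the sketch gives 0.911, 0.918, and 0.913, which agree with the first row of Table 2. Lower values indicate a better trade-off between fit and parsimony, which is why the two-class log-logistic model is the preferred specification.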
Table 3. Estimation Results of Duration Models
| Variable | Log-logistic model (latent class 1) | Log-logistic model (latent class 2) | Probit model of membership | Log-logistic model |
|---|---|---|---|---|
| age1 | -0.027 (0.023) | 0.006 (0.007) | 0.027 (0.061) | 0.005 (0.007) |
| age3 | -0.010 (0.018) | -0.016** (0.006) | 0.002 (0.053) | -0.013** (0.006) |
| age4 | -0.112*** (0.037) | 0.001 (0.016) | -0.488*** (0.122) | -0.032** (0.014) |
| motivation2 | 0.0464** (0.021) | -0.090*** (0.007) | -0.368*** (0.055) | -0.086*** (0.006) |
| motivation3 | -0.081*** (0.028) | 0.059*** (0.009) | 0.850*** (0.096) | 0.110*** (0.008) |
| motivation4 | 0.358*** (0.026) | -0.027*** (0.010) | 0.438*** (0.102) | 0.076*** (0.009) |
| motivation5 | 0.063** (0.029) | -0.027*** (0.009) | 0.716*** (0.089) | 0.035*** (0.008) |
| pastvisit | 0.003 (0.007) | 0.008*** (0.003) | -0.049** (0.022) | 0.011*** (0.002) |
| distance | 0.121*** (0.013) | 0.105*** (0.005) | 0.511*** (0.059) | 0.111*** (0.005) |
| education | 0.003 (0.008) | -0.009*** (0.003) | -0.053** (0.024) | -0.009*** (0.003) |
| attraction | 0.026*** (0.004) | 0.055*** (0.002) | 0.288*** (0.019) | 0.069*** (0.001) |
| info2 | 0.004 (0.026) | 0.032*** (0.010) | 0.042 (0.081) | 0.030*** (0.008) |
| info3 | 0.037 (0.027) | 0.049*** (0.011) | -0.453*** (0.096) | 0.004 (0.009) |
| info4 | 0.017 (0.027) | 0.009 (0.010) | -0.343*** (0.083) | -0.022** (0.009) |
| info5 | 0.013 (0.026) | 0.026*** (0.010) | -0.079 (0.083) | 0.018** (0.008) |
| hnr | 0.909*** (0.020) | -0.054*** (0.008) | 3.138*** (0.086) | 0.095*** (0.005) |
| samecity | 0.001 (0.031) | -0.074*** (0.016) | -1.471*** (0.078) | -0.305*** (0.010) |
| partner1 | -0.020 (0.022) | 0.014* (0.008) | 0.217*** (0.082) | 0.030*** (0.007) |
| partner2 | 0.001 (0.019) | -0.006 (0.006) | -0.459*** (0.054) | -0.032*** (0.006) |
| partner3 | -0.009 (0.027) | -0.075*** (0.011) | -0.557*** (0.084) | -0.091*** (0.009) |
| constant | -0.126*** (0.040) | 0.816*** (0.014) | 1.155*** (0.122) | 0.640*** (0.012) |
| sigma | 0.138*** (0.005) | 0.182*** (0.002) | | 0.214*** (0.001) |
| Sample size | 27590 | | 27590 | 27590 |
| AIC | 0.732 | | 0.538 | 0.911 |
| BIC | 0.745 | | 0.544 | 0.918 |
| Pseudo R-squared | | | 0.252 | |

(Notes: * indicates p<0.10, ** indicates p<0.05, *** indicates p<0.01. The significance of some auxiliary parameters is
tested in logarithm. Standard errors, in parentheses, are estimated by the Huber/White/sandwich estimator of the variance.
The sample size, AIC, and BIC shown under latent class 1 refer to the two-class log-logistic model as a whole.)
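Because the duration models are estimated in the accelerated failure time (AFT) metric, a coefficient can be read as a proportional shift in the time scale of the stay: exp(beta) is the time ratio for a one-unit increase in the covariate. The sketch below illustrates this reading for three coefficients from the single-class log-logistic column of Table 3. It assumes the usual AFT form log(T) = x'beta + sigma * error; the function name is hypothetical and the code is an illustration, not part of the original analysis.

```python
import math


def percent_change_in_duration(beta):
    """Percent change in expected duration scale for a one-unit increase,
    assuming an AFT specification log(T) = x'beta + sigma * error."""
    return (math.exp(beta) - 1.0) * 100.0


# Coefficients taken from the single-class log-logistic column of Table 3.
coefficients = {
    "distance": 0.111,     # per 1,000 km from residence
    "attraction": 0.069,   # per additional attraction visited
    "samecity": -0.305,    # resident of the destination city
}

for name, beta in coefficients.items():
    print(f"{name}: {percent_change_in_duration(beta):+.1f}%")
```

Under this reading, each additional 1,000 km of travel distance is associated with a stay roughly 12% longer, while residents of the destination city stay roughly 26% shorter than otherwise comparable tourists.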
Figure 1. Histogram of length of stay
[Figure omitted: histogram; x-axis: Length of Stay (0 to 20); y-axis: Percent (0 to 40).]
Figure 2. Histogram of length of stay in different latent classes
[Figure omitted: two histogram panels, LOS (latent class 1) and LOS (latent class 2); x-axis: LOS (0 to 20); y-axis: Density (0 to 1).]