
Decoding Retail Location: A Primer for the Age of Big Data and Social Media


Vassilis Zachariadis, Camilo Vargas-Ruiz, Joan Serras, Peter Ferguson and
Michael Batty
The rise of social media and the open government movement are revolu-
tionising data-driven research in social sciences. In this paper we use pas-
sively collected datasets of human mobility and public records of eco-
nomic activity to develop a shopping location choice model that considers
internal and external economies of scale at the outlet level. Our findings
highlight the impact of scale and agglomeration on shopping location
choices and form a platform for further research.
V. Zachariadis (Corresponding author) • C. Vargas-Ruiz, J. Serras, P. Fer-
guson, M. Batty
Centre for Advanced Spatial Analysis (CASA), University College Lon-
don (UCL), London W1N 6TR, UK
CUPUM 2015
1 Introduction
From Huff (1966) to Teller and Reutterer (2008), modelling retail location choice
processes has been one of the pillars of urban modelling. Recent work (Leszczyc
et al, 2004; Reimers and Clulow, 2004) has shed empirical light on the pricing strategy
and location preferences of retail activity. However, retail location and consumer
location choice theory continue to draw heavily from fundamental ideas from
von Thunen (1966) and Hotelling (1931), from central place theory (Christaller,
1966; Dennis et al, 2002) and rent-bid theory (Alonso 1960).
Admittedly, since the late 90s, advances in economic geography (Krugman,
1990 and 1998) have reinvigorated the field and reorganised it, by considering
economies of scale, cross-dependencies of markets (e.g. labour, retail, housing)
and forms of imperfect competition between firms (Dixit and Stiglitz, 1977).
However, apart from notable exceptions (for example, Anderson et al, 1992;
Suarez et al, 2004), fresh approaches have been slow in informing state-of-the-
art modelling and engaging with mainstream location choice modelling based on
discrete choice and random utility theory (Williams, 1977).
Here we present a model of consumer location choice, based on random util-
ity theory and designed to capture internal and external economies of scale at
the individual retailer level. We discuss its theoretical foundations, propose an
implementation strategy, calibrate and validate it using unconventional data sources
that have recently become available (e.g. detailed economic activity records and digital
social media footprints), and review its outputs.
2 Datasets
Before we go on to present the proposed consumer location choice model, in
this section we present the main datasets used for its calibration and validation.
These are a combination of formal proprietary datasets of travel behaviour and
economic activity, and passively collected data sources of digital social media.
2.1 London Travel Demand Survey (LTDS)
LTDS is a continuous household survey of the London area, covering the Lon-
don boroughs as well as the area outside Greater London but within the M25
motorway. Results in this report relate to residents of the Greater London area,
comprising the 32 London boroughs and the City of London. The first year of
results covered the financial year 2005/06, meaning that there are now eight
years of data available.
The survey is a successor to the household survey component of the London
Area Transport Survey (LATS), which was last carried out in 2001. The LTDS
annual sample size is around 8,000 households in a typical year, a total of about
65,000 households for the 2005-2013 period.
LTDS captures information on households, people, trips and vehicles. All
members of the household are surveyed, with complete trip detail for a single
day recorded for all household members aged 5 and over. Three questionnaires
are used: a household questionnaire, individual questionnaires for all household
members, and trip sheets or travel diaries. The latter capture data on all trips
made on a designated travel day, the same day for all members of the household.
Details captured include trip purposes, modes used, trip start and end times,
and the locations of trip origins and destinations.
2.2 Valuation Office Business Rates
A large number of studies provide evidence of a high correlation between
flow volumes of shopping pedestrians and turnover of neighbouring retail outlets
(Chiradia et al 2009, Timmermans and Waerden 1992, Thomas and Bromley
2003, Thornton et al 1991 etc.). As a result, the literature suggests that, in the
absence of data, pedestrian flows are frequently used to estimate retail turnover
and vice versa (Borgers et al 2008, Winrich 2008).
Since the recent online publication of the Valuation Office Agency (VOA) Business
Rates for 2005 and 2010, the business rates of all business premises in England
and Wales have become available to the public, offering a dataset that is unique in
depth, extent and geographic precision. The VOA compiles and maintains lists of rateable
values of the 1.7 million non-domestic properties in England, and the 100,000 in
Wales, to support the collection of around £25 billion in business rates.
The rateable value represents the agency's estimate of the open market annual
rental value of a business/non-domestic property; i.e. the rent the property
would let for on the valuation date, if it were being offered on the open market.
The rateable value is estimated from the varying rents in the vicinity of
a property and set to represent a reasonable level of open market rental
value, taking into account the expected turnover of the premises, the size, age
and condition of the property, the length of the frontage, the depth and vertical
layout, the visibility, and the footfall and pedestrian flow volumes on surrounding
streets. Because the rateable value of a property is used to determine the
non-domestic property tax (business rate), and because the evaluators have access to
detailed information (such as contracts and revenue documents) and to comprehensive
evaluation documentation and methodology, rateable value is considered
a very good indicator of the property value of a hereditament.
The agency publishes a detailed set of information for each property; this
includes the classification of its main use (detailed breakdown into more than
100 classes), the full address and postcode, the total area of the premises, the
total rateable value and breakdown into zones with different rateable value per
square metre, and the weighted average rateable value per square metre. This
makes it possible to create a detailed map of rateable value for any use.
2.3 Social media spatiotemporal profiles (Twitter and Foursquare)
We use two passively generated datasets. One contains 25 million geo-located
tweets collected over a period of 7 months (10/2013 to 05/2014), and covering an
area that extends beyond London and covers most of the Greater South East of
England. Each record contains the tweet coordinates, timestamp, text, language,
tags etc. For this paper we use only location and time.
A second one contains all Foursquare venues within the M25 motorway (300
thousand venues). Each record contains location coordinates, number of check-ins
and unique visitors since the venue was registered, and detailed venue category
(activity type). The Foursquare venue data was collected in December 2014.
3 Methodology
In this section we present the shopping location choice model and the steps of
the calibration process.
3.1 Shopping location model
The objective is to compose a causal model of actors (consumers and retailers)
that explains the spatial distribution of retail activity. We assume that con-
sumers choose shopping locations that maximise their utility and that retailers
are willing to pay floorspace rent in line with the consumption that they expect
to attract (eq. 1).
$$c_{r,j} = a_r \times x_{r,j} + b_r \tag{1}$$
where $c_{r,j}$ is the floorspace rent (£/m²) that retailer $r$ is willing to pay in location
$j$, $x_{r,j}$ is expected consumption per m² of floorspace and $a_r$, $b_r$ are constants
particular to retailer $r$. In the case of the baseline model of this paper, we assume
$b_r$ is zero for all retailers. This means that retailer $r$ is willing to pay ad valorem
rents $a_r$, which reflect gross profit margins etc. To simplify, for this paper we
also set $a_r = a$, fixed for all retailers.
We assume that the net utility of consumer $q$ in location $i$ buying product
type $f$ from retailer $r$ in $j$ is equal to:
$$u_{f,q,i,r,j} = u_f - p_{f,r} - c_{f,i,j} \tag{2}$$
where $u_f$ is the utility of acquiring $f$, $p_{f,r}$ is the price at which retailer $r$ in $j$
sells $f$, and $c_{f,i,j}$ is the cost of transporting $f$ from $i$ to $j$. For simplification
we assume a sole homogeneous product type $f$. Therefore, the transportation
costs $c_{f,i,j}$ per unit of product are equal for all varieties of product $f$; i.e. equal
regardless of the retailer of choice.
Despite the homogeneity assumption, we follow Fujita et al (2001, see also Fu-
jita and Thisse, 2013) in assuming monopolistic competition (Dixit and Stiglitz,
1977) between retailers. This means that each retailer offers a unique variety of
products; i.e. each retail unit brings to the market a variety of products suffi-
ciently differentiated to represent a unique blend of product type f. As a result,
retailers hold the power to set their selling prices. Moreover, the number of re-
tailers is assumed to be sufficiently high and the entry/exit costs to the market
comparatively low, and therefore, profits to be zero and strategic decision-making
Taking into account the spatial dimension of the problem, the source of product
variation between retailers is a combination of differences in the actual product
on offer and in the location of the point of sale (the shop) (Greenhut et al, 1987).
Therefore, from eq. 2 we have:
$$u_{f,q,i,r,j} = (u_f + w_{q,f,r}) - p_{f,r} - c_{f,i,j} \tag{3}$$
where $w_{q,f,r}$ is a random element of utility, reflecting fit between the variation
of $f$ offered by retailer $r$ and the particular preferences of consumer $q$ in terms
of spatial location and product variety. If random utility $w_{q,f,r}$ is Gumbel i.i.d.
for all retailers then following Train (2003) the probability of consumer $q$ in $i$
choosing to shop from retailer $r$ in $j$ is:
$$P_{f,q,i,r,j} = \frac{\exp\left(\beta_q \times (u_f - p_{f,r} - c_{f,i,j})\right)}{\sum_{[r',j'] \in [R,J]} \exp\left(\beta_q \times (u_f - p_{f,r'} - c_{f,i,j'})\right)} \tag{4}$$
where $\beta_q$ is the inverse standard deviation of the Gumbel distribution of $w_{q,f,r}$. If
$\beta_q = \beta$ for all consumers and the asking price of product type $f$ is equal for all
retailers in location $j$, eq. 4 is simplified into:
$$P_{i,j} = \frac{\exp\left(-\beta \times (p_j + c_{i,j})\right)}{\sum_{j' \in J} \exp\left(-\beta \times (p_{j'} + c_{i,j'})\right)} \tag{5}$$
In this simplified case, probability $P_{i,j}$ is only a function of locations $i$ and $j$.
The model of eq. 5 looks similar to a multinomial logit model (McFadden 1980,
2001); however, the source of stochasticity is not modelling uncertainty but taste
variation. As such probability $P_{f,q,i,r,j}$ does not reflect the estimated likelihood of
$[q, i]$ choosing option $[r, j]$, but rather the share of $[q, i]$ decisions directed towards
(matching) $[r, j]$.
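The simplified shares of eq. 5 can be sketched numerically as follows. This is a minimal illustration, assuming that utility falls with both price and travel cost (so the exponent carries a negative sign); the function name `logit_shares` and the toy prices and costs are hypothetical and are not taken from the paper.

```python
import numpy as np

def logit_shares(prices, costs, beta):
    """Eq. 5 style shares: P[i, j] proportional to exp(-beta * (p_j + c_ij))."""
    prices = np.asarray(prices, dtype=float)       # p_j, shape (J,)
    costs = np.asarray(costs, dtype=float)         # c_ij, shape (I, J)
    v = -beta * (prices[None, :] + costs)          # systematic utility per origin/location
    v = v - v.max(axis=1, keepdims=True)           # shift for numerical stability
    w = np.exp(v)
    return w / w.sum(axis=1, keepdims=True)        # each row sums to 1

# Hypothetical toy data: 2 origins, 3 shopping locations, equal prices
prices = [1.0, 1.0, 1.0]
costs = [[0.5, 1.0, 2.0],
         [2.0, 1.0, 0.5]]
shares = logit_shares(prices, costs, beta=1.2)
print(shares.round(3))
```

Because prices are equal across locations here, they cancel out and the shares depend only on travel costs, which is the situation the later equations assume.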
The utility function of eq. 3 implies that each retailer $r$ offers one, and only
one, variation of product type $f$. Therefore, each consumer $q$ associates only one
random utility component $w_{q,f,r}$ per retailer, reflecting the fit between this sole
product variation and the preferences of the consumer. This assumption is quite
unrealistic; typically, a retailer will stock a variety of products of the same type.
It is reasonable to assume that larger shops will stock larger varieties. Therefore,
variety can be expressed as a function of floorspace (eq. 6):
$$u_{f,q,i,r,j} = \max_{f_v} \left[ (u_{f_v} + w_{q,f_v,r}) - p_{f_v,r} - c_{f_v,i,j} \right] \tag{6}$$
which means that the utility of consumer $q$ shopping from retailer $r$ is equal to
the utility of buying the specific product variety $f_v$ offered by $r$ that maximises
the consumer's utility. Following Daly and Zachary (1976) and Ben-Akiva and
Lerman (1985), and assuming $u_{f_v} = u_f$, $p_{f_v,r} = p_{f,r}$, $c_{f_v,i,j} = c_{f,i,j}$ and $p_r = p$,
the probability of consumer $q$ shopping from retailer $r$ is equal to:
$$P_{i,r,j} = \frac{s_r^{\alpha} \exp(-\beta \times c_{i,j})}{\sum_{[r',j'] \in [R,J]} s_{r'}^{\alpha} \exp(-\beta \times c_{i,j'})} \tag{7}$$
where $s_r$ is the floorspace of $r$ and $\alpha$ is the level of correlation between the stochastic
components of the product variations $f_v$. Macroscopically, eqs. 6 and 7 suggest
that the utility that consumer $q$ gets from buying from retailer $r$ is either a
sub-linear ($\alpha < 1$), linear ($\alpha = 1$), or super-linear ($\alpha > 1$) function of the floorspace
of retail unit $r$. The respective consumer utilities are translated into either internal
dis-economies of scale (sub-linear utility function) or internal economies of
scale (super-linear utility function). Indeed, in the former case if a shop doubles
its size it will attract less than double its original consumption, in the latter case
it will attract more than double its original consumption (eq. 6).
These internal (dis)economies of scale emerge from the application of a trans-
parent utility-based approach and reflect perceived opportunities at the level of
individual retailer. The value of $\alpha$ indicates how quickly product variation grows
with floorspace: if $\alpha < 1$, the variation of stocked products (as perceived by the
consumer) builds up more slowly than floorspace; if $\alpha > 1$, perceived variation
builds up faster than floorspace (e.g. when a shop doubles in
size it stocks more than double the number of the original varieties).
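As a short worked illustration of the doubling argument, consider only the attractiveness term $s_r^{\alpha}$ of eq. 7 and hold the competing shops (the denominator) fixed. The three values of $\alpha$ below are chosen purely for illustration.

```latex
\[
\frac{(2 s_r)^{\alpha}}{s_r^{\alpha}} = 2^{\alpha},
\qquad
2^{0.75} \approx 1.68, \quad 2^{1.00} = 2, \quad 2^{1.25} \approx 2.38 .
\]
```

A shop that doubles its floorspace therefore raises its attractiveness term by about 68% when $\alpha = 0.75$ and by about 138% when $\alpha = 1.25$; the change in its actual share is somewhat smaller, because the shop's own term also appears in the denominator.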
Eq. 7 captures the potential impact of size at the individual shop level (in-
ternal economies of scale). As such it is sufficient in describing flow patterns
and activity distribution associated with shop-size variations. It fails, however,
to account for the concentration of retail activity in clusters (markets/shopping
centres). In other words, eq. 7 cannot explain agglomeration effects in the spatial
distribution of retail activity. In order to introduce the appropriate mechanisms,
we extend our behavioural approach accordingly.
Let us assume consumer $q$ evaluating the prospect of shopping a variation
of product type $f$ from retailer $r$. We assume that the consumer knows $r$'s
floorspace $s_r$ in advance. As such $q$ can estimate the expected utility of $r$ and
decide the probability of visiting $r$ using eq. 7. However we assume that $q$ is only
able to determine the stochastic part of utility $w_{q,f_v,r}$ for each of the product
variations offered by $r$ upon arrival. In order to minimise the risk of unfavourable
matches between $r$'s product variations and personal preferences, $q$ evaluates
favourably retailers that are close to other retailers. Therefore, when considering
the expected utility of buying product $f$ from retailer $r$, the consumer also takes
into account the expected utility of all other retailers in the vicinity of $r$. This
means that the amount of opportunity associated with $r$ is equal to:
$$S_r = s_r^{\alpha} + \sum_{r' \neq r,\; d_{r,r'} < d} s_{r'}^{\alpha} \exp(-\gamma \times d_{r,r'}) \tag{8}$$
where $d_{r,r'}$ is some form of generalised distance between $r$ and some other
retail unit $r'$ within distance $d$ from $r$; $s_{r'}$ is the floorspace of $r'$ and $\gamma$ is a
calibration parameter associated with the type of generalised distance used. Since
$\exp(-\gamma \times d_{r,r}) = 1$, eq. 8 can be simplified by taking $s_r^{\alpha}$ inside the sum and
widening the summation condition, so $r'$ can be equal to $r$. Eq. 8 states that the
composite perceived utility $S_r$ that a consumer attaches to a particular retailer $r$
is equal to its individual utility $s_r^{\alpha}$ plus the utility of reaching, from $r$, the shops
in its vicinity. Substituting eq. 8 into eq. 7 we get:
$$P_{i,r,j} = \frac{\exp(-\beta \times c_{i,j}) \sum_{r':\, d_{r,r'} < d} s_{r'}^{\alpha} \exp(-\gamma \times d_{r,r'})}{\sum_{[r'',j''] \in [R,J]} \exp(-\beta \times c_{i,j''}) \sum_{r':\, d_{r'',r'} < d} s_{r'}^{\alpha} \exp(-\gamma \times d_{r'',r'})} \tag{9}$$
In the resulting location choice model, $\alpha$ controls the extent of the internal
economies and $\gamma$ that of the external economies of scale. If $\alpha > 1$, the relationship
between consumer perceived utility of a shop and its size is super-linear and
the economies of scale are positive. Similarly, if $\gamma \to +\infty$, the utility of visiting
retailer $r$ is only a function of its own utility $s_r^{\alpha}$, while if $\gamma \to 0^{+}$, the utility
of visiting $r$ is shaped equally by all shops $r'$ in the vicinity of $r$. Note that
when $\gamma \to +\infty$ and $\alpha = 1$, we get the simple multinomial logit model. It is
easy to show that the model of eq. 9 is a particular case of the Cross-Nested
Logit model (CNL) (Wen and Koppelman, 2001; Bierlaire, 2006), in which each
retailer $r$ represents a nest, the extent of overlap between nests is controlled by
$\exp(-\gamma \times d_{r,r'})$, and there is no correlation between the random elements of utility
of individual retailers. On the one hand, this observation confirms the proposed
behavioural explanation of the agglomeration effect (consumers hedging the probability
of finding a variety that meets their profile). On the other hand, it offers
an alternative explanation; one where consumers prefer destinations with high
spatial concentration of retailers with uncorrelated product varieties, in order to
bundle together heterogeneous shopping activities (e.g. Oppewal and Holyoake,
2004). While this is also an attractive interpretation of observed agglomeration
effects, it fails to account for concentration of retailers offering similar prod-
uct varieties - a frequently observed phenomenon. In any case, identification
of proper interpretation of the underlying behavioural processes is outside the
scope of this paper and subject to further work following the development of the
model and access to appropriate data sources.
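The composite utility of eq. 8 and the choice probabilities of eq. 9 can be sketched as below. This is an illustrative reading rather than the authors' implementation: it assumes negative exponents on the distance and cost terms, treats `travel_cost[i, r]` as the cost from origin $i$ to the location of shop $r$, and uses hypothetical function names and toy distances.

```python
import numpy as np

def composite_attractiveness(floorspace, shop_dist, alpha, gamma, d_max):
    """Eq. 8 style S_r: sum over shops r' within d_max of s_r'^alpha * exp(-gamma * d(r, r'))."""
    s = np.asarray(floorspace, dtype=float) ** alpha       # individual utilities s_r^alpha
    d = np.asarray(shop_dist, dtype=float)                 # shop-to-shop distances, zero diagonal
    weights = np.where(d < d_max, np.exp(-gamma * d), 0.0)
    return weights @ s                                     # r' = r is included since d(r, r) = 0

def choice_probabilities(travel_cost, floorspace, shop_dist, alpha, beta, gamma, d_max):
    """Eq. 9 style P[i, r], proportional to exp(-beta * c_ir) * S_r."""
    S = composite_attractiveness(floorspace, shop_dist, alpha, gamma, d_max)
    w = np.exp(-beta * np.asarray(travel_cost, dtype=float)) * S[None, :]
    return w / w.sum(axis=1, keepdims=True)

# Hypothetical toy case: shops 0 and 1 are 50 m apart, shop 2 stands alone
floorspace = [100.0, 100.0, 100.0]
shop_dist = [[0.0, 50.0, 2000.0],
             [50.0, 0.0, 1950.0],
             [2000.0, 1950.0, 0.0]]
travel_cost = [[1.0, 1.0, 1.0]]                            # one origin, equal cost to every shop

isolated = choice_probabilities(travel_cost, floorspace, shop_dist,
                                alpha=1.0, beta=1.2, gamma=50.0, d_max=500.0)
clustered = choice_probabilities(travel_cost, floorspace, shop_dist,
                                 alpha=1.0, beta=1.2, gamma=0.006, d_max=500.0)
print(isolated.round(3), clustered.round(3))
```

With a very large $\gamma$ the neighbours contribute nothing and the three identical shops split demand equally; with a small $\gamma$ the two adjacent shops each gain share at the expense of the isolated one.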
3.2 Implementation Strategy
The aim of this modelling process is to explore how the sizes of competing
shopping destinations affect the locations consumers decide to shop from. In
section 3.1 we formulated a location choice model for consumers using random
utility theory. The proposed cross-nested logit model has been
designed to capture internal and external economies of scale. In this particular
context, economies of scale are defined as the consumers' super-linear utility
returns with retail size; i.e. the consumers' preference to shop at larger shops
(internal economies of scale) and at locations with higher concentration of retail
activity (external economies of scale).
Using the model in its general form (eq. 9) we can estimate the turnover of
a particular retailer $r$ as the sum of sales to all consumers:
$$Y_r = \sum_{n,i} \left( X_{n,i} \times P_{n,i,r,j} \right) \tag{10}$$
where $X_{n,i}$ is the disposable retail budget for a consumer of type $n$ based in
location $i$.
Based on the assumption of monopolistic competition and low entry/exit
costs to the market, we expect the ratio of turnover to floorspace rent ($\frac{Y_r}{R_r}$)
to be constant for all retail units $r$ of type $s$. This means that we consider
long-term selling prices and running/labour costs per square metre of floorspace to be
equal across space (a reasonable simplification for an intra-urban retail market).
To simplify we have assumed that all retailers are of a sole type $s$. Moreover,
we consider only two consumer types (consumers shopping from (i) home and
(ii) work) and set a fixed disposable retail budget $X_{n,i} = X$ for all consumer
types $n$ and origins $i$.
Following these simplifications, it is possible to evaluate the level of correlation
between estimated turnover $Y_r$ of retailer $r$ (from equations 9 and 10) and
some sort of proxy rent $R_r$ for different values of $\alpha$ and $\gamma$; and to evaluate the
respective internal and external economies of scale.
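The evaluation described above can be sketched as a small grid search over $(\alpha, \gamma)$. Everything here is illustrative: the data are randomly generated, the function names are hypothetical, a plain (unweighted) Pearson correlation stands in for whatever weighting the analysis uses, and a single consumer type with a uniform budget is assumed.

```python
import numpy as np

def modelled_turnover(budget, travel_cost, floorspace, shop_dist,
                      alpha, beta, gamma, d_max):
    """Eq. 10 style turnover Y_r = sum_i X_i * P[i, r], with P from an eq. 9 style model."""
    s = np.asarray(floorspace, dtype=float) ** alpha
    d = np.asarray(shop_dist, dtype=float)
    S = np.where(d < d_max, np.exp(-gamma * d), 0.0) @ s   # composite utility per shop
    w = np.exp(-beta * np.asarray(travel_cost, dtype=float)) * S[None, :]
    P = w / w.sum(axis=1, keepdims=True)
    return np.asarray(budget, dtype=float) @ P

def best_parameters(budget, travel_cost, floorspace, shop_dist, rent_per_sqm,
                    alphas, gammas, beta, d_max):
    """Return (correlation, alpha, gamma) maximising corr(turnover per m2, rent per m2)."""
    floorspace = np.asarray(floorspace, dtype=float)
    best = None
    for alpha in alphas:
        for gamma in gammas:
            Y = modelled_turnover(budget, travel_cost, floorspace, shop_dist,
                                  alpha, beta, gamma, d_max)
            corr = np.corrcoef(Y / floorspace, rent_per_sqm)[0, 1]
            if best is None or corr > best[0]:
                best = (corr, alpha, gamma)
    return best

# Hypothetical toy data: 8 origins and 6 shops scattered over a 5 km square
rng = np.random.default_rng(0)
origins = rng.uniform(0, 5000, size=(8, 2))
shops = rng.uniform(0, 5000, size=(6, 2))
floorspace = rng.uniform(50, 500, size=6)                  # m2
rent_per_sqm = rng.uniform(100, 1000, size=6)              # stand-in for rateable value per m2
budget = np.ones(8)                                        # uniform budget X for every origin
travel_cost = np.linalg.norm(origins[:, None] - shops[None, :], axis=2) / 1000.0  # km
shop_dist = np.linalg.norm(shops[:, None] - shops[None, :], axis=2)               # m

Y = modelled_turnover(budget, travel_cost, floorspace, shop_dist,
                      alpha=1.25, beta=1.2, gamma=0.006, d_max=500.0)
best = best_parameters(budget, travel_cost, floorspace, shop_dist, rent_per_sqm,
                       alphas=[0.75, 1.0, 1.25], gammas=[0.006, 0.162],
                       beta=1.2, d_max=500.0)
print(best)
```

Since every origin's choice probabilities sum to one, total modelled turnover equals total budget; only its distribution across shops changes with $\alpha$ and $\gamma$.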
4 Results
As we mentioned in section 3.2, calibration of the model is a two-step process. In
the first instance we use the London Travel Demand Survey to generate distance
profiles for shopping trips. Once these are established we generate location choice
patterns for a spectrum of internal and external economies of scale and for each
combination we correlate the total number of trips to each shop with its rateable value.
4.1 Distance profiles
For this paper we focus on two types of trips: shopping trips from home and work.
The LTDS database contains 5004 trips between home and shopping and 2242
trips between work and shopping. These numbers are not sufficient to calibrate
the location choice model directly. In fact the LTDS supplementary report (TfL
2011) suggests that the sample size is sufficient only for an Inner/Outer London
spatial classification. To address this, the survey is only used to calibrate distance
profiles; i.e. the probability distribution of distance travelled for shopping from
home and work.
Fig. 1. Correlation between modelled turnover/sq.m. and observed (VOA Rateable
Value) floorspace Rent/sq.m.
The calibration of the distance profiles involves three steps: (i) for each $[\alpha, \gamma]$
combination, calculate the attractiveness of every shop; (ii) for each trip origin
(home/work locations) calculate distances to every shop; (iii) for each $[\beta, \lambda]$
combination generate the cumulative trip distance profile for home and work
based trips using eq. 11.
$$P_{[type]}(x) = \frac{1}{\sum_i X_i} \sum_i X_i \, \frac{\sum_{r:\, \mathrm{dist}(i,r) \le x} s_r \times \exp(-\beta \times f(i, r))}{\sum_{r} s_r \times \exp(-\beta \times f(i, r))} \tag{11}$$
where $s_r$ is the attractiveness of shop $r$, $X_i$ is the demand for shopping in
location $i$ of type $[type]$ (from home or from work) and $f(i, r) = \frac{x^{\lambda} - 1}{\lambda}$ is the
cost function for origin-destination pair $[i, r]$. Eq. 11 states that the probability
of shopping within distance $x$ from origin $i$ is equal to the weighted sum of
probabilities of shopping at any shop that is within distance $x$ from $i$. For each
$[\alpha, \gamma]$ combination of the cumulative distribution of eq. 11 we calculate the $[\beta, \lambda]$
values that maximise the likelihood of the observed LTDS trips. This process
yields the following $(\beta, \lambda)$ values for home-based and work-based
trips respectively: (1.20, 0.42) and (1.34, 0.20).
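A cumulative distance profile in the spirit of eq. 11 can be sketched as follows. This is a minimal sketch under stated assumptions: the exponent is taken as negative, the cost function is the Box-Cox form $(x^{\lambda} - 1)/\lambda$ applied to the origin-to-shop distance, and the toy distances, demands and function names are hypothetical. The maximum-likelihood search for $(\beta, \lambda)$ against LTDS trips is not shown.

```python
import numpy as np

def box_cox_cost(dist, lam):
    """Cost f = (x**lam - 1) / lam for distance x (lam must be non-zero)."""
    return (np.asarray(dist, dtype=float) ** lam - 1.0) / lam

def cumulative_distance_profile(x_grid, demand, dist, attractiveness, beta, lam):
    """Demand-weighted share of shopping trips that end within each distance in x_grid."""
    demand = np.asarray(demand, dtype=float)               # X_i, shape (I,)
    dist = np.asarray(dist, dtype=float)                   # origin-to-shop distances, shape (I, R)
    s = np.asarray(attractiveness, dtype=float)            # shop attractiveness, shape (R,)
    w = s[None, :] * np.exp(-beta * box_cox_cost(dist, lam))
    p = w / w.sum(axis=1, keepdims=True)                   # P(shop r | origin i)
    profile = []
    for x in x_grid:
        within = (p * (dist <= x)).sum(axis=1)             # P(trip length <= x | origin i)
        profile.append((demand * within).sum() / demand.sum())
    return np.array(profile)

# Hypothetical toy data: 2 origins, 3 shops, distances in km
dist = [[0.5, 1.0, 3.0],
        [2.0, 1.0, 0.5]]
profile = cumulative_distance_profile(x_grid=[0.1, 1.0, 5.0], demand=[1.0, 2.0],
                                      dist=dist, attractiveness=[1.0, 1.0, 1.0],
                                      beta=1.20, lam=0.42)
print(profile.round(3))
```

The example plugs in the home-based $(\beta, \lambda)$ pair quoted in the text purely to show the mechanics; the resulting profile says nothing about London because the distances are invented.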
Fig. 2. Correlation between modelled turnover and approximated number of visits
(Foursquare check-ins in immediate vicinity).
4.2 Generation of economies of scale
Having completed the calibration of the distance profiles we calculate modelled
turnover estimates for each retailer $r$ for a set of $(\alpha, \gamma)$ parameters. Following
this, for each $(\alpha, \gamma)$ combination, we calculate the correlation level between the
modelled turnovers and the observed floorspace rents. For each retailer $r$, we use
the VOA rateable value as an indicator for willingness to pay for floorspace ($R_r$).
As we mentioned earlier, rateable value represents the Valuation Office Agency's
estimate of the open market annual rental value of a business/non-domestic
property; i.e. the rent the property would let for on the valuation date, if it
were being offered on the open market; and as such, it is considered a very good
indicator of the property value of the respective hereditament. Figure 1 shows
the correlation between modelled $Y_r$ and observed $R_r$ for different pairs of values $\alpha$
and $\gamma$.
Pair ($\alpha = 1.00$, $\gamma = 486/3 \times 10^{-3}$) is the combination closest to the simple logit
model ($\alpha = 1$, $\gamma = +\infty$); in this case the weighted correlation between floorspace
rents and revenues is 0.275. On the other hand, correlation is maximum for
the pair ($\alpha = 1.25$, $\gamma = 18/3 \times 10^{-3}$). In this case both internal and external
economies of scale are manifested. A value of $\alpha$ over 1.00 means that there is a
super-linear relationship between the floorspace area of a retail unit and its perceived
utility. Similarly, a value of $\gamma \ll \infty$ means that consumers perceive the
composite utility of shopping in a shop as a combination of its individual utility
and the utility of shopping in shops in its vicinity. As expected this generates
strong agglomeration effects that are translated into retail activity clustering
and reflected in higher floorspace rents in locations of higher concentration of
retail activity.
Fig. 3. Correlation between estimated local clustering level and observed (VOA Rate-
able Value) floorspace Rent/sq.m.
Fig. 4. Log-ratio of modelled turnover to observed floorspace rent for values pair
($\alpha = 1.00$, $\gamma = 486/3 \times 10^{-3}$).
In order to validate the findings of the fitting process presented in figure 1
we use the Foursquare check-ins dataset described in the datasets section as a
benchmark. For each retailer r, we sum the number of check-ins in its immediate
vicinity (50-100 metres) and correlate modelled turnover (calculated using the
process we describe in the methodology section) against the number of check-ins.
The results are presented in figure 2. The correlation level of the basic logit model
(with no internal or external economies of scale) is quite low at 0.12. On the
other hand, maximum correlation of 0.43 is obtained for the exact same pair of
economies-of-scale parameters (α, γ) that maximised the correlations in figure 1.
This means that the maximum correlation between modelled turnover/sq.m. and
VOA floorspace rent corresponds to the same (α, γ) parameters that maximise
correlation between modelled turnover and Foursquare check-ins. This is quite
reassuring, especially if we consider that modelled turnover represents number
of consumer visits and Foursquare check-ins represent human presence passively
collected via social media activity. Therefore, we end up suggesting that the
modelled levels of internal and external economies of scale, as perceived by the
consumer, that offer the best fit to observed floorspace rents (VOA) and estimated
human presence (Foursquare) are, as should be the case, identical and
equal to ($\alpha = 1.25$, $\gamma = 18/3 \times 10^{-3}$). This means that internal economies of
scale are present and strongly super-linear (preference towards larger shops)
and external economies of scale are also strong ($\gamma$ is small and therefore the
perceived composite utility of a shop is largely determined by other shops in its
vicinity).
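The check-in benchmark described above can be sketched as follows: sum the check-ins of venues within a fixed radius of each retailer, then correlate the totals with modelled turnover. The coordinates, check-in counts and turnover values are invented for illustration, the function name is hypothetical, and projected coordinates in metres with straight-line distances are assumed.

```python
import numpy as np

def checkins_near(shop_xy, venue_xy, venue_checkins, radius):
    """Total check-ins of all venues within `radius` metres of each shop."""
    shop_xy = np.asarray(shop_xy, dtype=float)             # shape (R, 2), metres
    venue_xy = np.asarray(venue_xy, dtype=float)           # shape (V, 2), metres
    d = np.linalg.norm(shop_xy[:, None, :] - venue_xy[None, :, :], axis=2)
    return (d <= radius).astype(float) @ np.asarray(venue_checkins, dtype=float)

# Hypothetical toy data: 3 shops and 4 venues along a street
shop_xy = [(0, 0), (1000, 0), (5000, 50)]
venue_xy = [(10, 0), (60, 0), (990, 0), (5000, 0)]
venue_checkins = [5, 7, 3, 100]
modelled = np.array([40.0, 10.0, 300.0])                   # stand-in for modelled turnover Y_r

counts = checkins_near(shop_xy, venue_xy, venue_checkins, radius=100.0)
corr = np.corrcoef(modelled, counts)[0, 1]
print(counts, round(corr, 3))                              # counts -> [ 12.   3. 100.]
```

A brute-force distance matrix is adequate for a toy case; at the scale of several hundred thousand venues a spatial index would be the more practical choice.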
To establish the impact of modelled consumption demand on observed floorspace
rents, we conclude this piece of analysis by calculating the correlation
levels between floorspace rents (VOA) and local clustering levels. In this context,
local clustering is identified as the accessibility of each shop $r$ to other
shops $r'$ as expressed by eq. 8. Essentially, this is equal to considering the case
where consumers are uniformly distributed in space, and as such turnover is
determined only by the respective perceived utility of each shop. In this case,
correlations between clustering levels and floorspace rents are given by figure 3.
As expected, correlation levels are lower (since the consumers' spatial distribution
is assumed uniform): the maximum value is 0.35, and corresponds to the ($\alpha = 0.75$,
$\gamma = 18/3 \times 10^{-3}$) parameter pair. Therefore, in this case internal economies of
scale are negative (preference for smaller shops), but external economies of scale
remain positive and strong at $18/3 \times 10^{-3}$. The fact that the oversimplification
of consumer demand has a direct impact on the estimation of the type of internal
economies of scale highlights the importance of developing models that are
well-rooted in the behavioural attributes of the respective actors. Ideally, all sectoral
models should be designed as integrable components of a comprehensive spatial
equilibrium/dynamics model (e.g. Echenique et al, 2013).
To conclude our discussion, figures 4 and 5 illustrate the log-ratio between
modelled revenue and observed floorspace rents ($\log(Y_r / R_r)$) for shops. The size of
each circle refers to the floorspace area of the respective shop.
The patterns in figures 4 and 5 highlight the limitations of two assumptions:
(i) homogeneity of population with respect to disposable retail budget, and (ii)
network-based metric distance as the determinant of proximity between retailers
$r$ and $r'$. In the case of the former, it is clear, from the map in figure 5, that,
despite addressing some of the distortions seen in the map of figure 4, there is
still systematic overestimation of turnover in the south and east sides of London
(where household incomes are relatively low) and systematic underestimation of
turnover in the west and south west sides of London (where household incomes
are higher than the London average). The introduction of disposable budgets in
line with household incomes would, to an extent, address this issue. In the case
of the latter, looking at the distribution of log-ratios in figure 5 (particularly
in Central London), it becomes clear that there is systematic overestimation of
turnover of shops $r$ that are in proximity to other shops $r'$ in terms of metric
distance, but not in terms of topological distance; e.g. routes from $r$ to $r'$
are complicated, involving several turns. The introduction of composite
metric/topological costs of moving from $r$ to $r'$ should partially address this issue,
and the balance between metric and topological cost components is a potential
area for further research (Zachariadis, 2014).
Fig. 5. Log-ratio of modelled turnover to observed floorspace rent for values pair
($\alpha = 1.25$, $\gamma = 18/3 \times 10^{-3}$).
5 Conclusion and Next steps
In this paper we present a location choice model, based on random utility and
following the increasingly popular cross-nested choice structure (Wen and
Koppelman, 2001). The novelty of the proposed model is that it models retailers
at the individual level. This opens up exciting opportunities towards integrating
the consumer location choice component with explicit retail location microsimulation
models able to take full advantage of the emerging availability of detailed data
sources and incorporate complex behaviour on price-setting, network dynamics
and risk management.
The proposed model in its current form has been simplified (with no loss
of generality) into assuming that all retailers offer unique varieties of the same
product. Moreover, it has been assumed that (i) all consumers have equal dis-
posable retail budgets regardless of their location, (ii) all trips are uni-purpose
(only shopping is considered), (iii) VOA rateable values are good indicators of
floorspace rents and (iv) product prices do not vary in space. These assumptions
mean that a considerable part of the complexity of the decision making mecha-
nism is not represented by the model. The VOA and LTDS datasets that we are
currently using have the potential to increase the complexity of the model signifi-
cantly towards removing some of the existing simplifying assumptions, and when
combined with passively collected social media datasets and formal datasets on
economic activity (e.g. Business Structure Dataset from ONS) offer sufficient
detail to capture all the main dimensions of behavioural variation.
Having said that, the basic model that we present in this paper remains very
useful, both as the baseline example of the proposed approach and as a bench-
mark; despite its simplicity, it translates a substantial amount of the discrepancy
between the modelled flows of the unconstrained location choice model and the
observed rents into estimates of internal and external economies of scale.
Fig. 6. Correlation between number of Twitter Users and Rent-rate for Retail premises
(x-axis: time of day, y-axis: correlation). Each line represents distance from retail
premise (e.g. 100m represents correlation between rent rate of shop and number of
Twitter users within 100 metres from the shop).
Looking at the - not too distant - future, passively generated datasets of
human presence promise to offer deeper insights on the dynamics of urban ac-
tivities, including spatio-temporal patterns of shopping behaviour. For example,
figure 6 illustrates the correlation between number of tweets and rateable value
of retail stores (Manley et al, 2015). Lines represent distance from stores; e.g. the
100m line represents the correlation level (y-axis) between rateable value of each
store and number of tweets within 100 metres from it for different times of the
day (x-axis). Figure 6 shows that the highest correlations between rateable val-
ues and number of tweets are found for the 50m and 100m distance bands (there
is little difference between the two) and between 2pm and 5pm. Both spatial
and temporal dimensions seem to confirm expected values (afternoon shopping
and distances that cover in-store locations plus the immediate vicinity of retail
stores).
Exercises like this one are particularly useful for exploring the extent to
which biases associated with the temporal variation of social media usage (i.e.
the preference of users to tweet at specific times) are reflected in the temporal
distribution of the digital output associated with particular activities, and
thus for assessing their impact on the efficacy of passively generated datasets in
generating valid representations of travel demand and activity dynamics.
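As a rough illustration of the kind of exercise summarised in Figure 6, the
sketch below computes, for each distance band and hour of day, the Pearson
correlation between the rateable value of each store and the number of tweets
within that distance. It is a minimal sketch and not the procedure used by
Manley et al (2015): the function name, the argument names, the use of
straight-line distances on projected coordinates in metres and the choice of
Pearson's coefficient are all assumptions made for illustration.

```python
import numpy as np


def tweet_rent_correlation(store_xy, rateable_value, tweet_xy, tweet_hour,
                           radii=(50, 100, 200), hours=range(24)):
    """Correlate store rateable values with nearby tweet counts.

    store_xy, tweet_xy : (n, 2) sequences of projected coordinates in metres
                         (hypothetical inputs, e.g. British National Grid).
    rateable_value     : one value per store.
    tweet_hour         : hour of day (0-23) of each tweet.

    Returns a dict mapping (radius, hour) to the Pearson correlation, or NaN
    where the correlation is undefined (e.g. no tweets in that hour).
    """
    store_xy = np.asarray(store_xy, dtype=float).reshape(-1, 2)
    tweet_xy = np.asarray(tweet_xy, dtype=float).reshape(-1, 2)
    rateable_value = np.asarray(rateable_value, dtype=float)
    tweet_hour = np.asarray(tweet_hour)

    result = {}
    for hour in hours:
        # Keep only the tweets posted during this hour of the day.
        pts = tweet_xy[tweet_hour == hour]
        # Brute-force store-to-tweet distance matrix (stores x tweets).
        diff = store_xy[:, None, :] - pts[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        for radius in radii:
            # Number of tweets within `radius` metres of each store.
            counts = (dist <= radius).sum(axis=1).astype(float)
            if counts.std() == 0 or rateable_value.std() == 0:
                result[(radius, hour)] = float("nan")
            else:
                corr = np.corrcoef(rateable_value, counts)[0, 1]
                result[(radius, hour)] = float(corr)
    return result
```

Plotting the returned values against the hour of day, with one line per
radius, would give a chart with the same layout as Figure 6. For datasets of
realistic size a spatial index would be preferable to the full distance
matrix used here.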
Having said that, the abundance of existing social media data sources and
the relentless pace at which new data are introduced sustain the promise of
accessible and highly disaggregated spatiotemporal information for anyone who
manages to overcome the lack of specification, the representational biases and
the possible absence of context.
6 Bibliography
Alonso, W., 1960. A theory of the urban land market. Papers in Regional
Science, 6(1), 149-157.
Anderson, S.P., De Palma, A., and Thisse, J.F., 1992. Discrete choice
theory of product differentiation. MIT Press.
Ben-Akiva, M.E. and Lerman, S., 1985. Discrete choice analysis: Theory and
application to travel demand. MIT Press.
Bierlaire, M., 2006. A theoretical analysis of the cross-nested logit model.
Annals of operations research, 144(1), 287-300.
Christaller, W., 1966. Central places in southern Germany. Prentice-Hall.
Daly, A.J., Zachary, S., 1976. Improved multiple choice models. In: Proceed-
ings of the Fourth PTRC Summer Annual Meeting. University of Warwick, Eng-
land, 12-16 July 1976.
Dennis, C., Marsland, D., and Cockett, T., 2002. Central place practice:
shopping centre attractiveness measures, hinterland boundaries and the UK re-
tail hierarchy. Journal of Retailing and Consumer Services, 9(4), 185-199.
Dixit, A. and Stiglitz, J., 1977. Monopolistic competition and optimum
product diversity. American Economic Review, 67(3), 297-308.
Echenique, M.H., Grinevich, V., Hargreaves, A.J. and Zachariadis, V., 2013.
LUISA: a land-use interaction with social accounting model; presentation and
enhanced calibration method. Environment and Planning B: Planning and De-
sign, 40(6), 1003-1026.
Fujita, M., Krugman, P.R., and Venables, A.J., 2001. The spatial economy:
Cities, regions, and international trade. MIT Press.
Fujita, M., and Thisse, J.F., 2013. Economics of agglomeration: Cities,
industrial location, and globalization. Cambridge University Press.
Greenhut, M.L., Norman, G., and Hung, C.S. 1987. The economics of imper-
fect competition: a spatial approach. Cambridge University Press.
Hotelling, H., 1931. The economics of exhaustible resources. The journal of
political economy, 137-175.
Huff, D.L., 1966. A programmed solution for approximating an optimum
retail location. Land Economics, 293-303.
Krugman, P., 1998. What's new about the new economic geography? Oxford
Review of Economic Policy, 14(2), 7-17.
Krugman, P., 1990. Increasing returns and economic geography (No. w3275).
National Bureau of Economic Research.
Leszczyc, P.T.P., Sinha, A. and Sahgal, A., 2004. The effect of multi-purpose
shopping on pricing and location strategy for grocery stores. Journal of Retailing,
80(2), 85-99.
Manley, E., Dennett, A., Serras, J., Zachariadis, V. and Batty, M., 2015.
Visualising London's traffic: Flow and activity in a 21st century city. In:
Jin, Y. (ed.), Traffic in Towns. [Forthcoming]
McFadden, D., 1980. Econometric models for probabilistic choice among
products. Journal of Business, S13-S29.
McFadden, D., 2001. Economic choices. American Economic Review, 351-
Oppewal, H. and Holyoake, B., 2004. Bundling and retail agglomeration ef-
fects on shopping behavior. Journal of Retailing and Consumer Services, 11(2),
Reimers, V. and Clulow, V., 2004. Retail concentration: a comparison of
spatial convenience in shopping strips and shopping centres. Journal of Retailing
and Consumer Services, 11(4), 207-221.
Suarez, A., del Bosque, I.R., Rodríguez-Poo, J.M., and Moral, I., 2004.
Accounting for heterogeneity in shopping centre choice models. Journal of Retailing
and Consumer Services, 11(2), 119-129.
Teller, C. and Reutterer, T., 2008. The evolving concept of retail
attractiveness: What makes retail agglomerations attractive when customers shop at
them? Journal of Retailing and Consumer Services, 15(3), 127-143.
von Thünen, J.H., 1966. Isolated state: an English edition of Der isolierte
Staat. Pergamon Press.
Train, K.E., 2009. Discrete choice methods with simulation. Cambridge
University Press.
Wen, C.H., and Koppelman, F.S., 2001. The generalized nested logit model.
Transportation Research Part B: Methodological, 35(7), 627-641.
Williams, H.C., 1977. On the formation of travel demand models and eco-
nomic evaluation measures of user benefit. Environment and planning A, 9(3),
Zachariadis, V., 2014. Modelling pedestrian systems (Doctoral dissertation,
University College London (University of London)).