
Examining the Vintage Effect in Hedonic Pricing using Spatially Varying Coefficients Models: A Case Study of Single-Family Houses in the Canton of Zurich


a) Institute of Financial Services Zug (IFZ), Lucerne University of Applied Sciences and Arts, Suurstoffi 1, CH-6343 Rotkreuz
Jakob A. Dambon (corresponding author), Mail: jakob.dambon@hslu.ch, ORCID: 0000-0001-5855-2017
Fabio Sigrist, Mail: fabio.sigrist@hslu.ch, ORCID: 0000-0002-3994-2244
b) Department of Mathematics, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich
c) Fahrländer Partner Raumentwicklung, Seebahnstrasse 89, CH-8003 Zurich
Stefan S. Fahrländer: sf@fpre.ch
Saira Karlen: ska@fpre.ch
Manuel Lehner: ml@fpre.ch
Jaron Schlesinger: js@fpre.ch
Anna Zimmermann: azi@fpre.ch
Jakob A. Dambon (a, b), Stefan S. Fahrländer (c), Saira Karlen (c), Manuel Lehner (c), Jaron Schlesinger (c), Fabio Sigrist (a), Anna Zimmermann (c)
Abstract: This article examines the spatially varying effect of age on single-family house (SFH) prices. Age has been shown to be a key driver of house depreciation and is usually associated with a negative price effect. In practice, however, there are deviations from this behavior, which are referred to as vintage effects. We estimate a spatially varying coefficients (SVC) model to investigate the spatial structures of vintage effects on SFH pricing. For SFHs in the Canton of Zurich, Switzerland, we find substantial spatial variation in the age effect. In particular, we find a strong vintage effect in the best urban locations compared to purely depreciative age effects in rural locations. Using cross validation, we assess the potential improvement in predictive performance from incorporating additional spatially varying vintage effects in hedonic models. For out-of-sample observations, we find no considerable difference in predictive performance between a classical spatial hedonic and an SVC hedonic model.
JEL Classifications: C31, C53, R31, R32
Keywords: Gaussian process, spatial statistics, real estate, mass appraisal
1 Introduction
Hedonic real estate models contain several predictor variables, and age is a key explanatory variable. The marginal effect of building age on house prices has been well studied, and it has been found to be nonlinear (Goodman & Thibodeau, 1995; Clapp & Giaccotto, 1998). In particular, Case, Clapp, Dubin, and Rodriguez (2004) report a "plausible quadratic form" for building age. This behavior results from two main features of age as an independent variable: i) in general, older buildings depreciate due to deterioration; ii) "however, beyond some point, only those houses with the best locations and the highest construction quality survive." (Case, Clapp, Dubin, & Rodriguez, 2004, p. 171). The parabolic shape of the age effect has also been observed by Fahrländer (2006) and linked to building material and architectural style. Studies investigating this particular type of behavior, i.e., a deviation from a purely depreciative effect once a particular age has been reached, refer to it as a vintage effect (Goodman & Thibodeau, 1995; Clapp & Giaccotto, 1998; Rubin, 1993).
Over the last two decades, a particular focus on location-specific effects has emerged due to newly available modeling methodologies. Numerous publications show clear indications of spatially varying covariate effects within hedonic pricing models. For instance, when applying additive mixed regression models to rents in Vienna (Austria), Brunauer, Lang, Wechselberger, and Bienert (2010) find "substantial spatial variation" of covariate effects between the districts of Vienna. Existing methods to model such spatially varying coefficients (SVC) are Bayesian spatially varying coefficient processes (Gelfand, Kim, Sirmans, & Banerjee, 2003) and geographically weighted regression (Fotheringham, Brunsdon, & Charlton, 2002). Applications of these methods consistently show the existence of non-stationary coefficients, e.g., in Baton Rouge (LA, United States) (Gelfand, Kim, Sirmans, & Banerjee, 2003), Toronto (ON, Canada) (Wheeler, Páez, Spinney, & Waller, 2014), Singapore (Cao, Diao, & Wu, 2019; van Eggermond, Lehner, & Erath, 2011), and Shenzhen (China) (Geng, Cao, Yu, & Tang, 2011).
The goal of this paper is to unite these two strands of research by investigating a possible spatially varying vintage effect and, possibly, to enhance the predictive performance of hedonic models using this feature. One of the first observations of spatial differences in age effects can be found in Malpezzi, Ozanne, and Thibodeau (1987). They compared individual hedonic models for 59 metropolitan areas in the United States and concluded that "[s]everal metropolitan areas exhibited significant deviations from the average depreciation patterns." (Malpezzi, Ozanne, & Thibodeau, 1987, p. 382). More recent evidence for such behavior is presented in Brunauer, Lang, Wechselberger, and Bienert (2010) as well as Dambon, Sigrist, and Furrer (2020), who found pronounced spatially varying effects on the rents and the prices of apartments, respectively.
In this paper, we model spatially varying vintage effects for single-family houses (SFH) in the Canton of Zurich (Switzerland). Our working hypothesis is that, on average, age has a negative effect on house prices. However, as indicated above, we expect spatial deviations, as we assume that there exist municipalities or city districts where a vintage effect is present. An important question that arises from this is whether, given the existence of such a vintage effect, it can be used to improve the predictive performance of hedonic models.
To verify our hypothesis on spatially varying vintage effects, we use a new methodology introduced by Dambon, Sigrist, and Furrer (2020) to model SVCs using Gaussian processes (GP). One of the difficulties of classical GP-based SVC models is that they do not scale to large data sets; the novel methodology, in particular, allows model-based SVC models to be applied to large spatial data sets. In the next section, we first introduce and then extend the definition of SVC models and, in particular, GP-based SVC models. In Section 3, we present the real estate data and justify the model. The results are discussed in Section 4. In Section 5, we turn to predictive performance before discussing our results in Section 6.
2 SVC Models
SVC models are a generalization of classical linear regression models in which the regression coefficients are allowed to vary over space. That is, the effect of a covariate $x^{(j)}$, denoted by the coefficient $\beta_j$, can depend on a geographic location $s$, which we assume to be two-dimensional. SVC models can be applied to spatial point data sets, where each of the $n$ observations of the response variable $(y_1, \dots, y_n)$ and of the covariates $x_i^{(1)}, \dots, x_i^{(p)}$, $i = 1, \dots, n$, has an associated location $s_i$. In summary, SVC models are defined as

$$ y_i = \beta_1(s_i)\, x_i^{(1)} + \dots + \beta_p(s_i)\, x_i^{(p)} + \varepsilon_i, \qquad (1) $$

where $i = 1, \dots, n$ indexes the observations with their corresponding locations and $\varepsilon_i$ is a classical $\mathcal{N}(0, \tau^2)$ iid error term with $\tau^2 > 0$.
If one assumes that not all coefficients should contain spatial structures, one can define mixed SVC models. Let $q$ with $1 \le q \le p$ be the number of covariates for which we want to model SVCs. Without loss of generality, we define the mixed SVC model as

$$ y_i = \beta_1(s_i)\, x_i^{(1)} + \dots + \beta_q(s_i)\, x_i^{(q)} + \beta_{q+1}\, x_i^{(q+1)} + \dots + \beta_p\, x_i^{(p)} + \varepsilon_i. \qquad (2) $$
From now on, we assume that the first coefficient ($j = 1$) always models an intercept, i.e., $x_i^{(1)} = 1$. In the special case of $q = 1$, we obtain the classical geostatistical model that is also used in most hedonic models. The exact assumptions for the coefficients $\beta_j(s)$, $j = 1, \dots, q$, and how they are estimated have yet to be defined. The literature on how to do so for both classical geostatistical and SVC models is extensive. For geostatistical models, see Cressie (2011) and Heaton et al. (2019) for an overview. For SVC models, see Dambon, Sigrist, and Furrer (2020), Wheeler and Calder (2007), and Wheeler and Waller (2009) for comparisons.
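To make the structure of the mixed SVC model (2) concrete, the following minimal Python sketch evaluates its right-hand side for a single observation and optionally adds the error term. It is an illustration under simplifying assumptions, not the authors' implementation: the coefficient functions are hypothetical deterministic toy functions (in the GP-based model of Section 2.1 they would be realizations of Gaussian processes), and the names `mixed_svc_predictor` and `simulate_response` are made up for this example.

```python
import random

def mixed_svc_predictor(x_row, loc, svc_funcs, fixed_coefs):
    """Right-hand side of model (2) without the error term.

    x_row: covariates x_i^(1), ..., x_i^(p); the first q = len(svc_funcs) get SVCs.
    loc: location s_i as a tuple of coordinates.
    svc_funcs: callables beta_j(s), j = 1, ..., q (hypothetical toy functions here).
    fixed_coefs: constant coefficients beta_(q+1), ..., beta_p.
    """
    q = len(svc_funcs)
    spatial_part = sum(beta(loc) * x for beta, x in zip(svc_funcs, x_row[:q]))
    fixed_part = sum(b * x for b, x in zip(fixed_coefs, x_row[q:]))
    return spatial_part + fixed_part

def simulate_response(x_row, loc, svc_funcs, fixed_coefs, nugget, rng):
    """Add an iid N(0, nugget) error, where nugget plays the role of tau^2."""
    return mixed_svc_predictor(x_row, loc, svc_funcs, fixed_coefs) + rng.gauss(0.0, nugget ** 0.5)

# Toy setup with p = 3 and q = 2; x^(1) = 1 is the intercept
svc_funcs = [lambda s: 1.0 + 0.1 * s[0],  # beta_1(s): spatially varying intercept
             lambda s: 0.5 - 0.2 * s[1]]  # beta_2(s): spatially varying slope
fixed_coefs = [2.0]                        # beta_3: constant over space
x_row = [1.0, 3.0, 0.5]
loc = (2.0, 1.0)

eta = mixed_svc_predictor(x_row, loc, svc_funcs, fixed_coefs)  # 1.2 * 1 + 0.3 * 3 + 2.0 * 0.5, about 3.1
y = simulate_response(x_row, loc, svc_funcs, fixed_coefs, nugget=0.01, rng=random.Random(42))
```

Passing a single intercept function as `svc_funcs` reproduces the special case $q = 1$, i.e., the classical geostatistical model.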
2.1 Gaussian Process-based SVC Models
We specify the SVC model such that each coefficient is defined by a Gaussian process. Gaussian processes are well-studied (Rasmussen & Williams, 2006) and widely used tools to model dependency structures, with applications including but not limited to spatial statistics (Gelfand & Schliep, 2016; Banerjee, Gelfand, Finley, & Sang, 2008; Datta, Banerjee, Finley, & Gelfand, 2016), econometrics (Wu, Hernández-Lobato, & Ghahramani, 2014), and time series modeling (Roberts, et al., 2013). They are infinite-dimensional stochastic processes that generalize the finite-dimensional multivariate normal distribution: the values of a GP at any finite set of locations are jointly normally distributed. We assume the GPs to be mutually independent as well as independent of the error term $(\varepsilon_1, \dots, \varepsilon_n)^\top \sim \mathcal{N}_n(\mathbf{0}_n, \tau^2 I_n)$. For observation locations $\mathbf{s} = (s_1, \dots, s_n)^\top$, they are given by

$$ \beta_j(\mathbf{s}) \sim \mathcal{N}_n\big(\mu_j \mathbf{1}_n,\ \Sigma_j(\mathbf{s})\big), \qquad (3) $$

for $j = 1, \dots, q$. We assume a constant mean $\mu_j$ and a covariance matrix $\Sigma_j(\mathbf{s})$, which is defined by a covariance function $c_j(\cdot)$ and the corresponding observation locations $\mathbf{s}$. The observation locations are used to model the dependency between observations through their pairwise distances. In spatial statistics, one usually assumes that closer observations share a higher dependency than observations that are far apart¹. We use the Euclidean distance, denoted by $\|\cdot\|$, which yields the pairwise distances $\|s_k - s_l\|$ between all observations, $1 \le k, l \le n$. Here, we assume exponential covariance functions $c_j(d) = \sigma_j^2 \exp(-d/\rho_j)$, $d \ge 0$, parametrized by variances $\sigma_j^2 \ge 0$ and ranges $\rho_j > 0$. The former parameter defines the extent of variation within an SVC $\beta_j(\cdot)$ and the latter defines the decay of spatial dependency with distance. Applying the covariance function to the distances yields the corresponding covariance matrix

$$ \big(\Sigma_j(\mathbf{s})\big)_{kl} := c_j(\|s_k - s_l\|) = \sigma_j^2 \exp\!\left(-\frac{\|s_k - s_l\|}{\rho_j}\right). $$
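A minimal sketch of the exponential covariance function and the resulting covariance matrix in plain Python (standard library only). The function names are hypothetical; varycoef has its own implementation, which this sketch does not reproduce.

```python
import math

def exp_cov(d, range_, variance):
    """Exponential covariance c(d) = variance * exp(-d / range_) for a distance d >= 0."""
    return variance * math.exp(-d / range_)

def exp_cov_matrix(locs, range_, variance):
    """Covariance matrix with entries c(||s_k - s_l||), using Euclidean distances."""
    return [[exp_cov(math.dist(sk, sl), range_, variance) for sl in locs] for sk in locs]

# Three hypothetical locations in the plane
locs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
Sigma = exp_cov_matrix(locs, range_=1.0, variance=2.0)
```

With this parametrization, the correlation between two locations drops to $e^{-1} \approx 0.37$ at distance $\rho_j$ and to about 5% at distance $3\rho_j$ (since $e^{-3} \approx 0.05$), which gives a rough feel for the range estimates reported later.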
2.1.1 Example of two sampled Gaussian Processes
In this section, we illustrate the interpretation of the parameters of a GP with the help of two samples. Both are defined by the parameters given in Table 1. Under the assumption of an exponential covariance function, these parameters, more specifically the ranges and variances, define the covariance functions shown in Figure 1. With the given covariance functions as well as the mean parameters, we sample the GPs on a regular 101 × 101 grid on the unit square. The sampled GPs are shown in Figure 2.

¹ This statement is also referred to as the first law of geography according to Waldo R. Tobler: "Everything is related to everything else, but near things are more related than distant things." (Tobler, 1970).
Table 1: Parametrizations of the two sampled Gaussian processes.

                     Mean    Range   Variance
Parametrization 1      2     0.25       1
Parametrization 2     -1     0.10       2
Figure 1: Covariance functions. Two exponential covariance functions that depend on a distance d and are parametrized as given in Table 1. The variance in Parametrization 1 is lower than in Parametrization 2. On the other hand, Parametrization 1 has a greater range, which leads to a slower decay of the covariance function with distance.
Figure 2: Visualization of two sampled Gaussian processes. The parametrizations and covariance functions are given in Table 1 and Figure 1, respectively. The GP values are given at the respective coordinates x and y in the unit square; the sampled value is shown via the color scale.
The influence of each of the three parameters, i.e., the mean $\mu$, the range $\rho$, and the variance $\sigma^2$, can be seen directly from the individual samples in Figure 2. First, we note that the values of each parametrization are scattered around their individual means. The greater range of Parametrization 1 relative to Parametrization 2 expresses itself in larger color patches in Figure 2. The greater variance of Parametrization 2 leads to a wider spread of values in the simulation, which manifests itself in a wider color range in the visualization.
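The sampling behind Figure 2 can be sketched as follows: compute the covariance matrix on a grid, take its Cholesky factor $L$ with $\Sigma = LL^\top$, and set $\beta(\mathbf{s}) = \mu \mathbf{1} + L\mathbf{z}$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I)$. This is a standard way to draw from a multivariate normal distribution and not necessarily the procedure used for the figure. To keep the pure-Python Cholesky decomposition cheap, the sketch assumes an 11 × 11 grid instead of the 101 × 101 grid; the seeds and function names are arbitrary choices for illustration.

```python
import math
import random

def exp_cov_matrix(locs, range_, variance, jitter=1e-10):
    """variance * exp(-||s_k - s_l|| / range_), with a tiny diagonal jitter for numerical safety."""
    n = len(locs)
    return [[variance * math.exp(-math.dist(locs[k], locs[l]) / range_) + (jitter if k == l else 0.0)
             for l in range(n)] for k in range(n)]

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def sample_gp(locs, mean, range_, variance, seed=None):
    """One draw of beta(s) = mean + L z at the given locations, z ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(exp_cov_matrix(locs, range_, variance))
    z = [rng.gauss(0.0, 1.0) for _ in locs]
    return [mean + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(locs))]

m = 11  # grid points per axis (assumption: much coarser than in the paper)
grid = [(i / (m - 1), j / (m - 1)) for i in range(m) for j in range(m)]
gp1 = sample_gp(grid, mean=2.0, range_=0.25, variance=1.0, seed=1)   # Parametrization 1
gp2 = sample_gp(grid, mean=-1.0, range_=0.10, variance=2.0, seed=2)  # Parametrization 2
```

Plotting `gp1` and `gp2` as images over the grid should show the same qualitative pattern as Figure 2: larger patches for the longer range and a wider value spread for the larger variance.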
2.2 Maximum Likelihood Estimation of GP-based SVC Models
We give a brief summary of the maximum likelihood estimation (MLE) approach for SVC models introduced in Dambon, Sigrist, and Furrer (2020). Additionally, we extend the framework such that not only full GP-based SVC models as given in (1), but also mixed GP-based SVC models as given in (2) can be estimated.
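With a data matrix $X \in \mathbb{R}^{n \times p}$, where the entry $(X)_{ij} := x_i^{(j)}$ is the $i$-th observation of the $j$-th covariate and $\mathbf{x}^{(j)}$ denotes the $j$-th column of $X$, a mean vector $\boldsymbol{\mu} = (\mu_1, \dots, \mu_p)^\top$, and using the independence assumption from above, the distribution of the response $\mathbf{y}$ is given by

$$ \mathbf{y} \sim \mathcal{N}_n\big(X\boldsymbol{\mu},\ \Sigma_y(\boldsymbol{\theta})\big), \qquad \Sigma_y(\boldsymbol{\theta}) := \sum_{j=1}^{q} \operatorname{diag}\big(\mathbf{x}^{(j)}\big)\, \Sigma_j(\mathbf{s})\, \operatorname{diag}\big(\mathbf{x}^{(j)}\big) + \tau^2 I_n. \qquad (4) $$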
The response's distribution above differs from the one given in Dambon, Sigrist, and Furrer (2020) in two respects. First, the first $q$ entries of the mean vector are the means of the GPs as defined in (3), while the further entries are the coefficients $\beta_{q+1}, \dots, \beta_p$; for simplicity, we identify these with $\mu_{q+1}, \dots, \mu_p$, respectively. Second, since only the covariates $j = 1, \dots, q$ are defined to have SVCs, only the covariance matrices $\Sigma_1(\mathbf{s}), \dots, \Sigma_q(\mathbf{s})$ and the respective covariates enter the sum that builds the covariance matrix.
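Equation (4) translates directly into code. The sketch below assembles $\Sigma_y(\boldsymbol{\theta})$ for the mixed model, where only the first $q$ columns of $X$ carry SVCs, and evaluates the Gaussian log-likelihood via a Cholesky decomposition. It is a plain-Python illustration with hypothetical toy data and function names; it does not reproduce the varycoef implementation or its handling of large data sets.

```python
import math

def exp_cov_matrix(locs, range_, variance):
    """Sigma_j(s)_kl = variance * exp(-||s_k - s_l|| / range_)."""
    return [[variance * math.exp(-math.dist(sk, sl) / range_) for sl in locs] for sk in locs]

def response_cov(locs, X, svc_params, nugget):
    """Sigma_y = sum_{j=1}^q diag(x^(j)) Sigma_j(s) diag(x^(j)) + nugget * I, cf. (4).

    svc_params: list of (range, variance) pairs for the first q columns of X.
    """
    n = len(locs)
    S = [[nugget if k == l else 0.0 for l in range(n)] for k in range(n)]
    for j, (range_, variance) in enumerate(svc_params):
        C = exp_cov_matrix(locs, range_, variance)
        for k in range(n):
            for l in range(n):
                S[k][l] += X[k][j] * C[k][l] * X[l][j]
    return S

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def gaussian_loglik(y, X, mu, Sigma):
    """log N(y | X mu, Sigma), using log det(Sigma) = 2 * sum(log L_ii)."""
    n = len(y)
    L = cholesky(Sigma)
    r = [yi - sum(a * m for a, m in zip(xi, mu)) for yi, xi in zip(y, X)]
    z = []
    for i in range(n):  # forward substitution: z = L^{-1} r, hence z^T z = r^T Sigma^{-1} r
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

# Hypothetical toy data: columns = intercept (SVC), covariate (SVC), covariate (fixed effect)
locs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
X = [[1.0, 0.5, 2.0], [1.0, -1.0, 1.0], [1.0, 0.0, 3.0]]
svc_params = [(2.0, 0.1), (1.0, 0.01)]  # (range, variance) for the q = 2 SVCs
Sigma_y = response_cov(locs, X, svc_params, nugget=0.02)
ll = gaussian_loglik([1.0, 0.2, 1.5], X, [0.5, 0.1, 0.3], Sigma_y)
```

Note that the third column (a fixed effect) enters only the mean $X\boldsymbol{\mu}$, not $\Sigma_y$, which is exactly the second difference described above.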
The model is thus fully parametrized by the covariance parameters $\boldsymbol{\theta} = (\rho_1, \sigma_1^2, \dots, \rho_q, \sigma_q^2, \tau^2)^\top$ and the mean parameters $\boldsymbol{\mu}$. We define $\boldsymbol{\omega} := (\boldsymbol{\theta}, \boldsymbol{\mu})$ as our parameter of interest, which we estimate by maximizing the log-likelihood of (4). Since there is no analytical solution, we have to turn to numerical optimization. Once the estimate $\hat{\boldsymbol{\omega}}$ is found, one can use it to predict the SVCs at (new) locations $\mathbf{s}^*$ using the conditional distribution, i.e., one obtains $\hat{\beta}_j(\mathbf{s}^*)$, $j = 1, \dots, q$. The estimator and predictor are implemented in the statistical software R (R Core Team, 2020) and can be used via the package varycoef (Dambon, Sigrist, & Furrer, 2019).
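The prediction step can be sketched with the standard Gaussian conditioning (kriging) formula. Since $\beta_j$ is independent of the other GPs and of the error term, $\operatorname{Cov}(\beta_j(s^*), y_i) = c_j(\|s^* - s_i\|)\, x_i^{(j)}$, so the conditional mean is $\hat\beta_j(s^*) = \hat\mu_j + \sum_i c_j(\|s^* - s_i\|)\, x_i^{(j)} \big[\Sigma_y(\hat{\boldsymbol\theta})^{-1}(\mathbf{y} - X\hat{\boldsymbol\mu})\big]_i$. The plain-Python sketch below plugs in given parameter values; all names are hypothetical, and it ignores parameter uncertainty as well as any computational shortcuts a package like varycoef may use.

```python
import math

def exp_cov(d, range_, variance):
    return variance * math.exp(-d / range_)

def response_cov(locs, X, svc_params, nugget):
    """Sigma_y from (4): SVC covariances weighted by their covariates, plus nugget * I."""
    n = len(locs)
    S = [[nugget if k == l else 0.0 for l in range(n)] for k in range(n)]
    for j, (range_, variance) in enumerate(svc_params):
        for k in range(n):
            for l in range(n):
                c = exp_cov(math.dist(locs[k], locs[l]), range_, variance)
                S[k][l] += X[k][j] * c * X[l][j]
    return S

def cholesky(a):
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def spd_solve(A, b):
    """Solve A x = b for symmetric positive definite A via A = L L^T."""
    L = cholesky(A)
    n = len(b)
    z = [0.0] * n
    for i in range(n):            # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def predict_svc(j, new_loc, locs, X, y, mu, svc_params, nugget):
    """Conditional mean of beta_j(new_loc) given y, with all parameters plugged in."""
    Sigma_y = response_cov(locs, X, svc_params, nugget)
    resid = [yi - sum(a * m for a, m in zip(xi, mu)) for yi, xi in zip(y, X)]
    alpha = spd_solve(Sigma_y, resid)  # Sigma_y^{-1} (y - X mu)
    range_, variance = svc_params[j]
    cross = [exp_cov(math.dist(new_loc, s), range_, variance) * xi[j] for s, xi in zip(locs, X)]
    return mu[j] + sum(c * a for c, a in zip(cross, alpha))

# Hypothetical toy data: intercept SVC and one covariate SVC
locs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
X = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.0]]
beta1_new = predict_svc(0, (0.5, 0.5), locs, X, [9.1, 9.4, 9.0], [9.2, 0.1],
                        [(2.0, 0.1), (1.0, 0.01)], nugget=0.02)
```

Far away from all observations, the cross-covariances vanish and the prediction falls back to the mean $\hat\mu_j$, which matches the intuition behind the spatially predicted SVCs discussed in Section 4.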
3 Data & Model
3.1 Data
The analysis is based mainly on transaction data and, to a small extent, on offer data for SFHs in the Canton of Zurich, including a 10 km wide margin around the canton to account for border effects when modeling the data on the border of the Canton of Zurich. The data are provided by Fahrländer Partner Raumentwicklung (FPRE), Zurich (Switzerland), and were collected (except for the offer data) by Swiss banks and insurance companies in their day-to-day business². They cover a time span of 6 consecutive quarters, from the last quarter of 2018 to the first quarter of 2020, and consist of 2904 observations. For locations with limited data availability, namely locations with fewer than 3 observations, we used carefully selected offer data to supplement the available transaction data. By doing so, we obtained a data set consisting of about 20% offer data (583 observations) and 80% transaction data (2321 observations). The data set contains approximately 45% of the transactions in the Canton of Zurich for the given time period. An overview of the data along with some summary statistics is given in Table 2 (a) and (b).
Due to Swiss banking secrecy, the exact geographic locations of the SFHs cannot be disclosed. Instead, FPRE works with a fine grid of cells. For each observation, we know the cell it falls into, and its location is given by a representative centroid of that cell, cf. Table 2 (c) and Figure 3. The centroid's location is provided in the LV03 coordinate reference system (Federal Office of Topography swisstopo, 1900). The cell resolution is higher in densely populated areas. The median cell size is 3.658 km², with cell areas ranging from 0.147 km² to 36.316 km². In total, we observe data at 618 distinct cells, of which 337 (containing 1678 observations) lie in the Canton of Zurich. Additionally, each cell is labeled with a location type, see Table 2 (c), which will prove helpful when analyzing our findings in Section 4.
² A total of three observations were removed from the data set because real estate experts from Fahrländer Partner Raumentwicklung (FPRE) assume that they were incorrectly classified as arm's length transactions.
Table 2: Description and summary statistics of the underlying data set.

(a) Continuous variables

Variable | Description | Min. | Median | Max. | Std. Deviation
price | Adjusted transaction price in Swiss Francs, excluding parking and special factors | 120000 | 1260000 | 10900000 | 917821
yoc | Year of construction | 1920 | 1995 | 2020 | 26
volume | Building volume in m³ (SIA Zürich, 2003) | 290 | 803 | 3134 | 300
plot size | Plot size in m² | 100 | 489 | 3232 | 333
renov | Need for renovation (difference between actual and theoretical building condition, higher meaning better; h.m.b.) | 0.00 | 0.00 | 4.00 | 0.83
standard | Standard; h.m.b. | 2.00 | 3.00 | 5.00 | 0.65
micro | Micro-location; h.m.b. | 2.00 | 3.50 | 5.00 | 0.64
(b) Categorical variables; reference levels are marked with (ref.)

Variable | Description | Levels | Observations
transaction quarter | Transaction quarter | 20184: 4th quarter of 2018 (ref.) | 430
 | | 20191: 1st quarter of 2019 | 562
 | | 20192: 2nd quarter of 2019 | 571
 | | 20193: 3rd quarter of 2019 | 579
 | | 20194: 4th quarter of 2019 | 389
 | | 20201: 1st quarter of 2020 | 373
energy | Energy standard | 1: Insulated shell (ref.) | 2791
 | | 2: Enhanced energy efficiency | 113
SFH type | Type of SFH | 1: detached (ref.) | 1680
 | | 2: semi-detached | 846
 | | 3: row house | 378
(c) Observation locations

Coordinates | Description | Range
Easting_03 | Coordinates in the LV03 coordinate reference system (Federal Office of Topography swisstopo, 1900) in m | 200 × 10³ to 800 × 10³
Northing_03 | Coordinates in the LV03 coordinate reference system (Federal Office of Topography swisstopo, 1900) in m | 100 × 10³ to 400 × 10³

Variable | Description | Levels | Observations
FPRE type | Type of location | 1: Top-Locations | 439
 | | 2: Urban Agglomerations | 898
 | | 3: Other Agglomerations | 885
 | | 4: Rural Areas | 682
Figure 3: Spatial distribution of observations and classification of locations within the Canton of
Zurich. The Canton of Zurich is divided into 563 cells which are classified into types of location. The
number of observations within such cells is depicted with the color-coded observation count. The
type of cell and the representative location are depicted by the orange symbols.
3.2 Model
The model has the natural logarithm of the transaction price as the response variable. Further, we standardize the year of construction (yoc) using the transformation

$$ \text{yoc.std} = \frac{\text{yoc} - 2000}{20}. $$

The advantage of working with yoc.std rather than the actual year of construction or age is a numerically stable optimization process for the MLE. As mentioned above, we expect a quadratic effect (Case, Clapp, Dubin, & Rodriguez, 2004; Clapp & Giaccotto, 1998; Goodman & Thibodeau, 1995; Fahrländer, 2006), which is why we also include the covariate yoc.std². As we expect spatial variation in these coefficients, we use SVCs for these variables, cf. the first line in (5). The plot size as well as the volume enter the model log-transformed. The remaining continuous covariates renov, standard, and micro are included without further transformation. Thus, all continuous covariates have approximately the same standard deviations, which results in a well-behaved numerical optimization procedure for estimating the model.
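The transformation of the year of construction and the two covariates that receive SVCs (besides the intercept) can be written as a small helper. The function names are hypothetical and only mirror the formula above.

```python
def standardize_yoc(yoc):
    """Standardized year of construction: (yoc - 2000) / 20."""
    return (yoc - 2000) / 20

def svc_covariates(yoc):
    """Covariates multiplied by the three SVCs: intercept, linear term, quadratic term."""
    z = standardize_yoc(yoc)
    return 1.0, z, z * z

first, last = standardize_yoc(1920), standardize_yoc(2020)
```

Over the observed years of construction 1920 to 2020, the standardized variable thus lies in $[-4, 1]$ and its square in $[0, 16]$, so both stay on a moderate numerical scale.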
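The categorical variables transaction quarter, energy, and SFH type and the error term complete our model, which can be formulated as:

$$
\begin{aligned}
y = \log(\text{price}) = {} & \beta_1(s) + \beta_2(s)\,\text{yoc.std} + \beta_3(s)\,\text{yoc.std}^2 \\
& + \beta_4 \log(\text{volume}) + \beta_5 \log(\text{plot size}) \\
& + \beta_6\,\text{renov} + \beta_7\,\text{standard} + \beta_8\,\text{micro} \\
& + \boldsymbol{\beta}_{\text{quarter}}^\top \mathbf{d}_{\text{quarter}} + \beta_{\text{energy}}\, d_{\text{energy}} + \boldsymbol{\beta}_{\text{type}}^\top \mathbf{d}_{\text{type}} \\
& + \varepsilon, \qquad (5)
\end{aligned}
$$

where $\mathbf{d}_{\text{quarter}} \in \{0,1\}^5$, $d_{\text{energy}} \in \{0,1\}$, and $\mathbf{d}_{\text{type}} \in \{0,1\}^2$ are the dummy variables of the factor levels deviating from the respective reference levels.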
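Comparing the general mixed SVC model (2) with our explicit hedonic model (5), we note that $q = 3$ and $p = 16$, including the intercept and all factor levels deviating from the reference levels. The model is therefore fully parametrized by 7 covariance and 16 mean parameters,

$$ \boldsymbol{\omega} = (\boldsymbol{\theta}, \boldsymbol{\mu}) = (\rho_1, \sigma_1^2, \rho_2, \sigma_2^2, \rho_3, \sigma_3^2, \tau^2, \mu_1, \dots, \mu_{16}). $$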
We use numerical optimization over the profile likelihood. That is, we only optimize over the covariance parameters $\boldsymbol{\theta}$, while the mean parameters $\boldsymbol{\mu}$ are determined implicitly as the generalized least squares (GLS) estimate for the given $\boldsymbol{\theta}$.
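For fixed covariance parameters, the GLS estimate is $\hat{\boldsymbol\mu}(\boldsymbol\theta) = (X^\top \Sigma_y^{-1} X)^{-1} X^\top \Sigma_y^{-1} \mathbf{y}$ with $\Sigma_y = \Sigma_y(\boldsymbol\theta)$ from (4); plugging it back into the log-likelihood yields the profile likelihood, which then only has to be optimized over $\boldsymbol\theta$. The sketch below computes $\hat{\boldsymbol\mu}$ by whitening with the Cholesky factor $L$ of $\Sigma_y$, since $X^\top \Sigma_y^{-1} X = (L^{-1}X)^\top (L^{-1}X)$. It is a plain-Python illustration with hypothetical names and toy numbers, not the varycoef code.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward_solve(L, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gls_estimate(X, y, Sigma):
    """mu_hat = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y, computed via Sigma = L L^T."""
    L = cholesky(Sigma)
    p = len(X[0])
    Xw = [forward_solve(L, [row[j] for row in X]) for j in range(p)]  # columns of L^{-1} X
    yw = forward_solve(L, y)                                            # L^{-1} y
    A = [[sum(a * b for a, b in zip(Xw[i], Xw[j])) for j in range(p)] for i in range(p)]
    c = [sum(a * b for a, b in zip(Xw[i], yw)) for i in range(p)]
    LA = cholesky(A)  # A = X^T Sigma^{-1} X is symmetric positive definite for full-rank X
    return backward_solve(LA, forward_solve(LA, c))

# Toy example: intercept plus one covariate with correlated errors (hypothetical numbers)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Sigma = [[2.0, 0.5, 0.0, 0.0],
         [0.5, 2.0, 0.5, 0.0],
         [0.0, 0.5, 2.0, 0.5],
         [0.0, 0.0, 0.5, 2.0]]
y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2 x
mu_hat = gls_estimate(X, y, Sigma)
```

With $\Sigma_y$ proportional to the identity, the GLS estimate reduces to ordinary least squares; the correlated case only reweights the observations.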
3.3 Observation Locations
As the LV03 coordinates $s = (E_{03}, N_{03})$ of the centroids cover a fairly large range, we standardize them to kilometers using the following formula:

$$ s_{\text{std}} = (E_{\text{std}}, N_{\text{std}}) = 10^{-3}\,\big(E_{03} - 600000,\ N_{03} - 200000\big). $$

Again, this ensures a well-behaved numerical optimization while remaining interpretable, as distances, and hence the estimated ranges $\rho_j$, are now expressed in kilometers.
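The coordinate standardization and its effect on the interpretation of the ranges can be sketched as follows. The offsets 600000 and 200000 are the LV03 coordinates of the projection origin in Bern; the two centroids below are hypothetical, and the function names are made up for this example.

```python
import math

LV03_EAST_OFFSET = 600_000   # LV03 easting of the projection origin, in m
LV03_NORTH_OFFSET = 200_000  # LV03 northing of the projection origin, in m

def lv03_to_km(easting, northing):
    """s_std = 10^-3 * (E03 - 600000, N03 - 200000), i.e., km relative to the LV03 origin."""
    return (easting - LV03_EAST_OFFSET) / 1000.0, (northing - LV03_NORTH_OFFSET) / 1000.0

def exp_correlation(d_km, range_km):
    """Correlation implied by an exponential covariance function at distance d_km."""
    return math.exp(-d_km / range_km)

a = lv03_to_km(683_000, 248_000)  # hypothetical centroid
b = lv03_to_km(686_000, 252_000)  # hypothetical centroid
d = math.dist(a, b)               # distance in km
```

Because distances are in kilometers, a range estimate can be read directly: with the ranges reported in Table 3, the implied correlation at 5 km is about $e^{-5/28.44} \approx 0.84$ for the intercept SVC but only about $e^{-5/2.04} \approx 0.09$ for the linear year-of-construction SVC.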
4 Results
4.1 Parameter Estimates
We first take a look at the ML estimates $\hat{\boldsymbol{\omega}}$, which are given in Table 3. Here, we find that the mean estimates match our expectations. In particular, the vintage-related covariates, i.e., the standardized year of construction yoc.std and its square yoc.std², show the following:
1) Recall that yoc.std is derived from the year of construction (yoc) and not from the building age. Hence, the positive mean effect corresponds to a depreciation of the building price with increasing age.
2) Further, the mean of the quadratic effect of yoc.std² is very close to 0, but positive. A larger quadratic mean effect would correspond to a more pronounced vintage effect.
All other mean effects have plausible signs, too. That is, all other coefficients of continuous covariates are positive and therefore in line with our expectations. As for the categorical covariates, we observe some temporal price volatility across transaction quarters, a premium for stand-alone, detached SFHs compared to the other SFH types, and a premium for houses with enhanced energy efficiency.
The range estimates in Table 3 show that the range of the intercept (about 28 km) is considerably larger than those of yoc.std and yoc.std² (about 2 km each). This should manifest itself in larger spatial structures of the SVC modeling the intercept compared to the other two SVCs (cf. Figure 4). The small ranges of yoc.std and yoc.std² indicate that the corresponding SVCs deviate from their means much more locally. Finally, the variance estimates suggest higher variation in the first SVC, which can also be interpreted as the mean price level of the respective location.
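Point 2) above can be made concrete with a back-of-the-envelope calculation from the mean estimates in Table 3. The sketch below uses only the two mean coefficients, not the location-specific SVCs, so it describes the average shape of the effect; the function names are hypothetical. Setting the derivative $\hat\mu_2 + 2\hat\mu_3 z$ to zero, with $z = (\text{yoc} - 2000)/20$, locates the turning point of the quadratic.

```python
# Mean coefficients of the two vintage-related covariates, copied from Table 3
B2_MEAN = 0.107510  # linear term, covariate z = (yoc - 2000) / 20
B3_MEAN = 0.005834  # quadratic term, covariate z^2

def yoc_effect(yoc, b2=B2_MEAN, b3=B3_MEAN):
    """Mean effect b2 * z + b3 * z^2 on the log price, with z = (yoc - 2000) / 20."""
    z = (yoc - 2000) / 20
    return b2 * z + b3 * z * z

def yoc_turning_point(b2=B2_MEAN, b3=B3_MEAN):
    """Year of construction where the derivative b2 + 2 * b3 * z vanishes."""
    z_star = -b2 / (2 * b3)
    return 2000 + 20 * z_star

turning_year = yoc_turning_point()                  # roughly 1816, well before 1920
span_effect = yoc_effect(2020) - yoc_effect(1920)  # roughly 0.45 log-points
```

Since the turning point lies outside the observed span of 1920 to 2020, the average effect is monotone over the data: on average, newer houses are priced higher, and the small positive quadratic term only flattens the curve for old buildings. Any vintage effect therefore has to come from local deviations of the SVCs.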
Table 3: Mean and covariance estimates $\hat{\boldsymbol{\omega}}$ of SVC model (5).

Covariate | Mean $\hat\mu_j$ | Range $\hat\rho_j$ | Variance $\hat\sigma_j^2$ and $\hat\tau^2$
Intercept | 8.951622 | 28.439374 | 0.118325
yoc.std | 0.107510 | 2.041053 | 0.001574
yoc.std² | 0.005834 | 2.197881 | 0.000313
log(volume) | 0.478684 | |
log(plot size) | 0.189723 | |
renov | 0.032742 | |
standard | 0.095569 | |
micro | 0.030550 | |
factor(transaction quarter)20191 | -0.002353 | |
factor(transaction quarter)20192 | 0.012518 | |
factor(transaction quarter)20193 | 0.009341 | |
factor(transaction quarter)20194 | 0.016490 | |
factor(transaction quarter)20201 | 0.012574 | |
factor(SFH type)2 | -0.017312 | |
factor(SFH type)3 | -0.042931 | |
factor(energy)2 | 0.028165 | |
nugget | | | 0.025688
Table 4: Summary statistics of SVCs

a) estimated
Statistic | Intercept $\hat\beta_1(s)$ | yoc.std $\hat\beta_2(s)$ | yoc.std² $\hat\beta_3(s)$
Minimum | 8.922 | -0.003295 | -0.026603
Mean | 9.389 | 0.100902 | 0.006979
Maximum | 10.088 | 0.149384 | 0.056561

b) spatially predicted
Statistic | Intercept $\hat\beta_1(s)$ | yoc.std $\hat\beta_2(s)$ | yoc.std² $\hat\beta_3(s)$
Minimum | 8.883 | 0.035420 | -0.010333
Mean | 9.333 | 0.103460 | 0.006225
Maximum | 10.079 | 0.137420 | 0.035978
4.2 Visualization and Interpretation of SVCs
In Figure 4, we visualize fitted and predicted SVCs. Specifically, the figure shows the estimated SVCs at the observation locations the model has been trained on as well as the spatial predictions at all other cell centroids where we did not have any observations. These results are consistent with the interpretation of the parameter estimates in Section 4.1 and with real estate experts' knowledge.
For the intercept's SVC, cf. Figure 4 (a), which can also be interpreted as a mean price level, we see that the highest values are attained close to the city of Zurich and Lake Zurich, with a local peak at the city of Winterthur. As expected, the lowest values can be found towards the North and East, which are more rural areas.
Figure 4: Estimated and spatially predicted SVCs of model (5). Note that the spans of the coefficients' values reflect their variance estimates $\hat\sigma_j^2$, i.e., they decrease from (a) to (c).
As for the . and . SVCs, c.f. Figure 4 (b) and (c), one can see small scale local deviations. 226
Examples are a very small value of the . SVC to the West of Zurich (in Table 4 one can verify that 227
the negative SVC effect is almost identical to 0) and a relatively high value for the . SVC close to 228
the city center. There is also some clustering of below average values (recall . 0.11) in Zurich, 229
adjacent to Lake Zurich and the city of Winterthur. However, an individual interpretation of both 230
panels is cumbersome and inadequate as the fitted SVCs originate from the same covariate. As we are 231
simultaneously modeling a linear and quadratic effect, one could therefore interpret the results as 232
18
spatially varying quadratic effects. Using the SVCs
() and
() for all observation locations  233
within the training data and the Canton of Zurich, we back-transform the estimated effects to receive 234
the marginal effect (,) for the year of construction [1920,2020]: 235
(,)()
2000
20 +()
2000
20
. (6)
236
Figure 5: Back-transformed, aggregated effect of the year of construction. The grey curves correspond to the marginal effects as defined in (6) and the red lines are the aggregated marginal effects as defined in (7). The most extreme curves are labeled with their respective location names, i.e., Fluntern (a district of the city of Zurich), Bonstetten (a suburb to the West of the city of Zurich), and Guggenbühl (a district of the city of Winterthur).
This is what we visualize in Figure 5, in averaged form, for the 4 location types. The grey lines are the marginal effects $f(s, \text{yoc})$, grouped in panels by FPRE type of location and filtered such that i) there are at least 5 observations per location $s$, and ii) each curve is cropped to the span of observed years of construction at the corresponding location. This ensures that the results are backed by sufficient data and that we do not extrapolate to unobserved years of construction. We denote these sets of observation locations by $\mathcal{S}_k$, $k \in \{1, 2, 3, 4\}$. The red line is obtained by aggregating all marginal effects by type of location, i.e.,

$$ \bar f_k(\text{yoc}) := \frac{1}{|\mathcal{S}_k|} \sum_{s \in \mathcal{S}_k} f(s, \text{yoc}) \qquad (7) $$
for years of construction $\text{yoc} \in [1920, 2020]$. In all panels of Figure 5, a depreciation part is present for all years of construction $\text{yoc} > 1975$. This holds not only for all $\bar f_k$, but also for most of the individual $f(s, \text{yoc})$. A first hint of a vintage effect is observable for $\bar f_k$ with $k = 1$, i.e., top locations, and $k = 3$, i.e., other agglomerations. These aggregated effects have pronounced curvatures, which stem from a substantial quadratic component, i.e., $\hat\beta_3(s)$. The other two effects, with $k = 2$, i.e., urban agglomerations, and $k = 4$, i.e., rural areas, appear almost linear, which means that the age of a building is associated exclusively with a price depreciation.
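Equations (6) and (7) can be sketched in a few lines of Python. The SVC values, location labels, and observation counts below are hypothetical and only illustrate the computation; the filter of at least 5 observations per location is included, whereas the cropping of each curve to its observed years of construction is omitted for brevity.

```python
def marginal_effect(beta2, beta3, yoc):
    """Eq. (6): f(s, yoc) = beta2(s) * z + beta3(s) * z^2 with z = (yoc - 2000) / 20."""
    z = (yoc - 2000) / 20
    return beta2 * z + beta3 * z * z

def aggregated_effect(svcs, location_type, n_obs, k, yoc, min_obs=5):
    """Eq. (7): mean of f(s, yoc) over S_k, the type-k locations with >= min_obs observations."""
    members = [s for s, t in location_type.items() if t == k and n_obs[s] >= min_obs]
    if not members:
        raise ValueError(f"no locations of type {k} with at least {min_obs} observations")
    return sum(marginal_effect(*svcs[s], yoc) for s in members) / len(members)

# Hypothetical SVC estimates (beta2, beta3) per location; not values from the paper
svcs = {"A": (0.10, 0.01), "B": (0.12, 0.00), "C": (0.05, 0.03), "D": (0.20, 0.05)}
location_type = {"A": 1, "B": 1, "C": 4, "D": 1}
n_obs = {"A": 12, "B": 7, "C": 9, "D": 2}  # D is dropped: fewer than 5 observations

years = list(range(1920, 2021, 10))
curve_top = [aggregated_effect(svcs, location_type, n_obs, 1, yoc) for yoc in years]
```

By construction, every curve equals 0 at yoc = 2000, so the curves in Figure 5 are effects relative to a house built in 2000.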
Looking at the individual marginal effects $f(s, \text{yoc})$, one can observe some variety within each location type $k$. However, the conveyed message stays the same. At top locations ($k = 1$), a vintage effect is present, where some of the oldest SFHs have the same marginal effect of year of construction as recently built SFHs. Here, one location (Fluntern, a district of Zurich) stands out, as it gives a premium of approximately 0.5 log-points for an SFH built in the 1920s compared to an SFH built in the 1960s and 1970s, ceteris paribus. At other types of locations, the individual marginal effects are of a purely depreciating nature, with the exception of 3 locations at other agglomerations ($k = 3$), for instance Guggenbühl (a district of Winterthur). At this location, we again note that the observed real estate objects are rather old, but that the range of the marginal effect is quite small. Fluntern and Guggenbühl therefore represent two urban districts with rather old real estate objects but very different marginal effects of age, ceteris paribus. While the prices of SFHs in Guggenbühl seem rather insensitive to age, very old SFHs are sought after in Fluntern. Here, as well as in other top locations, the vintage effect is most pronounced. In any case, both manifestations of the marginal age effect are deviations from the usual depreciative nature of age with respect to SFH pricing. Finally, we address the municipality of Bonstetten and its marginal effect $f(s, \text{yoc})$ displayed in Figure 5. It suggests a relatively steep depreciation with age. This strongly negative age effect is mainly driven by two observations with year of construction 1920 that were sold at relatively low transaction prices. As we observe no transactions with years of construction between 1920 and 1979, these two observations have high leverage and the marginal effect $f(s, \text{yoc})$ is extrapolated over this gap. Therefore, the depicted age effect for Bonstetten should be treated with caution.
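To put the Fluntern premium on the price scale: since the response is the log price, a difference in log-points translates into a multiplicative factor on the price, ceteris paribus. As a rough reading of the point estimate only:

```latex
\Delta \log(\text{price}) \approx 0.5
\quad\Longrightarrow\quad
\frac{\text{price}_{1920\text{s}}}{\text{price}_{1960\text{s}/1970\text{s}}}
\approx e^{0.5} \approx 1.65
```

That is, a premium of roughly 65% rather than 50%, because the approximation $e^{x} - 1 \approx x$ is poor for $x = 0.5$.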
We conclude this section by noting that we observe spatially varying age effects which clearly deviate from pure depreciation. These effects are locally pronounced and mostly appear at top locations, which supports our hypothesis of spatially varying vintage effects and is in line with both initial citations taken from Case, Clapp, Dubin, and Rodriguez (2004) and Malpezzi, Ozanne, and Thibodeau (1987).
5 Predictive Performance
In this section, we assess the implications of our findings for predictive performance. As suggested in Section 4, there appears to be a spatially varying age effect that deviates from a linear depreciation with age. We now investigate whether this can be exploited to enhance classical hedonic models and increase their predictive performance.
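To examine this, we compare our SVC model to a classical hedonic model in which only the mean price, i.e., the intercept, depends on the spatial location $s$. Thus, we use a geostatistical model defined similarly to the SVC model in (5), but with $q = 1$:

$$
\begin{aligned}
y = \log(\text{price}) = {} & \beta_1(s) + \beta_2\,\text{yoc.std} + \beta_3\,\text{yoc.std}^2 \\
& + \beta_4 \log(\text{volume}) + \beta_5 \log(\text{plot size}) \\
& + \beta_6\,\text{renov} + \beta_7\,\text{standard} + \beta_8\,\text{micro} \\
& + \boldsymbol{\beta}_{\text{quarter}}^\top \mathbf{d}_{\text{quarter}} + \beta_{\text{energy}}\, d_{\text{energy}} + \boldsymbol{\beta}_{\text{type}}^\top \mathbf{d}_{\text{type}} \\
& + \varepsilon, \qquad (8)
\end{aligned}
$$

where the dummy variables $\mathbf{d}_{\text{quarter}}$, $d_{\text{energy}}$, and $\mathbf{d}_{\text{type}}$ encode the non-reference factor levels as in (5).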
Using the same data (2904 observations) as before, the two models (5) and (8) were trained on two folds in a manner very similar to how hedonic pricing is conducted in practice. That is, the first 5 quarters of transaction data were used exclusively as training data. For the validation data, observations from the last observed quarter, i.e., the 1st quarter of 2020, and from within the Canton of Zurich were selected and randomly divided into two folds of 113 observations each. All remaining observations are used as training data. Therefore, for both folds, the training data consist of 2791 observations and the validation data of 113 observations. The split of the data into training and validation sets is illustrated in Table 5.
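Table 5: Layout of validation.

| | 2018Q4 | 2019Q1 | 2019Q2 | 2019Q3 | 2019Q4 | 2020Q1 |
|---|---|---|---|---|---|---|
| Within Ct. of ZH | Training | Training | Training | Training | Training | Fold 1 and Fold 2 (113 obs. each) |
| Outside Ct. of ZH | Training | Training | Training | Training | Training | Training |

In the run for Fold 1, the Fold 1 observations form the validation set and the Fold 2 observations are added to the training data; in the run for Fold 2, the roles are reversed.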
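The validation design of Table 5 can be sketched as follows. This is a minimal illustration assuming each sale is a record with a quarter label and a flag for whether it lies in the Canton of Zurich; the `Sale` record, its field names, the `two_fold_splits` helper and the seed are hypothetical, and the random assignment to folds will not match the one used in the study.

```python
import random
from dataclasses import dataclass


@dataclass
class Sale:
    quarter: str  # e.g. "2019Q3" (hypothetical field)
    in_zh: bool   # True if the house lies in the Canton of Zurich


def two_fold_splits(sales, val_quarter="2020Q1", seed=1):
    """Return [(train, test), (train, test)] following the layout of Table 5.

    Sales from val_quarter inside the Canton of Zurich are shuffled and split
    into two equal folds; each fold serves once as validation data, while all
    remaining sales (earlier quarters, outside ZH, and the other fold) are
    used for training.
    """
    candidates = [i for i, s in enumerate(sales) if s.quarter == val_quarter and s.in_zh]
    random.Random(seed).shuffle(candidates)
    half = len(candidates) // 2
    folds = [candidates[:half], candidates[half:2 * half]]
    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [s for i, s in enumerate(sales) if i not in held_out]
        test = [sales[i] for i in fold]
        splits.append((train, test))
    return splits
```

If 226 of 2904 sales fall in 2020Q1 inside the canton, each split yields 2791 training and 113 validation observations, matching the counts reported above.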
The root mean square error (RMSE) was chosen as the measure of comparison and computed for in-sample
estimates and out-of-sample predictions. The results are given in Table 6. While the SVC model has an
advantage in terms of in-sample fit, the out-of-sample RMSEs of the two models are close, with the
geostatistical model slightly lower in both folds. This shows that, although the SVC model is better at
accounting for the spatially varying age effects discussed in the previous section, this feature does not
translate into more accurate out-of-sample predictions. A probable reason is that the sample within each
location grid cell is too heterogeneous. As Swiss banking secrecy does not allow exact SFH locations to be
disclosed, locations are only available at grid-cell resolution, and the data set at hand reaches its limits
with respect to the application of SVC models. A vintage effect may be present at a much smaller scale, or
even at the level of individual objects. An increase in predictive performance would only be observable if
spatially varying age effects, such as vintage effects, were present at a larger scale. Thus, with the
current resolution of the spatial data set at hand, we cannot use this to our advantage.
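For reference, the RMSE reported in Table 6 can be written as below. This assumes it is computed on the log-price scale of the response $y_i$ (consistent with the magnitude of the reported values) over the $n$ observations of the respective training or validation set, with $\hat{y}_i$ the in-sample fitted value or out-of-sample prediction:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }
$$

Under this assumption, since differences in log prices approximate relative price differences, an out-of-sample RMSE of about 0.19 to 0.21 indicates typical prediction errors in the order of 20 percent of the price.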
Table 6: Results of the two-fold cross-validation comparing the SVC and the geostatistical model. The
best-performing model is highlighted in italics.

| Model | In-sample RMSE, Fold 1 | In-sample RMSE, Fold 2 | Out-of-sample RMSE, Fold 1 | Out-of-sample RMSE, Fold 2 |
|---|---|---|---|---|
| SVC Model (5) | *0.14637* | *0.13732* | 0.18888 | 0.20999 |
| Geostatistical Model (8) | 0.16036 | 0.15980 | *0.18744* | *0.19669* |
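The comparison in Table 6 can be summarized with a few lines of arithmetic on the reported values. In the sketch below, the dictionary layout and function names are our own; the numbers are copied from Table 6. It averages the RMSE over the two folds and computes the relative difference of the SVC model with respect to the geostatistical model.

```python
# RMSE values copied from Table 6: (fold 1, fold 2)
rmse = {
    "svc": {"in": (0.14637, 0.13732), "out": (0.18888, 0.20999)},
    "geo": {"in": (0.16036, 0.15980), "out": (0.18744, 0.19669)},
}


def mean(values):
    """Arithmetic mean of a sequence of floats."""
    return sum(values) / len(values)


def relative_diff(kind):
    """Relative RMSE difference of SVC vs. geostatistical model, averaged over folds."""
    svc, geo = mean(rmse["svc"][kind]), mean(rmse["geo"][kind])
    return (svc - geo) / geo


for kind in ("in", "out"):
    print(f"{kind}-sample: SVC {mean(rmse['svc'][kind]):.4f}, "
          f"geostatistical {mean(rmse['geo'][kind]):.4f}, "
          f"relative difference {100 * relative_diff(kind):+.1f}%")
```

On these numbers, the SVC model's fold-averaged in-sample RMSE is about 11 percent lower, whereas its out-of-sample RMSE is about 4 percent higher than that of the geostatistical model.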
6 Conclusion
To the best of our knowledge, the presented work is the first to investigate a spatially varying age
effect for SFHs. While we find a purely depreciative age effect in some locations in the Canton of Zurich,
there appears to be a substantial price premium for older SFHs in other locations. The existence of an age
effect that is not purely depreciative is in line with the scientific literature and the assumptions of
real estate experts. In this context, we consider it very likely that age or the year of construction acts
as a proxy for unmeasured covariates that directly affect prices, such as the construction quality or the
architectural style of the object (e.g., room height, architectural details), as suggested in the existing
literature (Case, Clapp, Dubin, & Rodriguez, 2004; Goodman & Thibodeau, 1995).
Overall, our analysis suggests a spatially varying, or at least object-specific, age effect, which in certain
locations manifests as a vintage effect. Further research on the topic based on data from other regions or
with a much higher spatial resolution would be desirable.
7 List of Abbreviations

| Abbreviation | Meaning |
|---|---|
| Ct. | Canton |
| FPRE | Fahrländer Partner Raumentwicklung |
| GP(s) | Gaussian process(es) |
| h.m.b. | higher meaning better |
| ML(E) | maximum likelihood (estimation) |
| RMSE | root mean square error |
| SFH | single-family house |
| SVC | spatially varying coefficients |
| yoc | year of construction |
| ZH | Zurich |
8 Declarations
8.1 Availability of Data and Materials
The data used for this analysis are subject to Swiss banking secrecy and therefore cannot be made
available, either publicly or on request.
8.2 Competing Interests
The authors declare that they have no competing interests.
8.3 Funding
This study was jointly funded by Innosuisse (the Swiss Innovation Agency) and Fahrländer Partner
Raumentwicklung (FPRE) in the framework of a project on space-time machine learning models for
valuation and prediction of real estate objects (Innosuisse project number 28408.1 PFES-ES). The
design of this study, the collection, analysis and interpretation of the data and the writing of the
manuscript were not influenced by the funding body.
8.4 Authors' Contributions
JD and FS developed the statistical foundations on which this paper is based. SK carried out the model
estimation and other computations, with support from JD. JD and SK analyzed the data and results. The
interpretation of the results was performed to a large extent by JD, with support from ML, AZ and JS. JD
was responsible for writing the paper, with selective contributions from SK. FS, ML, AZ and SF revised the
paper and initiated a number of changes.
8.5 Acknowledgements
Not applicable.
9 References
Banerjee, S., Gelfand, A. E., Finley, A. O., & Sang, H. (2008). Gaussian Predictive Process Models for Large Spatial Datasets. Journal of the Royal Statistical Society, Series B 70(4), pp. 825-848.
Brunauer, W. A., Lang, S., Wechselberger, P., & Bienert, S. (2010). Additive Hedonic Regression Models with Spatial Scaling Factors: An Application for Rents in Vienna. Journal of Real Estate Finance and Economics 41(4), pp. 390-411.
Cao, K., Diao, M., & Wu, B. (2019). A Big Data-Based Geographically Weighted Regression Model for Public Housing Prices: A Case Study in Singapore. Annals of the American Association of Geographers 109(1), pp. 173-186.
Case, B., Clapp, J. M., Dubin, R., & Rodriguez, M. (2004). Modeling Spatial and Temporal House Price Patterns: A Comparison of Four Models. Journal of Real Estate Finance and Economics 29(2), pp. 167-191.
Clapp, J. M., & Giaccotto, C. (1998). Residential Hedonic Models: A Rational Expectations Approach to Age Effects. Journal of Urban Economics 44, pp. 415-437.
Cressie, N. (2011). Statistics for Spatio-Temporal Data. Hoboken, NJ: Wiley.
Dambon, J. A., Sigrist, F., & Furrer, R. (2019). varycoef: Modeling Spatially Varying Coefficients. Retrieved from https://cran.r-project.org/web/packages/varycoef/index.html
Dambon, J. A., Sigrist, F., & Furrer, R. (2020). Maximum Likelihood Estimation of Spatially Varying Coefficient Models for Large Data with an Application to Real Estate Price Prediction. Spatial Statistics 100470. Retrieved from https://doi.org/10.1016/j.spasta.2020.100470
Datta, A., Banerjee, S., Finley, A. O., & Gelfand, A. E. (2016). Hierarchical Nearest Neighbor Gaussian Process Models for Large Geostatistical Datasets. Journal of the American Statistical Association 111(514), pp. 800-812.
Fahrländer, S. S. (2006). Semiparametric Construction of Spatial Generalized Hedonic Models for Private Properties. Swiss Journal of Economics and Statistics 142(4), pp. 501-528.
Federal Office of Topography swisstopo. (1900). LV03. Retrieved from https://www.swisstopo.admin.ch/en/knowledge-facts/surveying-geodesy/reference-frames/local/lv03.html
Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester: Wiley.
Gelfand, A. E., & Schliep, E. M. (2016). Spatial Statistics and Gaussian Processes: A Beautiful Marriage. Spatial Statistics 18(Part A), pp. 86-104.
Gelfand, A. E., Kim, H.-J., Sirmans, C. F., & Banerjee, S. (2003). Spatial Modeling with Spatially Varying Coefficient Processes. Journal of the American Statistical Association 98(462), pp. 387-396.
Geng, J., Cao, K., Yu, L., & Tang, Y. (2011). Geographically Weighted Regression Model (GWR) based Spatial Analysis of House Price in Shenzhen. 2011 19th International Conference on Geoinformatics, pp. 1-5.
Goodman, A. C., & Thibodeau, T. G. (1995). Age-Related Heteroskedasticity in Hedonic House Price Equations. Journal of Housing Research 6(1), pp. 25-42.
Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., . . . Zammit-Mangion, A. (2019). A Case Study Competition Among Methods for Analyzing Large Spatial Data. Journal of Agricultural, Biological and Environmental Statistics 24(3), pp. 398-425.
Lasinio, G. J., Mastrantonio, G., & Pollice, A. (2013). Discussing the "big n problem". Statistical Methods & Applications 22, pp. 97-112.
Malpezzi, S., Ozanne, L., & Thibodeau, T. G. (1987). Microeconomic Estimates of Housing Depreciation. Land Economics 63(4), pp. 372-385.
R Core Team. (2020). R: A Language and Environment for Statistical Computing. Retrieved from http://www.R-project.org/
Rasmussen, C. E., & Williams, C. K. (2006). Gaussian Processes for Machine Learning. Cambridge: MIT Press.
Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., & Aigrain, S. (2013). Gaussian Processes for Time-Series Modelling. Philosophical Transactions of the Royal Society A 371: 20110550.
Rubin, G. M. (1993). Is Housing Age a Commodity? Hedonic Price Estimates of Unit Age. Journal of Housing Research 4(1), pp. 165-184.
SIA Zürich. (2003). SIA 416, Flächen und Volumen von Gebäuden. Zürich.
Tobler, W. R. (1970). A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography 46(Supplement), pp. 234-240.
van Eggermond, M., Lehner, M., & Erath, A. (2011). Modeling Hedonic Prices in Singapore. Retrieved from https://www.researchgate.net/publication/266868391_MODELING_HEDONIC_PRICES_IN_SINGAPORE
Wheeler, D. C., & Calder, C. A. (2007). An Assessment of Coefficient Accuracy in Linear Regression Models with Spatially Varying Coefficients. Journal of Geographical Systems 9(2), pp. 145-166.
Wheeler, D. C., & Waller, L. A. (2009). Comparing Spatially Varying Coefficient Models: A Case Study Examining Violent Crime Rates and their Relationships to Alcohol Outlets and Illegal Drug Arrests. Journal of Geographical Systems 11(1), pp. 1-22.
Wheeler, D. C., Páez, A., Spinney, J., & Waller, L. A. (2014). Bayesian Hedonic Price Analysis. Papers in Regional Science 93(3).
Wu, Y., Hernández-Lobato, J. M., & Ghahramani, Z. (2014). Gaussian Process Volatility Model. Advances in Neural Information Processing Systems 27 (NIPS 2014).