Simultaneous Coefficient Penalization
and Model Selection in Geographically
Weighted Regression: The Geographically
Weighted Lasso
by
David C. Wheeler
Technical Report 07-08
October 2007
Department of Biostatistics
Rollins School of Public Health
Emory University
Atlanta, Georgia
Corresponding Author: Dr. David Wheeler
Telephone: (404) 727-8059; Fax: (404) 727-1370
e-mail: dcwheel@sph.emory.edu
Simultaneous Coefficient Penalization and Model Selection in Geographically
Weighted Regression: The Geographically Weighted Lasso
Abstract. In the field of spatial analysis, the interest of some researchers in modeling relationships between variables locally has led to the development of regression models with spatially varying coefficients. One such model that has been widely applied is geographically weighted regression (GWR). In the application of GWR, marginal inference on the spatial pattern of regression coefficients is often of interest, as is, less typically, prediction and estimation of the response variable. Empirical research and simulation studies have demonstrated that local correlation in explanatory variables can lead to estimated regression coefficients in GWR that are strongly correlated and, hence, problematic for inference on relationships between variables. We introduce in this paper a penalized form of GWR called the geographically weighted lasso (GWL) that adds a constraint on the magnitude of the estimated regression coefficients to limit the effects of explanatory variable correlation. The geographically weighted lasso also performs local model selection by potentially shrinking some of the estimated regression coefficients to zero in some locations of the study area. We introduce two versions of GWL, one designed to improve prediction of the response variable and one more oriented toward constraining regression coefficients for inference. The results of applying GWL to simulated and real datasets show that this method stabilizes regression coefficients in the presence of collinearity and produces lower prediction and estimation error of the response variable than does GWR and another constrained version of GWR, geographically weighted ridge regression.
Key Words: geographically weighted regression, penalized regression, lasso, model
selection, collinearity, ridge regression
1 Introduction
In the field of spatial analysis, the interest of some researchers in modeling
relationships between variables locally has led to the development of regression models
with spatially varying coefficients. This is evidenced by the spatial expansion method
(Casetti, 1992), geographically weighted regression (GWR) designed to model spatial
parametric nonstationarity (Brunsdon et al, 1996; Fotheringham et al, 2002), and
geographically weighted regression designed to model variance heterogeneity (Páez et al,
2002). Of these, GWR as a model for spatial parametric nonstationarity has experienced
the widest application to date, at least partly due to readily available software for this
technique. One can see the similarities of GWR to nonparametric local, or locally
weighted, regression models that were first developed in the field of statistics (Cleveland,
1979; see also Loader, 1999 and Hastie et al, 2001 for more details). A clear
methodological link between local regression and GWR is found in the similarity of the
estimation procedures for loess smoothing, which is synonymous with local regression, in
Martinez and Martinez (2002, p. 292-293) and the GWR model in Fotheringham et al
(2002), which suggests viewing GWR as a local smoothing method. A key difference
between GWR and locally weighted regression is that in GWR weights arise from a
spatial kernel function applied to observations in a series of related local weighted
regression models across the study area, whereas the weights in locally weighted
regression are from a kernel function applied in variable space. Historically, GWR arose by replacing the attribute space used in locally weighted regression for curve fitting with geographical space, in order to model potentially spatially varying relationships. GWR also differs from local regression in the focus of its
typical application. Most published applications of GWR are concerned with measuring
statistically significant variation in the estimated regression coefficients and then
visualizing and interpreting the varying regression coefficients, as is in line with the
primary proposed benefit of GWR (Fotheringham et al, 2002). In contrast, local
regression is concerned with fitting a curve to the response variable (Loader, 1999, p. 19).
This difference in objectives may be summarized as one of inference on relationships in
GWR and estimation and prediction of the response variable in local regression. The
discrepancy between the principal applied focus of GWR and its methodological origins appears to be a noteworthy one, and perhaps a more appropriate use of GWR, in line with its theoretical statistical origins, is for estimation and prediction of the
response variable.
One issue of concern with GWR models expressed in the literature is with
correlation in the estimated coefficients, at least partly due to collinearity in the
explanatory variables of each local model. Wheeler and Tiefelsdorf (2005) show that
while GWR coefficients can be correlated when there is no explanatory variable
collinearity, the coefficient correlation increases systematically with increasing collinearity. The collinearity in explanatory variables can apparently be increased by the
GWR spatial kernel weights, and moderate collinearity of locally weighted explanatory
variables can lead to potentially strong dependence in the local estimated coefficients
(Wheeler and Tiefelsdorf, 2005), which makes interpreting individual coefficients
problematic. As an additional example, Wheeler (2007) applies collinearity diagnostic
tools to a Columbus, Ohio crime dataset to clearly link local collinearity to strong GWR
coefficient correlation and increased coefficient variability for two covariates at
numerous data locations with counter-intuitive regression coefficient signs.
Another issue in GWR is with the customary standard error calculations
associated with regression coefficient estimates. The standard error calculations in GWR
are only approximate due to reuse of the data for estimation at multiple locations
(Congdon, 2003; LeSage, 2004) and due to using the data to estimate both the kernel
bandwidth and the regression coefficients (Wheeler and Calder, 2007). In addition, local
collinearity can increase variances of estimated regression coefficients in the general
regression setting (Neter et al, 1996). The issue with the standard errors implies that the
confidence intervals for estimated GWR coefficients are only approximate and are not
entirely reliable for local model selection via significance tests. An issue related to
inference on the regression coefficients is that of multiple testing in GWR, where tests of
coefficient significance are carried out at many locations using the same data. One
potential solution is to use a Bonferroni adjustment to adjust the significance level of
individual tests to achieve an overall significance level.
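As a minimal illustration of such an adjustment in R (the p-values are hypothetical; p.adjust is the base R multiple-testing helper):

```r
# Hypothetical vector of local coefficient p-values, one per calibration location
p.values <- runif(49, 0, 0.10)

# Bonferroni: test each location at alpha / n to hold the overall level near alpha
alpha <- 0.05
significant <- p.values < alpha / length(p.values)

# Equivalently, adjust the p-values themselves and compare to alpha
significant.adj <- p.adjust(p.values, method = "bonferroni") < alpha
```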
There are methods in the statistical literature that attempt to circumvent
collinearity in traditional linear regression models with constant coefficients. These
methods include ridge regression, the lasso, principal components regression, and partial
least squares. Hastie et al (2001) and Frank and Friedman (1993) independently provide
performance comparisons of these methods. Ridge regression and the lasso are both
penalization, or regularization, methods that place a constraint on the regression
coefficients, and principal components regression and partial least squares are both
variable subset selection methods that use linear combinations of the explanatory
variables in the regression model. Ridge regression was designed specifically to reduce
collinearity effects by penalizing the size of regression coefficients and decreasing the
influence in the model of variables with relatively small variance in the design matrix.
The lasso is a more recent development that also shrinks the regression coefficients, but
shrinks the least significant variable coefficients to zero, thereby simultaneously
performing coefficient penalization and model selection. The name for the lasso
technique is derived from its function as a “least absolute shrinkage and selection
operator” (Tibshirani, 1996). Ridge regression and the lasso are deemed better
candidates than principal components regression and partial least squares to address
collinearity in local spatial regression models because they more directly reduce the
variance in the regression coefficients while retaining interpretability of covariate effects.
To address the issue of collinearity in the GWR framework, Wheeler (2007)
implemented a ridge regression version of GWR, called GWRR, and found it was able to
constrain the regression coefficients to counter local correlation present in an existing
dataset. Another finding was a reduced prediction error for the response variable in
GWRR compared to that from GWR. The lasso has not yet been introduced into the
GWR framework in the literature, and its implementation in GWR is the goal of this
paper. The lasso is appealing in the GWR framework due to its ability to carry out
coefficient shrinkage and local model selection, as well as for its potential to improve on
the performance of GWR for estimating the response variable, in terms of lower
prediction and estimation errors. While ridge regression in GWR has the potential to
control the variability in estimated regression coefficients, the lasso in theory should be
able to constrain the coefficients and additionally perform local model selection by
eliminating covariates from individual local models. Thus, the lasso offers a key advantage over ridge regression in the GWR framework and should lessen the reliance on
approximate confidence intervals in GWR for identification of insignificant local effects.
In this paper, we first review the GWR and lasso methods and then introduce the lasso in
the GWR framework. We then demonstrate the benefit of using the geographically
weighted lasso (GWL) through a comparative analysis with GWR and GWRR of two
existing crime datasets and simulated data.
2 Methods
Geographically Weighted Regression
In the application of GWR, data are often mean measures of aggregate data at
fixed points with associated spatial coordinates; for example, see the Georgia county
example in Fotheringham et al (2002), although this need not be the case. The spatial
coordinates of the data are used in calculation of distances that are input into a kernel
function to determine weights for spatial dependence between observations. Typically, a
regression model is fitted at each point location in the dataset, called a model calibration
location. Local regression models are related through sharing data, but the dependence
between regression coefficients at different model calibration locations is not specified in
the model. For each calibration location $i = 1, \ldots, n$, the GWR model at location $i$ is

$$y(i) = \mathbf{X}(i)\,\boldsymbol{\beta}(i) + \varepsilon(i), \qquad (1)$$

where $y(i)$ is the dependent variable at location $i$, $\mathbf{X}(i)$ is the row vector of explanatory variables at location $i$, $\boldsymbol{\beta}(i)$ is the column vector of regression coefficients at location $i$, and $\varepsilon(i)$ is the random error at location $i$. The vector of estimated regression coefficients at location $i$ is

$$\hat{\boldsymbol{\beta}}(i) = [\mathbf{X}^{T}\mathbf{W}(i)\,\mathbf{X}]^{-1}\,\mathbf{X}^{T}\mathbf{W}(i)\,\mathbf{y}, \qquad (2)$$

where $\mathbf{X} = [\mathbf{X}^{T}(1); \mathbf{X}^{T}(2); \ldots; \mathbf{X}^{T}(n)]^{T}$ is the design matrix of explanatory variables, which typically includes a column of 1's for the intercept; $\mathbf{W}(i) = \mathrm{diag}[w_{1}(i), \ldots, w_{n}(i)]$ is the diagonal weights matrix that is calculated for each calibration location $i$ and applies weights to observations $j = 1, \ldots, n$, with typically more weight applied to proximate or neighboring observations; $\mathbf{y}$ is the $n \times 1$ vector of dependent variables; and $\hat{\boldsymbol{\beta}}(i) = (\hat{\beta}_{i0}, \hat{\beta}_{i1}, \ldots, \hat{\beta}_{ip})^{T}$ is the vector of $p + 1$ local regression coefficients at location $i$ for $p$ explanatory variables and an intercept term.
The weights matrix, $\mathbf{W}(i)$, is calculated from a kernel function that places more emphasis on observations that are closer to the model calibration location $i$. There are numerous choices for the kernel function, including the Gaussian function, the bi-square nearest neighbor function, and the exponential function. The exponential kernel function is utilized in this paper. The weight from the exponential kernel function between any location $j$ and the model calibration location $i$ is calculated as

$$w_{j}(i) = \exp(-d_{ij}/\phi), \qquad (3)$$

where $d_{ij}$ is the distance between the calibration location $i$ and location $j$, and $\phi$ is the kernel bandwidth parameter.
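A minimal sketch of equations (2) and (3) in R (all names here are illustrative; no existing GWR package is assumed):

```r
# Exponential kernel weights (equation 3) between calibration location i
# and all n observations; coords is an n x 2 matrix of spatial coordinates
gwr.weights <- function(coords, i, phi) {
  d <- sqrt((coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2)
  exp(-d / phi)
}

# Local weighted least squares coefficients (equation 2) at location i;
# X is the n x (p + 1) design matrix, including a column of 1's
gwr.coef <- function(X, y, coords, i, phi) {
  w <- gwr.weights(coords, i, phi)
  XtW <- t(X * w)                  # t(X) %*% diag(w) without forming diag(w)
  solve(XtW %*% X, XtW %*% y)      # [X' W(i) X]^{-1} X' W(i) y
}
```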
To fit the GWR model, the kernel bandwidth is first estimated, often in practice by leave-one-out cross-validation (CV) across all the calibration locations. Cross-validation is an iterative process that finds the kernel bandwidth with the lowest associated prediction error of all the responses $y(i)$. For each calibration location $i$, it removes the data for observation $i$ in the model calibration at location $i$ and predicts $y(i)$ using the other data points and the kernel weights associated with the current bandwidth. An alternative to CV in kernel bandwidth estimation is the Akaike Information Criterion (AIC), as discussed in Fotheringham et al (2002). CV and the AIC are tools used in model selection, and more general information on the AIC and model selection is available elsewhere (Burnham and Anderson, 2004). It is currently unclear whether CV and the AIC will generally return the same solution, or whether one method should be favored in certain situations; the need for more research in this area is stressed by Farber and Páez (2007). Next, the kernel weights are calculated at each calibration location using the estimated bandwidth in the kernel function. Then, the regression coefficients are estimated at each model calibration location, and, finally, the responses are estimated by the expression $\hat{y}(i) = \mathbf{X}(i)\,\hat{\boldsymbol{\beta}}(i)$.
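A minimal leave-one-out CV sketch for the bandwidth, reusing the hypothetical helpers above (a simple grid search stands in for the binary search the paper uses later):

```r
# Leave-one-out CV prediction error (RMSPE) for a candidate bandwidth phi
gwr.cv.error <- function(X, y, coords, phi) {
  n <- length(y)
  pred <- numeric(n)
  for (i in 1:n) {
    w <- gwr.weights(coords, i, phi)
    w[i] <- 0                          # remove observation i from its own fit
    XtW <- t(X * w)
    b <- solve(XtW %*% X, XtW %*% y)
    pred[i] <- X[i, ] %*% b
  }
  sqrt(mean((y - pred)^2))
}

# Grid search over candidate bandwidths
phis <- seq(0.1, 5, by = 0.1)
phi.hat <- phis[which.min(sapply(phis, function(p) gwr.cv.error(X, y, coords, p)))]
```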
The Lasso
Shrinkage methods such as ridge regression and the lasso introduce a constraint on the regression coefficients. The ridge regression coefficients minimize the sum of a penalty on the size of the squared coefficients and the residual sum of squares (see Wheeler, 2007 for details). The lasso takes the shrinkage of ridge regression a step further by potentially shrinking the regression coefficients of some variables to zero. The lasso specification is similar to ridge regression, but it has an $L_1$ coefficient penalty in place of the ridge $L_2$ penalty, where $L_1$ denotes a sum of absolute values and $L_2$ denotes a sum of squared values. The lasso is defined as

$$\hat{\boldsymbol{\beta}}_{R} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n}\left(y_{i} - \beta_{0} - \sum_{k=1}^{p} x_{ik}\beta_{k}\right)^{2} \quad \text{subject to} \quad \sum_{k=1}^{p}\left|\beta_{k}\right| \le s. \qquad (4)$$
Tibshirani (1996) notes that the lasso constraint $\sum_{k}\left|\beta_{k}\right| \le s$ is equivalent to adding the penalty term $\lambda \sum_{k}\left|\beta_{k}\right|$ to the residual sum of squares, hence there is a direct correspondence between the parameters $s$ and $\lambda$ that control the amount of shrinkage of the regression coefficients. The equivalent statement for the lasso coefficients is

$$\hat{\boldsymbol{\beta}}_{R} = \arg\min_{\boldsymbol{\beta}} \left\{ \sum_{i=1}^{n}\left(y_{i} - \beta_{0} - \sum_{k=1}^{p} x_{ik}\beta_{k}\right)^{2} + \lambda \sum_{k=1}^{p}\left|\beta_{k}\right| \right\}. \qquad (5)$$
The absolute value constraint on the regression coefficients makes the problem nonlinear
and a typical way to solve this type of problem is with quadratic programming.
There are, however, ways to estimate the lasso coefficients outside of the mathematical programming framework. Tibshirani (1996) provides an algorithm that finds the lasso solutions by treating the problem as a least squares problem with $2^{p}$ inequality constraints, one for each possible sign pattern of the $\beta_{k}$'s, and applying the constraints sequentially. An even more attractive way to solve the lasso problem is proposed by Efron et al (2004a), who solve the lasso problem with a small modification to the least angle regression (LARS) algorithm, which is a variation of the classic forward selection algorithm in linear regression. The modification ensures that the sign of any non-zero estimated regression coefficient is the same as the sign of the correlation coefficient between the corresponding explanatory variable and the current residuals. Grandvalet (1998) shows that the lasso is equivalent to adaptive ridge regression and develops an EM algorithm to compute the lasso solution.
It is worthwhile to describe in more detail the LARS and lasso algorithms of Efron et al (2004a) because these methods have not been previously introduced in the geography literature at the time of this writing. The LARS algorithm is similar in spirit to forward stepwise regression, which we now describe. The forward stepwise regression algorithm is:

(1) Start with all coefficients $\beta_{k}$ equal to zero and set $\mathbf{r} = \mathbf{y}$, where $\mathbf{r}$ is the residual vector and $\mathbf{y}$ is the dependent variable vector.

(2) Find the predictor $\mathbf{x}_{k}$ most correlated with the residuals $\mathbf{r}$ and add it to the model.

(3) Calculate the residuals $\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}}$.

(4) Continue steps 2-3 until all predictors are in the model.

While the LARS algorithm is described in detail algebraically in Efron et al (2004a), Efron et al (2004b) restate the LARS algorithm as a purely statistical one with repeated fitting of the residuals, similar to the forward stepwise regression algorithm. The statistical statement of the LARS algorithm is:

(1) Start with all coefficients $\beta_{k}$ equal to zero and set $\mathbf{r} = \mathbf{y}$.

(2) Find the predictor $\mathbf{x}_{k}$ most correlated with the residuals $\mathbf{r}$.

(3) Increase the coefficient $\beta_{k}$ in the direction of the sign of its correlation with $\mathbf{r}$, calculating the residuals $\mathbf{r} = \mathbf{y} - \hat{\mathbf{y}}$ at each increase, and continue until some other predictor $\mathbf{x}_{m}$ has as much correlation with the current residual vector $\mathbf{r}$ as does predictor $\mathbf{x}_{k}$.

(4) Update the residuals and increase $(\beta_{k}, \beta_{m})$ in the joint least squares direction for the regression of $\mathbf{r}$ on $(\mathbf{x}_{k}, \mathbf{x}_{m})$ until some other predictor $\mathbf{x}_{j}$ has as much correlation with the current residual $\mathbf{r}$.

(5) Continue steps 2-4 until all predictors are in the model. Stop when $\mathrm{corr}(\mathbf{r}, \mathbf{x}_{j}) = 0 \;\; \forall j$, which is the OLS solution.
As with ridge regression, typically the response variable is centered and the explanatory variables are centered and scaled to have equal (unit) variance prior to starting the LARS algorithm. In other words, $\sum_{i=1}^{n} y_{i} = 0$, $\sum_{i=1}^{n} x_{ij} = 0$, and $\sum_{i=1}^{n} x_{ij}^{2} = 1$ for $j = 1, \ldots, m$. Efron et al (2004a) show that a small modification to the LARS algorithm yields the lasso solutions. In a lasso solution, the sign of any nonzero coefficient $\beta_{k}$ must agree with the sign of the current correlation of $\mathbf{x}_{k}$ and the residual. The LARS algorithm does not enforce this, but Efron and coauthors modify the algorithm to do so by removing $\beta_{k}$ from the lasso solution if its sign changes from the sign of the correlation of $\mathbf{x}_{k}$ and the current residual. This modification means that in the lasso solution, the active set of variables does not necessarily increase monotonically as the routine progresses. Therefore, the LARS algorithm typically takes fewer iterations than does the lasso algorithm. The modified LARS algorithm produces the entire range of possible lasso solutions, from the initial solution with all coefficients equal to zero to the final solution, which is also the OLS solution.
In some of the lasso algorithms, such as the modified LARS algorithm and the algorithm Tibshirani describes, the shrinkage parameter $s$ (or $t$) must be estimated before finding the lasso solutions. Hastie et al (2001) estimate the parameter

$$s = \frac{t}{\sum_{k=1}^{p} \left|\hat{\beta}_{k}^{\,ols}\right|} \qquad (6)$$

through ten-fold cross-validation, where $t$ is some positive scalar that reduces the ordinary least squares coefficient estimates. Tibshirani (1996) uses five-fold cross-validation, generalized cross-validation, and a risk minimizer to estimate the parameter $t$, with the computational cost of the three methods decreasing in the same order. Efron et al (2004a) also recommend using cross-validation to estimate the lasso parameter. If $s$ is one or more, there is no shrinkage and the lasso solutions for the coefficients are the least squares solutions. One can also define the lasso shrinkage parameter as

$$s = \frac{\sum_{k=1}^{p} \left|\hat{\beta}_{k}\right|}{\sum_{k=1}^{p} \left|\hat{\beta}_{k}^{\,ols}\right|}, \qquad (7)$$

where $s$ ranges from 0 to 1: 0 corresponds to the initial lasso solution with all regression coefficients shrunk to 0, and 1 corresponds to the final lasso solution, which is also the OLS solution. Then $s$ can be viewed as the fraction of the OLS solution that is the lasso solution. This is the definition of the lasso shrinkage parameter that we will use in the subsequent work in this paper.
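For concreteness, a minimal sketch of the lasso path and the fraction parameterization of equation (7), using the lars package in R on synthetic data:

```r
library(lars)

set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(3, 1.5, 0, 0, 2) + rnorm(n))

fit <- lars(X, y, type = "lasso")     # entire path of lasso solutions

# Coefficients at shrinkage fraction s = 0.5 of the full (OLS) solution;
# s = 1 recovers OLS and s = 0 shrinks all coefficients to zero
coef(fit, s = 0.5, mode = "fraction")

# Ten-fold cross-validation over the fraction s, as in Hastie et al (2001)
cv <- cv.lars(X, y, K = 10, mode = "fraction", plot.it = FALSE)
s.hat <- cv$index[which.min(cv$cv)]
```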
Geographically Weighted Lasso
The lasso can be implemented in GWR relatively easily, and the result is here called the geographically weighted lasso (GWL). An efficient implementation of the GWL outlined here uses the lars function from the package of the same name written in the R language by Hastie and Efron (see the R Project web site: http://cran.r-project.org/). The lars function implements the LARS and lasso methods, where the lasso is the default method, and details are described in Efron et al (2004a; 2004b). To make use of the lars function in the GWR framework, the $\mathbf{x}$ and $\mathbf{y}$ variables input to the function must be weighted by the kernel weights at each model calibration location. The lars function must be run at each model calibration location. This can be done in one of two ways: separate models with local scaling of the explanatory variables (GWL-local) or one model with global scaling of the explanatory variables (GWL-global). The first way, local scaling, requires $n$ calls of the lars function, one for each location; the weighted $\mathbf{x}$ and $\mathbf{y}$ are centered and the $\mathbf{x}$ variables are scaled by the norm in the lars function. This effectively removes the intercept and equates the scales of the explanatory variables to avoid the problem of different scales. The local scaling version estimates the lasso parameter to control the amount of coefficient shrinkage at each calibration location, so there is a shrinkage parameter $s_{i}$ estimated at each location $i$. Since we are working here in the GWR framework, we will estimate the model shrinkage and kernel bandwidth parameters using leave-one-out cross-validation while minimizing the root mean square prediction error (RMSPE) of the response variable. Therefore, the $n$ parameters $s_{i}$ and the kernel bandwidth $\phi$ must be estimated in GWL with CV before the final lasso coefficient solutions are estimated. We have chosen to estimate these parameters simultaneously, as the lasso solution will likely depend on the kernel bandwidth. The algorithm to estimate the local scaling GWL parameters using cross-validation is:
For each attempted bandwidth $\phi$ in the binary search for the lowest RMSPE:
  - Calculate the $n \times n$ weights matrix $\mathbf{W}$ using an $n \times n$ inter-point distance matrix $\mathbf{D}$ and $\phi$.
  - For each location $i$ from $1, \ldots, n$:
    - Set $\mathbf{W}^{1/2}(i) = \mathrm{sqrt}(\mathrm{diag}(\mathbf{W}(i)))$ and $\mathbf{W}^{1/2}_{ii}(i) = 0$, that is, set the $(i, i)$ element of the square root of the diagonal weights matrix to 0 to effectively remove observation $i$.
    - Set $\mathbf{X}_{w} = \mathbf{W}^{1/2}(i)\,\mathbf{X}$ and $\mathbf{y}_{w} = \mathbf{W}^{1/2}(i)\,\mathbf{y}$ using the square root of the kernel weights $\mathbf{W}(i)$ at location $i$.
    - Call lars($\mathbf{X}_{w}$, $\mathbf{y}_{w}$), save the series of lasso solutions, find the lasso solution that minimizes the error for $i$, and save this solution.
Stop when there is only a small change in the estimated $\phi$. Save the estimated $\phi$.

In the previous algorithm, saving the lasso solution entails saving the estimated shrinkage fraction $s_{i}$ at each location, as well as an indicator vector $\mathbf{b}$ of which variable coefficients are shrunken to zero. The algorithm uses a binary search to find the $\phi$ that minimizes the RMSPE. The small change in $\phi$ is set exogenously. The square root of the weights is used to weight the data because this is how the weights are applied to the data in the estimation of GWR regression coefficients in equation (2).
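A minimal sketch in R of one leave-one-out step of this algorithm at a single location, reusing the hypothetical gwr.weights helper (the binary search over the bandwidth and the bookkeeping for the indicator vector b are omitted; here X excludes the intercept column, since lars centers and scales internally):

```r
library(lars)

# One CV fit of GWL-local at calibration location i for a given bandwidth phi
gwl.local.cv.fit <- function(X, y, coords, i, phi) {
  w <- gwr.weights(coords, i, phi)
  w[i] <- 0                              # zero the (i, i) weight to drop obs i
  sw <- sqrt(w)                          # square root of the kernel weights
  Xw <- X * sw                           # rows of X scaled by sqrt(w)
  yw <- y * sw
  fit <- lars(Xw, yw, type = "lasso")    # series of lasso solutions
  # Pick the shrinkage fraction s_i minimizing the error for observation i
  s.grid <- seq(0, 1, by = 0.01)
  pred <- predict(fit, newx = X[i, , drop = FALSE], s = s.grid,
                  mode = "fraction")$fit
  s.i <- s.grid[which.min((y[i] - pred)^2)]
  list(s = s.i, fit = fit)
}
```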
The algorithm to estimate the final local scaling GWL solutions after cross-validation estimation of the shrinkage and kernel bandwidth parameters is:

Calculate the $n \times n$ weights matrix $\mathbf{W}$ using an $n \times n$ inter-point distance matrix $\mathbf{D}$ and $\phi$.
For each location $i$ from $1, \ldots, n$:
  - Set $\mathbf{W}^{1/2}(i) = \mathrm{sqrt}(\mathrm{diag}(\mathbf{W}(i)))$.
  - Set $\mathbf{X}_{w} = \mathbf{W}^{1/2}(i)\,\mathbf{X}$ and $\mathbf{y}_{w} = \mathbf{W}^{1/2}(i)\,\mathbf{y}$ using the square root of the diagonal weights matrix $\mathbf{W}(i)$ at location $i$.
  - Call lars($\mathbf{X}_{w}$, $\mathbf{y}_{w}$) and save the series of lasso solutions.
  - Select the lasso solution that matches the cross-validation solution according to the fraction $s_{i}$ and the indicator vector $\mathbf{b}$.
The second GWL method, global scaling, calls the lars function only one time, using specially structured input data matrices. This method fits all the local models at once, using global scaling of the $\mathbf{x}$ variables. It also estimates only one lasso parameter to control the amount of coefficient shrinkage. The weighted design matrix for the global version is an $(n \cdot n) \times (n \cdot p)$ matrix and the weighted response vector is $(n \cdot n) \times 1$. This results in an $(n \cdot p) \times 1$ vector of estimated regression coefficients. The weighted design matrix is such that the design matrix is repeated $n$ times, shifting $p$ columns in its starting position each time it is repeated. The kernel weights for the 1st location are applied to the first $n$ rows of the matrix, the weights for the 2nd location are applied to the next $n$ rows of the matrix, and so forth. The weighted response vector has the response vector repeated $n$ times, with the weights for the 1st location applied to the first $n$ elements of the vector, and so on. The algorithm to estimate the global scaling GWL parameters using cross-validation is:
For each attempted bandwidth $\phi$ in the binary search for the lowest RMSPE:
  - Calculate the $n \times n$ weights matrix $\mathbf{W}$ using an $n \times n$ inter-point distance matrix $\mathbf{D}$ and $\phi$.
  - Set the diagonal of $\mathbf{W}$ to 0.
  - Set $\mathbf{y}_{w}^{G} = \mathbf{W}^{1/2} \times (\mathbf{y}\,\mathbf{1}^{T})$ using the square root of each element of the weights matrix $\mathbf{W}$ and the column unity vector $\mathbf{1}$ of length $n$, where the operator $\times$ indicates element-by-element multiplication here. Set $k = 1$ and $m = 1$.
  - For each location $i$ from $1, \ldots, n$:
    - Set $j = k \cdot n - (n - 1)$ and $l = m \cdot p - (p - 1)$.
    - Set $\mathbf{X}_{w} = \mathbf{W}^{1/2}(i)\,\mathbf{X}$ using the square root of the kernel weights $\mathbf{W}(i)$ at location $i$. Set $\mathbf{X}_{w}^{G}(j : n \cdot k,\; l : p \cdot m) = \mathbf{X}_{w}$.
    - Set $k = k + 1$ and $m = m + 1$.
  - Call lars($\mathbf{X}_{w}^{G}$, vec($\mathbf{y}_{w}^{G}$)) and save the series of lasso solutions, where the vec() operator turns a matrix into a vector by sequentially stacking its columns, starting with the first.

In the previous algorithm, saving the lasso solution entails saving the estimated overall shrinkage fraction $s$, as well as a vector $\mathbf{b}$ that indicates which of the variable coefficients are shrunken to zero. The algorithm uses a binary search to find the $\phi$ that minimizes the RMSPE. The small change in $\phi$ is set exogenously.
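A minimal sketch in R of the stacked inputs that the single lars call receives (illustrative only; W is the n x n kernel weights matrix, with its diagonal zeroed during cross-validation, X is n x p, and y has length n):

```r
# Build the (n*n) x (n*p) block design matrix and length n*n response for GWL-global
build.global.inputs <- function(X, y, W) {
  n <- nrow(X); p <- ncol(X)
  Xg <- matrix(0, n * n, n * p)
  yg <- numeric(n * n)
  for (i in 1:n) {
    sw <- sqrt(W[, i])                    # sqrt kernel weights for location i
    rows <- ((i - 1) * n + 1):(i * n)     # block of n rows for location i
    cols <- ((i - 1) * p + 1):(i * p)     # block of p columns for location i
    Xg[rows, cols] <- X * sw
    yg[rows] <- y * sw
  }
  list(Xg = Xg, yg = yg)
}

# inputs <- build.global.inputs(X, y, W)
# fit <- lars(inputs$Xg, inputs$yg, type = "lasso")  # one call fits all locations
```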
The algorithm to estimate the final global scaling GWL solutions after cross-validation estimation of the shrinkage and kernel bandwidth parameters is:

Calculate the $n \times n$ weights matrix $\mathbf{W}$ using an $n \times n$ inter-point distance matrix $\mathbf{D}$ and $\phi$.
Set $\mathbf{y}_{w}^{G} = \mathbf{W}^{1/2} \times (\mathbf{y}\,\mathbf{1}^{T})$ using the square root of each element of the weights matrix $\mathbf{W}$ and the column unity vector $\mathbf{1}$ of length $n$, where the operator $\times$ indicates element-by-element multiplication here. Set $k = 1$ and $m = 1$.
For each location $i$ from $1, \ldots, n$:
  - Set $j = k \cdot n - (n - 1)$ and $l = m \cdot p - (p - 1)$.
  - Set $\mathbf{X}_{w} = \mathbf{W}^{1/2}(i)\,\mathbf{X}$ using the square root of the kernel weights $\mathbf{W}(i)$ at location $i$. Set $\mathbf{X}_{w}^{G}(j : n \cdot k,\; l : p \cdot m) = \mathbf{X}_{w}$.
  - Set $k = k + 1$ and $m = m + 1$.
Call lars($\mathbf{X}_{w}^{G}$, vec($\mathbf{y}_{w}^{G}$)) and save the series of lasso solutions, where vec() turns the matrix into a vector.
Select the lasso solution that matches the cross-validation solution according to the fraction $s$ and the indicator vector $\mathbf{b}$.
In comparing the local and global scaling GWL algorithms, the global GWL algorithm requires more computational time due to the matrix inversion of a much larger matrix. The global GWL algorithm must invert an $(n \cdot p) \times (n \cdot p)$ matrix, while the local GWL algorithm must invert a $p \times p$ matrix $n$ times, which is clearly faster. Considering that calculating the inverse of a general $j \times j$ matrix takes between $O(j^{2})$ and $O(j^{3})$ time (Banerjee et al, 2004), there can be quite a difference in the computation time for the two versions of GWL. In general, global GWL can take between two and three times more computation time than local GWL. In fact, global GWL may not be possible for large datasets, where large is defined relative to the computing environment, as the memory requirements of the method could exceed available computer system memory. In terms of expected model performance, the local GWL method should produce lower prediction error of the response variable than the global GWL method, as adding more shrinkage parameters generally increases model stability and hence lowers prediction error. The benefit of global GWL may be in lower estimation error of the regression coefficients, as the one shrinkage parameter may control excessive coefficient variation in GWR without stabilizing the model to the degree of local GWL. In summary, the local GWL should be faster than the global GWL and should have lower prediction error. The local and global versions of GWL will be compared empirically to each other and to GWR in the data example and simulation study in the next two sections.
3 Houston and Columbus Crime Examples
In this section, we demonstrate the use of the GWL methodology with two
existing data sets dealing with crime in Houston, TX and Columbus, OH and compare the
GWL results with those from both GWR and GWRR. Waller et al (2007) previously
analyzed violent crime incidence related to alcohol sales and drug law violations in the
Houston dataset using GWR and a Bayesian hierarchical model. The Columbus crime
dataset has been analyzed in spatial analysis work (Anselin, 1988) and in GWR-related
work (LeSage, 2004; Wheeler, 2007). Wheeler (2007) demonstrated with diagnostic tools
the presence of collinearity in a GWR model for Columbus neighborhood crime rates
using median income and housing values. We use the Columbus crime dataset here as an
illustrative example to compare model performance and select it for its problem with
collinearity in the GWR model. In analyzing the Columbus crime data, Wheeler (2007)
used a nearest neighbor bi-square kernel function with cross-validation to estimate the
GWR kernel bandwidth. In this work, we use an exponential kernel function with cross-
validation to demonstrate that the collinearity issue persists with a different kernel
function. All subsequent GWR-related models presented here use this kernel function.
Wheeler (2007) introduced the collinearity diagnostics of variance-decomposition
proportions, condition indexes, and variance inflation factors for GWR and applied them
to the Columbus crime data to illustrate collinearity issues with the GWR model. The
details for the diagnostics are available in that paper and are omitted here for brevity.
Instead, we briefly summarize the results of applying the variance-decomposition
diagnostic tool to the Columbus crime data. The GWR model is

$$y(i) = \beta_{0}(i) + \beta_{1}(i)\,x_{1}(i) + \beta_{2}(i)\,x_{2}(i) + \varepsilon(i), \qquad (8)$$

where $y$ is residential and vehicle thefts combined per thousand people for 1980, $x_{1}$ is mean income, $x_{2}$ is mean housing value, and $i$ is the index for neighborhoods. Through cross-validation, the estimated GWR kernel bandwidth is $\hat{\phi} = 1.26$. This estimated
bandwidth is used in the variance-decomposition of the kernel weighted design matrix to
assess the collinearity in the model. The variance-decomposition is done through singular
value decomposition and it has an associated condition index, which is the ratio of the
largest singular value to the smallest singular value. In diagnosing collinearity, the larger
the condition index, the stronger is the collinearity among the columns of the GWR
weighted design matrix. Belsley (1991) recommends a conservative value of 30 for a
condition index that indicates collinearity, but suggests the threshold value could be as
low as 10 when there are large variance proportions for the same component. The
variance-decomposition proportion is the proportion of the variance of a regression
coefficient that is affiliated with one component of its decomposition. In addition, the
presence of two or more variance proportions greater than 0.5 in one component of the
variance-decomposition indicates that collinearity exists between at least two regression
terms, one of which may be the intercept. Of the 49 records in the data, 6 have a
condition index above 30, 12 have a condition index above 20, and 45 have a condition
index above 10 and have large shared variances for the same component. There are many
observations with large variance proportions (> 0.5) from the same component, with the
shared component being between a covariate and the intercept for some records and
between the two covariates for other records. Of the 47 records with a large shared
variance component, 23 are with the intercept and income, 4 are with the intercept and
housing value, and 20 are between income and housing value. Overall, the diagnostic
values indicate local collinearity in the GWR model.
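A minimal sketch of these diagnostics for one location in R, assuming the decomposition is taken from the singular value decomposition of the kernel-weighted design matrix with columns scaled to unit length (following Belsley, 1991), and reusing the hypothetical gwr.weights helper:

```r
# Condition index and variance-decomposition proportions at calibration location i
gwr.collin.diag <- function(X, coords, i, phi) {
  Xw <- X * sqrt(gwr.weights(coords, i, phi))
  Xs <- sweep(Xw, 2, sqrt(colSums(Xw^2)), "/")  # columns scaled to unit length
  sv <- svd(Xs)
  k <- max(sv$d) / min(sv$d)                    # condition index
  Phi <- sv$v^2 %*% diag(1 / sv$d^2)            # Phi[j, l] = v[j, l]^2 / d[l]^2
  P <- sweep(Phi, 1, rowSums(Phi), "/")         # rows: terms, cols: components
  list(condition.index = k, variance.proportions = P)
}
```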
Due to the collinearity in the GWR model, it is beneficial to apply the GWL
models to these data and compare their performance to the GWR and GWRR models in
terms of prediction and estimation error of the response variable. The accuracy of the
estimated and predicted responses is measured by calculating the root mean square error
(RMSE) and the root mean square prediction error (RMSPE), respectively. The RMSE is the square root of the mean of the squared deviations of the estimates from the true values and should be small for accurate estimators; a minimal computational sketch is given below.
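These measures are straightforward to compute (helper names are illustrative):

```r
# Root mean square error of estimates (or predictions) against observed values
rmse <- function(estimate, observed) sqrt(mean((estimate - observed)^2))

# rmse(y.hat, y)       # RMSE(y), from in-sample estimates
# rmse(y.cv.pred, y)   # RMSPE(y), from leave-one-out predictions
```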
The results of fitting all four models to the data provide the error values in Table 1. The lowest prediction error and estimation error among the four models are listed in bold font. In this case, the constrained versions of GWR do substantially better than GWR at predicting the dependent variable, and GWL-local performs better than GWRR and GWL-global. The RMSPE for the GWL-local model is 32% lower than for GWR and 24% lower than for GWRR. For estimating the dependent variable, GWL-global performs best and substantially better than the other models. The RMSE for the GWL-global model is 17% lower than for the GWR model.
Overall, GWL performs better than both GWR and GWRR. Figure 1 shows the estimated GWR coefficients and the GWL-local coefficients for income ($\beta_{1}$) and housing value ($\beta_{2}$). The figure shows the nature of the shrinkage in the estimated GWL coefficients and how GWL enforces local model selection by shrinking some estimated coefficients to zero. In some neighborhoods, either the income or housing value has been effectively removed from the model. The estimated shrinkage parameter is $\hat{s} = 0.75$ for the GWL-global model, and the mean estimated shrinkage parameter is $\hat{s} = 0.76$ for the GWL-local model.
The Houston crime data consist of 439 census tracts in the City of Houston with
attributes from year 2000. The number of violent crimes per person in each census tract is
displayed in Figure 2. There are a few census tracts with a total number of violent crimes
that exceeds the population size. For the Houston crime data, the GWR model notation is
the same as in equation (8), but where $y$ is the number of violent crimes (murder, robbery, rape, and aggravated assault) per person, $x_{1}$ is the number of drug law violations per person, $x_{2}$ is the number of alcohol outlets per person, and $i$ is the index for census tracts. Since the distribution of the response variable is positively skewed, we use the natural logarithm of violent crimes in the model and also use the natural logarithm of both covariates to maintain linear relationships with violence rates. The estimated GWR kernel bandwidth found through cross-validation is $\hat{\phi} = 0.89$. To assess collinearity in the GWR model, we use the variance-decomposition diagnostic. The variance-decomposition
proportions and condition indexes are listed in Table 2 for records with the largest
condition indexes. These 10 records are labeled in the left plot of estimated GWR
coefficients for the drug and alcohol covariates in Figure 3. These labeled records
comprise many of the more extreme points in the plot. Observation 153 is clearly the
most extreme of the points, as it has the largest value for the drug rate effect and the
smallest value for the alcohol rate effect. In Table 2, this record has large variance
proportions for the same component for all three regression terms. Of the 439 records in
the dataset, 5 have a condition index above 30, 10 have a condition index above 20, and
41 have a condition index above 10. There are 411 records in the data with large variance
proportions (> 0.5) from the same component, with the shared component being between
a covariate and the intercept for some records and between the two covariates for other
records. Overall, the variance-decomposition proportions and condition index values
indicate the presence of local collinearity in the GWR model.
Given the presence of local collinearity in the GWR model for violent crime in
Houston, we also fit the constrained versions of GWR and compare them to GWR in
terms of model performance. The RMSE and RMSPE values for the response variable are
listed in Table 3 for the GWR, GWRR, GWL-global, and GWL-local models. As with
the Columbus crime data, the constrained versions of GWR improve on GWR in
prediction of the response variable. The GWL-local model again produces the lowest
RMSPE, 18% lower than the GWR model. In estimating violent crime, the GWL models
improve upon the GWR model. The GWL-global model produces the lowest RMSE and
its RMSE is 14% lower than in the GWR model. The estimated regression coefficients
for the GWL-global model in the right plot in Figure 3 show that the GWL model has
penalized some of the most extreme coefficients in the GWR model in the left side of
Figure 3, particularly record 153. Figure 4 displays the estimated regression coefficients
for the drug covariate from the GWL-local model plotted against the estimated
coefficients from the GWR model. This figure shows the effective shrinkage of the
GWL-local model, where the GWL-local model shrinks certain larger GWR coefficients,
some to zero. The large estimated regression coefficient for record 153 is greatly reduced in the GWL-local model. The estimated shrinkage parameter is $\hat{s} = 0.92$ for the GWL-global model, and the mean estimated shrinkage parameter is $\hat{s} = 0.65$ for the GWL-local model. The correlation in the estimated regression coefficients for the drug and alcohol covariates is -0.41 with GWR, -0.39 with GWRR, -0.37 with GWL-global, and 0.03 with GWL-local. The results with the crime data examples consistently show that the
constrained versions of GWR improve on the performance of GWR and that the GWL-
local model produces the lowest prediction error and the GWL-global model produces the
lowest estimation error.
4 Simulation Study
In this section, we use a simulation study to evaluate and compare the accuracy of
the predicted and estimated responses and the estimated regression coefficients from the
GWR, GWRR, and GWL models. We assess the accuracy of the models both when there
is no collinearity in the explanatory variables and when there is collinearity, expressed at
various levels. The expectation is that the GWL model will improve on GWR for
regression coefficient estimation when there is collinearity in the model. Another
expectation is that the GWL model will improve on GWR for prediction and estimation
of the response variable. While it has been conventional for researchers to apply a newly
introduced method to an existing dataset as a demonstration of the utility of the method,
we make use of simulated data here to learn about the performance of the method in a
comparative setting. It is necessary to use simulation in order to set the “true” values of
the regression coefficients, which are unknown with existing data, so that it is possible to
measure the deviation from the truth of the estimates from competing models. The
simulation study presented here is not intended to be exhaustive, but rather is an
appealing alternative to existing data for demonstrating the performance of the introduced
method in a certain situation.
The data-generating model in the simulation study has four explanatory variables,
with the true coefficients used to generate the data set equal to nearly zero for one
explanatory variable. The model to generate the data for this simulation study is
$$y(i) = \beta_{1}^{*}(i)\,x_{1}(i) + \beta_{2}^{*}(i)\,x_{2}(i) + \beta_{3}^{*}(i)\,x_{3}(i) + \beta_{4}^{*}(i)\,x_{4}(i) + \varepsilon(i), \qquad (9)$$

where $x_{1}, x_{2}, x_{3}, x_{4}$ are the first four principal components from a random sample drawn from a multivariate normal distribution of dimension ten with a mean vector of zeros and an identity covariance matrix, the errors $\varepsilon$ are sampled independently from a normal distribution with mean 0 and variance $\tau^{2*}$, and $i$ denotes the location. The star notation denotes the true values of the parameters used to generate the data. Note that there is no true intercept in the model used to generate the data and we do not fit an intercept in the simulation study. The data points are equally spaced on a $14 \times 14$ grid, for a total of 196 observations. The goal of the simulation study is to use the model in equation (9) to generate the data and see if the regression coefficient estimates match $\boldsymbol{\beta}^{*}$ and if the estimated and predicted responses approximate $\mathbf{y}$ for the GWR, GWRR, and GWL models. To produce comparable summary measures of deviance of the estimates and responses from the true values, we generate 100 realizations of the coefficient process, estimate the model parameters and responses for each data realization, measure the error in the estimates, and then produce average errors over the many realizations of the data. Using 100 realizations of the data-generating process is advantageous compared to one dataset because it allows us to assess model performance over 100 datasets.
Each realization of the true regression coefficients, $\boldsymbol{\beta}^{*}$, is sampled from the distribution

$$\boldsymbol{\beta} \mid \boldsymbol{\mu}_{\beta}, \boldsymbol{\Sigma}_{\beta} = N\!\left(\mathbf{1}_{n \times 1} \otimes \boldsymbol{\mu}_{\beta},\; \boldsymbol{\Sigma}_{\beta}\right), \qquad (10)$$

where the vector $\boldsymbol{\mu}_{\beta} = (\mu_{\beta_{0}}, \ldots, \mu_{\beta_{p}})^{T}$ contains the means of the regression coefficients corresponding to each of the explanatory variables, and spatial dependence in the coefficients is specified through the covariance, $\boldsymbol{\Sigma}_{\beta}$. We assume a separable covariance matrix (Gelfand et al, 2003) for $\boldsymbol{\beta}$ of the form

$$\boldsymbol{\Sigma}_{\beta} = \mathbf{H}(\gamma) \otimes \mathbf{T}, \qquad (11)$$

where $\mathbf{H}(\gamma)$ is the $n \times n$ correlation matrix that captures the spatial association between the $n$ locations, $\gamma$ is the spatial dependence parameter, $\mathbf{T}$ is a positive-definite $p \times p$ matrix for the covariance of the regression coefficients at any spatial location, and $\otimes$ denotes the Kronecker product operator, which is the multiplication of every element in $\mathbf{H}(\gamma)$ by $\mathbf{T}$. In the specification of the variance in the distribution for $\boldsymbol{\beta}$ (equation 11), the Kronecker product results in an $np \times np$ positive definite covariance matrix, since $\mathbf{H}(\gamma)$ and $\mathbf{T}$ are both positive definite. The elements of the correlation matrix $\mathbf{H}(\gamma)$, $H(\gamma)_{jk} = \rho(i_{j} - i_{k}; \gamma)$, are calculated from the exponential function $\rho(d; \gamma) = \exp(-d/\gamma)$.
For this simulation study, the true values used to generate the data are $\boldsymbol{\mu}_{\beta}^{*} = (1, 5, 5, 0)$, $\tau^{2*} = 1$, $\gamma^{*} = 10$, and $\mathbf{T}^{*} = \mathrm{diag}(0.1, 0.5, 0.5, 0.0000001)$, where diag() makes a diagonal matrix with the input numbers on the diagonal. The mean of 0 and the small variance for the fourth type of regression coefficient produce a variable effect that is effectively zero across the study area. More information regarding drawing samples from the coefficient distribution utilized here is available from Wheeler and Calder (2007). In general, as $\gamma^{*}$ increases, there are more consistent and clear patterns in the true regression coefficients. The range is the distance beyond which the spatial association becomes insignificant and is approximately $3\gamma^{*}$ with the covariance function parameterization used here, so there is some dependence in the coefficients for each covariate throughout the study area. Figure 5 illustrates the pattern in the true coefficients for two covariates for one realization of the coefficient process, and shows that there is some smoothness and spatial variation in the true coefficients when $\gamma^{*} = 10$. This pattern reflects a situation where there is spatial parametric nonstationarity, in other words, one in which GWR is intended to be applied.
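A minimal sketch of one draw from equations (10) and (11) in R (MASS::mvrnorm is one of several ways to sample; the grid and parameter values follow the text):

```r
library(MASS)

# 14 x 14 grid of locations, n = 196
coords <- as.matrix(expand.grid(x = 1:14, y = 1:14))
n <- nrow(coords); p <- 4

# Exponential spatial correlation H(gamma) and within-location covariance T
D <- as.matrix(dist(coords))
H <- exp(-D / 10)                          # gamma* = 10
Tm <- diag(c(0.1, 0.5, 0.5, 1e-7))         # T* = diag(.1, .5, .5, .0000001)

# Separable covariance via the Kronecker product, equation (11)
Sigma <- kronecker(H, Tm)                  # np x np covariance matrix
mu <- rep(c(1, 5, 5, 0), times = n)        # 1 (Kronecker) mu_beta

# One realization of the np x 1 coefficient vector, reshaped to n x p
beta.star <- matrix(mvrnorm(1, mu, Sigma), n, p, byrow = TRUE)
```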
In this simulation study, we start with no substantial collinearity in the model and systematically increase it until the explanatory variables are nearly perfectly collinear. This is done by replacing one of the original explanatory variables with one created from a weighted linear combination of the original explanatory variables, where the weight determines the amount of correlation of the variables. The formula for the new weighted variable is

$$x_{2}^{c} = c \cdot x_{1} + (1 - c) \cdot x_{2}, \qquad (12)$$
where $x_{2}^{c}$ replaces $x_{2}$ in the model in equation (9) and $c$ is a weight between 0 and 1. The simulation study is carried out with four levels of explanatory variable collinearity. The weights used in equation (12) to create the collinearity are $c = (0.0, 0.5, 0.7, 0.9)$, which coincide with explanatory variable correlations of $r = (0.0, 0.74, 0.93, 0.99)$. These levels of correlation correspond to no collinearity as a baseline, and then moderate, strong, and nearly perfect collinearity.
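A minimal sketch of equation (12) in R (x1 and x2 stand for the original principal-component variables):

```r
# Replace x2 with a weighted combination of x1 and x2 to induce collinearity
make.collinear <- function(x1, x2, c) c * x1 + (1 - c) * x2

x2c <- make.collinear(x1, x2, c = 0.7)
cor(x1, x2c)   # correlation rises with c; the paper reports r = 0.93 at c = 0.7
```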
In this study, the model parameters and responses are estimated for each realization for each of the following models: GWR, GWRR, GWL-global, and GWL-local. The kernel bandwidth is estimated for each data realization using cross-validation and is thus potentially different for each realization. To measure the accuracy of the estimated regression coefficients and estimated responses, the RMSPE and RMSE are calculated for the responses $\hat{\mathbf{y}}$ and the RMSE is calculated for the coefficients $\hat{\boldsymbol{\beta}}$ for each data realization. The average RMSE for $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{y}}$ and the average RMSPE for $\hat{\mathbf{y}}$ are then calculated by averaging the individual RMSEs and RMSPEs from the 100 realizations of the coefficients.
The average RMSPE and RMSE for $\hat{\mathbf{y}}$ and the average RMSE for $\hat{\boldsymbol{\beta}}$ for each
model are listed in Table 4. The lowest value for each error measure for each level of
variable correlation (column) is in bold font. The results in the table show that the GWL-
local model produces the lowest prediction error of the response. The GWL-local model
prediction error is approximately 20% lower on average than the GWR error. This is not
an unexpected result, as the GWL-local model adds the most local penalization
parameters to the GWR model, which should lower the prediction error by stabilizing the
model. The next best performer in terms of RMSPE of the response is the GWL-global
model. GWR has the highest average prediction error of the response at each level of
collinearity. These results demonstrate that adding penalization terms for the regression
coefficients in GWR results in lower prediction error of the response than with GWR.
The RMSE results in the table show that the GWL-local model produces the
lowest estimation error of the response at all levels of collinearity. The GWL-local
estimation error is approximately 20% lower on average than the GWR error. Overall, the
simulation study shows that the GWL models perform better than GWR in explaining the
response variable. The better performance of the two versions of GWL relative to the
GWR is not unexpected, given that the GWL methods can shrink the regression
coefficients to zero to match the true values for one of the variables in an effort to
estimate the response variable. Taken together, the results from Table 4 indicate that the
GWL-local model is best for predicting and estimating the response variable in the
presence of an insignificant explanatory variable.
The RMSE results for $\hat{\boldsymbol{\beta}}$ in the table show that the GWL-global model produces
the lowest average estimation error of the regression coefficients. The GWR model
performs next best, except when there is nearly perfect collinearity and the GWRR model
outperforms GWR considerably. An explanation for the leading performance of the
GWL-global model is that it applies moderate shrinkage to the coefficients towards zero
for the variable with true coefficients set to zero to effectively remove its effect from the
model. It strikes a balance between the stronger shrinkage of GWL-local and the weaker shrinkage of GWRR. The RMSE results for $\hat{\boldsymbol{\beta}}$ suggest that an improvement in marginal
inference on the regression coefficients in the presence of collinearity or insignificant
explanatory variables is possible with the GWL-global model used in place of GWR.
An example of the difference in the estimated coefficients from GWR and GWL-
local is illustrated in Figure 6, which displays the estimated coefficients for $\beta_{4}$ from the
GWR and GWL-local models for one realization of the coefficient process when there is
no collinearity in the model. The true coefficients for this variable are all approximately
zero, so a plot of them would be a constant white surface. Figure 6 shows that the GWL
model estimates more of the coefficients near zero for this variable through coefficient
shrinkage than does GWR. This results in lower prediction and estimation error of the
response variable.
Many times in traditional regression analyses, researchers only consider using
penalization methods, such as the lasso and ridge regression, when there are many
explanatory variables to include in the model. However, the results from this simulation
study show that one can improve on GWR in terms of prediction and estimation of the
response and estimation of the regression coefficients for even relatively small models.
There may be situations, however, where it is beneficial to use GWR without
penalization when prediction is not of primary interest, particularly for quick descriptive
analyses of spatially varying relationships in data where collinearity is not present.
However, we anticipate that the benefits of penalization in GWR for prediction will
increase with an increasing number of potentially correlated explanatory variables.
5 Conclusions
There has been an increasing interest in spatially varying relationships between
variables in recent years in the spatial analysis literature. Recent attempts at modeling
these relationships have resulted in numerous forms of geographically weighted
regression, which has technical origins in locally weighted regression. While GWR offers
the promise of an understanding of the spatially varying relationships between variables,
local collinearity in the weighted explanatory variables used in GWR can produce
unstable models and dependence in the local regression coefficients, which can interfere
with conclusions about these relationships. While GWR has been applied to numerous
real world datasets in the literature, there has been inadequate consideration of the
accuracy of inferences derived from this model and an unclear distinction as to its use for
prediction and estimation of the response variable versus its role in inference on the
relationships between variables. The work in this paper uses real and simulated data to
evaluate the accuracy of the response variable estimates and predictions provided from
GWR and constrained versions of GWR, namely geographically weighted ridge
regression and the newly introduced geographically weighted lasso models. It also
evaluates the accuracy of regression coefficients from GWR, GWRR, and the GWL
models using simulated data, while considering the presence of collinearity and an
insignificant variable.
The work presented here shows that it is possible to implement the lasso in the
geographically weighted regression framework to perform regression coefficient
shrinkage while simultaneously performing local model selection and reducing prediction
and estimation error of the response variable. The data example and simulation study
results show that the penalized versions of GWR can outperform GWR in terms of
response variable prediction and estimation, both when there is no collinearity and where
there are various levels of collinearity in the model. In both the real and simulated data,
the GWL-local model produces the lowest prediction error of the response variable
among the methods considered. For the actual data, the GWL-global model produced the
lowest response variable estimation error. Other related preliminary work (Wheeler,
2006) suggests that the geographically weighted lasso may perform better at dependent
variable estimation than a Bayesian spatially varying coefficient process (SVCP) model
(Gelfand et al, 2003) that may be viewed as an alternative to GWR. Wheeler and Calder
(2007) recently demonstrated that the SVCP model can offer more accurate coefficient
inference and lower response variable estimation error than GWR, although at a greater
computational cost. A theoretical comparison of the performance of GWR, all penalized
versions of GWR, and the SVCP model is planned for future work. In summary, the
penalized versions of GWR introduced in this paper extend the method of GWR to
improve prediction and estimation of the response variable, which is in agreement with
its statistical theoretical origins.
References
Anselin L, 1988 Spatial Econometrics: Methods and Models (Kluwer, Dordrecht)
Banerjee S, Carlin B P, Gelfand A E, 2004 Hierarchical Modeling and Analysis for
Spatial Data (Chapman & Hall, Boca Raton)
Belsley D A, 1991 Conditioning Diagnostics: Collinearity and Weak Data in Regression
(John Wiley, New York)
Brunsdon C, Fotheringham A S, Charlton M, 1996, “Geographically weighted regression:
a method for exploring spatial nonstationarity” Geographical Analysis 28(4) 281 -
298
Burnham K, Anderson D, 2004, Model Selection and Multi-Model Inference: A Practical
Information-Theoretic Approach (Springer-Verlag, Berlin)
Casetti E, 1992, “Generating models by the expansion method: applications to
geographic research” Geographical Analysis 4 81 - 91
Cleveland W S, 1979, “Robust locally-weighted regression and smoothing scatterplots”
Journal of the American Statistical Association 74 829 - 836
Cleveland W S, Devlin S J, 1988, "Locally-weighted regression: an approach to
regression analysis by local fitting" Journal of the American Statistical
Association 83(403) 596 - 610
Congdon P, 2003, “Modelling spatially varying impacts of socioeconomic predictors on
mortality outcomes” Journal of Geographical Systems 5 161 - 184
Efron B, Hastie T, Johnstone I, Tibshirani R, 2004a, “Least angle regression” Annals of
Statistics 32(2) 407 - 451
Efron B, Hastie T, Johnstone I, Tibshirani R, 2004b, “Rejoinder to least angle regression”
Annals of Statistics 32(2) 494 - 499
Farber S, Páez A, 2007, “A systematic investigation of cross-validation in GWR model
estimation: empirical analysis and Monte Carlo simulations” Journal of
Geographical Systems, forthcoming
Fotheringham A S, Brunsdon C, Charlton M, 2002 Geographically Weighted Regression:
The Analysis of Spatially Varying Relationships (John Wiley & Sons, West
Sussex)
Frank I E, Friedman J H, 1993, “A statistical view of some chemometrics regression
tools” Technometrics 35(2) 109 - 148
36
Gelfand A E, Kim H, Sirmans C F, Banerjee S, 2003, “Spatial modeling with spatially
varying coefficient processes” Journal of the American Statistical Association 98
387 - 396
Grandvalet Y, 1998, “Least absolute shrinkage is equivalent to quadratic penalization”, in
ICANN'98, Volume 1 of Perspectives in Neural Computing Eds L Niklasson, M
Boden, T Ziemske (Springer-Verlag, Berlin) pp 201 - 206
Hastie T, Tibshirani R, Friedman J, 2001 The Elements of Statistical Learning: Data
Mining, Inference, and Prediction (Springer-Verlag, New York)
LeSage J P, 2004, “A family of geographically weighted regression models” in Advances
in Spatial Econometrics. Methodology, Tools and Applications Eds L Anselin, R J
G M Florax, S J Rey (Springer Verlag, Berlin) pp 241 - 264
Loader C, 1999 Local Regression and Likelihood (Springer, New York)
Martinez W L, Martinez A R, 2002 Computational Statistics Handbook with Matlab
(Chapman & Hall, Boca Raton)
Neter J, Kutner M H, Nachtsheim C J, Wasserman W, 1996 Applied Linear Regression
Models (Irwin, Chicago)
37
Páez A, Uchida T, Miyamoto K, 2002, “A general framework for estimation and
inference of geographically weighted regression models: 1. location-specific
kernel bandwidths and a test for locational heterogeneity” Environment and
Planning A 34 733 - 754
Tibshirani R, 1996, “Regression shrinkage and selection via the lasso” Journal of the
Royal Statistical Society B 58(1) 267 - 288
Waller L, Zhu L, Gotway C, Gorman D, Gruenewald P, 2007, “Quantifying geographic
variations in associations between alcohol distribution and violence: a comparison
of geographically weighted regression and spatially varying coefficient models”
Stochastic Environmental Research and Risk Assessment 21(5) 573 - 588
Wheeler D, 2007, “Diagnostic tools and a remedial method for collinearity in
geographically weighted regression” Environment and Planning A 39(10)
Wheeler, D, Calder C, 2007, “An assessment of coefficient accuracy in linear regression
models with spatially varying coefficients” Journal of Geographical Systems 9(2)
145 - 166
38
Wheeler D, Tiefelsdorf M, 2005, “Multicollinearity and correlation among local
regression coefficients in geographically weighted regression” Journal of
Geographical Systems 7 161 - 187
39
Tables
Method RMSPE(y) RMSE(y)
GWR 11.074 2.640
GWRR 9.808 2.800
GWL - global 9.946 2.197
GWL - local 7.483 2.687
Table 1. RMSPE and RMSE of the response variable for the GWR, GWRR, GWL-
global, and GWL-local models using the Columbus crime data
ID k p1 p2 p3
1 27.60 0.996 0.995 0.136
2 87.66 0.992 0.992 0.001
5 21.29 0.995 0.993 0.188
27 24.25 0.997 0.690 0.947
33 35.45 0.865 0.949 0.045
67 29.49 0.994 0.982 0.371
114 40.58 0.739 0.988 0.283
116 39.45 0.579 0.996 0.922
153 38.38 0.737 0.999 0.609
158 21.94 0.955 0.942 0.006
Table 2. Record number, condition index (k), and variance-decomposition proportions (p1
= intercept, p2 = drug, p3 = alcohol) for the Houston crime data
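The condition indices and variance-decomposition proportions in Table 2 follow the Belsley (1991) construction. The Python sketch below shows one plausible way to compute them for a local (kernel-weighted) design matrix; the function name and the choice to report only the proportions associated with the weakest component are illustrative assumptions.

```python
import numpy as np

def belsley_diagnostics(Xw):
    """Condition number and variance-decomposition proportions for a
    (kernel-weighted) design matrix Xw that includes an intercept column.

    Returns the largest condition index k and, for each coefficient, the
    proportion of its variance tied to the weakest component."""
    # Column-equilibrate: scale each column to unit length
    Xs = Xw / np.linalg.norm(Xw, axis=0)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    k = s.max() / s.min()                      # largest condition index
    phi = Vt.T ** 2 / s ** 2                   # phi[j, l]: coef j, component l
    pi = phi / phi.sum(axis=1, keepdims=True)  # rows sum to 1
    return k, pi[:, np.argmin(s)]
```

As a rough reading guide, proportions above about 0.5 on a component with a large condition index flag the variables involved in a near collinear relationship, which is the pattern visible for most of the records in Table 2.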
Method RMSPE(y) RMSE(y)
GWR 0.720 0.342
GWRR 0.713 0.349
GWL - global 0.714 0.300
GWL - local 0.590 0.311
Table 3. RMSPE and RMSE of the response variable for the GWR, GWRR, GWL-
global, and GWL-local models using the Houston crime data
Correlation
Method r = 0.00 r = 0.74 r = 0.93 r = 0.99
RMSPE(y)
GWR 1.187 1.154 1.158 1.174
GWRR 1.187 1.153 1.158 1.168
GWL - global 1.181 1.144 1.148 1.158
GWL - local 0.928 0.932 0.954 0.959
RMSE(y)
GWR 0.856 0.873 0.869 0.860
GWRR 0.856 0.873 0.871 0.877
GWL - global 0.849 0.862 0.858 0.821
GWL - local 0.662 0.675 0.669 0.700
RMSE(B)
GWR 0.503 0.553 0.689 1.586
GWRR 0.504 0.554 0.691 1.515
GWL - global 0.499 0.549 0.686 1.513
GWL - local 1.815 2.147 2.101 1.991
Table 4. RMSPE and RMSE of the response variable and RMSE of the regression
coefficients for each model used in the simulation study at four levels of explanatory
variable correlation
Figures
Figure 1. GWR estimated coefficients (left) and GWL-local estimated coefficients (right)
for the income (B1) and housing value (B2) covariates in the Columbus crime data
Figure 2. Number of violent crimes per person in Houston in year 2000
[Scatterplots of Beta2 versus Beta1; points are labeled with the record IDs listed in Table 2]
Figure 3. GWR estimated coefficients (left) and GWL-global estimated coefficients
(right) for the drug (Beta1) and alcohol (Beta2) covariates in the Houston crime data
Figure 4. GWR estimated coefficients (x-axis) and GWL-local estimated coefficients (y-
axis) for the drug (Beta1) covariate in the Houston crime data
Figure 5. Coefficient patterns for the first two β* parameters for one realization of the coefficient process in the simulation study. The left plot is β*1 and the right plot is β*2.
Figure 6. Coefficient estimates for β4 from GWR (left) and GWL-local (right) for one realization of the coefficient process in the simulation study