Spatial Autocorrelation - Science topic

Questions related to Spatial Autocorrelation
  • asked a question related to Spatial Autocorrelation
Question
1 answer
I'm working on a data set consisting of data about geographical districts. Some of the variables are continuous, and for those I managed to calculate global and local autocorrelation using the R package spdep. The data set also has key variables on a 9-level ordinal scale. I'm confused about how to calculate autocorrelation for such data. I have understood that, at least for binary nominal data (such as presence/absence), the join count statistic would be a fitting method, but can it be extended to ordinal or multilevel nominal data?
Any help is greatly appreciated!
Relevant answer
Use multiple linear regression where nominal variables will be classified as dummy variables. I hope I helped.
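On the question of extending join counts beyond two categories, here is a minimal sketch, assuming a regular grid with rook adjacency. It tallies the joins for every pair of categories, which is the raw ingredient of a multi-category join count test. The function name join_counts and the toy map are hypothetical; in R, spdep's joincount.multi is the place to look for a tested implementation with significance tests. Note that this treats the levels as nominal and ignores their ordering.

```python
from collections import Counter

def join_counts(grid):
    """Count rook-adjacent joins for each unordered pair of categories."""
    counts = Counter()
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            # Look only right and down so that each join is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    pair = tuple(sorted((grid[r][c], grid[rr][cc])))
                    counts[pair] += 1
    return counts

# Toy 3x3 map with three categories (hypothetical data).
toy = [
    ["a", "a", "b"],
    ["a", "a", "b"],
    ["c", "c", "b"],
]
counts = join_counts(toy)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Many same-category joins relative to mixed joins point towards positive spatial autocorrelation; whether the excess is significant still has to be judged against a null distribution.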
  • asked a question related to Spatial Autocorrelation
Question
3 answers
Geospatial artificial intelligence (GeoAI) is not just a tool but a potential revolution in geospatial data analysis. It is rapidly becoming a powerful force, providing unprecedented insights into complex environmental and societal challenges. From mapping and modeling land cover changes to mapping flood-risk areas, GeoAI is set to transform geospatial decision-making, harnessing the power of machine learning (ML) and deep learning (DL) for more effective solutions.
GeoAI models can more efficiently analyze massive geospatial datasets such as high-resolution satellite imagery than traditional methods. This allows us to uncover patterns and trends that would be difficult to detect manually. It also helps automate complex tasks such as feature engineering and improve predictions. GeoAI is also valuable for solving real-world problems such as urban planning, forest conservation, disaster risk management, and climate change adaptation.
However, despite these benefits, GeoAI has its challenges. The complexity of ML and DL models often results in a lack of transparency due to their "black box" nature. This makes it difficult for users to trust the results. Data quality can significantly impact the performance of GeoAI models, leading to potential biases or inaccuracies in predictions. Furthermore, spatial autocorrelation and spatial heterogeneity are not readily incorporated into GeoAI, limiting its ability to capture the underlying spatial dynamics of geospatial data fully. Interpreting these results requires specialized knowledge, which may limit the accessibility of GeoAI for broader audiences.
Please share your experiences or thoughts on how we can effectively balance the benefits of GeoAI with the need for transparency and trust in geospatial data analysis.
Are you interested in learning more? You can download the ebook GeoAI Unveiled: Case Studies in Explainable GeoAI for Environmental Modeling here: https://aigeolabs.com/books/geoai/.
Relevant answer
Answer
Great job! Keep up the good work. This information is super helpful. I'm really looking forward to reading this book.
  • asked a question related to Spatial Autocorrelation
Question
6 answers
For my master's thesis, I am working on Mobile Laser Scanner data, and my task is the extraction of power line poles. My data set is about 10 kilometers long and contains approximately 60 power line poles. My algorithm extracted 58 of the poles correctly; the other two were not completely captured by the Mobile Laser Scanner system, so the proposed algorithm could not extract them. The proposed algorithm is fully automatic and does not need many parameters for extraction.
My main question is: what does my implementation need in order to be published in a good ISI journal?
Relevant answer
Answer
To get your thesis accepted in a reputable ISI journal, ensure that your implementation is:
1. Original: Offers a novel contribution or solution.
2. Well-Researched: Thoroughly reviews and builds on existing literature.
3. Methodologically Sound: Uses clear, reproducible methods.
4. Significant: Demonstrates clear advantages or improvements over existing work.
5. Well-Written: Communicates ideas clearly and is well-organized.
Meeting these criteria increases your chances of acceptance.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
Hello all,
I am trying to learn how to conduct a Moran's I test in R for my 4 species distribution models generated in MaxEnt. I want to be able to show that my four models hopefully show little spatial autocorrelation and do not need to be redone.
I have found lots of people discussing the packages and functions used to complete this task but no scripts that are useful to learn from. I would like to understand the meanings behind the code and how it works. I was wondering if anyone had any tips or R-scripts that would help me?
Any direct help/useful information would be greatly appreciated.
Kind regards,
William
Relevant answer
Answer
Dear William,
it is really simple, just two lines of code. If you have loaded the "raster" package, the RasterLayer containing the predictions and the RasterLayer containing the occurrence (presence/background) data, then do the following:
1) calculate the difference between the predictions and the occurrence (i.e. the residuals). Please note that this calculation assumes that all non-presence raster cells (=background cells) are absences, which is not a valid assumption in the case of a presence-only dataset.
2) calculate the global Moran's I measure of the residuals. Please note that the value of Moran's I depends on the weight matrix. Function Moran() uses queen's case by default; I recommend the rook's case weight matrix, so you should change the w parameter accordingly.
Here is a sample code:
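library(raster)
# Load the model predictions and the presence/background raster
predictions = raster("file_name_of_predictions.tif")
occurrences = raster("file_name_of_occurrences.tif")
# Residuals: prediction minus occurrence (background cells treated as absences)
residuals = predictions - occurrences
# Global Moran's I with a 3x3 rook's case weight matrix
Moran(x = residuals, w = matrix(data = c(0, 1, 0, 1, 0, 1, 0, 1, 0), nrow = 3, ncol = 3))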
HTH,
Ákos
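To make the meaning behind the call clearer, here is a minimal Python sketch of the global Moran's I statistic itself, assuming a complete grid (no missing cells) and binary rook weights. The function name morans_i_rook and the toy grids are hypothetical, and this is an illustration of the formula, not a reimplementation of raster's Moran().

```python
def morans_i_rook(grid):
    """Global Moran's I on a complete 2-D grid with binary rook weights."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    dev = [[v - mean for v in row] for row in grid]

    cross = 0.0  # sum over i, j of w_ij * z_i * z_j
    s0 = 0       # sum of all weights w_ij
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross += dev[r][c] * dev[rr][cc]
                    s0 += 1
    denom = sum(z * z for row in dev for z in row)
    return (n / s0) * (cross / denom)

# A checkerboard is perfectly dispersed; two solid blocks are clustered.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
blocks = [[0, 0, 1, 1] for _ in range(4)]
print(morans_i_rook(checkerboard))
print(morans_i_rook(blocks))
```

Values near zero suggest little spatial autocorrelation in the residuals, negative values suggest dispersion, and positive values suggest clustering.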
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I'm wondering whether you should differentiate between presence data of highly mobile species such as raptors and immobile species (e.g. plants). The dispersal of plants is limited to a certain distance, so their occurrence might be clustered because of that. Birds, on the other hand, should be able to search for suitable nesting sites. If nesting sites are close together, could that be an indicator of high suitability?
Thanks, Tim
Relevant answer
Answer
Spatial autocorrelation can have different implications and interpretations depending on the specific context and characteristics of the species being studied. It is not always considered a problem in habitat suitability modeling, but rather a phenomenon that needs to be understood and accounted for appropriately.
In the case of highly mobile species like raptors, their ability to disperse and explore larger areas may result in a more random or dispersed distribution of presence points. Spatial autocorrelation may still be present, but it might be weaker compared to immobile species due to their greater mobility. In this situation, the presence of raptors in close proximity could indicate suitable foraging or nesting areas, rather than clustering due to limited dispersal.
On the other hand, for immobile species like plants, their dispersal is often limited, and their distribution can exhibit spatial autocorrelation and clustering. This clustering can be an indicator of suitable habitat conditions, as certain environmental factors may favor the growth and survival of plants in specific locations. The limited dispersal range of plants can lead to localized colonization and establishment, resulting in spatially clustered occurrences.
When modeling habitat suitability for immobile species, it is important to consider the potential influence of spatial autocorrelation. Techniques such as spatial filtering, spatial weighting, or spatially explicit modeling approaches can be used to account for spatial autocorrelation and ensure that the modeling process appropriately captures the relationships between environmental variables and species occurrences. These techniques can help mitigate the impact of spatial autocorrelation and provide more accurate estimations of habitat suitability.
The presence of spatial autocorrelation in habitat suitability modeling should be assessed in relation to the characteristics and mobility of the species being studied. While spatial autocorrelation can be informative and indicative of suitable habitat conditions for immobile species, it may have different implications for highly mobile species. Understanding these differences and tailoring the modeling approach accordingly is crucial for accurate and meaningful habitat suitability assessments.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
A Moran I (spatial autocorrelation) has been prepared in Arc Map 10.4 and GeoDa for comparison. Please find attached those results for your valuable input.
Relevant answer
Answer
The main difference between Moran I in ArcMap 10.4 and GeoDa is the way that they calculate the spatial lag. In ArcMap 10.4, the spatial lag is calculated using a queen contiguity matrix, while in GeoDa, the spatial lag can be calculated using a variety of different matrices, including queen contiguity, rook contiguity, and k nearest neighbors.
The queen contiguity matrix is a binary matrix that indicates whether or not two cells are adjacent to each other. The rook contiguity matrix is also a binary matrix, but it only indicates whether or not two cells are horizontally or vertically adjacent to each other. The k nearest neighbors matrix is a weighted matrix that indicates the distance between each cell and its k nearest neighbors.
The choice of spatial lag matrix can affect the results of the Moran I test. For example, if the spatial lag is calculated using a queen contiguity matrix, then the Moran I test will be more sensitive to spatial autocorrelation that is present in the data. However, if the spatial lag is calculated using a rook contiguity matrix, then the Moran I test will be less sensitive to spatial autocorrelation that is present in the data.
In addition to the choice of spatial lag matrix, the results of the Moran I test can also be affected by the size of the spatial window. The spatial window is the area around each cell that is used to calculate the spatial lag. The larger the spatial window, the more likely it is that the Moran I test will detect spatial autocorrelation. However, the larger the spatial window, the more likely it is that the Moran I test will detect spurious spatial autocorrelation.
It is important to choose the spatial lag matrix and spatial window carefully when conducting a Moran I test. The choice of these parameters can have a significant impact on the results of the test.
Here are some additional tips for conducting a Moran I test:
  • Use a variety of different spatial lag matrices and spatial windows to see how the results change.
  • Use a robust Moran I test to reduce the impact of outliers.
  • Use a Monte Carlo simulation to assess the significance of the Moran I test.
I hope this helps! Please recommend my reply if you find it useful. Thanks.
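The tips about trying different weight matrices and using a Monte Carlo simulation can be sketched as follows. This is a minimal illustration, assuming a complete grid and binary contiguity weights; the names morans_i and permutation_p_value, the toy grid, and the choice of 199 permutations are all hypothetical and do not reproduce what ArcMap or GeoDa do internally.

```python
import random

ROOK = ((-1, 0), (1, 0), (0, -1), (0, 1))
QUEEN = ROOK + ((-1, -1), (-1, 1), (1, -1), (1, 1))

def morans_i(grid, offsets):
    """Global Moran's I with binary contiguity weights given as cell offsets."""
    n_rows, n_cols = len(grid), len(grid[0])
    n = n_rows * n_cols
    mean = sum(sum(row) for row in grid) / n
    cross, s0, denom = 0.0, 0, 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            z = grid[r][c] - mean
            denom += z * z
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross += z * (grid[rr][cc] - mean)
                    s0 += 1
    return (n / s0) * (cross / denom)

def permutation_p_value(grid, offsets, n_perm=199, seed=0):
    """One-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    n_cols = len(grid[0])
    observed = morans_i(grid, offsets)
    flat = [v for row in grid for v in row]
    at_least_as_large = 0
    for _ in range(n_perm):
        # Shuffle the values over the cells to simulate spatial randomness.
        rng.shuffle(flat)
        shuffled = [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]
        if morans_i(shuffled, offsets) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)

# Toy clustered pattern: left half 0, right half 1.
clustered = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
for name, offsets in (("rook", ROOK), ("queen", QUEEN)):
    i_obs, p = permutation_p_value(clustered, offsets)
    print(name, round(i_obs, 3), p)
```

Running the same data through both neighbour definitions shows how much the statistic moves with the weights, while the permutation p-value gives a significance check that does not rely on a normality assumption.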
  • asked a question related to Spatial Autocorrelation
Question
2 answers
Both Ripley's k-function and Moran's index measure the statistically significant clustering within data. However, how to know, which method is performing better for our data?
What are the advantages and disadvantages of each method which can help to choose a better method?
Relevant answer
Answer
Maybe of minor concern, but let me note that it is kind of misleading to call Ripley's K and Moran's I methods of 'cluster analysis'. I know that ArcGIS does so, but still. Traditionally, 'cluster analysis' means making groups of initially ungrouped observations within a sample. In contrast, Ripley's K and Moran's I analyse the spatial structure of the sample. They do not create groups, but describe/test aggregation of a point pattern or spatial autocorrelation of a variable.
Ripley's K and Moran's I serve different purposes. Ripley's K describes the expected number of neighbours of any point in a point pattern across different radii. You can use it to analyse point patterns in the space, whether they are aggregated, random, or systematic. The only information you use is the coordinates (or distances between them). Moran's I analyses how a measured variable is structured in space. It uses a spatial neighbourhood matrix between the observations and a measured variable for each observation. It has a global and a local variant; with the latter, you can step across different distance classes to reveal scale-dependency in the spatial correlation. If reasonable null-hypotheses are provided, both can be used for hypothesis testing.
Anyway, it would be interesting to outline a real clustering method based on the Ripley's K or Moran's I. Maybe such a method exists already.
  • asked a question related to Spatial Autocorrelation
Question
2 answers
I am trying to investigate whether the autoregressive behaviour is the result of lags (SER) or errors (SEM).
Diagnostic check on the weights matrix
The output is as below
spatdiag, weights(W)
Diagnostic tests for spatial dependence in OLS regression
Fitted model
------------------------------------------------------------
logHPRICE = SPOOL + BQUARTER + WFENCE + PSIZE + NROOMS
------------------------------------------------------------
Weights matrix
------------------------------------------------------------
Name: W
Type: Distance-based (inverse distance)
Distance band: 0.0 < d <= 10.0
Row-standardized: Yes
------------------------------------------------------------
Diagnostics
------------------------------------------------------------
Test                           |  Statistic    df   p-value
-------------------------------+----------------------------
Spatial error:                 |
  Moran's I                    |    -11.468     1     2.000
  Lagrange multiplier          |     55.698     1     0.000
  Robust Lagrange multiplier   |     55.302     1     0.000
                               |
Spatial lag:                   |
  Lagrange multiplier          |      2.544     1     0.111
  Robust Lagrange multiplier   |      2.148     1     0.143
------------------------------------------------------------
Relevant answer
Answer
The spatial error tests are significant and the spatial lag tests are not. This means you should run a spatial error model.
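The reasoning in this answer follows the commonly used decision rule for Lagrange multiplier diagnostics, which can be sketched in Python as below. The function name choose_spatial_model and the 0.05 threshold are assumptions for illustration; when both robust tests reject, the sketch simply reports the case as ambiguous instead of picking a model.

```python
def choose_spatial_model(p_lm_error, p_lm_lag, p_rlm_error, p_rlm_lag, alpha=0.05):
    """Suggest a specification from LM test p-values (illustrative rule only)."""
    error_sig = p_lm_error < alpha
    lag_sig = p_lm_lag < alpha
    if not error_sig and not lag_sig:
        return "keep OLS"
    if error_sig and not lag_sig:
        return "spatial error model"
    if lag_sig and not error_sig:
        return "spatial lag model"
    # Both plain tests reject: fall back on the robust versions.
    robust_error_sig = p_rlm_error < alpha
    robust_lag_sig = p_rlm_lag < alpha
    if robust_error_sig and not robust_lag_sig:
        return "spatial error model"
    if robust_lag_sig and not robust_error_sig:
        return "spatial lag model"
    return "ambiguous: inspect both models"

# p-values taken from the spatdiag output in the question.
print(choose_spatial_model(0.000, 0.111, 0.000, 0.143))
```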
  • asked a question related to Spatial Autocorrelation
Question
1 answer
When I looked up about the above error, certain answers suggested using memory.limit() to expand the memory, but the code is no longer supported in R. How to resolve this vector allocation issue?
Relevant answer
Answer
More than likely you have an error in your program. I doubt that you have overwhelmed the memory. Check your code. Best wishes, David Booth
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I am seeking suggestions on choosing the right projection for map shapefile data. The area extends beyond 30 degrees, which triggers an error message when running Global Moran's I for spatial autocorrelation in ArcMap 10.3. Thanks in advance.
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I am studying the effect of the land use surrounding a location on the abundance of aphids at that location. To do this, I fit a linear model with land use as the independent variable and aphid abundance as the dependent variable. To check for spatial autocorrelation, I plot the correlogram of Moran's I of the model residuals as a function of the lag distance.
However I have multiple years of data: where the aphids have been observed each year together with the surrounding land-use. How can I account for this temporal effect? Should I incorporate a 'Year' variable in the linear model and can I then just look at the correlogram of the whole dataset?
Thanks in advance.
Relevant answer
Answer
Hi,
There are many ways, but I would prefer R (RStudio) for this.
More methods can be found here:
doi: 10.1111/j.2007.0906-7590.05171.x
Cheers,
  • asked a question related to Spatial Autocorrelation
Question
5 answers
I have 39 datasets of georeferenced disease severity data for which I would like to conduct a spatial analysis. As a part of this analysis, I would like to compare the amount of spatial autocorrelation present in each dataset.
For disease incidence data (count-based data), there is the SADIE procedure, which is widely used for this kind of task. In contrast, for disease severity data (continuous data), I am not aware of a statistic that can be or is used in that kind of way. The most popular statistic, Moran’s I, seems to be solely used in an inferential kind of way (presence or absence of spatial autocorrelation).
I am aware that the spatial weights matrix used for calculation of Moran’s I complicates the comparison between datasets. But, given a somewhat constant spatial weights matrix between datasets (for example Inverse Distance Weighted?), wouldn’t it be possible to compare the results? In addition, this GeoDa video https://www.youtube.com/watch?v=_J_bmWmOF3I seems to indicate that a comparison based on standardized z-values is in principle possible. Nevertheless, I am not aware of a published study in which this kind of analysis was carried out.
Therefore I would like to ask: Does anyone know of such studies? Or maybe of another statistic that would be better suited for this kind of purpose?
Any suggestions would be greatly appreciated.
Regards,
Marco
Relevant answer
Dear Marco,
That test will give you a coefficient (-1 to +1), as you are aware. So there is no need for a special test; you can compare the coefficients directly. You can also take the z-scores into account.
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I want to calculate the slope direction for each polygon in GIS and use spatial autocorrelation analysis to find out if adjacent polygons have similar slope directions between them. I can use the spatial autocorrelation analysis tool, but the slope direction is a circular statistic, so I cannot easily use it. If anyone has experience in approaching this, I would like to know how to do it. Thank you in advance.
Relevant answer
Answer
Convert each polygon from vector to raster making sure that the aspect is retained. Then do an autocorrelation analysis of the aspect of each cell.
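Regarding the circularity raised in the question, one common workaround is to split the aspect into two linear components (often called northness and eastness) and analyse each with the usual autocorrelation tools. The sketch below assumes aspect is given in degrees clockwise from north; the function names and sample values are hypothetical.

```python
import math

def aspect_components(aspect_deg):
    """Return (northness, eastness) for an aspect in degrees clockwise from north."""
    a = math.radians(aspect_deg)
    return math.cos(a), math.sin(a)

def circular_mean_deg(aspects_deg):
    """Mean direction of several aspects, respecting the 0/360 wrap-around."""
    s = sum(math.sin(math.radians(a)) for a in aspects_deg)
    c = sum(math.cos(math.radians(a)) for a in aspects_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

# Hypothetical aspects of neighbouring polygons, all facing roughly north.
polygon_aspects = [350.0, 10.0, 5.0]
for a in polygon_aspects:
    northness, eastness = aspect_components(a)
    print(a, round(northness, 3), round(eastness, 3))
print(circular_mean_deg(polygon_aspects))
```

The arithmetic mean of 350 and 10 degrees would be 180 (south), which is why the raw angle should not be fed directly into a statistic such as Moran's I, whereas the two components can be.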
  • asked a question related to Spatial Autocorrelation
Question
2 answers
Hi esteemed scholars,
I would appreciate any constructive guidance or resources on how to resolve issues related to autocorrelation in a gravity model.
Thank you.
Ngozi
Relevant answer
Answer
Bilateral trade flows traditionally have been analysed by means of the spatial interaction gravity model. Still, (auto)correlation of trade flows has only recently received attention in the literature. This paper takes up this thread of emerging literature, and shows that spatial filtering (SF) techniques can take into account the autocorrelation in trade flows. Furthermore, we show that the use of origin and destination specific spatial filters goes a long way in correcting for omitted variable bias in an otherwise standard empirical gravity equation. For a cross-section of bilateral trade flows, we compare an SF approach to two benchmark specifications that are consistent with theoretically derived gravity. The results are relevant for a number of reasons. First, we correct for autocorrelation in the residuals. Second, we suggest that the empirical gravity equation can still be considered in applied work, despite the theoretical arguments for its misspecification due to omitted multilateral resistance terms. Third, if we include SF variables, we can still resort to any desired estimator, such as OLS, Poisson or negative binomial regression. Finally, interpreting endogeneity bias as autocorrelation in regressor variables and residuals allows for a more general specification of the gravity equation than the relatively restricted theoretical gravity equation. In particular, we can include additional country-specific push and pull variables, besides GDP (e.g., land area, landlockedness, and per capita GDP). A final analysis provides autocorrelation diagnostics according to different candidate indicators.
  • asked a question related to Spatial Autocorrelation
Question
7 answers
Hi,
I am checking for spatial autocorrelation in my dataset. It comprises the ID of the nests, the longitude and latitude for each of the nest boxes and the number of fledged chicks for each nest box. I want to know if reproductive success is spatially autocorrelated in our bird colony.
For this, I computed the distance matrix for nest boxes to know the distance between each nest box and the rest of nest boxes. Following this, I designed distance bands (distance lags) to calculate Moran's I for each lag specifically. As I have multiple data for several years (2014-2020), I wonder if there is any way to get a mean Moran's Index of all the years, instead of calculating an index for each year.
It is my first time doing these types of analysis so any advice would be very much appreciated!!
Thank you.
Iraida
Relevant answer
Answer
Hey, Iraida Redondo García sorry for the delay.
I think that the unit will depend on which unit is the coordinate, no? If it's decimal degrees, then the distance is in decimal degrees.
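If the coordinates are longitude and latitude in decimal degrees, distance bands in metres can be obtained by computing great-circle distances instead of plain Euclidean ones. A minimal sketch using the haversine formula, assuming a spherical Earth with a mean radius of 6371 km (the function name haversine_m and the sample coordinates are hypothetical):

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumption of the spherical model

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Two hypothetical nest boxes a small fraction of a degree apart.
print(round(haversine_m(-3.7000, 40.4000, -3.6990, 40.4005), 1))
```

For a small colony, projecting the coordinates to a local metric system (e.g. UTM) before building the distance matrix is an equally reasonable option.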
  • asked a question related to Spatial Autocorrelation
Question
2 answers
I am very interested in the application of seismic noise data on the earth scale, and I have obtained two data sets. I want to use the spatial autocorrelation method (SPAC) to do some experiments, but I have no experience, and I am not clear about the processing parameters such as the applicable frequency bands.
One of my data sets is linearly distributed, and the other is nested triangles. The geometric distribution is as shown in the figure, and the coordinates are latitude and longitude.
The tool I have is mainly geopsy, and I am not very skilled. Looking forward to your guidance or other tools.
Relevant answer
Answer
Peicheng Zhuang you need to select an appropriate initial 1D velocity model in cases of non-existence of a priori geophysical information.
  • asked a question related to Spatial Autocorrelation
Question
22 answers
Dear all,
I know it might also depend on the distribution/behavior of the variable that we are studying. The sample spacing must be able to capture the spatial dependence.
But since kriging depends heavily on the variance computed within each lag distance, with a small number of observations we might fail to capture the spatial dependence, because there would be few pairs of points within a specific lag distance. We would also have a small number of lags. In particular, when the points are very irregularly distributed across the study area, with many observations in one region and sparse observations in another, this will also affect the estimation of the variance across lags (different accuracy).
Therefore, I think that in such circumstances computing a semivariogram seems useless. What are the best practices if we still want to use kriging instead of other interpolation methods?
Thank you in advance
PS
Relevant answer
Answer
You need to separate two questions, first there is the number and spatial pattern of the data locations used in estimating and modeling the variogram. Secondly there is the number and spatial pattern of the data locations used in applying the kriging estimator/interpolator. These are two entirely different problems. The system of equations used to determine the coefficients in the kriging estimator only requires ONE data location but the results will not be very useful or reliable. Now you must decide whether to use a "unique" search neighborhood to determine the data locations used in the kriging equations or a "moving" neighborhood. Most geostatistical software will use a "moving" neighborhood, if you use a moving neighborhood then about 25 data locations is adequate, using more may result in negative weights and larger kriging variances. Depending on the total number of data locations and the spatial pattern there may be interpolation locations where there are less than 25 data locations. Using a "unique" search neighborhood will likely result in a very large coefficient matrix to invert.
With respect to estimating and modeling the variogram you must first consider how you are going to do this. Usually this will include computing empirical/experimental variograms but for a given data set the empirical variogram is NOT unique. It will depend on various choices made by the user such as the maximum lag distance, the width of the lag classes and whether it is directional or omnidirectional. An empirical variogram does not directly determine the variogram model type, e.g. spherical, gaussian, exponential, etc. It also does not directly determine the model parameters such as sill, range.
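The dependence of the empirical variogram on user choices can be seen in a minimal omnidirectional sketch like the one below, where lag_width and max_lag are exactly the kind of settings mentioned above. The function name empirical_variogram and the toy data are hypothetical; the pair counts it returns are what the original question worries about when observations are few.

```python
import math
from collections import defaultdict

def empirical_variogram(points, values, lag_width, max_lag):
    """Omnidirectional empirical semivariogram.

    Lag class k covers distances in [k * lag_width, (k + 1) * lag_width).
    Returns {k: (mean distance, semivariance, number of pairs)}.
    """
    sq_diff = defaultdict(float)
    dist_sum = defaultdict(float)
    n_pairs = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d > max_lag:
                continue
            k = int(d // lag_width)
            sq_diff[k] += (values[i] - values[j]) ** 2
            dist_sum[k] += d
            n_pairs[k] += 1
    return {
        k: (dist_sum[k] / n_pairs[k], sq_diff[k] / (2 * n_pairs[k]), n_pairs[k])
        for k in sorted(n_pairs)
    }

# Toy transect with a linear trend (hypothetical data).
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [0.0, 1.0, 2.0, 3.0]
vg = empirical_variogram(pts, vals, lag_width=1.0, max_lag=3.0)
for k, (h, gamma, n) in vg.items():
    print(k, h, gamma, n)
```

Changing lag_width or max_lag changes both the number of lag classes and the number of pairs in each, which is why the empirical variogram for a given data set is not unique.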
Silva's question may seem like a reasonable one to ask but it does NOT have a simple answer. Asking it implies a lack of understanding about geostatistics and kriging.
Myers, D.E. (1991). On Variogram Estimation. In Proceedings of the First Inter. Conf. Stat. Comp., Cesme, Turkey, 30 Mar.-2 April 1987, Vol II, American Sciences Press, 261-281.
Warrick, A. and Myers, D.E. (1987). Optimization of Sampling Locations for Variogram Calculations. Water Resources Research 23, 496-500.
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I have a set of data collected as part of a hydroacoustic survey: essentially, a boat drove back and forth over a harbour and took a snapshot of the fish biomass/density underneath the boat every 5 minutes using a sonar-like device. I was worried that all of these snapshots could be considered pseudoreplicates in that they wouldn't be independent of each other, i.e. fish sampled at time X could be resampled at time X+1 if they happened to move with the boat. To check for this, I performed a test of spatial independence using a Moran's I test, which came back as non-significant. I also compared the delta AICs of models that included a spatial correction and the basic model with no spatial correction, and the basic model had a lower score. Does this mean that I can consider my samples collected via the hydroacoustic survey as being independent of one another and proceed with analyses without spatial correction?
Relevant answer
Answer
I am a PhD student in geography, specifically using spatial analysis in environmental geochemistry. I would consider both the size of the data set that you collected and the research design for the fish survey (as you said, you drove back and forth). Spatial autocorrelation is an index affected by both of these, so your results may be non-significant because of them. Maybe the sampling design needs to be improved, or maybe the effect should be called spatial dependence. Here is a paper that I read, about spatial autocorrelation in ecology: . And another paper, written by my supervisor, about the effect of sample size on statistical significance: . And you can find how we interpret a significant z-score of standardized residuals in a case where we do not want one (residuals should show no spatial autocorrelation): . Hope this helps!
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Hi, I use the xsmle command and the db table (e.g. spregsemxt) function in Stata to estimate spatial panel data models. My problems are:
1. If I use the xsmle command, the output comes without information on the log-likelihood function, Moran's I, and the model selection diagnostic criteria. But we are able to specify the time-specific effect, random effects, and spatial fixed effect in the estimation.
2. If I use db table (e.g. spregsemxt), the output is complete with statistics and model selection diagnostics, but we cannot specify the time-specific effect, random effects, and spatial fixed effect in the estimation.
My question is: what is the best/correct Stata command to get output for the likelihood function, Moran's I, and model selection diagnostic criteria while at the same time being able to specify the model with time-specific effects, random effects, or spatial fixed effects?
Is there any complete reference on how to analyse spatial panel data, including model estimation and diagnostics?
Any suggestions are welcome and appreciated. Thanks.
Relevant answer
Answer
HI
These files can help you
  • asked a question related to Spatial Autocorrelation
Question
10 answers
How can I get more information about spatial panel data in Stata (command xsmle)?
I need an example.
Relevant answer
Answer
Dear Lubna,
I think that the command in Stata is the following.
In xsmle the spatial weighting matrix can be
  • a Stata matrix
  • a spmat object
In both cases the matrix can be standardized or not.
E.g.:
  • a Stata matrix can be created using matrix define, imported from Mata using st_matrix("string scalar name", real matrix), or imported from GIS software like GeoDa using spwmatrix gal using path to gal file, wname(name of the matrix)
  • spmat objects are created by spmat, e.g. spmat import name of the object using path to file
The command in Stata is the following:
* spmat dta W W*, replace
The spmat dta command allows users to store an spmat object called W in the Stata memory. Notice that, to fit a model using xsmle, one must use the spatial weight matrix as a Stata matrix or an spmat object. The following spmat entry allows users to easily summarize the W object:
* spmat summarize W, links
Because xsmle does not make this transformation automatically, the next step consists in the row-normalization of the W object. This can easily be performed using the following:
* spmat dta W W*, replace normalize(row)
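For readers unsure what row-normalization does to W, here is a small language-neutral illustration in Python (not Stata code): each row of the weights matrix is divided by its row sum, so the weights of every unit's neighbours add up to 1. The function name row_normalize and the toy matrix are hypothetical.

```python
def row_normalize(w):
    """Divide each row of a spatial weights matrix by its row sum."""
    normalized = []
    for row in w:
        total = sum(row)
        # Units with no neighbours keep an all-zero row.
        normalized.append([x / total for x in row] if total else [0.0] * len(row))
    return normalized

# Toy binary contiguity matrix for three units (hypothetical data).
W = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
W_std = row_normalize(W)
for row in W_std:
    print(row)
```

After this transformation, the spatial lag of a variable is simply the average of its values over each unit's neighbours.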
  • asked a question related to Spatial Autocorrelation
Question
1 answer
I used Generalized Estimating Equations (geepack in R) on my regularly monitored (twice a year) plant abundance data in grid cells to evaluate their trends of density.
Since these are time-series data, GEE was an ideal non-parametric method to implement, because it takes the pairwise correlations between time points (correlation structure) in account.
But my data and the residuals of GEE are spatially correlated as well.
I found the spind package for R, which handles spatial auto-correlated data in GEE, but it seems that this function only deals with spatial autocorrelation, and the available correlation structures are very restricted (e.g. cannot choose "ar1" anymore, or use my pre-calculated correlation matrix as a "fixed" ).
So GEE can only handle temporal OR spatial autocorrelation but not BOTH?
Did I get it wrong?
Is there any way to solve this in GEE?
Thank you for any idea!
Relevant answer
Answer
Hello,
"geepack" of R has limited features. Instead you can use SAS GENMOD. See user guide ;
  • asked a question related to Spatial Autocorrelation
Question
6 answers
I am working to improve a manuscript and I have been advised to provide a map showing where the correlation is significant. Any information or links to learn about this would be quite helpful. Thank you in anticipation!
Relevant answer
Answer
What you are being asked to do is create a spot map showing the areas with significant correlation. You can use a specific color to show whether each area has a significant r or not. For example, all areas with a significant r could be shown in red, while areas with a non-significant r could be shown in green. I presume you know how to calculate r and test it for significance?
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I have spatially distributed land abandonment data (binary), which I want to relate to independent variables like weather, soil characteristics etc (also spatially explicit). What are the programs in stata for performing either autoregressive probit or autologistic regression?
Thank you and regards
Rui
Relevant answer
Answer
Dear Rui,
As Cédri says, you have three libraries in R.
- McSpatial, McMillen (2013).
- spatialprobit, Wilhelm and Godinho de Matos (2013). It is a Bayesian approach.
- ProbitSpatial, Martinetti and Geniaux (2015).
Best regards
  • asked a question related to Spatial Autocorrelation
Question
6 answers
Dear All,
I used my taper data to fit the variable-form taper model Kozak 2004-2, which is a nonlinear model. The data are longitudinal, irregularly spaced and unbalanced, so we need to account for the inherent autocorrelation by using a continuous-time autoregressive error structure, CAR(). I have read some papers in which the authors use SAS/ETS to fit the models; take A. Rojo (2005), for example. In A. Rojo (2005), the author incorporated a CAR(2) error process into the models to minimize the effect of the autocorrelation inherent in the longitudinal data. I followed what Rojo describes in the paper. When I add CAR(1) to the model, I get an estimate of the autoregressive parameter ρ1. But when I add CAR(2), the estimation has difficulty converging for ρ2.
Could someone help me to incorporate CAR(2) into Kozak 2004-2?
I attach the paper by A. Rojo (2005). Thank you very much.
Here is my SAS code:
proc import out=work.taper
datafile='E:/zzs7.csv' dbms=csv replace; getnames=yes;
RUN; /*read data */
data fit_taper;set taper;
if p="f" then output fit_taper;
run;/*Select data for fitting*/
PROC model data=fit_taper method=marquardt sur dw collin;
exogenous bolt tht dbh;
endogenous dob ;
parms b0 0.9884 b1 0.9478 b2 0.0735 b3 0.4884 b4 -0.9783 b5 0.5511 b6 0.1 b7 0.0389 b8 -0.1579 p1 0.8 ; /* starting values */
dob=b0*(dbh**b1)*(tht**b2)*((1-(bolt/tht)**(1/3))/(1-(1.3/tht)
**(1/3)))**(b3*(bolt/tht)**4+b4*(1/exp(dbh/tht))
+b5*((1-(bolt/tht)**(1/3))/(1-(1.3/tht)**(1/3)))**0.1
+b6*(1/dbh)+b7*tht**(1-(bolt/tht)**(1/3))
+b8*((1-(bolt/tht)**(1/3))/(1-(1.3/tht)**(1/3)))); /*Kozak2004-2*/
fit dob ;
run;
Relevant answer
Answer
The use of temporal autocorrelation models is not a satisfactory solution for taper data. This is because the errors have a certain structure: for example, the residual at a height of 1 meter is negatively correlated with the residual at 2 meters, the residual at a height of 1.3 meters is always zero and, in general, residuals below breast height are negatively correlated with residuals above breast height. We have some discussion and references about this in chapter 12 of Mehtätalo & Lappi (2020).
  • asked a question related to Spatial Autocorrelation
Question
6 answers
After exploring my dataset for Ph.D. thesis and learning several spatial econometric techniques, I successfully applied ordinary least squares (OLS), logistic regression, Spatial Autoregressive models [i.e., Spatial Lag model(SLM), Spatial Error Model(SEM), Spatial Durbin Model(SDM)], and most importantly Geographically Weighted Regression (GWR), and Geographically Weighted Logistic Regression models to find evidence of spatial and socioeconomic inequality in flood risk. The performance of all regression models was significantly improved when I accounted for spatial heterogeneity at the local level compared to non-spatial global models such as OLS and logistic regression.
I am amazed that several research papers have been published so far in high-ranking journals based on global regression results only, which I could have done a couple of months ago. The results do not make sense because spatial heterogeneity could prevail in flood exposure. In my view, flood exposure and/or the effects of flood risk cannot be locally independent across census tracts, dissemination areas or census subdivisions; they must be spatially autocorrelated. There remain ripple effects, spillover effects or indirect effects on adjacent neighbourhoods and on the overall economy. Populations from affected or flooded neighbourhoods could move to nearby safer neighbourhoods, looking for jobs and safe accommodation. Many other indirect socio-demographic effects could prevail around the flooded neighbourhoods. Do you agree? Please justify your response.
Relevant answer
Answer
GWR is a good method for taking heterogeneity at the local level into account. I have used GWR quite a bit, since it provides local regression coefficients. Also try the GeoDa program developed by Prof. Anselin.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
I know that this is a fact, but I need evidence in the form of a scientific paper.
I looked it up in books, Google Scholar and so on, but never found anything reliable.
Maybe someone has something in mind and can help me out.
Thank you!
Relevant answer
Answer
As mentioned above, most solutions are relatively straightforward. If the serial correlation is in the cross-section (clustering), most software will readily calculate cluster-adjusted standard errors. If the serial correlation is in the time dimension, my advice is usually to get rid of as much of the time dimension as possible (i.e. collapse to a cross section), but there may be alternative solutions. Do not try and apply the weird adjustments used in time-series analysis.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
Hi, I want to compare three methods of spatial analysis and examine their application in crash/accident analysis. These three methods are: Cluster/Outlier Analysis with Moran's I statistic, Hot Spot Analysis with Getis-Ord, and kernel density estimation.
What do you think are the features of these methods? What are the differences between them? (For crash/accident analysis).
Relevant answer
Answer
Moran's I can provide spatial autocorrelation which is not as useful as kernel density which can provide "heat maps" of problem areas. Please look at some of my publications in RG where there is a presentation on traffic crashes in a state as well as kernel density applications in other work.
  • asked a question related to Spatial Autocorrelation
Question
6 answers
Hey everyone, I hope someone can help me. Please!
I've carried out a spatial PCA using the adegenet package in R, following Dr. Jombart's tutorial (i.e. NAs in the data replaced by the mean allele frequency, etc.). My problem is with the interpretation of the variance explained by each component... Obviously it's not like a regular PCA, where you need all the components together in order to explain 100% of the variance in the data.
Here in sPCA it's easy to see (in the screeplot for example) that combining just a couple of principal components exceeds 100% of the variance.
Showing the summary for the sPCA:
[Call: spca.genind(obj = mi_genind, xy = mi_genind$other$xy, cn = data.graph,
scannf = FALSE, nfposi = 2, nfnega = 0)]
Scores from the centred PCA
            var        cum        ratio        moran
Axis 1   1.184406   1.184406   0.07550004   0.3562353
Axis 2   1.022800   2.207206   0.14069851   0.1799373
sPCA eigenvalues decomposition:
            eig          var         moran
Axis 1   0.15675044   1.0088656   0.6214918
Axis 2   0.08220275   0.7455009   0.4410605
###################################################
So I want to have some sort of idea whether this analysis is meaningful to explain the pattern in variability. As Jombart says in the tutorial: "The maximum attainable variance by a linear combination of alleles is the one from an ordinary PCA, indicated by the vertical dashed line on the right [of the screeplot]". I could take that value as my 100% variance and calculate the percentage explained by my Axis 1 on the sPCA... but I'm still confused because doing this to just a couple of principal components and then combining them would exceed 100% of variance explained.
Thanks for any help you can give me!
Relevant answer
Answer
The principal components (PCs) with the largest eigenvalues are the most important and explain the largest share of the variation in the data set. The eigenvalue is the measure of importance of a PC. The first and second PCs usually account for most of the total variance in the data set. You can also consider the eigenvectors of the variables under the PCs to determine and evaluate the parameter loadings/significance.
Kindly go through the attached article for reference.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
Does anyone have an example to share on a statistical model that incorporates temporal and spatial autocorrelation terms simultaneously?
Examples in ecology and hydrology research would be optimal. Thanks in advance.
Relevant answer
Answer
You might want to check out the following. The first is an interesting article/tutorial that covers how to use R-INLA for spatial and spatiotemporal modelling in R.
The second is an extremely useful textbook you should definitely get hold of. You will find what you need in chapters 6 and 7. Beneath it is a link to the data and code used in all chapters.
Most examples are epidemiological but highly applicable to your area of expertise.
Anwar.
  • asked a question related to Spatial Autocorrelation
Question
7 answers
I am working on regression modeling where geographical units (zones) are the observations. I found there is significant spatial autocorrelation in the response variable (based on Moran's I). However, when I develop an ordinary least squares regression model, the model residuals do not show significant spatial autocorrelation (based on Moran's I). In this case, do I need to go for a spatial regression model?
Relevant answer
Answer
Dear Mouni, thanks for your comment: it makes me realize that I reversed SAC in the residuals and SAC in the response.
Let us start from the basic assumption of OLS.
A basic assumption of OLS is that E(e_i|X) = 0, for all i.
When residuals are spatially autocorrelated, this assumption is not satisfied.
So, SAC in the residuals makes OLS biased and inconsistent.
When the response is spatially autocorrelated, no basic assumption is violated, but a desirable one is: that of no collinearity among the explanatory variables, since X and WY are likely correlated.
So, when you have SAC in the response, OLS is not biased, but you may have wrong standard errors.
All in all, the reviewer is partially right: ignoring SAC in the residuals (when you have it) is a big mistake, while ignoring SAC in the response (when you have it) is a moderate mistake.
About Moran's I, you should look at the p-values; the Moran index is not informative per se.
I hope this helps,
Best, Rodolfo
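The distinction discussed in this thread, Moran's I of the response versus Moran's I of the OLS residuals, can be illustrated with a small sketch. Everything here is hypothetical: 30 zones along a line with binary adjacency weights, a covariate that changes smoothly in space, and independent noise. It only shows that a response can be strongly autocorrelated while the residuals are not, once the spatially structured covariate is in the model.

```python
import random

def morans_i(values, weights):
    """Global Moran's I for a list of values and a dense weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)

def chain_weights(n):
    """Binary adjacency weights for n zones arranged along a line (assumed layout)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w

rng = random.Random(42)
n = 30
x = [float(i) for i in range(n)]               # smooth spatial covariate
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]   # response = trend + independent noise

# Simple OLS of y on x (closed form for one predictor).
mx, my = sum(x) / n, sum(y) / n
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

w = chain_weights(n)
i_response = morans_i(y, w)           # high: y inherits the spatial pattern of x
i_residuals = morans_i(residuals, w)  # much lower: the covariate absorbed the pattern
```

In R, the residual-based check corresponds to what `lm.morantest` in spdep reports, as mentioned elsewhere on this page.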
  • asked a question related to Spatial Autocorrelation
Question
5 answers
I'm working with a mechanistic model to predict mosquito abundance with temperature input. I also have a raster with land-use classification for the same area, and I now want to know how land use dictates mosquito abundance (as land use influences micro-climate). I think I should correct for spatial autocorrelation, but am struggling with how best to do that. One of my ideas is to find out up to what lag distance autocorrelation is present/significant and then do an ANOVA + post-hoc test only on cells sampled x lag distance apart.
However, I'm confused about the different approaches amongst different functions/packages in R. On the one hand I tried lm.morantest, which computes Moran's I over the residuals of a linear model (in my case mosquito vs land-use). On the other hand I came across the raster package function , which calculates Moran's I just over the variable values (so either autocorrelation within the mosquito raster or in the land-use raster). Also sp.correlogram from the spdep package takes a variable vector as input instead of a linear model. The latter two functions give almost double the amount of autocorrelation compared to the first method.
So, I think I understand why you would check for autocorrelation in the residuals, but why in the variable input? Part of the correlation you're finding with the latter method might already be explained by your other variables as you're doing for example in linear regression? Are there options for doing something like a correlogram but with model residuals instead of variable input?
Relevant answer
Answer
To check for spatial autocorrelation between two variables, look up bivariate Moran's I in GeoDa; it's very easy to do. You can check our publication, which uses it: http://mires-and-peat.net/pages/volumes/map26/map2604.php
  • asked a question related to Spatial Autocorrelation
Question
8 answers
I am trying to find the best way to calculate energy poverty/consumption on a spatial basis using ArcMap. What type of data do I need, and what are the best methods to do so?
Please, any suggestions of publications also can help!
Relevant answer
Answer
For any spatial data, you can calculate Moran's I, which tests for the presence of spatial autocorrelation. This can be done in ArcGIS, GeoDa, and R.
You may go through the link below for calculating Moran's I using GeoDa: http://geodacenter.github.io/workbook/5a_global_auto/lab5a.html#the-moran-scatter-plot
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Hello! I am looking to use a large brown bear telemetry dataset to create a species distribution model (SDM) using MaxEnt and Wallace (R package). I currently have over 50,000 GPS points from 17 different animals, gathered at different points in time, all collected in Greece. I am trying to figure out the best way of handling the data in terms of autocorrelation. I was wondering if any of you have any advice on how autocorrelation is tested and managed in such datasets for the creation of SDMs.
Firstly I am unsure whether checking and handling autocorrelation is at all necessary for SDMs given that what I am looking to create is a suitability model for bears in Greece - wouldn't larger use of an area correlate to higher suitability in this case? I don't want to end up thinning the data in a way that excludes these habitat preferences.
I was also unsure on how Spatial autocorrelation differs from Temporal autocorrelation in this case?
Any advice would be very much appreciated.
Thank you,
Angeliki
Relevant answer
Answer
Dear Angeliki, in your analysis spatial auto-correlation refers to the data set from each of your individual cameras, and temporal auto-correlation refers to the set of data measurements made at a given point in time from all cameras. The term auto-correlation indicates the process of statistical analysis of a valid data set whose quantifying variables are a defining constant.
Clearly, in a given subset of the GPS points, bear No. is a defining constant, and so forth. As far as I understand the statistics in question, the SDM (Squared Distance Mean) may be used in your analysis. The concept of auto-correlation would indeed help if the bears were interacting territorially.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
Using GeoDa software, I have run a spatial error model and obtained coefficients for each of my independent variables, in addition to constant and lambda variables. I am now wondering how to write my spatial error equation. Any help would be appreciated.
Relevant answer
Answer
There are different ways of incorporating spatial autocorrelation into regression models through the error term. One method for instance that works with linear regression models is Simultaneous Autoregressive Model (SAR).
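For the question above, a spatial error model of the kind GeoDa reports is conventionally written as below. The notation is an assumption of this sketch rather than GeoDa output: W stands for the spatial weights matrix supplied to the software, β collects the constant and the coefficients of the independent variables, and λ is the reported lambda.

```latex
\begin{aligned}
y &= X\beta + u, \\
u &= \lambda W u + \varepsilon, \qquad \varepsilon \sim (0, \sigma^2 I), \\
\text{so that}\quad y &= X\beta + (I - \lambda W)^{-1}\varepsilon .
\end{aligned}
```

Written out for a single observation, the first line is the usual regression equation with the estimated coefficients, and the second says that each location's error is λ times the weighted average of its neighbours' errors plus independent noise.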
You can use GeoDa or R to run Lagrange multiplier tests, which help to choose a specification that corrects the regression coefficients in the presence of spatial autocorrelation.
Also, although written for beginners to spatial statistics, this chapter and the references there may help you:
  • asked a question related to Spatial Autocorrelation
Question
8 answers
Obviously, the world is watching COVID-19 transmission very carefully. Elected officials and the press are discussing what "the models" predict. As far as I can tell, they are talking about the SIR model (Susceptible, Infected, Recovered). However, I can't tell if they are using a spatial model, or whether the spatial model they are using is point pattern or areal. This is critical because the disease has very obvious spatial autocorrelation and clustering in dense urban areas. However, there appears to be a road network effect and a social network effect. For example, are they using a Bayesian maximum entropy SIR? Or a conditional autoregressive Bayesian spatio-temporal model? An agent-based model? A random walk?
I mean "they" generally. I'm sure different scholars are using different models, but right now I think I can find one spatio-temporal model, and what these scholars meant is that they did two cross sectional count data models (not spatial ones either) in two different time periods.
Relevant answer
  • asked a question related to Spatial Autocorrelation
Question
5 answers
Hi,
I have long-term point data on the occurrence of a spatial event. I want to analyze the long-term spatial variability of these events.
Can you suggest any spatio-statistical method for such a variability analysis?
Your suggestions will be appreciated.
Thank you
Relevant answer
Answer
Dear Somnath,
I suggest checking the space-time pattern mining tools in ArcMap.
Best,
Behzad
  • asked a question related to Spatial Autocorrelation
Question
10 answers
My observations are points along a transect, irregularly spaced.
I aim at finding the distance values that maximize the clustering of my observation attribute, in order to use it in the following LISA analysis (Local Moran I).
I iteratively run Global Moran I function with PySAL 2.0, recreating a different distance-based weight matrix (binary, assigning 1 to neighbors and 0 to not neighbors) with a search radius 0.5m longer at every iteration.
At every iteration, I save the z_sim, p_sim and I statistics, together with the distance at which these statistics were computed.
From this information, what is the best strategy for finding distances that potentially reveal underlying spatial processes that (pseudo-)significantly cluster my point data?
PLEASE NOTE:
  • Esri style: the ArcMap Incremental Global Moran's I tool identifies peaks of z-values where p is significant as interesting distances
  • Literature: I found many papers that simply choose the distance with the highest absolute significant value of I
CONSIDERATIONS
Because the number of observations in each neighborhood changes with the search radius, the weight matrix changes as well, so the I values are not comparable across distances.
Relevant answer
Answer
Hi everyone,
after a little research, I finally came up with the answer I was looking for.
Short answer:
When using the Global Moran's I index (I) with incrementally increasing search distances (thus changing the weight matrix at every iteration), only the z-values are independent of both the weight matrices and variations in variable intensity; thus, they are comparable across multiple analyses.
The I in Moran's I statistics is not comparable across analyses, i.e. if I=0.3 at a distance of 10 m and I=0.6 at a distance of 15 m, we cannot say that the clustering strength is double at 15 m.
We can only say that in both cases there is positive (sign of I) spatial autocorrelation.
For the strengths, we use the z-values.
That is why ESRI plots distances on the x-axis and z-values on the y-axis, indicating significant peaks (p-value below the specified significance level) as interesting distances.
For more information, this is clearly explained by Luc Anselin in this Global Autocorrelation class, given in 2016 at the University of Chicago.
Follow from minute 38, when he talks about the permutation approach.
Enjoy!
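The incremental procedure described in this thread can be sketched in plain Python. This is an illustration of the idea only, not the PySAL or ArcMap implementation: the transect positions and attribute values are made up, the weights are binary distance bands, and `z_sim` and `p_sim` are computed from a simple one-sided permutation test (the names merely echo the ones used in the question).

```python
import random
from statistics import mean, pstdev

def band_weights(positions, radius):
    """Binary weights: 1 if two distinct points lie within `radius` along the transect."""
    n = len(positions)
    return [[1.0 if i != j and abs(positions[i] - positions[j]) <= radius else 0.0
             for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Global Moran's I for a dense weight matrix."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)

def moran_permutation(values, weights, n_perm=499, seed=0):
    """Observed I, permutation z-score and one-sided pseudo p-value."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        sims.append(morans_i(shuffled, weights))
    z_sim = (observed - mean(sims)) / pstdev(sims)
    p_sim = (sum(1 for s in sims if s >= observed) + 1) / (n_perm + 1)
    return observed, z_sim, p_sim

# Hypothetical irregular transect (metres) with alternating high and low patches.
positions = [0.0, 0.4, 1.1, 1.5, 2.3, 2.9, 3.2, 4.0, 4.6, 5.1,
             5.9, 6.3, 7.0, 7.4, 8.2, 8.8, 9.1, 9.9, 10.5, 11.0]
values = [9, 10, 11, 10, 9, 10, 2, 1, 3, 2,
          1, 10, 9, 11, 10, 9, 1, 2, 3, 2]

radii = [1.0, 1.5, 2.0, 2.5, 3.0]
results = []
for radius in radii:
    i_obs, z_sim, p_sim = moran_permutation(values, band_weights(positions, radius))
    results.append({"radius": radius, "I": i_obs, "z_sim": z_sim, "p_sim": p_sim})

# Candidate distance: the largest z-score among the significant radii, if any.
significant = [r for r in results if r["p_sim"] < 0.05]
best = max(significant, key=lambda r: r["z_sim"]) if significant else None
```

Comparing `z_sim` rather than `I` across radii follows the reasoning given in the answer above; whether the global maximum or the first peak is the more meaningful distance remains a judgment call.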
  • asked a question related to Spatial Autocorrelation
Question
4 answers
In detail, I know the 'spdep' package in R can do it, but I don't know what format the data should be in before analysis in R, and I am missing some analysis code. I hope someone can help me, thank you!
Relevant answer
Answer
Hi Haibin
ESRI ArcGIS has an autocorrelation analysis tool; you will get the interpretation of the result in HTML format.
  • asked a question related to Spatial Autocorrelation
Question
1 answer
Hi all.
I have a panel dataset containing 5 independent variables (X) and a single dependent variable (y). The dataset is based on repeated observations on a spatial grid (T = 78, n = 686, N = 53508, where T is the number of months, n is the number of grid cells and N = T × n is the total number of observations).
I believe that y can be expressed as a function of X, but I don't know if the coefficients of such a model are static or if they too vary with T and n (in theory, the coefficients are likely to vary with n and potentially T, but I don't know if my dataset has enough observations to support either possibility).
To start with I have tried constructing a fixed-effects model using the plm R library, where n is the fixed effect. I get a reasonable R2 and all the variables are statistically significant. As well as this, the Hausman test suggests that the fixed effects model is better than a random effects one.
However, I have run a Breusch-Godfrey and Pesaran CD test, which say that my model suffers from serial correlation and cross-sectional dependence. As this is my first attempt at regression modelling I am not sure how to remedy this. What should I do to make my model more robust, and is there a way to test whether the fit coefficients vary with at least n? Is there a better way to fit a spatio-temporal model in R? Thanks in advance!
Relevant answer
Answer
Have a look at
which covers both panel data (serial auto-correlation) and spatial dependence. As you are an academic in the UK, you can access MLwiN for free. You can stay in the R environment and use R2MLwiN to use the power and flexibility of the bespoke software.
and on panel data per se
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Hello everybody !
I’m currently doing a master's thesis on a bird, the Corn bunting, more specifically on the effect of agri-environment schemes (AES) on the distribution of the bird's territories during the breeding season in Belgium.
But now it’s time to work on the statistical analysis, and I’m not the best in this field.
First, I’ve used sampling quadrats that are separated from each other to avoid spatial autocorrelation, and each quadrat has been visited only once. I’ve created a matrix in which I have, for each quadrat:
- the surface area of 6 categories of fields,
- the number of Corn bunting encounters (and presence/absence),
- the surface area of a number of AES.
For the analysis, I’m treating each quadrat as the territory of the birds inside it (this can bias the results, but it’s necessary because of the lack of time).
I now have several questions:
- Is a GLM adequate for this analysis, or should I use a GLMM?
- If I’m working with the abundance of Corn bunting, is the Poisson distribution the best one to use? Are there any parameters to set?
- If I’m working with presence/absence, is the binomial distribution a good one? Are there any parameters to set?
- Is it better to work with abundance or with presence/absence? I have always heard that you lose information when you change from abundance to presence/absence...
- I’m using R for this statistical analysis; is the command « step » the best one to find the best model (by selecting the « both » direction)?
Thank you for your time and your precious help !
Relevant answer
Answer
I should say that I generally agree with Dr. Fernandes with 2 caveats:
1
  • asked a question related to Spatial Autocorrelation
Question
3 answers
I'm currently working on a project (I work in the TV industry) to predict the launch day's reach % from data in the past year.
After cutting out all the variables that have Pearson correlations >0.5, I narrowed it down to 2-3 predictor variables.
However, the DW test statistic turns out to be <0.5 in most markets, and I suspect a funnel shape in my scatter plot of standardized residual values against standardized predicted values.
Although I know that these campaigns were launched on particular dates in the past year, I do not consider them as time series data because the time intervals are not equally spaced (new shows can be launched any time of the year, though there may be seasonality in the trend due to festivals etc.)
How do I resolve this issue of heteroskedasticity and autocorrelation in such a case?
Relevant answer
Answer
If the ones you describe as continuous are not times series I think the answer is yes. Check my suggestions on references but then you probably need a real time series person which you can find on this site. Best wishes, David
  • asked a question related to Spatial Autocorrelation
Question
8 answers
Dear All,
I think the question is clear.
Is it necessary to perform Global Moran's statistic before Anselin Local Moran's I to measure spatial autocorrelation?
I mean, is it true that we should always perform Global Moran's I first and only use the Local Moran's test if the p-value is significant?
Thank you.
Relevant answer
No. Local Moran and Global Moran are used for different purposes, and the choice depends on the assumptions you are making in your research. Global Moran implies that one single statistic can account for all of your data. Local Moran will return local clusters that may or may not be correlated.
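How the two statistics relate can be shown with a short sketch. It assumes the common formulation in which each local statistic is I_i = (z_i / m2) Σ_j w_ij z_j with m2 = Σ z_i² / n, so that the global I equals the sum of the local values divided by the sum of all weights. The data are hypothetical: eight zones along a line, with a single high-value cluster at one end.

```python
def local_morans(values, weights):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j, with m2 = sum(z^2) / n."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    m2 = sum(d * d for d in z) / n
    return [z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n))
            for i in range(n)]

def global_morans(values, weights):
    """Global Moran's I computed directly from its definition."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)

# Hypothetical zones along a line, binary adjacency weights.
values = [1, 1, 1, 1, 1, 1, 9, 10]
n = len(values)
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

local = local_morans(values, weights)
s0 = sum(sum(row) for row in weights)
global_i = global_morans(values, weights)
global_from_local = sum(local) / s0

# The zone contributing most to the global statistic (here the high-value end).
hotspot = max(range(n), key=lambda i: local[i])
```

Because the global value is an average of local contributions, a modest global I can coexist with a few strong local clusters, which is one reason the local analysis need not be gated on the global test.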
  • asked a question related to Spatial Autocorrelation
Question
3 answers
I have sulphate (pct) and total sulphur (pct) values for numerous samples. I want to see how they relate, and the spatial distribution of the changing ratios in the area. I plotted an x,y graph and drew a trendline, which did not pass through (0,0). Should I force it through zero, because when there is no total S there should be no sulphate? The next step is estimating the spatial distribution. If the line passes through 0, I will be working on modelling the ratios (SO4/S).
Relevant answer
Answer
Thanks for answers,
Actually, I exclude from the analysis the values below a cutoff (2 pct S by mass) and extremely high values. In this case, the plot seems to have a linear trend. When I fit a linear trendline, the result is SulphateSulphur(pct) = 0.8 × Sulphur(pct) - 1. It says that 80% of total S belongs to sulphate molecules, but what does the fixed value of 1 mean here? Another problem is that the number of S measurements is very high around the lower values. I think they dominate the trendline.
For the spatial distribution, I am wondering if I can plot the distribution of (SSpct+1)/(Spct) ratios over an area and classify possible domains using statistical tools. That's why I am inclined to choose linear regression.
Other than that, if you can suggest a better statistical approach, it will be very much appreciated.
Best regards
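The effect of forcing the trendline through the origin can be sketched with a few lines of Python. The numbers below are hypothetical and were chosen only so that the free fit reproduces the line quoted above (slope 0.8, intercept -1); they are not the poster's data.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def fit_through_origin(x, y):
    """Least squares for y = b * x (intercept forced to zero); returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical total-S values above the 2 pct cutoff, and sulphate-S on the quoted line.
total_s = [2, 3, 4, 5, 6, 8, 10]
sulphate_s = [0.8 * s - 1 for s in total_s]

intercept, slope = fit_with_intercept(total_s, sulphate_s)
slope_origin = fit_through_origin(total_s, sulphate_s)
```

Here the free fit recovers 0.8 and -1, while the through-origin slope is pulled down to about 0.65. Since the samples below the cutoff were excluded, the fitted line says little about what happens near zero, so the intercept need not be read as a physical quantity.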
  • asked a question related to Spatial Autocorrelation
Question
6 answers
I recently moved from distance-based techniques to model-based techniques, and I am trying to analyse a dataset I collected during my PhD using the Bayesian method described in Hui 2016 (boral R package). I collected 50 macroinvertebrate samples in a river stretch (approximately 10x10 m, so a very small area) on a two-axis grid (x-axis parallel to the shoreline, y-axis transverse to the river stretch). For each point I have several environmental variables, relative coordinates inside the grid and the community matrix (site x species) with abundance data. With these data I would like to create a correlated response model (i.e. including both environmental covariates and latent variables) using the boral R package (this will allow me to quantify the effect of the environmental variables as well as the latent variables for each taxon). According to the boral manual, there are two different ways to implement site correlation in the model: via a random row effect, or by assuming a non-independence correlation structure for the latent variables across sites (in this case the distance matrix for sites has to be added to the model). As specified on page 6, the latter should be used if one believes a priori that the spatial correlation cannot be sufficiently well accounted for by the row effect. However, moving away from an independence correlation structure for the latent variables massively increases computation time for MCMC sampling. So, my questions are: which is the best solution for accounting for spatial correlation? How can the random row effect be interpreted? Can it be seen as a proxy for spatial correlation?
Any suggestion would be really appreciated
Thank you
Gemma
Relevant answer
Answer
Thank you very much for your reply. Thank you for this exchange, and for the links and documents too.
Cordially
  • asked a question related to Spatial Autocorrelation
Question
2 answers
Hello, everyone.
I intend to control spatial autocorrelation in a generalized linear mixed model with binomial error distribution by including the distance-based Moran's eigenvector maps (dbMEM), but I am not sure whether I should include dbMEM as a fixed or random variable in the model. So, what is the best approach? Any help is welcome.
Best wishes,
Rafael.
Relevant answer
Answer
The best solution is using a spatial GLM, otherwise use dbMEM as a covariate.
Regards,
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Hi all,
I'm examining the effect of fire occurrence on visits to National Parks and Forests using a panel dataset. Since my Y variable is a count variable, I intend to do so using a negative binomial fixed effects model. However, my units of analysis are not iid; there is spatial correlation between them. I created a spatial weighting matrix in ArcMap which has 3 columns: my unit ID, neighboring unit ID, and weight. However, I'm having a hard time doing anything with it in Stata. I have two questions:
1) When I do spmat import using "weights.dta", I keep getting errors that say "error in line 1 of file." I don't know why that's happening. Here's what's in line 1: realpudfid nid weight 248 249 .2736487
2) All I'm trying to do is cluster standard errors properly. Is it even possible to do this with a negative binomial model? Could someone walk me through how?
Thank you so much!
Relevant answer
Answer
So interesting Question.
Thank you
  • asked a question related to Spatial Autocorrelation
Question
4 answers
My colleague and I are writing our master's thesis and are struggling with a few points in our econometric procedure. We want to use a dynamic linear model to explain a macroeconomic phenomenon, and we expect to include lags. According to the AIC, 2 lags are suitable.
In order to check for autocorrelation in our regression model, we want to do a Breusch-Godfrey test. The test requires a lag order to be specified, and this is where we are unsure. Should we:
1) run the test on a simple lm that excludes the lags we intend to use, or 2) run it on a model that includes the 2 lags we intend to use? Will the lagged model disturb the test and give wrong output (as the test requires you to specify the lag order)?
With the lagged model we get significance at the 95% level up to 15 lags, which is a lot more than what the AIC suggested. At the same significance level, our basic linear model shows that 2 lags are suitable.
We have also done a Durbin-Watson test using the lagged model, which indicated no signs of autocorrelation.
We appreciate every answer we can get.
Relevant answer
Answer
The number of lags depends on the kind of autocorrelation pattern you wish to test for: one lag for AR(1) errors, and so on. Wooldridge's Introductory Econometrics is a very good read.
Are you sure your time series are stationary? That goes before checking for autocorrelation.
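As background to the tests mentioned in this thread, the Durbin-Watson statistic and the residual autocorrelation at a chosen lag are simple to compute by hand. The sketch below uses made-up residual series; it is not a Breusch-Godfrey test, only the quantities that motivate choosing a lag order. Note that the Durbin-Watson statistic only addresses first-order autocorrelation and is generally not considered valid when lagged dependent variables are among the regressors, which is one reason the Breusch-Godfrey test is preferred in dynamic models.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no AR(1) pattern."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)

def residual_autocorr(residuals, lag):
    """Sample autocorrelation of the residuals at `lag` (residual mean assumed zero)."""
    num = sum(residuals[t] * residuals[t - lag] for t in range(lag, len(residuals)))
    return num / sum(e * e for e in residuals)

# Hypothetical residual series: one alternating in sign, one drifting upwards.
alternating = [1, -1, 1, -1]
drifting = [1, 2, 3, 4]

dw_alternating = durbin_watson(alternating)   # above 2: negative autocorrelation
dw_drifting = durbin_watson(drifting)         # near 0: positive autocorrelation
r1 = residual_autocorr(alternating, 1)
r2 = residual_autocorr(alternating, 2)
```

Inspecting `residual_autocorr` at lags 1, 2, ... gives a rough picture of which orders a formal test should cover.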
  • asked a question related to Spatial Autocorrelation
Question
3 answers
I am trying to fit a Generalized Linear Model with a binomial error distribution to my data, accounting for spatial autocorrelation, in R. I have tried many approaches with no success. I recently discovered the spaMM package and used the glmmPQL function to fit a binomial model, as described in the help for the corMatern argument (link below). However, I obtained the following error: 'Error in getCovariate.corMatern(object, data = data): cannot have zero distances in "corMatern"'. Thus, do I need to exclude the closest data points (those at zero distance), or is there another way to account for spatial autocorrelation in a binomial GLM? Any advice is welcome.
Relevant answer
Answer
Depending on the structure of your data and your design, a way to account for spatial autocorrelation is to average the predictions from several submodels built on random subsets of your data.
See for instance this paper by Parisien & Parks 2014: " An analysis of controls on fire activity in boreal Canada: comparing models built with different temporal resolutions"
Hope it helps.
  • asked a question related to Spatial Autocorrelation
Question
2 answers
The external predictive adaptive response (external PAR: Nettle et al. 2013. Proc Biol Sci 280: 20131343; Gluckman et al. 2005. Trends Ecol Evol 20: 527-533) assumes that current environmental conditions can predict future environmental conditions. In other words, the environmental factors are auto-correlated. How can this kind of auto-correlation be handled in statistical procedures designed to test the external PAR, given that one of the basic assumptions of statistical models is independence between observations?
Open to any suggestions.
Relevant answer
Answer
Thanks for your excellent suggestions.
I'm sorry that I did not make the question clear. I want to predict some biological responses from climate variables (e.g. annual mean temperature). The climate variables may be some kind of time series (https://www.stat.pitt.edu/stoffer/tsa4/tsa4.pdf). This means that the climate variables may be time dependent, in that the value in year t can to some extent be predicted from the value in year t - 1. However, the biological responses may not be time series (I'm not sure). In that case, will the independence assumption of linear models be violated?
  • asked a question related to Spatial Autocorrelation
Question
8 answers
There are methods and software for analysing spatial autocorrelation that occurs in 2-D space (with geo-coordinates). For example, I have used Spatial Analysis in Macroecology (SAM). However, it seems that the procedures implemented in SAM, such as Moran's I index, spatial correlation and spatial autoregression, are designed for 2-D or 3-D space, whereas auto-correlation within a linear habitat seems to be a different, specific type of auto-correlation. So, is there a specific method for analysing spatial auto-correlation in such linear habitats, or could one use the typical methods, simply assuming that one of the two geo-coordinates is constant?
Relevant answer
Answer
Dear All,
thank you for taking part in this discussion. I hope I will find an optimal tool.
Best wishes
  • asked a question related to Spatial Autocorrelation
Question
2 answers
I am revisiting some data I collected eons ago with a Hewlett Packard 3721A correlator (now obsolete). I used the cross-correlation facility to determine the transit time between two sensors from which I calculated the mean velocity of the fluid. I also measured the corresponding auto-correlation function values but never used them. I vaguely recollect being told at the time that signal coherence could be calculated from the auto-correlation peak values. Now, many years later, I would like to do this but am not sure how.
One example of the data: the auto-correlation peaks for Sensors A and B are 2.6 and 2.4 respectively and the cross-correlation peak (C) is 1.8. The units, I believe, are millivolts. Can signal coherence be calculated from these data? If so, what equation should be used?
Relevant answer
Answer
Washim, Thank you for taking the time to answer my question and provide the most valuable references, Sincerely, John Wheeldon
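For readers with the same question: one common way to combine these three numbers is the normalized cross-correlation coefficient, C / sqrt(A * B). The sketch below assumes that the two auto-correlation peaks are the zero-lag values, that all three readings are on the same scale, and that this coefficient is what was meant by "coherence" (strictly speaking, coherence is a frequency-domain quantity and cannot be recovered from three peak values alone). The function name is my own.

```python
import math

def normalized_cross_correlation(auto_a, auto_b, cross_peak):
    """Cross-correlation peak scaled by the geometric mean of the two
    zero-lag auto-correlation values (a dimensionless ratio)."""
    return cross_peak / math.sqrt(auto_a * auto_b)

# Values quoted in the question: A = 2.6, B = 2.4, C = 1.8
rho = normalized_cross_correlation(2.6, 2.4, 1.8)
print(round(rho, 3))  # 0.721
```

A value near 1 would mean the signal at Sensor B is almost a delayed copy of the signal at Sensor A; lower values indicate that the flow pattern changes between the sensors.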
  • asked a question related to Spatial Autocorrelation
Question
6 answers
Hi, I am having trouble selecting the best model between a spatial panel (SAR and SEM) and a non-spatial panel. The LM lag and LM error tests are insignificant, but rho/lambda is significant. Does this mean a spatial model is not necessary? Most of the literature refers to the LM tests, not to the rho/lambda coefficients.
Thanks.
Relevant answer
Answer
The insignificant LM tests on your SAR and SEM models indicate that there is no further spatial autocorrelation (SA) in these models. The significant rho and lambda values suggest that there was SA in the error terms of the OLS non-spatial model. To test this, you should use the Moran's I test on the error terms of your OLS model. I would expect it to be significant. Your next hurdle is to choose between the SAR and SEM models. In principle, this should be based on theoretical arguments. For instance, is there a physical or economic justification for the dependent variable to be spatially autocorrelated? If there is not, the SEM model is preferable. You might also test the GSM (General Spatial Model), which combines the SAR and SEM.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
Hello everybody,
I am currently trying to do a Gaussian linear regression in R with data that may be spatially autocorrelated. My dataset contains geographic coordinates (longitude and latitude), species, dependent variables (BS and LTS) and some explanatory variables. The dataset also includes the latitude and longitude values in separate columns.
I extracted positive eigenvector-based spatial filters from a truncated matrix of geographic distances among sampling sites. I would like to treat the spatial filters as candidate explanatory variables in my linear regression model. I did this as follows:
First of all, I created a neighbor list object (nb). Because my sampling is irregular, I used the function knearneigh of the R package spdep:
knea8 <- knearneigh(coordinates(dataset), longlat = TRUE, k = 8)
neib8 <- knn2nb(knea8)
Then, I created a spatial weighting matrix with the function nb2listw of the R package spdep:
nb2listw(neib8)
distgab8 <- nbdists(neib8, coordinates(dataset))
str(distgab8)
fdist8 <- lapply(distgab8, function(x) 1 - x / max(dist(coordinates(dataset))))
listwgab8 <- nb2listw(neib8, glist = fdist8, style = "B")
Then, I built spatial predictors to incorporate into the Gaussian linear regression. I did this with the mem function of the R package adespatial, as follows:
mem.gab8 <- mem(listwgab8)
Additionally, Moran's I was computed and tested for each eigenvector with the moran.randtest function, as follows:
moranI8 <- moran.randtest(mem.gab8, listwgab8, 99)
I obtained some eigenvectors with significant positive spatial autocorrelation. Now, I would like to include them in the Gaussian linear regression. I tried to do this with the function ME of spdep, as follows:
GLM1 <- ME(BS~LATITUDE, data=dataset, listw=listwgab8, family=gaussian, nsim=99, alpha=0.05)
Unfortunately, I receive this error:
Error in sW %*% var : Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 90
How do I solve this error? Or is there another way to perform spatial eigenvector selection in a Gaussian linear regression?
Relevant answer
Answer
Dear Diego, although I don't know the answer to your question, I suggest you post it to the r-sig-geo mailing list, where it will surely be answered.
HTH,
Ákos
  • asked a question related to Spatial Autocorrelation
Question
11 answers
Hello everybody,
I am trying to convert some spatial points (given a value of latitude and a value of longitude) to a neighbour list in R. I need this in order to perform a linear regression with a spatial autocorrelation.
I have a list of individual animal species occurrences worldwide, each one with a single value of latitude and longitude. I would need to create an object class nb (neighbour list), but I do not know how to do this conversion in R.
My data looks like:
SPECIES: LATITUDE, LONGITUDE
species A: -85, 134
species B: 34, 2
species B: 42, 3
species B: 45, 5
species C: -2, 80
species C: -5, 79
(...)
The dataset also contains other columns with the values of certain variables, but I think this is not important for my purpose.
Any help will be appreciated. Thank you in advance
EDIT:
I am using the package "spdep". First of all, I converted my data frame to a spatial object:
coordinates(dataset) <- ~ LONGITUDE + LATITUDE
Then, I tried to create a graph-based neighbours list, computed from polygons (via a Delaunay triangulation). However, when I try the following:
delau <- rgeos::gDelaunayTriangulation(dataset)
neib <- poly2nb(delau)
I receive an error that I cannot figure out: Error in rgeos::gDelaunayTriangulation(dataset) : duplicate points not permitted
Would anybody know how to solve this?
Relevant answer
Answer
Hi Diego,
Sorry, my bad: I forgot that two different species could occur at the same point.
Try this approach
library(dplyr)     # select, group_by, summarise_all
library(reshape2)  # dcast
# Species-by-site table: one row per coordinate pair, one column per species
species_data <- dataset %>%
  select(LONGITUDE, LATITUDE, SPECIES)
species_data <- dcast(LONGITUDE + LATITUDE ~ SPECIES, data = species_data)
species_data <- species_data %>%
  group_by(LONGITUDE, LATITUDE) %>%
  summarise_all(sum)
# Environmental variables averaged per coordinate pair
envir_var <- dataset %>%
  select(-c(SPECIES, CLASS, FAMILY))
envir_var <- envir_var %>%
  group_by(LONGITUDE, LATITUDE, OCEAN) %>%
  summarise_all(mean, na.rm = TRUE)
data <- plyr::join(envir_var, species_data)
coordinates(data) <- ~ LONGITUDE + LATITUDE
delau <- rgeos::gDelaunayTriangulation(data)
neib <- poly2nb(delau)
This should solve your problem. There could be some typos in my code, but I hope you get the idea of how to solve it.
Regards,
Moreno
  • asked a question related to Spatial Autocorrelation
Question
7 answers
Conceptually, what is the difference between autocorrelation and partial autocorrelation?
In deciding the lag length for AR or ARMA models, which one should be considered?
Thanks in advance.
Relevant answer
Answer
Autocorrelation of lag k is the correlation between X_t and X_{t+k}, where the time series is {X_t}. The partial autocorrelation of lag k is the conditional correlation of X_t and X_{t+k} given the values of X_{t+1}, X_{t+2}, ..., X_{t+k-1}. Therein lies the difference.
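To make the distinction concrete, here is a minimal sketch that estimates both quantities for a simulated AR(1) series. It assumes the usual sample autocorrelation estimator and obtains the partial autocorrelations with the Durbin-Levinson recursion; the function names, the AR coefficient 0.7 and the seed are illustrative choices of mine, not part of the answer.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) ones
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Simulate X_t = 0.7 * X_{t-1} + e_t
rng = random.Random(1)
series = [0.0]
for _ in range(2000):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

r = acf(series, 5)
p = pacf(series, 5)
print([round(v, 2) for v in r])
print([round(v, 2) for v in p])
```

For an AR(1) process the autocorrelations decay gradually, while the partial autocorrelations are close to zero after lag 1. That cut-off is why the PACF is the usual guide for choosing the AR order, whereas the ACF cut-off is the usual guide for the MA order.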
  • asked a question related to Spatial Autocorrelation
Question
9 answers
Hi everyone. I have applied multiple logistic regression to create a model based on my independent variables (x, y and w). My fitted model is Z = ax + by + cw - d, where Z is the logit (log-odds) of the probability P that my dependent variable occurs, so that P = exp(Z)/(exp(Z)+1); all of the variables are binary.
Now, in order to interpret the output, I have calculated the probability of occurrence of my dependent variable for all possible combinations of the variables, as follows:
1: x=0, y=0, w=0 ------> P=0.74%
2: x=0, y=1, w=0 ------> P=2.3%
3: x=1, y=0, w=0 ------> P=1.35%
4: x=1, y=1, w=0 ------> P=4.14%
5: x=0, y=0, w=1-------> P=1.65%
.
.
8: x=1, y=1, w=1------> P=8.83%
Since the signs of all coefficients (a, b and c) are positive, the highest probability naturally occurs when x, y and w are all 1. But even in this case the probability is only 8.8%. Is this result reasonable?
And how can I interpret the magnitude of each independent variable's effect? Can I say that, since all the variables are binary and have positive coefficients, a variable with a bigger coefficient has a bigger impact on the probability derived from Z?
Thank you all in advance for your kind replies.
Relevant answer
Answer
Hello Mohsen,
Generally, yes … all variables being binary. However, I would argue (as you have in your opening question) that you need to transform your (logit) coefficients into probabilities. This helps in discussing and comparing the parameters in a way that the reader can understand. It’s next to impossible to communicate different types of non-linear parameters (like logit, cubic, probit, etc) in a write-up and make sense of them.
So, I think you are on the right track with transforming the regression parameters into probabilities. Keep in mind that you don’t really have to break them down into all possible groupings. Each transformed parameter is the increase/decrease in probability for the DV when that independent variable = 1 (assuming your binary variables are 0 or 1), HOLDING all other variables in the equation equal.
However, because the logistic link is non-linear, the probabilities are not simply additive (e.g., the probability for x and y = 1 is not the same as adding up the probabilities when only x = 1 and, separately, when only y = 1, as in your example at the top).
The odds ratio is a better way of showing the magnitude of each independent parameter.
I have added a spreadsheet for doing the logit conversions, in case it is helpful.
Wishing you well,
  • asked a question related to Spatial Autocorrelation
Question
12 answers
Can UTM data for casualties and obstacles (or, more generally, other structural elements of roads such as bends) be used as distances in spatial autocorrelation analysis?
Relevant answer
Answer
In the newer versions of ArcMap (10.5 and 10.6) and in ArcGIS Pro, a new tool, "Similarity Search", is included in the Spatial Statistics Toolbox. You can find out how it works in the program's online help.
  • asked a question related to Spatial Autocorrelation
Question
5 answers
Are there any differences between spatial autocorrelation and spatial non-stationarity? If yes, could you explain the differences and novel methods to address them?
Relevant answer
Answer
Dear Bipin,
Yes, spatial autocorrelation and spatial non-stationarity are different concepts.
SPATIAL AUTOCORRELATION
If you measure something over space, for example household income, it is likely that two observations that are close to each other in space are also similar in measurement. This assumption is also known as the First Law of Geography: "everything is related to everything else, but near things are more related than distant things" (Tobler, 1970). In this sense, spatial autocorrelation is a measure of similarity (correlation) between nearby observations. It describes the degree to which observations (values) at spatial locations (whether they are points, areas, or raster cells) are similar to each other. Commonly used statistics that describe spatial autocorrelation are Moran’s I, Geary’s C and, for binary data, the join-count index.
SPATIAL NON-STATIONARITY
Spatial non-stationarity is a condition in which a simple “global” model cannot explain the relationships between some sets of variables. The nature of the model must alter over space to reflect the structure within the data (Brunsdon et al., 1996). For example, suppose you are trying to model the relationship between two variables, number of cars per person and income, in a given city. Using a global linear regression, you find that neighborhoods with higher incomes have more cars per person. However, in some neighborhoods where public transport is better this relationship changes, and even with a high average income, people tend to use public transportation. Thus, the relationship you are modeling is non-stationary throughout space. Geographically weighted regression (GWR) is a technique mainly intended to indicate where non-stationarity is taking place on the map, that is, where locally weighted regression coefficients move away from their global values.
Tobler W., (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(Supplement): 234-240.
Best,
Sacha
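For reference, the global Moran's I and Geary's C mentioned above are usually written as follows, where n is the number of observations, x_i the value at location i, x-bar the mean, and w_ij the spatial weight between locations i and j (the symbols are my notation, not taken from the answer):

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}
         {\sum_{i}(x_i-\bar{x})^2},
\qquad
C = \frac{n-1}{2S_0}\,
    \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i-x_j)^2}
         {\sum_{i}(x_i-\bar{x})^2},
\qquad
S_0 = \sum_{i}\sum_{j} w_{ij}.
```

Under no spatial autocorrelation, E[I] = -1/(n-1) and E[C] = 1. Values of I above that expectation (equivalently, C below 1) indicate positive spatial autocorrelation, i.e. similar values clustering together.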
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Dear researchers,
Is it necessary for the data (640 entries and 26 variables) to follow a normal distribution in order to use the Spatial Lag Model or Spatial Error Model in GeoDa?
Thanks in advance.
Relevant answer
Answer
GeoDa includes tests for non-normality that may be used to determine which model would be more robust.
See the chapter on the Spatial Lag model:
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Durbin-Watson Statistics Table has three types of critical values for significance at 1%, 2.5% and 5% level. So how to choose which one to use when evaluating Durbin-Watson statistics (e.g. d=1.12)?
Thank you!
Relevant answer
Answer
Usually the 5% significance level is used. However, this test is a bit different from others: you need both the lower limit and the upper limit. Please use the attached guideline to see how it is performed.
Kind regards and good luck with your research.
Thushara
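The lower/upper-limit logic can be sketched as follows. This is a minimal illustration for the test against positive first-order autocorrelation; the bounds passed in the example (1.29 and 1.45) are placeholders of mine, not values taken from the attached guideline, and the real d_L and d_U must be looked up in the table for your sample size and number of regressors.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def dw_decision(d, d_lower, d_upper):
    """Decision rule for H0: no positive first-order autocorrelation.
    For negative autocorrelation, apply the same rule to 4 - d."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"

# d = 1.12 is the value from the question; the bounds are placeholders
print(dw_decision(1.12, d_lower=1.29, d_upper=1.45))

# Perfectly alternating residuals give a statistic well above 2
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```

Values of d near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. The zone between d_L and d_U is inconclusive, which is what makes this test different from most others.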
  • asked a question related to Spatial Autocorrelation
Question
15 answers
Elith and Leathwick (2009) recommended Moran's I for testing for spatial patterns in raw data and residuals. I have read a lot of literature on this and looked at many R packages, but could not perform the test. Can anyone help me with a detailed method for preparing data for a Moran's I test on a raw dataset and on model residuals?
Relevant answer
Answer
Pretty old question, but maybe it's still worth adding to the comments given so far:
Type of weights: Spatial weights form a crucial part in estimating spatial autocorrelation. The choice of weights depends on a range of factors like the type of spatial behaviour you assume in the analysed process, the type of data you have at hand, the scale of your process, etc. Thus, there is no easy answer regarding an appropriate choice of weights; it is ultimately up to the specific analytical task.
Software/different p-values: I'm not familiar with ape but have used spdep. spdep allows you to calculate distance-based weights. The function dnearneigh gives you binary weights indicating whether neighbours are located within a disc of radius dmax (a minimum distance dmin is also possible; in dnearneigh these are the arguments d2 and d1). If you need something like IDW weights, then you do need to do some (not too extensive) R programming, but it is still possible. In addition, spdep also provides functionality for creating k-nearest-neighbour weights through knearneigh.
Inference about Moran's I: There are two ways of drawing inference about Moran's I. One is called the Normal assumption (N), the other one is based on Randomisations (R). The N assumption is based on the so-called Pitman-Koopmans theorem that roughly states that Moran's I is independent of its denominator under iid normal variates. This assumption allows you to evaluate p-values of Moran's I from a normal distribution regardless of sample size, but under the (somewhat strict) assumption of normality. This is obviously a very efficient way of assessing p-values. In contrast, the R assumption is based on a permutation argument that assumes random reallocations of your values within the fixed set of spatial units. Therefore, the R assumption is a non-parametric one. While N is based on sampling with replacement, R corresponds to sampling without replacement, causing statistical power to be lower (because your decision is based on less information content). In the latter case you would construct a bootstrap from recurrently calculating Moran's I a certain number of times. The (pseudo) p-values are then assessed from that bootstrap distribution. However, even under R, Moran's I converges to normality rather quickly (though this is also influenced by the layout of your spatial units/geographic region). A valuable source for this is the series of seminal books published by Cliff and Ord (1973, 1981).
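The randomisation (R) approach can be sketched in a few lines. This is a minimal illustration, not the spdep implementation: it assumes a toy chain of 12 sites with binary adjacency weights and a perfectly trending variable, and the function names, the number of permutations and the seed are my own choices.

```python
import random

def morans_i(x, w):
    """Global Moran's I for values x and a square weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den

def permutation_test(x, w, n_perm=999, seed=0):
    """One-sided pseudo p-value: reshuffle the values over the fixed
    locations and count how often I is at least as large as observed."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    values = list(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if morans_i(values, w) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Toy example: 12 sites along a line, each a neighbour of the adjacent sites
n = 12
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
x = list(range(1, n + 1))  # a smooth trend, so neighbours are similar

observed, p_value = permutation_test(x, w)
print(round(observed, 3), p_value)
```

Because the values are only reshuffled among the same locations, no distributional assumption is needed. The smallest attainable pseudo p-value is 1 / (n_perm + 1), so the number of permutations sets the resolution of the test.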
  • asked a question related to Spatial Autocorrelation
Question
4 answers
In particular, my questions are:
-How to deal with abundances recorded during multiple visits (2 or more) to each sampling unit? I see that a common practice is to consider the maximum over the visits as the abundance in the sampling unit. I wonder whether is it possible to account for species detectability directly in the RDA (as in unmarked for univariate models).
-Is it possible in RDA to account for spatial non-independence of sampling units?
Finally:
-Is it better to consider occurrence (presence-absence) or abundance in RDA analysis? Which gives more robust and reliable results?
Relevant answer
Answer
Thank you Elia, the subsequent question is: how to implement random factors in RDA (in R vegan)? By partial-RDA?
Andrew, RDA per se doesn't account for spatial non-independence, I suppose. There are probably specific techniques to deal with it in RDA. I am wondering how...
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I have some aggregated data for 20 geographical places and would like to understand Moran results based on the data. I know that the cluster field corresponds to low P-values and higher z-scores as well as positive Moran index, however I am not sure why the cluster gives number 2 for each clustered place. And how can I interpret the results for non-spatial statisticians:
Place  Ii        Ei        Zi        p.value   Xi        wXj       Cluster
1      0.527153  -0.05263  1.521057  0.128245  1.099403  0.818118  0
2      1.169842  -0.05263  3.207143  0.001341  -1.69534  -1.63654  2
3      -0.14793  -0.05263  -0.25003  0.802567  0.24454   -1.45796  0
4      1.169842  -0.05263  3.207143  0.001341  -1.69534  -1.63654  2
5      1.169842  -0.05263  3.207143  0.001341  -1.69534  -1.63654  2
6      1.169842  -0.05263  3.207143  0.001341  -1.69534  -1.63654  2
Many thanks,
Eiman
Relevant answer
Answer
Several observations
1. Moran's I is an empirical measure of spatial correlation and is based on "randomization" (in contrast with a spatial autocorrelation function).
2. It is not a question of whether the "samples" are independent; first of all, independence is a theoretical property of random variables (not of data).
3. Don't mix up confidence level and significance level in a test of hypotheses. The significance level is the probability of a Type I error (incorrect rejection of the null hypothesis), whereas the confidence level is related to whether the true but unknown parameter (the one being estimated) is inside the confidence interval (a confidence level is not a probability but is related to the probability that the estimatOR value is inside a specified interval). You also have to be careful about the distinction between a one-sided confidence interval and a two-sided confidence interval.
4. The phrase " the correlation between variable, X, and the “spatial lag” of X formed by averaging all the values " doesn't make any sense. It is completely wrong
  • asked a question related to Spatial Autocorrelation
Question
6 answers
As a geography student, I know that spatial autocorrelation exists everywhere, and I know that spatial autocorrelation of spatial variables violates the assumption of classical statistics, i.e., independence of samples, so we have to consider it when analyzing the influence of spatial phenomena or in spatial modeling. However, I still do not know how to use it in practice. For example, if we know the Moran's I index of housing prices in New York is 0.8, then what? What can we do with it?
Relevant answer
Answer
Hello Zhang!
As you know, "everything is related to everything else, but near things are more related than distant things" (Tobler). Here, the question is how strongly they are related. The strength of the relation can be measured by spatial autocorrelation. Moran's I is one such index used to measure spatial autocorrelation.
In your case, a Moran's I of 0.8 indicates strong clustering of housing prices; that is, housing prices in New York are not randomly arranged across space. Instead, they follow a spatial pattern, i.e., the highest housing prices are mostly near other high housing prices, and so on. This pattern is natural and logical too. You can use your results to support the naturally observed phenomenon.
--
As you said, "I know that spatial autocorrelation of spatial variables violates the assumption of classic statistics, i.e., independence of sample, so we have to consider it when analyzing influence of spatial phenomena or spatial modeling."
Explanation: Yes, you are right! We need to take it into account in modeling because strong spatial dependency in spatial variables makes models biased. That is, the estimated model may not give us the right information. This doesn't mean that spatial autocorrelation among spatial variables is a problem. It is a natural phenomenon and should exist.
Now, let us re-frame our approach: we use a model to understand the natural phenomenon. Since models are abstractions of reality, they cannot be perfect, so the question is 'to what extent can we trust our model?' This is generally answered by testing for spatial autocorrelation among the standardized residuals of our model. We assume that the standardized residuals of a good model are randomly arranged in space; in other words, the spatial autocorrelation among the residuals is null (Moran's I is approximately 0).
regards
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I am looking into Spatial Neighbors to address autocorrelation in my dataset, but I find it difficult to find arguments as to which method to prefer. I am using the R package "spdep" and functions dnearneigh and knearneigh to determine the distance-based neighbors and k-nearest neighbors, respectively. However, could someone advise me on the main differences between the two methods, as well as on how to determine d2 (upper distance bound) and k (number of nearest neighbors).
Relevant answer
Answer
Regarding "semivariograms and distance", there are two problems with Fararini's comment.
With only data values you can only estimate and model the variogram (using an empirical variogram). This is not an exact process, i.e. there is no way to be able to claim that you have chosen "the" model and have exactly estimated the parameters of the model. Moreover for a given (finite) data set the empirical variogram is not unique, you must choose the lag spacings and also you must distinguish between an omnidirectional empirical variogram and directional empirical variograms.
Secondly, the variogram is only indirectly related to correlation (unlike a covariance function or (auto)correlation function). A variogram model need not have a true range, e.g. the exponential and Gaussian models do not, moreover the Power models do not even have a sill. It is common to refer to a range for both the exponential and Gaussian models but this is not a theoretical result, rather it is the distance at which the value of the variogram model is 95% of the sill (95% is an arbitrary choice). If the variogram has a geometric anisotropy then the range depends on the direction (in 3D there are two direction parameters)
However the suggestion of using the empirical variogram to choose a maximum distance is not a bad one but recognize that you are choosing a distance, you are certainly NOT determining the "exact distance at which the autocorrelation stops/starts". It is critical to distinguish between an empirical variogram and a theoretical model.
Finally, "semivariogram" is long-outdated terminology (since about 1988, when G. Matheron stopped using it).
To Chris Broekhoven, since you are using R packages, you might want to check references on the author of those packages. Do a search in Google for "R package dnearneigh", you will find a lot of documentation and some tutorial examples as well as for "R package knearneigh"
Both dnearneigh and knearneigh pertain to data sets where the interest is mostly or completely in the locations of the data points; variograms pertain to data sets where there is a value (for some variable, e.g. a hydrologic parameter) at each data location and the interest is in how that value relates to the position coordinates.
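To illustrate the practical difference between the two neighbour definitions asked about in the question, here is a minimal sketch. It assumes planar (projected) coordinates and plain Euclidean distance, and it only mimics the idea behind dnearneigh and knearneigh rather than reproducing spdep; the function names and the four example points are made up.

```python
import math

def distance_band_neighbours(points, d_max):
    """Neighbours of each point = all other points within d_max."""
    n = len(points)
    return [[j for j in range(n)
             if j != i and math.dist(points[i], points[j]) <= d_max]
            for i in range(n)]

def k_nearest_neighbours(points, k):
    """Neighbours of each point = its k closest other points."""
    n = len(points)
    result = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        result.append(others[:k])
    return result

# Three clustered sites and one remote site
points = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]

band = distance_band_neighbours(points, d_max=1.5)
knn = k_nearest_neighbours(points, k=1)
print(band)  # [[1], [0, 2], [1], []]
print(knn)   # [[1], [0], [1], [2]]
```

With a distance band the relation is symmetric, but remote sites can end up with no neighbours at all, and dense areas get many. With k nearest neighbours every site gets exactly k neighbours, but the relation can be asymmetric (site 3 lists site 2, while site 2 does not list site 3) and the neighbours may be very far away. A common compromise is to choose the upper distance so that every site has at least one neighbour.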
  • asked a question related to Spatial Autocorrelation
Question
18 answers
What techniques can be applied for crime mapping/analysis/prediction in the Indian context?
Please provide reference
Relevant answer
Answer
Dear Firoz Ji,
Please check the following attached pdf file, hope you will find these Articles helpful.
Regards
Amit
  • asked a question related to Spatial Autocorrelation
Question
7 answers
What are the spatial issues related to housing submarket analysis?
Relevant answer
Answer
Dear All
From my analysis of the city of Bangalore, I found that the planning emphasis is higher for industry than for housing; however, traditionally forward groups as well as elites have been able to win layout allocations. Inadequate coordination among development agents has caused high congestion, both within the city and in the urban sprawl.
  • asked a question related to Spatial Autocorrelation
Question
11 answers
Hi, 
I'm currently trying to use Maxent for roadkill analysis, but one of my colleagues raised the concern that using spatially correlated datasets in maximum entropy models leads to biased results, and that the dataset should be tested and correlated occurrences removed.
I have no clue how to test a roadkill point dataset for spatial autocorrelation in order to get rid of correlated points. What's the easiest and most common way to determine which points need to be removed to avoid the bias?
If you know of any information, articles or websites for me to look over, please let me know.
Your direct advice on solving this issue is greatly appreciated.
Best,
Bryan 
Relevant answer
Answer
Hi Hoegun,
Are you referring to the spatial correlation among the variables used in Maxent, or to the spatial bias in the occurrence points? If you are talking about the latter, you may want to use only one point per grid cell when modeling.
Cheers!
Eric
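The "one point per grid cell" idea can be sketched as below. This is a minimal illustration assuming projected coordinates in the same units as the cell size and keeping the first record encountered in each cell; the function name, the cell size and the example coordinates are made up.

```python
import math

def thin_to_grid(points, cell_size):
    """Keep at most one occurrence point per square grid cell."""
    kept = {}
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        # setdefault stores the point only if the cell is still empty
        kept.setdefault(cell, (x, y))
    return list(kept.values())

# Hypothetical roadkill records: two tight pairs and one isolated point
records = [(0.1, 0.1), (0.4, 0.3), (1.2, 0.1), (1.9, 0.8), (5.0, 5.0)]

thinned = thin_to_grid(records, cell_size=1.0)
print(thinned)  # [(0.1, 0.1), (1.2, 0.1), (5.0, 5.0)]
```

The cell size is the key choice here. Matching it to the resolution of the environmental rasters is a common starting point, since several records inside one raster cell carry identical predictor values anyway.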
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Suppose you have a set of point data which contains two variables (say var1 and var2), both of which have discrete values (say 1, 2, and 3). You try to map them (see attached files var1.jpeg and var2.jpeg for reference) and decide you want to see how similar these two data are in terms of statistics. You create a "matrix" comparing the values of the two variables (see sim.jpg) and obtain a %similarity by taking the percentage of the total count of all points which had the same value for both variables (e.g. var1 and var2 = 1, etc.) in relation to the total number of points.
Is that the only way to compare the two variables? What other methods can be performed to compare these two?
P.S.: Scatter plots are ineffective, since they only result in a set of 9 visible points (because the data is discrete, the only possible values for each axis is 1, 2 and 3, resulting in only 9 possible combinations of ordered pairs) (see scatter.jpg). Q-Q plots only show 5 points (see qq.jpg).
Thank you very much!
Relevant answer
Answer
If you want to make the statistical comparison between two functions it is possible to use the covariance functions. This is a procedure that will allow you to find statistical similarities between two time series. For this there is the theoretical definition from the integration operations and also several available computing algorithms. I recommend using the Matlab application which has built-in almost all covariance functions.
  • asked a question related to Spatial Autocorrelation
Question
10 answers
I have a question about the best way to test for violations of Hardy-Weinberg Equilibrium (HWE) among microsatellite loci for a species that is continuously distributed across a study area and showing IBD. We are looking at how landscape features affect gene flow among Eastern Indigo Snakes across a 25 x 50 km study area. We have 110 samples and about half are clustered in the southern half of the study area. A spatial correlogram of individual genetic distance shows that spatial autocorrelation among samples becomes non-significant at 5-10 km. We have used COLONY to identify full-sibs and found about 15 full-sib families although family size was usually two (max. four). There is significant IBD within our study area. STRUCTURE identifies K=4 with all 110 samples but when we randomly exclude all but one full-sib from each full-sib family STRUCTURE identifies K=1-2. We suspect that these STRUCTURE results are the result of neighborhood effects and IBD, respectively.
When we test for violation of HWE at our 15 loci, four have significant violations of HWE. Estimated null allele frequencies at these four loci are 6-15%. When we randomly excluded all but one member from each full-sib family, these four loci were still significantly out of HWE.
In a situation such as this, is it appropriate to test for HWE using all samples? I know that in systems with discrete populations researchers often test for HWE within each population, since violations may represent a mixture of multiple populations. But any designations of “populations” in our study area seem very arbitrary (e.g., driven by sampling intensity rather than the distribution of individuals).
Does anyone have any suggestions about the appropriate way(s) to test for HWE in a system such as ours?
Thanks,
Javan Bauder
Relevant answer
Answer
You expect HWE in a panmictic population. If there is population structure (IBD), why should you test for HWE?
  • asked a question related to Spatial Autocorrelation
Question
3 answers
I've been using SAS and GeoDa/ArcGIS
Relevant answer
Answer
Simple!
1) After fitting the logistic regression, estimate the residuals from the model.
2) Export the sample ID and residual to a text file together with the X and Y coordinates.
3) Create a shapefile from the text file of step 2 in ArcGIS.
4) Now you can use the "Spatial Autocorrelation (Moran's I)" tool, under Analyzing Patterns in the Spatial Statistics toolbox of ArcToolbox, to estimate the global Moran's I.
Hope this helps.
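For readers without ArcGIS, the same check can be sketched in a few lines of Python. This is only a minimal illustration, assuming inverse-distance weights (w_ij = 1/d_ij) and projected X/Y coordinates; the function name `morans_i` and the toy residuals are made up for the example, and no significance test is included.

```python
import math

def morans_i(values, coords):
    """Global Moran's I with inverse-distance weights (w_ij = 1/d_ij, w_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    num = 0.0                               # sum_ij w_ij * z_i * z_j
    s0 = 0.0                                # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d == 0:
                continue                    # coincident points get no weight
            w = 1.0 / d
            num += w * z[i] * z[j]
            s0 += w
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)

# Toy residuals on a transect (hypothetical numbers)
coords = [(x, 0.0) for x in range(6)]
clustered = [1, 1, 1, -1, -1, -1]      # similar residuals sit next to each other
alternating = [1, -1, 1, -1, 1, -1]    # neighbours have opposite residuals
print(morans_i(clustered, coords), morans_i(alternating, coords))
```

A positive value suggests that similar residuals sit close together, a negative value that neighbouring residuals tend to differ. A permutation test or the analytical variance would still be needed before drawing conclusions.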
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Dear all,
I have monthly data for 6 regions over 2 years, together with their coordinates (attached). I would like to run a spatial autocorrelation analysis on these data in R, in particular Moran's I. Which methods/packages are suitable for my data?
I want to know whether the rice prices in the 6 regions are spatially autocorrelated. Can anyone help me, please?
Best wishes and thanks in advance,
Yoga
Relevant answer
Answer
You may want to have a look at R package "spatstat" which can be very handy if one wants to perform statistical analyses on spatial data sets. Please follow the link:
Also, here you will find how to compute spatial auto correlation in R:
Hope this helps.
  • asked a question related to Spatial Autocorrelation
Question
7 answers
Dear all,
I am doing a TS analysis with four variables from 1980-2015. The post-estimation test statistics and p-values obtained are listed below:
1) Durbin-Watson (for autocorrelation): 2.1876; the following are p-values:
2) Breusch-Pagan (for heteroscedasticity): 0.1815
3) Breusch-Godfrey (for higher-order autocorrelation): 0.4595
4) ARCHLM: 0.9151
5) Ramsey RESET (for omitted variables): 0.5355
but that of Jarque-Bera (normality test) is 0.0238, indicating that the errors are not normally distributed.
Should I be worried?
Thank you.
Relevant answer
Answer
Dear Ngo,
Normality is not a big issue in ARDL, because even if the non-normality is resolved the estimates do not change that much. You will know from elementary statistics that JB is meant for a large sample (n=100), so if you try adding a dummy, using the standardized residuals to identify the outlying data point, and the results do not improve, just go ahead and use the Newey-West estimator (no need to mention JB), since all the non-spherical errors have been dealt with.
  • asked a question related to Spatial Autocorrelation
Question
4 answers
Are there other reliable spatial autocorrelation tests aside from using ENM tools?
Relevant answer
Answer
You can use the R statistical software to test for autocorrelation. In that case you have to install a few packages, such as rgdal, raster, etc.
  • asked a question related to Spatial Autocorrelation
Question
2 answers
I have a point shapefile with ~100,000 points. I need to take a sample from these points such that spatial autocorrelation is minimal. I divided the point area into square grids of several spacings (100 by 100, etc.) and populated an attribute with the number of points inside each grid cell, with the intention of taking one point from each cell as a representative sample for logistic regression input. I checked whether the autocorrelation among the grid centers is negative or at least zero. For this purpose I drew a Moran's I correlogram. But this process fails, since I found a value of 0 only at a distance that is too large: at that grid spacing I will have only 5 grid cells, and so 5 points, with which logistic regression is impossible. Could anyone help me with the best method for finding the optimum grid dimension/spatial scale?
More background can be found at these links.
Relevant answer
Answer
Hi!
I agree with Mr O'Kelly. By gridding the data you are in danger of compromising your results due to the MAUP (Modifiable Areal Unit Problem). It is important to know what you are aiming for.
A good first step would be to try performing a point pattern analysis, to check whether your data follow a clustered, random or dispersed distribution. This can be done by testing for CSR (Complete Spatial Randomness) by performing tests such as the Quadrat Method, Nearest Neighbor Index or G-function. Remember always to do a significance test for the produced result of every method. 
Since the Quadrat Method as well as the Nearest Neighbor Index have issues regarding MAUP, my suggestion would be to use the G-function. But that also depends on what you are aiming for. If you choose the Quadrat Method, you will have to test the effect the size of the quadrat has on the result. Does the type of distribution change if you change the quadrat size? What is a representative quadrat size for your study? And so on. Moreover, variations in the positioning of points within a quadrat or between neighboring quadrats will not be accounted for if you choose this method. The Nearest Neighbor Index will only detect clustering on a fine spatial scale, since only the nearest neighbor of every point is considered.
You can find more info about the G-function here: http://www.people.fas.harvard.edu/~zhukov/Spatial4.pdf
As a second step, you could compute a correlogram or variogram of your data, to check for spatial autocorrelation. By examining either you could see, if you indeed have spatial autocorrelated data, and, if this is the case, at what distance does the spatial dependence cease to exist. This can be done by checking the Sill and the Range (the distance at which the curve reaches the sill) of your function curve in the variogram. This defines the maximum distance at which there is any spatial dependence. The distance between your points in every grid should not exceed this distance. 
Another way to determine the size of your grids, after you have checked the above, would be to see if your data is clustered by implementing a non-hierarchical clustering method such as K-means or K-medoids (since you have a big dataset). If you find any clusters, you could use the within-cluster distance between your points to help you determine a suitable grid size.
Hope this was helpful.
Good Luck!
Best regards,
Karolina
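The variogram check mentioned in the second step can be illustrated with a small empirical semivariogram. This is a rough sketch in plain Python, assuming projected coordinates and equal-width distance bins; `empirical_semivariogram`, `lag_width` and `max_dist` are names chosen for this example, not part of any package. With ~100,000 points the double loop would be slow, so in practice one would work on a random subsample.

```python
import math
from collections import defaultdict

def empirical_semivariogram(coords, values, lag_width, max_dist):
    """Return (bin centre, semivariance, pair count) for each distance bin.

    gamma(h) = 1 / (2 * N(h)) * sum of (z_i - z_j)^2 over the N(h) pairs in bin h.
    """
    sq_sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d >= max_dist:
                continue                      # ignore pairs beyond the maximum lag
            b = int(d // lag_width)           # index of the distance bin
            sq_sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return [((b + 0.5) * lag_width, sq_sums[b] / (2 * counts[b]), counts[b])
            for b in sorted(counts)]

# Toy transect with a linear trend: semivariance keeps growing with distance
coords = [(float(x), 0.0) for x in range(5)]
values = [0.0, 1.0, 2.0, 3.0, 4.0]
for centre, gamma, pairs in empirical_semivariogram(coords, values, 1.0, 3.0):
    print(centre, gamma, pairs)
```

The distance at which the semivariance levels off (the range) is then a candidate for the minimum spacing between sampled points.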
  • asked a question related to Spatial Autocorrelation
Question
14 answers
I see the importance of calculating spatial autocorrelation for species occurrence data, in order to guarantee spatial independence. However, I do not understand how this is done. Could any of the colleagues working on the topic help me?
Thank you in advance.
Relevant answer
Answer
You should consider several aspects of spatial autocorrelation. A first element is the spatial autocorrelation of the environmental data, assessed with approaches like a semivariogram and spatial lag calculations. For example, if you look at climate variation across a small area, there may be strong autocorrelation, such that the environment is very smooth across the area. If you look instead at satellite imagery of land cover over the same region, there may be much more detail and variation, such that more information is available. These analyses can help you to set a minimum separation distance between points, as points closer than that distance will not present independent samplings of environments.
Second, you may wish to seek clumping (autocorrelation) of the occurrence points. This will involve statistics such as Ripley's K. This may reflect differential reporting from a state or country, and these concentrations should probably be reduced further, to avoid overrepresenting particular regions and the associated environments.
Remember, autocorrelation can cause two sorts of problems: (1) bias in niche estimates owing to artificial over- and under-representation of particular environments, and (2) inflation of sample sizes and correspondingly artificial inflation of statistical power.
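Once a minimum separation distance has been chosen, the thinning itself is simple. The sketch below is one possible greedy approach, assuming projected coordinates in the same units as the distance; `thin_points` is a hypothetical helper, and the result depends on the order of the input points (shuffling and repeating is a common workaround).

```python
import math

def thin_points(points, min_dist):
    """Greedy thinning: keep a point only if it is at least min_dist from all kept points."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept

# Toy occurrence records (hypothetical coordinates, e.g. in km)
records = [(0, 0), (0.5, 0), (2, 0), (2.2, 0.1), (5, 5)]
print(thin_points(records, 1.0))   # the two near-duplicates are dropped
```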
  • asked a question related to Spatial Autocorrelation
Question
1 answer
I am working with fish population that lives in lakes and ponds. I have GPS data (lat/long) just for the lake not for each individual.
I am trying to use the spatial autocorrelation analysis of Smouse & Peakall (1999) as implemented in the R package PopGenReport. For this I set the location of each individual to that of its population (it is like piling up all individuals).
I am getting a weird result: the correlogram looks like an inverted "u" (negative autocorrelation at small and large distances, but positive at medium distances).
My guess is that because all individuals in one population are assigned the same position, the autocorrelation fails. So I added some noise to the lat/long data (random uniform, -1 m to +1 m). For the "noised" data I get the expected correlogram.
My question is: is my approach correct? Is there any other method to handle this problem?
Smouse PE, Peakall R. Spatial autocorrelation analysis of multi-allele and multi-locus genetic microstructure. Heredity 82: 561-573
Relevant answer
You can do whatever you like with data analysis, so long as you can justify it. You can be the pioneer lol. Adding stochasticity like that to your model is a very good idea :-)
  • asked a question related to Spatial Autocorrelation
Question
5 answers
Hi,
I'm attempting to generate artificial landscapes with varying levels of spatial autocorrelation and I've read that unconditional Gaussian simulations would be able to do this. I don't have much experience with spatial modeling, but I understand that varying the beta parameter creates landscapes with values centered around that mean, with a variance that is defined by the sill parameter. The level of autocorrelation is then controlled by a range parameter, which determines the distance at which there is no correlation. Are there any equations associated with this method or is it as simple as what is stated above? In general, I am trying to gain a better understanding of this method and would appreciate any help.
Thanks
Relevant answer
Answer
I think it is as simple as you have stated. There is also the nugget, which is the level of variation already present at (near) zero distance.
This R package can handle variograms, kriging and simulations:
Cheers
Chris
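To make the recipe from the question concrete: one common way to draw an unconditional Gaussian simulation is to build the covariance matrix C from the variogram model, factor it as C = L L^T (Cholesky), and set Z = beta + L * eps with eps standard normal. The sketch below does this in plain Python under some assumptions that are mine, not the thread's: a spherical covariance model (so correlation is exactly zero beyond the range), `sill` meaning the variance of the spatially structured part, and illustrative function names. It is not the code of any particular package, and it only scales to small grids.

```python
import math
import random

def spherical_cov(h, sill, rng, nugget=0.0):
    """Spherical covariance: sill + nugget at h = 0, zero at and beyond the range."""
    if h == 0:
        return sill + nugget
    if h >= rng:
        return 0.0
    r = h / rng
    return sill * (1 - 1.5 * r + 0.5 * r ** 3)

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate(coords, beta, sill, rng, nugget=0.0, seed=None):
    """One unconditional realisation Z = beta + L * eps at the given coordinates."""
    rnd = random.Random(seed)
    n = len(coords)
    cov = [[spherical_cov(math.dist(coords[i], coords[j]), sill, rng, nugget)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    eps = [rnd.gauss(0.0, 1.0) for _ in range(n)]
    return [beta + sum(low[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]

# Ten cells along a transect: mean 10, sill 1, range 4 cells, small nugget
coords = [(float(x), 0.0) for x in range(10)]
print(simulate(coords, beta=10.0, sill=1.0, rng=4.0, nugget=0.1, seed=1))
```

Increasing `rng` makes neighbouring cells more alike (smoother landscapes); increasing `nugget` adds uncorrelated noise.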
  • asked a question related to Spatial Autocorrelation
Question
65 answers
I need to statistically compare two maps in order to determine if the spatial distribution  of their data is correlated or not. any suggestions? Thanks!
Relevant answer
Answer
I am replying to this question since I recently came across a similar issue. I will give my two cents here, bearing in mind that this solution applies to the specific issue at hand.
I have two rasters, each representing two path systems (actually, two least-cost paths networks). Each cell belonging to each path is given a value of 1, the off-path cells are given 0. The rasters have the same resolution and spatial extent.
I wanted to quantify if and to what extent they can be considered correlated, that is how "strong" is the overlap between them. I focused on the Jaccard coefficient (e.g., http://people.revoledu.com/kardi/tutorial/Similarity/Jaccard.html). 
This coefficient is equal to: the INTERSECTION between the two rasters divided by the UNION between the two rasters.
Now, in terms of this specific example, the INTERSECTION is the number of only those cells that the two rasters have in common (i.e., the number of overlapping path cells). The UNION is the total number of path cells (belonging to either of the two rasters).
In ArcGIS, we can use RASTER CALCULATOR to compute the INTERSECTION and the UNION.
To get the INTERSECTION, we just feed the following formula into RASTER CALCULATOR: "RASTER A" & "RASTER B" (where Raster A and Raster B are the names of the two rasters being analysed).
The same for UNION: "RASTER A" | "RASTER B"
Once we have obtained the two new output rasters, to get the Jaccard coefficient, we simply open the attribute table of each raster and take note of the count of cells with value equal to 1, dividing them accordingly (remember: INTERSECTION divided by UNION).
In my case, the count of cell with value 1 in the INTERSECTION raster is 22,822, while in the UNION raster is 37,716. The Jaccard coefficient turns out to be about 0.61
I hope this quite long reply will be useful to anyone that will jump here in the future.
A similar approach (in Matlab) is provided here: http://kawahara.ca/matlab-jaccard-similarity-coefficient-between-images/
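Outside ArcGIS, the same calculation takes only a few lines. Here is a minimal Python sketch, assuming both rasters have already been read into equally sized lists of rows containing 0/1 values; the function name `jaccard` and the toy rasters are illustrative only.

```python
def jaccard(raster_a, raster_b):
    """Jaccard coefficient of two equally sized binary rasters (lists of rows of 0/1)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(raster_a, raster_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1   # cell is on a path in both rasters
            if a or b:
                union += 1          # cell is on a path in at least one raster
    if union == 0:
        raise ValueError("neither raster contains any cell with value 1")
    return intersection / union

a = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 0]]
b = [[1, 0, 0],
     [0, 1, 1],
     [0, 0, 0]]
print(jaccard(a, b))                # 2 shared cells / 4 path cells = 0.5

# The cell counts reported in the answer give the same figure as stated there
print(round(22822 / 37716, 2))      # 0.61
```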
  • asked a question related to Spatial Autocorrelation
Question
5 answers
Hello,
I would like to ask about any method for testing spatial dependence on categorical data (e.g. vegetation types polygons) and tools for modeling it against environmental data.
Is Multinomial Regression suitable for this task?
Thanks!
Relevant answer
Answer
Hi!
You can perform spatial analysis on categorical data by using join count statistics.
A quick overview of the method can be found here: http://www.gitta.info/DiscrSpatVari/en/html/spat_depend_join_ct_stat.html
Mathematical Functions of the Method: http://www.people.fas.harvard.edu/~zhukov/Spatial2.pdf
Hope this helped :)
Best,
Karolina
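To show what join counts actually tally, here is a small Python sketch for a categorical raster. It assumes rook adjacency (cells sharing an edge) and counts every join once; `join_counts` and the toy vegetation grid are made up for illustration, and the permutation test needed to judge significance is left out.

```python
from collections import Counter

def join_counts(grid):
    """Count joins between rook-adjacent cells, keyed by the (sorted) pair of categories."""
    rows, cols = len(grid), len(grid[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so that each join is counted exactly once
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    pair = tuple(sorted((grid[r][c], grid[rr][cc])))
                    counts[pair] += 1
    return counts

# F = forest, G = grassland, W = wetland (hypothetical map)
veg = [["F", "F", "G"],
       ["F", "G", "G"],
       ["W", "W", "G"]]
for pair, k in sorted(join_counts(veg).items()):
    print(pair, k)
```

Many same-category joins (e.g. F-F, G-G) relative to what random relabelling of the cells would give point to positive spatial dependence.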
  • asked a question related to Spatial Autocorrelation
Question
1 answer
I'm working up to fitting a Tweedie GLMM in R (mgcv). At the moment I'm playing in SPSS to sort out the data and some ideas for random effects. The data are from a survey of coastal dolphins on a sample of sites (Site) around the coast, each measured on a set of transects. The data are the number of groups (NGroups), the group sizes (GSize), and number of individuals (NIndividuals). NIndividuals is the sum of a Poisson number (Ngroups) of Gamma (or Negative Binomial) distributions (Group sizes). Given transect length and the search width, it is possible to calculate the density of sightings for each transect (SightDens). So far I've fitted a Poisson regression to NGroups with a random effect for Site using SPSS Genlinmixed. Now I'd like to see whether there is any spatial autocorrelation among sites. As the sites are strung out around the coast, this can be treated as a purely linear problem based on site order (SiteOrder). I'd like to try to fit a simple AR1 structure to the Site covariance matrix based on SiteOrder. Can anyone help with this?
Relevant answer
Answer
  • asked a question related to Spatial Autocorrelation
Question
7 answers
I am using this method to interpret the contribution of each predictor variable to disease occurrence. The data contain 3 outliers, so I cannot use multiple linear regression. I would now like to know what the assumptions of the probit/logit model are.
Relevant answer
Answer
Parallel regression/ proportional odds assumption: all coefficients on the predictors/independent variables are equal for every category of the outcome. Hence, the slopes of the estimated equations are identical. Brant's test is used to test this assumption.
  • asked a question related to Spatial Autocorrelation
Question
6 answers
Using GeoDa software, I have run a spatial error model and obtained coefficients for each of my independent variables, in addition to constant and lambda variables. I am now wondering how to write my spatial error equation. Any help would be appreciated.
Relevant answer
Answer
Hi Mahyar,
Have you tried using the actual GWR4 software?
It is a free download from here:
This software is developed by Fotheringham's own team. In ArcGIS the GWR output is mainly presented graphically (which is of course very helpful), but it misses out the actual model estimates. With the actual GWR estimates, I think (if I understand your problem correctly) you will not have any problem doing those analyses, because it is basically just a regression (it is just done throughout the space continuum).
You might want to try it out.  
I hope it helps
Ari
  • asked a question related to Spatial Autocorrelation
Question
7 answers
My purpose is to map hotspots using LISA tools (or local statistics in general). While developing a GIS application with the available spatial statistics libraries (i.e. PySAL - ESDA), I have still not found documented methods (or suitable literature) for including more than two variables in the computation.
Have you any suggestions on how to simultaneously map hotspots or significant spatial clusters of more than two variables?
The following link points to one of the papers (Anselin et al. 2002) I have considered as a reference
Relevant answer
Answer
To map hotspots considering more than two variables, you could use the method of Antonio Velázquez, Luis Manuel Martínez and Fátima Maciel Carrillo, 'Caracterización climática para la región de Bahía de Banderas mediante el sistema de Köppen, modificado por García, y técnicas de sistemas de información geográfica', or the method of Miquel Ninyerola, Xavier Pons and Joan M. Roure, 'A methodological approach of climatological modelling of air temperature and precipitation through GIS techniques'.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
or
How do I relate Housing Submarket with Environmental Characteristics?
in Real Estate/ Housing fields
Relevant answer
Answer
There are different ways to view the world. You have identified two of them. Suppose that you model environmental characteristics correctly. For example, you include variables for the proximity to amenities and disamenities, then the degree of spatial autocorrelation will be greatly diminished. Of course, there will always be questions as to how these "environmental" variables should enter into your model. But that is another question.
You might want to take a look at what is perhaps a middle ground. In the papers below, you will see a non-parametric approach to modeling space. The reader should realize that, in addition to this approach, it is possible to insert relatively standard spatial variables such as the distance to the urban center or the shore, etc.
  • asked a question related to Spatial Autocorrelation
Question
11 answers
I have occurrence data (lat/long) and environmental layers (i.e., raster layers). I am doing SDM using the Maxent algorithm. How can I deal with the issue of spatial autocorrelation, and (maybe) remove the points that are spatially autocorrelated?
Relevant answer
Answer
G'day Kapil,
Hein is correct to some extent that MaxEnt does introduce spatial auto-correlation by the back door due to the interactions of many of the training layers used. Appropriate selection of your training range and of your forcing factors can go a long way to removing these autocorrelations, however, in that you want to construct your MaxEnt model based upon the most parsimonious forcing factors (i.e. those that are least correlated with each other and are the most powerful contributors to your projection).
The other thing to be aware of here, however, is that MaxEnt doesn't work in geographical space. It works in niche space, which is an entirely different context, is n-dimensional and, to some extent at least, without tangible scale. As long as your training layers are defined broadly enough (in niche space) to allow for the generation of legitimate pseudoabsences, but not so broadly that your pseudoabsences represent the entire variability of your projection layer, you'll have a decent correlative model to project independently of the spatial correlations of your training points. Hence spatial autocorrelations may or may not be critical to the parameterising of your model. A better approach than trying to eliminate these points would be to apply the ExDet tool (see reference below) to interrogate your models after the fact, and re-iteratively refine them. Firstly this will help you identify your correlated forcing factors, secondly Mohsen and co go into some detail on the subject of appropriate background, and finally it will give you not just the Type I divergence (analogous to MESS maps), but also the Type II divergences, where correlations between your training factors have broken down.
Mesgaran, M.B. et al. (2014) Here be dragons: a tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models. Diversity and Distributions DOI: 10.1111/ddi.12209.
I hope that's helpful. MaxEnt is a really powerful tool for correlative SDM projection, but it is critical to understand that the "distribution" that it projects is actually the realised niche. It is not constructed on spatial terms at all, and it does not represent the full extent of suitable habitat for a species under changing conditions, or even current conditions. It merely identifies other areas that have similar characteristics to the present realised niche.
Cheers
Sean
  • asked a question related to Spatial Autocorrelation
Question
6 answers
What is the relationship between Spatial Dependence and Housing Submarket? How can spatial autocorrelation explain Housing Submarket?
Relevant answer
Answer
You may be interested in this paper which implicitly models spatial dependence via a multilevel model:
The multilevel approach sees houses as nested within submarkets, however defined, and models between- and within-submarket differentials simultaneously. A large variance at the higher level suggests greater similarity within submarkets, that is, greater dependence.
The basic model is more recently elaborated here
and this extension explicitly includes distance - based spatial dependence  within the submarkets
This training guide takes house prices in  neighborhoods as an example throughout
volume 2 considers explicit spatial models in the final chapter
  • asked a question related to Spatial Autocorrelation
Question
8 answers
Hi everyone, I'm going to use one of these two indicators to investigate the spatial pattern of my ward-wise disease data. Which one is more appropriate, and why? I need some reliable references to help me apply them in my study.
Relevant answer
Answer
The assumptions behind both stats are that your data is continuous (real numbers) and normally distributed in the study area.
If your research question is about measuring the similarity of nearby features you should use Moran's I. The measure only indicates that similar values occur together. It does not indicate whether any cluster is composed of high or low values.
General G statistic can be used to indicate whether high or low values are concentrated over the study area.
Hence, if you wish to find out whether your data is clustered in general (auto correlated) use Moran's I. If you want to know more specifically whether or not there are clusters of high/low values use G stat.
  • asked a question related to Spatial Autocorrelation
Question
3 answers
I have some a priori knowledge about the autocorrelation function r(h): the closer r is to 1 or -1, the stronger the spatial correlation, and as the lag distance increases, r(h) diminishes. But I would like more details from experts in spatial analysis.
Kind regards Louadj yacine
Relevant answer
Answer
There is a difference between the ordinary statistical autocorrelation function and spatial autocorrelation. In spatial autocorrelation, the relationship measured is essentially how many, or how few, points with the same attributes as a given point in space are arranged at different lag distances around that point.
It is a rather large subject to explain in a short article, so I would like to refer you to the GeoDa website here: https://geodacenter.asu.edu/software/downloads where you can download software that is excellent for displaying spatial autocorrelation. Also download the GeoDa Workbook here http://geodacenter.asu.edu/system/files/geodaworkbook.pdf to see how the software works. The section on spatial autocorrelation is on page 124. See an explanatory article here: https://en.wikipedia.org/wiki/GeoDA
The theory of spatial autocorrelation is very well explained in the Geospatial Analysis online textbook by de Smith, Goodchild and Longley. It is available here: http://www.spatialanalysisonline.com/HTML/index.html in chapter 5.5.
  • asked a question related to Spatial Autocorrelation
Question
4 answers
I have 25 data points which represents woody plants richness. Each one summarizes the value of this variable for the same number of 2mx2m quadrats. Field design was 50mx2m vegetation transects.
Is it meaningful to explore spatial autocorrelation using Moran's I and variograms?
Relevant answer
Answer
There are quite a number of papers that have done that.  One of the links below discusses the different sampling designs.
  • asked a question related to Spatial Autocorrelation
Question
10 answers
What method can I use to evaluate the concentration of spatial data, considering the weight, position and distance between data points?
Relevant answer
Answer
You can look into alpha-shapes, these are essentially non-convex polygons with a specific radius. Another measure you can look into is the D0 dimension, you cover the region with boxes of specific size and see how the number of non-empty boxes scales with the box size. This is called the capacity dimension of a spatial point distribution.
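The box-counting idea can be sketched as follows. This is a minimal Python illustration, assuming 2D points and a handful of box sizes chosen by the user; `box_count` and `capacity_dimension` are names invented for the example. The dimension is estimated as the slope of log N(size) against log(1/size).

```python
import math

def box_count(points, size):
    """Number of boxes of the given side length that contain at least one point."""
    return len({(math.floor(x / size), math.floor(y / size)) for x, y in points})

def capacity_dimension(points, sizes):
    """Least-squares slope of log N(size) versus log(1/size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope_num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_den = sum((x - mx) ** 2 for x in xs)
    return slope_num / slope_den

sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
# Points spread evenly over the unit square versus points concentrated on a line
area = [((i + 0.5) / 64, (j + 0.5) / 64) for i in range(64) for j in range(64)]
line = [((i + 0.5) / 64, (i + 0.5) / 64) for i in range(64)]
print(capacity_dimension(area, sizes))   # close to 2: points fill the plane
print(capacity_dimension(line, sizes))   # close to 1: points are concentrated on a line
```

Values well below 2 suggest that the points occupy only part of the study region, i.e. they are spatially concentrated.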
  • asked a question related to Spatial Autocorrelation
Question
2 answers
I am trying to determine the serial autocorrelation in my precipitation data. I have data for 30 years (1986-2015) and I am using the "ZYP" package in R and the Zhang method described in: Zhang, X., Vincent, L.A., Hogg,W.D. and Niitsoo, A., 2000. Temperature and Precipitation Trends
in Canada during the 20th Century. Atmosphere-Ocean 38(3): 395-429.
My code is simple
library(zyp)
setwd("C:/Users/sch298/Documents/Kentucky River weather data/Daily homogenization")
y <- read.csv("P-GHCNDUSC00150624.csv", as.is = TRUE)[, 2]
zyp.trend.vector(y, x = 1:length(y), "zhang", TRUE, TRUE)
It ran fine but all the trend estimates are zero including the lower and upper bounds. I am sure something is wrong. I have attached my data file and output screenshot. Can anyone please help me to understand what might be wrong?
Thanks
Som
Relevant answer
Answer
You can simply use the acf function in R to detect serial correlation. The Zhang method may possibly have been influenced by decomposition theory. For trend detection, I suggest you use the modified Mann-Kendall test for autocorrelated data.
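For reference, the quantity such an autocorrelation function reports is easy to compute by hand. The Python sketch below uses the usual sample estimator r_k = c_k / c_0, with c_k the lag-k autocovariance; the function name `acf` and the toy series are illustrative. Values outside roughly ±1.96/sqrt(n) are commonly read as significant serial correlation.

```python
def acf(series, max_lag):
    """Sample autocorrelation r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)                      # lag-0 sum of squares
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

trend = [1, 2, 3, 4, 5, 6, 7, 8]        # steadily increasing series
print(acf(trend, 2))                     # lag 1 is 0.625 for this series
```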
  • asked a question related to Spatial Autocorrelation
Question
8 answers
Dear all, I would like to run a spatial autocorrelation analysis on my data in R (or other software such as Minitab, PAST or Python). My data comprise 100 1 m² plots, each treatment plot paired with a control plot 1 m away. In all plots I measured plant cover, and I want to measure species co-occurrence in each plot. All plots are georeferenced with lat and long in degrees, minutes and seconds. I want to know whether there is autocorrelation in my sampling. Can someone help me?
Best wishes,
Jhonny
Relevant answer
Answer
Hi Jhonny.
You can fit linear models with correlation structures for the error using package nlme (https://cran.r-project.org/web/packages/nlme/nlme.pdf). There is an argument `correlation` in the `lme` function to model spatial correlation. Also the function `Variogram` is used to compute the semi-variogram. Argument `form = ~ x + y` represents a two-dimensional position vector with coordinates x and y, which I think is your case.
  • asked a question related to Spatial Autocorrelation
Question
8 answers
Dear colleagues,
I have a dataset consisting of continuous, categorical, and binomial (presence/absence) variables, and I need to test for spatial autocorrelation in each one of them. I have done it already for the continuous variables using the standard Moran's I (Moran.I function of the ape R-package), but I understand I cannot do this for my other variables. What other R alternatives do I have to test for spatial autocorrelation in categorical and binomial (presence/absence) variables?
Thanks in advance!
Relevant answer
Answer
  • asked a question related to Spatial Autocorrelation
Question
9 answers
I am trying to test my preliminary MaxEnt model residuals for spatial autocorrelation in order to decide whether I need to rarefy my presence points. Since the data came from multiple sources (mostly literature and museum records), it is obvious that these records were gathered with different sampling effort and are somewhat clustered in space. On the other hand, the species is rarely collected throughout its range (I have only about 300 points) and I would like to retain as many records as I can for modeling purposes. The method I use follows Nunez & Medley, 2011, who propose calculating Moran's I at multiple distance classes with SAM software. However, I am having trouble interpreting the results. Attached you can find the correlogram I am getting in SAM. From what I can see in the results, the first and highest Moran's I value = 0.475 (p=0.005) is at a distance of 64.007 and the rest have much lower values, with most of them being negative. Does that mean that my presence points can be filtered at a minimum distance of 65 km, or are they OK and show only weak SAC? I read somewhere that only significant values greater than +-0.5 or +-0.7 can indicate a serious spatial pattern. Am I getting it right?
Another related question is: why the correlogram shows Distance units instead of km? I am choosing the Geodesic coordinate system when loading my data but the program keeps showing the distance in units...
I am an absolute newbie to SDM and spatial statistics, so any help/tips will be greatly appreciated!
Kind regards,
Serge   
Relevant answer
Answer
Hi Serhii,
Based on the screenshots you posted, I don't think you are doing something wrong in your settings. Distances in the correlogram are really in kilometres. You just indicated the form of the input (coordinates) in the SAM settings, meaning you have a geodesic coordinate system (probably WGS84) and you know which are the X (long) and Y (lat) axes. But it doesn't necessarily mean that the calculated distances among points are in decimal degrees as well, because they can be converted to metres/kilometres. Moreover, from a practical point of view, it is much easier to understand distances in m/km rather than in decimal degrees (and the SAM authors obviously knew it).
The area you are studying seems to be pretty big (assuming that your coordinates are in WGS84) and you have just around 300 samples. As noted by Philipp, the Moran's I in the correlogram is significant if the red dot (or red line) is outside the confidence interval. So, your correlogram might be indicating that records tend to cluster up to +/- 200 km (mutual population?) while they are rather dispersed from a distance of +/- 900 km (different populations?). But I agree with Glen that you may want to adjust your data based on the sources, or also try other methods of spatial pattern analysis (please see the link below; although it uses R, it gives a comprehensive overview of methods and their interpretation). You might also want to compute a more detailed correlogram.
Regards
Lukas
  • asked a question related to Spatial Autocorrelation
Question
13 answers
I am wondering if anyone here has ever dealt with spatial autocorrelation using Logistic Regression in GIS.
In the literature I have read so far, sometimes the issue is not even addressed. In other instances, the authors used the geographic coordinates as covariates. For example, quoting from Hu, Z., & Lo, C. P. (2007). Modeling urban growth in Atlanta using logistic regression. Computers, Environment and Urban Systems, 31, 667–688: "The second step was including spatial coordinates of data points into the list of independent variables. Spatial autocorrelation can be alleviated to some extent by attempting to introduce location into the link function to remove any such effects present (Bailey & Gatrell, 1995). For example, spatial coordinates of observations might be introduced as additional covariates, or to classify regions in terms of their broad location and treat this classification as an extra categorical explanatory factor in the model."
To the best of my understanding, the latter approach is termed "autocovariate" modeling by: Dormann, C. F., McPherson, J. M., Araújo, M. B., Bivand, R., Bolliger, J., Carl, G., … Wilson, R. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: A review. Ecography, 30(5), 609–628. http://doi.org/10.1111/j.2007.0906-7590.05171.x
I would like to know your opinion on the issue, and what approach you happened to use.
Relevant answer
Answer
Coordinates...PLUS the previous sampling strategy, do not forget!!!
As for the residuals: what do you mean? The residuals of LR are...residuals. They are continuous. I have read a number of articles in which the (logistic) model residuals were tested for spatial autocorrelation. If you want, tomorrow I could try to locate those references.
UPDATE:
for instance, in the following publication the authors use Moran's I to test for auto-correlation in (auto)LR model residuals:
Analyzing spatial autocorrelation in species distributions using Gaussian and logit models (doi:10.1016/j.ecolmodel.2007.04.024)
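As a rough illustration of testing logistic-model residuals for spatial autocorrelation, here is a minimal Python sketch that computes Moran's I on Pearson residuals with a one-sided permutation test. The data, the constant fitted probabilities, the 1.5-unit neighbourhood and the function names are all hypothetical; this is not the procedure from the cited paper.

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I for a vector of values and a spatial weights matrix W."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

def moran_permutation_test(values, W, n_perm=999, seed=0):
    """One-sided permutation test for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    perm = np.array([morans_i(rng.permutation(values), W) for _ in range(n_perm)])
    p_value = (np.sum(perm >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical example: 30 sites along a transect, presences clustered at one end.
coords = np.arange(30, dtype=float)
y = (coords >= 15).astype(float)      # observed 0/1 response
p_hat = np.full(30, 0.5)              # fitted probabilities (intercept-only model)
pearson_resid = (y - p_hat) / np.sqrt(p_hat * (1 - p_hat))

# Binary weights: sites within 1.5 distance units are neighbours.
dist = np.abs(coords[:, None] - coords[None, :])
W = ((dist > 0) & (dist <= 1.5)).astype(float)

I_obs, p_val = moran_permutation_test(pearson_resid, W)
```

A small p-value here would suggest that the model has left spatial structure in the residuals.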
  • asked a question related to Spatial Autocorrelation
Question
2 answers
I need to test for spatial autocorrelation in my data. In R, I used the Moran.I function from the ape package and the moran.test function from the spdep package, but got strikingly different results. I think this may be due to the weights employed to calculate Moran's I: while in ape your weights are given by your distance matrix, in spdep you must specify spatial weights from a neighbors (nb) object, choosing a given style (binary, standardised, etc.). Could anyone please clarify this and perhaps suggest which package is best?
Relevant answer
Answer
Different weights will give different results. Think of a chessboard: bishop's case weights will give positive autocorrelation (lots of black-black joins and lots of white-white joins). A rook (castle) weighting system would show negative autocorrelation (lots of black-white joins), and a queen's case would show no autocorrelation. Same underlying pattern on the board but very different answers. This is a good thing, as you can set the weights to evaluate different theoretical schemes.
In this classic paper, Peter Haggett evaluated seven alternative weight structures and showed that an epidemic was dominated by hierarchical spread at the beginning and local contagion later:
Peter Haggett (1976) Hybridizing Alternative Models of an Epidemic Diffusion Process, Economic Geography, Vol. 52, No. 2, Human Health Problems: Spatial Perspectives (Apr., 1976), pp. 136-146
You may also be interested in this guide by a colleague - free to download
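The chessboard argument can be checked numerically. Below is a minimal Python sketch that computes Moran's I for an 8x8 board under binary rook, bishop and queen contiguity; the helper names are made up for this illustration and it does not reproduce the ape or spdep implementations.

```python
import numpy as np

def contiguity_weights(n_rows, n_cols, offsets):
    """Binary weights matrix for a regular grid, given neighbour offsets."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[r * n_cols + c, rr * n_cols + cc] = 1.0
    return W

def morans_i(values, W):
    """Global Moran's I for a vector of values and a weights matrix W."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

ROOK = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # shared edges
BISHOP = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # shared corners
QUEEN = ROOK + BISHOP                            # edges and corners

# Chessboard colouring: 0 and 1 alternate along rows and columns.
board = np.indices((8, 8)).sum(axis=0) % 2
values = board.ravel()

results = {
    name: morans_i(values, contiguity_weights(8, 8, offsets))
    for name, offsets in [("rook", ROOK), ("bishop", BISHOP), ("queen", QUEEN)]
}
```

On this board the rook case gives I = -1, the bishop case gives I = +1, and the queen case lands close to zero, so the same pattern yields three different answers depending on the weights.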
  • asked a question related to Spatial Autocorrelation
Question
7 answers
Dear members,
The Durbin-Watson statistic is not valid as an indicator of autocorrelation when there is a lagged dependent variable on the right-hand side of the equation.
Is it true that the DW statistic is not valid for panel data in any case?
And what about R-squared in panel data: what value indicates a good model? If R-squared is 0.009, 0.34, or 0.45, is that not a good fit? Does it need to be at least 0.50?
Relevant answer
Answer
Durbin-Watson is not a good test for autocorrelation in any case. It became popular because it was the first for which a distribution could be calculated without specifying the matrix of regressors. Much better tests are now available; the problem of the distribution can now easily be solved using computer-based simulations, which were not available at the time Durbin and Watson invented their test. The same issue holds for the R-squared: one can test for a significant difference from 0, and this is what matters, not the size of the number directly. See Chapter 6 of my book, Statistical Foundations for Econometric Techniques, for a discussion of DW. See Section 3 of Chapter 9 for a discussion of the R-squared. An electronic copy of the book is available from rwer.wordpress.com
  • asked a question related to Spatial Autocorrelation
Question
3 answers
Dear all,
I'm working on a field experiment with 20 traps (pitfall traps to collect ground arthropods) in a treatment field and 20 traps in a reference field (so potentially spatially autocorrelated).
I produced an nMDS (non-metric multidimensional scaling) plot to assess the multivariate ordination of those samples, and I also plotted 95% confidence ellipses to visualize how well the treatment and reference fields are discriminated. I would then like a statistical measure of this discrimination, so my idea was to perform a perMANOVA (adonis function in R) to test the dissimilarity between fields. So my question is:
- Can I use perMANOVA with such an experimental design? If not, is there a way to deal with such autocorrelation? Any suggestions on alternatives?
Thanks a lot
Alessandro
Relevant answer
Answer
Thanks Remi, I will try to run a Mantel test comparing the animal abundance distance matrix with the location matrix (of the traps in the field). That should solve the problem. Thanks a lot
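For reference, a Mantel test of this kind can be sketched in a few lines of Python. This is a minimal permutation version under assumed inputs: the trap positions and the "community" distances below are hypothetical, and in R one would more likely use an existing implementation such as the mantel function in the vegan package.

```python
import numpy as np

def mantel_test(D1, D2, n_perm=999, seed=0):
    """Pearson Mantel correlation between two distance matrices, with a
    one-sided permutation p-value (rows and columns of D1 permuted together)."""
    D1 = np.asarray(D1, dtype=float)
    D2 = np.asarray(D2, dtype=float)
    iu = np.triu_indices_from(D1, k=1)   # upper triangle, without the diagonal
    observed = np.corrcoef(D1[iu], D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n = D1.shape[0]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        D1_perm = D1[np.ix_(perm, perm)]
        if np.corrcoef(D1_perm[iu], D2[iu])[0, 1] >= observed:
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical example: 10 traps on a line, community distance grows with spacing.
positions = np.arange(10, dtype=float)
D_space = np.abs(positions[:, None] - positions[None, :])
D_comm = D_space ** 2                    # stand-in for an abundance dissimilarity matrix

r_obs, p_val = mantel_test(D_comm, D_space)
```

A high correlation with a small p-value would indicate that traps closer together have more similar catches, i.e. that the abundance data are spatially structured.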