Spatial Autocorrelation - Science topic
Questions related to Spatial Autocorrelation
I'm working on a data set consisting of data about geographical districts. Some of the variables are continuous, and for these I managed to calculate global and local autocorrelation using the R package spdep. The data set also has key variables on a 9-level ordinal scale. I'm confused about how to calculate autocorrelation for such data. I understand that, at least for binary nominal data (such as presence/absence), the join count statistic would be a fitting method, but can it be extended to ordinal or multilevel nominal data?
Any help is greatly appreciated!
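One way to picture the multilevel case is to count joins between neighbouring districts that share the same category and compare that count with random relabellings. The sketch below is only an illustration under stated assumptions: the 4x4 grid, the rook adjacency helper, and the labels are made up, and the statistic ignores the ordering of an ordinal scale. For real analyses, spdep also provides a multi-category join count function (joincount.multi).

```python
import numpy as np

def rook_adjacency(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (hypothetical layout)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:          # neighbour to the right
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:          # neighbour below
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W

def same_category_joins(labels, W):
    """Number of neighbour pairs whose category codes are identical."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return float((W * same).sum() / 2.0)   # each join appears twice in W

def join_count_test(labels, W, n_perm=999, seed=0):
    """One-sided permutation test: are same-category joins more common than chance?"""
    rng = np.random.default_rng(seed)
    observed = same_category_joins(labels, W)
    sims = np.array([same_category_joins(rng.permutation(labels), W)
                     for _ in range(n_perm)])
    p_value = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, float(p_value)

# Made-up example: left half of the grid is category 0, right half category 1.
W = rook_adjacency(4, 4)
labels = np.array([0, 0, 1, 1] * 4)
observed, p_value = join_count_test(labels, W)
print(observed, p_value)
```

The same function works for any number of category codes. If the ordering of the nine levels matters, a rank-based statistic may be more appropriate than join counts.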
Geospatial artificial intelligence (GeoAI) is not just a tool but a potential revolution in geospatial data analysis. It is rapidly becoming a powerful force, providing unprecedented insights into complex environmental and societal challenges. From mapping and modeling land cover changes to mapping flood-risk areas, GeoAI is set to transform geospatial decision-making, harnessing the power of machine learning (ML) and deep learning (DL) for more effective solutions.
GeoAI models can more efficiently analyze massive geospatial datasets such as high-resolution satellite imagery than traditional methods. This allows us to uncover patterns and trends that would be difficult to detect manually. It also helps automate complex tasks such as feature engineering and improve predictions. GeoAI is also valuable for solving real-world problems such as urban planning, forest conservation, disaster risk management, and climate change adaptation.
However, despite these benefits, GeoAI has its challenges. The complexity of ML and DL models often results in a lack of transparency due to their "black box" nature. This makes it difficult for users to trust the results. Data quality can significantly impact the performance of GeoAI models, leading to potential biases or inaccuracies in predictions. Furthermore, spatial autocorrelation and spatial heterogeneity are not readily incorporated into GeoAI, limiting its ability to capture the underlying spatial dynamics of geospatial data fully. Interpreting these results requires specialized knowledge, which may limit the accessibility of GeoAI for broader audiences.
Please share your experiences or thoughts on how we can effectively balance the benefits of GeoAI with the need for transparency and trust in geospatial data analysis.
Are you interested in learning more? You can download the ebook GeoAI Unveiled: Case Studies in Explainable GeoAI for Environmental Modeling here: https://aigeolabs.com/books/geoai/.
For my master's thesis, I am working with Mobile Laser Scanner data, and my task is the extraction of powerline poles. My data set covers about 10 kilometers and contains approximately 60 powerline poles. My algorithm extracted 58 of the poles correctly; the other two were not completely captured by the Mobile Laser Scanner system, so the proposed algorithm could not extract them. The proposed algorithm is fully automatic and does not need many parameters for extraction.
My main question is: what would my implementation need in order to be published in a good ISI journal?
Hello all,
I am trying to learn how to conduct a Moran's I test in R for my 4 species distribution models generated in MaxEnt. I want to be able to show that my four models hopefully show little spatial autocorrelation and do not need to be redone.
I have found lots of people discussing the packages and functions used to complete this task but no scripts that are useful to learn from. I would like to understand the meanings behind the code and how it works. I was wondering if anyone had any tips or R-scripts that would help me?
Any direct help/useful information would be greatly appreciated.
Kind regards,
William
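As a starting point for understanding what the packages do, here is a minimal sketch of global Moran's I with a permutation test, written in Python with NumPy only. Everything in it is an assumption for illustration: the coordinates and "residuals" are synthetic, the 0.2 distance threshold is arbitrary, and the function names are made up. It is not the procedure of any particular R package.

```python
import numpy as np

def distance_band_weights(coords, threshold):
    """Row-standardised binary weights: neighbours lie within `threshold`."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = ((d > 0) & (d <= threshold)).astype(float)
    row_sums = W.sum(axis=1, keepdims=True)
    # Points without neighbours keep an all-zero row.
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

def morans_i(values, W):
    """I = (n / S0) * (z' W z) / (z' z), with z the mean-centred values."""
    z = values - values.mean()
    return float((len(z) / W.sum()) * (z @ W @ z) / (z @ z))

def moran_permutation_test(values, W, n_perm=999, seed=0):
    """Compare the observed I with I computed on randomly shuffled values."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    sims = np.array([morans_i(rng.permutation(values), W)
                     for _ in range(n_perm)])
    p_value = (np.sum(sims >= observed) + 1) / (n_perm + 1)  # one-sided
    return observed, float(p_value)

# Synthetic stand-in for model residuals at 100 sites (not real data).
rng = np.random.default_rng(42)
coords = rng.uniform(0, 1, size=(100, 2))
residuals = coords[:, 0] + rng.normal(0, 0.1, size=100)  # west-east gradient

W = distance_band_weights(coords, threshold=0.2)
moran_i, p_value = moran_permutation_test(residuals, W)
print(f"Moran's I = {moran_i:.3f}, permutation p = {p_value:.3f}")
```

Values of I near zero with a large p-value would suggest little spatial autocorrelation at the chosen distance; the synthetic gradient here is deliberately strongly autocorrelated.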
I'm wondering whether you should differentiate between presence data of highly mobile species such as raptors and immobile species (e.g. plants). The dispersal of plants is limited to a certain distance, so occurrences might be clustered because of that. Birds, on the other hand, should be able to search for suitable nesting sites. If nesting sites are close together, could that be an indicator of high suitability?
Thanks, Tim
A Moran I (spatial autocorrelation) has been prepared in Arc Map 10.4 and GeoDa for comparison. Please find attached those results for your valuable input.




Both Ripley's K-function and Moran's index test for statistically significant clustering within data. However, how do we know which method performs better for our data?
What are the advantages and disadvantages of each method that can help in choosing between them?
I am trying to investigate whether the autoregressive behaviour is a result of lags (SER) or errors (SEM).
Diagnostic check on the weights matrix
The output is as below:
spatdiag, weights(W)
Diagnostic tests for spatial dependence in OLS regression
Fitted model
------------------------------------------------------------
logHPRICE = SPOOL + BQUARTER + WFENCE + PSIZE + NROOMS
------------------------------------------------------------
Weights matrix
------------------------------------------------------------
Name: W
Type: Distance-based (inverse distance)
Distance band: 0.0 < d <= 10.0
Row-standardized: Yes
------------------------------------------------------------
Diagnostics
------------------------------------------------------------
Test                           |  Statistic    df   p-value
-------------------------------+----------------------------
Spatial error:                 |
  Moran's I                    |    -11.468     1     2.000
  Lagrange multiplier          |     55.698     1     0.000
  Robust Lagrange multiplier   |     55.302     1     0.000
                               |
Spatial lag:                   |
  Lagrange multiplier          |      2.544     1     0.111
  Robust Lagrange multiplier   |      2.148     1     0.143
------------------------------------------------------------
When I looked up the above error, some answers suggested using memory.limit() to increase the available memory, but that function is no longer supported in R. How can I resolve this vector allocation issue?
I am seeking suggestions on choosing the right projection for shapefile data, given that the area extends beyond 30 degrees and therefore triggers an error message when running Global Moran's I for spatial autocorrelation in ArcMap 10.3. Thanks in advance.
I am studying the effect of the land-use surrounding a location on the abundance of aphids at this location. To do this I fit a linear model with the land-use as independent variable and the abundance of aphids as the dependent variable. To check for spatial autocorrelation I plot the correlogram of Moran's I of the model residuals as a function of the lag distance.
However I have multiple years of data: where the aphids have been observed each year together with the surrounding land-use. How can I account for this temporal effect? Should I incorporate a 'Year' variable in the linear model and can I then just look at the correlogram of the whole dataset?
Thanks in advance.
I have 39 datasets of georeferenced disease severity data for which I would like to conduct a spatial analysis. As a part of this analysis, I would like to compare the amount of spatial autocorrelation present in each dataset.
For disease incidence data (count-based data), there is the SADIE procedure, which is widely used for this kind of task. In contrast, for disease severity data (continuous data), I am not aware of a statistic that can be or is used in that kind of way. The most popular statistic, Moran’s I, seems to be solely used in an inferential kind of way (presence or absence of spatial autocorrelation).
I am aware that the spatial weights matrix used for calculation of Moran’s I complicates the comparison between datasets. But, given a somewhat constant spatial weights matrix between datasets (for example Inverse Distance Weighted?), wouldn’t it be possible to compare the results? In addition, this GeoDa video https://www.youtube.com/watch?v=_J_bmWmOF3I seems to indicate that a comparison based on standardized z-values is in principle possible. Nevertheless, I am not aware of a published study in which this kind of analysis was carried out.
Therefore I would like to ask: Does anyone know of such studies? Or maybe of another statistic that would be better suited for this kind of purpose?
Any suggestions would be greatly appreciated.
Regards,
Marco
I want to calculate the slope direction for each polygon in GIS and use spatial autocorrelation analysis to find out if adjacent polygons have similar slope directions between them. I can use the spatial autocorrelation analysis tool, but the slope direction is a circular statistic, so I cannot easily use it. If anyone has experience in approaching this, I would like to know how to do it. Thank you in advance.
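One possible way around the circularity, sketched here as an assumption rather than an established GIS tool, is to compare neighbouring polygons through the cosine of their angular difference (1 for identical aspects, -1 for opposite ones) and to assess the average with a permutation test. The chain of 20 polygons and the aspect values are made up for illustration.

```python
import numpy as np

def circular_similarity(aspect_deg, W):
    """Weighted mean of cos(theta_i - theta_j) over neighbouring pairs."""
    theta = np.deg2rad(np.asarray(aspect_deg, dtype=float))
    diff = theta[:, None] - theta[None, :]
    return float((W * np.cos(diff)).sum() / W.sum())

def circular_permutation_test(aspect_deg, W, n_perm=999, seed=0):
    """One-sided test: are neighbouring aspects more alike than by chance?"""
    rng = np.random.default_rng(seed)
    aspect_deg = np.asarray(aspect_deg, dtype=float)
    observed = circular_similarity(aspect_deg, W)
    sims = np.array([circular_similarity(rng.permutation(aspect_deg), W)
                     for _ in range(n_perm)])
    p_value = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, float(p_value)

# Hypothetical layout: 20 polygons in a row, each adjacent to the next one.
n = 20
W = np.zeros((n, n))
i = np.arange(n - 1)
W[i, i + 1] = 1
W[i + 1, i] = 1

# Made-up aspects: ten north-facing polygons (wrapping around 0/360 degrees)
# followed by ten south-facing ones.
aspects = [350, 355, 0, 5, 10] * 2 + [170, 175, 180, 185, 190] * 2
similarity, p_value = circular_permutation_test(aspects, W)
print(similarity, p_value)
```

Because the cosine is taken of the difference, 350 and 10 degrees are treated as 20 degrees apart rather than 340. Polygons that are flat have no defined aspect and would need to be excluded first.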
Hi esteemed scholars,
I would appreciate any constructive guidance or resources on how to resolve issues related to autocorrelation in a gravity model.
Thank you.
Ngozi
Hi,
I am checking for spatial autocorrelation in my dataset. It comprises the ID of the nests, the longitude and latitude for each of the nest boxes and the number of fledged chicks for each nest box. I want to know if reproductive success is spatially autocorrelated in our bird colony.
For this, I computed the distance matrix for nest boxes to know the distance between each nest box and the rest of nest boxes. Following this, I designed distance bands (distance lags) to calculate Moran's I for each lag specifically. As I have multiple data for several years (2014-2020), I wonder if there is any way to get a mean Moran's Index of all the years, instead of calculating an index for each year.
It is my first time doing these types of analysis so any advice would be very much appreciated!!
Thank you.
Iraida
I am very interested in the application of seismic noise data on the earth scale, and I have obtained two data sets. I want to use the spatial autocorrelation method (SPAC) to do some experiments, but I have no experience, and I am not clear about the processing parameters such as the applicable frequency bands.
One of my data sets is linearly distributed, and the other is nested triangles. The geometric distribution is as shown in the figure, and the coordinates are latitude and longitude.
The tool I have is mainly geopsy, and I am not very skilled. Looking forward to your guidance or other tools.


Dear all,
I know it might also depend on the distribution/behavior of the variable that we are studying. The sample spacing must be able to capture the spatial dependence.
But since kriging depends heavily on the variance computed within each lag distance, with few observations we might fail to capture the spatial dependence, because there would be few pairs of points within a specific lag distance. We would also have few lags. In particular, when the points are distributed very irregularly across the study area, with many observations in one region and sparse observations in another, this will also affect the estimation of the variance across lags (different accuracy).
Therefore, I think that in such circumstances computing a semivariogram seems useless. What are the best practices if we still want to use kriging instead of other interpolation methods?
Thank you in advance
PS
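The concern about few pairs per lag can be checked directly by reporting the pair count next to each semivariance estimate. The following is a minimal sketch, assuming the classical estimator gamma(h) = sum of squared differences / (2 N(h)); the four points and the bin edges are made up so the result can be verified by hand.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Classical semivariance per distance bin, plus the number of pairs in each bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    upper = np.triu_indices(len(values), k=1)        # each pair once
    dist = d[upper]
    sqdiff = (values[:, None] - values[None, :])[upper] ** 2
    bin_index = np.digitize(dist, bin_edges) - 1     # -1 or n_bins = out of range
    n_bins = len(bin_edges) - 1
    counts = np.zeros(n_bins, dtype=int)
    gamma = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bin_index == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = 0.5 * sqdiff[in_bin].mean()
    return gamma, counts

# Tiny made-up transect: four points one unit apart with a linear trend.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
values = np.array([0.0, 1.0, 2.0, 3.0])
gamma, counts = empirical_semivariogram(coords, values, [0.5, 1.5, 2.5, 3.5])
print(gamma, counts)
```

Bins whose count is small can then be merged or widened before fitting a model; a threshold of roughly 30 pairs per lag is a commonly quoted rule of thumb, not a strict requirement.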
I have a set of data collected as part of a hydroacoustic survey-- essentially a boat drove back and forth over a harbour and took a snapshot of the fish biomass/density underneath the boat every 5 minutes using a sonar-like device. I was worried that all of these snapshots could be considered pseudoreplicates in that they wouldn't be independent of each other-- i.e. fish sampled at time X could be resampled at time X+1 if they happened to move with the boat. To check for this I performed a test of spatial independence using a Moran's I test, which came back as non-significant. I also compared the delta AICs of models that included a spatial correction with the basic model with no spatial correction, and the basic model had a lower score. Does this mean that I can consider my samples collected via the hydroacoustic survey as being independent of one another and proceed with analyses that are not spatially corrected?
Hi, I use the xsmle command and the db table (e.g. spregsemxt) function in Stata to estimate spatial panel data. My problems are:
1. If I use the xsmle command, the output comes without information on the log-likelihood function, Moran's I, or any of the model selection diagnostic criteria. But we are able to specify time-specific effects, random effects, and spatial fixed effects in the estimation.
2. If I use db table (e.g. spregsemxt), the output is complete with statistics and model selection diagnostics, BUT we cannot specify time-specific effects, random effects, or spatial fixed effects in the estimation.
My question is: what is the best/correct Stata command to get output for the likelihood function, Moran's I, and model selection diagnostic criteria while at the same time being able to specify time-specific effects, random effects, or spatial fixed effects?
Is there any complete reference on how to analyse spatial panel data, including model estimation and diagnostics?
Any suggestions are welcome and appreciated. Thanks.
How can I get more information about spatial panel data in Stata (command xsmle)?
I need an example.
I used Generalized Estimating Equations (geepack in R) on my regularly monitored (twice a year) plant abundance data in grid cells to evaluate their trends of density.
Since these are time-series data, GEE was an ideal non-parametric method to implement, because it takes the pairwise correlations between time points (correlation structure) in account.
But my data and the residuals of GEE are spatially correlated as well.
I found the spind package for R, which handles spatial auto-correlated data in GEE, but it seems that this function only deals with spatial autocorrelation, and the available correlation structures are very restricted (e.g. cannot choose "ar1" anymore, or use my pre-calculated correlation matrix as a "fixed" ).
So GEE can only handle temporal OR spatial autocorrelation but not BOTH?
Did I get it wrong?
Is there any way to solve this in GEE?
Thank you for any idea!
I am working to improve a manuscript and I have been advised to provide a map showing where the correlation is significant. Any information or links to learn about this would be quite helpful. Thank you in anticipation!
I have spatially distributed land abandonment data (binary), which I want to relate to independent variables such as weather, soil characteristics, etc. (also spatially explicit). What programs are available in Stata for performing either autoregressive probit or autologistic regression?
Thank you and regards
Rui
Dear All,
I used my taper data to fit the variable-form taper model Kozak 2004-2, which is a nonlinear model. The data are longitudinal, irregularly spaced, and unbalanced, so we need to overcome the inherent autocorrelation by using a continuous-time autoregressive error structure, CAR(). I have read some papers in which the authors use SAS/ETS to fit the models; take A. Rojo (2005), for example. In A. Rojo (2005), the author incorporated a CAR(2) error process into the models to minimize the effect of the autocorrelation inherent in the longitudinal data. I followed what Rojo describes in the paper. When I add CAR(1) to the model, I can get an estimate of the autoregressive parameter ρ1. But when I add CAR(2), it is difficult to get convergence for ρ2.
Could someone help me incorporate CAR(2) into Kozak 2004-2?
I have attached the paper by A. Rojo (2005). Thank you very much.
Here is my SAS code:
proc import out=work.taper
    datafile='E:/zzs7.csv' dbms=csv replace;
    getnames=yes;
run; /* read data */

data fit_taper;
    set taper;
    if p="f" then output fit_taper;
run; /* select data for fitting */

proc model data=fit_taper method=marquardt sur dw collin;
    exogenous bolt tht dbh;
    endogenous dob;
    /* start values */
    parms b0 0.9884 b1 0.9478 b2 0.0735 b3 0.4884 b4 -0.9783
          b5 0.5511 b6 0.1 b7 0.0389 b8 -0.1579 p1 0.8;
    /* Kozak 2004-2 */
    dob = b0*(dbh**b1)*(tht**b2)
          *((1-(bolt/tht)**(1/3))/(1-(1.3/tht)**(1/3)))
          **( b3*(bolt/tht)**4
            + b4*(1/exp(dbh/tht))
            + b5*((1-(bolt/tht)**(1/3))/(1-(1.3/tht)**(1/3)))**0.1
            + b6*(1/dbh)
            + b7*tht**(1-(bolt/tht)**(1/3))
            + b8*((1-(bolt/tht)**(1/3))/(1-(1.3/tht)**(1/3))) );
    fit dob;
run;
After exploring my dataset for Ph.D. thesis and learning several spatial econometric techniques, I successfully applied ordinary least squares (OLS), logistic regression, Spatial Autoregressive models [i.e., Spatial Lag model(SLM), Spatial Error Model(SEM), Spatial Durbin Model(SDM)], and most importantly Geographically Weighted Regression (GWR), and Geographically Weighted Logistic Regression models to find evidence of spatial and socioeconomic inequality in flood risk. The performance of all regression models was significantly improved when I accounted for spatial heterogeneity at the local level compared to non-spatial global models such as OLS and logistic regression.
I am amazed that so many research papers have been published in high-ranking journals based on global regression results only, which I could have produced a couple of months ago. Those results do not make sense to me, because spatial heterogeneity is likely to prevail in flood exposure. In my view, flood exposure and/or the effects of flood risk cannot be locally independent across census tracts, dissemination areas, or census subdivisions; they must be spatially autocorrelated. There remain ripple effects, spillover effects, or indirect effects on adjacent neighbourhoods and on the overall economy. Populations from affected or flooded neighbourhoods could move to nearby safer neighbourhoods, looking for jobs and safe accommodation. Many other indirect socio-demographic effects could prevail around the flooded neighbourhoods. Do you agree? Please justify your response.
I know that this is a fact, but I need evidence in the form of a scientific paper.
I looked it up in books, Google Scholar, and so on, but never found anything reliable.
Maybe someone has something in mind and can help me out.
Thank you!
Hi, I want to compare three methods of spatial analysis and examine their application in crash/accident analysis. These three methods are: Cluster/Outlier Analysis with the Moran's I statistic, Hot Spot Analysis with Getis-Ord, and kernel density estimation.
What do you think are the features of these methods? What are the differences between them? (For crash/accident analysis).
Hey everyone, I hope someone can help me. Please!
I've carried out a spatial PCA using the adegenet package in R, following Dr. Jombart's tutorial (i.e. NAs in data replaced to mean allele frequency, etc.). My problem is with the interpretation of the variance explained by each component... Obviously it's not like a regular PCA where you need all the components together in order to explain 100% of the variance in data.
Here in sPCA it's easy to see (in the screeplot for example) that combining just a couple of principal components exceeds 100% of the variance.
Showing the summary for the sPCA:
[Call: spca.genind(obj = mi_genind, xy = mi_genind$other$xy, cn = data.graph,
scannf = FALSE, nfposi = 2, nfnega = 0)]
Scores from the centred PCA
              var        cum       ratio      moran
Axis 1   1.184406   1.184406  0.07550004  0.3562353
Axis 2   1.022800   2.207206  0.14069851  0.1799373
sPCA eigenvalues decomposition:
                eig        var      moran
Axis 1   0.15675044  1.0088656  0.6214918
Axis 2   0.08220275  0.7455009  0.4410605
###################################################
So I want to have some sort of idea whether this analysis is meaningful to explain the pattern in variability. As Jombart says in the tutorial: "The maximum attainable variance by a linear combination of alleles is the one from an ordinary PCA, indicated by the vertical dashed line on the right [of the screeplot]". I could take that value as my 100% variance and calculate the percentage explained by my Axis 1 on the sPCA... but I'm still confused because doing this to just a couple of principal components and then combining them would exceed 100% of variance explained.
Thanks for any help you can give me!

Does anyone have an example to share on a statistical model that incorporates temporal and spatial autocorrelation terms simultaneously?
Examples in ecology and hydrology research would be optimal. Thanks in advance.
I am working on regression modeling where geographical units (zones) are the observations. I found there is significant spatial autocorrelation in the response variable (based on Moran's I). However, when I develop an ordinary least squares regression model, the model residuals do not show significant spatial autocorrelation (based on Moran's I). In this case, do I need to go for a spatial regression model?
I'm working with a mechanistic model to predict mosquito abundance from temperature input. I also have a raster with land-use classification for the same area, and I now want to know how land-use dictates mosquito abundance (as land-use influences micro-climate). I think I should correct for spatial autocorrelation, but am struggling with how best to do that. One of my ideas is to find out up to what lag distance autocorrelation is present/significant and then run an ANOVA plus post-hoc test only on cells sampled x lag distance apart.
However, I'm confused about the different approaches amongst different functions/packages in R. On the one hand I tried lm.morantest, which computes Moran's I over the residuals of a linear model (in my case mosquito vs land-use). On the other hand I came across the raster package function , which calculates Moran's I just over the variable values (so either autocorrelation within the mosquito raster or in the land-use raster). Also sp.correlogram from the spdep package takes a variable vector as input instead of a linear model. The latter two functions give almost double the amount of autocorrelation compared to the first method.
So, I think I understand why you would check for autocorrelation in the residuals, but why in the variable input? Part of the correlation you're finding with the latter method might already be explained by your other variables as you're doing for example in linear regression? Are there options for doing something like a correlogram but with model residuals instead of variable input?
I am trying to find the best way to calculate energy poverty/consumption on a spatial basis using ArcMap. What type of data do I need, and what are the best methods to do so?
Please, any suggestions of publications would also help!
Hello! I am looking to use a large brown bear telemetry dataset to create a standard distribution model using MaxEnt and Wallace (R package). I currently have over 50,000 GPS points from 17 different animals, gathered at different points in time, all collected in Greece. I am trying to figure out the best way of handling the data in terms of autocorrelation. I was wondering if any of you have any advice on how autocorrelation is tested and managed in such datasets for the creation of SDMs.
Firstly I am unsure whether checking and handling autocorrelation is at all necessary for SDMs given that what I am looking to create is a suitability model for bears in Greece - wouldn't larger use of an area correlate to higher suitability in this case? I don't want to end up thinning the data in a way that excludes these habitat preferences.
I was also unsure on how Spatial autocorrelation differs from Temporal autocorrelation in this case?
Any advice would be very much appreciated.
Thank you,
Angeliki
Using GeoDa software, I have run a spatial error model and obtained coefficients for each of my independent variables, in addition to constant and lambda variables. I am now wondering how to write my spatial error equation. Any help would be appreciated.
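For reference, the spatial error model is conventionally written as follows. This is a sketch in standard textbook notation, not taken from the post: y is the dependent variable, x_1 ... x_k the independent variables, W the spatial weights matrix supplied to GeoDa, λ the reported lambda coefficient, and ε an independent error term.

```latex
\[
y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u,
\qquad
u = \lambda W u + \varepsilon ,
\]
% equivalently, in matrix form with the error process solved for u:
\[
y = X\beta + (I - \lambda W)^{-1}\varepsilon .
\]
```

Under this reading, the estimated constant and coefficients fill in β_0 ... β_k, and the estimated lambda fills in λ.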
Obviously, the world is watching covid-19 transmission very carefully. Elected officials and the press are discussing what "the models" predict. As far as I can tell, they are talking about the SIR model: (Susceptible, Infected, Recovered). However, I can't tell if they are using a spatial model and if the spatial model they are using is point pattern or areal. This is critical because the disease has very obvious spatial autocorrelation and clustering in dense urban areas. However, there appears to be a road network effect and a social network effect. For example, are they using a Bayesian maximum entropy SIR? Or a Conditional Autoregressive Bayesian spatio-temporal model? An agent based model? Random walk?
I mean "they" generally. I'm sure different scholars are using different models, but right now I think I can find one spatio-temporal model, and what these scholars meant is that they did two cross sectional count data models (not spatial ones either) in two different time periods.
Hi,
I have long-term point data on the occurrence of a spatial event. I want to analyze the long-term spatial variability of these events.
Can you suggest any spatio-statistical method for such a variability analysis?
Your suggestions will be appreciated.
Thank you
My observations are points along a transect, irregularly spaced.
I aim to find the distance values that maximize the clustering of my observation attribute, in order to use them in the subsequent LISA analysis (Local Moran's I).
I iteratively run the Global Moran's I function in PySAL 2.0, rebuilding a distance-based weights matrix (binary, assigning 1 to neighbors and 0 to non-neighbors) with a search radius 0.5 m longer at every iteration.
At every iteration, I save the z_sim, p_sim, and I statistics, together with the distance at which they were computed.
From this information, what strategy is best for finding distances that potentially reveal underlying spatial processes that (pseudo-)significantly cluster my point data?
PLEASE NOTE:
- Esri style: the ArcMap Incremental Global Moran's I tool identifies peaks of z-values where p is significant as interesting distances
- Literature: I found many papers that simply choose the distance with the highest absolute significant value of I
CONSIDERATIONS
Because the number of observations in the neighborhood changes with the search radius, the weights matrix also changes, so the I values are not comparable.
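The iteration described above can be sketched as follows, using NumPy only rather than PySAL. The transect positions, the attribute, and the function names are all synthetic assumptions; the z-score is standardised against the permutation distribution at each radius, which is one way to put different radii on a comparable footing. The peak-picking rule at the end simply mirrors the Esri-style approach mentioned in the post.

```python
import numpy as np

def binary_band_weights(x, radius):
    """1 if two transect points are within `radius` of each other, else 0."""
    d = np.abs(x[:, None] - x[None, :])
    return ((d > 0) & (d <= radius)).astype(float)

def morans_i(values, W):
    z = values - values.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

def incremental_moran(x, values, radii, n_perm=199, seed=0):
    """Return (radius, I, z_sim, p_sim) for each search radius."""
    rng = np.random.default_rng(seed)
    rows = []
    for r in radii:
        W = binary_band_weights(x, r)
        obs = morans_i(values, W)
        sims = np.array([morans_i(rng.permutation(values), W)
                         for _ in range(n_perm)])
        z_sim = (obs - sims.mean()) / sims.std()     # standardised per radius
        p_sim = (np.sum(sims >= obs) + 1) / (n_perm + 1)
        rows.append((float(r), float(obs), float(z_sim), float(p_sim)))
    return rows

# Synthetic transect: 200 irregular positions over 100 m, patchy attribute.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 100, 200))
values = np.sin(2 * np.pi * x / 20) + rng.normal(0, 0.2, 200)

radii = np.arange(1.0, 6.0, 0.5)                     # 0.5 m increments
results = incremental_moran(x, values, radii)

# Pick the significant radius with the largest z-score, if any.
best = max((row for row in results if row[3] < 0.05),
           key=lambda row: row[2], default=None)
print(best)
```

Plotting z_sim against radius and looking for local peaks, rather than taking only the single maximum, is closer to what the incremental tool reports.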
Specifically, I know the 'spdep' package in R can do it, but I don't know what format the data need to be in before analysis in R, nor do I have any analysis code. I hope someone can help me, thank you!
Hi all.
I have a panel dataset containing 5 independent variables (X) and a single dependent variable (y). The dataset is based on repeated observations on a spatial grid (T = 78, n = 686, N = 53508, where T is the number of months, n is the number of grid cells, and N = T × n is the total number of observations).
I believe that y can be expressed as a function of X, but I don't know if the coefficients of such a model are static or if they too vary with T and n (in theory, the coefficients are likely to vary with n and potentially T, but I don't know if my dataset has enough observations to support either possibility).
To start with I have tried constructing a fixed-effects model using the plm R library, where n is the fixed effect. I get a reasonable R2 and all the variables are statistically significant. As well as this, the Hausman test suggests that the fixed effects model is better than a random effects one.
However, I have run a Breusch-Godfrey and Pesaran CD test, which say that my model suffers from serial correlation and cross-sectional dependence. As this is my first attempt at regression modelling I am not sure how to remedy this. What should I do to make my model more robust, and is there a way to test whether the fit coefficients vary with at least n? Is there a better way to fit a spatio-temporal model in R? Thanks in advance!
Hello everybody !
I'm currently doing a master's thesis on a bird, the Corn bunting, more specifically on the effect of agri-environment schemes (AES) on the distribution of the bird's territories during the breeding season in Belgium.
But now it's time to work on the statistical analysis, and I'm not the strongest in this field.
First, I used sampling quadrats that are separated from each other to avoid spatial autocorrelation, and each quadrat was visited only once. I've created a matrix in which I have, for each quadrat:
- the surface area of 6 categories of fields,
- the number of Corn bunting encounters (and presence/absence),
- the surface area of a number of AES.
For the analysis, I'm treating each quadrat as the territory of the birds inside it (this may bias the results, but it's necessary because of lack of time).
I now have several questions:
- Is a GLM adequate for this analysis or should I use a GLMM?
- If I'm working with the abundance of Corn bunting, is the Poisson distribution the best one to use? Are there any parameters to set?
- If I'm working with presence/absence, is the binomial distribution a good choice? Are there any parameters to set?
- Is it better to work with abundance or presence/absence? I have always heard that we lose information when we change from abundance to presence/absence...
- I'm using R for this statistical analysis; is the « step » command the best way to find the best model (selecting the « both » direction)?
Thank you for your time and your precious help !
I'm currently working on a project (I work in the TV industry) to predict the launch day's reach % from data in the past year
After cutting out all the variables that have Pearson correlations >0.5, I narrowed it down to 2-3 predictor variables.
However, the DW test statistic turns out to be <0.5 in most markets and I suspect a funnel shape in my scatter plot of standardized residual values against standardized predicted value
Although I know that these campaigns were launched on particular dates in the past year, I do not consider them as time series data because the time intervals are not equally spaced (new shows can be launched any time of the year, though there may be seasonality in the trend due to festivals etc.)
How do I resolve this issue of heteroskedasticity and autocorrelation in such a case?
Dear All,
I think the question is clear.
Is it necessary to compute the Global Moran's I statistic before Anselin Local Moran's I when measuring spatial autocorrelation?
I mean, is it true that we should always run Global Moran's I first, and only use the Local Moran's I test if the p-value is in the significant range?
Thank you.
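The link between the two statistics can be seen numerically: with row-standardised weights and no isolated units, the local Moran values average to the global Moran's I. The sketch below uses a made-up chain of ten units with two clusters; the function names and data are assumptions for illustration only.

```python
import numpy as np

def chain_weights(n):
    """Row-standardised weights for n units in a line (illustrative layout)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1
    W[idx + 1, idx] = 1
    return W / W.sum(axis=1, keepdims=True)

def global_moran(values, W):
    z = values - values.mean()
    return float((len(z) / W.sum()) * (z @ W @ z) / (z @ z))

def local_moran(values, W):
    """I_i = z_i * sum_j(w_ij * z_j) / m2, with m2 the mean squared deviation."""
    z = values - values.mean()
    m2 = (z @ z) / len(z)
    return z * (W @ z) / m2

values = np.array([1, 1, 1, 1, 1, 5, 5, 5, 5, 5], dtype=float)
W = chain_weights(len(values))
I_global = global_moran(values, W)
I_local = local_moran(values, W)
print(I_global, I_local)
```

Since the global value is an average, strong local clusters and outliers can offset each other, so a non-significant global result does not by itself rule out significant local patterns.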
I have sulphate (pct) and total sulphur (pct) values for numerous samples. I want to see how they relate, and to map the spatial distribution of changing ratios in the area. I plotted an x,y graph and drew a trendline, which did not pass through (0,0). Should I force it through zero, since when there is no total S there should be no sulphate? The next step is to estimate the spatial distribution. If the line passes through zero, I will work on modelling the ratios (SO4/S).
I recently moved from distance-based techniques to model-based techniques, and I am trying to analyse a dataset I collected during my PhD using the Bayesian method described in Hui 2016 (boral R package). I collected 50 macroinvertebrate samples in a river stretch (approximately 10x10 m, so a very small area) on a two-axis grid (x-axis parallel to the shoreline, y-axis transversal to the river stretch). For each point I have several environmental variables, relative coordinates inside the grid, and the community matrix (site x species) with abundance data. With these data I would like to fit a correlated response model (i.e. including both environmental covariates and latent variables) using the boral R package (this will allow me to quantify the effect of the environmental variables as well as the latent variables for each taxon). According to the boral manual, there are two different ways to implement site correlation in the model: via a random row effect, or by assuming a non-independence correlation structure for the latent variables across sites (in this case the distance matrix for sites has to be added to the model). As specified on page 6, the latter should be used if one believes a priori that the spatial correlation cannot be sufficiently well accounted for by the row effect. However, moving away from an independence correlation structure for the latent variables massively increases computation time for MCMC sampling. So, my questions are: which is the best solution for accounting for spatial correlation? How should the random row effect be interpreted? Can it be seen as a proxy for spatial correlation?
Any suggestion would be really appreciated
Thank you
Gemma
Hello, everyone.
I intend to control spatial autocorrelation in a generalized linear mixed model with binomial error distribution by including the distance-based Moran's eigenvector maps (dbMEM), but I am not sure whether I should include dbMEM as a fixed or random variable in the model. So, what is the best approach? Any help is welcome.
Best wishes,
Rafael.
Hi all,
I'm examining the effect of fire occurrence on visits to National Parks and Forests using a panel dataset. Since my Y variable is a count variable, I intend to do so using a negative binomial fixed effects model. However, my units of analysis are not iid; there is spatial correlation between them. I created a spatial weighting matrix in ArcMap which has 3 columns: my unit ID, neighboring unit ID, and weight. However, I'm having a hard time doing anything with it in Stata. I have two questions:
1) When I do spmat import using "weights.dta", I keep getting errors that say "error in line 1 of file." I don't know why that's happening. Here's what's in line 1:
realpudfid nid weight
248 249 .2736487
2) All I'm trying to do is cluster standard errors properly. Is it even possible to do this with a negative binomial model? Could someone walk me through how?
Thank you so much!
My colleague and I are writing our master's thesis and are struggling with part of our econometric procedure. We want to use a dynamic linear model to explain a macroeconomic phenomenon, and we expect to include lags. According to the AIC, 2 lags are suitable.
To check for autocorrelation in our regression model, we want to run a Breusch-Godfrey test. The test requires us to specify a lag order, and this is where we became unsure. Should we:
1) run the test on a simple lm that excludes the lags we intend to use, or 2) run it on the model that includes the 2 lags we intend to use? Will the lagged model distort the test and give wrong output (as the test requires you to specify the lag order)?
With the lagged model, the test is significant at the 95% level up to 15 lags, which is far more than the AIC suggested. At the same significance level, our basic linear model indicates that 2 lags are suitable.
We have also done a Durbin-Watson test using the lagged model, which indicated no signs of autocorrelation.
We appreciate every answer we can get.
I am trying to fit a Generalized Linear Model with a binomial error distribution to my data, accounting for spatial autocorrelation, in R. I have tried many approaches with no success. I recently discovered the spaMM package and used the glmmPQL function to fit a binomial model, as described in the help for the corMatern argument (link below). However, I obtained the following error: 'Error in getCovariate.corMatern(object, data = data): cannot have zero distances in "corMatern"'. Do I need to exclude the co-located data (with zero distance), or is there another way to account for spatial autocorrelation in a binomial GLM? Any advice is welcome.
The external predictive adaptive response (external PAR: Nettle et al. 2013. Proc Biol Sci 280: 20131343; Gluckman et al. 2005. Trends Ecol Evol 20: 527-533) assumes that current environmental conditions can predict future environmental conditions. In other words, the environmental factors are autocorrelated. How can this kind of autocorrelation be handled in statistical procedures designed to test the external PAR, given that one of the basic assumptions of statistical models is independence between observations?
Open to any suggestions.
There are some methods and software prepared for spatial autocorrelation, that occur in 2-D space (with geo-coordinates). For example, I used Spatial Analysis in Macroecology (SAM). However, it seems, that the procedures implemented in SAM, like Moran I index, spatial correlation, spatial autoregression are designated for 2-D or 3-D space. However, auto-correlation within a linear habitat seems to be different, specific, type of auto-correlation. So, is there a specific method for the analysis of the spatial auto-correlation in such linear habitats or could one use typical methods, just assuming that one of two arbitrary geo-coordinate units is constant?
I am revisiting some data I collected eons ago with a Hewlett Packard 3721A correlator (now obsolete). I used the cross-correlation facility to determine the transit time between two sensors from which I calculated the mean velocity of the fluid. I also measured the corresponding auto-correlation function values but never used them. I vaguely recollect being told at the time that signal coherence could be calculated from the auto-correlation peak values. Now, many years later, I would like to do this but am not sure how.
One example of the data: the auto-correlation peaks for Sensors A and B are 2.6 and 2.4 respectively, and the cross-correlation peak (C) is 1.8. The units, I believe, are millivolts. Can signal coherence be calculated from these data? If so, what equation should be used?
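A minimal sketch of one common normalisation, under two assumptions that are not confirmed by the question: the auto-correlation peaks are the zero-lag values (the mean-square signal levels), and all three readings share the same units. In that case the peak normalised cross-correlation coefficient is C / sqrt(A * B). This is a time-domain coefficient, not the frequency-dependent magnitude-squared coherence, and the function name is purely illustrative.

```python
import math

def normalized_xcorr_peak(auto_a, auto_b, cross_peak):
    """Cross-correlation peak scaled by the geometric mean of the
    auto-correlation peaks (assumed here to be zero-lag values)."""
    return cross_peak / math.sqrt(auto_a * auto_b)

# Values quoted in the question: A = 2.6, B = 2.4, C = 1.8
rho = normalized_xcorr_peak(2.6, 2.4, 1.8)
print(round(rho, 3))  # 0.721
```

A value near 1 would indicate that the signal seen at Sensor B is largely a delayed copy of the signal at Sensor A; lower values indicate that the pattern decorrelates between the sensors.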
Hi, I am having trouble selecting the best model between spatial panel (SAR and SEM) and nonspatial panel models. The LM lag and LM error tests are not significant, but rho/lambda is significant. Does this mean a spatial model is not necessary? Most of the literature refers to the LM tests, not the rho/lambda coefficient.
Thanks.
Hello everybody,
I am currently trying to do a Gaussian linear regression in R with data that may be spatially autocorrelated. My dataset contains geographic coordinates (longitude and latitude, stored in separate columns), species, dependent variables (BS and LTS) and some explanatory variables.
I extracted positive eigenvector-based spatial filters from a truncated matrix of geographic distances among sampling sites. I would like to treat the spatial filters as candidate explanatory variables in my linear regression model. I did this as follows:
First of all, I created a neighbour list object (nb). Since my sampling is irregular, I used the function knearneigh of the R package spdep:
knea8 <- knearneigh(coordinates(dataset), k = 8, longlat = TRUE)
neib8 <-knn2nb(knea8)
Then, I created a spatial weighting matrix with the function nb2listw of the R package spdep:
nb2listw(neib8)
distgab8 <- nbdists(neib8, coordinates(dataset))
str(distgab8)
fdist8 <- lapply(distgab8, function(x) 1 - x / max(dist(coordinates(dataset))))
listwgab8 <- nb2listw(neib8, glist = fdist8, style = "B")
Then, I built spatial predictors to incorporate into the Gaussian linear regression. I did this with the mem function of the R package adespatial, as follows:
mem.gab8 <- mem(listwgab8)
Additionally, Moran's I was computed and tested for each eigenvector with the moran.randtest function, as follows:
moranI8 <-moran.randtest(mem.gab8, listwgab8, 99)
I obtained some eigenvectors with significant positive spatial autocorrelation. Now, I would like to include them in the Gaussian linear regression. I tried to do this with the function ME of spdep, as follows:
GLM1 <- ME(BS~LATITUDE, data=dataset, listw=listwgab8, family=gaussian, nsim=99, alpha=0.05)
Unfortunately, I receive this error:
Error in sW %*% var : Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 90
How do I solve this error? Or is there another way to perform spatial eigenvector selection in a Gaussian linear regression?
Hello everybody,
I am trying to convert some spatial points (given a value of latitude and a value of longitude) to a neighbour list in R. I need this in order to perform a linear regression with a spatial autocorrelation.
I have a list of individual animal species occurrences worldwide, each one with a single value of latitude and longitude. I would need to create an object class nb (neighbour list), but I do not know how to do this conversion in R.
My data looks like:
SPECIES: LATITUDE, LONGITUDE
species A: -85, 134
species B: 34 , 2
species B: 42, 3
species B: 45, 5
species C: -2, 80
species C: -5, 79
(...)
The dataset also contains other columns with the values of certain variables, but I think this is not important for my purpose.
Any help will be appreciated. Thank you in advance
EDIT:
I am using the package "spdep". First of all, I converted my data frame to a spatial object:
coordinates(dataset) <- ~ LONGITUDE + LATITUDE
Then, I tried to create a graph-based neighbours list computed from polygons (via a Delaunay triangulation). However, when I try the following:
delau <- rgeos::gDelaunayTriangulation(dataset)
neib <- poly2nb(delau)
I receive an error that I cannot figure out: Error in rgeos::gDelaunayTriangulation(dataset) : duplicate points not permitted
Would anybody know how to solve this?
Conceptually, what is the difference between autocorrelation and partial autocorrelation?
When deciding the lag length for AR or ARMA models, which one should be considered?
Thanks in advance.
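The distinction can be illustrated numerically. The sketch below simulates a hypothetical AR(1) series (coefficient 0.7, chosen only for illustration), computes the sample autocorrelation function, and derives the partial autocorrelations from it with the Durbin-Levinson recursion. The autocorrelation at lag k includes the indirect dependence passed on through lags 1..k-1, whereas the partial autocorrelation at lag k is what remains after those intermediate lags are accounted for.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(r):
    """Partial autocorrelations from ACF values via Durbin-Levinson."""
    out = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, len(r)):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Hypothetical AR(1) process: x_t = 0.7 * x_{t-1} + noise
rng = random.Random(1)
x = [0.0]
for _ in range(3000):
    x.append(0.7 * x[-1] + rng.gauss(0.0, 1.0))

r = acf(x, 5)
p = pacf(r)
print("ACF :", [round(v, 2) for v in r])
print("PACF:", [round(v, 2) for v in p])
```

For an AR(1) series the ACF decays gradually while the PACF is close to zero beyond lag 1, which is why the PACF cut-off is the usual guide to the AR order and the ACF cut-off the usual guide to the MA order.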
Hi everyone. I have applied multiple logistic regression to create a model based on my independent parameters (x, y & w). The fitted model is Z = ax + by + cw - d, where Z is the logit (log-odds) of the probability P of occurrence of my dependent parameter, i.e. P = exp(Z)/(exp(Z)+1), and all of the parameters are binary.
To interpret the output, I calculated the probability of occurrence of my dependent variable for every possible combination of the variables, as follows:
1: x=0, y=0, w=0 ------> P=0.74%
2: x=0, y=1, w=0 ------> P=2.3%
3: x=1, y=0, w=0 ------> P=1.35%
4: x=1, y=1, w=0 ------> P=4.14%
5: x=0, y=0, w=1-------> P=1.65%
.
.
8: x=1, y=1, w=1------> P=8.83%
Since the signs of all coefficients (a, b & c) are positive, the highest probability occurs when x, y and w are all 1. But even in this case the probability reaches only 8.8%. Is this result reasonable?
And how can I interpret the magnitude of each independent parameter? Since all the variables are binary and have positive coefficients, can I say that a variable with a bigger coefficient has a bigger impact on the probability derived from Z?
Thank you all in advance for your kind replies.
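A minimal sketch, assuming the model has main effects only (no interactions) and using the rounded probabilities quoted above: the coefficients can be recovered from the reported probabilities on the logit scale, and their exponentials are the odds ratios, which is the usual way to compare the magnitude of binary predictors. Variable names are illustrative.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities reported in the question, as fractions.
p000, p010, p100, p001 = 0.0074, 0.023, 0.0135, 0.0165

intercept = logit(p000)        # equals -d in Z = ax + by + cw - d
a = logit(p100) - intercept    # coefficient of x
b = logit(p010) - intercept    # coefficient of y
c = logit(p001) - intercept    # coefficient of w

# Check against the other reported rows (4.14% and 8.83%).
p110 = expit(intercept + a + b)
p111 = expit(intercept + a + b + c)

odds_ratios = {"x": math.exp(a), "y": math.exp(b), "w": math.exp(c)}

print(f"a={a:.3f} b={b:.3f} c={c:.3f}")
print(f"P(1,1,0)={p110:.4f} P(1,1,1)={p111:.4f}")
print({k: round(v, 2) for k, v in odds_ratios.items()})
```

The recovered values reproduce the reported rows to within rounding, so a maximum near 9% is internally consistent: the intercept is strongly negative (a rare event), and the three positive coefficients multiply the odds but cannot lift the probability far from that baseline.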
Can UTM data for casualties and obstacles (or in general other structural elements of roads such as bends), be used as distances in spatial autocorrelation analysis?
Are there any differences between spatial autocorrelation and spatial non-stationarity? If so, could you explain the differences and novel methods to address them?
Dear researchers,
Is it necessary for the data (640 entries and 26 variables) to follow normal distribution for me to use Spatial Lag Model or Spatial Error Model in GeoDa.
Thanks in advance.
Durbin-Watson Statistics Table has three types of critical values for significance at 1%, 2.5% and 5% level. So how to choose which one to use when evaluating Durbin-Watson statistics (e.g. d=1.12)?
Thank you!
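For context on what the statistic measures, here is a short sketch of the Durbin-Watson computation and of the approximate lag-1 residual autocorrelation implied by a given d, using the standard approximation d ≈ 2(1 - rho). The residuals in the first call are a toy example, not data from the question; which table column to use remains a matter of the significance level chosen for the study.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def implied_rho(d):
    """Approximate lag-1 autocorrelation from d ~ 2 * (1 - rho)."""
    return 1.0 - d / 2.0

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
print(round(implied_rho(1.12), 2))            # 0.44
```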
Elith and Leathwick (2009) recommended Moran's I for testing for spatial patterns in raw data and residuals. I have read much of the literature on this and looked at many R packages, but could not perform the test. Can anyone help me with a detailed method for preparing data for a Moran's I test on a raw dataset and on model residuals?
In particular, my questions are:
-How to deal with abundances recorded during multiple visits (2 or more) to each sampling unit? I see that a common practice is to consider the maximum over the visits as the abundance in the sampling unit. I wonder whether is it possible to account for species detectability directly in the RDA (as in unmarked for univariate models).
-Is it possible in RDA to account for spatial non-independence of sampling units?
Finally:
-Is it better to consider occurrence (presence-absence) or abundance in RDA analysis? Which give more robust and reliable results?
I have some aggregated data for 20 geographical places and would like to understand the local Moran results based on the data. I know that clustering corresponds to low p-values, high z-scores and a positive Moran index; however, I am not sure why the Cluster field gives the number 2 for each clustered place. And how can I explain the results to non-spatial statisticians?
Place  Ii        Ei        Zi        p.value   Xi        wXj       Cluster
1      0.527153  -0.05263  1.521057  0.128245  1.099403  0.818118  0
2      1.169842  -0.05263  3.207143  0.001341  -1.69534  -1.63654  2
3      -0.14793  -0.05263  -0.25003  0.802567  0.24454   -1.45796  0
4      1.169842  -0.05263  3.207143  0.001341  -1.69534  -1.63654  2
5      1.169842  -0.05263  3.207143  0.001341  -1.69534  -1.63654  2
6      1.169842  -0.05263  3.207143  0.001341  -1.69534  -1.63654  2
Many thanks,
Eiman
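One way to read such a table is by Moran scatterplot quadrant. The sketch below assumes that Xi is the standardised value at each place and wXj its spatial lag (the weighted average of its neighbours), and uses a 0.05 significance threshold; the numeric cluster codes themselves are software-specific, so the mapping of "2" to a particular quadrant is not asserted here.

```python
# First three rows of the table: (place, Ii, p_value, Xi, wXj)
rows = [
    (1, 0.527153, 0.128245, 1.099403, 0.818118),
    (2, 1.169842, 0.001341, -1.69534, -1.63654),
    (3, -0.14793, 0.802567, 0.24454, -1.45796),
]

def lisa_quadrant(xi, wxj, p_value, alpha=0.05):
    """Label a place by the sign of its own value and of its neighbours' lag."""
    if p_value >= alpha:
        return "not significant"
    own = "High" if xi > 0 else "Low"
    neighbours = "High" if wxj > 0 else "Low"
    return f"{own}-{neighbours}"

labels = {place: lisa_quadrant(xi, wxj, p) for place, _, p, xi, wxj in rows}
print(labels)
```

Under these assumptions, place 2 (and the identical rows 4 to 6) is a below-average place surrounded by below-average neighbours, which in plain language is a "cold spot"; places 1 and 3 show no significant local pattern.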
As a geography student, I know that spatial autocorrelation exists everywhere, and I know that spatial autocorrelation of spatial variables violates an assumption of classical statistics, namely independence of samples, so we have to consider it when analysing the influence of spatial phenomena or doing spatial modelling. However, I still do not know how to use it in practice. For example, if we know the Moran's I index of housing prices in New York is 0.8, then what? What can we do with it?
I am looking into Spatial Neighbors to address autocorrelation in my dataset, but I find it difficult to find arguments as to which method to prefer. I am using the R package "spdep" and functions dnearneigh and knearneigh to determine the distance-based neighbors and k-nearest neighbors, respectively. However, could someone advise me on the main differences between the two methods, as well as on how to determine d2 (upper distance bound) and k (number of nearest neighbors).
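The structural difference between the two definitions can be seen on a toy example. The sketch below is plain Python, not spdep; the function names and the four points are hypothetical. A distance band gives each point a variable number of neighbours (possibly none) but is symmetric, whereas k-nearest neighbours gives every point exactly k neighbours but the relation can be asymmetric.

```python
import math

def distance_band(points, d_max):
    """Neighbours within d_max of each point (excluding the point itself)."""
    return {
        i: [j for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= d_max]
        for i, p in enumerate(points)
    }

def k_nearest(points, k):
    """The k closest other points for each point."""
    out = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        out[i] = sorted(others[:k])
    return out

points = [(0, 0), (1, 0), (2, 0), (10, 0)]  # last point is isolated
band = distance_band(points, d_max=1.5)
knn = k_nearest(points, k=1)
print(band)  # {0: [1], 1: [0, 2], 2: [1], 3: []}
print(knn)   # {0: [1], 1: [0], 2: [1], 3: [2]}
```

With the distance band the isolated point has no neighbours at all, which many spatial tests handle poorly; with k = 1 it gains point 2 as a neighbour, but point 2 does not list it in return.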
What techniques can be applied for crime mapping, analysis and prediction in the Indian context?
Please provide references.
What are the spatial issues related to housing submarket analysis?
Hi,
I'm currently trying to use Maxent for roadkill analysis, but one of my colleagues pointed out that using spatially correlated datasets in maximum entropy models leads to biased results, and that the dataset should be tested and correlated occurrences removed.
I have no clue how to test a roadkill point dataset for spatial autocorrelation in order to get rid of correlated points. What is the easiest and most common way to determine which points need to be removed to avoid the bias?
If you know of any information, articles or websites for me to look over, please let me know.
Your direct advice on solving this issue is greatly appreciated.
Best,
Bryan
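One common way to reduce clustering in occurrence records is spatial thinning: keep a record only if it lies at least a minimum distance from every record already kept. The sketch below is a simple greedy version; the coordinates and the 1 km threshold are hypothetical, and in practice the threshold is often chosen from a correlogram of model residuals rather than fixed in advance.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def thin(points, min_km):
    """Greedy thinning: keep a point only if it is at least min_km
    from every point kept so far (result depends on input order)."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept

# Hypothetical records (lat, lon): two tight pairs about 11 km apart
records = [(45.000, 7.000), (45.001, 7.001), (45.100, 7.000), (45.1005, 7.0005)]
thinned = thin(records, min_km=1.0)
print(thinned)
```

Because the result depends on the order of the records, it is usual to shuffle the input and repeat the thinning several times, keeping the run that retains the most points.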
Suppose you have a set of point data which contains two variables (say var1 and var2), both of which have discrete values (say 1, 2, and 3). You try to map them (see attached files var1.jpeg and var2.jpeg for reference) and decide you want to see how similar these two data are in terms of statistics. You create a "matrix" comparing the values of the two variables (see sim.jpg) and obtain a %similarity by taking the percentage of the total count of all points which had the same value for both variables (e.g. var1 and var2 = 1, etc.) in relation to the total number of points.
Is that the only way to compare the two variables? What other methods can be performed to compare these two?
P.S.: Scatter plots are ineffective, since they only result in a set of 9 visible points (because the data is discrete, the only possible values for each axis is 1, 2 and 3, resulting in only 9 possible combinations of ordered pairs) (see scatter.jpg). Q-Q plots only show 5 points (see qq.jpg).
Thank you very much!
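The percentage similarity described above can be written in a few lines, and the same cross-tabulation also yields Cohen's kappa, which corrects the raw agreement for the agreement expected by chance. Kappa is offered here as one additional option rather than the only alternative, and the six data points are hypothetical.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of points where both variables take the same value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for chance, from the marginal frequencies."""
    n = len(a)
    po = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (po - pe) / (1.0 - pe)

var1 = [1, 1, 2, 2, 3, 3]  # hypothetical values at six points
var2 = [1, 2, 2, 2, 3, 1]
print(round(percent_agreement(var1, var2), 3))  # 0.667
print(round(cohens_kappa(var1, var2), 3))       # 0.5
```

Neither measure uses the point locations, so both describe attribute agreement only, not whether the disagreements are spatially clustered.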


I have a question about the best way to test for violations of Hardy-Weinberg Equilibrium (HWE) among microsatellite loci for a species that is continuously distributed across a study area and showing IBD. We are looking at how landscape features affect gene flow among Eastern Indigo Snakes across a 25 x 50 km study area. We have 110 samples and about half are clustered in the southern half of the study area. A spatial correlogram of individual genetic distance shows that spatial autocorrelation among samples becomes non-significant at 5-10 km. We have used COLONY to identify full-sibs and found about 15 full-sib families although family size was usually two (max. four). There is significant IBD within our study area. STRUCTURE identifies K=4 with all 110 samples but when we randomly exclude all but one full-sib from each full-sib family STRUCTURE identifies K=1-2. We suspect that these STRUCTURE results are the result of neighborhood effects and IBD, respectively.
When we test for violation of HWE at our 15 loci, four have significant violations of HWE. Estimated null allele frequencies at these four loci are 6-15%. When we randomly excluded all but one member from each full-sib family, these four loci were still significantly out of HWE.
In a situation such as this, is it appropriate to test for HWE using all samples? I know that in systems with discrete populations researchers often test for HWE within each population, since violations may represent a mixture of multiple populations. But any designations of “populations” in our study area seem very arbitrary (e.g., driven by sampling intensity rather than the distribution of individuals).
Does anyone have any suggestions about the appropriate way(s) to test for HWE in a system such as ours?
Thanks,
Javan Bauder
I've been using SAS and GeoDa/ArcGIS
Dear all,
I have monthly data for 6 regions over 2 years, with their coordinates (attached). I would like to run a spatial autocorrelation analysis on my data in R, especially Moran's I. Which methods/packages are suitable for my data?
I want to know whether the rice prices among the 6 regions are spatially autocorrelated. Can anyone help me, please?
Best wishes and thanks in advance...
Yoga
Dear all,
I am doing a TS analysis with four variables from 1980-2015. The post-estimation test statistics and p-values obtained are listed below:
1) Durbin-Watson (for autocorrelation): 2.1876; the following are p-values:
2) Breusch-Pagan (for heteroscedasticity): 0.1815
3) Breusch-Godfrey (for higher-order autocorrelation): 0.4595
4) ARCHLM: 0.9151
5) Ramsey RESET (for omitted variables): 0.5355
but the p-value for Jarque-Bera (normality test) is 0.0238, indicating that the errors are not normally distributed.
Should I be worried?
Thank you.
Are there other reliable spatial autocorrelation tests aside from using ENM tools?
I have a shapefile of about 100,000 points. I need to draw a sample from these points with the least possible spatial autocorrelation. I divided the point area into square grids at several spacings (100 by 100, etc.) and populated an attribute with the number of points inside each grid cell, intending to take one point from each cell as a representative sample for logistic regression input. I checked whether the autocorrelation among the grid centres is negative or at least zero by drawing a Moran's I correlogram. But this process fails: the correlogram reaches 0 only at a very large distance, and at that grid spacing I would have only 5 cells, hence 5 points, with which logistic regression is impossible. Could anyone help me with the best method for finding the optimum grid dimension/spatial scale?
More background can be found at these links.
I see the importance of calculating spatial autocorrelation for species occurrence data, in order to guarantee spatial independence. However, I do not understand how this is done. Could any of the colleagues working on the topic help me?
Thank you in advance.
I am working with fish populations that live in lakes and ponds. I have GPS data (lat/long) only for each lake, not for each individual.
I am trying to use the autocorrelation analysis of Smouse & Peakall 1999 as implemented in the R package PopGenReport. For this I set the location of each individual to that of its population (in effect, piling up all individuals at one point).
I am getting a weird result: the correlogram has an inverted "U" shape (negative autocorrelation at small and large distances but positive at medium distances).
My guess is that because all individuals in one population are assigned the same position, the autocorrelation fails. So I added some noise to the lat/long data (random uniform, -1 m to +1 m). With the "noised" data I get the expected correlogram.
My question is: is my approach correct? Is there any other method to handle this problem?
Smouse PE, Peakall R. Spatial autocorrelation analysis of multi-allele and multi-locus genetic microstructure. Heredity 82: 561-573
Hi,
I'm attempting to generate artificial landscapes with varying levels of spatial autocorrelation and I've read that unconditional Gaussian simulation can do this. I don't have much experience with spatial modeling, but I understand that the beta parameter sets the mean around which the landscape values are centered, with a variance defined by the sill parameter. The level of autocorrelation is then controlled by a range parameter, which determines the distance beyond which there is no correlation. Are there any equations associated with this method, or is it as simple as what is stated above? In general, I am trying to gain a better understanding of this method and would appreciate any help.
Thanks
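The description above corresponds to drawing Z = beta + L * eps, where eps is a vector of independent standard normal values and L is the Cholesky factor of the covariance matrix C (so that C = L * L^T). The sketch below assumes an exponential covariance model, C(h) = sill * exp(-h / range), on a small hypothetical 6 x 6 grid; it is a direct illustration of the idea, not the algorithm of any particular package. For the exponential model the correlation never reaches exactly zero: it falls to about 5% at roughly three times the range parameter.

```python
import math
import random

def exp_cov(h, sill, range_par):
    """Exponential covariance at separation distance h."""
    return sill * math.exp(-h / range_par)

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def simulate(coords, beta, sill, range_par, rng):
    """One unconditional realisation: mean beta plus correlated noise."""
    cov = [[exp_cov(math.dist(p, q), sill, range_par) for q in coords]
           for p in coords]
    L = cholesky(cov)
    eps = [rng.gauss(0.0, 1.0) for _ in coords]
    return [beta + sum(L[i][k] * eps[k] for k in range(i + 1))
            for i in range(len(coords))]

coords = [(x, y) for y in range(6) for x in range(6)]
field = simulate(coords, beta=10.0, sill=2.0, range_par=3.0,
                 rng=random.Random(42))
print([round(v, 2) for v in field[:6]])  # first grid row
```

Increasing range_par produces smoother, more strongly autocorrelated landscapes, while sill controls how far values spread around beta. The dense Cholesky approach is only practical for small grids; larger landscapes usually rely on sequential or spectral simulation methods.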
I need to statistically compare two maps in order to determine if the spatial distribution of their data is correlated or not. any suggestions? Thanks!
Hello,
I would like to ask about any method for testing spatial dependence on categorical data (e.g. vegetation types polygons) and tools for modeling it against environmental data.
Is Multinomial Regression suitable for this task?
Thanks!
I'm working up to fitting a Tweedie GLMM in R (mgcv). At the moment I'm playing in SPSS to sort out the data and some ideas for random effects. The data are from a survey of coastal dolphins on a sample of sites (Site) around the coast, each measured on a set of transects. The data are the number of groups (NGroups), the group sizes (GSize), and number of individuals (NIndividuals). NIndividuals is the sum of a Poisson number (Ngroups) of Gamma (or Negative Binomial) distributions (Group sizes). Given transect length and the search width, it is possible to calculate the density of sightings for each transect (SightDens). So far I've fitted a Poisson regression to NGroups with a random effect for Site using SPSS Genlinmixed. Now I'd like to see whether there is any spatial autocorrelation among sites. As the sites are strung out around the coast, this can be treated as a purely linear problem based on site order (SiteOrder). I'd like to try to fit a simple AR1 structure to the Site covariance matrix based on SiteOrder. Can anyone help with this?
I am using this method to interpret the contribution of each predictor variable to disease occurrence. The data contain 3 outliers, so I cannot use multiple linear regression. What are the assumptions of the probit and logit models?
Using GeoDa software, I have run a spatial error model and obtained coefficients for each of my independent variables, in addition to constant and lambda variables. I am now wondering how to write my spatial error equation. Any help would be appreciated.
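For reference, the spatial error model is usually written in the following standard form, where W is the spatial weights matrix used in the estimation, lambda is the spatial error coefficient and epsilon is an i.i.d. error term. Substituting the estimated constant, coefficients and lambda into the first two lines gives the fitted equation; the symbols are the conventional ones and are not taken from the GeoDa output itself.

```latex
\begin{aligned}
y &= X\beta + u, \\
u &= \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I), \\
\text{equivalently}\quad
y &= \lambda W y + X\beta - \lambda W X \beta + \varepsilon .
\end{aligned}
```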
My purpose is to map hotspots using LISA tools (or in general local statistics). By developing GIS application using available spatial statistic libraries (i.e. PySAL - ESDA), I have still not found documented methods (or suitable literature) to include in the computation more than two variables.
Have you any suggestions on how to map simultaneously hotspot or significant spatial clusters of more than two variables?
One of the papers I have used as a reference is Anselin et al. (2002).
How do I relate Housing Submarket with Environmental Characteristics?
in Real Estate/ Housing fields
I have occurrence data (lat/long) and environmental layers (i.e., raster layers). I am doing SDM using the Maxent algorithm. How can I deal with the issue of spatial autocorrelation, and perhaps remove the points that are spatially autocorrelated?
What is the relationship between spatial dependence and housing submarkets? How can spatial autocorrelation explain housing submarkets?
Hi everyone, I'm going to use one of these two indicators to investigate the spatial pattern of my ward-wise disease data. Which one is more appropriate, and why? I need some reliable references to help me apply them in my study.
I have some a priori knowledge about the autocorrelation function r(h): the closer r is to 1 or -1, the stronger the spatial correlation, and as the lag distance increases, r(h) diminishes. I would like more details from experts in spatial analysis.
Kind regards Louadj yacine
I have 25 data points which represent woody plant richness. Each one summarizes the value of this variable for the same number of 2 m x 2 m quadrats. The field design was 50 m x 2 m vegetation transects.
Is it meaningful to explore spatial autocorrelation with Moran's I and variograms?
What method can I use to evaluate the concentration of spatial data, considering the weight, position and distance between data points?
I am trying to determine the serial autocorrelation in my precipitation data. I have data for 30 years (1986-2015) and I am using the "zyp" package in R and the Zhang method described in: Zhang, X., Vincent, L.A., Hogg, W.D. and Niitsoo, A., 2000. Temperature and Precipitation Trends in Canada during the 20th Century. Atmosphere-Ocean 38(3): 395-429.
My code is simple:
setwd("C:/Users/sch298/Documents/Kentucky River weather data/Daily homogenization")
y=read.csv("P-GHCNDUSC00150624.csv",as.is=TRUE)[, 2]
zyp.trend.vector(y, x = seq_along(y), method = "zhang", conf.intervals = TRUE, preserve.range.for.sig.test = TRUE)
It ran fine but all the trend estimates are zero including the lower and upper bounds. I am sure something is wrong. I have attached my data file and output screenshot. Can anyone please help me to understand what might be wrong?
Thanks
Som
Dear all, I would like to run a spatial autocorrelation analysis on my data in R (or other software such as Minitab, Past or Python). My data comprise 100 plots of 1 m2, each treatment plot paired with a control plot 1 m away. In all plots I measured plant cover, and I want to measure species co-occurrence in each plot. All plots are georeferenced with latitude and longitude in degrees, minutes and seconds. I want to know whether there is autocorrelation in my sampling. Can someone help me?
Best wishes,
Jhonny
Dear colleagues,
I have a dataset consisting of continuous, categorical, and binomial (presence/absence) variables, and I need to test for spatial autocorrelation in each one of them. I have done it already for the continuous variables using the standard Moran's I (Moran.I function of the ape R-package), but I understand I cannot do this for my other variables. What other R alternatives do I have to test for spatial autocorrelation in categorical and binomial (presence/absence) variables?
Thanks in advance!
I am trying to test my preliminary MaxEnt model residuals for spatial autocorrelation in order to decide whether I need to rarefy my presence points. Since the data came from multiple sources (mostly literature and museum records), these records were clearly gathered with different sampling effort and are somewhat clustered in space. On the other hand, the species is rarely collected throughout its range (I have only about 300 points) and I would like to retain as many records as I can for modeling. The method I use follows Nunez & Medley, 2011, who propose calculating Moran's I at multiple distance classes with SAM software. However, I am having trouble interpreting the results. Attached you can find the correlogram I am getting in SAM. From what I can see, the first and highest Moran's I value (0.475, p=0.005) is at a distance of 64.007, and the rest have much lower values, most of them negative. Does that mean that my presence points can be filtered at a minimum distance of 65 km, or are they OK and showing only weak SAC? I read somewhere that only significant values beyond +-0.5 or +-0.7 indicate a serious spatial pattern. Am I getting it right?
Another related question: why does the correlogram show distance in units instead of km? I am choosing the Geodesic coordinate system when loading my data, but the program keeps showing the distance in units...
I am an absolute newbie to SDM and spatial statistics, so any help/tips will be greatly appreciated!
Kind regards,
Serge

I am wondering if anyone here has ever dealt with spatial autocorrelation using Logistic Regression in GIS.
In the literature I have read so far, sometimes the issue is not even addressed. In other instances, the authors used the geographic coordinates as covariates. For example, quoting from Hu, Z., & Lo, C. P. (2007). Modeling urban growth in Atlanta using logistic regression. Computers, Environment and Urban Systems, 31, 667–688: "The second step was including spatial coordinates of data points into the list of independent variables. Spatial autocorrelation can be alleviated to some extent by attempting to introduce location into the link function to remove any such effects present (Bailey & Gatrell, 1995). For example, spatial coordinates of observations might be introduced as additional covariates, or to classify regions in terms of their broad location and treat this classification as an extra categorical explanatory factor in the model."
To the best of my understanding, the latter approach is termed "autocovariate" modeling by: Dormann, C. F., McPherson, J. M., Araújo, M. B., Bivand, R., Bolliger, J., Carl, G., … Wilson, R. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: A review. Ecography, 30(5), 609–628. http://doi.org/10.1111/j.2007.0906-7590.05171.x
I would like to know your opinion on the issue, and what approach you happened to use.
I need to test for spatial autocorrelation in my data. In R, I used the Moran.I function from the ape package and the moran.test function from the spdep package, but got strikingly different results. I think this may be due to the weights employed to calculate Moran's I: in ape the weights are given by your distance matrix, whereas in spdep you must specify spatial weights from a neighbours (nb) object, choosing a given style (binary, standardised, etc.). Could anyone please clarify this and perhaps suggest which package is best?
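The effect of the weights can be reproduced without either package. The sketch below computes Moran's I on the same hypothetical six-point transect twice: once with inverse-distance weights over all pairs (similar in spirit to passing an inverse distance matrix) and once with binary weights for adjacent points only (similar in spirit to a neighbour list with binary style). The data and function names are illustrative.

```python
import math

def morans_i(values, w):
    """Global Moran's I for a full weight matrix w (zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * (num / den)

def inverse_distance_weights(coords):
    """Every pair is a neighbour, weighted by 1 / distance."""
    return [[0.0 if i == j else 1.0 / math.dist(p, q)
             for j, q in enumerate(coords)]
            for i, p in enumerate(coords)]

def band_weights(coords, d_max):
    """Binary weights: 1 for pairs within d_max, 0 otherwise."""
    return [[1.0 if i != j and math.dist(p, q) <= d_max else 0.0
             for j, q in enumerate(coords)]
            for i, p in enumerate(coords)]

coords = [(float(i), 0.0) for i in range(6)]  # hypothetical transect
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # smooth gradient

i_inv = morans_i(values, inverse_distance_weights(coords))
i_band = morans_i(values, band_weights(coords, d_max=1.0))
print(round(i_inv, 3), round(i_band, 3))  # 0.179 0.6
```

The same data give I = 0.6 with adjacency weights but only about 0.18 with inverse-distance weights, because the latter also include distant, dissimilar pairs. Neither result is wrong; the statistic is only meaningful relative to the neighbourhood definition, so that definition should be chosen on substantive grounds and reported.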
DEAR MEMBERS
The Durbin-Watson statistic is not valid as an indicator of autocorrelation when there is a lagged dependent variable on the right-hand side of the equation.
Is it true that the DW statistic is not valid for panel data in any case?
What about R-squared in panel data: what value indicates a good model? If R-squared is 0.009, 0.34 or 0.45, is that not a good fit? Does it need to be at least 0.50?
Dear all,
I'm working on a field experiment with 20 traps (pitfall traps to collect ground arthropods) in a treatment field and 20 traps in a reference field (so the samples are potentially spatially autocorrelated).
I produced an nMDS (non-metric multidimensional scaling) plot to assess the multivariate ordination of those samples, and I also plotted 95% confidence ellipses to visualize the discrimination between the treatment and the reference field. I would now like a statistical measure of this discrimination, so my idea was to perform a perMANOVA (adonis function in R) to test dissimilarity between fields. So my question is:
-Can I use perMANOVA with such an experimental design? If not, is there a way to deal with such autocorrelation? Any suggestions on alternatives?
Thanks a lot
Alessandro