
Geostatistical Analysis - Science topic

Explore the latest questions and answers in Geostatistical Analysis, and find Geostatistical Analysis experts.
Questions related to Geostatistical Analysis
  • asked a question related to Geostatistical Analysis
Question
1 answer
When we conduct geostatistical analysis for water quality, how do we pick the distance between two points in canals and lakes? What is the grid scheme? Please give examples.
Relevant answer
Answer
Hi Shaimaa,
When conducting geostatistical analysis for water quality, the selection of the distance between points in canals and lakes depends on your specific research goals, but there are three general approaches you can follow:
  1. Random Sampling: You can collect points randomly across the study area, ensuring that the points are spatially distributed and well spread out. This helps in capturing the overall variability of the area.
  2. Transect-Based Sampling: You can collect points along transects, ensuring that the distance between transects remains consistent. This method is useful when you're studying patterns along a specific direction, such as across a river or along a canal.
  3. Grid-Based Sampling: In this approach, the study area is divided into a grid, and samples are collected at each grid intersection or within each grid cell. This ensures that the entire study area is uniformly covered.
Regarding the number of points, a common rule of thumb is to collect at least 30 points for statistical robustness. However, the exact number can vary based on the size and variability of the area being studied.
As for the distance between points, it depends on the nature of the study. For instance, if you're comparing your water quality points with satellite imagery, the distance between points should be large enough to avoid multiple points falling within the same pixel.
In another example, if you're collecting depth data, the distance between points should be adjusted according to the seabed's topography. If the seabed is steep or irregular, shorter distances between points may be necessary to capture changes accurately. In contrast, if the seabed is relatively flat, larger distances might be appropriate.
I hope this explanation helps!
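For illustration, here is a minimal sketch of how the three designs could be generated in R with the sf package; the shapefile names, the 30-point target and the 200 m transect spacing are hypothetical placeholders, not recommendations:

library(sf)
lake <- st_read("lake_boundary.shp")                       # hypothetical polygon of the lake or canal
pts_random <- st_sample(lake, size = 30, type = "random")  # 1. random design
pts_grid   <- st_sample(lake, size = 30, type = "regular") # 3. grid design (regular spacing)
# 2. transect design: one point every 200 m (CRS units) along a digitised centre line
line <- st_read("canal_centreline.shp")                    # hypothetical centre line (projected CRS)
pts_transect <- st_line_sample(st_geometry(line), density = 1/200)

Plotting the resulting points over the water body is a quick check that the chosen spacing actually captures the variability you care about.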
  • asked a question related to Geostatistical Analysis
Question
4 answers
I would like to use the R package "gstat" for predicting and mapping the distribution of water quality parameters using two methods (kriging and co-kriging).
I need a guide to code or resources for doing this.
Azzeddine
Relevant answer
Answer
So you want the code for kriging using gstat. A simple Google search shows plenty of tutorials; you just need to apply the principles shown in those tutorials to your data.
I would advise you to start with Kriging and then move to co-Kriging.
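To make that concrete, a minimal ordinary kriging example with gstat on its built-in meuse data is sketched below (replace zinc/cadmium and the meuse objects with your water quality points and prediction grid); the co-kriging part follows the pattern of gstat's own cokriging demo and is only a sketch:

library(sp)
library(gstat)
data(meuse);      coordinates(meuse) <- ~x+y        # sample points
data(meuse.grid); gridded(meuse.grid) <- TRUE       # prediction grid
v  <- variogram(log(zinc) ~ 1, meuse)               # empirical variogram
vm <- fit.variogram(v, vgm(0.6, "Sph", 900, 0.05))  # fitted variogram model
ok <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = vm)   # ordinary kriging
spplot(ok["var1.pred"])                             # map of predictions
# co-kriging: one gstat object per variable, then a linear model of coregionalization
g <- gstat(NULL, id = "zinc",    formula = log(zinc) ~ 1,    data = meuse)
g <- gstat(g,    id = "cadmium", formula = log(cadmium) ~ 1, data = meuse)
g <- gstat(g, model = vgm(1, "Sph", 900, 1), fill.all = TRUE)
g <- fit.lmc(variogram(g), g)
ck <- predict(g, meuse.grid)                        # co-kriged predictions and variances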
  • asked a question related to Geostatistical Analysis
Question
3 answers
Presently I am attempting sequential Gaussian simulation using the SGeMS software. I have followed the steps as shown in various videos available online and also the user guide (Remy et al., 2006). I have set a grid of 50x20x100 as per the procedure and finally derived 50 realisations. Now I want to retrieve the simulated data along with the grid point locations so that I can plot them in different software.
I followed the steps "object -> save project -> grid to save -> property to save -> csv format" and saved the file. However, when I open it, I get the simulated values of the variable but not its XYZ coordinates. I have attempted this multiple times, unsuccessfully.
Can anyone point out which step I am missing?
TIA.
Relevant answer
Answer
Thank you Dr Robin for your prompt answer. However, my problem still persists. The reasons are:
1. My cell sizes are 2 m x 25 m x 5 m in the X, Y and Z directions, hence even if I reshape my computation grid of 1x100000 to 50x20x100, I won't be able to assign the right XYZ coordinates to each centroid unless I know the sequence in which the centroids are numbered.
2. I don't know how to use Python.
I wonder why the software, which takes coordinate input in a proper format with clearly defined X, Y, Z coordinates, does not deliver the output in a similar fashion.
Thank you for your help anyway.
Best wishes
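For what it is worth, the coordinates can usually be rebuilt outside SGeMS, because Cartesian grids in the SGeMS/GSLIB convention store node values with x cycling fastest, then y, then z (worth double-checking on a small test grid). A small R sketch, where the origin coordinates are hypothetical and should be taken from your grid definition:

nx <- 50; ny <- 20; nz <- 100           # number of cells in x, y, z
dx <- 2;  dy <- 25; dz <- 5             # cell sizes (m)
x0 <- 1;  y0 <- 12.5; z0 <- 2.5         # hypothetical coordinates of the first cell centre
# expand.grid varies its first argument fastest, matching the assumed x-fastest ordering
xyz <- expand.grid(x = x0 + (0:(nx - 1)) * dx,
                   y = y0 + (0:(ny - 1)) * dy,
                   z = z0 + (0:(nz - 1)) * dz)
sims <- read.csv("realizations.csv")    # the value-only file exported from SGeMS
stopifnot(nrow(sims) == nx * ny * nz)   # sanity check on the node count
out <- cbind(xyz, sims)                 # x, y, z followed by sim_1 ... sim_50
write.csv(out, "realizations_xyz.csv", row.names = FALSE)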
  • asked a question related to Geostatistical Analysis
Question
2 answers
In what ways can I get measurements of ozone and nitrogen dioxide concentrations in an area? What satellites provide this type of data, and can I use Google Earth Engine to retrieve the data when it is available?
Reghais.A
Thanks
  • asked a question related to Geostatistical Analysis
Question
5 answers
Hi, I was hoping someone could recommend papers that discuss the impact of using averaged data in random forest analyses or in making regression models with large data sets for ecology.
For example, if I had 4,000 samples each from 40 sites and did a random forest analysis (looking at predictors of SOC, for example) using environmental metadata, how would that compare with doing a random forest of the averaged sample values from the 40 sites (so 40 rows of averaged data vs. 4,000 raw data points)?
I ask this because a lot of the 4,000 samples have missing sample-specific environmental data in the first place, but there are other samples within the same site that do have that data available.
I'm just a little confused about 1) the appropriateness of interpolating average values based on missingness (best practices/warnings), 2) the drawbacks of using smaller, averaged sample sizes to deal with missingness vs. using incomplete data sets vs. using significantly smaller sample sizes from only "complete" data, and 3) the geospatial rules for linking environmental data with samples (if 50% of plots in a site have soil texture data and 50% don't, yet they're all within the same site/area, what would be the best route for analysis? It could depend on the variable, but I have ~50 soil chemical/physical variables).
Thank you for any advice or paper or tutorial recommendations.
Relevant answer
Answer
Thank you!
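For anyone attempting the comparison described in the question, a small sketch with the randomForest package; the samples data frame and its columns (soc, site, pred1-pred3) are hypothetical stand-ins for the 4,000 samples and their metadata:

library(randomForest)
# model on the raw, sample-level rows (rows with missing predictors dropped here)
rf_raw <- randomForest(soc ~ pred1 + pred2 + pred3,
                       data = na.omit(samples), importance = TRUE)
# model on site-level means: 40 rows instead of 4,000
site_means <- aggregate(cbind(soc, pred1, pred2, pred3) ~ site,
                        data = samples, FUN = mean)
rf_site <- randomForest(soc ~ pred1 + pred2 + pred3,
                        data = site_means, importance = TRUE)
# compare explained variance and variable importance between the two fits
print(rf_raw); print(rf_site)
importance(rf_raw); importance(rf_site)

With only 40 averaged rows the out-of-bag error estimate becomes very unstable, which is part of the trade-off the question is asking about.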
  • asked a question related to Geostatistical Analysis
Question
15 answers
Many open-source programs exist in the field of geology and all its specializations (water resources, hydrology, hydrogeology, geostatistics, water quality, etc.) that many people are unaware of.
What software would you suggest to us?
Thanks
Reghais Azzeddine
Relevant answer
Answer
Here is a list of common software and free alternatives
Software list
Illustrator => Inkscape, Scribus
Photoshop => GIMP
Matlab => Python, R, GNU Octave
Anaconda, RStudio, Jupyter Notebook, Spyder
ArcGIS Pro and ArcGIS Online => QGIS, GRASS, uDig, GeoDa, FOSS4G, Leaflet
PowerPoint => Google Slides, LibreOffice, FreeOffice
Microsoft Word => LaTeX, Google Docs, LibreOffice, FreeOffice
Excel => Google Sheets, LibreOffice, FreeOffice
Microsoft OS, Mac OS (and older computers) => Linux (Ubuntu, many others)
Others
GitHub, Arduino, Raspberry Pi, Audacity, BRL-CAD, FreeCAD, Dia, PDFCreator, Blender, Cinelerra, Bluefish, KeePass, 7-Zip, Psiphon, Clonezilla, VLC, Quanta Plus, NixNote, Overleaf, TeXstudio
  • asked a question related to Geostatistical Analysis
Question
10 answers
I have two datasets. One has 9 past cyclones with their damage to the forest, wind speed, distance from the study site, and recovery area. The other has future sea-level rise (SLR) projections and the potential loss area due to SLR.
  1. Using data from both disturbance-event datasets (loss area, recovery area, wind speed, predicted loss area from SLR), can I create any kind of disturbance risk/vulnerability/disturbance index or hazard indicator map of the study area?
  2. What kinds of statistical analysis can I include in my study with these limited datasets that will help me show some sort of relationship of "loss area" with the other variables?
  • asked a question related to Geostatistical Analysis
Question
1 answer
We know that the dispersion variance is related to the domain size V and the support size v. The textbook says that if v is kept unchanged, the dispersion variance will increase as the domain size V increases. I want to know why. Is there mathematical evidence?
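For reference, the standard argument (e.g. Journel and Huijbregts, Mining Geostatistics) writes the dispersion variance in terms of average variogram values:

D^2(v|V) = \bar{\gamma}(V,V) - \bar{\gamma}(v,v), \qquad \bar{\gamma}(A,A) = \frac{1}{|A|^2}\int_A \int_A \gamma(x-y)\,dx\,dy.

With the support v (and hence \bar{\gamma}(v,v)) held fixed, enlarging the domain V increases the average separation between pairs of points in V; since the usual variogram models are non-decreasing in |h|, \bar{\gamma}(V,V) increases and so does D^2(v|V). Krige's additivity relation D^2(v|G) = D^2(v|V) + D^2(V|G) for v \subset V \subset G says the same thing: enlarging the domain adds a non-negative between-domain term.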
  • asked a question related to Geostatistical Analysis
Question
2 answers
We know that LU decomposition is an important method for stochastic simulation of a 2D RF. Assume the covariance matrix of the regionalized RVs is C; it can be decomposed as C = LU according to the LU algorithm, and then an RF realization can be generated by X = L*y, where y is a vector of independent standard normal random numbers. I want to know whether I can use LU decomposition for simulation if n observations exist as conditioning data. If so, how can it be demonstrated?
Relevant answer
Answer
There are many decomposition types. Why don't you try another one instead of the one you applied before?
Regards
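For completeness, the standard conditional construction (Davis, 1987, Mathematical Geology) can be sketched as follows, assuming a zero-mean Gaussian RF. Partition the joint covariance matrix of the n data locations and the N simulation locations and take its lower-upper (Cholesky) factorization:

C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}
  = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
    \begin{pmatrix} L_{11}^{T} & L_{21}^{T} \\ 0 & L_{22}^{T} \end{pmatrix}.

With z the vector of conditioning data and y a vector of N independent standard normal numbers, the conditional realization at the simulation locations is

x_{cs} = L_{21} L_{11}^{-1} z + L_{22}\, y.

The first term equals the simple kriging estimate C_{21} C_{11}^{-1} z, and the second adds a residual with covariance L_{22} L_{22}^{T} = C_{22} - C_{21} C_{11}^{-1} C_{12}, i.e. the correct conditional covariance, so the covariance model is reproduced and the data are honoured.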
  • asked a question related to Geostatistical Analysis
Question
2 answers
I want to simulate a random field Z(u) that has a nested variogram, say gamma(h) = gamma1(h) + gamma2(h) + gamma3(h), assuming the variogram is isotropic. Can I independently simulate three random fields, Z1(u) with correlation structure gamma1(h), Z2(u) with correlation structure gamma2(h), and Z3(u) with correlation structure gamma3(h), and then sum them up as Z(u) = Z1(u) + Z2(u) + Z3(u)?
Relevant answer
Answer
Thank you, Donald Myers. I intend to use unconditional sequential (Gaussian) simulation based on simple kriging. From my understanding, we do not have to assume the RF is Gaussian. Note that I assume second-order stationarity for my question.
I want to simulate a random field with a more complex spatial correlation structure, but the Matlab code available to me only supports simulation based on a variogram defined with a single structure. Therefore, if the practice in my question is equivalent, I can do the simulation by independently simulating three random fields and summing them up.
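If an R route is an option for a numerical check, gstat accepts nested models directly, so both constructions can be compared on the same grid; the sills, ranges and grid below are arbitrary placeholders:

library(sp)
library(gstat)
xy <- expand.grid(x = 1:100, y = 1:100)             # simulation grid
usim <- function(m)                                  # one unconditional Gaussian realization
  predict(gstat(formula = z ~ 1, locations = ~x+y, dummy = TRUE,
                beta = 0, model = m, nmax = 30),
          newdata = xy, nsim = 1)$sim1
z_nested <- usim(vgm(0.5, "Sph", 10,
                     add.to = vgm(0.3, "Exp", 30,
                                  add.to = vgm(0.2, "Gau", 60))))
z_summed <- usim(vgm(0.5, "Sph", 10)) + usim(vgm(0.3, "Exp", 30)) + usim(vgm(0.2, "Gau", 60))
# the two vectors are different realizations, but under second-order stationarity and
# independence of the three components both should reproduce gamma1 + gamma2 + gamma3;
# compare their empirical variograms to check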
  • asked a question related to Geostatistical Analysis
Question
5 answers
I did an MCA using FactoMineR. I know how to interpret cos2, contributions and coordinates, but I don't know how the values of v.test should be interpreted.
Thank you
Relevant answer
Answer
A v.test with an absolute value above 1.96 is equivalent to a (two-sided) p-value below 0.05: the v.test is a standardized test statistic, so categories with |v.test| > 1.96 have coordinates that differ significantly from zero on that dimension at the 5% level.
  • asked a question related to Geostatistical Analysis
Question
3 answers
The published research papers mainly show the use of cluster analysis, such as the agglomerative hierarchical clustering (AHC) technique, to divide the data into clusters sharing a common trait, followed by a one-way ANOVA and Duncan's test as a post-hoc test. However, no one gives the details of the procedure for how the multiple variables or clusters are used to prepare a single set of site-specific management zones. This makes it difficult for the layman. Please share the details or any video link on preparing the zonation map in ArcGIS.
Relevant answer
Answer
Efforts are good and there are many research papers on MZs, but these are still not useful for laymen, as only written text is given in the papers. It is easy to write but different in practice.
As in the attached paper, from the result of the cluster analysis (factor scores attached) we can prepare PC maps (Fig. 3).
Still, the question remains unanswered for preparing the final MZs (Fig. 5): how to get a value for each sampling point in order to prepare Fig. 5 and Table 4. Even after emailing many corresponding authors of these papers, no reply was received.
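In lieu of the missing recipe, one workflow that recurs in those papers can be sketched in R; soil is a hypothetical table of sampling points (x, y plus soil variables), three zones and IDW interpolation are arbitrary choices, and the final zone raster can be exported to ArcGIS for the map layout:

library(sp)
library(gstat)
# principal components of the (scaled) soil variables at the sampling points
pcs <- prcomp(soil[, c("pH", "EC", "OC", "P", "K")], scale. = TRUE)
soil$PC1 <- pcs$x[, 1]; soil$PC2 <- pcs$x[, 2]
coordinates(soil) <- ~x+y
# regular grid over the field and interpolation of the retained PC scores
grd <- makegrid(soil, cellsize = 10); names(grd) <- c("x", "y")
coordinates(grd) <- ~x+y; gridded(grd) <- TRUE
pc1_map <- idw(PC1 ~ 1, soil, grd)$var1.pred
pc2_map <- idw(PC2 ~ 1, soil, grd)$var1.pred
# cluster every grid cell on its interpolated scores: each cluster is a management zone
zones <- kmeans(cbind(pc1_map, pc2_map), centers = 3)$cluster
zone_map <- SpatialPixelsDataFrame(grd, data.frame(zone = factor(zones)))
spplot(zone_map)   # zone map; write it out (e.g. with the raster package) for ArcGIS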
  • asked a question related to Geostatistical Analysis
Question
8 answers
I have created a soil moisture proxy model that shows areas with higher or lower soil moisture. In other words, each pixel in the GIS raster data has a value. I developed and applied the model in ArcGIS 10.2.2. I want to find a high-resolution land surface temperature dataset that I could use to validate the model. By high resolution, I mean at least sub-kilometer. For my purposes it needs to be high resolution, but I am only trying to determine whether the two correlate on an ordinal scale. It would be preferable if it were in raster format as well.
I know that MODIS is probably one of the best datasets, but I think its resolution is only 1 km. Does anyone have any other suggestions? Thanks so much for your time and consideration.
  • asked a question related to Geostatistical Analysis
Question
9 answers
I want to carry out PCA on a set of chemical data, some of them in oxide form and some in elemental form. The oxides are in percentage and the elements are in ppm.
I understand that the data have to be normalised/standardised before starting PCA. Now,
1) Should I have to convert all oxides to element first?
2) Should I have to convert all into single type of unit (either percentage or ppm )?
3) For normalisation, should I go for a log10, log2 or natural log transformation? What is the best way to decide which one is ideal?
4) If some elements show lognormal (10) distribution and some show Ln distribution, can I apply them separately or a single method to be followed for all?
5) Can I attempt IDF-Normal method for normalisation of such data?
Kindly advise.
Relevant answer
Answer
This article will help you to start with PCA and understand when to standardize the datasets https://www.reneshbedre.com/blog/principal-component-analysis.html
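On points 1 and 2, a quick check one can run: if prcomp() is used with scale. = TRUE (correlation-matrix PCA), converting a column between ppm and percent is only a linear rescaling and does not change the result, so the unit question largely disappears. The tiny data frame below is made up purely for illustration:

geochem <- data.frame(SiO2 = c(55, 60, 48, 52),     # percent
                      Cu   = c(120, 80, 300, 150),  # ppm
                      Zn   = c(90, 40, 210, 130))   # ppm
pca_a <- prcomp(geochem, scale. = TRUE)                             # mixed units
pca_b <- prcomp(transform(geochem, Cu = Cu / 1e4), scale. = TRUE)   # Cu converted to %
max(abs(pca_a$rotation - pca_b$rotation))                           # essentially 0: same loadings
# strongly skewed trace elements are still usually log-transformed before prcomp(), and
# closed (constant-sum) compositions may additionally call for a CoDa transform (e.g. clr)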
  • asked a question related to Geostatistical Analysis
Question
22 answers
Dear all,
I know it might also depend on the distribution / behavior of the variable that we are studying. The sample spacing must be able to capture the spatial dependence.
But, since kriging depends very much on the variance computed within each lag distance, if we have a small number of observations we might fail to capture the spatial dependence because we would have few pairs of points within a specific lag distance. We would also have a small number of lags. Especially when the points are very irregularly distributed across the study area, with many observations in one region and sparse observations in another, this will also affect the estimation of the computed variance per lag (different accuracy).
Therefore, I think that in such circumstances computing a semivariogram seems useless. What is the best practice if we still want to use kriging instead of other interpolation methods?
Thank you in advance
PS
Relevant answer
Answer
You need to separate two questions, first there is the number and spatial pattern of the data locations used in estimating and modeling the variogram. Secondly there is the number and spatial pattern of the data locations used in applying the kriging estimator/interpolator. These are two entirely different problems. The system of equations used to determine the coefficients in the kriging estimator only requires ONE data location but the results will not be very useful or reliable. Now you must decide whether to use a "unique" search neighborhood to determine the data locations used in the kriging equations or a "moving" neighborhood. Most geostatistical software will use a "moving" neighborhood, if you use a moving neighborhood then about 25 data locations is adequate, using more may result in negative weights and larger kriging variances. Depending on the total number of data locations and the spatial pattern there may be interpolation locations where there are less than 25 data locations. Using a "unique" search neighborhood will likely result in a very large coefficient matrix to invert.
With respect to estimating and modeling the variogram you must first consider how you are going to do this. Usually this will include computing empirical/experimental variograms but for a given data set the empirical variogram is NOT unique. It will depend on various choices made by the user such as the maximum lag distance, the width of the lag classes and whether it is directional or omnidirectional. An empirical variogram does not directly determine the variogram model type, e.g. spherical, gaussian, exponential, etc. It also does not directly determine the model parameters such as sill, range.
Silva's question may seem like a reasonable one to ask but it does NOT have a simple answer. Asking it implies a lack of understanding about geostatistics and kriging.
Myers, D.E. (1991), On Variogram Estimation, in Proceedings of the First Inter. Conf. Stat. Comp., Cesme, Turkey, 30 Mar.-2 April 1987, Vol. II, American Sciences Press, 261-281
Warrick, A. and Myers, D.E. (1987), Optimization of Sampling Locations for Variogram Calculations, Water Resources Research 23, 496-500
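To make the non-uniqueness point concrete, the same data set yields different empirical variograms under different user choices; a short gstat illustration on the built-in meuse data:

library(sp); library(gstat)
data(meuse); coordinates(meuse) <- ~x+y
v_default <- variogram(log(zinc) ~ 1, meuse)                              # package defaults
v_coarse  <- variogram(log(zinc) ~ 1, meuse, cutoff = 1600, width = 200)  # other lag choices
v_dir     <- variogram(log(zinc) ~ 1, meuse, alpha = c(0, 45, 90, 135))   # directional
# the np column shows how many pairs support each plotted point; the directional
# variograms split the same fixed total number of pairs over the four directions
head(v_default); head(v_dir)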
  • asked a question related to Geostatistical Analysis
Question
11 answers
The aim is to find the signal value at x0 from the signal values at xi, i = 1,..,N using kriging, given as Z(x0) = sum(wi Z(xi)).
After fitting a non-decreasing curve to the empirical variogram, we solve the following equation to find the weights wi:
Aw = B,
where A is padded matrix containing Cov(xi,xj) terms and B is vector containing Cov(xi,x0).
In my simulation setup, the weights often have negative values (which is non-intuitive). Am I missing any step? As per my understanding, the choice of curve-fitting function affects A. Weights are positive only if A is positive-definite. Is there a way to ensure that A is positive-definite?
Relevant answer
Answer
I think the problem is that the sample space of the variables considered has not been taken into account. Kriging in any form (simple, ordinary, ...) has been devised for real random variables, with support on the whole real line, going at least conceptually from minus infinity to plus infinity, and endowed with the usual Euclidean geometry. In such a case negative weights would be no problem. If you expect positive estimates, then the variable is not supported on the whole real line, and you need to take this into account. In the field of compositional data analysis you can find tools to address this problem. The essential tool is to determine the natural scale of your data, to find appropriate orthonormal coordinates for your data, and to perform estimation in the resulting representation of your data.
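A complementary remark: negative weights can appear even when A is positive definite; they typically come from the screening effect and are a property of the kriging system rather than a numerical mistake. A tiny simple kriging example in base R with a Gaussian covariance and two collinear data points (all numbers invented):

covfun <- function(h) exp(-h^2)                     # Gaussian covariance, unit sill, a = 1
xi <- c(1, 2)                                       # data locations on a line
x0 <- 0                                             # estimation location
A  <- outer(xi, xi, function(a, b) covfun(a - b))   # Cov(xi, xj): positive definite
b  <- covfun(xi - x0)                               # Cov(xi, x0)
w  <- solve(A, b)                                   # simple kriging weights
w                                                   # roughly 0.42 and -0.14: the far point
                                                    # is screened by the near one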
  • asked a question related to Geostatistical Analysis
Question
5 answers
We know that the estimation error (in percent) can be calculated with EEx = a*sqrt(var(x))*100/x, where EEx is the estimation error, a is a constant, var(x) is the variance (came from (co)kriging), and x is the estimated value by (co)kriging.
Considering an estimated value below 1%, the above equation may lead to large and meaningless estimation errors (thousands of percent).
The question is: what should we do for estimated values less than 1%?
On the other hand, we need to report the estimation error in percentages to classify mineral resources based on their estimation error.
P.S.: an example is phosphorus grades at an iron ore mine. They vary between 0 and 1, but after performing (co)kriging, their estimation error ranges around 1500%.
Despite performing compositional data analysis (CoDa), this happens, more or less.
Relevant answer
Answer
The equation must solve any problem
  • asked a question related to Geostatistical Analysis
Question
9 answers
I am interested to know the suitability of Geostatistical Analysis or Geospatial Analysis in ArcGIS.
Both tools contain interpolation techniques, and I am not sure which one should be used for mapping soil properties (like bearing capacity) from point data.
Relevant answer
Answer
Geostatistical Analyst uses sample points taken at different locations in a landscape and creates (interpolates) a continuous surface. The sample points are measurements of some phenomenon, such as radiation leaking from a nuclear power plant, an oil spill, or elevation heights. Geostatistical Analyst derives a surface using the values from the measured locations to predict values for each location in the landscape.
Geostatistical Analyst provides two groups of interpolation techniques: deterministic and geostatistical. All methods rely on the similarity of nearby sample points to create the surface. Deterministic techniques use mathematical functions for interpolation. Geostatistics relies on both statistical and mathematical methods, which can be used to create surfaces and assess the uncertainty of the predictions.
Geospatial analysis is the gathering, display, and manipulation of imagery, GPS, satellite photography and historical data, described explicitly in terms of geographic coordinates or implicitly, in terms of a street address, postal code, or forest stand identifier as they are applied to geographic models.
  • asked a question related to Geostatistical Analysis
Question
4 answers
Hello! I have a database of geochemical analyses.
My doubts are:
1 - Can I apply statistical measures to the same variable even though it was measured by several analytical methods?
Eg: Can I get the mean of SiO2 from whole rock samples which was measured by X-Ray Fluorescence and by Electron microprobe?
2 - Can I get linear correlation from two different variables measured by different analytical methods?
Eg: Linear correlation between H2 that was measured by Gas chromatography and FeO that was measured by Electron microprobe?
3 - Which readings would help me learn this theory?
  • asked a question related to Geostatistical Analysis
Question
10 answers
I have a dataset with distances between beneficiaries and the nearest provision point (nearest hub).
I want to develop a model to explain distances based on several attributes, like category of beneficiary and category of provision point, among others:
distance ~ cat_beneficiary + cat_provision + altitude + ...
I guess I should use a GLM, but I don't know which model would fit this kind of data (continuous and positive) best. Can I use a count data model (like Poisson or NB)? Or do they only work with discrete data?
I attach a histogram.
Relevant answer
Answer
Dear Josep,
Replacing zeros with small numbers (10^-6 or .Machine$double.xmin) may "solve" the problem, but a more elegant solution is using zero-inflated models (e.g. packages 'glmmTMB' or 'glmmADMB').
HTH,
Ákos
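A minimal sketch of the strictly positive case in R, assuming a hypothetical data frame dists with the columns named in the question; as Ákos notes, exact zeros would push this towards a zero-inflated formulation instead:

# Gamma GLM with log link: continuous, positive, right-skewed responses such as distances
fit <- glm(distance ~ cat_beneficiary + cat_provision + altitude,
           family = Gamma(link = "log"), data = dists)
summary(fit)
# Poisson / negative binomial families assume integer counts, so they are not the natural
# choice here; with exact zeros, a zero-inflated Gamma (e.g. glmmTMB's ziGamma family)
# or a Tweedie model is a common alternative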
  • asked a question related to Geostatistical Analysis
Question
1 answer
In geostatistics, structural analysis aims to obtain a universal expression characterizing the spatial variation of a regionalized variable in different directions. The general expression is γ(h)=Σγi(h), where γi(h) represents the variogram model in the ith direction, which is subject to some geometric transformation and is then isotropic.
My question is:
(1) Does the expression γ(h)=Σγi(h) apply to the geometric anisotropic variogram? In detail, if the variograms in two orthogonal directions have the same nugget and sill, but the ranges are different, we call this geometric anisotropy. The common practice is to apply a linear transformation [1, 0; 0, K] to the lag vector, after which one variogram model can be used for all directions.
(2) If the geometric anisotropy can also be expressed as a sum of variograms in different directions, then it should be equivalent to the commonly used approach mentioned in (1). But how can these two approaches be related to each other?
(3) I think the core question is: why can variograms in different directions be added?
Could anyone give me some explanations, and some illustrated examples?
Relevant answer
Answer
No, this model can result in a non-invertible kriging matrix.
What you are trying to construct is not a geometric anisotropy but rather is more like a zonal anisotropy. A geometric anisotropy is when the range varies with direction.
See Myers, D.E. and Journel, A. (1990), Variograms with Zonal Anisotropies and Non-Invertible Kriging Systems, Mathematical Geology 22, 779-785.
Also see Myers, D.E. (2008), Anisotropic radial basis functions, International J. of Pure and Applied Mathematics 42, 197-203.
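For readers working in gstat, a geometric anisotropy is declared on a single structure through a rotation/rescaling of the lag vector, not as a sum of directional variograms; the numbers below are placeholders:

library(gstat)
# one spherical structure with the same nugget and sill in every direction,
# range 900 along azimuth 45 degrees and 0.4 * 900 = 360 perpendicular to it;
# gstat applies the coordinate transformation internally
m_geo <- vgm(psill = 1.0, model = "Sph", range = 900, nugget = 0.1,
             anis = c(45, 0.4))
print(m_geo)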
  • asked a question related to Geostatistical Analysis
Question
10 answers
My observations are points along a transect, irregularly spaced.
I aim to find the distance values that maximize the clustering of my observation attribute, in order to use them in the subsequent LISA analysis (Local Moran's I).
I iteratively run the Global Moran's I function with PySAL 2.0, recreating a different distance-based weight matrix (binary, assigning 1 to neighbors and 0 to non-neighbors) with a search radius 0.5 m longer at every iteration.
At every iteration, I save the z_sim, p_sim and I statistics, together with the distance at which these stats were computed.
From this information, what strategy is best for finding distances that potentially show underlying spatial processes that (pseudo-)significantly cluster my point data?
PLEASE NOTE:
  • Esri style: the ArcMap Incremental Global Moran's I tool identifies peaks of z-values where p is significant as interesting distances
  • Literature: I found many papers that simply choose the distance with the highest absolute significant value of I
CONSIDERATIONS
Because the number of observations considered in the neighborhood changes with the search radius, and thus the weight matrix also changes, the I values are not comparable.
Relevant answer
Answer
Hi everyone,
after a little research, I finally came up with the answer I was looking for.
Short answer:
When using the Global Moran's I index with incrementally increasing distance searches (thus changing the weight matrix at every iteration), only the z-values are independent of both the weight matrices and variations in variable intensity, so they are comparable across multiple analyses.
The I in Moran's I statistics is not comparable across analyses, i.e. if with a distance of 10 m I=0.3 and with a distance of 15 m I=0.6, we cannot say that at a distance of 15 m the clustering strength is double.
We could only say that in both cases there is a positive (sign of the I) spatial autocorrelation.
For the strengths, we use the z-values.
That is why ESRI plots distances on the x-axis and z-values on the y-axis, indicating significant peaks (p-value less than the specified significance level) as interesting distances.
For more information, it is clearly explained in a Global Autocorrelation class given by Luc Anselin in 2016 at the University of Chicago.
Follow from minute 38, when he talks about the permutation approach.
Enjoy!
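For completeness, the same incremental-distance workflow the poster ran in PySAL can be sketched with the spdep package in R; coords (a matrix of point coordinates), the attribute vector val and the radius sequence are hypothetical, and the binary weights and permutation inference mirror the description above:

library(spdep)
radii <- seq(5, 50, by = 0.5)                            # search radii to test (map units)
res <- t(sapply(radii, function(d) {
  nb <- dnearneigh(coords, 0, d)                         # neighbours within distance d
  lw <- nb2listw(nb, style = "B", zero.policy = TRUE)    # binary weight matrix
  mt <- moran.test(val, lw, zero.policy = TRUE)          # z-like standard deviate
  mc <- moran.mc(val, lw, nsim = 999, zero.policy = TRUE)  # permutation p-value
  c(dist = d, I = unname(mt$estimate["Moran I statistic"]),
    z = unname(mt$statistic), p = mc$p.value)
}))
# plot z against distance and look for significant peaks, as in ArcMap's
# Incremental Spatial Autocorrelation tool
head(res)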
  • asked a question related to Geostatistical Analysis
Question
11 answers
I want to compare two maps (covering the same area):
  • a raster map of the Topographic Position Index (TPI) with values ranging from 0 to 100. This is a continuous variable. I have added a picture as an attachment.
  • a raster map with wetlands (permanent wetland, temporal wetland, no wetlands present), so this is a categorical variable. I have added a picture as an attachment.
I want to check whether the TPI map can predict wetlands. I want to look for similar spatial patterns.
I was thinking in terms of probability (e.g. Bayesian Networks), but I am not sure this is the right technique.
Does someone know a technique I could use to analyse these two maps? I am using ArcGIS but other software programs (like R) can also be used of course.
Relevant answer
Answer
Annelies Broeckx Thanks. I have downloaded the files and played around a bit with this. If you assume that any kind of gradient indicates different regions, then the following may work. I used mainly gradient operators and the Minkowski metric and compared everything. Enclosed you will find some colored results which show potential overlaps as a color code (whitish). I got a 17% overlap for the gradients, and more if I dilate them (by at most 50%) to respect the influence of misclassification and misregistration.
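Another simple, quantitative route (sketched here with the raster package; file names are placeholders, and both rasters are assumed to share the same grid, otherwise resample() one of them first) is to sample both layers at the same cells and test how well TPI separates the wetland classes:

library(raster)
tpi <- raster("tpi.tif")                          # continuous TPI, 0-100
wet <- raster("wetlands.tif")                     # categorical: 0 none, 1 temporal, 2 permanent
s   <- stack(tpi, wet); names(s) <- c("tpi", "wet")
df  <- as.data.frame(sampleRandom(s, 20000))      # random sample of cells with both values
boxplot(tpi ~ factor(wet), data = df)             # does TPI differ between classes?
df$is_wet <- as.integer(df$wet > 0)               # binary: any wetland vs none
fit <- glm(is_wet ~ tpi, family = binomial, data = df)
summary(fit)
# an AUC on the fitted probabilities (e.g. via the pROC package) then quantifies how well
# TPI alone predicts wetland presence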
  • asked a question related to Geostatistical Analysis
Question
3 answers
I first state that my domain of interest lies in the understanding of distances in the spatial, geographical domain. This does not involve distances in statistical non-spatial analysis.
Distances in the mathematical sense are characterised by the properties of positivity, separation, symmetry and triangular inequality. In the literature this last property is often wrongly interpreted. I state the hypothesis that the triangular inequality (d(AC) <= d(AB) + d(BC) whatever B) has for main purpose to guarantee the minimal nature of distances. I have so far listed three errors:
1. confusion with the non-Euclidean nature of geographical distances: the fact that distances do not follow the straight line is attributed to the triangular inequality, as in Müller 1982 ("Non-Euclidean geographic spaces: mapping functional distances", Geographical Analysis vol 14). This is clearly a misunderstanding of the non-Euclidean nature of transport and mobility realities.
2. considering non-optimal sets of measures (the word "measure" is used to avoid considering them as distances) containing triangular inequalities as in Haggett 2001 p 341 ("Geography a global synthesis" Prentice Hall). This argument is not consistent with the idea of minimality that I mentioned earlier: what can be a sub-optimal distance? Does it exist in spatial analysis of transport such distances?
3. the existence of rest-stops along transport routes creates the possibility that the additivity of time-distances (for instance) is not guaranteed. On the optimal route from A to C passing through B, if there is a need for a stop (for rest, for energy refuelling or other similar purposes) in or around the point B then the addition of time-distances AB and BC is inferior to the overall AC distance creating an apparent violation of the triangle inequality. This idea can be found in the reference article Huriot, Smith and Thisse 1989 "Minimum cost distances in Spatial Analysis", Geographical Analysis, p 300. This argument could be surpassed by using a non-continuous time-distance function to overcome the sub-additivity paradox.
All this discussion brings to the idea that distances are optimal in nature, and that in the spatial analysis of transport and mobility, the concept of distance contains an idea of optimality. This idea could link geography and economics through the principle of optimisation in the distribution of activities and the functioning of transport systems.
I would like to open here a discussion to test if my reasoning is consistent and solid: I welcome any counter arguments, examples, illustrations.
Relevant answer
Answer
" All this discussion brings to the idea that distances are optimal in nature, and that in the spatial analysis of transport and mobility, the concept of distance contains an idea of optimality. This idea could link geography and economics through the principle of optimisation in the distribution of activities and the functioning of transport systems "
Interesting work ..... you are probably aware of the fact that the solution to a linear programming "transport problem" links activities with transport systems, and provides a set of prices (dual variables, or shadow prices) that relate to, and are derived from, the transport costs --- those transport costs do not have to obey the triangle inequality.
  • asked a question related to Geostatistical Analysis
Question
4 answers
To do geostatistical analysis in ArcGIS I need a China climatic zone shapefile. I searched for it in DIVA-GIS but it was not found.
Relevant answer
Answer
The Köppen-Geiger classification Eric Delgado dos Santos Mafra Lino mentioned is a good choice.
I also found another link where you can download the global climate zones as a shapefile. I am not sure if that is what you are looking for, but have a look at it:
  • asked a question related to Geostatistical Analysis
Question
8 answers
I plan to use sequential Gaussian simulation (sGs) approaches in study. However, when I apply the inverse normal score transformation (NST) to the data after the calculation, there are some errors (especially in negative results). Therefore, I simulated the data with Box-Cox transformation instead of NST and obtained more appropriate results.
Can the Box-Cox transformation be used instead of normal score transformation in sequential Gaussian simulation (sGs)? Can I also find examples of such transformations previously used (to reference in the article)?
Thank you very much in advance
Relevant answer
Answer
Mr. Murphy
That could be a good idea. I'll take your advice. Thank you so much for your help.
  • asked a question related to Geostatistical Analysis
Question
12 answers
I am using ArcSWAT. I encountered some problems when I tried to estimate the flow through these GIS files. I loaded my land use after the watershed delineation, but when I tried to add my look-up table and selected "Cropland data layer", I got this error message (error 1).
"The grid value 0 has not been defined.
Please define this grid value."
I double-clicked 'value 0' and selected an item for it (Crop-AGRR), but the 'Land Use\Soils\Slope Definition' couldn't complete this time. Then I added my look-up table again, but selected "User Table" this time, just like when running Example1 which comes with the SWAT model, but I got another error message (error 2).
 
 
'An error has occurred in the application. Record the call stack sequence and description error.
Error call stack sequence: fromChoose2LUSoilsfunctions.vb
Error Number: 5
Description: Argument 'Length' must be greater or equal to zero'
I appreciate it if someone could give me some suggestions.
Relevant answer
Answer
Because the grid has some value that is not defined; check the grid to find that value.
  • asked a question related to Geostatistical Analysis
Question
10 answers
Hi everyone!
I would greatly appreciate your help
These are my first weeks of self-studying GIS and geostatistical analysis for my hydrobiological research.
I have about 30 points on the lake with x, y, z information and species occurrence data.
I have just interpolated the z value (depth) and made a bathymetric map (isolines) of the lake using Golden Software Surfer 11.
Now I need to compare this information with the species occurrence data and maybe find some relation between them, but I don't clearly understand how I can do this.
Which software or tools should I use? Should I continue my work in Surfer? Are there any tutorials or literature for beginners? I'm stuck and in despair.
I've heard that cokriging may help, but I'm not sure. I have also heard about Maxent, but as a beginner I can't understand how to work with this software, so I think I need some simple tutorials or something like that.
And I am sorry for my English)
Relevant answer
Answer
Hi all,
All this information that you guys provided is very helpful, but it might be overwhelming. Zhanna, you mentioned that you have the Surfer software available to conduct your analysis, so you have spatial data with coordinates. Unfortunately none of us seems to be an expert in Surfer; I used it once and don't remember much about it. There is one more free program, SGeMS. You can download it from its website; it can perform variogram modeling, interpolation and even sequential Gaussian simulation. It is quite efficient at constructing histograms, QQ plots and scatterplots. It is a little painful to work with and it sometimes crashes on you, but at the end of the day it does a solid job.
Bottom line: start with a simple analysis and slowly build it up.
  • asked a question related to Geostatistical Analysis
Question
8 answers
The original data are 2D and have been detrended using a second-order polynomial obtained by OLS. The variogram of the detrended data is shown in red on the attached figure. Conditional sequential Gaussian simulation is then performed and 100 realizations were obtained. The variograms of these realizations are in gray in the figure. Why is there an apparent deviation? Is there something wrong with my procedure? Thank you.
Relevant answer
Answer
You need to recognize a number of aspects of your problem that you may not be understanding correctly.
1. "Detrending" is a technique applied to a data set, it does not affect nor is it a characteristic of a random function. The empirical variogram is only an unbiased estimator of variogram values if the expected value of the random function is a constant (with respect to the spatial locations). Unfortunately without knowledge of the multivariate probability density function (don't confuse multivariate with multidimensional) you can't actually compute the expected value. As a practical solution to the problem of whether the expected value is constant it is common to fit a trend surface to the data and then possibly use the residuals to compute the empirical variogram. Note that for a particular data set the empirical variogram is not unique, i.e. you must make decisions about the width of the distance classes, the number of distance classes and whether to use an omnidirectional or directional empirical variograms.
2. Conditional sequential Gaussian simulation means you are going to generate a partial realization of a random function using a covariance function (you can't use a variogram directly). The algorithm is supposed to preserve certain properties and also is based on certain assumptions, one of which is that the random function has a multivariate Gaussian distribution and is second order stationary (the existence of a variogram only assumes Intrinsic Stationarity). "Conditional" means that the realization is forced to match the data values at the data locations. "Sequential" means that the algorithm generates a simulated value at one location at a time, in sequence, hence you must choose (or the software chooses it for you) a "path". If the same set of simulation locations is chosen in a different order there is no assurance that the set of simulated values is the same.
3. The simulation algorithm does not ensure that the empirical variograms for individual partial realizations will match the empirical variogram for the original data set.
4. Usually the software generates simulated values on a regular grid whereas the original data locations are usually not on a regular grid and there will be far fewer data locations than simulation locations.
5. Any spatial data simulation algorithm depends on the use of a random number generator and in turn the random number generator depends on the use of a "seed". To obtain multiple partial realizations you keep changing the seed. Note that for a given random number algorithm and given seed the sequence of "random numbers" is in fact not really random (see the book "Numerical Recipes").
Observations about the graphs
a. The plot of the empirical variogram for the original data appears to be almost too smooth to be true unless you had a really large data set.
b. I assume that the single plot for the simulated data is an "average" of the empirical variograms for the 100 realizations; is that true, or did you average the simulated values at each location to compute a single empirical variogram?
c. In addition to "detrending" the data, did you also do a "normal" transform? If so, before or after computing the empirical variogram for the detrended data?
d. Almost certainly the empirical variogram for the simulated data is based on a much larger number of "data" locations; you didn't provide enough information about what you did.
e. You should list all the choices you made in computing empirical variograms and also the choices you made in using the Sgsim software
f. The difference between the two plots appears to be a difference in the implied sills, i.e. a difference in the variances.
You need to understand what the software is doing both for computing empirical variograms and also for the conditional sequential Gaussian simulation. i.e. the underlying algorithms and how the software implements those algorithms.
  • asked a question related to Geostatistical Analysis
Question
6 answers
The formula of Advanced Vegatation Index for Landsat 7 is:
AVI = [(B4 + 1) × (256 - B3) × (B4 - B3)] ^ (1/3)
and for Landsat 8,
AVI = [(B5 + 1) × (65536 - B4) × (B5 - B4)] ^ (1/3)
Now if I transform the DN values into Reflectance values, should the constants (256, 65536) be changed or they remain constant for reflectance as well?
Relevant answer
Answer
I contacted USGS about this. They replied, "The constants represent the number of possible bits in a resolution (8 bit = 256; 16 bit = 65536), so they shouldn't change."
  • asked a question related to Geostatistical Analysis
Question
4 answers
When performing kriging with anisotropy, it can happen that a very high length-to-width ratio of my sampling point layout suggests an anisotropy which in reality does not exist. So my question is whether there is a recommended ratio of the longest to shortest sampling axis for kriging to avoid this effect. Or how can I deal with this situation?
Relevant answer
Answer
There is no way of determining that absent some knowledge of the phenomena that generates the data and at least some information on the spatial correlation for the variable of interest.
This is a little like asking what the minimum number of data locations is necessary to apply kriging and again without some preliminary information you can't say. Moreover the problem is different for the variogram estimation/modeling step vs using the variogram for kriging
Note that for a given data set, the total number of data location pairs (used in the sample variogram(s)) is fixed; all of them might be used in an omni-directional sample variogram, but fewer in each of the directional sample variograms (and not necessarily the same number for different directions).
One of the consequences is that the directional sample variograms may be difficult to interpret (too few pairs) and/or for one direction the sample variogram is moderately good but for one or more other directions it is not good. The spatial pattern of the data locations can also cause effects in the sample variograms as you have noted.
You might find the following interesting
Warrick, A. and Myers, D.E. (1987), Optimization of Sampling Locations for Variogram Calculations, Water Resources Research 23, 496-500
  • asked a question related to Geostatistical Analysis
Question
9 answers
I have a set of disease cases in polygon form as an attribute of each city. There are some 180 cities (polygons); 2-5 of them recorded more than 300 cases, about 100 of them contain 0-2 cases, and the rest recorded 2-20 disease cases. I am going to evaluate the possible correlation between the illness and some environmental factors such as temperature, precipitation, etc.
However, the distribution of the disease data is severely non-normal and violates many statistical methods' assumptions.
Do you have any suggestion in this case?
Relevant answer
Answer
To assess the correlations in your data set you could use a non-parametric correlation measure like Spearman's rho. Also, if you analyse spatial autocorrelation in your spatial pattern, you should use a Monte Carlo/randomization approach to determine your p-values. See e.g. http://pysal.readthedocs.io/en/latest/library/esda/moran.html --> p_rand
Another thing you could use is Poisson regression to determine the strength and direction of the influence of your environmental factors on the number of local disease cases.
Cheers,
Jan
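A compact sketch of both suggestions in R; the data frame cities with columns cases, temp, precip and population is hypothetical:

cor.test(cities$cases, cities$temp,   method = "spearman")   # non-parametric correlation
cor.test(cities$cases, cities$precip, method = "spearman")
# Poisson regression on the counts with population as an exposure offset; switch to a
# negative binomial (MASS::glm.nb) if the counts are overdispersed
fit <- glm(cases ~ temp + precip + offset(log(population)),
           family = poisson, data = cities)
summary(fit)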
  • asked a question related to Geostatistical Analysis
Question
3 answers
Hello everyone, I'm working on solute transport in soil. There are 40 sample points, and in the GS+ software the active lag distance defaults to half of the sample distance. After the calculation, only four or five plotted points were shown, but R2 is more than 0.9. Is this analysis reliable? How many plotted points are needed at least?
Relevant answer
Answer
If you going to use the variogram for kriging or simulation remember that the purpose of the variogram is purely utilitarian; you are trying to develop a model that captures the spatial covariance of observed values.  The model you specify for kriging or simulation helps control the smoothness and roughness of the resulting surface.  Therefore you want enough variogram points to indicate a range of spatial dependence,  existence and  magnitude of any very small-scale variability ("nugget effect"), and a value for the sill.  Four points might be enough for some datasets, including yours.  For others, forty might not be enough.  You can play around with different lag sizes but remember that as the number of variogram values increases the number of pairs used to calculated each value decreases and the variogram can become more noisy.
  • asked a question related to Geostatistical Analysis
Question
3 answers
I have used AHP, Frequency Ratio (FR), and Fuzzy Logic (FL) to create land suitability maps in the GIS environment. Do you know other methods?
Relevant answer
Answer
If you create a land suitability model, you will face the pixel-mixing problem when assessing accuracy.
Weighted overlay analysis for the land suitability map, together with sub-pixel analysis, is another very good way of handling the mixing of classes within pixels.
  • asked a question related to Geostatistical Analysis
Question
10 answers
1) From 73 static water levels I interpolated the groundwater table using three methods: Topo to Raster, IDW and kriging. For the last two I used Geostatistical Analyst, so errors are calculated. However, I do not know how to validate the Topo to Raster interpolation. In addition, I interpolated the groundwater flow direction using ArcHydro Groundwater. Is this valid? Is this modeling?
2) I calculated the mixing fraction of an "X" water source using a three-end-member mixing analysis (EMMA). I interpolated the mixing fraction using the same three methods. Topo to Raster (apparently) gave the best fit... still, I do not know how to validate the method (i.e. RMSE). Kriging overestimated a lot of values, and so did IDW.
Thanks in advance for your help.
Relevant answer
Answer
Dear Christian,
The variogram for your data could be very telling.  If the data are very noisy the nugget effect will be large and the nugget constant relative to sill will be large.  Kriging is a type of smoother and will overestimate low values and underestimate high values.  Smoothing increases with the nugget/sill ratio.  Whether overestimation of low values or underestimation of high values is the more obvious depends on the distribution of data.
Sparseness in geographic data distribution makes the situation even worse.
As suggested above, look for outliers in the data.  Alternatively, if your software includes the option of a normal scores transform with conditional simulation you might want to explore that route. 
  • asked a question related to Geostatistical Analysis
Question
3 answers
Ku-band geostationary satellite elevation angle, for internet and TV.
Relevant answer
Answer
No reference; I performed the calculation myself.
It is only a problem of spherical geometry, considering the position of the geostationary satellite (whatever band is used) and the position of the user on the ground.
The only impact of the band used (Ku in your case) is the lowest usable elevation.
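For readers who want the actual relation, the usual spherical-geometry result is the following, with \varphi the station latitude, \Delta\lambda the longitude difference to the sub-satellite point, R_E \approx 6378 km and r_{GEO} \approx 42164 km (sign conventions vary between texts):

\cos\gamma = \cos\varphi\,\cos\Delta\lambda, \qquad
\varepsilon = \arctan\!\left(\frac{\cos\gamma - R_E/r_{GEO}}{\sin\gamma}\right)
\approx \arctan\!\left(\frac{\cos\gamma - 0.1513}{\sin\gamma}\right),

where \gamma is the central angle between the ground station and the sub-satellite point and \varepsilon is the elevation angle; \varepsilon \le 0 means the satellite is below the local horizon.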
  • asked a question related to Geostatistical Analysis
Question
2 answers
Hello,
I want to extract this NetCDF file, 'Sea_sur_temp_tos_Omon_MPI-ESM-MR_historical_r1i1p1_200001-200512.nc'. It has 8 variables in total: time, time_bnds, j (cell index along the second dimension), i (cell index along the first dimension), lat, lat_vertices, lon, lon_vertices, and tos (sea surface temperature). While extracting it through ArcGIS, the dimension box asks for j and i values instead of lat and lon. I don't understand what the values for i and j should be.
Kindly help me in this regard. 
Thanks
Relevant answer
Answer
I hope this will help you
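ArcGIS asks for i and j because tos in CMIP5 ocean files sits on a curvilinear grid: i and j are just cell indices, and lat/lon are themselves 2-D variables indexed by (i, j). One workaround is to read the file outside ArcGIS and export a flat point table; a minimal sketch with the ncdf4 package in R (dimension order as returned by ncdf4 is assumed):

library(ncdf4)
nc  <- nc_open("Sea_sur_temp_tos_Omon_MPI-ESM-MR_historical_r1i1p1_200001-200512.nc")
lat <- ncvar_get(nc, "lat")    # 2-D array over (i, j)
lon <- ncvar_get(nc, "lon")    # 2-D array over (i, j)
tos <- ncvar_get(nc, "tos")    # 3-D array over (i, j, time)
nc_close(nc)
# flatten the first time step into a lon/lat/value table that ArcGIS can import as XY data
pts <- data.frame(lon = as.vector(lon), lat = as.vector(lat), tos = as.vector(tos[, , 1]))
write.csv(na.omit(pts), "tos_timestep1.csv", row.names = FALSE)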
  • asked a question related to Geostatistical Analysis
Question
2 answers
-
Relevant answer
Answer
Generally speaking, specifics about the individual chemical cocktails used in a project are considered proprietary, and are usually only available with explicit permission from the data owner (the company doing the job). Some exceptions for example, would be data collected from IHS (usually comes at a cost), or data voluntarily promulgated (e.g. fracfocus.org). If you are looking for general soil/ water data, I would start here:
I would then peruse similar pubs like this one:
 then continue mining data from the subsources therein (i.e. the Acknowledgements). Hope that helps.
  • asked a question related to Geostatistical Analysis
Question
4 answers
I am trying to evaluate the impact of an intervention that was implemented in very poor areas (more poor people, underserved communities). In addition, the location of these areas was such that health services were limited for various administrative reasons. Thus, the intervention areas had two problems: (1) individuals residing in these areas were mostly poor, illiterate and belonged to underserved communities; (2) the geographical location of the area also contributed to their vulnerability (people with a similar profile but living elsewhere, in non-intervention areas, had better access to services). I have cross-sectional data on health service utilization from both types of areas at endline. There is no baseline data available for intervention and control. I plan to do two analyses. (1) An intent-to-treat analysis: here, I wish to compare service utilization across "areas" (irrespective of whether a household in an intervention area was exposed to the intervention). The aim is to see whether the intervention could bring some change at the "area" (village) level. My question is: can I use propensity score analysis for this, by matching intervention "areas" with control "areas" on aggregated values of covariates obtained from the survey and Census (for example, matching intervention areas with non-intervention areas in terms of % of poor households, % of illiterate population, etc.)? (2) The second analysis is to examine the treatment effect: here I am using propensity score analysis at the individual level (comparing those who were exposed in intervention areas with matched unexposed people from non-intervention areas). Is this the right way of analysing the data for my objective?
Relevant answer
Answer
Thanks a lot, Sebastian. This is very useful. I am working on it and I will update you once I address all these concern, to the extent possible with existing data available.
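For the area-level (intent-to-treat) part, a minimal propensity score matching sketch with the MatchIt package; villages is a hypothetical one-row-per-village table built from the survey/Census aggregates mentioned in the question, and utilization is a hypothetical outcome column:

library(MatchIt)
# treat = 1 for intervention villages; covariates are aggregated area characteristics
m_out <- matchit(treat ~ pct_poor + pct_illiterate + dist_facility,
                 data = villages, method = "nearest", ratio = 1)
summary(m_out)                        # check covariate balance after matching
matched <- match.data(m_out)          # matched sample
t.test(utilization ~ treat, data = matched)   # crude area-level comparison of utilization
# the individual-level (treated vs matched unexposed) analysis follows the same pattern
# with person-level covariates; with few treated areas, balance diagnostics matter a lot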
  • asked a question related to Geostatistical Analysis
Question
5 answers
I have a time series of satellite-derived rasters. What is the best approach to define the spatial pattern of spatio-temporal variation by means of geostatistical analysis?
I would like to characterize how areas are affected by the temporal and spatial variation of a parameter derived from satellite data. How can I relate this to another forcing driver? This one is measured as a numerical vector.
Is variogram analysis the best option? Or the empirical orthogonal function?
If one of these is feasible, how would you develop it in R or other open-source software?
The time series raster dataset has more than 50 images with a variable revisit time; the lag time ranges from 7 to 96 days.
Relevant answer
Answer
Thank you for the replies, but I am interested to see whether variograms or EOFs have been applied by someone to identify spatial patterns, what their opinion is, and what software was used.
  • asked a question related to Geostatistical Analysis
Question
7 answers
Dear RG members
While reading a paper, I came across the statement "M 6.0 Ranau earthquake dated on June 05, 2015 coupling with intense and prolonged rainfall caused several mass movements such as debris flow, deep-seated and shallow landslides in Mesilou, Sabah", and felt compelled to ask this question.
regards
IJAZ
Relevant answer
Answer
Dear Ijaz,
there is nothing unusual in the text you cited in your question.
Obviously, an earthquake cannot cause rainfall. Volcanic eruptions can (due to dust emission into the upper troposphere), but there is no relationship between an earthquake (lithosphere) and rainfall (atmosphere).
However, what the text you cited says is clear: it is talking about the relationship between earthquakes and mass movements (landslides, rockfalls, debris flows).
Prolonged rainfall events are key factors for triggering mass movements such as debris flows, rotational and sheet-on-sheet landslides. This is because in a region where hydrogeological instability exists (limestone above evaporites, schists or swelling clay, for example), water can flow within rock layers, eroding and putting excessive weight on slope terrains.
Thus, just after an earthquake, seismically induced landslides are (unfortunately) a common related hazard. In fact, just after earthquakes (while aftershocks still occur), geologists must work with firefighters or the army (emergency services) to monitor mass movements in the targeted area.
If prolonged rainfall events follow or coincide with an earthquake, it is clear that water is an aggravating factor for such mass movement events.
Best regards
Nic
  • asked a question related to Geostatistical Analysis
Question
4 answers
Interested to know about flood return period. 
Relevant answer
Answer
It is an interesting question, and I am not an expert on earthquake risk. Earthquakes tend to be dependent on the local geology, stratigraphy and plate tectonics. The location of storm severity is less dependent on surface or subsurface features and more dependent on jet stream and air circulation patterns interacting with surface features as they move over them. Flood recurrence intervals are based primarily on storm data collected or historic records that can be retrieved, and to some degree this may be the same for earthquakes. Since I am not an earthquake expert, I would hesitate to say that they are directly comparable, even though, with enough data, the tools to plot flood and earthquake frequencies and severities for a location may be the same or similar. The time series for floods tends to be more frequent and widespread within a climatic zone, while earthquake frequency and severity tend to be more localized within defined zones, less frequent for most areas, but with connected aftershocks for a period of time that are related to the initial severe earthquake event. As long as you understand the differences, the basic methods to assess risk exist, but it should be noted that averages are often applied to such data, so 50% of the data could be above and 50% below the trend lines. Even though the lines may appear impressive on plots, the confidence limits help to assess the uncertainty and should be greatly respected when severity relates to life and property damage.
  • asked a question related to Geostatistical Analysis
Question
4 answers
I am using krige.cv() for cross-validation of a data set with 1394 locations. In my code, the empirical variogram estimation, model fitting, and kriging (via krige()) all work fine. But when I use krige.cv() I get the following error.
Error: dimensions do not match: locations 2786 and data 1394
One can notice that 1394*2 = 2786. What could I be missing? Please note that there are no NA, NaN or missing values in the data, variogram or kriging results. Everything works fine; it's just krige.cv() that does not work.
Relevant answer
Answer
Dear Asad Ali,
Your question is hard to answer because you didn't mention the software used in your study. I know several geostatistical programs with differently organized ways of inputting the data and outputting the results. My first reaction to your question is that some technical problem is coming from the software itself, or from your data file structure: something is not being read correctly.
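Since krige.cv() is the gstat function in R, a minimal working leave-one-out example on the package's meuse data may help isolate the problem; the 2786 = 2 x 1394 in the error message may indicate that the coordinates are somehow being counted twice in the objects passed in:

library(sp); library(gstat)
data(meuse); coordinates(meuse) <- ~x+y
v  <- variogram(log(zinc) ~ 1, meuse)
vm <- fit.variogram(v, vgm(0.6, "Sph", 900, 0.05))
cv <- krige.cv(log(zinc) ~ 1, meuse, model = vm)   # leave-one-out by default
mean(cv$residual)                                  # should be close to 0
mean(cv$residual^2 / cv$var1.var)                  # should be close to 1
# if this runs but your own call fails, check that the coordinate columns are not also
# present as attributes in the data slot and that the formula matches the data object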
  • asked a question related to Geostatistical Analysis
Question
3 answers
I am caught in a strange situation. I have been doing some kriging using the gstat package, but the data are exhibiting a lot of bad signs. There is a strong second-order trend surface, with almost all the coefficients declared significant with three asterisks *** in lm(). Although R^2 is small, the empirical variograms are clearly different in the two cases, i.e. variogram(attribute~1) and variogram(attribute~SO(x, y)) have different sills. Furthermore, the directional variograms also show both zonal and geometric anisotropies. When I try to fit a variogram model, I see a big change between the fitted ranges and sills of the simple (no anisotropy assumed) and anisotropy-corrected variogram models. How do I deal with this analysis?
Relevant answer
Answer
First of all there is a question of terminology
1. "Trend surface" pertains to the data but ultimately it is not the data but rather the expected value of the random function (that presumably generated the data). Unfortunately without knowing the multivariate probability distribution there is no way to actually compute the expected value hence the Trend surface is an ad-hoc solution. Note that any statistical tests in lm() are based on specific statistical assumptions which actually contradict the assumptions implicit in the use of kriging.
2. Geometric anisotropy and zonal anisotropy are characteristics of the random function and its variogram (the model, not the sample variogram)
Having said all that, there is the practical matter of what to do. First, I would suggest making a plot of the data locations with each location coded by the data value, as a quick way of checking whether there is a "trend". The "trend" may have a directional effect and this could lead to thinking there is an anisotropy. Next, look at a histogram of the data. Third, ask yourself what you know about the phenomenon that generated the data, e.g. is it reasonable to expect a non-constant expected value and/or a directional effect?
lm() should be able to compute the residuals for both a first-order polynomial trend and a second-order trend. Generate the coded plots for both sets of residuals: how do they compare with each other and with the coded plot of the original data? Compute the histograms for both sets of residuals and again compare them with each other and with the histogram of the original data.
Compute and plot the sample variograms for the original data, for the residuals from the first-order polynomial, and for the residuals from the second-order polynomial.
If you can fit reasonable variogram models to any or all of these data sets (original, first order residuals, second order residuals), use the model with cross-validation and the associated data sets.
For each of the three data sets the cross validation should produce four values at each data location (the actual data or residual, the kriged value, the "error", and the kriging standard deviation). Since the kriging estimator is unbiased, the mean error should be zero, so compare that with the average error in each of the three cases. The mean squared normalized error (normalized by the kriging standard deviation) should be one, so compare that with the empirical value. Look at the fraction of normalized errors exceeding 2.5 in absolute value; theoretically this fraction should be very small (a few percent at most). Make a bivariate plot of data value vs. kriged value; the correlation should be high. Make a bivariate plot of kriged values vs. errors; these should be uncorrelated.
Did you see evidence of anisotropy in the directional sample variograms for the first and/or second order residuals?
If the cross validation results look acceptable for either the first-order residuals or the second-order residuals then you can proceed in two ways:
A. Use ordinary kriging on the residuals data set and add to the results the value of the trend surface at each location. You won't get a kriging variance this way (you can't use the estimated variance from lm()).
B. Use universal kriging with the original data set, specifying the order of the trend surface but not its coefficients. This way you will get a kriging variance at each interpolation location.
Unless you have an explanation related to the phenomenon to justify using an anisotropy, I suggest not doing it, and in particular no zonal anisotropy; gstat does not include a zonal anisotropy. As for the geometric anisotropy, there is always the problem of an insufficient number of pairs behind the plotted points in the directional sample variograms (the total number of pairs is fixed by the size of the data set, and hence the pairs are split up among the different directions).
Note that the cross validation results, as well as any final kriging results, may be sensitive to choices such as (1) whether you use a circular or elliptical search neighborhood, (2) the minimum and maximum number of data locations used in the search neighborhood, and (3) the variogram model type (exponential, Gaussian, spherical, etc.), but they are not as sensitive to the variogram parameter values.
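As a practical illustration of option B and of the cross-validation checks described above, here is a minimal sketch using the R package gstat; the object and variable names (dat, z) are placeholders, the coordinates are assumed to be named x and y, and the second-order trend is written out explicitly in the formula.
library(sp)
library(gstat)
f   <- z ~ x + y + I(x^2) + I(y^2) + I(x*y)   # second-order trend in the coordinates
vo  <- variogram(f, dat)                      # variogram of the trend residuals
vmo <- fit.variogram(vo, vgm("Sph"))
cv  <- krige.cv(f, dat, model = vmo)          # universal kriging cross validation
mean(cv$residual)                             # should be close to 0 (unbiasedness)
mean(cv$zscore^2)                             # mean squared normalized error, should be close to 1
mean(abs(cv$zscore) > 2.5)                    # fraction of large normalized errors, should be very small
cor(cv$observed, cv$var1.pred)                # data vs. kriged value, should be high
cor(cv$var1.pred, cv$residual)                # kriged value vs. error, should be near 0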
  • asked a question related to Geostatistical Analysis
Question
7 answers
Is it valid to transform ilrs using normal score, Box-Cox or other transformations in order to perform Sequential Gaussian Simulation (SGS)?
Is it valid to use the combination of isometric logratios (ilrs) with Direct Sequential Simulation without any transformation?
Thanks in advance.
Relevant answer
Answer
The ilr is itself a transformation. In my view, it is valid to use it without any further transformation.
  • asked a question related to Geostatistical Analysis
Question
4 answers
I need the steps to project a WGS1984 ASCII file to a Kertau_RSO_Malaya_Meters ASCII file. I have been trying, but it is still not projected correctly. I cannot open it in FMP; it states that the file has no projection. Seeking an answer from all experts and professors. Thank you, and I appreciate your feedback.
I have also attached my original file for reference.
Relevant answer
Answer
Thanks to all and I really appreciate for your advice and answer. =)
  • asked a question related to Geostatistical Analysis
Question
4 answers
Dear all
Thanks in advance for comments, answers, papers etc.
Regards
Ijaz
Relevant answer
Answer
Excellent discussion. Two important ways landslides can be incorporated into  studies of tectonic deformation:
  1. Strong ground shaking due to large earthquakes can cause landslides over a very large area (hundreds of km2), and age-dating is the critical piece of the puzzle, as mentioned by William Hansen above. If there is a known active fault (or faults) in the region, the timing of large seismic events as determined by paleoseismic methods can be compared to the timing of the landslides, and if the data allow for them to occur at the same time, an earthquake is a good candidate for the trigger.
  2. Landslides of all sizes can also be directly triggered by uplift associated with fault movement. For example, movement on a daylighting thrust fault can produce very large gravitational failures of the overlying hanging wall at the range front, especially where the fault decreases in dip in the shallow section (upper <1-2 km), which is very common. They also occur over the forelimb of folds formed above blind thrust faults, where fault movement steepens the forelimb and oversteepens the overlying terrain. I observed and mapped both types of earthquake-related landslides caused by the 1999 M7.6 Chi-Chi earthquake in Taiwan.
  • asked a question related to Geostatistical Analysis
Question
3 answers
Hello,
It would be really helpful if someone could suggest some sources where DMSP-OLS data with a thermal band are available.
Relevant answer
Answer
Hi Sayan, 
Hope you are doing good. 
Concerning your question, as Vijith Sir suggested, NOAA seems to be the most appropriate link to retrieve TIR data for DMSP. Additionally, you may also want to check out the attached link.
  • asked a question related to Geostatistical Analysis
Question
7 answers
I have soil contaminant samples collected at different depths/layers. I generated a contaminant surface at each depth/layer using the ArcGIS kriging tool. However, I need a vertical sense of this contaminant across layers, which I think I can achieve by interpolating the values across the different layers. As far as I know, ArcGIS can't do this, so I would be happy to know of any freeware I can use to achieve it. Many thanks.
Relevant answer
Answer
I would use GRASS GIS module v.vol.rst which interpolates vector point data to a 3D raster map using regularized spline with tension algorithm (RST).
Here is a small example in command line syntax (you can do the same in Python, GUI or modeler):
v.import input=points.shp output=points
g.region vector=points res=0.5 t=5 b=-15 tbres=0.5
v.vol.rst input=points wcolumn=values_column tension=20 smooth=0.6 segmax=400 dmin=0.5 zscale=100 elevation=result
r3.to.rast input=result output=result_slice
where:
points.shp is the name or full path of the vector file you want to import (uses GDAL/OGR)
points is the name of the vector map in GRASS GIS
values_column is the name of the attribute table column with the values to interpolate
result is the resulting 3D raster map
result_slice is the name prefix for the 2D rasters created from horizontal layers of the 3D raster
v.import imports the data into GRASS GIS database
g.region sets the computational region extent and resolution (2D and 3D) for subsequent raster calculations
v.vol.rst does the 3D interpolation
r3.to.rast does horizontal slices of the 3D raster and creates a series of 2D rasters
The result is a 3D raster which you can further analyze or visualize in GRASS GIS or you can slice it horizontally into a series of 2D rasters and analyze and visualize them (or export them using r.out.gdal).
  • asked a question related to Geostatistical Analysis
Question
10 answers
I have 5 raster layers depicting different temperature levels across a given geographic space. I need to use a common "Stretched" color ramp to show how this phenomenon varies across space, where a given color in the color scale represents the same value in all of the raster layers. Kindly see the attached sample. I need it in a "stretched" style. I use ArcGIS 10.3.1.
I have tried a couple of things which haven’t worked yet.
I made a dummy raster with values spanning the min-max of the 5 rasters. The lowest value of all the rasters is 5 and the maximum is 74, so I created a dummy raster with a minimum value of 5 and a maximum of 74. Under Layer Properties > Symbology, I symbolized the dummy with a color ramp of choice, chose "Minimum-Maximum" under "Stretch", chose "From Custom Settings (below)" under "Statistics", and saved the layer as a layer file (.lyr).
The problem is that when I import the symbology or apply the layer file, all the rasters retain the same symbology and show the same min-max values (5 and 74). I need them to show their real values, such as 34 to 58, and the colors should reflect that range within the common color scale/symbology.
How do I go about this? I need a quick way out. I am not experienced in Python or other programming languages, but with detailed steps I can also try if that's the only way.
Relevant answer
Answer
Did you try saving a layer file of the ideal raster symbology? After you create that .lyr file you can import it to the other rasters you want to match. 
  • asked a question related to Geostatistical Analysis
Question
11 answers
AIC and BIC are information-criterion methods used to assess model fit while penalizing the number of estimated parameters. As I understand it, when performing model selection, the model with the lowest AIC or BIC is preferred. In a situation I am working on, the model with the lowest AIC is not necessarily the one with the lowest BIC. Is there any reason to prefer one over the other?
Relevant answer
Answer
As Geoffrey pointed out, the BIC penalizes more heavily for complex models.  So the context certainly matters here.
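A short numeric illustration of why the two criteria can disagree (hypothetical data, base R only): AIC adds a penalty of 2k while BIC adds k*log(n), where k is the number of estimated parameters and n the sample size, so for n above about 8 the per-parameter BIC penalty is larger.
set.seed(1)
n  <- 100
x  <- runif(n)
y  <- 1 + 2 * x + rnorm(n)
m1 <- lm(y ~ x)            # simpler model
m2 <- lm(y ~ poly(x, 5))   # more complex model
AIC(m1); AIC(m2)           # penalty of 2 per extra parameter
BIC(m1); BIC(m2)           # penalty of log(n) per extra parameter
# whether the two criteria agree depends on how much the extra terms improve
# the log-likelihood relative to the 2k vs. k*log(n) penalties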
  • asked a question related to Geostatistical Analysis
Question
8 answers
Hello everybody,
I used the ArcMap Geostatistical Analyst to interpolate a surface for heavy metals. Before interpolation, I divided my data set into training and test sets. Using the training data I interpolated a surface, and now I want to determine residuals for the test data to check the precision of the interpolation. How can I determine the value at an unknown position?
Relevant answer
Answer
A few observations:
1. When you use kriging you are relying on a model for the random function (the data are a non-random sample from one realization of the random function), and that random function must satisfy several crucial assumptions, such as intrinsic stationarity and that the data can be used to estimate/model the variogram. There are no statistical tests to determine whether these assumptions are valid.
2. The kriging estimator is "exact" (sometimes also called "perfect"). This means that if you generate an interpolated value at a data location and use the data value in the interpolation the interpolated value will be the data value (the residual is always zero).
3. If you split the data locations into a training set and an interpolation set you are assuming that both are samples from the same realization of the same random function. The method of choosing the two data subsets does not ensure this assumption is satisfied. Hence this technique does not really quantify the accuracy/precision of the interpolation.
4. Cross-validation (leave one out) is really a method to compare one variogram model against another and/or one choice of the variogram parameters against another, or possibly to see the effect of the search neighborhood parameters on the interpolated values. It can also be useful for identifying "unusual" data values.
5. As an exploratory technique, splitting the data locations into two subsets may be useful but there is no real theory to back it up. A different split might produce very different results and there is no way to determine which set of results is more relevant.
As for cross validation, there are multiple statistics that can be computed, but no single one is the most important or reliable. For a discussion of these see the following:
1991, Myers,D.E., On Variogram Estimation in Proceedings of the First Inter. Conf. Stat. Comp., Cesme, Turkey, 30 Mar.-2 April 1987, Vol II, American Sciences Press, 261-281
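That said, if you still want residuals at the held-out test locations that the question asks about (keeping the caveats above in mind), a minimal sketch with the R package gstat is given below; it assumes the training and test sets are SpatialPointsDataFrames called train and test holding a variable z (all names are placeholders). ArcGIS Geostatistical Analyst offers a comparable validation workflow for a held-out subset.
library(sp)
library(gstat)
v    <- variogram(z ~ 1, train)                           # variogram from the training data only
vm   <- fit.variogram(v, vgm("Sph"))
pred <- krige(z ~ 1, train, newdata = test, model = vm)   # ordinary kriging at the test locations
res  <- test$z - pred$var1.pred                           # residuals at the held-out locations
summary(res); sqrt(mean(res^2))                           # bias and RMSE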
  • asked a question related to Geostatistical Analysis
Question
3 answers
Hi everyone, I have a point data set with 197 different coordinates. I am selecting 25% to be used for training. When I run Maxent with certain environmental layers, it only uses 8 presence records for training and 2 for testing. Any ideas why this is?
Relevant answer
Answer
Thanks for the suggestions guys. I discovered it was a TWI layer I produced that was causing the problem. It was in a floating point pixel format which I have changed to signed 32 bit, the same as my other layers, and this seems to have done the trick.
  • asked a question related to Geostatistical Analysis
Question
8 answers
I want to compare blending data using regression kriging and Bayesian kriging. What is the advantage of Bayesian kriging compared to regression kriging? Does anyone have a recommended link/journal for learning Bayesian kriging? Many thanks in advance.
Relevant answer
Answer
You should also consider cokriging. Exactly what do you mean by "more powerful"? I.e., how would you quantify "power"?
Regression kriging usually means combining regression and kriging in some way, in particular where the expected value of the random function is non-constant, e.g. it is a polynomial function of the position coordinates or a function (usually a low-degree polynomial) of another spatially distributed variable, such as elevation in the case of precipitation.
Precipitation data are nearly always "point" data whereas satellite data are non-point data (there is a pixel size), so you must compensate for the difference in data support. Regression doesn't do that and, in general, Bayesian modeling doesn't either. I suggest you look more at the literature on combining rain gauge data with Doppler radar rainfall data, which is more analogous to your problem. You will see that cokriging has most often been used.
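For completeness, a minimal cokriging sketch in the R package gstat is given below; the object and variable names (gauges, satcells, precip, sat, grid) are placeholders, and block/support corrections are not shown.
library(gstat)
g  <- gstat(NULL, id = "precip", formula = precip ~ 1, data = gauges)    # primary (point) rain data
g  <- gstat(g,    id = "sat",    formula = sat ~ 1,    data = satcells)  # secondary variable
vg <- variogram(g)                      # direct and cross variograms
g  <- fit.lmc(vg, g, vgm("Sph"))        # linear model of coregionalization
ck <- predict(g, newdata = grid)        # cokriged predictions of both variables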
  • asked a question related to Geostatistical Analysis
Question
5 answers
I'm modeling labor productivity on farm sites spread widely across the US, and I would like to include NDVI (or another vegetation index) as a predictor variable. I'm wondering if there is a month during the growing season that makes the most sense for comparing across disparate climate types. Thank you for any suggestions!
Relevant answer
Answer
Hi Rafter,
One tool to detect onset and other parameters of vegetation seasonality, as suggested by Wietske, is TimeSat: http://web.nateko.lu.se/timesat/timesat.asp
Since in your study you are using isolated farm sites, it is probably better and simpler to use an available map of vegetation phenology (e.g. http://onlinelibrary.wiley.com/doi/10.1029/2006JG000217/epdf) to choose the month when NDVI should be compared (for example the one corresponding to the green peak), instead of collecting the NDVI data and running TimeSat or other software.
All the best,
João
  • asked a question related to Geostatistical Analysis
Question
19 answers
I am modeling undrained behavior of clays/drained behavior of sands under various static loading cases. I understand that the geostatic step is utilized in general analysis of soils. I was wondering if one can replace this with static, general case under gravity loading. I am not interested in calculating pore pressure and the soil is homogeneous in nature.
Relevant answer
Answer
Elaheh, first let me try to explain how the initial stresses are specified. The predefined stresses option allows you to define the stress state of any object at the initial time step. You can use it in one of two ways, namely the initial stress option or the geostatic stress option. Using the initial stress option you can specify the 3 principal and 3 shear stresses. For most common geotechnical problems we do not need this option and we stick to geostatic stresses. ABAQUS requires the specification of vertical stresses at 2 points, which it then uses to calculate the stresses at all nodes in between by linear interpolation. For example, assume a column of soil 1 m high below the ground level; the vertical stress at the top of the soil is 0 at vertical coordinate 0, and the stress at vertical coordinate 1 m is equal to -density x gravity x height (here 1 m), since the sign convention requires compressive stresses to be negative. The K0 value is specified using lateral coefficient 1; if the horizontal stress is asymmetric, you specify the other value in lateral coefficient 2. As for modeling your wellbore problem, I am not sure about the nature of the simulation. If you can give a few more details, we can look at solving it. I hope this helps!
  • asked a question related to Geostatistical Analysis
Question
8 answers
Hello all.
I have a data set with coordinates of locations where some fishes have been collected.
I want to be able to estimate the distances between each of those locations, but following the river path. I know how to estimate Euclidean distances, but they are computed as straight lines connecting each pair of locations, ignoring the river.
Is there any program, R function or QGIS plugin that could calculate such distances? I have a shapefile for the river basin in question, which I can convert to a lines file in QGIS, but I can't get estimates of the distances between these locations along the river.
Thanks,
JP
Relevant answer
Answer
This is known as linear referencing [1]. In ArcGIS, you can use the linear referencing tools - please refer to the tutorial here [2]. For QGIS, you may like to have a look at this plugin [3] from Faunalia, which works on a PostGIS base (the plugin expects all data for processing to be available in a PostGIS database).
Main steps in ArcGIS to understand the workflow:
  • First you create a "route" (= your river, a PolylineM object). It must be clean, oriented in the right direction and not multi-part to work properly. You must have a unique ID per river part if necessary.
  • With your route feature created with Create Routes, you "Locate Features Along Routes", which will ultimately give you a kilometric value from the start to the end of the river for each of your points (fishes). With this kilometric value, you can then calculate the distance between them following the river.
Hope this will help.
______
[1] Wikipedia : "Linear referencing (also called linear reference system or linear referencing system or LRS), is a method of spatial referencing, in which the locations of features are described in terms of measurements along a linear element, from a defined starting point, for example a milestone along a road. Each feature is located by either a point (e.g. a signpost) or a line (e.g. a no-passing zone). The system is designed so that if a segment of a route is changed, only those milepoints on the changed segment need to be updated. Linear referencing is suitable for management of data related to linear features like roads, railways, oil and gas transmission pipelines, power and data transmission lines, and rivers."
  • asked a question related to Geostatistical Analysis
Question
12 answers
I plotted a variogram for a dataset which contains 83 observation sites. The maximum distance between the observation points is 60 km, but the fitted range of the variogram is 261 km. What does this mean?
[I thought that a range of 261 km means kriging can predict well within a radius of 261 km; probably my concept is wrong.]
Relevant answer
Answer
Jennifer is right: if your software permits it, look at the numbers of pairs used to compute the sample variogram for different lag distances, and you will see that the numbers begin to decrease for lag distances that exceed about half the maximum distance between data locations. You may want to consider whether there are other hidden problems in the software if it produces clearly wrong results like this. It is certainly possible that the range is actually greater than the maximum distance, but you can't determine that from the sample variogram. There are multiple possible algorithms for auto-fitting the variogram parameters, but no one choice is best for all data sets. Also, if your software permits it, look at the kriging weights. In general you don't want any negative weights; those would indicate too large a search neighborhood and/or too large a range.
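If the software in question is the R package gstat (an assumption; the question does not say), a minimal sketch of restricting the variogram to about half the maximum inter-point distance and inspecting the pair counts could look like this; dat and z are placeholder names and the cutoff assumes coordinates in metres.
library(gstat)
v <- variogram(z ~ 1, dat, cutoff = 30000)   # ~half of the 60 km maximum distance
print(v)                                     # the np column lists the number of pairs per lag
plot(v)
vm <- fit.variogram(v, vgm("Sph"))           # refit the model using only lags up to the cutoff
vm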
  • asked a question related to Geostatistical Analysis
Question
8 answers
I want to know if there is an equation to compute the minimum number of samples to take in the field.
Relevant answer
Answer
There is no simple answer because a lot depends on the specific geographic region you are studying. It also depends on the spatial pattern of the data locations. If you have not already consulted them, see papers in the European J. of Soil Science (formerly the J. of Soil Science) and the Soil Science Society of America Journal.
As suggested above, if you really don't know anything about the spatial distribution of the particular soil parameters of interest, you are going to have to do some preliminary data collection and use those results to guide further sampling. Also as suggested above, data collection for estimating and modeling the variogram is not the same as data collection for subsequent kriging. The cost of sampling will always have some impact; if there were no cost or difficulty in getting data you could simply collect a couple of thousand samples, but that is usually not the situation.
Once you have some data, you will want to compute various exploratory statistics (average, standard deviation, histogram, a plot of the data locations coded by the data values). Do you have any information on the soil type(s)? Are they the same over the entire region of interest or different? If you are using an auger to collect soil samples for lab analysis, note the size of the auger (depth, diameter).
1987, A. Warrick and D.E. Myers, Optimization of Sampling Locations for Variogram Calculations Water Resources Research 23, 496-500
1990, R. Zhang, A. Warrick and D.E. Myers, Variance as a function of sample support size Mathematical Geology 22, 107-121
R. Webster and some of his students had a series of four papers in the J. of Soil Science, circa 1980.
  • asked a question related to Geostatistical Analysis
Question
3 answers
Hi, I'm looking for Aphonopelma chalcodes distribution data for use in a GIS; I need the data in GeoTIFF or shapefile format.
Thanks for your help.
Relevant answer
Answer
The recent revision of the genus Aphonopelma will give you most of the data you need, certainly the best data available before you seek to add records and distributional data for any of the species.
  • asked a question related to Geostatistical Analysis
Question
4 answers
Can anyone please tell me the use of the Normalized Rank in the Analytical Hierarchy Process as used in ArcGIS? Is it used to scale all the maps into one, since it takes values from 0 to 1 for all the features?
Relevant answer
Answer
Yes, the scale will be the same for all the features, as in the second-to-last slide of the presentation I attached to my previous answer. Consider the fields in the initial three maps (View, Slope and Price) as consisting of pixels of the same value. They are standardised, to change the rankings into probabilities, and multiplied by the weight calculated in the previous slide, effectively turning each map into a weighted factor in the final decision. When they are added together to form the decision map, the units are still probabilities of the suitability of each field, because the weights add up to one.
What was done with discrete units in slide 6 is now done with fields of pixels on different feature maps in slide 7. For your laptop example, you will have to use the approach in slide 6.
There your factors will be Price, Design, RAM, Colour, Screen, Weight, Webcam, Hard disk and Size. So you have to decide which of the factors counts the most, the second most, ..., the eighth most and the least, effectively ranking the factors from 1 to 9. The factor ranks are then standardised to have values between zero and one, where 1 is the highest and 0 the lowest, and normalised by dividing each by the sum of all the ranks. These probabilities should add up to one and they are the numbers in blue under the attributes in slide 6.
Each factor must then be rated against each laptop producer in the same way and these probabilities, which add up to one for each factor over the three products, become the numbers in blue on the lines in the last section of slide 6.
The final probabilities are calculated in the way illustrated by the calculation on the far right of slide 6.
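A tiny numeric sketch of the rank normalisation described above (in R, with hypothetical factor names for the laptop example):
ranks   <- c(Price = 9, RAM = 8, Screen = 7, HardDisk = 6, Weight = 5,
             Design = 4, Size = 3, Webcam = 2, Colour = 1)   # 9 = most important, 1 = least
weights <- ranks / sum(ranks)   # normalise by the sum of the ranks
round(weights, 3)               # weights between 0 and 1
sum(weights)                    # adds up to 1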
  • asked a question related to Geostatistical Analysis
Question
7 answers
The nugget effect is the value of the variogram when the lag distance is equal to zero, and according to the literature it depends on measurement error. I don't know if anyone can help me clarify this point.
Thank you, Louadj Yacine
Relevant answer
Answer
A large nugget effect relative to total variability can be caused by sampling that is too sparse with respect to spatial variability, or because of measurement error.  In either case it is small-scale variability. 
Generally you fit a nugget effect by eye from the variogram.  The first few points--two or more--are the most instructive.  However, you can also use your experience as a guide in fitting the nugget effect. 
If you are doing variography as a prelude to kriging or simulation the nugget effect  affects the degree of smoothing in the results. 
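If the variography is being done in the R package gstat (an assumption; the question does not name a tool), a nugget component is simply included in the fitted model; the starting values below are placeholders to be judged from the first few lags, as suggested above.
library(gstat)
v  <- variogram(z ~ 1, dat)
vm <- fit.variogram(v, vgm(psill = 1.0, model = "Sph", range = 500, nugget = 0.2))
plot(v, model = vm)   # check the fitted nugget against the behaviour near the origin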
  • asked a question related to Geostatistical Analysis
Question
4 answers
Hi everyone
I want to add the coordinates of several points as a post map in Surfer. The problem is that after importing the data they are not in a straight line; they are shown in a zigzag formation. How can I fix the problem? The points are the coordinates of ship movement, so they should be shown in a straight line.
Thanks
Relevant answer
Answer
Thank you, Christian Günther. It's very good.
  • asked a question related to Geostatistical Analysis
Question
3 answers
Dear collegues,
I am performing a geostatic procedure for modelling a tunnel in clay soil.
ABAQUS is used.
The soil model is Cam-Clay.
Initial stresses in the soil are given. 
Material data is fully specified. 
Nevertheless, the program displays an error message:
The sum of initial pressue and elastic tensile strength for the porous elastic material in element 1 instance ground1-1 must be positive.
Relevant answer
Answer
It seems that you use the *INITIAL CONDITIONS, TYPE=PORE PRESSURE option in your model. You have to reduce the initial pore pressure imposed before the start of the analysis. If its value looks reasonable, though, check the consistency of your units.
George
  • asked a question related to Geostatistical Analysis
Question
5 answers
I downloaded MODIS Level 2 ocean colour images and displayed them in SeaDAS version 7.2. My area of interest is Lake Victoria. However, I am facing a problem with the areas affected by cloud cover, which causes no data to be available for those parts. Is there a way I can extrapolate using the available data in order to have data in the areas affected by cloud cover? Also, is there any other software I can use to perform that function?
Relevant answer
Answer
There are several options depending on what you want to do and the spatial and temporal size of the gaps in the data. I've seen that for small gaps people often just use spatial filters or linear interpolation, for example. You could also empirically set up something that interpolates both in space and time.
In my case, over the Argentine Sea (using MODIS L3, 8-day composites, 4.6 km), I had huge regions with no data for 3 months every year. I used HANTS (Harmonic ANalysis of Time Series, available as an add-on in GRASS GIS 7) and DINEOF (available through the package sinkr in R). For my case, DINEOF worked much better :)
  • asked a question related to Geostatistical Analysis
Question
4 answers
I am trying to find any reference addressing the following question:
Suppose we have 4 arrays (with the same probes on each).
First and second are hybridized with sample 1
Third and fourth are hybridized with sample 2
Then we want to compare the signal from the probes between samples.
As I understand it, we must carry out the RMA procedure for each array to correct the background, then construct the empirical signal distribution from ALL 4 ARRAYS and perform quantile normalization. So our input matrix for QN will consist of 4 columns.
But colleagues say that we must perform QN independently for
arrays one and two, and for arrays three and four.
Who is right?
Thanks in advance.
Relevant answer
Answer
Do you have any housekeeping genes which you can use to normalize for total RNA quantity? The problem with thematic arrays is that you have two sources of variation: (a) technical, due to differences in the extracted/amplified RNA quantity, and (b) due to the process you target. It is not clear how to distinguish between them without additional information.
I would also try something that perturbs the data less, for example simple linear centring/scaling. Have a look at the distributions of the original and normalized data (of course on a log scale) and identify the problems you would like to remove by normalization.
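A minimal R sketch of the simple log-scale centring/scaling suggested above (expr_mat is a placeholder matrix of background-corrected signals, one column per array):
log_expr <- log2(expr_mat)
scaled   <- scale(log_expr, center = TRUE, scale = TRUE)   # per-array (column) centring and scaling
boxplot(as.data.frame(log_expr))                           # distributions before...
boxplot(as.data.frame(scaled))                             # ...and after, to judge what normalization removes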
  • asked a question related to Geostatistical Analysis
Question
3 answers
When I keep the substrate width and the copper width the same and apply a master-slave boundary condition, the simulation reports 4 topological errors. I am facing this problem while reproducing a published article. How can I solve this issue? I am attaching the HFSS 14.0 file. Thanks in advance to all who reply.
Relevant answer
Answer
Thanks sir (Mesrar Haytam) for your reply.
  • asked a question related to Geostatistical Analysis
Question
2 answers
Hello fellow researchers. I am currently working on a research project on cokriging. Can anybody assist me with the construction of cross variograms in a non-collocated setup?
Relevant answer
Answer
See my 1982 paper in Math Geology (available on ResearchGate for downloading). There are two questions: one pertains to the experimental cross variogram and the other to a model cross variogram. Unlike the variogram, for which there are many known valid models (variograms must be conditionally negative definite), there are no known necessary and sufficient conditions for a cross variogram. In particular, cross variograms do not have to look like variograms, although they do have to satisfy the Cauchy-Schwarz inequality, i.e. the absolute value of the cross variogram must be less than or equal to the square root of the product of the associated variograms.
If you find any software purporting to do general cokriging, it is likely that it uses a Linear Coregionalization Model (LCM), e.g. see the R package gstat.
Note that there are two possibilities for a cross variogram: symmetric (the most commonly used) and non-symmetric. The LCM is based on a symmetric model. Alternatively, see my 1990 paper in Math Geology on pseudo cross variograms (available for download on ResearchGate).
Cokriging is often used because there is less data for one variable than for another, i.e. more data locations with values for one variable than for the other. The extreme case is when there are no data locations with data for both variables; the symmetric cross variogram is difficult to use in that case.
Collocated cokriging is just a special case of the general form of cokriging except that the cross variogram only has to be known for lag zero, i.e. a model for the cross variogram is not necessary.
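For reference, the LCM route mentioned above looks roughly like this in the R package gstat; it presumes enough locations where both variables are measured (so it does not solve the fully non-collocated case discussed above), and the names d1, d2, z1 and z2 are placeholders.
library(gstat)
g  <- gstat(NULL, id = "z1", formula = z1 ~ 1, data = d1)
g  <- gstat(g,    id = "z2", formula = z2 ~ 1, data = d2)
vg <- variogram(g)                    # direct variograms plus the experimental cross variogram
g  <- fit.lmc(vg, g, vgm("Sph"))      # symmetric Linear Coregionalization Model
plot(vg, model = g$model)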
  • asked a question related to Geostatistical Analysis
Question
7 answers
In landslide susceptibility analyses, we use geological and topographical parameters. How can we identify multicollinearity among these factors, so as to know which factor contributes most and which contributes least?
Relevant answer
Answer
Hi Ramesh,
I believe you are using the frequency ratio model for the susceptibility mapping. In that case you need to run an omission and commission analysis to detect which variable is more capable of predicting landslide susceptibility.
You need to make LSZ maps with different combinations of variables and validate each LSZ map to decide which combination is best.
The frequency ratio itself shows the strength of each feature class in making the terrain susceptible to landslides.
Another important method is to use MLR (Multiple Logistic Regression) for the analysis, which returns the real role of each parameter (this is useful in most cases for detecting multicollinearity among the variables).
Hope this will give an idea about the process.
Good luck.
Vijith
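A common, model-agnostic check for multicollinearity (not specific to any particular susceptibility package) is to fit a logistic regression on the landslide inventory and inspect variance inflation factors; a minimal R sketch is below, with all variable and object names hypothetical.
library(car)   # provides vif()
m <- glm(landslide ~ slope + aspect + lithology + rainfall + ndvi,
         data = inventory, family = binomial)
vif(m)         # values above roughly 5-10 are usually taken to flag problematic collinearity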
  • asked a question related to Geostatistical Analysis
Question
6 answers
I mean, is it possible to compute a pseudo-distance between points even though it does not have a strictly geographical interpretation?
Relevant answer
Answer
Hi Zasina,
Kriging is widely used in Engineering, for example in Aerospace design.
In this case the Kriging estimator is also known as a 'meta-model', 'surrogate model', or 'response surface'.
The pseudo distance, as you indicate, is not the geographical distance, but the 'distance' between various designs. So for example two wings might be 'close', that is similar in shape, and we might therefore expect similar aerodynamical performance. Wings that are 'more distant', that is very different in shape, can be expected to have very different aerodynamical performance.
If you want, I could dig up some references.
Cheers,
Jouke
  • asked a question related to Geostatistical Analysis
Question
3 answers
I have data representing the carbon stocks of my study area. The study area is further subdivided into many smaller areas, and for these areas the carbon stock, NDVI and area (sq. km) are measured for the years 2004-2013.
It is believed that the larger the area, the higher the average carbon stock. However, this is not true:
the larger the area and the higher the average NDVI, the higher the average carbon stock.