Species Distribution Modeling - Science topic

Questions related to Species Distribution Modeling
  • asked a question related to Species Distribution Modeling
Question
4 answers
Please also suggest some good articles for reference.
Relevant answer
Answer
Yes, SDMs can be used to model habitat for both fauna and flora.
You need environmental variables relevant to the species whose suitable habitat you want to predict, along with presence/absence data, a DEM, land use, etc.
  • asked a question related to Species Distribution Modeling
Question
5 answers
Hello Experts,
I am trying to run an SDM for present and future conditions using Maxent, but the two model outputs are not identical in shape, and a block of pixels is missing near the top of the study area. Note that I set my study-area shapefile in the 'processing extent' and 'raster analysis' sections while extracting the layers. All the variables were processed at the same extent and cell size for the Maxent analysis.
Here I attached my study area, present, and future distribution for your convenience.
What could be the possible solution?
Thank you.
Relevant answer
Answer
Dear Mokshedur,
After preparing the input data for MaxEnt, you should clip the layers again to the study-area boundary (Extract by Mask).
  • asked a question related to Species Distribution Modeling
Question
4 answers
I am familiar with the 'sdm' package for constructing species distribution models (SDMs). Now I am facing an issue.
I used predict() from 'dismo' to predict the distribution of a species a few weeks ago, and it ran smoothly without taking much time, hardly 3-5 minutes. Now it has been running since morning (8+ hours) on the same data, and I am still waiting for the result. I have to prepare SDMs for 10 different species. If a single model takes this long to predict, I will have to wait many days, which is frustrating.
How can I fix it? Can anyone share their thoughts on this, or is there an alternative that saves time?
Thanks
PS: PC configuration: 8 GB RAM, AMD Processor
Relevant answer
Answer
You're welcome.
  • asked a question related to Species Distribution Modeling
Question
1 answer
I want to download BioClim data in the generic grid format (.bil) to use in species distribution modeling.
Relevant answer
Answer
Hi Fedhasa, did you manage to do it? I have the same problem.
  • asked a question related to Species Distribution Modeling
Question
3 answers
I'm working with a multiband raster and I want to extract each band to a single raster band. I tried two approaches using R (raster) and QGIS (gdal translate).
I noticed that the output file from QGIS is around 25 MB, while the output file from R is around 2 MB. The original multiband raster is around 490 MB with 19 bands. This led me to think that the QGIS output is more reasonable to use. Note that I will use the bands for SDM.
Is the R output still useable for this purpose? Can you also explain the difference in file sizes?
Relevant answer
Answer
This may happen because the bit depth can change when new images are created or existing images are exported, which changes the amount of disk space the new image requires. The size depends on the number of bits required to store each individual pixel (cell) in a raster image: 8-bit images need 1 byte (8 bits) per pixel, 16-bit images need 2 bytes, 32-bit images need 4 bytes, and so on.
An easy way to estimate the approximate size of an image is the formula:
Rows x Columns x Number of bands x Pixel depth in bytes (8 bits = 1 byte)
For example:
8-bit image: 100 rows x 100 columns x 3 bands x 1 byte = an output raster of approximately 30,000 bytes.
32-bit image: 1000 rows x 1000 columns x 1 band x 4 bytes = an output raster of approximately 4,000,000 bytes.
(Source: Esri Support)
Please check the bit depth of both of your output files.
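The size formula quoted in this answer can be checked with a few lines of Python. This is only a sketch of the arithmetic for uncompressed data; the function name is made up for illustration, and real files may be smaller when a tool applies compression on export.

```python
def raster_size_bytes(rows, cols, bands, bit_depth):
    """Approximate uncompressed size: rows x columns x bands x bytes per pixel."""
    bytes_per_pixel = bit_depth // 8  # 8 bits = 1 byte
    return rows * cols * bands * bytes_per_pixel


# The two worked examples from the answer above.
print(raster_size_bytes(100, 100, 3, 8))     # 30000
print(raster_size_bytes(1000, 1000, 1, 32))  # 4000000
```

Comparing the estimate for each output file against its actual size on disk is a quick way to see whether bit depth alone explains the gap.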
  • asked a question related to Species Distribution Modeling
Question
3 answers
I have no idea, because I'm not sure what kind of correlation it is (i.e. Pearson or something else).
Thank you
Relevant answer
Answer
The correlation matrix only shows the correlation between each pair of variables, i.e. it is a pairwise comparison. The best way to check for multicollinearity is the variance inflation factor (VIF), which provides an index of each variable's dependence on all the other explanatory variables. The R package car has a function, vif(), that can do the job.
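To make the difference between pairwise correlation and VIF concrete, here is a minimal Python sketch. It is not the car::vif() function mentioned above; it relies on the standard identity that the VIF of variable j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The matrix values and function names are invented for illustration.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlation(corr):
    """VIF of each variable = diagonal of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


# Hypothetical correlation matrix: variables 1 and 2 correlate at 0.9,
# variable 3 is uncorrelated with both.
corr = [[1.0, 0.9, 0.0],
        [0.9, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
print(vif_from_correlation(corr))
```

For two variables correlated at r, the VIF reduces to 1 / (1 - r^2), so the first two values here are about 5.26 and the third is 1.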
  • asked a question related to Species Distribution Modeling
Question
5 answers
I am currently using Hmsc package to apply JSDMs to freshwater fish and mussel communities. Most of all, I want to study the correlations between the species. Is it correct to identify a correlation as positive or negative with at least 85% posterior probability? Or should I increase this to 90/95%?
Relevant answer
Answer
What you call the 'posterior probability' corresponds to the confidence level of the credible interval (the Bayesian counterpart of the confidence interval; I won't dig into it, but see e.g. https://towardsdatascience.com/statistics-101-credible-vs-confidence-interval-af7b7e8fdd79) of the correlation term. Depending on the confidence level you set, you obtain a given credible interval, which either overlaps zero or does not. If it overlaps zero, then the parameter is not statistically significant at that confidence level (as in your case at 95%). Of course, when you choose a less conservative confidence level (e.g. 85% in your case), the credible intervals are narrower, and you will find more significant correlations.
As such, there is no 'correct' or 'incorrect' choice, just a more or less conservative confidence level that you are willing to accept.
Concerning the ecological interpretation of your results, see as suggested above.
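One way to see what a support threshold of 85%, 90% or 95% means in practice is to compute the share of posterior samples on each side of zero. The sketch below assumes you have already extracted the posterior samples of a single species-to-species correlation as a plain list; the sample values and function name are invented for illustration and do not come from Hmsc itself.

```python
def sign_support(samples):
    """Return the fraction of posterior samples above zero and below zero."""
    n = len(samples)
    p_positive = sum(1 for s in samples if s > 0) / n
    p_negative = sum(1 for s in samples if s < 0) / n
    return p_positive, p_negative


# Hypothetical posterior samples of one correlation: 100 values from -0.10 to 0.89.
samples = [i / 100 for i in range(-10, 90)]
p_pos, p_neg = sign_support(samples)

for threshold in (0.85, 0.90, 0.95):
    verdict = "positive" if p_pos >= threshold else "not supported"
    print(f"threshold {threshold:.2f}: P(>0) = {p_pos:.2f} -> {verdict}")
```

With these made-up samples the association counts as positive at the 85% threshold but not at 90% or 95%, which is exactly the trade-off between a lenient and a conservative level discussed above.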
  • asked a question related to Species Distribution Modeling
Question
3 answers
I am trying to find out the earthworm species diversity in different states of India. Does anyone have any data or ideas regarding earthworm species diversity in India?
Relevant answer
Answer
I think the state of Kerala has the highest diversity of earthworms in India.
  • asked a question related to Species Distribution Modeling
Question
4 answers
Hello Experts,
Since our target species is found only within a 2 km region of the study site, we are planning to use climate data at 30 m spatial resolution in our species distribution model. The problem is that my local weather station can only provide data at 20 km resolution, and WorldClim data are only available at about 1 km.
My questions are:
1. Can I use downscaled data (from 1 km or 20 km) in my local SDM study, which will be at 30 m resolution?
2. If I downscale, will it alter the variation in the climate data? Is it acceptable to do so?
Please note that I'm new to this field.
Thank you for your valuable time.
  • asked a question related to Species Distribution Modeling
Question
5 answers
Hello Experts,
My study site is relatively small and the targeted species is found as continuous patches. Do I need to consider Patch size/area in the MaxEnt model?
Does patch size have any meaningful measurable values that can be included in the MaxEnt model?
Thank you.
Relevant answer
Answer
The patch size can not be measured and given a value in the MaxEnt model. However, we can improve the representativeness of the sample sites within the MaxEnt algorithm.
  • asked a question related to Species Distribution Modeling
Question
4 answers
Currently, CHELSA offers 5 models of future projections. How do I choose the best 3 of them? Are there any parameters that should be preferred when performing SDM in MAXENT (bioclim data)?
There are two parameters that I think may affect the reliability of MAXENT output. ECS (equilibrium climate sensitivity) or TCR (transient climate response). But I am not completely sure about it.
Any kind of help and suggestion would be greatly appreciated.
Relevant answer
Answer
Hi Berika, I'd recommend using all 5, as they are already the result of a selection process. But if you want to select only 3, then this document (sections 6 and 7) will be useful: https://www.isimip.org/documents/413/ISIMIP3b_bias_adjustment_fact_sheet_Gnsz7CO.pdf
  • asked a question related to Species Distribution Modeling
Question
2 answers
I am trying to choose models for my Species Distribution Analyses. I know that according to CMIP6, up to 100 models should be released. However, I found only 68 in Meehl et al. 2020. Moreover, there are only five models available to download from the Chelsa database. I wonder how I can prefer one model to another? According to ECS (equilibrium climate sensitivity) or TCR (transient climate response)? If yes, which one is crucial in choosing the model?
Relevant answer
Answer
R. T. Corlett Thank you for your answer!
  • asked a question related to Species Distribution Modeling
Question
5 answers
I am currently creating a species distribution model with Maxent in R. The problem I am facing is that the habitat suitability prediction is different every time. When I run the model twice with the same configuration, I get two different maps. That is because I use random background points (there is no sampling bias in my data, so there is no need to correct for it), and these are of course sampled randomly each time.
So how should I handle that? Should I create ten maps and average the habitat suitability values? Or should I evaluate the model each time and choose the one with the highest evaluation metric score?
If someone has literature on this, that would be good as well.
Relevant answer
Answer
The software draws new random background points on every run, so it is normal to get different maps.
(You are predicting the habitat suitability, not creating it.)
To get the best result, you can adjust other options in the software, such as replicates, threshold, etc.
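The two ideas raised in this thread, making the random background sample repeatable and averaging several replicate maps, can be sketched in plain Python. This is an illustration of the principle only, not Maxent code: the cell list, the toy suitability values and the function names are all assumptions.

```python
import random


def sample_background(cells, n, seed):
    """Draw n background cells; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(cells, n)


def average_maps(maps):
    """Cell-by-cell mean of several replicate suitability maps (flattened lists)."""
    return [sum(values) / len(maps) for values in zip(*maps)]


# Hypothetical study area of 100 cells, identified by index.
cells = list(range(100))

# Same seed, same background points, hence the same model input every time.
print(sample_background(cells, 5, seed=42) == sample_background(cells, 5, seed=42))

# Toy replicate maps for three cells, averaged into one consensus map.
replicates = [[0.2, 0.4, 0.9],
              [0.4, 0.8, 0.7]]
print(average_maps(replicates))
```

Fixing the seed gives a reproducible single run, while averaging replicates drawn with different seeds also shows how much the prediction depends on the background sample.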
  • asked a question related to Species Distribution Modeling
Question
25 answers
Hi Guys!
I usually download future climate data from Worldclim.org.
Their website says that "Data at 30-seconds spatial resolution is expected to be available by the end of March 2020", however, this has not materialized . . . https://www.worldclim.org/data/cmip6/cmip6climate.html
Does anyone know of alternative sources to download future data at this (1km) resolution?
Many thanks!
Joshua
Relevant answer
Answer
Hi Guys
Update: WorldClim has updated their website with new CMIP6 30 arc-second variables.
We have waited long enough.
Enjoy!
  • asked a question related to Species Distribution Modeling
Question
4 answers
Hello Experts,
We are starting predictive modelling of an invasive plant species using MaxEnt. The species occurs as a patch over the study area. I am new to this model and have limited knowledge of it. I have reviewed several papers in which only point locations of present occurrences were used.
Since my target species occurs as a patch, How can I take the polygonal area of the species where it occurs, instead of point location data?
Or are there any other methods to cover the whole patch of the species into SDM?
Relevant answer
Answer
No, only occurrence points. Keep in mind that bioclimatic or environmental variables are the ones that could potentially represent the species, they are not always the traditional WorldClim ones. To do this you must study the behavior of the species!
I hope I have been useful to you.
  • asked a question related to Species Distribution Modeling
Question
3 answers
Hi!
I used MaxEnt to carry out SDM for 16 species across 26 environmental variables. I did 10 replicates for most species and more replicates for some due to the small sample size. To show my results I basically replicated the plots that MaxEnt makes, using the text files to draw the response curves, but with no confidence intervals so my figure wouldn't be too overcrowded. Now I want to make the same figures but with the confidence intervals to put in the appendix, but I can't find the data to draw the confidence intervals. Is it possible to obtain this data somehow, or do I have to calculate the confidence intervals myself if I want to make my own figures? Has anyone been in the same situation before?
Thank you :)
Relevant answer
Answer
In case anyone has the same issue, I wanted to share how I resolved it in the end.
Outammassine Abdelkrim thank you for the input, but unfortunately your answer was not what I was looking for - I was referring specifically to the standard deviation for individual parameters rather than the whole replicate.
So I did what I suspected I'd have to do at the start - I used the individual replicates' data (for me the file names went like "species_modelnumber_parameter_only.dat"), put them all together (for each species and each parameter) so I could calculate the standard deviation myself, and then was able to plot it. It worked out great, though it was a fair bit of code to go through.
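The calculation described here, a mean and standard deviation per point of the response curve across replicates, is short once the replicate curves are in memory. The sketch below assumes each replicate has already been read into a list of y-values on a shared x-grid; it does not parse MaxEnt's .dat files, and the numbers are invented.

```python
from statistics import mean, stdev


def summarize_replicates(curves):
    """For each x position, return (mean, sample standard deviation) across replicates.

    curves: list of replicate response curves, each a list of y-values
    evaluated on the same x-grid.
    """
    return [(mean(ys), stdev(ys)) for ys in zip(*curves)]


# Three hypothetical replicates of one variable's response curve, two x positions.
curves = [[0.1, 0.5],
          [0.3, 0.5],
          [0.2, 0.5]]
for m, s in summarize_replicates(curves):
    print(f"mean = {m:.2f}, sd = {s:.2f}, band = [{m - s:.2f}, {m + s:.2f}]")
```

The mean plus and minus one standard deviation gives a band that can be drawn around the average curve in the appendix figures.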
  • asked a question related to Species Distribution Modeling
Question
4 answers
Hi everyone,
I'm trying to run a species distribution model with the dismo package in R, and I would like to get better response curves for my variables.
This is what I've done, but the resulting curves are rather rough.
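library(dismo)
me <- maxent(variab, ab)  # variab: predictor rasters, ab: presence points
response(me)              # plot the response curves of the fitted model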
How can I improve the result? For example getting smoothed response curves?
Thank you
Relevant answer
Answer
Are your response curves jagged and unnatural-looking? That could indicate the model is overfitting your training data. Assuming your data are fine, try limiting the number of feature classes in your model by excluding the hinge and threshold feature classes. Maxent by default uses linear, quadratic, product, hinge, and threshold feature classes. However, the hinge and threshold feature classes have a tendency to overfit some datasets, resulting in jagged-looking response curves and this may be the culprit here. I'm not familiar with dismo, but a quick web search suggests this may be as simple as: maxent(x = x, p = p, args=prepPara(userfeatures="LQP")). Where L=linear, Q=Quadratic, H=Hinge, P=Product, and T=Threshold.
  • asked a question related to Species Distribution Modeling
Question
12 answers
Hello! I am currently doing GARP species distribution models. However, I can't find a clear paper about the model's assumptions on what the input data should be or how should they behave.
For example, is it necessary that environmental data are normally distributed? Or that data should follow a certain function (e.g. logistic or linear)?
Thank you for your answers!
  • asked a question related to Species Distribution Modeling
Question
13 answers
According to the discussions we have had with colleagues on this subject and our own inquiries, another requirement has emerged: there is a lack of standardization in the numbers used in the world's herbaria and given as plant type codes. For example, a plant sample of a species collected in Turkey and stored in the Geneva (G) herbarium has a different code in other herbaria. For this reason, specimens should be presented with country-of-origin codes added to the herbarium codes, or with some other digitising and coding system. In this way, the origin is indicated and the collected plants can even be classified. What do you think about it?
"TUR-G 125" instead "G 125"
Country codes are given below:
Relevant answer
Answer
The idea of standardising herbarium numbers is ill advised. Apart from the fact that it may create uncalled-for additional work if implemented retroactively, it may hamper the purpose of herbarium numbers which, usually, are accession numbers. There is some tradition, in smaller institutions in particular, for using herbarium numbers as a surrogate for collectors' numbers, which means that they are assigned to duplicate specimens as well, whether stored in the original place or distributed as gifts or on exchange. This leads to problems and errors; in particular, accession numbers are unique and can thus be cited in order to differentiate between duplicates, which is sometimes essential when it comes to type designation.
As an aside, there is already a well-known and widely used system in place, implemented in the JSTOR Global Plants image database (https://plants.jstor.org). It is not confusing and does not cause additional labour: it uses any existing herbarium numbers prefixed by the official "Index herbariorum" herbarium code (or "acronym"). The numbers are padded with the appropriate number of leading zeros to match the longest existing (or foreseen) number used in that herbarium.
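The scheme described in this answer, an "Index herbariorum" acronym followed by a zero-padded number, amounts to a one-line formatting rule. The sketch below is illustrative only: the function name and the width of 8 digits are assumptions, since the real width depends on the longest number in use at each herbarium.

```python
def format_specimen_id(herbarium_code, number, width):
    """Prefix a zero-padded accession number with the herbarium acronym."""
    return f"{herbarium_code}{number:0{width}d}"


# "G" is the Geneva acronym used in the question; width=8 is a made-up example.
print(format_specimen_id("G", 125, 8))  # G00000125
```

Because the acronym already identifies the holding institution, the identifier stays unique without adding a country prefix.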
  • asked a question related to Species Distribution Modeling
Question
4 answers
Dear colleagues,
I would like to project some species distribution models to the LGM. For (bio-)climatic variables alone this is straightforward, and it is still reasonable for topographic variables based on a paleo-DEM. However, I could not find a global dataset (raster, *.tif, *.adf, etc.) of paleo forest cover in percent.
Please, let me know if you are aware of such data. Alternatively, I'd like to hear your opinions on how to model proportional LGM forest cover. For the latter, I would like to know which variables and which algorithms (ANN?) you would suggest to model forest cover.
Thanks.
  • asked a question related to Species Distribution Modeling
Question
6 answers
Hi, I am not really sure if this question is valid or makes any sense.
But suppose, for example, we have a single (imaginary) species, let's say Pikapika pii, and have determined its genetic diversity. PCA and STRUCTURE clustering showed three groups: GRP1, GRP2, and GRP3.
My question is: can I treat these three groups as "separate species" and use them to run a multispecies SDM, or an ensemble of single-species SDMs, or is this not valid/possible at all?
I would appreciate any help or correction with this idea. If possible, you could also point me to publications that I can read, or experts whom I can consult directly.
Thank you so much for your time and help.
Relevant answer
Answer
John Paul Manamtam Payopay : Yes, if you prepare data.
  • asked a question related to Species Distribution Modeling
Question
7 answers
We have a locally endemic plant species that is distributed only in a specific, narrow area. It occurs almost everywhere within that area but nowhere outside it. We want to model the distribution of the species with Maxent. The bedrock is the same throughout the area of distribution, and the elevation variation is very low. Would it be right to generate artificial presence data by placing artificial (random) sample points in the field, and to model with those?
Relevant answer
Answer
The regular way to use Maxent is to use your presence data and your environmental layers. The extent of the environmental variables should be larger than the range of the presence points by 10-50%. Maxent will generate random background (pseudo-absence) points for comparison. Never fabricate presence points. You may also use the R package ENMeval to determine the best model parameters.
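The "10-50% larger" rule of thumb can be turned into a small helper that pads the bounding box of the presence points. This sketch assumes the percentage refers to the total width and height of the box, split evenly between both sides; that reading, the function name and the coordinates are assumptions, not part of the original advice.

```python
def expand_extent(xmin, xmax, ymin, ymax, fraction):
    """Grow a bounding box so its width and height increase by `fraction` in total."""
    pad_x = (xmax - xmin) * fraction / 2  # half of the extra width on each side
    pad_y = (ymax - ymin) * fraction / 2  # half of the extra height on each side
    return xmin - pad_x, xmax + pad_x, ymin - pad_y, ymax + pad_y


# Hypothetical presence-point bounding box, enlarged by 20%.
print(expand_extent(0.0, 10.0, 0.0, 20.0, 0.2))
```

The resulting box can then be used to crop the environmental layers before modelling.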
  • asked a question related to Species Distribution Modeling
Question
6 answers
Hi all!
I am trying to build my own species distribution models (SDMs). I have seen that some authors use, say, 100 replications, but I do not understand how they get to this number. Hence, my question is: when it comes to the replications, how is that number decided? Is there a rule of thumb?
Thank you a lot in advance!!
Relevant answer
Answer
Although many would recommend power analyses and a couple of other rules (such as 10xN-levels) there is no consensus approach to defining sample size. As Andrew Paul McKenzie Pegman has explained, there are some "magic numbers" of biostatistics (such as 30) that are good enough depending on the situation.
  • asked a question related to Species Distribution Modeling
Question
2 answers
In many study areas, both line transect survey data and camera trap data are available. What I would like to know is how to combine them in a species distribution model. Line transect surveys cover large areas with a low survey intensity, whereas camera trap surveys cover small areas around the clock (24/7). Suggestions for appropriately weighting these two types of data are highly appreciated.
Relevant answer
Answer
David Eugene Booth Yes, the two methods give quite different results. However, researchers are interested in the animals, and they usually want to use all available information in their studies. For example, a scientist completes a line transect survey and records many occurrences of wild boar. He can estimate the density of wild boar based on this survey. He also has camera trap data from the same region, and is now thinking about how to use the camera trap data to improve the accuracy of the wild boar density estimate.
  • asked a question related to Species Distribution Modeling
Question
8 answers
In Maxent model evaluation, the output is usually judged by the AUC value: if the AUC is above 0.7 and close to 1, the result is considered good. But if we rule out AUC for evaluation, what other methods are there for selecting the best result? Using different variables for a single species, I ran the Maxent model and now have four (4) outputs. How do I evaluate which is the best result for my study? How do I compare those results and select the best one without relying on AUC?
Relevant answer
Answer
For other tests of "best" fit for SDMs, including Maxent output, and for complementary considerations in testing and model selection (optimizing transferability; minimizing multicollinearity of independent variables), see the following:
The Akaike Information Criterion (AIC) may also be relevant.
You are welcome to share your findings from additional testing.
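For readers unfamiliar with AIC, the criterion and its small-sample correction (AICc) are simple to compute once a model's log-likelihood, number of parameters k and sample size n are known. The sketch below shows only the standard formulas; how to obtain a log-likelihood for a Maxent model is a separate question that it does not address, and the numbers are invented.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with small-sample correction: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


# Two hypothetical candidate models fitted to the same 50 occurrence records.
candidates = {"model_A": (-100.0, 5), "model_B": (-98.0, 12)}
for name, (ll, k) in candidates.items():
    print(name, round(aicc(ll, k, n=50), 2))
```

The model with the lowest AICc is preferred; in this made-up case the more complex model fits slightly better but is penalized for its extra parameters.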
  • asked a question related to Species Distribution Modeling
Question
20 answers
I am a new user of R and "Maxent.jar". I generated some species distribution models with "Maxent.jar", but I am having trouble using an R package to evaluate these models. The most difficult part is that I can't get R to read the models generated by "Maxent.jar", whether the file extension is .asc, .tif, or even .img. Maybe my thinking is naive and silly, but what I want is for the R package to read my models and let me run evaluations such as AICc, Kappa, TSS, CCR, AUC and so on.
The R package I am using now is "ENMeval 2.0" by Jamie M. Kass. (It is a good R package, but maybe it is not easy for a new user of R and Maxent, or I am too naive and silly to use it.)
Please, I need your help. Thanks.
Relevant answer
Answer
Here's the latest ENMeval vignette, which starts from scratch: https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0.0-vignette.html
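Several of the metrics named in the question (CCR, TSS, Kappa) are derived from a confusion matrix obtained after thresholding the suitability map. The sketch below shows only that arithmetic; it is not ENMeval code, the counts are invented, and with presence-only data the "absences" would in practice be background or pseudo-absence points.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Threshold-dependent accuracy metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of presences predicted present
    specificity = tn / (tn + fp)   # share of absences predicted absent
    ccr = (tp + tn) / n            # correct classification rate
    tss = sensitivity + specificity - 1
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (ccr - expected) / (1 - expected)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "CCR": ccr, "TSS": tss, "Kappa": kappa}


# Hypothetical counts: 50 presences and 50 absences, 40 of each classified correctly.
for name, value in confusion_metrics(tp=40, fp=10, fn=10, tn=40).items():
    print(f"{name}: {value:.2f}")
```

All of these values change with the threshold chosen to convert suitability into presence/absence, so the threshold should be reported alongside them.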
  • asked a question related to Species Distribution Modeling
Question
5 answers
Since it is preferable to check for correlation among the variables, one has to remove highly correlated variables before running an SDM (I am using MaxEnt). For my study, I have calculated the Pearson correlation coefficient (r) among the variables (the correlation matrix is provided). But as I am new to this, I am finding it hard to interpret the correlation matrix. That is, how and on what basis should I remove variables? (I am using a threshold of ≥0.8.) I need some expert suggestions.
Q1. How are the variables chosen? Please advise based on the provided table: which variables should I select for my study?
Q2. How is one variable selected when there is a high correlation between two variables?
Q3. Is negative correlation not a problem? I am asking because I have seen a few papers where highly negatively correlated variables were also selected.
Please help me.
Relevant answer
Answer
You have 20 variables (bio1-bio19 and elev), many of which are highly correlated pairwise. This means that there is redundant information and uncertainty in the data, which makes it difficult to attribute and interpret the contribution of one or more variables. Knowing the pairwise linear correlation coefficients does not help to reduce redundant information or to extract meaningful information about separate contributing factors.
I suggest using a principal component decomposition methodology that allows one to perform a multivariate correlation analysis and identifies redundant variables that carry little or no independent information while retaining only a few mutually uncorrelated principal variables (components) that contain practically all original information. This technique is a special case of a matrix approximation procedure called singular value decomposition. The higher the level of correlation between the columns of data of the original matrix, the fewer the number of new (principal) variables is required to describe the original data set.
You can perform principal component analysis using various statistical packages, such as R, Matlab, or Minitab. The latter requires no coding at all.
An overview of principal component analysis can be found in many books on multivariate analysis.
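As a simpler alternative to PCA, the pairwise screening that the question describes (dropping one variable of each pair at or above a 0.8 threshold) can be sketched as follows. This is an illustration, not a recommendation from the answer: the priority order, the toy values and the function names are assumptions. It uses the absolute value of r, so strong negative correlations are treated the same as strong positive ones.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_correlated(variables, priority, threshold=0.8):
    """Keep variables in priority order, skipping any with |r| >= threshold
    against a variable that has already been kept."""
    kept = []
    for name in priority:
        if all(abs(pearson(variables[name], variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values at five sites; variable names follow the thread, numbers are invented.
variables = {
    "bio1":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "elev":  [5.0, 4.0, 3.0, 2.0, 1.0],    # r = -1 with bio1
    "bio12": [2.0, 4.0, 6.0, 8.0, 10.5],   # r close to +1 with bio1
    "bio4":  [1.0, -1.0, 0.0, -1.0, 1.0],  # r = 0 with bio1
}
# The priority order should reflect ecological relevance for the species.
print(drop_correlated(variables, ["bio1", "elev", "bio12", "bio4"]))
```

Which member of a correlated pair to keep is decided here by the priority list, which is where knowledge of the species' ecology comes in.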
  • asked a question related to Species Distribution Modeling
Question
8 answers
I am not a spatial analyst or an expert in a related field, that's why I decided to contact you for advice or help in extracting maximum information from the data I have.
Data (2 trials): routes crossing the study area (about 1250 sq. km), with about 1000 points along the lines, each representing a species occurrence event, plus a column with the number of occurrences (z data) attached to the x,y coordinates.
Questions:
Which exploratory data analysis tools should I try in order to describe the pattern? For example, I have thought about visualising the standard distance circle (centrography) or a box plot of the events. Maybe kernel density? And what can I do with the z data? Maybe interpolation? I think it would be easier to work with transects, but I don't know what I can do with this kind of irregularly shaped data. Can I run tests or make predictions with this kind of data, or are the samples too small and unrepresentative? For example, I have a hypothesis that the distribution of events is not random, and that the binomial probability and the number (z) of events are higher in the central southern part because of certain factors.
I look forward to your suggestions on which tools and tests I can use and which concepts I should learn about.
I am currently trying to use R and QGIS for visualisation and analysis.
I also apologize that my english may confuse you.
Relevant answer
Answer
Gabriel Asato Thank you! I have already visualised the data using the heat map tool in QGIS, but, as expected, it simply shows the more heavily surveyed areas (greater line length per unit area) as denser.
  • asked a question related to Species Distribution Modeling
Question
6 answers
I have seen Maxent modelling applied to a range of distribution types, from small scale to large scale. Is it effective to use Maxent for small areas like protected areas and national parks (<1000 km² in area)? In my opinion, macroscale covariates like the bioclimatic variables have little impact over small ranges. Therefore, using finer-scale covariates that vary within the study area should improve the model. I am looking for ideas and suggestions.
Relevant answer
Answer
Francesco Valerio Thank you for sharing. It seems to be a good approach.
  • asked a question related to Species Distribution Modeling
Question
3 answers
If I consider all environmental variables for an SDM, what problems will arise? Any suggestions?
Relevant answer
Answer
PCA is also a good approach
  • asked a question related to Species Distribution Modeling
Question
5 answers
#1. Background of the work: I am trying to work out the habitat suitability modeling of a tree species endemic to the Western Ghats. This query concerns a confusion that arose while doing General Niche-Environment System Factor Analysis (GNESFA).
#2. Main query: While doing GNESFA, I got some strange results when plotting the niche on the factorial axes (please see the attached figures). Extremely dark circles form in the scatter plot. Can anyone please help me find out why this is happening?
#3. Attachments: The final result and the specific plot of the niche on the factorial axes are attached.
Thank you very much for all your patient reading and time. Hoping for a positive response.
Relevant answer
Answer
Dear Pietro,
Thank you very much for helping resolve this problem. I will check for clustered location data within the same pixel. Yes I do have many locations in the same pixels, I couldn't correlate that to this. I think to get a better scatter plot I will remove the duplication of location within same pixels and try again. I will update you after I fix this.
Have a wonderful day.
Kindest Regards
Namitha
  • asked a question related to Species Distribution Modeling
Question
4 answers
Hello everyone,
I am currently searching for alternatives to the Neotoma database. The database did not offer the amount of data I need for the Mediterranean. Does anyone know of alternative databases to Neotoma where I can find georeferenced pollen data for the Mediterranean (e.g. universities, collaborative research centres, etc.)? I would appreciate any hints.
Cheers,
Fabian
Relevant answer
Answer
Thank you Héctor
  • asked a question related to Species Distribution Modeling
Question
4 answers
I am looking for a tool in ArcGIS to help in identifying priority areas for conservation based on the output of species distribution models (maxent, GLM, BRT,..). Something maybe similar to the zonation algorithm.
Relevant answer
Answer
This website has links to dozens of connectivity tools https://conservationcorridor.org/corridor-toolbox/programs-and-tools/
FunConn and Linkage Mapper both run in ArcGIS. Others run in R or are stand-alone tools. Each has its own assumptions about which patches are "better", so whichever one you choose, you'll want to read up on it quite a bit. I'd start from what you want the tool to do rather than from the software that you want to run it in. Many of these are open source and freely downloadable.
  • asked a question related to Species Distribution Modeling
Question
8 answers
I've been doing MaxEnt modelling for a while now using bioclimatic and topographic variables. I have already obtained a 'water line systems' of our country which is in a shapefile, however, I haven't tried to incorporate this "distance to water/water systems" as my input file for the modelling since I am not sure what format this variable should be in as well as how to prepare this data for the modelling. So, does anyone know how to incorporate this file for MaxEnt modelling?
Relevant answer
Answer
I second Abbas Naqibzadeh's answer. Simply use the "Euclidean distance" tool in your GIS, or calculate it in R. That will generate, for each pixel, the distance to the nearest water body.
Alternatively you could also create density variables, eg. "density of water systems per pixel". This approach might be better if your resolution is coarse, meaning if you are modelling with large raster cell sizes (so that mean distances would become meaningless). An easy approach to such a variable would be to calculate length per pixel for example.
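What a "distance to water" layer contains can be shown with a brute-force sketch on a tiny grid. This is not how a GIS computes it (dedicated tools are far faster); it only illustrates the output. It assumes the water-line shapefile has already been rasterized to a 0/1 grid aligned with the other predictors, and the grid, cell size and function name are invented.

```python
from math import hypot


def distance_to_water(water, cell_size=1.0):
    """Distance from each cell centre to the nearest water cell centre.

    water: 2D list with 1 for water cells and 0 elsewhere.
    cell_size: cell edge length in map units (e.g. metres).
    """
    sources = [(r, c) for r, row in enumerate(water)
               for c, value in enumerate(row) if value]
    if not sources:
        raise ValueError("the grid contains no water cells")
    return [[min(hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(water)]


# Hypothetical 3 x 3 grid with one water cell in the top-left corner, 30 m cells.
water = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]
for row in distance_to_water(water, cell_size=30.0):
    print([round(d, 1) for d in row])
```

The result is a continuous raster with the same extent and cell size as the input grid, which is the form Maxent expects for an environmental layer.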
  • asked a question related to Species Distribution Modeling
Question
6 answers
Hi everyone,
I am wondering if anyone knows of a species distribution model that can take in percentage data (i.e. presence data with prevalence values attached to them)? The dataset I have right now has many occurrence points of a species, with each point containing information on the prevalence of the species at that particular location. However, the current SDMs (e.g. MaxEnt, GAM etc.) that I am using only allow me to input all these as presence data and treats all presence data equally in predicting the potential distribution of the species of interest. I would like to run a model that can take in the prevalence of each occurrence point and weight the points accordingly to produce the final results. I would greatly appreciate if anyone has any information on this, or if there's an alternative method that I can use to account for percentage data for each occurrence point. Thank you so much in advance!
Relevant answer
Answer
I think you can try R-package,
It will be very useful.
  • asked a question related to Species Distribution Modeling
Question
6 answers
What is the significance in carrying out species distribution modelling or habitat suitability modelling in a tree species with very limited occurrence (~under 10000 trees). Other than points like:
1) It will help in locating unknown populations based on the probable species distribution map. (back to field studies)
2) Based on the probable location map we can try introducing the plant to that location.
Relevant answer
Answer
3) The results may reveal the cause of the observed geographic restriction: dispersal or env. conditions (also tentatively, which variables), or both.
  • asked a question related to Species Distribution Modeling
Question
4 answers
1) When comparing bioclimatic variables for species distribution modelling, which are more relevant when working on a narrowly endemic tree species (regional study)?
2) What is the suitability of BIOCLIM variables for a regional-level study?
3) Are there other methodologies for comparing the variables when doing distribution modelling?
Relevant answer
Answer
Stepwise backward elimination of variables is an alternative to a priori selection by PCA. When we compared both methods, the outcome was near identical. One issue to keep in mind is that PCA is not suitable for categorical variables.
Bioclimatic variables are spatial interpolations of meteorological point data. The quality of the interpolation depends on the number of data points and on how representative these meteorological data are. Where we tested the WorldClim interpolations against independent local weather stations, the difference was too large for comfort. Further, we noticed in more than one study that classical meteorological variables outperformed the bioclimatic derivatives.
I agree with Yuri, from our experience, that (bio-)climatic variables may be suitable for larger areas, say at subcontinental level, but not so much for smaller ones. In mountains, elevation plus vegetation variables may outperform climatic variables.
My statements above can be studied in detail in our SDM papers available under my RG profile. Keywords: bear, endemic plants, elephant, wild boar, oak, beech, mountain pine, land use conversion ("deforestation"), GLM and maxent.
Enjoy the read
  • asked a question related to Species Distribution Modeling
Question
5 answers
I am working on the species distribution modelling of an endemic forest tree. After completing the modelling, how can I use the results to trace back or find new populations in the study area? Is there an established pipeline or work plan for this?
I am specifically looking for ways in which I should start looking for actual population clusters of tree, based on the predicted map that we generate.
Relevant answer
Answer
I found a paper with a similar research intention as your question I reckon. Sorry if it's not quite on par with your interest.
  • asked a question related to Species Distribution Modeling
Question
2 answers
Has anyone ever had trouble with an error in parallel computation? See below.
---------------------------------------------------------------------------------------------------------------------------------
R Version: R version 4.0.3
snowfall 1.84-6.1 initialized (using snow 0.4-3): parallel execution on 5 CPUs.
Library biomod2 loaded.
Library biomod2 loaded in cluster.
Error in checkForRemoteErrors(val) :
5 nodes produced errors; first error: arguments imply differing number of rows: 5
2531200,0
-----------------------------------------------------------------------------------------------------------------------------------
Thank you very much!
Relevant answer
Answer
can you provide the code?
For example:
library(qdap)
spellcheckstring = "universal motor vlb"
mydictionary = c("brake", "starter", "shock", "pad", "kit", "bore", "toyota", "ford", "pump", "nissan", "gas", "alternator", "switch")
class(spellcheckstring) # character
class(mydictionary) # character
check_spelling(spellcheckstring, dictionary = mydictionary)
ERROR: Error in checkForRemoteErrors(val) : one node produced an error: arguments imply differing number of rows: 3, 0
SOLUTIONS:
The dictionary is so small that when it is split up (https://github.com/trinker/qdapTRUE) there are no possible matches for that letter. Use assume.first.correct=FALSE:
check_spelling(spellcheckstring, dictionary = mydictionary, assume.first.correct=FALSE)
Version 2.2.5 (dev version) automatically enforces assume.first.correct=FALSE if custom dictionary does not have at least one word beginning with all 26 letters of the alphabet.
Get the latest release of qdap
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(
  "trinker/qdapDictionaries",
  "trinker/qdapRegex",
  "trinker/qdapTools",
  "trinker/qdap"
)
As you can see, this error is very similar to yours, but without your code it is impossible to say more.
  • asked a question related to Species Distribution Modeling
Question
11 answers
I am running MaxEnt with 15 replicates.
In MaxEnt output file there are 0 to 14 sample predictions and background predictions CSV files.
I want to know which replicate has the average sample predictions and background prediction values.
Relevant answer
Answer
I am not sure I understand your question, because the aim of the "want" is not specified.
Click "Plots" in the MaxEnt output and, besides an average across the 15 replicates (0-14), you will also get the max, min and median. This applies to the Bootstrap replicate run type. You can check the Crossvalidate replicate run type yourself.
Have fun
  • asked a question related to Species Distribution Modeling
Question
7 answers
I am running MaxEnt modeling for my target species distribution. There I need to select the least correlated variables to avoid multicollinearity. This multicollinearity can be tested using various tests such as Pearson's correlation coefficient (r), Variance inflation factor (VIF), and Principal component analysis (PCA). Among these, I find PCA a bit difficult to understand. My questions are
1). Can I go with any one of these methods, to check collinearity?
2). Which one of the tests is the best? if any.
3). Will it be okay if I only go with Pearson's correlation coefficient (r)? Will it make my result and interpretation sufficient?
Relevant answer
Answer
Please use the variance inflation factor (VIF) (Hair et al., 2017).
The VIF of each variable should be below the cut-off value of 5; values below 5 are considered acceptable.
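A minimal sketch of the VIF check, in plain Python for illustration. It relies on the standard identity that the VIF of variable j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The variable names and values are hypothetical, and the cut-off of 5 is the one quoted above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Map each variable name to its variance inflation factor."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

# Hypothetical values of three predictors sampled at six locations.
demo = {
    "bio1": [10, 12, 14, 16, 18, 20],
    "bio5": [20.5, 22.1, 24.3, 25.8, 28.2, 30.1],
    "bio12": [800, 650, 900, 700, 850, 600],
}
for name, value in vif(demo).items():
    flag = "drop candidate" if value >= 5 else "ok"
    print(name, round(value, 2), flag)
```

With only two predictors the result reduces to 1 / (1 - r^2), which links the VIF directly to Pearson's r for that pair.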
  • asked a question related to Species Distribution Modeling
Question
10 answers
Hi, I am currently undertaking a species distribution modelling project using maxent and RCP predictions.
However, I am encountering an issue: when I put in the required number of replicates (5), it completes the first replicate (labelled 0) and then stops, displaying an error saying layer bio 1 is missing, and fails to produce the further replicates. Any help would be much appreciated.
Relevant answer
Answer
The error occurs due to a difference in cell size. If your layers are in ASCII format, open them with Notepad and check the following:
ncols
nrows
xllcorner
yllcorner
cellsize
These parameters should be identical in all layers; if they differ, clip the layers again.
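The header comparison above can be automated. The following is a small sketch in plain Python, assuming ESRI ASCII grids whose headers use exactly the keys listed above (some files use xllcenter/yllcenter instead, which this sketch does not handle). The layer names and header values are made up, and in-memory text stands in for real .asc files.

```python
import io

HEADER_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")

def read_header(lines):
    """Read the key/value header lines at the top of an ASCII grid."""
    header = {}
    for line in lines:
        parts = line.split()
        known = HEADER_KEYS + ("nodata_value",)
        if len(parts) != 2 or parts[0].lower() not in known:
            break  # first data row reached
        header[parts[0].lower()] = float(parts[1])
    return header

def mismatches(headers):
    """Compare every layer's header with that of the first layer."""
    names = list(headers)
    ref = names[0]
    problems = []
    for name in names[1:]:
        for key in HEADER_KEYS:
            if headers[name].get(key) != headers[ref].get(key):
                problems.append((name, key, headers[ref].get(key),
                                 headers[name].get(key)))
    return problems

# Hypothetical layers; with real files use open("bio1.asc") instead.
bio1 = io.StringIO("ncols 4\nnrows 3\nxllcorner 80.0\nyllcorner 26.0\n"
                   "cellsize 0.5\nNODATA_value -9999\n1 2 3 4\n5 6 7 8\n9 1 2 3\n")
bio12 = io.StringIO("ncols 4\nnrows 3\nxllcorner 80.0\nyllcorner 26.0\n"
                    "cellsize 0.25\nNODATA_value -9999\n1 2 3 4\n5 6 7 8\n9 1 2 3\n")

headers = {"bio1": read_header(bio1), "bio12": read_header(bio12)}
problems = mismatches(headers)
for layer, key, expected, found in problems:
    print(f"{layer}: {key} is {found}, expected {expected}")
```

Any layer reported here would need to be resampled or clipped again before running the replicates.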
Sailesh
  • asked a question related to Species Distribution Modeling
Question
25 answers
The field of species distribution modelling has experienced fast growth in the last decade. With so many R packages available (sdm, dismo, ENMeval, BIOMOD, SDMtune, ssdm, esdm, ENMTML, ...) it is difficult to find the "best" approach. Which is the most comprehensive platform to fit, evaluate and project/predict species distributions across space and time, in addition to assessing variable's importance and response curves?
Relevant answer
Answer
I am in agreement with Nasir Hameed .
Ahnaf Ilman
  • asked a question related to Species Distribution Modeling
Question
3 answers
Hello. Can someone show me how to make an SDM with Maxent for a future distribution? Are there specific options for future distributions, or is it exactly the same process as for a current distribution?
I already have the present occurrence data and the future climate database.
Relevant answer
Answer
The same settings that you chose for the current conditions should be applied to the future.
Make sure that the variable names are the same as those used for the current conditions.
Make sure you are using the same grid scale.
Make sure that the boundaries of the variables used are the same for the present and the future.
  • asked a question related to Species Distribution Modeling
Question
8 answers
Hello Maxent Community,
I have been generating SDM models and projections in Maxent. I have had some success with one set of variables (some HydroSHEDS variables mixed with WorldClim variables); however, I am having a bit of trouble with a model based solely on WorldClim variables.
I used a Pearson correlation matrix to select eight WorldClim variables, which model well with my species' occurrence data. The issue arises when I ask Maxent to project onto future WorldClim variables: the returned projection is almost entirely blue. There are a few specks of red and yellow (indicating some predicted probability), but the vast majority of the projection is blue (indicating zero predicted probability). I tried rerunning the model with clamping turned off, but the problem persisted.
I am a relative novice with Maxent, so I was wondering if I am overlooking a setting that may help correct this issue. I am a bit confused, because when I ran the projection with the HydroSHEDS variables and some of the same WorldClim variables, the future projection looked just fine. Any advice or suggestions would be tremendously appreciated.
Thank you in advance for your help!
Relevant answer
Answer
I recommend reading this article:
Predicting the impacts of climate change on the distribution of species: Are bioclimate envelope models useful?
Best wishes
Bekhruz
  • asked a question related to Species Distribution Modeling
Question
14 answers
Hi, guys. Thanks for reading this question.
I am modeling a reactor (steady, RANS). CH3OH and O2 go into the reactor via inlet (a jet). After reactions, species leave the computational domain via an outlet. I turn on the species model and define the reaction between CH3OH and O2 as a two-step reaction,
CH3OH+O2 = CO + 2H2O ; CO + 1/2 O2 = CO2
I choose the eddy dissipation model and run the simulation. The temperature field looks reasonable (look at the attached picture). But the mass flow rate (kg/s) of atom 'C' at the outlet is almost 3 times bigger than that of the inlet. The mass of atom 'C' is non-conserved!
Then, I use a one-step reaction instead of a two-step with all other settings unchanged. I check the result and find that the mass flow rate (kg/s) of atom 'C' at the outlet is the same as that of the inlet. The mass of atom 'C' is conserved this time!
It is weird! Why the two-step reaction leads to the non-conservation of atoms? I am sure that the stoichiometric coefficients are balanced. I would be very grateful if someone can discuss this problem.
Relevant answer
Answer
This is actually common practice in combustion simulations with small mechanisms, e.g. for methane combustion. You solve transport equations for CH4, CO, CO2, O2 and H2O, and use N2 as the bulk species. Most reaction mechanisms that are used for this (like Jones-Lindstedt or Westbrook-Dryer) are given with the bulk species last anyway, so no one is really aware of the consequences of a "bad" order of species.
Thanks Zhaojian Liang for letting us know, I wasn't aware this could have such an impact.
  • asked a question related to Species Distribution Modeling
Question
6 answers
Hello everyone,
Do you know any reliable Python library for Species Distribution Modeling which provides methods for creating absence data?
I've come across multiple packages for R like "dismo" and "biomod2" but I've found nothing for Python.
I'm not familiar with the algorithms provided in the R packages I mentioned above. Are those algorithms (like SVM or BRT) somehow modified for this type of modeling? And is it possible to use Python machine learning libraries like scikit-learn for this purpose?
Thanks for your attention.
Relevant answer
Answer
  • asked a question related to Species Distribution Modeling
Question
3 answers
Hi everyone,
I need some clarifications on these things:
- What are the most important differences to take into consideration between the various algorithms used in SDM?
- Which machine learning model is most widely used in SDM?
Thank you
Relevant answer
Answer
Thank you so much for the link
  • asked a question related to Species Distribution Modeling
Question
4 answers
How can I decide on the maximum number of iterations in MaxEnt? Is there a rule for it, or does it depend on the sample size?
Thank you.
Relevant answer
  • asked a question related to Species Distribution Modeling
Question
2 answers
I am currently in the process of trying to learn how to do SDM correctly and I want to evaluate the models I run with OpenModeller.  My understanding is that one should use om_test, but I am having trouble understanding exactly how to use it.  Could anyone help me out on how to use it?  
From my understanding, one runs om_test with the Serialized model created from om_console, but not the projection result.  However, I thought that what one should actually be interested in evaluating is the projection result and not the model one created to make the projection.  Is my understanding of what it means to evaluate a model wrong?  Hopefully my question makes sense.  Any help would be greatly appreciated.
Relevant answer
Answer
  • asked a question related to Species Distribution Modeling
Question
4 answers
Hi everyone, I have to clarify a few of these things
01. Is there any difference between the “10 percentile training presence logistic threshold” column in maxentResults.csv and the "10 percentile training presence" in the .html output file?
02. When representing unsuitable and suitable habitat, can "10 percentile training presence" be used instead of “10 percentile training presence logistic threshold”? If yes, which one is more suitable?
Thank you..
Relevant answer
Answer
This is a useful article for your topic:
  • asked a question related to Species Distribution Modeling
Question
3 answers
I am working on species distribution models. I have taken data from GBIF and DIVA, but the resolution of the two data sources is different. I am using QGIS (I don't have ArcGIS). Please suggest how I can give the data the same resolution and dimensions.
Relevant answer
Answer
Also you can just import by right clicking on the raster layer and set cell size
  • asked a question related to Species Distribution Modeling
Question
10 answers
Hi guys, I would love to know how I could go about solving the above question. I currently have 2 different species distribution models, one created with the current climate projection and the other using a future climate projection. I already know how to use R to calculate the 3 statistics concerning niche overlap (Schoener's D, Warren's I and rank correlation) using the ENMTools package. Now I would love to find out how I could get an absolute value of the average change in habitat suitability in the SDM (for each pixel, I guess). I know I could potentially use ArcMap and its Raster Calculator and Zonal Statistics as Table functions to find this change, but is there a shorter way I could go about this?
Thank you!
Relevant answer
Answer
Hello Noh,
Sure, you can do the same in R using raster::calc() or raster algebra (i.e. future - present, if future and present are RasterLayers with the same extent/resolution) and raster::cellStats().
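The arithmetic behind that raster algebra is simple enough to sketch in plain Python. This is only an illustration of the calculation, not of the raster package itself. It assumes both grids share extent and resolution, and the grid values and NoData code are hypothetical.

```python
def suitability_change(present, future, nodata=-9999):
    """Summarise per-cell change (future - present) over valid cells."""
    if len(present) != len(future) or any(
            len(p) != len(f) for p, f in zip(present, future)):
        raise ValueError("grids must have the same shape")
    diffs = [f - p
             for prow, frow in zip(present, future)
             for p, f in zip(prow, frow)
             if p != nodata and f != nodata]  # skip NoData cells
    if not diffs:
        raise ValueError("no valid cells to compare")
    n = len(diffs)
    return {
        "mean_change": sum(diffs) / n,                      # signed average
        "mean_abs_change": sum(abs(d) for d in diffs) / n,  # magnitude only
        "n_cells": n,
    }

# Hypothetical 2 x 2 suitability grids (values between 0 and 1).
present = [[0.2, 0.8], [0.5, -9999]]
future = [[0.4, 0.6], [0.5, -9999]]
summary = suitability_change(present, future)
print(summary)
```

The signed mean shows whether suitability rises or falls overall, while the mean absolute change captures how much cells change regardless of direction; gains and losses can cancel in the former.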
HTH,
Ákos
  • asked a question related to Species Distribution Modeling
Question
1 answer
I'm working on a species distribution model of anteaters in Maxent. I want to project the model with a CanESM5 ssp370 bioclimatic variable. To do this, I first need to convert this raster to an ASCII file. I tried to do it in QGIS, but the program only converts single-band rasters. Does somebody know another way to convert this raster? Or perhaps a way to separate each band from the raster so I can work with each one separately?
I thank you all beforehand.
Relevant answer
Answer
Splitting the bands into individual files can be done with the Rearrange bands tool in QGIS, selecting the bands one at a time and saving each one to a new TIF layer.
You can convert those to ASCII files using the Translate tool, but I have had problems with it regarding the format of the NoData cells (it gives me a black background), so I have done those conversions without problems using the raster package in R.
I hope this helps, and that it was no problem that I answered in Spanish.
  • asked a question related to Species Distribution Modeling
Question
5 answers
I am proposing a hierarchical approach in which coarse-resolution variables are used to run a Maxent SDM and delineate a presence/absence map for my species (a tropical conifer tree species). After that, the higher-resolution variables derived from the LiDAR data would be used to generate models within these areas. Can anyone foresee potential problems with this approach, or suggest any better ideas?
Relevant answer
Answer
Interesting idea. I have no clear yes or no for you.
However, I see one potential problem. With MaxEnt, we want the model to learn for itself whether variables are predictors or not.
So by forcing climate variables as the primary cut-off for presence/absence, and then running the "final models" only within "suitable climate" areas, you basically disable MaxEnt's ability to decide for itself.
Also, potential areas outside the "suitable climate model", which could nevertheless hold presences if climate is not a dominant factor, will be missed completely.
One last point to keep in mind: climate variables are often correlated with other topographic and/or land cover data.
So you might be overrepresenting some factors if they are represented first by climate and then again by other variables in your second model.
However, if the root of your problem is that the climate variables are too coarse, then you might work around it by using a DEM instead of the climate variables (given there is a strong physical causality from elevation to temperature)?
Cheers
  • asked a question related to Species Distribution Modeling
Question
4 answers
Background: The stepwise VIF test deals with multicollinearity and automatically eliminates highly correlated variables according to a chosen threshold. The problem is that, among the 19 bioclim variables, some are derived from original variables such as bio1, 2 and 12, and this affects the elimination process. Since those are the source, they are highly correlated with the derived variables and are thus excluded first. VIF-step doesn't distinguish between the original variables (the important ones, in my opinion) and the derived variables.
As for PCA, it works on collinearity one by one with no self-eliminating option, so the user must choose the variables to retain, and this later requires a defense of why one variable was chosen over another. I thought of a third approach, in which I use PCA along with jackknife results to determine which variables to retain. This approach would allow me to retain some original variables, and I could rely on their jackknife results in the defense.
My main questions are: is the third approach scientifically valid, and is it more robust than the stepwise VIF test?
Side question: how can we defend the variables we chose using PCA alone?
Relevant answer
Answer
  • asked a question related to Species Distribution Modeling
Question
1 answer
I'm creating an SDM for a species endemic to one country. The country has different areas where climate, elevation and LULC differ. Is there a tool or method that indicates the most limiting factor(s) for each area in that country? I don't have enough data in each area to make separate models in Maxent; I can obtain only one model for the whole country. So, is there a helpful tool in ArcGIS, an option in Maxent, or any other known program?
Relevant answer
Answer
Jane Elith and co-authors have a detailed description of how to do this using Maxent and provide code for doing so - https://besjournals.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2Fj.2041-210X.2010.00036.x&file=MEE3_36_sm_AppendixS1-S7.pdf (see page 5 in the supplementary material)
Elith, J., Kearney, M., & Phillips, S. (2010). The art of modelling range‐shifting species. Methods in ecology and evolution, 1(4), 330-342.
The basic premise of their method is that once you have a good model, you can systematically run models dropping each predictor one at a time and see how much this affects the prediction accuracy. The variable that leads to the largest drop is the limiting factor. You can even map out, for each pixel, which variable is most limiting.
  • asked a question related to Species Distribution Modeling
Question
3 answers
I'm trying to create current and future SDMs using Maxent for a species native to one area. What would be the difference between training the model on that area and then projecting it worldwide, and training the model worldwide from the very start and then projecting worldwide too?
Relevant answer
Answer
Some additions to the previous answer: SDMs try to fit the niche of a species, based on the combination of presence(/absence) data and environmental data. So if you fit your SDM on a small training area, chances could be high that you leave out entire parts of the species' niche. Hence, your model and its predictions will be wrong.
Better to use as much presence data as possible of that species to make sure that you have covered the 'full niche'. However, you have to take into account that different 'subspecies' or varieties of the same species can have a different response. So for example: Consider a species that occurs in Europe and the US and you want to model the future range of the species in Europe. In this case, it is not necessarily a good idea to also include the presence data from the US, because the individuals that live there will probably have a different response to environmental gradients.
If your species only occurs in that small area, there is no need to include the whole world in your model. 'The larger region' around your occurrence points will suffice.
  • asked a question related to Species Distribution Modeling
Question
8 answers
Monthly values of minimum temperature, maximum temperature, and precipitation were processed for nine global climate models (GCMs):
BCC-CSM2-MR, CNRM-CM6-1, CNRM-ESM2-1, CanESM5, GFDL-ESM4, IPSL-CM6A-LR, MIROC-ES2L, MIROC6, MRI-ESM2-0, and for four Shared Socio-economic Pathways (SSPs): 126, 245, 370 and 585. My confusion is which one would be accurate for the purpose of building niche models in tropical climatic conditions. Or do I need to use any other GCM?
Relevant answer
Answer
Hello Snehangshu,
All of them are only models. None of them are better than the other ones. Some of them are better in predicting the temperature values of the reference period, other ones are better in predicting the precipitation seasonality, and other ones are better in predicting some characteristics of the future climate - but who knows which ones?... Anyway, there is a web application which is intended to help species distribution modelers to select from the several GCMs, called GCM compareR. I attach the paper about the application:
HTH,
Ákos
  • asked a question related to Species Distribution Modeling
Question
5 answers
I used the ENMeval R package to produce my species distribution model with the "maxent.jar" algorithm. I hope to calculate TSS for the binary map generated with the maxSSS threshold. However, I am unsure how to get the sensitivity and specificity needed to calculate the TSS score for my model.
How do I calculate TSS from what I have on hand in the ENMeval output for my model?
Relevant answer
Answer
Instead of directly answering your question, I would suggest using p10 (the 10th percentile training presence threshold) to produce the binary map. If you are using background data (e.g. 10,000 random points, because you do not have true absence data) when developing the Maxent model, AUC and TSS are actually not reliable metrics for evaluating predictive performance. Check the continuous Boyce index.
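For reference, the TSS arithmetic asked about in the question is sensitivity + specificity - 1 at a chosen threshold. A minimal sketch in plain Python follows, assuming you have already extracted predicted suitability at the presence points and at the absence (or background) points. The scores and the threshold are made up, and with background points the specificity is only a pseudo-specificity, which is the caveat raised above.

```python
def tss(presence_scores, absence_scores, threshold):
    """Return (TSS, sensitivity, specificity) for a given threshold."""
    # Presences predicted as suitable are true positives.
    tp = sum(s >= threshold for s in presence_scores)
    fn = len(presence_scores) - tp
    # Absences (or background points) predicted as unsuitable are true negatives.
    tn = sum(s < threshold for s in absence_scores)
    fp = len(absence_scores) - tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1, sensitivity, specificity

# Hypothetical predicted suitability values.
presences = [0.9, 0.8, 0.7, 0.4]
absences = [0.1, 0.2, 0.6, 0.3]
score, sens, spec = tss(presences, absences, threshold=0.5)
print(score, sens, spec)
```

The threshold passed in would be whichever rule you settle on, for example maxSSS or p10.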
  • asked a question related to Species Distribution Modeling
Question
3 answers
Clustered/random data are very common in data analysis. For example, if I want to model the occurrence (presence/absence) of a species over multiple countries, I could treat the countries as clustered/random effects. In theory I could use a binomial GLMM. However, the structure of the dataset, the performance and the information resulting from this model are not satisfactory and mostly do not fit my questions. The non-linear responses, high variability of the data, (randomly) missing values of the predictors, categorical predictors, and unbalanced dataset make it more challenging. Because of this I often use Random Forest models. Although it is sometimes suggested that RF models are black-box models, this is hardly the case. The return of variable importance, the display of partial dependence plots, and the extraction of split points at the root node and of the depth and number of splits of the predictor variables make it a complex white-box model. These results also fit most of the questions I ask. One could suggest using a GAMM, but there are so many buttons to tweak on these models that I do not feel confident and comfortable using them.
To handle the missing values, categorical data, and unbalanced datasets I used the randomForest package for R (Liaw and Wiener, 2002). The randomForest package can impute the median for missing values and stratify (downsample) the data in unbalanced datasets, which makes it well suited to the data I work with. The stratification of the data is key, as is the imputation of the median. However, a drawback is that the randomForest package cannot take clustered/random effects into account. This then ends up as a discussion point in basically every analysis.
There are some scientific publications on MERFs (e.g. Hajjem et al. 2014) and R packages are available (e.g. MixRF). However, from the manuals of these packages it does not seem that they can impute the median and stratify the data. I do not want to lose a lot of my data by balancing my datasets before analysis, and I do not want to lose information by removing incomplete samples.
Is there any news on an R package that implements RF models that can handle all these things? Or is there a suggestion for other types of models in R which can return similar information to the RF models and are (so to speak) as user friendly as the randomForest package?
Thank you in advance,
Liaw, A., Wiener, M., 2002. Classification and Regression by randomForest. R News 2, 18–22.
Ahlem Hajjem, François Bellavance & Denis Larocque (2014) Mixed-effects random forest for clustered data, Journal of Statistical Computation and Simulation, 84:6, 1313-1328, DOI: 10.1080/00949655.2012.741599
Relevant answer
Answer
Sandhya Avasthi As far as I know (after reading the MixRF manual) there is no option to impute the median for missing values or to stratify the dataset. I also read online that the MixRF package (and the functions therein) cannot work with categorical values. All three of these are critical, since the data I work with contain missing values, are unbalanced and often include categorical values. Therefore, the MixRF package is (so far) not an option, since I value these three points more than the incorporation of clustered/random effects. The vcrpart package can handle categorical predictors (to my knowledge), but cannot impute the median or stratify the dataset.
  • asked a question related to Species Distribution Modeling
Question
3 answers
Hello all. I have a technical question regarding species distribution models on intertidal species.
The main issue here is the fact although the species in question are associated with intertidal regions, I come across a problem:
- The Bio-ORACLE variables have no values at the occurrence points of the species (from GBIF). I think that the coordinates, although correct and located in intertidal regions, are not considered "ocean" or "marine" by Bio-ORACLE?
Has anyone ever encountered such a situation? I would rather not lose occurrence points, since the ones with NaN environmental values make up a large share of the total number of presences.
Thank you in advance for any advice.
Relevant answer
Answer
You need to trick the computer into believing you somehow!
  • asked a question related to Species Distribution Modeling
Question
3 answers
Dear Species Distribution Modelers and All,
Have you ever tried to select the specific month/quarter (e.g. wettest quarter for bio8) in a way other than the default one? If yes, how? If not, why not? E.g. in this paper a novel method (called 'static' approach) is suggested for calculating bioclimatic variables for future time periods: the month/quarter is selected once in the reference ('current') period and used later as it is fixed for all the studied periods.
Have you ever considered this method, or do you think it may have relevancy for your further research?
Thanks,
Ákos
Relevant answer
Zagir Ataev I think there are many ways because there are no rules in modeling :)
  • asked a question related to Species Distribution Modeling
Question
7 answers
Comparison of results according to variable combinations
I would like to compare SDM (Species distribution model, ex. Maxent) results according to a combination of about 10 variables.
I think it will take too much work to do this one by one.
So I am looking for a package or code to do this in R or other programs in a short time.
It would be nice to compare the results with AUC values or other values.
If this is possible, I'm wondering if I can get the results in a raster (or workable in gis).
I am wondering if there is a way to express numerically which variable combination is the best, even if it is not the result of SDM.
Thanks for reading.
Relevant answer
Answer
The kuenm package cycles through all possible combinations of predictors using Maxent.
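To get a feel for how many candidate models "all possible combinations" implies, here is a small sketch in plain Python using only the standard library. The variable names are hypothetical and the sketch only enumerates the predictor sets; fitting and scoring each set (for example by AUC) would be done with whichever modelling tool you use.

```python
from itertools import combinations

def variable_subsets(variables, min_size=1):
    """Yield every subset of the variables with at least min_size members."""
    for k in range(min_size, len(variables) + 1):
        for subset in combinations(variables, k):
            yield subset

# Ten hypothetical predictor names.
variables = [f"bio{i}" for i in (1, 2, 3, 4, 5, 6, 12, 13, 14, 15)]
subsets = list(variable_subsets(variables, min_size=2))
print(len(subsets))
print(subsets[0], subsets[-1])
```

Because the count grows exponentially with the number of variables, it is usually worth removing strongly correlated predictors before enumerating the sets.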
  • asked a question related to Species Distribution Modeling
Question
4 answers
In our study system, using BIOMOD2, we are obtaining SDM models with high evaluation metrics (i.e. TSS >0.8) where all the selected variables have very low variable importance values (<0.04). Isn't this counterintuitive? Should not a highly evaluated model have at least one variable of relevant importance value?
Relevant answer
Answer
Just adding the following recently published article which seems to be both timely and relevant:
  • asked a question related to Species Distribution Modeling
Question
5 answers
Dear all,
I want to know the best method for selecting pseudo-absences among the following: the minimum convex polygon method, kernel density estimation, and sampling from a buffered area around the occurrence points. Any other suggestion is welcome.
Thank you in advance
Best regards
Ilhem
Relevant answer
Answer
This important issue has been well addressed by a previous study. Please refer to:
Selecting pseudo‐absences for species distribution models: how, where and how many?
  • asked a question related to Species Distribution Modeling
Question
1 answer
Good afternoon,
I have recently developed a Maxent distribution model in R for a reptile endemic to Madagascar, using native-range presence-only data. The model performs well.
I have now projected this model onto Florida, where it identifies no suitable habitat. However, my focal taxon is well established in Florida. I did not use Florida presence data in the model because I wanted to check if the model would have a priori predicted the colonisation event.
I would like to recover quantified probability values for the known Florida occurrences; at the moment, I can see in general that the predicted probabilities in the colonised area are low, but I cannot quantify them. I can find the equivalent, quantitative values for my known Madagascar occurrences (because they were included in the model) in the model output (Training points) or using evaluate in 'dismo' (Validation points). Does anybody know of a method/package/function which I can use to find these values for my Florida points (currently just stored as a .csv)?
I can share more technical details of the model if necessary; it is a fairly standard Bioclim/Maxent/presence-only data in R, using mainly 'dismo'.
Thank you.
Relevant answer
Answer
Which allowed me to find these probabilities.
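Reading predicted values at point locations comes down to converting each coordinate to a row and column of the projected prediction grid. In R this is what raster::extract does for a raster and a set of points. A minimal sketch of the lookup, with a made-up 3 x 3 grid and hypothetical coordinates (none of these numbers come from the model in the question):

```python
def value_at(grid, x_min, y_max, cell_size, x, y):
    """Return the grid value under point (x, y), or None outside the grid.

    grid is a list of rows, with row 0 at the northern edge (y_max)
    and column 0 at the western edge (x_min).
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Made-up suitability values on a 0.5-degree grid, for illustration only.
prediction = [
    [0.02, 0.05, 0.01],
    [0.10, 0.07, 0.03],
    [0.04, 0.08, 0.12],
]
points = [(-81.8, 26.9), (-80.9, 25.7), (-85.0, 26.0)]  # hypothetical lon/lat
values = [value_at(prediction, -82.0, 27.0, 0.5, x, y) for x, y in points]
print(values)
```

Points that fall outside the projected extent return None, which is worth checking before summarising the values.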
  • asked a question related to Species Distribution Modeling
Question
3 answers
I am running a series of SDMs with MaxEnt for different bat species, using different feature classes and regularization values. In addition, I am using spatial partitioning and jackknife for cross-validation.
I would like to start selecting the models according to their significance, and then continue with other metrics such as the omission rate and AIC in order to select one of the models.
I know that the partial ROC exists, but I don't know whether it can be calculated with the partitions I am using (spatial partitioning and jackknife) or only with random, non-spatial partitions.
That is the question.
Thanks.
Relevant answer
You could just do AIC or BIC
  • asked a question related to Species Distribution Modeling
Question
4 answers
I want to estimate occupancy and detection probabilities for a large vertebrate species. Do you recommend running separate models with presence-absence matrices by sex (I have robust data on the sex ratio in my study population), or using sex as a covariate of occupancy/detection (i.e. using the typical species presence-absence matrix)? The remaining covariates will come from the habitat type and the species community at my study sites.
Thanks.
Relevant answer
Answer
Yes, sex is a common covariate used in occupancy models. If you haven't used it already, I would highly recommend the "secr" package in R. You can then compile a capture history element and combine it with covariates to run your models and analyses.
  • asked a question related to Species Distribution Modeling
Question
4 answers
I was doing some geostatistical analysis (variogram + kriging) on presence-only data in a species distribution modeling context. When estimating the empirical variogram, the attribute is assumed to be a realization of a continuous random variable (although an attribute can also occur as counts). If the attribute is just presence, with no sub-categories, then the values at all positions are the same (say 1, if we denote a presence by 1). Hence the variogram cannot be calculated, not even the indicator variogram. In some papers, such as [1] and references therein, a grid-based approach was used. In this approach a grid of a certain cell size (e.g. 10 x 10 m) was superimposed on the sampling area and the number of species inside each cell was counted. This yields count/frequency data. In the other approach, pseudo-absences or background data were generated using some algorithm, e.g. Maxent (see e.g. [2, 3]). The pseudo-absences are generated taking many factors into account and are stacked/combined with the actual data. This merely means generating x, y coordinates and giving them an absence status (say 0). The result is binary data with two categories, presence (1) and absence (0).
Now the questions that are bothering me are
1. For the grid based approach, what should be the optimal cell size? How to find it and decide it? How to proceed with variogram with kriging etc?
2. For pseudo absences/background approach, how many absences (as compared to actual data)? How to decide it? How to proceed with variogram with kriging etc?
References
1. Rossi, Richard E., et al. “Geostatistical Tools for Modeling and Interpreting Ecological Spatial Dependence.” Ecological Monographs, vol. 62, no. 2, 1992, pp. 277–314. www.jstor.org/stable/2937096.
2. Tomislav Hengl, Henk Sierdsema, Andreja Radović, Arta Dilo, Spatial prediction of species’ distributions from occurrence-only records: combining point pattern analysis, ENFA and regression-kriging, Ecological Modelling, Volume 220, Issue 24, 24 December 2009, Pages 3499-3511.
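To make the variogram issue concrete, here is a minimal sketch of an empirical (indicator) semivariogram, gamma(h) = sum of squared differences over pairs in distance bin h, divided by twice the number of pairs. The four points and their 0/1 values are made up for illustration, and the function name is hypothetical.

```python
import math


def empirical_variogram(points, values, bin_width, n_bins):
    """Return (bin centre, semivariance or None, pair count) per distance bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            b = int(d // bin_width)
            if b < n_bins:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
    return [((b + 0.5) * bin_width,
             sums[b] / (2 * counts[b]) if counts[b] else None,
             counts[b])
            for b in range(n_bins)]


# Made-up transect: two presences followed by two pseudo-absences.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
binary = empirical_variogram(points, [1, 1, 0, 0], bin_width=1.5, n_bins=2)
presence_only = empirical_variogram(points, [1, 1, 1, 1], bin_width=1.5, n_bins=2)
print(binary)
print(presence_only)
```

With presence-only values every pairwise difference is zero, so the semivariance is zero in every bin and carries no spatial information. Once pseudo-absences (or gridded counts) are added, the semivariance varies with distance and a model can be fitted to it.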
Relevant answer
Answer
Thanks for your interest. Look at my latest paper about dengue prevalence 2020. I have explained the method very well. Let me know if you need help with your data. You can contact me at asad06@gmail.com.
Cheers,
  • asked a question related to Species Distribution Modeling
Question
4 answers
As a researcher with about 15 years of experience, I have learned from submitting articles to various national and international journals that macro-level studies are given preference over micro-level studies, especially in international journals with high impact factors. In my view, micro-level studies with proper field surveys yield better empirical data and more realistic information. Most macro-level studies follow a deductive approach, that is, without field visits. Researchers carrying out such studies at the country and global level rely on available data and literature reviews. More specifically, in species distribution modelling with MaxEnt, macro-level studies are based on occurrence data from GBIF or other databases, collected in different time periods. Such data are old, and by the time of modelling the species might have gone extinct in the locality. By contrast, in a micro-level study it is possible to visit the field and collect real occurrence points, and findings based on such points are more valid. I am a field worker and believe in direct visual observation of the species in the field, with the modelling carried out on that basis. In the past, my articles were rejected by many reputed international journals on the grounds that the study was conducted over a small area. But I am not discouraged and am continuing my efforts with the same momentum. In making these comments I am not trying to argue that macro-level deductive studies are not significant. In fact, such studies have very high impact and good findings too. However, micro-level inductive studies should also be given priority for publication in good journals. After all, both approaches are useful in research.
I invite the valuable opinions of researchers on this pertinent discussion.
Relevant answer
Answer
In fact, macro research serves the most developed centers and the micro only if they are related to them, which would lead us to think that the current shape of the economic world is replicated in science.
  • asked a question related to Species Distribution Modeling
Question
17 answers
I work with stream fish species and have used Maxent to model species distributions within stream networks. My workflow has been to use stream segments (e.g., NHDplusV2 polyline dataset) as my base layer for modeling, linking covariates to these segments, and using the Maxent samples-with-data (SWD) approach to run models within the Maxent java applet. In this way, I have not needed, nor used, rasters to characterize my covariates (i.e., think of the stream segments as my model grain, or the "pixels" in a raster).
Although this approach has worked fine in the past, I'm now finding myself unable to adopt many of the new approaches for evaluation of model complexity and fit (e.g., calculation of AICc) that are being employed in several R packages (e.g., ENMeval and MaxentVariableSelection).
In R, I can run my Maxent models with the 'dismo' package using a simplified SWD format (one data.frame with all covariate data, and a vector indicating 0 (background) or 1 (present) for each row). However, all implementations of an AICc calculation I've come across involve the use of raster files, including the packages ENMeval, MaxentVariableSelection, and rmaxent.
Any suggestions on how to move forward?
Thank you,
Andrew
Relevant answer
Answer
Jen Tinsman - Confirmed: ENMeval v2.0 will have SWD functionality and some new features for plotting MESS maps of occurrence partitions.
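Until such functionality is available, AICc can in principle be computed from a table of raw Maxent predictions without any raster. The sketch below rests on a loud assumption: the background rows plus presence rows stand in for the landscape, so raw values are standardized to sum to 1 over all rows before the log-likelihood is taken at the presence rows. The parameter count is taken to be the number of non-zero model coefficients. The function name and the example numbers are hypothetical.

```python
import math


def aicc_from_table(raw_values, labels, n_params):
    """AICc from raw predictions for background (0) and presence (1) rows.

    Assumption: the rows supplied represent the whole landscape, so raw
    values are rescaled to sum to 1 across all rows.
    Returns None when the small-sample correction is undefined.
    """
    total = sum(raw_values)
    presence = [v / total for v, y in zip(raw_values, labels) if y == 1]
    n = len(presence)
    if n - n_params - 1 <= 0:
        return None
    log_lik = sum(math.log(v) for v in presence)
    return (2 * n_params - 2 * log_lik
            + (2 * n_params * (n_params + 1)) / (n - n_params - 1))


# Made-up raw outputs: four background rows, then four presence rows.
raw = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
lab = [0, 0, 0, 0, 1, 1, 1, 1]
print(aicc_from_table(raw, lab, n_params=1))
print(aicc_from_table(raw, lab, n_params=3))  # too many parameters for 4 presences
```

Because the standardization depends on which rows are treated as the landscape, AICc values computed this way are only comparable among models evaluated on the same set of stream segments.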