Questions related to Species Distribution Modeling
I am trying to run an SDM for present and future conditions in Maxent, but the output map does not match my study area and is missing a block of pixels near the top. Note that I set my study area shapefile in the 'processing extent' and 'raster analysis' sections while extracting the layers, and all variables were processed at the same extent and cell size for the Maxent analysis.
I have attached my study area and the present and future distributions for your convenience.
What could be the possible solution?
I am familiar with the 'sdm' package for constructing species distribution models (SDMs), but I am now facing an issue.
A few weeks ago I used predict() from 'dismo' to predict a species' distribution and it ran smoothly without taking much time, barely 3-5 minutes. Now it has been running since morning (8+ hours) on the same data, and I am still waiting for the result. I have to prepare SDMs for 10 different species; if a single prediction takes this long, I will have to wait many days, which is frustrating.
How can I fix this? Can anyone share their thoughts, or suggest an alternative that saves time?
PS: PC configuration: 8 GB RAM, AMD processor
I want to download BioClim data in the generic grid format (.bil) to use in species distribution modeling.
I'm working with a multiband raster and I want to extract each band into a single-band raster. I tried two approaches, using R (raster) and QGIS (gdal_translate).
I noticed that the output file from QGIS is around 25 MB, while the output file from R is around 2 MB. The original multiband raster is around 490 MB with 19 bands. This led me to think that the QGIS output is the more reasonable one to use. Note that I will use the bands for SDM.
Is the R output still usable for this purpose? Can you also explain the difference in file sizes?
I have no idea, because I'm not sure what kind of correlation it is (e.g. Pearson or something else).
I am currently using the Hmsc package to apply JSDMs to freshwater fish and mussel communities. Above all, I want to study the correlations between the species. Is it correct to identify a correlation as positive or negative with at least 85% posterior probability, or should I increase this to 90/95%?
I am trying to find out the earthworm species diversity in different states of India. Does anyone have data or ideas regarding earthworm species diversity in India?
Since our target species is found only within a 2 km region of the study site, we are planning to use 30 m spatial resolution climate data in our species distribution model. The problem is that my local weather station can only provide data at 20 km resolution, and WorldClim data is at 1 km.
My questions are
1. Can I use these downscaled data (from 1 km or 20 km) in my local SDM study, which will be at 30 m resolution?
2. If I downscale, will there be any change in the climate data? Is it acceptable to do so?
Please note that I'm new to this field.
Thank you for your valuable time.
My study site is relatively small and the targeted species is found as continuous patches. Do I need to consider Patch size/area in the MaxEnt model?
Does patch size have any meaningful measurable values that can be included in the MaxEnt model?
Currently, five models of future projections are available on CHELSA. How do I choose the best three of them? Are there any parameters that should be preferred when performing SDM in MaxEnt (bioclim data)?
Two parameters that I think may affect the reliability of the MaxEnt output are ECS (equilibrium climate sensitivity) and TCR (transient climate response), but I am not completely sure about this.
Any kind of help and suggestion would be greatly appreciated.
I am trying to choose models for my species distribution analyses. I know that, according to CMIP6, up to 100 models should be released; however, I found only 68 in Meehl et al. 2020. Moreover, there are only five models available to download from the CHELSA database. I wonder how I can prefer one model over another. According to ECS (equilibrium climate sensitivity) or TCR (transient climate response)? If so, which one is crucial in choosing the model?
I am currently creating a species distribution model with Maxent in R. The problem I am facing is that the habitat suitability prediction is always different: when I run the model twice with the same configuration, I get two different maps. That is because I use random background points (there is no sample bias in my data, so no need to correct for that), and these are of course sampled randomly each time.
How should I handle this? Should I create ten maps and average the habitat suitability values, or should I evaluate the model each time and choose the run with the highest evaluation metric score?
Pointers to literature on this would be welcome as well.
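One common treatment is to run several replicates and map the per-pixel mean, with the per-pixel standard deviation as an uncertainty layer. A minimal sketch of the averaging step in Python/NumPy, with small random grids standing in for the real suitability rasters:

```python
import numpy as np

rng = np.random.default_rng(42)

# Ten replicate suitability "maps" (random 4x4 grids standing in for rasters)
replicates = [rng.random((4, 4)) for _ in range(10)]

stack = np.stack(replicates)     # shape (10, 4, 4): replicate, row, col
mean_map = stack.mean(axis=0)    # per-pixel average suitability
sd_map = stack.std(axis=0)       # per-pixel spread across replicates
```

The same arithmetic applies to real rasters once they are loaded as arrays; the standard deviation map shows where the background-point randomness actually matters.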
I usually download future climate data from Worldclim.org.
Their website says that "Data at 30-seconds spatial resolution is expected to be available by the end of March 2020"; however, this has not materialized: https://www.worldclim.org/data/cmip6/cmip6climate.html
Does anyone know of alternative sources to download future data at this (1km) resolution?
We are at the beginning of building a predictive model of an invasive plant species using MaxEnt. The species occurs as a patch over the study area. I am new to this model and have only limited knowledge of it. I have reviewed several papers in which only point locations of present occurrences were used.
Since my target species occurs as a patch, how can I use the polygonal area where the species occurs instead of point location data?
Or is there another method to cover the whole patch of the species in the SDM?
I used MaxEnt to carry out SDM for 16 species across 26 environmental variables. I did 10 replicates for most species and more replicates for some due to the small sample size. To show my results I basically replicated the plots that MaxEnt makes, using the text files to draw the response curves, but with no confidence intervals so my figure wouldn't be too overcrowded. Now I want to make the same figures but with the confidence intervals to put in the appendix, but I can't find the data to draw the confidence intervals. Is it possible to obtain this data somehow, or do I have to calculate the confidence intervals myself if I want to make my own figures? Has anyone been in the same situation before?
Thank you :)
I'm trying to run a species distribution model with the dismo package in R and I would like to get better response curves for my variables.
This is what I've done, but the resulting curves are rather rough.
me <- maxent(variab, ab)  # variab: predictors, ab: presence points
response(me)              # plot the response curves
How can I improve the result, for example by getting smoother response curves?
Hello! I am currently building GARP species distribution models. However, I can't find a clear paper about the model's assumptions regarding the input data, i.e. what they should be or how they should behave.
For example, is it necessary for the environmental data to be normally distributed, or for the data to follow a certain function (e.g. logistic or linear)?
Thank you for your answers!
Based on the evaluations we have made among our colleagues on this subject and our own inquiries, another requirement has emerged: there is a lack of standardization in the numbers used by the world's herbaria as plant type codes. For example, a specimen of a species collected from Turkey and stored in the Geneva (G) herbarium has a different code in other herbaria. For this reason, species should be presented with herbarium codes to which country-of-origin codes are added, or with some other digitization and coding system. In this way the origin is indicated and the collected plants can even be classified. What do you think about it?
"TUR-G 125" instead of "G 125"
Country codes are given below:
I would like to project some species distribution models to the LGM. For (bio-)climatic variables alone this is straightforward, and it is still reasonable for topographic variables based on a paleo-DEM. However, I could not find a global dataset (raster: *.tif, *.afd, etc.) of paleo forest cover in percentage.
Please, let me know if you are aware of such data. Alternatively, I'd like to hear your opinions on how to model proportional LGM forest cover. For the latter, I would like to know which variables and which algorithms (ANN?) you would suggest to model forest cover.
Hi, I am not really sure if this question is valid or makes any sense.
But for example we have a single (imaginary) species, let's say Pikapika pii, and determined its genetic diversity. PCA and STRUCTURE clustering showed three groups, GRP1, GRP2, and GRP3.
My question is: can I treat these three groups as "separate species" and use them to run a multispecies SDM, or run an ensemble of single-species SDMs, or is this not valid/possible at all?
I would appreciate any help/correction with this thought. If possible, you can also refer publications that I can read, or experts that I can directly consult/talk with.
Thank you so much for your time and help.
We have a locally endemic plant species that is distributed over just one specific, narrow area. It spreads almost everywhere within that area, but it does not occur anywhere outside it. We want to model the distribution of the species with Maxent. The bedrock across the distribution area is the same everywhere and the elevation variation is really low. Would it be right to produce artificial presence data and model it by placing artificial (random) sample points across the area?
I am trying to build my own species distribution models (SDMs). I have seen that some authors use, say, 100 replications, but I do not understand how they get to this number. Hence, my question is: when it comes to the replications, how is that number decided? Is there a rule of thumb?
Thank you a lot in advance!!
In many study areas, both line transect survey data and camera trap data are available. How to combine them in a species distribution model is what I care about. Line transect surveys cover large areas at low survey intensity, whereas camera trap surveys cover small areas around the clock. Suggestions for appropriately weighting these two types of data are highly appreciated.
In Maxent model evaluation, the output is assessed using the AUC value: if the AUC is above 0.7 and close to 1, the result is considered good. But if we rule out the AUC value for evaluation, what other methods are there for selecting the best result? Using different variable sets for a single species, I ran Maxent and obtained four (4) outputs. How do I evaluate, compare, and select the best result for my study other than by AUC?
I am a new user of R and maxent.jar. I generated some species distribution models with maxent.jar, but I am having trouble evaluating these models with an R package. The main difficulty is that I can't get R to read the models generated by maxent.jar, whether the file extension is .asc, .tif, or even .img. Maybe my idea is naive, but what I want is an R package that can read my models and let me run evaluations such as AICc, Kappa, TSS, CCR, AUC, and so on.
The R package I am using now is ENMeval 2.0 by Jamie M. Kass (it is a good package, but maybe it is not easy for a new user of R and Maxent, or I am too inexperienced to use it).
Please, I need your help. Thanks.
Since it is recommended to check for collinearity among the variables, one has to remove highly correlated variables before running an SDM (I am using MaxEnt). For my study, I calculated the Pearson correlation coefficient (r) among the variables (the correlation matrix is provided). As I am new to this, I am finding it hard to interpret the correlation matrix: how, and on what basis, do I remove variables? (I am using a threshold of |r| ≥ 0.8.) I need some expert suggestions.
Q1. How are the variables chosen? Based on the provided table, which variables should I select for my study?
Q2. When two variables are highly correlated, how do I decide which one to keep?
Q3. Is a negative correlation not a problem? I ask because I have seen a few papers where highly negatively correlated variables were both retained.
Please help me.
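A common heuristic for this is greedy pairwise elimination: find the pair with the largest |r| at or above the threshold and drop the member that is, on average, more correlated with everything else (this is essentially what caret::findCorrelation does in R). A sketch in Python/pandas with made-up data; the names bio1..bio4 are placeholders:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy "environmental variables": bio2 is nearly a copy of bio1
n = 200
bio1 = rng.normal(size=n)
df = pd.DataFrame({
    "bio1": bio1,
    "bio2": bio1 * 0.98 + rng.normal(scale=0.05, size=n),  # |r| with bio1 near 1
    "bio3": rng.normal(size=n),
    "bio4": rng.normal(size=n),
})

def drop_correlated(df, threshold=0.8):
    """Greedy removal: while any pair has |r| >= threshold, drop the member
    of the worst pair with the larger mean |r| against all other variables."""
    keep = list(df.columns)
    while True:
        corr = df[keep].corr().abs().to_numpy()
        np.fill_diagonal(corr, 0.0)
        if corr.max() < threshold:
            return keep
        i, j = np.unravel_index(corr.argmax(), corr.shape)
        drop = keep[i] if corr[i].mean() >= corr[j].mean() else keep[j]
        keep.remove(drop)

retained = drop_correlated(df, threshold=0.8)
```

Note that the test uses |r|, so a strong negative correlation is treated exactly like a strong positive one: the sign only tells you the direction of the redundancy, not whether it is a problem.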
I am not a spatial analyst or an expert in a related field, which is why I am asking for advice on extracting the maximum information from the data I have.
Data (2 trials): routes crossing the study area (about 1250 sq. km), with about 1000 points along the lines, each representing a species occurrence event, plus a column with the number of occurrences (z data) attached to the x,y coordinates.
Which exploratory data analysis tools should I try in order to describe the pattern? I am considering visualising the standard distance circle (centrography) and a box plot of the events, for example. Maybe kernel density? And what can I do with the z data, perhaps interpolation? It would be easier to work with transects, but I don't know what to do with this kind of irregularly shaped data. Can I run tests or make predictions with such data, or are the samples too small and unrepresentative? For example, I have a hypothesis that the distribution of events is not random, and that the binomial probability and number (z) of events are higher in the central-southern part because of certain factors.
I look forward to your suggestions on which tools and tests I can use and which concepts I should learn about.
Right now I am using R and QGIS for visualisation and analysis.
I also apologize if my English is confusing.
I have seen Maxent modelling applied to a range of distribution types, from small to large scale. Is it effective to use Maxent for small areas like protected areas and national parks (<1000 km2)? In my opinion, macroscale covariates like the bioclimatic variables have little effect over small ranges; therefore, using finer-scale covariates that actually vary within the study area should improve the model. I am looking for ideas and suggestions.
#1. Background: I am trying to work out habitat suitability modeling for a tree species endemic to the Western Ghats. This query concerns a confusion that arose when doing General Niche-Environment System Factor Analysis (GNESFA).
#2. Main query: While doing GNESFA, I got some strange results when plotting the niche on the factorial axes (please see the attached figures). Extremely dark circles are forming in the scatter plot. Can anyone please help me find out why this is happening?
#3. Attachments: The final result and the specific plot of the niche on the factorial axes are attached.
Thank you very much for all your patient reading and time. Hoping for a positive response.
I am currently searching for alternatives to the Neotoma database, which did not offer the amount of data I need for the Mediterranean. Does anyone know of alternative databases to Neotoma where I can find geographically referenced pollen data for the Mediterranean (e.g. universities, collaborative research centres, etc.)? I would appreciate any hint.
I am looking for a tool in ArcGIS to help in identifying priority areas for conservation based on the output of species distribution models (maxent, GLM, BRT,..). Something maybe similar to the zonation algorithm.
I've been doing MaxEnt modelling for a while now using bioclimatic and topographic variables. I have obtained a 'water line systems' layer of our country as a shapefile; however, I haven't tried to incorporate this 'distance to water/water systems' as an input for the modelling, since I am not sure what format this variable should be in, nor how to prepare the data. Does anyone know how to incorporate this file into MaxEnt modelling?
I am wondering if anyone knows of a species distribution model that can take in percentage data (i.e. presence data with prevalence values attached to them)? The dataset I have right now has many occurrence points of a species, with each point containing information on the prevalence of the species at that particular location. However, the current SDMs (e.g. MaxEnt, GAM etc.) that I am using only allow me to input all these as presence data and treats all presence data equally in predicting the potential distribution of the species of interest. I would like to run a model that can take in the prevalence of each occurrence point and weight the points accordingly to produce the final results. I would greatly appreciate if anyone has any information on this, or if there's an alternative method that I can use to account for percentage data for each occurrence point. Thank you so much in advance!
What is the significance of carrying out species distribution modelling or habitat suitability modelling for a tree species with very limited occurrence (under ~10,000 trees)? Other than points like:
1) It will help in locating unknown populations based on the probable species distribution map (back to field studies).
2) Based on the probable location map, we can try introducing the plant to suitable locations.
1) When comparing bioclimatic variables for species distribution modelling, which are more relevant when working on a narrowly endemic tree species (regional study)?
2) How suitable are the BIOCLIM variables for a regional-level study?
3) Are there other methodologies for comparing variables when doing distribution modelling?
I am working on species distribution modelling of an endemic forest tree. After completing the modelling, how can I use the output to trace back or find new populations in the study area? Is there an established pipeline or work plan for this?
Specifically, I am looking for ways to start searching for actual population clusters of the tree based on the predicted map that we generate.
Has anyone run into an error with parallel calculation? See below.
R Version: R version 4.0.3
snowfall 1.84-6.1 initialized (using snow 0.4-3): parallel execution on 5 CPUs.
Library biomod2 loaded.
Library biomod2 loaded in cluster.
Error in checkForRemoteErrors(val):
5 nodes produced errors; first error: arguments imply differing number of rows: 5
Thank you very much!
I am running MaxEnt with 15 replicates.
In the MaxEnt output folder there are sample prediction and background prediction CSV files numbered 0 to 14.
I want to know which replicate has sample prediction and background prediction values closest to the average.
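As far as I know, Maxent's "average" output is computed cell-by-cell across replicates rather than being one of the replicates, so no single CSV file is "the average". If you want the one replicate that sits closest to the average, you can compare each replicate's predictions to the element-wise mean, e.g. by RMSE. A sketch with NumPy and made-up prediction vectors:

```python
import numpy as np

rng = np.random.default_rng(5)

# 15 replicates of (hypothetical) sample predictions, 30 values each,
# standing in for the values read from the 15 samplePredictions CSVs
reps = rng.random((15, 30))

mean_pred = reps.mean(axis=0)                           # element-wise average
rmse = np.sqrt(((reps - mean_pred) ** 2).mean(axis=1))  # distance of each replicate
closest = int(rmse.argmin())                            # replicate nearest the mean
```

The same comparison can be run on the background prediction files, or on the replicate rasters themselves.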
I am running MaxEnt modeling for my target species' distribution. I need to select the least correlated variables to avoid multicollinearity. Multicollinearity can be tested in various ways, such as Pearson's correlation coefficient (r), the variance inflation factor (VIF), and principal component analysis (PCA). Among these, I find PCA a bit difficult to understand. My questions are:
1. Can I go with any one of these methods to check collinearity?
2. Which of the tests is best, if any?
3. Will it be okay if I only use Pearson's correlation coefficient (r)? Will that make my results and interpretation sufficient?
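As a worked example of the VIF option: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing variable j on all the other variables; values above roughly 10 (some authors use 5) are commonly taken to flag problematic collinearity. A self-contained sketch in Python/NumPy with toy data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy predictor matrix: column 1 nearly duplicates column 0, column 2 is independent
n = 300
x1 = rng.normal(size=n)
X = np.column_stack([x1,
                     x1 + rng.normal(scale=0.1, size=n),
                     rng.normal(size=n)])

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from an OLS regression of
    column j on the remaining columns (plus an intercept)."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

vifs = vif(X)  # first two values large (collinear pair), third near 1
```

Pearson's r only catches pairwise redundancy, while VIF also catches a variable that is predictable from a combination of several others, which is why many studies report both.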
Hi, I am currently undertaking a species distribution modelling project using Maxent and RCP predictions.
However, I am encountering an issue: when I enter the required number of replicates (5), Maxent completes the first replicate (labelled 0) and then stops, displaying an error saying layer bio 1 is missing, and fails to produce the further replicates. Any help would be much appreciated.
The field of species distribution modelling has experienced fast growth in the last decade. With so many R packages available (sdm, dismo, ENMeval, BIOMOD, SDMtune, ssdm, esdm, ENMTML, ...), it is difficult to find the "best" approach. Which is the most comprehensive platform to fit, evaluate, and project/predict species distributions across space and time, in addition to assessing variable importance and response curves?
Hello. Can someone show me how to build an SDM with Maxent for a future distribution? Are there specific options for future distributions, or is the process exactly the same as for a current distribution?
I already have the present occurrences and the future climate database.
Hello Maxent Community,
I have been generating SDM models and projections in Maxent. I have had some success with one set of variables (some HydroSHEDS variables mixed with WorldClim variables); however, I am having a bit of trouble with a model based solely on WorldClim variables.
I used a Pearson correlation matrix to select eight WorldClim variables, which model well with my species' occurrence data. The issue arises when I ask Maxent to project onto future WorldClim variables: the returned projection is almost entirely blue. There are a few specks of red and yellow (indicating prediction probabilities), but the vast majority of the projection is blue (indicating zero prediction probability). I tried rerunning the model with clamping turned off, but the problem persisted.
I am relatively new to Maxent, so I was wondering if I am overlooking a setting that may help correct this issue. I am a bit confused, because when I ran the projection with the HydroSHEDS variables and some of the same WorldClim variables, the future projection looked just fine. Any advice or suggestions would be tremendously appreciated.
Thank you in advance for your help!
Hi, guys. Thanks for reading this question.
I am modeling a reactor (steady, RANS). CH3OH and O2 enter the reactor via an inlet (a jet). After reacting, species leave the computational domain via an outlet. I turn on the species model and define the reaction between CH3OH and O2 as a two-step reaction:
CH3OH + O2 = CO + 2 H2O ; CO + 1/2 O2 = CO2
I choose the eddy dissipation model and run the simulation. The temperature field looks reasonable (see the attached picture), but the mass flow rate (kg/s) of atom 'C' at the outlet is almost 3 times larger than at the inlet. The mass of atom 'C' is not conserved!
Then I use a one-step reaction instead of the two-step one, with all other settings unchanged. I check the result and find that the mass flow rate (kg/s) of atom 'C' at the outlet is the same as at the inlet. The mass of atom 'C' is conserved this time!
It is strange. Why does the two-step reaction lead to non-conservation of atoms? I am sure the stoichiometric coefficients are balanced. I would be very grateful if someone could discuss this problem.
Do you know of a reliable Python library for species distribution modeling that provides methods for creating absence data?
I've come across multiple R packages like "dismo" and "biomod2", but I've found nothing for Python.
I'm not familiar with the algorithms provided in the R packages I mentioned above: are those algorithms (like SVM or BRT) somehow modified for this type of modeling? And is it possible to use Python machine learning libraries like scikit-learn for this purpose?
Thanks for your attention.
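To my knowledge there is no Python package as mature as dismo or biomod2, but the underlying algorithms are ordinary classifiers and regressors applied to presence vs (pseudo-)absence data, so scikit-learn (SVM, gradient boosting for BRT, random forest, logistic regression) can cover the modeling step without modification. The background/pseudo-absence generation can be simple rejection sampling; a sketch with NumPy, where the grid and presence cells are made up:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical presence cells on a 100x100 raster grid (row, col indices)
presences = {(10, 12), (55, 60), (80, 3)}

def sample_background(n, shape, exclude, rng):
    """Draw n distinct random background ('pseudo-absence') cells,
    skipping cells that already hold a presence record."""
    points = set()
    while len(points) < n:
        cell = (int(rng.integers(shape[0])), int(rng.integers(shape[1])))
        if cell not in exclude:
            points.add(cell)
    return sorted(points)

background = sample_background(500, (100, 100), presences, rng)
```

With presences labelled 1 and these cells labelled 0, the environmental values at both sets of cells form a standard (X, y) training table for any scikit-learn estimator.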
I need some clarifications on these things:
- What are the most important differences to consider between the various algorithms used in SDM?
- Which machine learning model is most widely used in SDM?
I am currently in the process of trying to learn how to do SDM correctly and I want to evaluate the models I run with OpenModeller. My understanding is that one should use om_test, but I am having trouble understanding exactly how to use it. Could anyone help me out on how to use it?
From my understanding, one runs om_test with the serialized model created by om_console, not with the projection result. However, I thought that what one should actually be interested in evaluating is the projection result, not the model created to make the projection. Is my understanding of what it means to evaluate a model wrong? Hopefully my question makes sense. Any help would be greatly appreciated.
Hi everyone, I need to clarify a few things:
01. Is there any difference between the column titled "10 percentile training presence logistic threshold" in maxentResults.csv and the "10 percentile training presence" in the .html output file?
02. When delineating unsuitable and suitable habitat, can the "10 percentile training presence" be used instead of the "10 percentile training presence logistic threshold"? If yes, which one is more suitable?
I am working on species distribution models. I took the data from GBIF and DIVA, but the resolution of the two data sources is different. I am using QGIS (I don't have ArcGIS). Please suggest how I can bring the data to the same resolution and dimensions.
Hi guys, I would love to know how to go about solving the above question. I currently have two different species distribution models, one created with the current climate and the other using a future climate projection. I already know how to use R to calculate the three niche overlap statistics (Schoener's D / Warren's I / rank correlation) using the ENMTools package. Now I would love to find out how to get an absolute value of the average change in habitat suitability between the SDMs (per pixel, I guess). I know I could potentially use ArcMap's Raster Calculator and Zonal Statistics as Table functions to find this change, but is there a shorter way to go about it?
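If both rasters share the same extent and cell size, the average change is just per-pixel arithmetic, so no zonal-statistics detour is needed (in R, something like cellStats(abs(future - current), mean) with the raster package should do it in one line). A NumPy sketch with arrays standing in for the two suitability maps:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-ins for the current and future suitability rasters (same grid)
current = rng.random((50, 50))
future = np.clip(current + rng.normal(scale=0.1, size=(50, 50)), 0, 1)

diff = future - current                       # per-pixel change (signed)
mean_abs_change = float(np.abs(diff).mean())  # average absolute change
mean_change = float(diff.mean())              # net direction of change
```

Reporting both values is useful: the absolute mean measures how much the map changes overall, while the signed mean shows whether suitability is gained or lost on balance.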
I'm working on a species distribution model of anteaters in Maxent. I want to project the model with CanESM5 SSP370 bioclimatic variables. To do this, I first need to convert this raster to an ASCII file. I tried to do it in QGIS, but the program only converts single-band rasters. Does somebody know another way to convert this raster, or perhaps a way to separate each band from the raster so I can work on each one separately?
Thank you all beforehand.
I am proposing a hierarchical approach in which coarse-resolution variables are used to run a Maxent SDM and delineate a presence/absence map for my species (a tropical conifer tree species). After that, higher-resolution variables derived from LiDAR data would be used to generate models within these areas. Can anyone foresee potential problems with this approach, or does anyone have better ideas?
Background: The VIF-stepwise test deals with multicollinearity by automatically eliminating highly correlated variables according to a chosen threshold. The problem is that, when using the 19 bioclim variables, some are derived from original variables like bio1, bio2, and bio12, and this affects the elimination process: since they are the source variables, they are highly correlated with the derived ones and are therefore excluded first. VIF-step does not distinguish between original variables (important ones, in my opinion) and derived variables.
PCA, in turn, assesses collinearity pairwise with no self-eliminating option, so the user must choose which variables to retain, and this requires a defense later of why one variable was chosen over another. I thought of a third approach: using PCA together with the jackknife results to decide which variables to retain. This would allow me to keep some original variables and rely on their jackknife results in the defense.
My main questions are: is the third approach scientifically valid, and is it more robust than the VIF-stepwise test?
Side question: how can we defend the variables chosen using PCA alone?
I'm creating an SDM for a species endemic to one country. The country has different areas where climate, elevation, and LULC differ. Is there a tool or way to indicate the most limiting factor(s) for each area in that country? I don't have enough data in each area to build separate Maxent models; I can obtain only one model for the whole country. So, is there a helpful tool in ArcGIS, an option in Maxent, or any known program?
I'm trying to create current and future SDMs using Maxent for a species native to one area. How would the results differ between training the model on that area and then projecting it worldwide, versus training the model worldwide from the very start and then projecting worldwide too?
Monthly values of minimum temperature, maximum temperature, and precipitation were processed for nine global climate models (GCMs):
BCC-CSM2-MR, CNRM-CM6-1, CNRM-ESM2-1, CanESM5, GFDL-ESM4, IPSL-CM6A-LR, MIROC-ES2L, MIROC6, and MRI-ESM2-0, and for four Shared Socio-economic Pathways (SSPs): 126, 245, 370, and 585. My confusion is which one would be most accurate for building niche models under tropical climatic conditions. Or do I need to use another GCM?
I used the ENMeval R package to produce my species distribution model with the maxent.jar algorithm. I want to calculate the TSS for the binary map generated from the maxSSS threshold; however, I am unsure how to obtain the sensitivity and specificity needed to calculate the TSS score for my model.
How do I calculate the TSS from what I have on hand in the ENMeval output for my model?
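Once the map is thresholded, TSS is just sensitivity + specificity - 1 computed from the confusion matrix: sensitivity is the true-positive rate on the test presences, and specificity the true-negative rate on the (pseudo-)absences or background (using background as "absence" is a common, though debated, shortcut for presence-background models). A worked example with made-up observed and predicted labels:

```python
import numpy as np

# Hypothetical observed presence/absence and thresholded (binary) predictions
obs  = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

tp = int(np.sum((obs == 1) & (pred == 1)))  # presences predicted present
fn = int(np.sum((obs == 1) & (pred == 0)))  # presences predicted absent
tn = int(np.sum((obs == 0) & (pred == 0)))  # absences predicted absent
fp = int(np.sum((obs == 0) & (pred == 1)))  # absences predicted present

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
tss = sensitivity + specificity - 1
```

So the only inputs needed from the ENMeval side are the binary prediction at each test presence and background point.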
Clustered/random data are very common in data analysis. For example, if I want to model the occurrence (presence/absence) of a species over multiple countries, I could treat the countries as clustered/random effects. In theory I could use a binomial GLMM. However, the structure of the dataset, and the performance and information resulting from such a model, are not satisfactory and mostly do not fit my questions. The non-linear responses, high variability of the data, (randomly) missing predictor values, categorical predictors, and unbalanced dataset make it even more challenging. Because of this I often use random forest (RF) models. Although it is sometimes suggested that RF models are black boxes, this is hardly the case: the variable importance measures, partial dependence plots, and extraction of split points at the root node, depth, and number of splits per predictor make it a complex white-box model. These results also fit most of the questions I ask. One could suggest using a GAMM, but there are so many knobs to tweak on those models that I do not feel confident and comfortable using them.
To handle the missing values, categorical data, and unbalanced datasets, I use the randomForest package for R (Liaw and Wiener, 2002). The randomForest package can impute the median for missing values and stratify (downsample) the data in unbalanced datasets, which makes it well suited to the data I work with. The stratification of the data is key, as is the imputation of the median. However, a drawback is that the randomForest package cannot take clustered/random effects into account, which ends up as a discussion point in basically every analysis.
There are some scientific publications on MERFs (e.g. Hajjem et al. 2014) and R packages available (e.g. MixRF). However, from the package manuals it does not seem they can impute the median or stratify the data. I do not want to lose a lot of my data by balancing my datasets before analysis, and I do not want to lose information by removing incomplete samples.
Is there any news on an R package implementing RF models that can handle all of these things? Or is there a suggestion for other types of models in R that can return information similar to the RF models and are (so to say) as user-friendly as the randomForest package?
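I am not aware of a package that does all three at once; as a stopgap, the imputation and stratification steps can be done as preprocessing before fitting whatever mixed-effects RF is chosen. A Python/pandas sketch of those two steps (median imputation, which is what randomForest::na.roughfix does for numeric columns, and per-class downsampling); the data are entirely made up:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)

# Unbalanced toy dataset with missing predictor values
n = 1000
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "present": (rng.random(n) < 0.1).astype(int),  # ~10% presences
})
df.loc[rng.choice(n, 50, replace=False), "x1"] = np.nan  # introduce missing values

# 1) Median imputation of the missing predictor values
df["x1"] = df["x1"].fillna(df["x1"].median())

# 2) Stratified downsampling: keep as many absences as presences
n_pres = int(df["present"].sum())
balanced = pd.concat([
    df[df["present"] == 1],
    df[df["present"] == 0].sample(n_pres, random_state=0),
])
```

Downsampling per bootstrap sample (as randomForest's sampsize/strata do) is statistically nicer than a single downsample like this, since each tree then sees a different subset of the majority class, but as a one-off preprocessing step it keeps any MERF implementation usable.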
Thank you in advance,
Liaw, A., Wiener, M., 2002. Classification and Regression by randomForest. R News 2, 18–22.
Ahlem Hajjem, François Bellavance & Denis Larocque (2014) Mixed-effects random forest for clustered data, Journal of Statistical Computation and Simulation, 84:6, 1313-1328, DOI: 10.1080/00949655.2012.741599
Hello all. I have a technical question regarding species distribution models for intertidal species.
The main issue is that, although the species in question are associated with intertidal regions, I run into a problem:
- The Bio-ORACLE variables have no values at the occurrence coordinates of the species (from GBIF). I think that the coordinates, although correct and placed in intertidal regions, are not considered "ocean" or "marine" by Bio-ORACLE?
Has anyone encountered such a situation? I am not keen on losing occurrence points, since the ones with NaN environmental values make up a large share of the total presences.
Thank you in advance for any advice.
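One common workaround for coastal NaN cells like those described above is to re-extract the values of NA points using a small buffer, so each shore point takes the mean of the nearest marine cells instead of being discarded. A minimal sketch with the raster package (file and column names are hypothetical; substitute your own layers and GBIF export):

```r
library(raster)

# Hypothetical inputs: one BioOracle layer and a GBIF point table
env <- raster("biooracle_sst.tif")
pts <- read.csv("occurrences.csv")   # columns: lon, lat
xy  <- pts[, c("lon", "lat")]

vals <- extract(env, xy)             # plain extraction: NA on "land" cells

# For points that fell on an NA cell, average the marine cells within a
# 10 km buffer (buffer is in metres for lon/lat rasters).
na_idx <- is.na(vals)
vals[na_idx] <- extract(env, xy[na_idx, , drop = FALSE],
                        buffer = 10000, fun = mean, na.rm = TRUE)
```

Whether a 10 km buffer is ecologically defensible depends on the species and the layer resolution; the point is only that buffered extraction rescues intertidal records without moving the coordinates themselves.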
Dear Species Distribution Modelers and All,
Have you ever tried to select the specific month/quarter (e.g. the wettest quarter for bio8) in a way other than the default one? If yes, how? If not, why not? For example, in this paper a novel method (called the 'static' approach) is suggested for calculating bioclimatic variables for future time periods: the month/quarter is selected once in the reference ('current') period and then kept fixed for all studied periods.
Have you ever considered this method, or do you think it may have relevance for your further research?
Comparison of results according to variable combinations
I would like to compare SDM (species distribution model, e.g. Maxent) results across combinations of about 10 variables.
I think it will take too much work to do this one by one.
So I am looking for a package or code to do this in R or other programs in a short time.
It would be nice to compare the results with AUC values or other values.
If this is possible, I'm wondering if I can get the results as a raster (or in some format workable in GIS).
I am wondering if there is a way to express numerically which variable combination is the best, even if it is not the result of SDM.
Thanks for reading.
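One way to automate the comparison asked about above is to loop over all variable subsets with combn() and score each fitted model by AUC. A hedged sketch with 'dismo' (it assumes you already have a RasterStack 'env' of candidate predictors, presence coordinates 'occ', background points 'bg', and a working maxent.jar for dismo; all those names are placeholders):

```r
library(dismo)

vars <- names(env)                 # candidate predictor names
results <- data.frame()

for (k in 2:length(vars)) {
  for (v in combn(vars, k, simplify = FALSE)) {
    m <- maxent(env[[v]], occ)     # fit Maxent on this subset of layers
    e <- evaluate(p = occ, a = bg, model = m, x = env[[v]])
    results <- rbind(results,
                     data.frame(vars = paste(v, collapse = "+"),
                                AUC  = e@auc))
  }
}

results[order(-results$AUC), ]     # best-scoring combinations first
```

Be aware that 10 variables give over a thousand subsets, so this brute-force loop can take a long time; the ENMeval package offers a packaged alternative that also reports AICc and can return prediction rasters for GIS use.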
In our study system, using BIOMOD2, we are obtaining SDMs with high evaluation metrics (e.g. TSS > 0.8) in which all the selected variables have very low variable importance values (<0.04). Isn't this counterintuitive? Shouldn't a highly rated model have at least one variable of substantial importance?
I want to know the best method for selecting pseudo-absences among: the minimum convex polygon method, kernel density estimation, and the buffered area around the occurrence points. Any other suggestion is welcome.
Thank you in advance
I have recently developed a Maxent distribution model in R for a reptile endemic to Madagascar, using native-range presence-only data. The model performs well.
I have now projected this model onto Florida, where it identifies no suitable habitat. However, my focal taxon is well established in Florida. I did not use Florida presence data in the model because I wanted to check if the model would have a priori predicted the colonisation event.
I would like to recover quantified probability values for the known Florida occurrences; at the moment, I can see in general that the predicted probabilities in the colonised area are low, but I cannot quantify them. I can find the equivalent quantitative values for my known Madagascar occurrences (because they were included in the model) in the model output (Training points) or using evaluate in 'dismo' (Validation points). Does anybody know of a method/package/function I can use to find these values for my Florida points (currently just stored as a .csv)?
I can share more technical details of the model if necessary; it is a fairly standard Bioclim/Maxent/presence-only data in R, using mainly 'dismo'.
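For the question above, two ways to pull point-level suitabilities out of a 'dismo' workflow are (a) extracting from the projected prediction raster, or (b) calling predict() on a data.frame of covariate values at the points. A sketch, assuming a fitted maxent() model 'me', a prediction raster 'pred_fl' already projected onto Florida, a Florida environmental stack 'env_fl', and a CSV with lon/lat columns (all of these names are placeholders):

```r
library(dismo)
library(raster)

fl <- read.csv("florida.csv")      # the known Florida occurrences

# Option 1: read suitability straight off the projected raster
suit <- extract(pred_fl, fl[, c("lon", "lat")])

# Option 2: predict from the fitted model at the point locations
covs  <- extract(env_fl, fl[, c("lon", "lat")])
suit2 <- predict(me, as.data.frame(covs))   # 'me' is the maxent() model

summary(suit)                      # quantified values for the colonised range
```

Option 2 also lets you request different output formats (e.g. cloglog vs raw) via the usual Maxent arguments, which matters if you want values comparable to the Madagascar training output.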
I am doing a series of SDMs using MaxEnt for different species of bats, I am using different features and regularization values, in addition, I am using spatial partition and jackknife for cross validation.
I would like to start selecting the models according to their significance, and then continue with other metrics such as the omission rate and AIC in order to select one of the models.
I know that partial ROC exists, but I don't know whether it can be computed with the partitions I am using (spatial partition and jackknife) or only with random, non-spatial partitions.
That is the question
I want to obtain occupancy and detection probability estimates for a large vertebrate species. Would you recommend running separate models with presence-absence matrices by sex (I have robust data on the sex ratio in my study population), or using sex as a covariate (i.e. using the typical species presence-absence matrix) of occupancy/detection? The remaining covariates will come from habitat type and the species community at my study sites.
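If you go the covariate route described above, one implementation is to structure the detection histories as site-by-sex rows so that sex can enter the model formula directly. A minimal sketch with the 'unmarked' package and simulated data (the site-by-sex layout and all covariate names are assumptions for illustration, not a recommendation over separate models):

```r
library(unmarked)

# Simulated detection histories: rows are site-by-sex combinations,
# columns are repeat visits.
set.seed(7)
n <- 40; J <- 4
y <- matrix(rbinom(n * J, 1, 0.3), n, J)
site_covs <- data.frame(
  sex     = factor(rep(c("M", "F"), each = n / 2)),
  habitat = factor(sample(c("forest", "open"), n, replace = TRUE))
)

umf <- unmarkedFrameOccu(y = y, siteCovs = site_covs)

# Double right-hand-side formula: detection model first, occupancy second.
# Here sex affects detection, habitat and sex affect occupancy.
fm <- occu(~ sex ~ habitat + sex, data = umf)
summary(fm)
```

The appeal of this layout is that you get a single model with shared habitat effects and an explicit sex term you can test, rather than two models whose estimates are hard to compare formally.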
I was doing some geostatistical analysis (variogram + kriging) on presence-only data in a species distribution modelling context. When estimating the (empirical) variogram, the attribute is usually assumed to be a realization of a continuous random variable (although attributes can also occur as counts). If the attribute is just presence, with no sub-categories, then the values at all positions are identical (say 1, if we denote a presence by 1), and the variogram cannot be calculated, not even the indicator variogram. In some papers, such as [1] and references therein, a grid-based approach was used: a grid of a certain cell size (e.g. 10 x 10 m) is superimposed on the sampling area and the number of species inside each cell is counted, yielding count/frequency data. In the other approach, pseudo-absences or background data are generated using some algorithm, e.g. Maxent (see e.g. [2, 3]). The pseudo-absences are generated taking many factors into account and are stacked/combined with the actual data. This amounts to generating x, y coordinates and assigning them absence status (say 0). The result is binary data with two categories: presence (1) and absence (0).
Now the questions that are bothering me are
1. For the grid-based approach, what is the optimal cell size? How does one find and decide on it? How should one then proceed with the variogram and kriging?
2. For the pseudo-absence/background approach, how many absences should be generated relative to the actual data? How does one decide? How should one then proceed with the variogram and kriging?
1. Rossi, Richard E., et al. “Geostatistical Tools for Modeling and Interpreting Ecological Spatial Dependence.” Ecological Monographs, vol. 62, no. 2, 1992, pp. 277–314. www.jstor.org/stable/2937096.
2. Tomislav Hengl, Henk Sierdsema, Andreja Radović, Arta Dilo, Spatial prediction of species’ distributions from occurrence-only records: combining point pattern analysis, ENFA and regression-kriging, Ecological Modelling, Volume 220, Issue 24, 24 December 2009, Pages 3499-3511.
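The pseudo-absence route sketched in the question can be wired together with 'dismo' and 'gstat' roughly as follows (it assumes a RasterStack 'env' defining the study area and a two-column matrix 'occ' of presence coordinates; the 1:1 presence-to-absence ratio is just a common starting point, not the answer to question 2):

```r
library(dismo)   # randomPoints()
library(gstat)
library(sp)

set.seed(1)
bg <- randomPoints(env, n = nrow(occ))   # pseudo-absences, here 1:1

d <- data.frame(
  x    = c(occ[, 1], bg[, 1]),
  y    = c(occ[, 2], bg[, 2]),
  pres = c(rep(1, nrow(occ)), rep(0, nrow(bg)))
)
coordinates(d) <- ~ x + y

# Indicator variogram of the binary presence/absence field
v   <- variogram(pres ~ 1, d)
fit <- fit.variogram(v, vgm("Sph"))
plot(v, fit)

# Indicator kriging would follow on a prediction grid 'grd':
# ik <- krige(pres ~ 1, d, grd, model = fit)
```

Once the data are binary, the indicator variogram is well defined, so the sensitivity of the fitted range and sill to the absence ratio can itself be explored by rerunning the block with different `n`.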
As a researcher with about 15 years of experience, I have learned from my submissions to various national and international journals that macro-level studies are given preference over micro-level studies, especially in high-impact international journals. In my view, micro-level studies with proper field surveys yield better empirical data and more realistic information. Most macro-level studies follow a deductive approach, i.e. without field visits; researchers carrying out such studies at the country or global level rely on available data and literature reviews.

More specifically, in species distribution modelling with MaxEnt, macro-level studies are based on occurrence data from GBIF or other databases, collected in different time periods. It is pertinent to mention that such data are old, and who knows whether, at the time of modelling, the species has already disappeared from the locality. In contrast, in a micro-level study it is possible to visit the field and collect real occurrence points, and findings based on such points are more valid.

I am a field worker and believe in direct visual confirmation of the species in the field, and I base my modelling on that. In the past my articles were rejected by many reputed international journals on the grounds that the study covered a small area. But I am not discouraged; I continue my efforts with the same momentum. By expressing all this I am not trying to prove that macro-level deductive studies are not significant; in fact, such studies have very high impact and good findings too. However, micro-level inductive studies should also be given priority for publication in good journals. After all, both approaches are useful in research.
I invite the valuable opinions of researchers on this pertinent discussion.
I work with stream fish species and have used Maxent to model species distributions within stream networks. My workflow has been to use stream segments (e.g., NHDplusV2 polyline dataset) as my base layer for modeling, linking covariates to these segments, and using the Maxent samples-with-data (SWD) approach to run models within the Maxent java applet. In this way, I have not needed, nor used, rasters to characterize my covariates (i.e., think of the stream segments as my model grain, or the "pixels" in a raster).
Although this approach has worked fine in the past, I'm now finding myself unable to adopt many of the new approaches for evaluation of model complexity and fit (e.g., calculation of AICc) that are being employed in several R packages (e.g., ENMeval and MaxentVariableSelection).
In R, I can run my Maxent models with the 'dismo' package using a simplified SWD format (one data.frame with all covariate data, plus a vector indicating 0 (background) or 1 (presence) for each row). However, all implementations of an AICc calculation I have come across, including the packages ENMeval, MaxentVariableSelection, and rmaxent, involve raster files.
Any suggestions on how to move forward?
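One possible direction for the raster-free AICc problem above: the Warren & Seifert (2011) AICc only needs Maxent's raw-format output, normalised so it sums to one, plus the number of non-zero lambdas. In an SWD workflow one could approximate the raster normalisation by normalising over the full set of background + presence predictions. This is an adaptation, not the canonical raster-based calculation, so treat it as a sketch:

```r
# 'raw_occ' and 'raw_bg' are raw-format Maxent predictions at the presence
# and background points (hypothetical vectors); 'k' is the number of
# parameters, i.e. non-zero entries in the model's .lambdas file.
aicc_swd <- function(raw_occ, raw_bg, k) {
  z    <- sum(raw_occ) + sum(raw_bg)     # normalising constant over all points
  logL <- sum(log(raw_occ / z))          # log-likelihood at the presences
  n    <- length(raw_occ)
  aic  <- 2 * k - 2 * logL
  aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
}

# Toy numbers, just to show the call (needs n > k + 1 to be finite)
aicc_swd(raw_occ = rep(0.01, 10), raw_bg = rep(0.001, 100), k = 2)
```

How closely this tracks the raster-based value depends on how representative the background sample is of the full landscape, so it is worth validating against ENMeval on at least one species where you do have rasters.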