Questions related to Ecological Statistics
I would like to perform a redundancy analysis with a response coded as a distance matrix (dissimilarity in species composition). That can be done using the #dbrda function in the #vegan package for R. However, the problem is that I have two matrices as predictors: one with "raw" data (soil variables) and the other a distance matrix calculated from "spatial" distances among sites. The #dbrda function accepts more than one explanatory matrix, but not if one of them is a distance matrix. Does anyone know if there is another function in R able to do this? Perhaps in #ade4 or #phytools?
It is somewhat common in ecological analyses to see a principal components analysis (PCA) used as a variable reduction step, followed by the use of PCA axis scores in linear regression (or related) analyses.
Do you know if non-metric multidimensional scaling (NMDS) site scores can be used in the same way?
Are there any reasons why they should not be?
Thanks in advance for your time & insights.
I am testing several environmental variables' ability to predict fruit production in a tropical forest over 24 years, with an eye on how climate change may affect fruit production. I need to consider several different time scales (time lags) for each of these environmental variables. For example, I'm looking at rainfall 3 months, 6 months, and 1 and 2 years prior to the fruit production.
I'm attempting to determine the variables to include in my model by using Bayesian model averaging by including 5 different time frames of 4 different variables (so effectively 20 different covariates).
My first question is whether including the same variable, but at several different time lags, would confound the variable selection process.
My second question is: once I've run the analysis (Bayesian model averaging of GAM using a spike and slab prior with the gamclass::spikeSlabGAM function in R) and identified the covariates with a high probability of inclusion (posterior inclusion probabilities P[gamma = 1]), is it then appropriate to re-analyze the model with only the selected covariates? I have done this, and re-running the model with only the selected covariates slightly alters the inclusion probabilities.
One last question: If anyone is familiar with gamclass and the spikeSlabGAM function, what would then be an appropriate way of determining how much variability is accounted for by my model (i.e., is there a Bayesian equivalent to GAM Rsquared values)? I'm new to Bayesian statistics, so I apologize if this is a very basic question!
I designed a factorial experiment involving 2 qualitative explanatory variables (A and B). Because I couldn't meet the assumptions of a parametric model, I used kruskal.test on the response variable (VAR) for A and B separately: kruskal.test(VAR ~ A, data = data) and kruskal.test(VAR ~ B, data = data).
However, I am also interested in the effect of the A × B interaction on VAR. Does anybody know whether it is valid to perform a Kruskal-Wallis test on an interaction? Here is what I did in R:
kruskal.test(VAR ~ interAB, data = data)
Moreover, in order to assess which levels of each variable differ significantly from each other, I used the following post-hoc test after kruskal.test: pairwise.wilcox.test(data$VAR, data$A, p.adjust.method = "holm", exact = FALSE, paired = FALSE). The pairwise test didn't work on the variable interAB, and I was wondering which method I should use as a post-hoc test for variables A and B and for the interaction interAB.
Any ideas, please?
This is a real dbRDA plot using real invertebrate abundance data (taxa-station matrix) with environmental data (substrate characteristics-station matrix) as predictor variables. The plot is produced in PRIMER v.7. Invertebrate data is 4th root transformed, Bray-Curtis similarity was used. Environmental data is normalized, Euclidean distance was used.
My question is: why is the vector overlay not centered at 0,0 in the plot? Interpreting this plot, one would conclude that every sampling station within the study area has values below the mean for predictor variables 2 and 13, which is impossible. Why would the center of the vector overlay be displaced by -40 units? How can this be? Why is the plot centered on the dbRDA2 axis but not on the dbRDA1 axis?
Please let me know if anyone needs more information. Thank you!
I have a large dataset of fish abundance as well as some environmental variables covering around 795 sampling sites. I have tried to find the relationship between my environmental variables and the biological data with the RELATE function in PRIMER-E. Using Spearman rank correlation, the sample statistic (Rho) is 0.11. The significance level of the sample statistic is 0.1% (much less than 5%), so according to the manual this is a significant result! I used 999 permutations to get this result. I am unable to interpret it, as I would usually expect that if the p value is significant, the corresponding degree of association should also be high. So I would expect the sample statistic to be much higher than 0.11 (above 0.7 or so). The situation is similar with the DistLM procedure: the p values suggest that each of the variables has a significant effect in the model, but the overall R^2 of the fit is only 0.13! How is it possible that, with such a poor R^2, the individual variables are all significant?
I used a square root transformation and Bray-Curtis similarity on the biological data, and normalized the environmental variables and used Euclidean distance for them. I haven't otherwise transformed the environmental variables.
I would really appreciate it if someone can help me to interpret these results.
We have two nominal variables: diet and location. We want to know whether the diet of the species differs among locations. I know it is possible to do this with a chi-square test, but I have also seen others do it with the Kruskal-Wallis test. I applied chi-square, though I still believe Kruskal-Wallis is the right choice. Am I right? However, I have difficulties performing this test on my data. Here is a view of a small part of my data.
Maybe I should re-order my data in another way?
I would really appreciate it if you could help me.
I have several metagenomic samples, from which I got the viromes through a bioinformatics pipeline. I made a PCoA of those samples, and some of the samples cluster together. I would like to know which families or species of virus in particular are making these samples similar and at the same time different to the other samples. Is there a way I can do this?
Hello there, I would like some advice on how to correctly perform a CCA, and on whether I need to transform my data. I'll explain the aims of this work and how my data are structured (I'll try to be as specific as possible and to explain it correctly in English).
I'm working on an ecological assessment using phytoplankton and environmental parameters in tropical eutrophic reservoirs. I collected weekly summer samples at each site and calculated monthly means for the biotic data (densities) and the abiotic parameters (temperature, pH, NO3, etc.). These means are the data I have right now for the CCA (60 taxa and 19 environmental variables). The taxa are expressed as densities (org/mL), and I'm thinking about running a preliminary PCA on the abiotic and biotic data to exclude some variables. But I'm worried about whether my statistical approach is correct, or whether I need additional steps in my analysis, such as a data transformation.
Analysing morphology-habitat relationships in a montane plant species, I am thinking of using slope exposition (i.e., northern, southern slopes, etc.) as one of the habitat features, since directly measuring all the associated microclimatic factors appears problematic. I have plant samples from many sites within a montane area of ca. 1300 square kilometres, and for each site I have slope exposition data (cardinal and inter-cardinal directions). I need to correlate these data with leaf anatomical/morphological traits.
I would be grateful if someone could also recommend some papers reporting relationships between plant growth/occurrence and slope exposition in mountains.
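One common way to make compass directions usable in a correlation or regression is to split them into a north-south and an east-west component, so that N and NW end up close together instead of at opposite ends of a 0-360 scale. The following is only a minimal Python sketch of that idea; the bearing table and the function name aspect_components are illustrative assumptions, not something taken from the question.

```python
import math

# Compass bearings in degrees clockwise from north (assumed coding of the
# cardinal and inter-cardinal classes mentioned in the question).
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}

def aspect_components(direction):
    """Return (northness, eastness) for one compass direction.

    northness = cos(bearing): +1 for north-facing, -1 for south-facing.
    eastness  = sin(bearing): +1 for east-facing,  -1 for west-facing.
    """
    theta = math.radians(BEARINGS[direction])
    return math.cos(theta), math.sin(theta)

for d in ("N", "NE", "S", "W"):
    northness, eastness = aspect_components(d)
    print(f"{d:>2}: northness={northness:+.3f}, eastness={eastness:+.3f}")
```

The two components can then be used as ordinary continuous predictors of the leaf traits; whether both are needed is a question for the data.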
I have a dataset of fish collected in different rivers over different years, each of them sampled a different number of times during different projects. The difference in the number of observations among rivers can in some cases be large, e.g.:
River X = 1 project (1 observation=1 sampling x 1 year);
River Y = 5 projects (15 observations= 1 sampling x 3 years x 5 projects);
River Z= 15 projects (105 observations=1 sampling x 7 years x 15 projects);
I want to estimate, for the whole region (so I am not interested in specific rivers), how abundance is related to Years, Latitude, Altitude and anthropic pressures (APindex). I thought to use the following model:
lme: Abu ~ Years + Latitude + Altitude + APindex + (1 | River/Project) + corARMA(form = ~ time | River/Project).
- What is the influence of River Z, with its 105 observations, compared to the other rivers, which have fewer observations?
- Am I accounting for these unbalanced observations with the random structure (1 | River/Project)?
- Do I have to account for the different numbers of observations in the model with weights (weight = 1/n observations for each project)?
I'm having a hard time choosing the right statistical method for my study. The data that I have are summarized below.
- Waterbird count data (absolute counts). Effort was non-standard and varying, but standard effort is assumed.
- The counts are done yearly, and the years are categorized into two groups: high water level years and low water level years. Sample sizes differ between these groups (e.g., there are 8 high water level years and 6 low water level years).
- I want to see if there is any statistical difference between high water level years and low water level years in terms of the mean/median number of birds counted.
Because the count data are not normally distributed, I used the Mann-Whitney U test directly. But I would like to know what other or better methods I could use for the same purpose. I also want to compare the two groups in terms of different biodiversity measures, such as species richness.
I have what may be a silly question for the "time series" experts:
- I have a dataset of 80 European rivers.
- In 50% of the rivers I have more than 1 project; in the other 50%, 1 river = 1 project.
- In 50% of the projects, data were collected for only 1 year; in the other 50%, data were collected over several years (from 2 to 20 years, depending on the project).
-> I want to assess how fish diversity depends on altitude, latitude and catchment size.
After exploring the data for the model assumptions (normality, variance heterogeneity, etc.), I thought to run this model:
mod <- lme(Fish_Diversity ~ log(altitude) + log(latitude) + log(catchment_size), random = ~1 | Rivers/Projects, method = "ML", data = dati)
When I look at the residuals of model mod with acf(residuals(mod)) and pacf(residuals(mod)), they look pretty good, but the acf shows autocorrelation at lag 1 and in the pacf the line goes slightly over the limit at lag 3. I think I would give it a try with a corAR1 (p = 1) correction in lme.
My questions are:
1- Is the model correct, in your opinion?
2- Can I fit a corAR1 correlation in lme just by looking at the acf and pacf plots from model mod? As you can see, I have different projects over time, which means potentially multiple time series (one per project). Can I fit a single AR1 structure based on the residuals of the model (without corAR1) rather than on the raw data, and assume that the same temporal trend is present in all the projects analysed? How do acf and pacf know what the temporal repetition is (i.e., how do they build the lags in the plots)?
3- If the answer to question 2 is yes, do I have to sort the data frame chronologically within each project (e.g., Project 1 from 2000 until 2008, Project 2 from 1998 until 2015, and so on), as in dati[order(dati$Project_names, dati$Year_evaluation), ], and give corAR1 the structure form = ~1 | Rivers/Project_names?
Would this model be OK?
modAR <- lme(Abu ~ log(altitude) + log(latitude) + log(catchment_size), random = ~1 | Rivers/Project_names, method = "ML",
correlation = corARMA(form = ~1 | Rivers/Project_names, p = 1), data = dati)
Thank you for your time
If I have a 100 square km forest site (homogeneous vegetation across the site), how many line transects should I design to get reliable density estimates of forest primates through Distance Sampling methods? Is there a documented relation between study site size and number of transects?
I am trying to create a model in R that accounts for my survey. I’ve been doing a lot of reading over the past couple of weeks trying to understand the best way to do this, and my brain is ready to explode so I thought I’d ask for some advice!
I am studying a particular species of reptile, trying to discover the environmental variables that account for their distribution over my study site.
The study involves placing refuges that attract the reptiles evenly across the site, and counting the reptiles that I find underneath. There are 68 refuges in total across 34 grid squares: Each grid square contains 2 refuges, of different materials. I have surveyed these refuges 11 times, so this is a repeated measures study.
My dependent variable is number of reptiles found. This is count data so I think I need a Poisson distribution. I thought I’d have zero-inflation but using a goodness of fit test for Poisson distribution tells me no, as does a comparison of mean and variance:
Reptiles under refuge – Frequency
0 - 579
1 - 140
2 - 24
3 - 5
Mean = 0.271
Variance = 0.302
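The mean/variance comparison above can be reproduced directly from the frequency table. The following is a minimal Python sketch of that check, not a full model diagnostic: it treats all 748 refuge visits as independent, which ignores the repeated-measures structure of the survey, and the variable names are illustrative.

```python
import math

# Frequency table from the post: reptiles under a refuge -> number of visits.
freq = {0: 579, 1: 140, 2: 24, 3: 5}

n = sum(freq.values())  # 68 refuges x 11 surveys
mean = sum(k * c for k, c in freq.items()) / n
# Sample variance (n - 1 denominator).
var = sum(c * (k - mean) ** 2 for k, c in freq.items()) / (n - 1)

# Variance-to-mean ratio: close to 1 for Poisson data, clearly above 1
# when counts are overdispersed.
dispersion = var / mean

# Number of zeros a Poisson distribution with this mean would predict,
# to compare with the 579 zeros actually observed.
expected_zeros = n * math.exp(-mean)

print(f"n = {n}, mean = {mean:.3f}, variance = {var:.3f}")
print(f"dispersion index = {dispersion:.2f}")
print(f"expected zeros = {expected_zeros:.0f}, observed zeros = {freq[0]}")
```

A ratio near 1 and an observed number of zeros close to the Poisson expectation are consistent with the conclusion in the post that zero-inflation is not a major concern, at least for the pooled counts.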
My independent variables are as follows:
• Date of survey (I assume I use this to tell R that this study is repeated measures)
• Temperature under each refuge (continuous)
• Proportion of area around refuge that is scrub vegetation (could be converted to area if proportion is problematic)
• Mean vegetation height immediately around each refuge (continuous)
• Material of refuge (binomial – although as I have already found that there is a significant preference by my reptiles for one over the other, should I still include this?)
• Angle of slope that the refuge is on (continuous)
• Direction of slope (continuous)
• ID code of grid square (factor – I'm not sure about this one. There were some squares in which no reptiles were found, while others had several)
Temperature was measured for each refuge at each survey, while the other variables were measured once and assumed to remain constant. There are no missing values.
Do I need to check for independence between all these variables before including them? Should I use a correlation table for that?
From my reading, a repeated measures GLMM looks most appropriate for my study but I wanted a second opinion. I’m also getting confused on which factors are fixed and which are random.
Here is my attempt at building a model:
model <- glmer(reptiles ~ (1|date) + temp + scrub + meanvegheight + material + slope + direction + square, data = dataset, family = poisson)
Would this provide me with what I’m after? Please treat me as an ignorant ecologist rather than an experienced statistician! If anything needs clarification please just ask. Many thanks for your help!
I'm doing EBSD analysis on the same area after ECSTM analysis to identify any differences that could explain the dissimilar behavior of two grain boundaries of the same type.
The EBSD data include the sigma value (CSL or not), the misorientation and the GB plane, but also a parameter called deviation and two others called plane(P1) and plane(P2), for example:
GB1: sigma=3, misorientation=58.7, plane=-18 17 17, deviation=1.7, plane(P1)= 10 -21 -12, plane(P2)= -10 -9 4.
I would like to know what deviation means and whether plane(P1) and plane(P2) are the real planes of the two crystals on either side of the GB.
I collected benthic samples from 12 stations (3 samples per station across 4 locations) across mudflats from one estuary, in autumn & the following spring, totalling 72 samples. In addition, I collected one sediment core from each station per season (total 24).
Data analysis is being conducted in R, though I do have access to CAP4. Within R, I've generated dendrograms, NMDS plots, rarefaction curves and a basic Simpson's diversity analysis (vegan package).
Sediment analysis was conducted through Gradistat.
To understand benthic density/presence on sediment type, I'm trying to analyse sediment composition against benthos for each station/season. I'm assuming it is better to analyse against the full breakdown of sediment type rather than the generated classification (i.e. muddy sand/sandy mud).
However, to get this working in R, there is a lot of data that would need to be entered into the main data sheet alongside the species/site data, and I'm really not sure of the best way to do this or how it would look.
Additionally, I'm not too sure which package/analysis is best for analysing these data.
Any help or advice would be greatly appreciated.
I'm working on dung beetle assemblages, and I would like to test the hypothesis that the community structure of these insects differs along a gradient of grazing pressure. In two similar sites, dung beetles were sampled at 3 levels of grazing pressure (High, Moderate and Low), with 5 pitfall traps at each level.
After analyzing my data with a Correspondence Analysis (where sampled communities are classified into 3 groups: High, Moderate and Low grazing), I would like to know whether the dung beetle community structure differs significantly between the 3 levels of grazing pressure. An ANOSIM (run in R) gives R = 0.4097, p = 0.000999. That part is fine! But I don't understand the other parts of the output, for example the values of "Dissimilarity ranks between and within classes".
Thanks a lot for your help !
I want to run a PGLS analysis. I have a phylogeny with branch lengths, but I want to run the PGLS analysis for a slightly smaller subset of taxa than the ones contained in the phylogeny. Is it ok/feasible to prune (remove) taxa from the pre-existing phylogeny, so I don't have to re-calculate a tree from scratch?
Thank you in advance :)
I want to model the distribution of several species. I have read about the subject and have found that there are several models to achieve it:
Which one do you suggest to use, considering that I only have presence records (GBIF)?
I have data on a population of carrot fly, which have been trapped at various distances from a probable source of the flies, at regular time intervals.
I am interested in looking at the associations between the number of flies caught and other independent factors, including distance from the source, hedgerow properties (e.g. age of hedgerow) and the proportion of host plant species present.
Does anyone have any thoughts on how I could statistically analyse or measure the combined effects of the independent variables above upon the number of flies caught?
I'm aware and confident that I could simply look to see if there are correlations/associations between a singular variable e.g. % cover of a host species against the number of flies caught. However, I am more interested in (and think it is more interesting) considering how say % cover of the host plant species AND distance from the source may impact upon the presence of flies.
Programs available- SPSS, Minitab, GraphPad etc. and ArcGIS software.
Many thanks for anyone's kind advice in advance! :-)
Hello everyone, I am trying to apply a PERMANOVA with covariables to a benthic community dataset. I have species density per sample at 4 different distances from a shipwreck and 4 covariables. I am trying to do this using Primer, but every time the results for the pairwise tests between distances are "no test" and df=0. Can anyone help me with that? What am I doing wrong?
I am looking for suggestions for analyses that can compare different taxa in terms of the relative difference in composition among sites.
I have 4 parallel datasets of species abundance data from 4 different taxa sampled in the same sites (n=12).
Each site was sampled between 4 - 10 times. Usually (not always) sampling was done at the same time for all taxa within a site, but not all sites were sampled at the same time so the data are unbalanced.
I can create balanced subsets if needed but this would severely truncate the data.
I've heard of co-correspondence analysis, co-inertia analysis, and possibly multiple-factor analysis as potential candidates for doing this type of comparison, but I'm not sure about the differences or which is most appropriate.
Are there pros and cons/restrictions/assumptions for each of these?
Is there an alternative method that I have not mentioned that would be better?
Also, what do these analyses allow me to test exactly? Is their intention to be able to say, for example, that taxa A and B had a high correlation in terms of variation in composition across sites, while taxon C showed low correlation with any other taxa, etc.?
While searching the net, I found a plethora of code and packages, and I was wondering if ecologists could suggest the simplest approach.
Hi, any suggestions will be welcome as the title reads.
- Four habitats to compare, one which is a control
- Bird data (categorical and quantitative, traits, abundances, microhabitat uses, nesting categories) for breeding and non-breeding seasons
- Various tree characteristics measured.
Which tests could I perform with what data to understand how bird communities use the different habitats?
Our research campaign involved sampling 4 rivers (at three stations of different altitude in each river). At every station we collected three different samples along a longitudinal 100 m transect of the river, taking special care to sample the full heterogeneity of substrata, and analyzed them for benthic macroinvertebrates.
At the same time we evaluated numerous catchment variables in order to test the relevance of the land use and catchment properties on the macroinvertebrate community.
Therefore, we ended up with 4 rivers * 3 stations * 3 samples per station = 36 samples.
My question is whether all samples could sensibly be included as cases in a Random Forest model (n=36), or whether I should instead average the macroinvertebrate samples per station to avoid pseudoreplication (n=12).
I would greatly appreciate any help and advice on this issue.
Cheers, and thanks
I want to compare evolutionary rates between two sets of traits (morphological and reproductive). Because the values of the reproductive traits are very small compared with the morphological traits (both are in the same unit and scale [mm]), I want to perform a log-log regression of species mean trait measurements on species mean thorax volume, which I use as the index of body size, but I don't know how to do it. I am therefore looking for help or a potential collaborator.
Thanks for any suggestions.
There are four sampling sites on a hillslope (top, upper, lower, bottom), and each site has three replicates. We studied soil nutrients and plant biomass in these four treatments, and a reviewer suggested that a proper statistical analysis (autocorrelation between topographic positions) was needed.
My question is: how can I carry out an autocorrelation analysis for my study? I am familiar with SPSS. Thank you very much.
I have data where I measured the distribution of individuals among 4 patches that differ in known resource density (a continuous variable). Groups of 12 individuals were observed and their presence in the 4 patch types was recorded 10 times (every 20 minutes). Trials differed in the presence or absence of another species. If I only had two patches, I believe I would use a GLMM (family = binomial) with arena_ID as a random variable and presence/absence of other species as a fixed effect.
However, with 4 patch types I want to use patch type (amount of resources) as a continuous fixed effect. The percentages across patch types always sum to 100% (so a regression of percent in patch versus resource in patch seems somewhat incorrect), and the data are multinomial rather than binomial.
I have been calculating the slope of percent in patch versus patch resource for each arena_ID, and then asking if the collection of slopes differ from 0 using a t-test, but I am looking for a better way.
To see whether fish species distribution depends on area, I collected fish samples for one year from four selected sampling sites in a given lake. Ten fish species were recorded across all sites. I used a two-way classification chi-square test to examine differences in species distribution among the sampling sites and obtained a significant difference. However, I was not able to see which site differs significantly from which other sites. Can anyone explain how to do this?
Hi. I want to assess the diversity of fish from 10 sampling sites at temporal scales. However, at each site I am going to use only 1 fishing gear to catch the fishes. If I want to prepare 5 replicates per site/sampling, is it valid if I take the replicates from the same fishing gear? I'm planning to use stratified random sampling technique to prepare these replicates. Thank you.
Does someone know if any R package can be used to perform a meta-analysis taking into account spatial and temporal autocorrelation (maybe separately)?
I work on fish abundance data and associated diversity metrics at 35 stations located along several large French rivers (Rhône, Vienne, Loire, Meuse, Seine).
Some of the stations are closer together than others: for example, 7 stations are located in the same area (distance <10 km), while some are located in different catchments without direct connectivity. Consequently, I expect that my data/results will be strongly spatially autocorrelated.
I am looking for a way to correct the time series meta-analysis for this spatial heterogeneity in R. Ideally, I was thinking of a method that would allow the weighting of the different time series in the meta-analysis according to their relative distance along the river network.
The stations were sampled annually and the time series range from 18 to 36 years. So consecutive years are likely to be more correlated than, for instance, the first and the last years. I would like to correct for temporal autocorrelation in the meta-analysis. For now, I have applied a Mann-Kendall trend analysis that accounts for temporal autocorrelation, and I have extracted the correlation coefficient to be used in the meta-analysis. Can you think of another way to perform this correction?
I would like to calculate the functional diversity measures proposed by Chiu & Chao (2014). The paper gives formulas for the calculation of functional hill numbers, mean functional diversity and (total) functional diversity. But it doesn't mention any R script or package and I don't feel confident encoding these formulas myself.
I was wondering if anyone already calculated these measures and how they did it.
Chiu, C. H., & Chao, A. (2014). Distance-based functional diversity measures and their decomposition: A framework based on hill numbers. PLoS ONE, 9(7). doi:10.1371/journal.pone.0100014
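For anyone wanting to check their own implementation, here is a minimal Python sketch of these measures as I understand them from the paper: with Rao's quadratic entropy Q = sum_ij d_ij p_i p_j, the functional Hill number of order q is D(Q) = [sum_ij (d_ij / Q) (p_i p_j)^q]^(1 / (2(1 - q))), the mean functional diversity is Q × D(Q), and the total functional diversity is Q × D(Q)^2. The function names and the three-species example are illustrative assumptions, and the formulas should be verified against the paper before use.

```python
import math

def rao_q(p, d):
    """Rao's quadratic entropy: sum over i, j of d[i][j] * p[i] * p[j]."""
    n = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(n) for j in range(n))

def functional_hill(p, d, q):
    """Functional Hill number of order q (assumes rao_q(p, d) > 0)."""
    n = len(p)
    Q = rao_q(p, d)
    # Only species that are present contribute to the sums.
    pairs = [(i, j) for i in range(n) for j in range(n)
             if p[i] > 0 and p[j] > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit of the general formula as q -> 1.
        s = sum(d[i][j] / Q * p[i] * p[j] * math.log(p[i] * p[j])
                for i, j in pairs)
        return math.exp(-0.5 * s)
    s = sum(d[i][j] / Q * (p[i] * p[j]) ** q for i, j in pairs)
    return s ** (1.0 / (2.0 * (1.0 - q)))

def mean_functional_diversity(p, d, q):
    return rao_q(p, d) * functional_hill(p, d, q)

def total_functional_diversity(p, d, q):
    return rao_q(p, d) * functional_hill(p, d, q) ** 2

# Hypothetical check: three equally abundant, equally distinct species.
p = [1 / 3, 1 / 3, 1 / 3]
d = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
for q in (0, 1, 2):
    print(q, functional_hill(p, d, q), total_functional_diversity(p, d, q))
```

In this symmetric case the functional Hill number equals the number of species for every q, which is a convenient sanity check before moving to real trait distances.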
I am currently testing the correlation between environmental variables and the biological data. I have been using DistLM, but because I have a larger number of environmental variables, the programme is unable to process them. I would appreciate suggestions for a similar programme that can handle more variables.
I'm currently working on the alkaloid composition of the skin secretions of salamanders and am trying to test whether this composition differs between different populations.
In line with previous research on alkaloid profiles in poison frogs, I tested for differences among populations using an ANOSIM. Since I work with relative concentrations (a.k.a. proportions), I thought it was more appropriate to construct an Aitchison dissimilarity matrix for this analysis.
I was further interested in seeing which exact compounds were responsible for differences between the populations. A SIMPER, often associated with an ANOSIM, seemed perfect ... but SIMPER in R uses Bray-Curtis dissimilarities.
I was wondering if there is an alternative for SIMPER that uses other indices of dissimilarity? Could a PCA do the same?
I have a statistical question concerning my data. I am currently working on a methodological experiment to model some soil chemical variables. Let's say, for simplicity, that I have measured variable A in 6 different SITES (S) in four, fixed layers in each site (L). I want to use A as a model for another variable, B.
My question is: what would be the best model to use in this case? I am aware that, given my experimental design, factor S is Random and factor L is nested within S. I have tried to use a GLMM, but the results are not clear to me and, I think, too obscure for the purpose of my research. Given that my goal is to prove that variable A can be a good proxy for variable B, can I use a more straightforward regression to create a simple model that can be useful to the scientific community?
What is the best approach for the statistical analysis of an intercropped maize + cowpea experiment? Please take an example of any growth parameter, such as plant height, number of leaves, LAI, CGR, etc., and compare both ANOVA tables. Thank you.
I am performing a site selection analysis for wastewater treatment plants, however, I need help in performing AHP to determine the criterion weights of land cover, slope, and distance to roads for overlaying it in ArcGIS. Thank you for kind response.
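As a rough illustration of the AHP step, criterion weights can be approximated from a pairwise comparison matrix with the row geometric-mean method, followed by Saaty's consistency ratio. The following is a minimal Python sketch under stated assumptions: the pairwise judgments in the matrix are invented purely for illustration, and the function names are hypothetical.

```python
import math

# Saaty's random consistency index for matrices of size 3 to 5.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix):
    """Approximate AHP priority weights by the row geometric-mean method."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI; values below 0.1 are usually accepted."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

# Hypothetical judgments (NOT from the question), in the order
# land cover, slope, distance to roads: land cover is rated 3 times as
# important as slope and 5 times as important as distance to roads;
# slope is rated twice as important as distance to roads.
criteria = ["land cover", "slope", "distance to roads"]
pairwise = [[1.0, 3.0, 5.0],
            [1 / 3, 1.0, 2.0],
            [1 / 5, 1 / 2, 1.0]]

weights = ahp_weights(pairwise)
for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")
print(f"consistency ratio: {consistency_ratio(pairwise, weights):.3f}")
```

The resulting weights sum to 1 and could then be used as the layer weights in a weighted overlay; the actual judgments have to come from the analyst or from expert elicitation.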
Hi, I am using the analyses; Canonical analysis of principal coordinates (CAP) and Similarity Percentages - species contributions (SIMPER) in the stats program, PRIMER. However, I prefer to use R and was wondering what the equivalent functions were? I believe the vegan package is where one goes for permutational analyses but am not sure which functions to use. Any help would be appreciated. Cheers
I am currently working on a project attempting to assess the niche overlap of various species using functional traits.
The issue I am running into is that the analysis I had intended to use (link in replies) is individual-based and requires multiple individuals of the same species in the data set, in the form shown in Sheet 1. My data, however, take the form shown in Sheet 2, because each species has a single row with its predominant (literature-based) trait values. My data include both categorical and continuous variables (which was the reason for choosing the first analysis).
Thanks in advance.
We have distribution data for larvae of 2 species that were first randomly spread on a square experimental arena. After some time, the arena was divided into 100 quadrats of 1 × 1 cm and the number of larvae of each species in each quadrat was counted.
What is the best way (i.e., which aggregation index) to show (1) that the observed distribution of larvae is aggregated and (2) that this aggregation is interspecific? Several indices and methods are reported in the literature, but I was unable to find the best way to answer these 2 questions with our (quadrat) dataset.
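For the first of the two questions, two widely used quadrat-based measures are the variance-to-mean ratio and Morisita's index of dispersion, I_d = n × sum(x(x - 1)) / (N(N - 1)), where n is the number of quadrats and N the total number of individuals. The following is a minimal Python sketch of both; the counts are invented for illustration, and it does not address the interspecific part of the question.

```python
def variance_to_mean(counts):
    """Variance-to-mean ratio: about 1 if random, above 1 if aggregated."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return var / mean

def morisita_index(counts):
    """Morisita's index: about 1 if random, above 1 if aggregated,
    below 1 if more regular than random. Needs at least 2 individuals."""
    n = len(counts)
    total = sum(counts)
    return n * sum(x * (x - 1) for x in counts) / (total * (total - 1))

# Hypothetical counts for one species in 100 quadrats:
# most quadrats empty, a few holding many larvae.
counts = [0] * 90 + [5, 6, 4, 7, 5, 3, 6, 5, 4, 5]

print(f"variance/mean  = {variance_to_mean(counts):.2f}")
print(f"Morisita index = {morisita_index(counts):.2f}")
```

Both values well above 1 would point to aggregation for that species; a significance test and a between-species association measure would still be needed for a complete answer.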
I am trying to analyze an ecological data set. At different sites, we measured different parameters, e.g. sedimentation (n = 3) and fish biomass (n = 5). I want to explain fish biomass with the sedimentation rate. I can't do this directly, because the sample sizes differ (3 and 5) and the measurements were not directly linked: the sediment traps used to assess sedimentation rates were spot measurements, while fish biomass was assessed with transects. I don't want to explain fish biomass by the mean sedimentation rate, because then I would lose the variability of the sedimentation rate measurements. Would resampling be an appropriate approach to deal with these problems? Are there other possibilities?
We have run NMDS analyses (plant species composition versus environmental gradients) using both presence-absence and abundance data. I found the results quite similar, but I am unsure which result should be presented. Is it possible to compare the abundance of tree species with that of bushy species?
I have mortality data for incubating Atlantic salmon embryos that have been recorded as proportions of dead embryos/total embryos fertilized. In order to run ANOVAs with data that satisfied the homogeneity of variance assumption, the proportion data were logit transformed. When presenting the data in a paper, should the original proportions be plotted or should the logit transformed data be plotted? My instinct is that the transformed data should be plotted, however I feel that the biological significance and readability of the plot would be sacrificed if I ignored the original proportions... Thoughts?
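One option sometimes used in this situation is to analyse on the logit scale and then back-transform the estimates to proportions for plotting. A minimal sketch of the transform and its inverse follows. The mortality values are hypothetical, and the small `eps` clamp for proportions of exactly 0 or 1 is an assumption of this sketch, not something stated in the question.

```python
import math
from statistics import mean

def logit(p, eps=1e-6):
    """Log-odds of a proportion; eps (an assumed value) keeps 0 and 1 finite."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def inv_logit(x):
    """Back-transform a logit-scale value to a proportion."""
    return 1 / (1 + math.exp(-x))

# Hypothetical mortality proportions for one treatment group.
mortality = [0.05, 0.10, 0.20, 0.40]

logits = [logit(p) for p in mortality]
back_transformed_mean = inv_logit(mean(logits))
print(back_transformed_mean)
```

Note that the back-transformed mean of the logits is generally not equal to the arithmetic mean of the raw proportions, so a figure should state which one it shows.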
I am working on an ecological community species data matrix (site by species), and I have many species and sites. I want to randomly select sub-communities with different sample sizes, and later compare the similarity of these communities. The idea is that some of my sites have only a few specimens, so I want to find a sample size (a threshold) that I can use to compare the communities with each other, and discard sites that fall below that threshold. I am trying to decide which sites to include in my data analysis.
1- How can I randomly subselect the communities? Along these lines, I tried various options, i.e., rarefying the communities to a certain size or using the 'sample' function in R.
2- If I have communities with different sizes, and generate distance matrices from these communities, I am not able to compare them using a Mantel test in R, due to incompatible dimensions. How would you compare samples of different sizes with regard to their similarity?
Any suggestions on these issues are appreciated.
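The random subselection in question 1 can be sketched as drawing a fixed number of individuals without replacement from each site. This is what random rarefaction does (in R, vegan's `rrarefy` is the usual route). The Python function below is a hypothetical helper written for illustration, and the site counts are made up.

```python
import random

def rarefy(counts, size, rng):
    """Draw `size` individuals without replacement from one site's species counts."""
    pool = [species for species, n in counts.items() for _ in range(n)]
    if size > len(pool):
        raise ValueError("site has fewer individuals than the requested size")
    drawn = rng.sample(pool, size)
    # Keep every species as a key so all rarefied sites share the same columns.
    return {species: drawn.count(species) for species in counts}

# Hypothetical site with 36 individuals of three species.
site = {"sp_a": 30, "sp_b": 5, "sp_c": 1}
rarefied = rarefy(site, 10, random.Random(42))
print(rarefied)
```

Because each rarefied site keeps the full species list, dissimilarities calculated across the same set of sites at different subsample sizes have matching dimensions. Sites with fewer individuals than the chosen size raise an error and would have to be dropped.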
I am analysing abundance data using Primer 7 and I am a bit confused about how to pre-treat the data before carrying out SIMPER. I don't know whether I have to standardise the samples by total or standardise the variables (species).
I have downloaded dose response data from ToxCast.
I would like to know how come there are compounds that have a higher inhibitory effect on aromatase enzyme, when compared to the control (Letrozole)?
Considering more than one random factor is sometimes very important and useful. However, in the case of glmmPQL (in R), I do not know why it is not possible to include two (or more) random factors.
I'm currently looking for a statistical model to analyse competition between three individuals/species. So far I have only been able to find papers where they focus on one individual, not on all three at the same time.
If anyone can direct me towards papers that do this that would be fantastic as I'm sure they're out there!
I want to compute Moran's I with spdep in R using nearest-neighbour distances. I have computed Moran's I with ape using inverse distance weights but this isn't quite what I need to do.
I am trying to find if abundance data from sample plots within the same field are spatially autocorrelated.
I have attached an example data file.
Any help with code would be greatly appreciated.
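In spdep, one route is to build neighbours with `knearneigh`, convert them with `knn2nb` and `nb2listw`, and pass the result to `moran.test`. To show what that computes, here is a plain-Python sketch of Moran's I with row-standardised k-nearest-neighbour weights. The coordinates and abundances are hypothetical, and the helper names are made up for this illustration.

```python
import math

def knn_weights(coords, k):
    """Row-standardised weights: each plot gives weight 1/k to its k nearest plots."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:k]:
            w[i][j] = 1.0 / k
    return w

def morans_i(values, w):
    """Moran's I = (n / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, with d = deviations from the mean."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Hypothetical plots along a transect, with two made-up abundance patterns.
coords = [(i, 0) for i in range(6)]
w = knn_weights(coords, k=1)
gradient = [1, 2, 3, 4, 5, 6]      # similar values sit next to each other
alternating = [1, 0, 1, 0, 1, 0]   # neighbours always differ

print(morans_i(gradient, w), morans_i(alternating, w))
```

Positive values indicate that neighbouring plots tend to have similar abundances, and negative values the opposite. A significance test (for example by permutation) is not included in this sketch.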
I have 10 years of species abundance data, and I am trying to test how community composition responds to treatment. Can year be added as a covariate in the RDA, as in the following formula:
How can multiple years be included in a redundancy analysis (RDA)?
I am currently studying marine gastropod species composition in mangrove and rocky habitats, and I want to see the correlation between environmental variables and species composition. What ordination should I choose? DCA? NMDS? PCA? And what is the difference?
Your answer would be very helpful.
With my experiment on decomposition, I will need to obtain a litter group to study. I would like to encompass a fair amount while still maintaining an element of randomness.
Count data are the dependent variable (Y) and ecological area is the independent variable (X). Can you recommend which test would be most appropriate (it's a small data set)?
My understanding of statistics is very weak, so I apologise if this is not clear, or the question is unwarranted.
Undergraduate students set out to find out whether substrate type (four types on a coral reef) affected algal community structure. They used a stratified random sampling design, with five replicates in each stratum, with each replicate represented by a quadrat that was randomly thrown.
Using a 1m by 1m quadrat subdivided into 100 squares they estimated percentage cover of different species of algae (i.e., percentage of each quadrat occupied by each species). Their resolution was 0.25% (quarter of a square).
They wanted to do an ANOVA, so they needed continuous data. Instead of using the data as percentages, they instead used actual area covered (each square is 10 cm by 10 cm, or 100 cm2).
A colleague disagreed with this strategy on the following counts:
- they felt the percentage data should have been arcsine transformed instead, and that converting to area did not represent a valid transformation for this purpose. Applying an arithmetic conversion was not a satisfactory option
- the percentages, and hence areas, were an estimate and not an absolute measure. They mentioned that that means they are likely to vary from one person to the next (no questions were raised about whether or not the estimates were done by one or more students)
- the resolution of measure was quarter of a square, or 25 cm2, so they felt this was not really continuous data
Are these valid concerns, and if so, which and why? In addressing 2 and 3, please add comments on how using arcsine would have been better than using area (I felt that the transformation may carry forward the same concerns: estimated data and what I think they meant as inadequate resolution).
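One point that can be checked directly: converting percentage cover to area multiplies every value by the same constant (1% of a 1 m² quadrat is 100 cm²), so a one-way ANOVA F statistic is identical on both scales, whereas the arcsine square-root transform changes the shape of the data. The sketch below illustrates this with made-up cover values. The helper names are hypothetical and this is not the students' analysis.

```python
import math

def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def arcsine_sqrt(percent):
    """Arcsine square-root transform of a percentage (0 to 100)."""
    return math.asin(math.sqrt(percent / 100))

# Hypothetical percentage cover of one alga on three substrates.
percent = [[10.0, 12.5, 15.0], [40.0, 42.25, 50.0], [80.0, 85.5, 90.0]]
area_cm2 = [[p * 100 for p in g] for g in percent]          # 1% of 1 m^2 = 100 cm^2
transformed = [[arcsine_sqrt(p) for p in g] for g in percent]

print(anova_f(percent), anova_f(area_cm2), anova_f(transformed))
```

The first two F values agree up to floating-point error, which is why the area conversion neither helps nor harms the ANOVA assumptions.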
Good day to everyone!
I've completed a Mood's median test comparing sea louse median intensity levels (number of lice/individual host) at four separate sites. I am doing a separate test for each month of data collection to detect if and when significant differences occurred between site locations. There are some results that signify a statistical difference of medians between at least one site and the other three, but I need a method to determine which site differs statistically. I've been unsuccessful at finding an appropriate post-hoc choice and have resorted to doing pairwise Mood's median tests when significance is found. I know that this is not a strong method, and was hoping for feedback....
I also gathered that the Kruskal-Wallis test compares medians, but that this is not an appropriate method for parasitic count data that has a negative binomial distribution. If I could use it, this would be simple because I would be able to complete a Dunn's test for post-hoc.... Am I being much too cautious? Any thoughts?
In a morphometric variability study of montane shrub species populations, which I have just launched, leaf shape would be quite a promising character. Although commonly used in such studies, leaf shape is often analysed as actually a set of individual “shape-describing” linear traits and their ratios, leaf area and perimeter. But still, each of these traits is treated by the analysis as an individual independent variable. Thus, I am looking for a high-precision method to measure and analyse leaf shape as a single whole.
The proposed procedure is as follows: on the photographed or scanned leaf images, a number of control points are placed along the leaf outline in a computer program. The program then analyses the differences between leaf outlines based on these points, resulting in a numerical/graphical representation of the leaf shape variation.
Having reviewed some literature, I found that this can be done by so-called elliptic Fourier leaf shape analysis using R. Has anyone dealt with this kind of analysis? Is it applicable to within-species population studies? This analysis can be carried out by any of numerous algorithms, so has anybody compared their effectiveness? Also, are there any easier-to-use substitutes for this method? I would be grateful for recommendations of relevant statistics and software, manuals and publications.
In my regression analyses (performed through LM and GLM models) I found R-squared values from 8% to 15%, but high P values. The predictor variable was incisors' procumbency angle and the response variables were the mechanical advantages of jaw adductor muscles in a subterranean rodent species.
Can I include such low R-squared values in my research paper? Or do R-squared values always have to be 70% or more? If anyone can refer me to any books or journal articles about the validity of low R-squared values, it would be highly appreciated.
I am doing phytosociological research and am interested only in finding the vegetation composition (plant communities), so which statistical software would be best to use?
I am looking at boldness in male Siamese fighting fish. Using sand and white gravel I created a slope from a deep end (16 cm deep) to a shallow end (5 cm deep) and split the tank into 3 equal sized sections to create a 'safe', intermediate and 'scary' zone. The 'safe' zone provides shelter with stones and plants. The scary zone is shallow, brightly lit and empty. A bird silhouette hangs above to act as a predation risk. The intermediate zone creates a gradient between the two zones. I allowed each male to acclimatise in the safe end for 5 minutes and then recorded the time they spent in each zone over 20 minutes. I repeated this 3 times for each fish over 3 weeks with different stimuli each trial.
I want to know what is the best way to analyse this information as I have had trouble with the intermediate zone and what to do with it.
Any advice would be welcome.
I am running some exercises to assess the correspondence of bootstrap samples to asymptotically expected results. For example, from a relatively small, single-variable, simulated normal population (N=100), can one confirm that the sample means and variances of a large number of bootstrap samples (with replacement) are independent? How dependent is the result on the size of the original population, the number of bootstrap samples, and the size of each sample? Not asking for actual results, just what you might expect to see, or resources you can suggest in the literature.
So far I have tested the shoot hormones with 15 replicates per concentration, but the root hormones with only 5 replicates per concentration...
For shoots, I need to analyse the increase in length, the number of newly formed shoots and the length of newly formed shoots. For roots, I need to analyse the number of roots formed, the length of primary roots and the length of secondary roots.
Your suggestion is highly appreciated!
I need to know about recent/advanced software for statistical analysis, especially for plant tissue culture and diversity studies. Could you suggest some software or websites where I can download it if it is available for free? Otherwise, you could send me the *exe files too if possible.
Thank you all.
I have 50 soil samples from which I took sub-samples to carry out a number of tests such as chemistry analysis, pH, EC etc. I have 3 replicates (i.e. 3 sub-samples) of 3 of the samples. I would like to assess how representative my sub-samples are of the sample as a whole. Are there any statistical tests I can use for this? I have considered comparing the standard deviations of my replicates to the standard deviation of the samples overall, but I'm unsure if this is correct or if there is a critical value I can use to assess whether or not the difference in standard deviation is significant.
I want to calculate the Simpson Index of Diversity (1-D) for % cover data of plant species in plots. I have a lot of plant species that have <1% cover in a plot, which then results in negative values in the formula. E.g., plant A is 0.17% --> D = n*(n-1) = 0.17*(0.17-1) = -0.1411.
I also have a lot of species that have a cover of 1%, which results in 0 values in the formula: D = n*(n-1) = 1*(1-1) = 0. Because of these low/0 values, the Simpson Index for some of my plots is >1 (it should be between 0 and 1). Is that possible/correct?
Could I possibly multiply all my % cover values by 10 to get rid of values <1 and =1, and so avoid negative/0 values in my formula?
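The n*(n-1) form of Simpson's D is meant for counts of individuals. For cover data, a common alternative is the proportional form, D = sum(p_i^2), where p_i is each species' share of the total cover in the plot. A minimal sketch with hypothetical cover values shows that this form stays between 0 and 1 and is unaffected by multiplying all covers by a constant.

```python
def simpson_diversity(cover):
    """Simpson's index of diversity, 1 - sum(p_i^2), with p_i = cover_i / total cover."""
    total = sum(cover)
    return 1 - sum((c / total) ** 2 for c in cover)

# Hypothetical % cover values in one plot, including values below and equal to 1%.
plot = [0.17, 1.0, 5.0, 20.0]

print(simpson_diversity(plot))
print(simpson_diversity([c * 10 for c in plot]))  # rescaling does not change the index
```

With this form, multiplying the cover values by 10 is unnecessary, because only the proportions enter the calculation.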
I'm interested in ecological statistics, so I want to know the best software to use.
I am dealing with samples from trees, where samples which are higher up in adjacent trees are closer to each other than they are to those lower down on the same tree. Using a standard distance matrix function places samples on the same tree at zero distance from each other. Is there any way to do this that doesn't involve manually calculating each sample pair? For example, is there an R package with a distance function that incorporates the height or elevation data? I couldn't find any.
The goal is to use this in a Mantel test comparing it with an ecological distance matrix.
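One way to get non-zero distances within a tree is to treat sampling height as a third coordinate and take ordinary Euclidean distances on (x, y, height). This is a sketch only: it assumes the horizontal coordinates and heights are in the same unit (e.g. metres), and the sample positions are hypothetical.

```python
import math

def distance_matrix(samples):
    """Pairwise Euclidean distances between (x, y, height) points."""
    return [[math.dist(a, b) for b in samples] for a in samples]

# Hypothetical samples: two on the same tree (low and high), one high up in an adjacent tree.
samples = [
    (0.0, 0.0, 2.0),   # tree 1, 2 m above ground
    (0.0, 0.0, 20.0),  # tree 1, 20 m above ground
    (3.0, 0.0, 20.0),  # tree 2 (3 m away), 20 m above ground
]

d = distance_matrix(samples)
print(d[0][1], d[1][2])
```

The resulting square matrix has the same sample order as the input, so it can be paired with an ecological distance matrix built from the same samples.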
- I placed 19 litter traps in both areas.
- I separated the litter into four constituent parts: fruits, flowers, branches and leaves.
- I obtained their dry biomasses.
- I have two collections for each site.
What statistical analysis can I use to compare these two areas?
I have two separately generated habitat suitability maps and a set of independently gathered GPS relocations of the species.
How can I use the independent data to judge which one of the suitability maps is more accurate?
I have the DNA sequences of 20 species, and data on all the characters of these species (thorax volume, proboscis length, wing and leg length, abdomen length, etc.). My questions are:
1- Which file format should I use for importing the sequences into R?
2- After building the tree, how can I link the character data that I have to the tree?