# Agricultural Statistics - Science topic

An application of statistics in agriculture

Questions related to Agricultural Statistics

I'm looking for topics that need research, especially related to time series, agricultural production, Agricultural Economics, Cointegration. I'm interested in Agricultural Economics and interdisciplinary topics.

Please suggest some relevant topics.

Thanks.

I'm doing a germination assay of 6 Arabidopsis mutants under 3 different ABA concentrations in solid medium. I have 4 batches. Each batch has 2 plates for each mutant and 3 for the wild type, and each plate contains 8-13 seeds. Some seeds and plates were lost to contamination, so I don't have the same sample size for each mutant in each batch. In some cases a mutant is no longer present in a batch. I recorded the germination rate per mutant after a week and expressed it as a percentage. I'm using R. How can I best analyse the data to test whether the mutations affect the germination rate in the presence of ABA?

I've two main questions:

1. Do I consider each seed as a biological replicate with a categorical result (germinated/not germinated), or each plate with a numerical result (% germination)?

2. I compare treatments within the genotype. Should I compare mutant against wild type within the treatment, the treatment against itself within mutant, or both?
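One possible way to frame the two questions above, sketched in R: keep the plate as the unit and model the counts of germinated and non-germinated seeds with a binomial GLM, so that plates with different numbers of seeds are weighted accordingly. Everything in the data frame below (labels, concentrations, simulated counts) is a made-up assumption for illustration, and batch is entered as a simple fixed block effect. A mixed model with batch or plate as random effects (for example with `lme4::glmer`) would be an alternative.

```r
# Hypothetical plate-level data: one row per plate, counts rather than percentages.
set.seed(1)
plates <- expand.grid(
  genotype = c("WT", "mut1", "mut2"),    # assumed genotype labels
  aba      = c("0uM", "0.5uM", "1uM"),   # assumed ABA concentrations
  batch    = paste0("b", 1:4),
  plate    = 1:2
)
plates$genotype <- relevel(factor(plates$genotype), ref = "WT")
plates$n_seeds  <- sample(8:13, nrow(plates), replace = TRUE)
plates$n_germ   <- rbinom(nrow(plates), plates$n_seeds, prob = 0.6)

# Binomial GLM on (germinated, not germinated) counts; batch as a block effect.
fit <- glm(cbind(n_germ, n_seeds - n_germ) ~ genotype * aba + batch,
           family = binomial, data = plates)

# Sequential likelihood-ratio tests; the genotype:aba row asks whether the
# response to ABA differs between genotypes.
anova(fit, test = "Chisq")
```

With the wild type as reference level, the genotype coefficients compare each mutant against the wild type, and the interaction terms compare their ABA responses.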

I came across many peer-reviewed journal articles in which most of the authors conclude that climate change is happening by applying the Mann-Kendall trend test to hydro-meteorological variables (rainfall, temperature). It has to be noted that Mann-Kendall is a statistical technique which, when applied to a dataset (including a time series), shows whether there is a monotonic increasing or decreasing trend and whether that trend is statistically significant.

**My question is: how can we conclude that the detected trend is due to climate change alone, without citing any physical process or phenomenon (like teleconnections) that drives this change, and only on the basis of a statistical test (Mann-Kendall) at a particular level of significance (LOS)?** The LOS applied is also subjective, and the value can vary from person to person.
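For reference, a minimal base-R sketch of what the Mann-Kendall test actually computes. It counts concordant minus discordant pairs (S) and compares S with its variance under the hypothesis of no trend. Nothing in the calculation refers to a cause, which is the point raised in the question. The function name `mann_kendall` is made up here, and the sketch ignores ties and serial correlation, both of which matter for real rainfall or temperature series.

```r
# Mann-Kendall S statistic and normal-approximation z score (no tie correction).
mann_kendall <- function(x) {
  n <- length(x)
  s <- 0
  for (i in 1:(n - 1)) {
    # +1 for every later value above x[i], -1 for every later value below it
    s <- s + sum(sign(x[(i + 1):n] - x[i]))
  }
  var_s <- n * (n - 1) * (2 * n + 5) / 18
  # continuity-corrected z score
  z <- if (s > 0) (s - 1) / sqrt(var_s) else if (s < 0) (s + 1) / sqrt(var_s) else 0
  list(S = s, z = z, p_value = 2 * pnorm(-abs(z)))
}

res <- mann_kendall(1:10)   # strictly increasing toy series
res$S                       # 45: every pair is concordant
```

The p-value only measures how surprising S would be if there were no monotonic trend. The chosen significance level is a convention applied on top of that.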

Hi, I was hoping someone could recommend papers that discuss the impact of using averaged data in random forest analyses or in making regression models with large data sets for ecology.

For example, if I had 4,000 samples each from 40 sites and did a random forest analysis (looking at predictors of SOC, for example) using environmental metadata, how would that compare with doing a random forest of the averaged sample values from the 40 sites (so 40 rows of averaged data vs. 4,000 raw data points)?

I ask this because a lot of the 4,000 samples have missing sample-specific environmental data in the first place, but there are other samples within the same site that do have that data available.

I'm just a little confused on 1.) the appropriateness of interpolating average values based on missingness (best practices/warnings), 2.) the drawbacks of using smaller, averaged sample sizes to deal with missingness vs. using incomplete data sets vs. using significantly smaller sample sizes from only "complete" data, and 3.) the geospatial rules for linking environmental data with samples? (if 50% of plots in a site have soil texture data, and 50% of plots don't, yet they're all within the same site/area, what would be the best route for analysis?) (it could depend on variable, but I have ~50 soil chemical/physical variables?)

Thank you for any advice or paper or tutorial recommendations.

Do serial correlation, autocorrelation and seasonality mean the same thing, or are they different terms? If they differ, what are the exact differences with respect to statistical hydrology? What statistical tests are available to determine (quantify) the serial correlation, autocorrelation and seasonality of a time series?
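A small base-R illustration may help separate the terms. Serial correlation and autocorrelation are generally used for the same quantity (the correlation of a series with its own lagged values), while seasonality is a periodic pattern that shows up as peaks in the autocorrelation function at the seasonal lag. The series below is simulated, so the numbers are assumptions and not hydrological data.

```r
set.seed(42)
months <- 1:240                                   # 20 years of simulated monthly values
x <- ts(10 + 5 * sin(2 * pi * months / 12) + rnorm(240), frequency = 12)

r <- acf(x, lag.max = 24, plot = FALSE)           # sample autocorrelation function
lag1  <- r$acf[2]                                 # lag-1 serial correlation
lag6  <- r$acf[7]                                 # half a season apart: negative
lag12 <- r$acf[13]                                # one full season apart: strongly positive

# Joint test of autocorrelation up to lag 12
Box.test(x, lag = 12, type = "Ljung-Box")
```

A plot of `acf(x)` for such a series shows the wave-like pattern typical of seasonal data.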

I have two different formulations of one active ingredient that were tested on 3 crops (A, B, C) using the same concentration and environmental conditions. I want to check whether these two formulations act significantly the same on those crops or not. n=10 and the data are normally distributed. I would like to compare, for each of the 3 crops, the means of the two formulations: for example, the mean of crop A treated with formulation X against the mean of crop A treated with formulation Y of the same active ingredient. The aim is to see whether they act similarly in terms of the residue detected on those 3 crops.

I have the feeling that using a standard ANOVA is not correct. Any advice?

If I want the annual average of the country's oil production for 2019 and I have 25 stations:

1. Should I take the sum of the 12 months for each station individually, so that I get the annual sum for each station, and then divide by 25 to calculate the country's annual average?

2. Or should I take the sum of January for the 25 stations, then February, etc., and then divide by 12 (the number of months) to get the annual average of the country?
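A quick numerical sketch in R shows that the two options answer different questions. Both start from the same country total, but option 1 gives the mean annual production per station and option 2 gives the mean monthly production of the whole country. The production figures are simulated assumptions, and the sketch assumes no missing months.

```r
set.seed(7)
# Hypothetical production matrix: 25 stations (rows) x 12 months (columns)
prod <- matrix(runif(25 * 12, 50, 150), nrow = 25, ncol = 12)

annual_per_station <- rowSums(prod)   # annual sum for each station
monthly_country    <- colSums(prod)   # country sum for each month

country_total <- sum(prod)                # total country production in the year
option1 <- sum(annual_per_station) / 25   # mean annual production per station
option2 <- sum(monthly_country) / 12      # mean monthly production of the country
```

Which figure is "the annual average" depends on whether the average is meant per station or per month. The annual country total is the same either way.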

Hi,

I want to statistically analyse my metabolomics data (18 metabolites). I have two species and four levels of drought. I analysed each species individually as a CRD, and for mean comparison I used Fisher's LSD (0.05%) with adjusted P-values (FDR). Is this procedure correct, or must I use another statistical method?

Thank you all.

Dear colleagues,

I used the R package for line*tester analysis (Agricolae library).

It calculated the GCA effects, SCA Effects, S.E. (gca for line), S.E. (gca for tester) and S.E. (sca effect).

The experimental material comprises eight genotypes. Five genotypes were used as females (line) and three genotypes were used as males (testers).

The 15 F1’s and their parents were evaluated in a randomized complete block design with three replications.

I want to know the degree of freedom (T-test table) for performing the significance test of General combining ability (gi) effects and Specific combining ability (sij) effects.

Regards
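As a hedged sketch of the arithmetic: combining ability effects are commonly tested with a t-test of effect / S.E. against the error degrees of freedom of the ANOVA that supplied the error mean square. Which ANOVA that is (parents plus crosses, or crosses only) is an assumption that should be checked against the package output. The variable names below are illustrative.

```r
lines_n   <- 5
testers_n <- 3
reps      <- 3
crosses   <- lines_n * testers_n             # 15 F1s
entries   <- crosses + lines_n + testers_n   # 23 entries when parents are included

# RCBD error degrees of freedom = (replications - 1) * (entries - 1)
df_error_all     <- (reps - 1) * (entries - 1)   # parents + crosses analysed together
df_error_crosses <- (reps - 1) * (crosses - 1)   # crosses analysed alone

# Two-sided 5% critical value for the first case
t_crit <- qt(0.975, df = df_error_all)
# An effect is declared significant when |effect / S.E.| exceeds t_crit.
```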

Dear all,

I want to analyze a factorial split-plot in time using SAS.

Factorial Experiment using Completely Randomized Design (CRD);

Factor A: treatments (a1-a4)

Factor B: harvest time, different days after treatment (b1-b5)

Replication: 3

Does anyone have SAS codes for this analysis?

Regards,

We have genotypic data for 190 rice landraces based on 150 SSR markers and want to prepare a research article. Which kinds of analysis can be performed with these data? Kindly give your opinion and suggestions.

Regards

Parmeshwar

I want to develop a hybrid SARIMA-GARCH model for forecasting monthly rainfall data. The data are split into 80% for training and 20% for testing. I initially fitted a SARIMA model for rainfall and found that the residuals of the SARIMA model are heteroscedastic. To capture the information left in the SARIMA residuals, GARCH is applied to model the residual part, with model order (p=1, q=1). But when the data are forecast I get a constant value. I tried applying different model orders for GARCH, and I still get a constant value. I have attached my code; kindly help me resolve this. Where have I made a mistake in the code? Or does some other CRAN package have to be used?

library("tseries")

library("forecast")

library("fGarch")

setwd("C:/Users/Desktop")  # Setting of the work directory

data<-read.table("data.txt")  # Importing data

datats<-ts(data,frequency=12,start=c(1982,4))  # Converting data set into time series

plot.ts(datats)  # Plot of the data set

adf.test(datats)  # Test for stationarity

diffdatats<-diff(datats,differences=1)  # Differencing the series

datatsacf<-acf(datats,lag.max=12)  # Obtaining the ACF plot

datapacf<-pacf(datats,lag.max=12)  # Obtaining the PACF plot

auto.arima(diffdatats)  # Finding the order of ARIMA model

datatsarima<-arima(diffdatats,order=c(1,0,1),include.mean=TRUE)  # Fitting of ARIMA model

forearimadatats<-forecast.Arima(datatsarima,h=12)  # Forecasting using ARIMA model

plot.forecast(forearimadatats)  # Plot of the forecast

residualarima<-resid(datatsarima)  # Obtaining residuals

archTest(residualarima,lag=12)  # Test for heteroscedasticity

garchdatats<-garchFit(formula = ~ arma(2)+garch(1, 1), data = datats, cond.dist = c("norm"), include.mean = TRUE, include.delta = NULL, include.skew = NULL, include.shape = NULL, leverage = NULL, trace = TRUE,algorithm = c("nlminb"))  # Fitting of ARIMA-GARCH model

forecastgarch<-predict(garchdatats, n.ahead = 12, trace = FALSE, mse = c("uncond"), plot=FALSE, nx=NULL, crit_val=NULL, conf=NULL)  # Forecasting using ARIMA-GARCH model

plot.ts(forecastgarch)  # Plot of the forecast

Prior to 2008, many studies suggested ways of determining the order of a SARIMA model. After the publication of the article "**Automatic Time Series Forecasting: The forecast Package for R**" by Hyndman & Khandakar in the Journal of Statistical Software, many hydrology studies started using it. My question: is there any other automated technique in R/Python, similar to the HK algorithm, to determine the order of a SARIMA model?

Following the determination of the model order (p, d, q, P, D, Q and m) by the HK algorithm, the model parameters, which are the coefficients of the seasonal and non-seasonal AR and MA terms, are determined by maximum likelihood estimation (MLE). Is there any other technique to find the model parameters?

Grubbs's test and Dixon's test are widely applied in hydrology to detect outliers, but the drawback of these tests is that they need the dataset to be approximately normally distributed. I have rainfall data for 113 years and the dataset is not normally distributed. What statistical tests can find outliers in non-normally distributed datasets, and what values should replace the outliers?
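Two distribution-free screening rules that are often used in this situation can be sketched in base R: Tukey's fences based on the interquartile range, and a robust z-score based on the median and the median absolute deviation (MAD). The rainfall values are simulated and the thresholds (1.5 and 3.5) are conventional choices, not values from the question. On strongly skewed data both rules can flag genuine wet years, so a flagged value is a candidate for checking against station records and not automatically something to replace.

```r
set.seed(3)
# Hypothetical skewed annual series of 113 values with one extreme value appended
rain <- c(rgamma(112, shape = 2, scale = 50), 900)

# Tukey fences: flag values more than 1.5 * IQR outside the quartiles
q   <- quantile(rain, c(0.25, 0.75))
iqr <- q[2] - q[1]
tukey_flag <- rain < q[1] - 1.5 * iqr | rain > q[2] + 1.5 * iqr

# Robust z-score: mad() already applies the 1.4826 consistency constant
robust_z <- (rain - median(rain)) / mad(rain)
mad_flag <- abs(robust_z) > 3.5

which(tukey_flag)   # indices of candidate outliers under the IQR rule
which(mad_flag)     # indices of candidate outliers under the MAD rule
```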

I have monthly rainfall data from 1901-2013 for 29 stations covering the entire state of Kerala. I took the first 80% of the data for training the model and the remaining 20% for validating it. I developed a monthly SARIMA model for forecasting rainfall. The reviewer has asked:

**What is the scientific basis for forecasting rainfall over a point location (station) over a longer time scale (a month)?**

What was the reviewer trying to convey by this question?

Dear colleagues,

I used the R package for line*tester analysis (Agricolae library).

It calculated the General combining ability effects (GCA) of parents, but I don't know how to calculate the significance of the results.

In addition, I want to Estimate Narrow sense heritability and Heterosis (Better Parent (BP) and Mid-Parent (MP)) for hybrids by R.

Does anyone have a solution?

Regards

What method can be used to compare performances on two different cognitive tests (which measure different cognitive functions) of the same or different group(s)?

The two cognitive tests are inherently different from each other and often have different parameters.

It will be helpful if anyone can direct me to some useful references.

Thank you

Hello,

I am performing statistical analysis of my research data by comparing the mean values using the Tukey HSD test. I got homogeneous groups labelled with both lowercase and capital letters, because of the large number of treatments in my study. Is this type of homogeneous grouping acceptable for publication in a journal?

We have sufficient seeds of 120 rice genotypes. We want to evaluate the genetic parameters related to early seedling vigour through replicated trials of these 120 genotypes. We didn't want to use Augmented design. Can anyone suggest us the appropriate experimental design for the same? I shall be highly grateful.

Hi everyone!

I am performing a RDA analysis with vegan package in R.

I have a doubt regarding 'decostand' & 'scale' functions. Are they the same? Should I use one of them?

I have many soil variables ('data.var' : pH, CIC, N content, C/N ratio, microbial biomass, pH, CE, etc), and I was using both functions in my script:

.

.

data.var.sdz <- decostand(data.var, "standardize")

rda_indexes <- rda(data.var.sdz ~ data$depth, data, perm=999, scale=TRUE)

.

.

I just realized that maybe this is wrong. Any ideas on this?

Thanks in advance!
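As far as the vegan documentation describes it, `decostand(x, "standardize")` scales each column to zero mean and unit variance, which is what base `scale(x)` does, and `scale = TRUE` in `rda()` scales the response matrix to unit variance. Under that reading the script standardizes the same matrix twice. A base-R sketch with made-up soil variables suggests the second pass is redundant but not harmful:

```r
set.seed(11)
# Hypothetical soil variables (names and values are assumptions)
soil <- data.frame(pH = rnorm(20, 6.5, 0.5),
                   N  = rnorm(20, 0.2, 0.05),
                   CN = rnorm(20, 12, 2))

once  <- scale(soil)    # centre each column to mean 0, scale to sd 1
twice <- scale(once)    # standardize the already standardized matrix

max(abs(once - twice))  # effectively zero: the second pass changes nothing
```

So using either one of the two should be enough. Keeping both mainly makes the script harder to read.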

I would like to compare the food security status of two groups of people using the household hunger scale. The sample size is large. Which statistical test would be most suitable?

Hi, please give me a complete guide or any material on analysing experimental data from RBD, CRD, etc. for genetic diversity, character association, path analysis and other plant-breeding-related experiments using IBM SPSS.

I am interested in carrying out a cointegration analysis of the international prices of a few energy resources (coal, oil, uranium, plutonium, agrofuels, wind, hydroelectric) in order to establish or rule out any type of relationship.

Could you advise how to proceed with this task?

Thank you to everyone reading.

How can I select variables for PCA from a huge set of data? Apart from the biological significance of the variables, are there other criteria for objectively choosing the most relevant variables to take into account in a multivariate analysis (PCA in this case)?

Thank you in advance...

Hello,

I hope you have a good time.

I work on a research project about temperature indices. Due to the high number of indices, I only work on tables and maps on an **annual** time scale. In other words, I do most of my analysis for the **annual** time scale. Now I want to draw a *box plot* for the studied indices. It should be noted that I have access to **daily** data. Do you think, for example, I should plot the **average air temperature** box plot using daily data or annual data?

Also, in the case of precipitation, **is the box plot better** drawn from daily data or annual data?

I am waiting for your answer. Also, it would be great if you could suggest some references.

Thanks in advance for your answer.

Hi everyone

I have a dependent categorical variable of three levels, corresponding to three sectors of activity in the agricultural field, A, B and C and within each of them there are sub-levels, for example under A there are four sub-levels a1, a2, a3 and a4.

Over time, for more profit, farmers change their activity, this change is subject to several factors like demands, financial support, etc. (we are talking here about independent, quantitative and categorical variables).

For example, a farmer who practiced activity A may change his activity to B, i.e. he has completely changed the activity sector, or he may only change subsector, for example moving from “a2” to “a1” (the same applies to other farmers).

Is there a statistical technique that can be used to model these changes?

Hello;

I have made a dendrogram with mixed data (numbers, ordered factors and factors) using Gower distance (daisy function in R) and cluster analysis with ward.D2 (hclust function in R) for a paper on the characterization of a plant species.

However, I have been asked to calculate bootstrap values for this dendrogram to show the confidence of the clusters. I understand that this is possible for this kind of mixed data, but I have not found any R script or reference on how to perform it (for example, the pvclust function in R doesn't have a Gower distance option).

Thanks for the support

Hi everyone!

Reading many papers concerning soil respiration, I've found different ways to report annual mean soil respiration rates:

- **g CO₂ m⁻² h⁻¹**
- **μmol CO₂ m⁻² s⁻¹**
- **g C m⁻² h⁻¹**

The conversion from the first to the second (*x / 3600 / 44 × 10⁶*) and vice versa is "simple", but what about the third? How to convert **CO₂** to **C** and **C** to **CO₂**? How to compare them?

Thank you very much!

Cheers!
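The conversions can be written down as a small set of R helpers. The only inputs are the molar masses (about 44.01 g/mol for CO₂ and 12.011 g/mol for C) and the fact that one mole of CO₂ contains one mole of C. The function names are made up for this sketch.

```r
M_CO2 <- 44.01    # g per mol CO2
M_C   <- 12.011   # g per mol C

# g CO2 m-2 h-1  <->  umol CO2 m-2 s-1
gco2_h_to_umol_s <- function(x) x / 3600 / M_CO2 * 1e6
umol_s_to_gco2_h <- function(x) x * 1e-6 * M_CO2 * 3600

# Mass of CO2  <->  mass of C (same time and area basis)
gco2_to_gc <- function(x) x * M_C / M_CO2
gc_to_gco2 <- function(x) x * M_CO2 / M_C

# g C m-2 h-1 -> umol CO2 m-2 s-1 (1 mol C corresponds to 1 mol CO2)
gc_h_to_umol_s <- function(x) x / 3600 / M_C * 1e6

gco2_h_to_umol_s(1)   # about 6.31 umol CO2 m-2 s-1
```

Converting everything to μmol CO₂ m⁻² s⁻¹ (or any single unit) first is probably the easiest way to compare values across papers.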

Available nitrogen, phosphorus and potassium (NPK) content data were correlated using the multiple correlation coefficient formula. The obtained values approach 1, which indicates strong positive relationships among them. However, each soil nutrient variable is considered to depend on changes in the other variables, governed by extraneous factors, so a regression fit line is needed. I couldn't find any formula for calculating a regression fit line for multiple variables. Is one available for the case where two variables are independent and one is dependent, or is a regression fit line for multiple variables not relevant at all?

A positive correlation coefficient of 1 between two variables means a perfect positive relationship: an increase in one variable goes with an increase in the second. Does it mean that if one variable remains constant, without showing an increasing trend, the other variable is constant too?

Dear scholars,

As we know, the critical difference of LSD < critical difference of Tukey < critical difference of Scheffe. So I'd like to ask whether the choice among Fisher's Least Significant Difference (LSD), Scheffe’s test and Tukey's test depends on the accuracy of the results that the researcher wishes to obtain.

Best wishes

Huda

Hello,

I have measured morphometric parameters of plants grown in vitro (height, root mass, etc.). I have one variable, thus two test groups - control and treatment group. I've made three independent biological replicates of the experiment, 30 plants per each biological replicate. In total there are 90 plants for control group and 90 plants for treatment group. I have done single factor ANOVA for each replicate and achieved high F numbers and very low p-values. My question is, is there a way, similar to ANOVA for repeated measurements, to analyse this data as three independent units, or should I just merge the 3 replicates into one data set?
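One option, sketched here in R under assumptions, is to keep all 180 plants in one data set and include the experimental run as a blocking factor, so that run-to-run differences are separated from the treatment effect instead of being ignored. The heights below are simulated and the column names are hypothetical. Treating the run as a random effect in a mixed model would be an alternative.

```r
set.seed(9)
# Hypothetical data: 3 independent runs x 2 groups x 30 plants
plants <- data.frame(
  experiment = factor(rep(1:3, each = 60)),
  treatment  = factor(rep(rep(c("control", "treated"), each = 30), times = 3))
)
# Simulated heights: treatment shift + run-specific shift + plant-level noise
plants$height <- 10 + 2 * (plants$treatment == "treated") +
  rep(rnorm(3, sd = 1), each = 60) + rnorm(180)

# Treatment effect adjusted for differences between runs (runs as blocks)
fit <- aov(height ~ treatment + experiment, data = plants)
summary(fit)
```

Adding a `treatment:experiment` term would additionally show whether the treatment effect is consistent across the three runs.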

Hello,

I have some doubts about the statistical model that I am using to analyze my data. I have two groups of residue-study data, **Group 1 n=7** and **Group 2 n=47**. They are independent, and the studies are expensive and rare, so I couldn't increase the sample size by any means.

I tested the normality of both groups using SPSS and found them not to be normally distributed. I then transformed all the data using a square-root transformation, after which they fit the normal distribution, **p = 0.2 (more than 0.05).**

**Which test should I use, given that I am using the SPSS package?**

thanks

I am working on an experiment where I will be growing soybean in pots through all its stages in a greenhouse, I have a total of 65 plants in the beginning with 4 treatments and 1 control group making a total of 13 plants/replications per treatment. Mid-way through the experiment I will be taking 3 random samples from each group, a total of 15 plants, and removing them from the soil to look at nodule count and total plant biomass.

The experiment design that I have set up for pot placement in the greenhouse was done as a randomised complete block design with 13 blocks and 5 rows/treatments, my question is how do I take these 15 random samples from the experiment without destroying blocks and thereby making it impossible to do the statistical analysis that I plan to do? The reason for the design is to account for spatial variation in the greenhouse such as light exposure.

The figures I have made are my two suggestions for how I could do this. In figure 1 I show the design and placement of the pots, each letter A-E is one treatment (randomly assigned to a pot number for each block) and the numbers are the numbers that were assigned to those pots. In figure 2 I remove 3 randomly assigned full blocks from the experiment for destructive samples, and in figure 3 I took 3 random samples from each of the treatment groups independently and removed them.

I feel as if removing complete blocks like in figure 2 would be the best option, because then I would not be affecting the other blocks as happens in figure 3, the disadvantage of this option is that the samples are less spread and it would perhaps not account too well for variability in the greenhouse for the destructive samples, but do well for the non-destructive ones.

Figure 3 probably does well in accounting for spatial variability in the destructive samples, but it will be destroying the blocks for the finished plants and therefore it seems to me like it would not be possible to do a proper statistical analysis on the finished samples of which I care most about.

How can I solve this? I would also appreciate a suggestion on how to analyse these results if it is possible. Is there any other way I could take destructive samples like this without ruining it? I feel like growing the destructive samples separate from the main group in the greenhouse (but close to each other) would not account enough for variability.

My supervisors have said that taking the destructive samples will not affect the growth of the surrounding plants as long as I remove the pots and not clip the plants.

I want to compare a series of new treatments with existing standard treatment.

In the experiment I have

Treatments:
1. light intensity with three levels (No shade, single shade, double shade)
2. Nutrient with three levels (without nutrient, NPK (chemical), VC (organic))
3. Media with two levels (without topsoil, with topsoil)

The data recorded are growth performance, which includes morphological, physiological and biochemical characteristics.

Which analysis should I use for this experiment? I want to analyse whether the treatments given will affect the growth performance. Can I use ANOVA or any other more suitable analysis?

Dear everyone,

do you have any experience with the analysis of yield stability or yield risk based on simulated crop yield data? I would like to use a modelling approach and simulate grain yields (of cereal crops like winter wheat) depending on different agronomic practices, soil profiles and weather data. Based on the simulated grain yield data, I would like to calculate stability parameters (like Shukla's stability variance index, adjusted CV, etc.). The problem is that I **don't** have an underlying field trial design with randomisation or real replicates. I'll get only one yield value per environment (soil profile × year) for each agronomic practice (like cropping sequence, fertilisation variant, etc.). Do you have any experience with that problem?

With many thanks for your help in advance!

Janna

I performed a photo-interpretation where I laid randomly selected points on an aerial imagery and classified them into land cover classes (e.g tree, grass, etc). I then estimated the percent cover or amount of cover for each cover class and their Standard Errors (SE). I carried out the land cover assessment for the years 2010 and 2019. I am interested to know if there is a difference in a particular cover class (e.g tree cover) between the two years.

As an example, Tree cover values (i.e Area (ha) ± SE) for years 2010 and 2019 were estimated to be 3.61 ± 0.23 and 2.45 ± 0.20, respectively. I don't have replicates, all I have is the two values to compare, and I want to know if the decrease in tree cover is significant?
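With only two estimates and their standard errors, one common approximation is a z-test on the difference. The sketch below assumes that the two estimates come from independent point samples and are approximately normally distributed. If the same points were re-interpreted in both years, the samples are paired and this calculation does not apply as written.

```r
est_2010 <- 3.61; se_2010 <- 0.23   # tree cover (ha) and SE, 2010
est_2019 <- 2.45; se_2019 <- 0.20   # tree cover (ha) and SE, 2019

d       <- est_2010 - est_2019
se_diff <- sqrt(se_2010^2 + se_2019^2)          # SE of the difference (independence assumed)
z       <- d / se_diff                          # about 3.8
p_value <- 2 * pnorm(-abs(z))                   # two-sided p-value
ci_95   <- d + c(-1, 1) * qnorm(0.975) * se_diff  # 95% confidence interval for the decrease
```

Under these assumptions the confidence interval for the decrease excludes zero.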

Hi, everyone. What is the difference, if any, between calculating the coefficient of variation of a group of data arranged in an experimental design with different treatments (for ANOVA) and that of a "free" group of data? I suppose there might be a difference in the ANOVA case, because the treatments may affect the evaluated variable between groups of replications of the same treatment.

Dear Scientists,

Please, can you help me understand why my data give no value for the interaction (see attached) in XLSTAT when running the two-way ANOVA? The Type I, II and III tables come out empty, as attached, and I am not able to understand what is wrong.

Any answer is welcome.

Thank you

I am curious whether 5 replicates can give an accurate experimental result. Is having 5 replications acceptable? In your own opinion or experience, what is the minimum number of replications that must be used in a study?

My study is about cassava grown in a greenhouse.

Dear All,

I am using ASREML-R to fit unstructured (UN) and factor analytic (FA) model to explore complex structure of genotype by environment interaction in multienvironment yield data. There are 160 genotypes across 30 environments. It is highly unbalanced data. I used two-stage approach so my data is in two-way GxE table of mean with a single observation for every genotype by environment combination.

The unstructured model is fitted as

UN <- asreml(fixed = fyld~env,

random =~us(env):id(gen),

rcov =~units,

data = met)

The warning message received is

158 singularities in Average Information Matrix

Exit status: -158 - Singularity in Average Information Matrix

As for FA model, it is fitted as

FA <- asreml(fixed = fyld~env,

random =~fa(env,1):(gen),

rcov =~units,

data=met)

LogLikelihood not converged

DF loglik AIC BIC

1 61 -2265.464 4652.928 4951.688

And some output estimate follow.

I am thinking of fitting FA2 of order 2 but I am worried that FA1 is not even converged. Is there anything wrong with my script? Advise me please on what to do.

I have fitted a binary logistic regression model for my thesis. Can anyone provide some documents that will help me learn more about how to comment on (interpret) the model?

Dear Scientists,

Greetings

Please, could anyone give me an alternative to analyse data generated from an augmented Block design layout?

The following well-known software packages are not working. Does anyone know the reasons? I urgently need your help!

Here are the packages/links:

Indian Agricultural Research Institute, New Delhi

•Statistical Package for Augmented Designs (SPAD)

•SAS macro called augment.sas

CIMMYT – SAS macro called UNREPLICATE

•Developed in 2000 – uses some older SAS syntax

Thanks in advance for your help

Regards

Dear Scientist,

I kindly ask anyone who has an idea about this to clarify it for me. I ran a two-way ANOVA with data collected from two locations and got no location × genotype interaction. A reviewer is now requesting that I present the replication-within-location effect and the triplicate-within-sample effect.

Please, can any one give an idea on this? I use XLSTAT to run the analysis.

Thank you

I have recently collected data on a study area divided into three sites. Each site was divided into four categories, and each category was subdivided into 5 transects. Each transect was subdivided into 5 quadrats, each of which was subdivided into 2 subquadrats.

The data collected concern natural regeneration of Balanites species.

Considering that my design is a stratified sampling, I would welcome any advice on the best method, procedure and statistical package for analysing such data. I have tried two-way ANOVA with XLSTAT, but it is not clear to me.

Please, any advice is welcome.

Thanks in advance

Regards!

Hi all,

I have a data set including soil respiration and carbon amount. After making a scatter plot of respiration vs carbon amount, I can see there is a kind of positive relationship between them. I want to check whether this relationship is statistically valid. But my problem is that soil respiration has many more data points than the amount of carbon. Precisely, respiration was collected for 5 treatments, each treatment has 4 replicates, sampled on 5 days, so in total there are 100 samples recorded. Carbon amount: 5 treatments, each with 3 replicates, so in total 15 samples recorded. It looks impossible to do a normal linear regression. My question is: is there any statistical way to measure their relationship in such a limited data set?

Many thanks,

Xin

I am trying to compare soil carbon concentrations from different studies.

The problem is that most of them use different sampling depths:

0-10cm

0-15cm

0-20cm

0-30cm

...

Is there an acceptable technique to mathematically adjust concentrations to other sampling depths?

I understand that it will be a rough estimate since it depends on the distribution of C.

The concentration of C can decrease rapidly with depth, or conversely decrease only slowly...

Is it tolerable to define a model distribution to convert concentrations to subsequently make comparisons?

(Example for model distributions on the attached picture)

I wish to analyze the effect of tillage, nitrogen and cover crops on weed communities. The design can be described as the following:

- 4 blocks

- main plot factor: tillage (nested in blocks)

- sub plot factor: nitrogen (nested in tillage)

- sub sub plot factor: cover crops (nested in nitrogen)

Hence, in a classical univariate mixed model (lme syntax), the error structure would be defined as (1|block/tillage/nitrogen) if only one sample is present at the sub sub plot level.

Now, I wish to analyse in R the effects of tillage*nitrogen*cover crops (all simple effects, all second order interactions and the third order interaction) on weed communities through partial CCA (to filter out the block effect).

I believe the following CCA model is correct (in {vegan}):

cca2013=cca(log(cover_2013+1)~tillage*N*CC+Condition(block),data=cover_2013)

And from what I gathered on the {permute} vignette, the permutation design should be defined the following way:

h<- how(within = Within(type = "free"), plots = Plots(strata = interaction(block,tillage), type = "free"), blocks = block)

Is this correct? Am I not missing an additional level of nestedness?

Finally, the significance of each term would be tested the following way:

anova(cca2013,by="terms",permutations=h)

I would be grateful if someone could provide their expertise.

Sincerely,

Guillaume ADEUX

It usually happens when data is analyzed by independent T-test, results are significantly different. However, ANOVA with post hoc gives P value way different than T-test and non-significant. What would be better tool to analyze any data?

I can't find any material that provides the procedure to calculate the contribution of each character to the total divergence. If anybody has such material, please share it with me.

Example experiment: I spray a chemical at 3 concentrations on some plants and record their survival rate. The result is as follows:

| Concentration (mg/l) | Survival rate (3 replicates) |
|---|---|
| 1 | 100%; 100%; 100% |
| 2 | 90%; 80%; 90% |
| 3 | 50%; 40%; 30% |

So with this kind of data, what type of statistical test and post hoc test should I use to analyse them?

By definition, it seems Fisher's exact test (or Chi-square test) fits these data, but the format of the tests isn't exactly the same (they don't use proportion but count number instead). So can I use Fisher's exact test here, do I need to transform the data format, and what post hoc test do I use to do pairwise comparison among treatments?
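Fisher's exact test needs counts, so the percentages first have to be turned back into numbers of surviving and dead plants. The sketch below assumes 10 plants per replicate, which is not stated in the question, and pools the three replicates of each concentration. Pooling treats every plant as independent, so if replicates differ more than binomial variation allows (overdispersion), a quasi-binomial GLM on the replicate-level counts would be a safer choice.

```r
# Counts reconstructed from the percentages, ASSUMING 10 plants per replicate
tab <- matrix(c(30,  0,    # 1 mg/l: 100%, 100%, 100%
                26,  4,    # 2 mg/l:  90%,  80%,  90%
                12, 18),   # 3 mg/l:  50%,  40%,  30%
              ncol = 2, byrow = TRUE,
              dimnames = list(conc = c("1", "2", "3"),
                              outcome = c("survived", "dead")))

overall <- fisher.test(tab)   # overall test of concentration vs outcome

# Pairwise Fisher tests between concentrations, with Holm adjustment
pairs <- combn(rownames(tab), 2, simplify = FALSE)
p_raw <- sapply(pairs, function(p) fisher.test(tab[p, ])$p.value)
names(p_raw) <- sapply(pairs, paste, collapse = " vs ")
p_adj <- p.adjust(p_raw, method = "holm")
```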

I ran through one paper having similar kind of data I have but couldn't understand the statistical approaches they used. Can somebody provide useful information about over-dispersion (any examples)?

The price of aged **Basmati rice** (often sold by well-known wholesalers like **Daawat, Indiagate, Lalmahal**, etc.) is much higher (3 to 4 times) than that of freshly harvested rice in the Indian market, as natural ageing enhances and intensifies its taste, aroma, and cooking characteristics. **So should we recommend that farmers keep a part of their produce and sell it later to get a higher value?**

And what are the **ways to store these grains long term at the farmer's place** without their quality deteriorating?

I have found that in a book (Statistical Procedures for Agricultural Research by Gomez and Gomez) the authors use a correction factor to estimate the total sum of squares. I am interested to know what this correction factor means and why it is used in ANOVA. Could anyone please explain this? Thanks in advance for your help.
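On the correction factor: it is the squared grand total divided by the number of observations, C.F. = (Σy)² / N. Subtracting it from the raw sum of squares gives the sum of squared deviations from the grand mean, so it is a computational shortcut for hand calculation. A small R check with made-up yields:

```r
y <- c(12.1, 14.3, 11.8, 15.2, 13.7, 12.9)   # hypothetical plot yields

cf       <- sum(y)^2 / length(y)   # correction factor
total_ss <- sum(y^2) - cf          # total SS via the shortcut formula
direct   <- sum((y - mean(y))^2)   # total SS from its definition

all.equal(total_ss, direct)        # the two forms agree
```

The same correction factor is subtracted when computing the treatment and block sums of squares from their totals.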

If someone conducted an experimental trial of a field crop, repeated over 2 consecutive years,

- with **3 different dates of sowing** (*Factor 1*),
- and **3 different varieties** (*Factor 2*), **replicated 4 times** in a **split plot design**,

then how can one present the findings without simply doing a pooled analysis, i.e. by comparing the results **taking “year” as one of the factors in the statistical analysis?** In most publications it is quite apparent that when researchers take similar observations in repeated trials, they usually average the observed parameters and run the statistical analysis on the pooled data.
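As a rough sketch of what "year as a factor" could look like, the snippet below lays out the degrees of freedom for one common form of a combined split-plot analysis over years. It assumes that sowing dates are the main plots, varieties are the subplots, and replications are nested within years. None of this is stated in the question, and other model choices (for example treating year as random) would change which error term tests which effect.

```python
# Design sizes taken from the question.
years, reps, sowing_dates, varieties = 2, 4, 3, 3

# ASSUMPTIONS: sowing date = main plot (A), variety = subplot (B),
# replications nested within years.
y, r, a, b = years, reps, sowing_dates, varieties

df = {
    "Year (Y)": y - 1,
    "Replication within year": y * (r - 1),
    "Sowing date (A)": a - 1,
    "Y x A": (y - 1) * (a - 1),
    "Error (a)": y * (r - 1) * (a - 1),
    "Variety (B)": b - 1,
    "Y x B": (y - 1) * (b - 1),
    "A x B": (a - 1) * (b - 1),
    "Y x A x B": (y - 1) * (a - 1) * (b - 1),
    "Error (b)": y * a * (r - 1) * (b - 1),
}

total_df = y * r * a * b - 1

for source, value in df.items():
    print(f"{source:<26}{value:>4}")
print(f"{'Total':<26}{total_df:>4}")
```

The interaction rows involving year are what a plain average over years hides: if they are sizeable, the treatment effects differed between seasons, and reporting them by year is more informative than reporting pooled means.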

I have performed an experiment where treatment effects are represented by a - sign for a decrease in mean brood area in a honey bee colony and a + sign for an increase.

**I tried to perform the arc sine transformation but it didn't work.** Thanks in advance.
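One possible reason the transformation fails, assuming the values being transformed are the signed changes themselves: the arcsine square-root transformation is only defined for proportions between 0 and 1, so any negative value breaks it. A minimal sketch with hypothetical numbers:

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root (angular) transformation of a proportion in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"arcsine transform needs a proportion in [0, 1], got {p}")
    return math.asin(math.sqrt(p))


# Hypothetical proportions: these transform without problems.
proportions = [0.0, 0.25, 1.0]
transformed = [arcsine_sqrt(p) for p in proportions]

# Hypothetical signed changes in brood area (-20 % and +15 %).
signed_changes = [-0.20, 0.15]
failures = []
for change in signed_changes:
    try:
        arcsine_sqrt(change)
    except ValueError:
        failures.append(change)  # negative values cannot be transformed

print("transformed proportions:", transformed)
print("values that could not be transformed:", failures)
```

If that is the situation, analysing the brood area itself (before and after treatment), or the signed difference without an angular transformation, may be more appropriate than forcing the data into the 0 to 1 range.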

Hello,

I recently analysed count data (in this case the number of stems per tree) using a GLM. The analysis was carried out on both log-transformed and square-root-transformed data. Although most sources say that the square root transformation should be used for count data, I got a model with a higher R² and lower error values with the log-transformed data. Which would you recommend I report?

Regards

Glen

I have a short question regarding data analysis. I have 2 data sets (data frames, df). The first one is a signal, which is yes / no / maybe, and the second one is the response to this signal, as a percentage. The percentage increases after each yes and maybe (like rainfall and plant disease level). The response (the second data set) is delayed in time. I need to find the relationship between these two data sets (signal and response). What is the best method to do this in R?

PS. I tried cross-correlation but the results are not clear.

Thank you all

Example

Date Signal Response

01.02.2018 / no (as 0) / 0%

01.03.2018 / yes (as 2) / 0%

01.04.2018 / no (as 0) / 0%

01.05.2018 / maybe (as 1) / 2%

etc....
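One reason cross-correlation can look unclear with data like this is that the response is cumulative, so it keeps its level after a signal instead of returning to zero. A possible workaround is to correlate the signal with the period-to-period increase in the response at several lags. The sketch below uses plain Python rather than R, and the two series are invented for illustration (they are not the poster's data): the response is built to rise two steps after each signal.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


# HYPOTHETICAL series: signal coded no = 0, maybe = 1, yes = 2;
# response is a cumulative percentage.
signal = [0, 2, 0, 1, 0, 0, 2, 0, 1, 0, 0, 0]
response = [0, 0, 0, 2, 2, 3, 3, 3, 5, 5, 6, 6]

# Work with the increase in the response rather than its cumulative level.
increments = [0] + [response[t] - response[t - 1] for t in range(1, len(response))]

# Correlate signal at time t with the response increase at time t + lag.
n = len(signal)
lag_corr = {
    lag: pearson(signal[: n - lag], increments[lag:])
    for lag in range(0, 4)
}
best_lag = max(lag_corr, key=lag_corr.get)

for lag, corr in lag_corr.items():
    print(f"lag {lag}: r = {corr:.3f}")
print("lag with the strongest correlation:", best_lag)
```

In R the same idea would be to difference the response before computing the cross-correlation. Because the signal is ordinal (no / maybe / yes), a rank-based correlation or a lagged regression with the signal as a categorical predictor might suit it better than Pearson's r.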

Hello,

I am performing research to check the impact of different storage bags on the damage parameters of maize caused by the red flour beetle at different storage times.

**Treatments:** 8 storage bags (SB)

**Infestation time:** 0, 15, 30, 45, 60, 75, 90, 105, 120, 135 and 150 days

**Parameters:** Percent grain damage (GD%), final adult density (FAD), percent weight loss (PWL) and weight of insect feeding residues (WIFR).

How should I statistically analyze the data from this experiment?

Data for one parameter are also attached to this question.

Thanks

I want to improve my knowledge in this area. I would like indications of good books that you have studied.

One-way ANOVA requires n=30. However, with two hormones (BAP and 2-iP) tested at different concentrations (0, 1, 2, 3, and 4 mg/L), there are several ways to reach 30 samples or replicates:

A) Do ALL concentrations of TWO hormones with 30 samples at one time,

B) Do ALL concentrations of TWO hormones with ONLY 3 samples but repeated 10 times,

C) Do ALL concentrations of TWO hormones with ONLY 5 samples but repeated 6 times,

D) Do ALL concentrations of TWO hormones with ONLY 10 samples but repeated 3 times,

E) Do ONE of the concentrations of ONE hormone first for 30 samples, experiment is repeated by changing the concentrations or the type of hormone used.

May I know which method is the most suitable? Why?

I was told that if I repeat the experiment more times, I can no longer use one-way ANOVA even if its assumptions are fulfilled (n=30, normality, and equal variance among populations), and that with more repeats I have to use a more complicated analysis such as two-way ANOVA or multivariate analysis. Can any experts help me with this as well?

Dear all,

do you know if:

1 - Can I run an RDA with negative (taxa) values (as delta Control - Treatment)?

2 - Do I have to use the decostand function on these delta values before performing the RDA?

3 - Shall I use the Bray-Curtis distance (dist="bray") in the RDA function?

Best

Alessandro

I am working on the evaluation of some tomato cultivars. Could you give some suggestions on how the calculation of genotypic and phenotypic correlation coefficients and path coefficient analysis are done in statistical software? It would be really nice if you could suggest a statistical program that is convenient for these calculations and analyses.

Good day,

I need to evaluate a total of 26 rice varieties in terms of their morpho-agronomic characteristics and yield components.

Among the 26 varieties, 18 are irrigated lowland varieties and 8 are upland varieties.

In RCBD, similar experimental units are grouped into blocks/replicates to control variation in an experiment (spatial effects of glasshouse).

In my case, the size of the pot, the soil volume, the number of plants per pot, and the number of replicates are the same. However, the soil type and fertilizer application differ between irrigated lowland and upland varieties.

Therefore, I have a few questions to clear up my doubts.

1. Is this experiment still considered a single-factor (genetic material) experiment?

2. Can we group all 26 rice varieties (including lowland and upland) in one block?

3. Or are two blocks needed, one for lowland and one for upland? (Lowland block) (Upland block)

4. If there are two blocks, would that affect the statistical analysis, namely (a) analysis of variance (ANOVA) and Duncan’s multiple range test (DMRT), and (b) genotypic and phenotypic correlation coefficients?

To add on,

In my 18 lowland rice varieties, I have 7 which are high yield and 11 which are low yield. My research questions are:

1. Do the morpho-agronomic characteristics and yield components of high-yielding lowland rice accessions differ from those of low-yielding lowland rice accessions? (**High yield lowland vs low yield lowland**)

2. Do the morpho-agronomic characteristics and yield components of high-yielding lowland rice accessions differ from those of upland rice accessions? (**High yield lowland vs upland**)

3. Are the morpho-agronomic characteristics and yield components of low-yielding lowland rice accessions similar to those of upland rice accessions? (**Low yield lowland vs upland**)

Can I still compare the results obtained from different blocks if these are the questions that I would like to address in my study?

Thank you

To study the effect of fibrolytic enzymes on straw degradability, my variables are enzyme mixture (n=2), straw type (n=4), and enzyme level (n=4).

The enzyme levels are not the same for both enzyme mixtures.

A field study of 4 levels of biochar (Factor A) and 4 levels of nitrogen fertilizer (Factor B), giving 16 treatment combinations, is underway. The study was set up as an RCBD with three replications; however, I am having trouble selecting a statistical model for the data analysis.

I intend to use the general linear model (GLM), but I am not sure whether it is the best fit for the analysis. I need suggestions and guidance.

The dependent variable is a continuous variable.
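For a design like this, a two-factor factorial ANOVA in RCBD (which is a special case of the general linear model) is the usual starting point. The sketch below only lays out the degrees of freedom for that model, assuming the 3 replications are the blocks and all 16 combinations appear once in each block. The variable names are illustrative.

```python
# Design sizes taken from the question.
biochar_levels, nitrogen_levels, blocks = 4, 4, 3

# ASSUMPTION: each block contains all 16 treatment combinations exactly once.
a, b, r = biochar_levels, nitrogen_levels, blocks

df = {
    "Block": r - 1,
    "Biochar (A)": a - 1,
    "Nitrogen (B)": b - 1,
    "A x B": (a - 1) * (b - 1),
    "Error": (a * b - 1) * (r - 1),
}
total_df = a * b * r - 1

for source, value in df.items():
    print(f"{source:<14}{value:>4}")
print(f"{'Total':<14}{total_df:>4}")
```

With 30 error degrees of freedom the interaction can be tested directly. Since both factors are quantitative levels, fitting a regression or response surface on the biochar and nitrogen rates is a reasonable alternative to comparing the 16 means pairwise.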