Science topic

# Agricultural Statistics - Science topic

The application of statistics to agriculture
Questions related to Agricultural Statistics
Question
I'm looking for topics that need research, especially related to time series, agricultural production, agricultural economics, and cointegration. I'm interested in agricultural economics and interdisciplinary topics.
Please suggest some relevant topics.
Thanks.
Thank you so much Kaushik Gupta for the advice. I'll follow it.
Emmanuel V Murray, thank you very much for the quick suggestion.
Chandan Kumar, thank you very much for the advice and recommendations.
Regards,
Sujata
Question
I'm doing a germination assay of 6 Arabidopsis mutants under 3 different ABA concentrations in solid medium. I have 4 batches. Each batch has 2 plates for each mutant and 3 for the wild type, and each plate contains 8-13 seeds. Some seeds and plates are lost to contamination, so I don't have the same sample size for each mutant in each batch; in some cases a mutant is no longer present in the batch. I recorded the germination rate per mutant after a week and expressed it as a percentage. I'm using R. How can I best analyse the data to test whether the mutations affect the germination rate in the presence of ABA?
I've two main questions:
1. Do I consider each seed as a biological replicate with a categorical result (germinated/not germinated), or each plate with a numerical result (% germination)?
2. I compare treatments within the genotype. Should I compare mutant against wild type within the treatment, the treatment against itself within mutant, or both?
I suggest using mosaic plots rather than (stacked) barplots to visualize your data.
The chi²- and p-values can be calculated simply via chi²-tests (one for each ABA conc) -- assuming the data are all independent (again, please note that seedlings on the same plate are not independent). If you have no possibility to account for this (using a hierarchical/multilevel/mixed model), you may ignore this in the analysis but then interpret the results more carefully (e.g., use a more stringent level of significance than usual).
A binomial model (including genotype and ABA conc as well as their interaction) would allow you to analyse the difference between genotypes in conjunction with ABA conc. However, given the experimental design (only three different conc values) this is cumbersome to interpret, because you cannot establish a meaningful functional relationship between conc and the probability of germination.
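As a concrete illustration of the per-concentration chi²-test suggested above, here is a minimal sketch (Python with scipy; the germination counts are made up, and in R the equivalent call would be chisq.test()):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical germinated / not-germinated counts at ONE ABA concentration
# (rows: genotypes; columns: germinated, not germinated)
counts = np.array([
    [28, 4],   # wild type
    [15, 17],  # mutant 1
    [22, 10],  # mutant 2
])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```

One such test per concentration; and as noted above, seeds on the same plate are not independent, so a stringent significance level is advisable.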
Question
I have come across many peer-reviewed journal articles whose authors conclude that a climate change phenomenon is occurring by applying the Mann-Kendall trend test to hydro-meteorological variables (rainfall, temperature). It has to be noted that Mann-Kendall is a statistical technique which, when applied to a dataset (including a time series), shows whether there is a monotonic increasing or decreasing trend, and whether that trend is statistically significant.
My question is: how can we conclude that a detected trend is due to climate change alone, without citing any physical process or phenomenon (such as teleconnections) that drives the change, based only on a statistical test (Mann-Kendall) at a particular level of significance (LOS)?
The choice of LOS is also statistically subjective, and the value can vary from person to person.
Climate change is a reality, but we need a reference to be sure about the direction of change. No one can explain what the next step of climate change will be.
A statistical test cannot capture a dynamic change.
Question
Hi, I was hoping someone could recommend papers that discuss the impact of using averaged data in random forest analyses or in making regression models with large data sets for ecology.
For example, if I had 4,000 samples each from 40 sites and did a random forest analysis (looking at predictors of SOC, for example) using environmental metadata, how would that compare with doing a random forest of the averaged sample values from the 40 sites (so 40 rows of averaged data vs. 4,000 raw data points)?
I ask this because a lot of the 4,000 samples have missing sample-specific environmental data in the first place, but there are other samples within the same site that do have that data available.
I'm just a little confused about 1) best practices and warnings for imputing site-average values where sample-level data are missing; 2) the drawbacks of using smaller, site-averaged datasets to deal with missingness, versus using incomplete datasets, versus using a much smaller set of only "complete" samples; and 3) the geospatial rules for linking environmental data with samples (if 50% of plots in a site have soil texture data and 50% don't, yet they're all within the same site/area, what would be the best route for analysis? It could depend on the variable, but I have ~50 soil chemical/physical variables).
Thank you for any advice or paper or tutorial recommendations.
Thank you!
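To make the raw-versus-averaged comparison concrete, here is a small simulated sketch (Python with scikit-learn; the column names and numbers are hypothetical). Fitting one forest on all samples and one on site means shows how the effective sample size collapses from thousands of rows to 40:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_sites, per_site = 40, 100
site = np.repeat(np.arange(n_sites), per_site)

# Simulated environmental predictors with a shared site-level component
X = rng.normal(size=(n_sites * per_site, 3)) + rng.normal(size=(n_sites, 3))[site]
y = 2 * X[:, 0] + rng.normal(size=len(site))   # "SOC" driven mainly by predictor 1

df = pd.DataFrame(X, columns=["temp", "precip", "clay"])
df["site"], df["soc"] = site, y

# Forest on all 4,000 raw samples
rf_raw = RandomForestRegressor(n_estimators=100, random_state=0)
rf_raw.fit(df[["temp", "precip", "clay"]], df["soc"])

# Forest on the 40 site means only
site_means = df.groupby("site").mean()
rf_avg = RandomForestRegressor(n_estimators=100, random_state=0)
rf_avg.fit(site_means[["temp", "precip", "clay"]], site_means["soc"])
```

When comparing the two, use grouped cross-validation (leaving whole sites out) so that pseudo-replication within sites does not inflate the raw model's apparent skill.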
Question
Do serial correlation, autocorrelation and seasonality mean the same thing, or are they different terms? If so, what are the exact differences with respect to statistical hydrology? What are the different statistical tests to determine (quantify) the serial correlation, autocorrelation and seasonality of a time series?
Kabbilawsh Peruvazhuthi, serial correlation and autocorrelation are the same thing, but seasonality is different.
Question
I have two different formulations of one active ingredient that were tested on 3 crops (A, B, C) using the same concentration and environmental conditions. I want to check whether these two formulations act significantly the same on those crops or not. n = 10 and the data are perfectly normal. It would be great to compare the means for each of the 3 crops between the two formulations: the mean of crop A treated by formula X versus the mean of crop A treated by formula Y of the same active ingredient, and so on, to see whether they act similarly in terms of the residue detected on those 3 crops.
I have the feeling that a plain ANOVA is not correct! Any advice?
It sounds like an independent t-test case. Use an independent t-test to compare the mean of crop A treated by formula X with the mean of crop A treated by formula Y of the same active ingredient, then repeat the comparison for crops B and C: three separate independent t-tests in this case.
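A minimal sketch of the three-t-test approach (Python with scipy; the residue values are made up, and a Bonferroni adjustment is included since three comparisons are made):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
results = {}
for crop in ["A", "B", "C"]:
    x = rng.normal(5.0, 1.0, 10)   # hypothetical residues, formulation X, n = 10
    y = rng.normal(5.5, 1.0, 10)   # hypothetical residues, formulation Y, n = 10
    t_stat, p = ttest_ind(x, y, equal_var=False)   # Welch's t-test
    results[crop] = (t_stat, p)

alpha = 0.05 / 3   # Bonferroni-corrected threshold for three tests
```

Welch's version (equal_var=False) is a safe default even when variances look similar.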
Question
If I want the annual average of the country's oil production for 2019 and I have 25 stations:
1. should I take the sum of the 12 months for each station individually, so I get the annual sum for each station, and then divide by 25 to calculate the country's annual average;
2. or should I take the sum over the 25 stations for January, then February, etc., and then divide by 12, the number of months, to get the annual average for the country?
These are two different averages. The numerator is the same for both 1 and 2: the sum of the production of the 25 stations over 12 months, i.e. the total annual production of all 25 stations.
Dividing this numerator by 25 gives you the annual average production per station.
Dividing by 12 gives you the average production of all 25 stations per month.
There is no single correct average. The average depends on how you define it and what you want to characterize: production per station or production per month.
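The distinction can be checked with a couple of lines of arithmetic (hypothetical numbers, just to make it concrete):

```python
import numpy as np

# Hypothetical monthly production: 25 stations x 12 months, 100 units each
prod = np.full((25, 12), 100.0)

total = prod.sum()            # total annual production of all stations: 30000
per_station = total / 25      # annual average per station: 1200
per_month = total / 12        # average production of all stations per month: 2500
```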
Question
Hi,
I want to statistically analyse my metabolomics data (18 metabolites). I have two species and four levels of drought. I analysed each species individually as a CRD, and for mean comparison I used Fisher's LSD (α = 0.05) with FDR-adjusted p-values. Is this procedure correct, or must I use another statistical method?
Thank you all.
I think the LSD mean comparison test is appropriate, but the question is which experimental design you used for the four levels of drought. Have you considered a factorial experiment (species × drought)? That could be considered as an alternative analysis.
Question
Dear colleagues,
I used the R package for line*tester analysis (Agricolae library).
It calculated the GCA effects, SCA Effects, S.E. (gca for line), S.E. (gca for tester) and S.E. (sca effect).
The experimental material comprises eight genotypes. Five genotypes were used as females (line) and three genotypes were used as males (testers).
The 15 F1’s and their parents were evaluated in a randomized complete block design with three replications.
I want to know the degree of freedom (T-test table) for performing the significance test of General combining ability (gi) effects and Specific combining ability (sij) effects.
Regards
For the significance test of both GCA (of lines and testers) and SCA (of crosses) effects, you should use the error degrees of freedom, that is, (r − 1) × (t − 1) = 2 × 22 = 44, where r = 3 replications and t = 23 treatments (15 F1s plus 8 parents).
For further details, you may go through:
Sharma JR. 1998. Statistical and biometrical techniques in plant breeding. New Age International Publishers, New Delhi, Pp. 138-152.
Singh RK and Chaudhary BD. 1996. Biometrical methods in quantitative genetic analysis. Kalyani Publishers, New Delhi, Pp. 205-214.
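With scipy you can pull the corresponding two-tailed critical value directly (a sketch of the arithmetic above, not the full line × tester analysis):

```python
from scipy.stats import t

r, treatments = 3, 23                  # 3 replications; 15 F1s + 8 parents
df_error = (r - 1) * (treatments - 1)  # error degrees of freedom = 44
t_crit = t.ppf(0.975, df_error)        # two-tailed 5% critical value

# A GCA or SCA effect is significant at the 5% level if |effect / S.E.| > t_crit
print(df_error, round(t_crit, 3))
```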
Question
Dear all,
I want to analyze a factorial split-plot in time using SAS.
Factorial Experiment using Completely Randomized Design (CRD);
Factor A: treatments (a1-a4)
Factor B: harvest time, different days after treatment (b1-b5)
Replication: 3
Does anyone have SAS codes for this analysis?
Regards,
Thank you
Question
We have genotypic data for 190 rice landraces based on 150 SSR markers and want to prepare a research article. What kinds of analysis may be performed with these data? Kindly give your opinion and suggestions.
Regards
Parmeshwar
Hello,
First we should be clear about the purpose of the molecular study; the analysis can be chosen accordingly:
1. Genotypic characterization: basic genetic parameters such as heterozygosity level, allele frequency, PIC value, etc.
2. Genetic variability studies: AMOVA.
3. Ancestry studies: phylogenetic analysis or dendrogram studies.
4. Genetic distance studies: PCA, genetic distance matrices.
5. Population studies: STRUCTURE.
Software: Arlequin, GenAlEx, Molkiv, PowerMarker, NTSYS, STRUCTURE, MEGA and DARwin.
If you are using genic SSRs:
1. Trait identification studies.
2. QTL analysis.
Software: Windows QTL Cartographer, TASSEL.
All the best.
Question
I want to develop a hybrid SARIMA-GARCH model for forecasting monthly rainfall data. The data are split 80% for training and 20% for testing. I initially fit a SARIMA model for rainfall and found that the residuals of the SARIMA model are heteroscedastic. To capture the information left in the SARIMA residuals, a GARCH(p=1, q=1) model is applied to the residual part. But when I forecast, I get a constant value; I tried different GARCH model orders and still get a constant value. I have attached my code. Kindly help me resolve it: where have I made a mistake in the code, or does some other CRAN package have to be used?
library("tseries")
library("forecast")
library("fGarch")                                       # note: the package is fGarch, not fgarch
setwd("C:/Users/Desktop")                               # set the working directory
data <- read.table("data.txt")                          # import the data
datats <- ts(data[, 1], frequency = 12, start = c(1982, 4))  # convert to a monthly time series
plot.ts(datats)                                         # plot the series
adf.test(datats)                                        # test for stationarity
diffdatats <- diff(datats, differences = 1)             # difference the series
acf(datats, lag.max = 12)                               # ACF plot
pacf(datats, lag.max = 12)                              # PACF plot
auto.arima(diffdatats)                                  # suggest the ARIMA order
datatsarima <- arima(diffdatats, order = c(1, 0, 1), include.mean = TRUE)  # fit the ARIMA model
forearimadatats <- forecast(datatsarima, h = 12)        # forecast.Arima() is defunct; use forecast()
plot(forearimadatats)                                   # plot the forecast
residualarima <- resid(datatsarima)                     # extract the residuals
FinTS::ArchTest(residualarima, lags = 12)               # test for ARCH effects (ArchTest is in the FinTS package)
# Fit the GARCH model to the ARIMA residuals (not to the raw series):
garchdatats <- garchFit(formula = ~ garch(1, 1), data = residualarima,
                        cond.dist = "norm", include.mean = FALSE, trace = FALSE)
# Note: the mean forecast of a pure GARCH model is (near-)constant, because GARCH
# models the volatility, not the level -- this is why the point forecasts look flat.
# Combine the ARIMA mean forecast with the GARCH variance forecast instead.
forecastgarch <- predict(garchdatats, n.ahead = 12, plot = FALSE)
forecastgarch                                           # columns: meanForecast, meanError, standardDeviation
This happens to everyone at the beginning, and this is how we learn. I would advise you to check your theory and your code line by line. It will work for sure.
Question
Prior to 2008, many studies suggested ways to determine the order of a SARIMA model. After the publication of "Automatic Time Series Forecasting: The forecast Package for R" by Hyndman & Khandakar in the Journal of Statistical Software, many hydrology-related studies used it. My question: is there a more automated technique in R/Python, similar to the HK algorithm, for determining the order of a SARIMA model?
Following determination of the model order (p, d, q, P, D, Q and m) by the HK algorithm, the model parameters, which are the coefficients of the seasonal and non-seasonal AR and MA terms, are determined by maximum likelihood estimation (MLE). Is there any other technique for finding the model parameters?
Kabbilawsh Peruvazhuthi, the trend analysis approach, commonly referred to as "time series analysis," analyzes past data to forecast future occurrences, removing outliers and prioritizing more recent data. This strategy works best when there is a large amount of historical data with obvious and steady patterns.
Question
Grubbs's test and Dixon's test are widely applied in hydrology to detect outliers, but their drawback is that they require the dataset to be approximately normally distributed. I have 113 years of rainfall data, and the dataset is non-normally distributed. What statistical tests can find outliers in non-normally distributed datasets, and what values should replace the outliers?
Hello Kabbilawsh,
If you believed your sample data accurately represented the target population, you could: (a) run a simulation study of random samples from such a population; and (b) identify exact thresholds for cases (either individual data points or sample means or medians, depending on which better fit your research situation) at whatever desired level of Type I risk you were willing to apply.
If you don't believe your sample data accurately represent the target population, you could invoke whatever distribution you believe to be plausible for the population, then proceed as above.
On the other hand, you could always construct a Chebychev confidence interval for the mean at whatever confidence level you desired, though this would then identify thresholds beyond which no more than 100 - CI% of sample means would be expected to fall, no matter what the shape of the distribution. This, of course, would apply only to samples of 2 or more cases, not to individual scores.
Good luck with your work.
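A sketch of the Chebyshev interval mentioned above (Python; the "rainfall" values are skewed synthetic data, since the point is that no normality assumption is needed):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=2.0, scale=50.0, size=113)  # hypothetical skewed annual rainfall

alpha = 0.05                   # Chebyshev: P(|Xbar - mu| >= k * SE) <= 1 / k**2
k = 1 / np.sqrt(alpha)         # k ~ 4.47 for a 95% interval
se = x.std(ddof=1) / np.sqrt(len(x))
lo, hi = x.mean() - k * se, x.mean() + k * se
```

The interval is much wider than a normal-theory one, which is the price of making no distributional assumption; and as noted above, it bounds sample means, not individual yearly values.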
Question
I have monthly rainfall data from 1901-2013 for 29 stations covering the entire state of Kerala. I took the first 80% of the data for training the model and the remaining 20% for validating it. I developed a SARIMA monthly model for forecasting rainfall. The reviewer has asked: what is the scientific basis for forecasting rainfall over a point location (station) at a longer time scale (a month)? What was the reviewer trying to convey with this question?
Hi Kabbilawsh,
You developed a statistical model based on long-term observational data, which may have incorporated certain interannual variabilities intrinsic to the climate system affecting that specific place. However, you have not explained what causes the interannual variabilities, or why the monthly precipitation is predictable. The reviewer would like you to give an explanation (in the discussion) of the drivers or mechanisms underlying the variabilities.
Good luck.
Guoyu Ren
Question
Dear colleagues,
I used the R package for line*tester analysis (Agricolae library).
It calculated the General combining ability effects (GCA) of parents, but I don't know how to calculate the significance of the results.
In addition, I want to Estimate Narrow sense heritability and Heterosis (Better Parent (BP) and Mid-Parent (MP)) for hybrids by R.
Does anyone have a solution?
Regards
Dear @Mostafa Modarresi, please go through the books mentioned below:
1. Biometrical Methods in Quantitative Genetic Analysis by Singh, R. K. and Chaudhary, B. D., Kalyani Publishers, New Delhi
2. Quantitative Genetics in Maize Breeding by Arnel R. Hallauer, J. B. Miranda Filho, and Marcelo J. Carena, Springer
In addition, you can also look up a paper of interest. Hopefully you can find a solution to your problem.
Question
What is the method to compare performance on two different cognitive tests (that measure different cognitive functions) in the same group or in different groups?
The two cognitive tests are inherently different from each other and often have different parameters.
It would be helpful if anyone could direct me to some useful references.
Thank you
Joan Jiménez-Balado, I should have clarified above that I was speaking specifically about making statistical comparisons between different cognitive scales within the same sample. You are correct, however, that the asker mentioned "same or different group(s)".
If there are independent groups, as described in your example, one could easily make statistical comparisons on any cognitive scale. However, I can think of no way to compare (statistically) scores on two separate cognitive scales within a single sample, unless perhaps you used some variation of a rank-order test and assessed whether individual ranks on one cognitive test are similar to the individual ranks on the other.
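The rank-order idea can be sketched in a few lines (Python with scipy; both scales and their relationship are simulated):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
memory = rng.normal(50, 10, 40)                   # hypothetical scale 1 scores
attention = 0.6 * memory + rng.normal(0, 8, 40)   # hypothetical scale 2 scores

# Spearman's rho compares each person's RANK on one scale with their rank on
# the other, sidestepping the scales' different units and parameters.
rho, p = spearmanr(memory, attention)
```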
Question
Hello,
I am performing statistical analysis of my research data by comparing mean values using the Tukey HSD test. Because of the large number of treatments in my study, the homogeneous groups are labelled with both lowercase and uppercase letters. Is this type of homogeneous grouping acceptable for publication in a journal?
You can use SPSS for this analysis, but it is mostly done in the Statistix 8.1 program.
Question
We have sufficient seed of 120 rice genotypes. We want to evaluate genetic parameters related to early seedling vigour through replicated trials of these 120 genotypes. We do not want to use an augmented design. Can anyone suggest an appropriate experimental design? I would be highly grateful.
This depends upon the seed available to you. If you do not want to go for an augmented design, then a conventional replicated design will work.
Question
Hi everyone!
I am performing a RDA analysis with vegan package in R.
I have a doubt regarding the 'decostand' and 'scale' functions. Are they the same? Should I use only one of them?
I have many soil variables ('data.var': pH, CIC, N content, C/N ratio, microbial biomass, CE, etc.), and I was using both functions in my script:
.
.
data.var.sdz<-decostand(data.var,"standardize")
rda_indexes <- rda(data.var.sdz ~ data$depth, data, perm=999, scale=TRUE)
.
.
I just realized that maybe this is wrong. Any ideas on this?
Hi,
Using decostand with "standardize" is the same as scale. For example:
data(varespec)
sptrans <- decostand(varespec, "standardize")
ss <- scale(sptrans)
summary(ss[,1:3])
summary(sptrans[,1:3])
So, in your case, if you are using data.var.sdz, there is no need to use scale=TRUE in the rda() function. Using scale=TRUE will not do anything to data.var.sdz, since it is already scaled.
Question
I would like to compare the food security status of two groups of people using the household hunger scale. The sample size is large. Which statistical test would be most suitable?
Question
Hi, please give me a complete guide or any material for analysing experimental data from RBD, CRD, etc., to find genetic diversity, character association, path analysis and other plant-breeding-related results using IBM SPSS.
Dear @Rajasekhar Chowdary Duddukur, as suggested by @Suyash Bhimgonda Patil, I also recommend the online OP Stat portal for analysis of genetic diversity, character association, path coefficient analysis and other plant breeding experimental data. It is useful, and I have used it several times.
Best wishes, AKC
Question
I am interested in carrying out a cointegration analysis of the international prices of a few energy resources (coal, oil, uranium, plutonium, agrofuels, wind, hydroelectric) in order to establish or rule out any type of relationship.
Could you advise how to proceed with this task?
Thank you to everyone reading.
Thanks for your good advice, Mr. Ravichandran K R P.
Question
How can I select variables for a PCA from a huge dataset? Apart from the biological significance of the variables, are there other criteria for objectively choosing the most relevant variables to include in a multivariate analysis (PCA in this case)?
Thank you in advance...
Question
Hello,
I hope you have a good time.
I work on a research project about temperature indices. Due to the high number of indices, I only work on tables and maps on an annual time scale. In other words, I do most of my analysis for the annual time scale. Now I want to draw a box plot for studied indices. It should be noted that I have access to daily data. Do you think, for example, I should plot the average air temperature box plot using daily data or annual data?
Also in the case of precipitation, is the box plot better drawn from daily data or annual data?
I am waiting for your answer. Also, it would be great if you could introduce some reference.
Amin Sadeqi, I think it depends on the focus of your study. If you are looking for a trend on a short time scale, for example 5 to 10 years, I would suggest that the daily time scale is a better representation of the climate, for both the temperature and the precipitation data. But if your objective is to observe the long-term climate effect, then annual data with a box-and-whisker plot would be fine. It all depends on the scale of the analysis. Even bar charts with error bars (max and min from the average) can represent annual data if they convey the message you want to make. I hope this helps!
Question
Hi everyone
I have a dependent categorical variable of three levels, corresponding to three sectors of activity in the agricultural field, A, B and C and within each of them there are sub-levels, for example under A there are four sub-levels a1, a2, a3 and a4.
Over time, for more profit, farmers change their activity, this change is subject to several factors like demands, financial support, etc. (we are talking here about independent, quantitative and categorical variables).
For example, a farmer who practiced activity A changed his activity to B, i.e. he completely changed the activity sector, or he may change only the subsector, for example moving from "a2" to "a1" (the same applies to other farmers).
Is there a statistical technique that can be used to model these changes?
Hi, maybe you should consider a Markov (or semi-Markov) chain model for your data. Parametrization can be tricky, as it may lead to numerous parameters, but some parametrizations are more parsimonious than a direct one.
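A first-order Markov chain can be parameterized directly from observed activity sequences; a minimal sketch (Python; the per-farmer sequences are invented):

```python
import numpy as np

states = ["A", "B", "C"]
idx = {s: i for i, s in enumerate(states)}

# Hypothetical per-farmer activity sequences over consecutive seasons
sequences = [["A", "A", "B", "B"],
             ["A", "B", "B", "C"],
             ["C", "C", "A", "A"]]

# Count observed transitions, then normalize each row to probabilities
counts = np.zeros((3, 3))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        counts[idx[a], idx[b]] += 1
P = counts / counts.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
```

Covariates such as demand or financial support would turn this into a multinomial logistic (Markov regression) model, with the transition probabilities depending on the predictors.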
Question
Hello;
I have made a dendrogram with mixed data (numeric, ordered factors and factors) using Gower distance (the daisy function in R) and cluster analysis with ward.D2 (the hclust function in R) for a paper on the characterization of a plant species.
However, I am requested to calculate bootstrap values for this dendrogram to show the confidence of the clusters. I understand that it is possible for this kind of mixed data, but I have not found any R script or reference on how to perform it (for example, the pvclust function in R does not have a Gower distance option).
Thanks for the support
Hello Juan,
As long as the factors are recast as dummy variates (or similar; k − 1 variates for a k-level factor), there is no reason that the dummy variates, combined with the continuous score variables, couldn't be used for a bootstrap analysis.
Good luck with your work.
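A much-simplified sketch of that idea (Python: dummy-code the factor, put everything on a common scale, then bootstrap over variables and count how often the main split recurs; all data are simulated, and pvclust's multiscale bootstrap is more sophisticated than this):

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(4)
# Hypothetical mixed data: two numeric traits and one two-level factor
df = pd.DataFrame({
    "height":   np.r_[rng.normal(40, 3, 15), rng.normal(60, 3, 15)],
    "leaf_len": np.r_[rng.normal(8, 1, 15), rng.normal(12, 1, 15)],
    "flower":   np.where(np.arange(30) % 2 == 0, "red", "white"),
})
X = pd.get_dummies(df, columns=["flower"], drop_first=True).astype(float)
X = ((X - X.mean()) / X.std()).to_numpy()   # comparable scales for dummies and numerics

Z = linkage(X, method="ward")
ref = fcluster(Z, 2, criterion="maxclust")  # reference 2-cluster solution

# Bootstrap over variables: resample columns with replacement, recluster,
# and count how often the reference split is recovered.
B, hits, p_cols = 200, 0, X.shape[1]
for _ in range(B):
    cols = rng.choice(p_cols, p_cols, replace=True)
    lab = fcluster(linkage(X[:, cols], method="ward"), 2, criterion="maxclust")
    if adjusted_rand_score(ref, lab) > 0.9:
        hits += 1
support = hits / B                          # bootstrap support for the split
```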
Question
Hi everyone!
Reading many papers concerning soil respiration, I've found different ways to report annual mean soil respiration rates:
1. g CO2 m-2 h-1
2. μmol CO2 m-2 s-1
3. g C m-2 h-1
The conversion from the first to the second (x / 3600 / 44 × 10^6) and vice versa is "simple", but what about the third? How do I convert CO2 to C and C to CO2? How can I compare them?
Thank you very much!
Cheers!
Hi there - I realize that this is an old thread, but I see an incorrect answer in the discussion regarding the conversion between μmol CO2 and g C and I wanted to address it. Each mole of CO2 has one mol C in it, or more formally 1 mol C/ 1 mol CO2. One mole of C has a molecular weight of 12 g / mol C. And 1 mol CO2 has 1e6 umols CO2 in it (1 mol CO2/ 1e6 umol CO2).
So x umol CO2 * (1 mol CO2 / 1e6 umol CO2) * (1 mol C/ 1 mol CO2) * (12 g C/ 1 mol C) = 12/1e6 g C
And similarly x mol CO2 * (1 mol C/ 1 mol CO2) * (12 g C/ 1 mol C) = 12 g C
I find it easiest to do unit conversions by writing them out as fractions and cancelling units that occur in both the numerator and the denominator of your expression. That way you can't miss anything.
In summary, there are 12 g C in each mole of CO2, which should also make intuitive sense because a mole of CO2 contains one mole of carbon and that is its molecular weight.
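Wrapped up as code, the three conversions look like this (Python; molar masses rounded to 12 and 44 g/mol):

```python
M_C, M_CO2 = 12.0, 44.0        # approximate molar masses, g/mol

def umol_co2_to_g_c(x):
    """umol CO2 -> g C (1 mol C per mol CO2)."""
    return x * 1e-6 * M_C

def g_co2_per_h_to_umol_per_s(x):
    """g CO2 m-2 h-1 -> umol CO2 m-2 s-1."""
    return x / 3600.0 / M_CO2 * 1e6

def g_co2_to_g_c(x):
    """g CO2 -> g C: multiply by 12/44."""
    return x * M_C / M_CO2
```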
Question
Available nitrogen, phosphorus and potassium (NPK) content data were correlated using the multiple correlation coefficient formula; the obtained values approach 1, indicating strong positive relationships among them. But each soil nutrient variable may depend on changes in the other variables, governed by extraneous factors, which calls for a regression fit. I could not find any formula for calculating a regression fit line for multiple variables. Is it available only where two variables are independent and one is dependent, or is a regression fit line for multiple variables relevant at all?
@Dipankar Bera
Yes, you are right. The only solution is to plot a Q-Q plot, residuals vs fitted values, etc. A single regression fit line for multiple predictors cannot be drawn in two dimensions, but @Goutam Kumar Das, you can fit an MLR model using R statistical software and show the plots in a single frame, fitting a regression line for each pair (one X as the independent variable against the dependent Y).
Question
A positive correlation coefficient of 1 between two variables means a perfect positive relationship: for an increase in one variable there is a proportional increase in the second. Does it mean that if one variable remains constant, without an increasing trend, the other variable is constant too?
Perfect correlation is deterministic; it is a mathematical function, e.g. Y = 2X.
It can arise when a third factor (covariate) has an influence and is not considered; correlation is not causation.
Remember, the relationship holds for the data at hand and cannot be extrapolated or generalized to new data.
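Both points are easy to verify numerically (Python; note what happens when one variable is constant, which bears on the question above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x                                # deterministic relation: r = +1
r = np.corrcoef(x, y)[0, 1]

z = np.full_like(x, 7.0)                 # a constant variable has zero variance
with np.errstate(invalid="ignore", divide="ignore"):
    r_const = np.corrcoef(x, z)[0, 1]    # nan: correlation with a constant
                                         # is undefined, not 0 and not 1
```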
Question
Dear scholars,
As we know, the critical difference of LSD < the critical difference of Tukey < the critical difference of Scheffé. So I'd like to ask whether the choice among Fisher's least significant difference (LSD), Scheffé's test and Tukey's test depends on the accuracy of the results the researcher wishes to obtain?
Best wishes
Huda
I agree with Miranda Mortlock. Very nice explanation.
Question
Hello,
I have measured morphometric parameters of plants grown in vitro (height, root mass, etc.). I have one treatment variable, thus two test groups: control and treatment. I've made three independent biological replicates of the experiment, with 30 plants per biological replicate, so in total there are 90 plants in the control group and 90 plants in the treatment group. I have done single-factor ANOVA for each replicate and obtained high F values and very low p-values. My question is: is there a way, similar to repeated-measures ANOVA, to analyse these data as three independent units, or should I just merge the three replicates into one dataset?
Hi Doroteja,
Are you sure the 30 plants give one measurement each, or did you measure the 30 plants individually? In addition, how were the plants grown? All together in blocks?
This is important, because what you call biological replicates may actually be blocks, which would allow you to use the individual replicates, but with your experimental design included as random effects in a mixed model or nested ANOVA setup.
Best,
Roel
Question
Hello,
I have some doubts about the statistical model I am using to analyse my data. I have two groups of residue-study data: group 1 with n = 7 and group 2 with n = 47. They are independent, and the studies are expensive and rare, so I cannot increase the sample size by any means.
I tested the normality of both groups using SPSS and found the data were not normally distributed; I then applied a square-root transformation to all the data, after which they fit the normal distribution (p = 0.2, greater than 0.05).
Which test should I use, given that I am using the SPSS package?
Thanks.
thanks
You said you "need to compare the mean[s] of the two groups," but by applying a non-linear transformation you are no longer comparing means. For example:
G1: 1 1 1 1 1 64, mean 11.5
G2: 4 4 4 4 4 4 4 4, mean 4
Taking the square root:
G1*: 1 1 1 1 1 8, mean ≈ 2.17
G2*: 2 2 ... 2, mean 2
So with the raw data the first group is clearly higher, but after taking the square root the means are nearly equal. If you NEED to compare means (and sometimes this is important), do not transform the data; apply Welch's or Student's test to the raw values.
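The arithmetic can be checked directly (Python; the transformed mean of the first group comes out to 13/6 ≈ 2.17):

```python
import numpy as np

g1 = np.array([1, 1, 1, 1, 1, 64], dtype=float)
g2 = np.full(8, 4.0)

raw_means = g1.mean(), g2.mean()                      # 11.5 vs 4.0
sqrt_means = np.sqrt(g1).mean(), np.sqrt(g2).mean()   # ~2.17 vs 2.0

# On the raw scale G1 is far higher; after the square root the means are
# nearly equal -- the transformation changes the question being answered.
```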
Question
I am working on an experiment in which I will grow soybean in pots through all its stages in a greenhouse. I have a total of 65 plants at the start, with 4 treatments and 1 control group, making 13 plants (replications) per group. Midway through the experiment I will take 3 random samples from each group, 15 plants in total, and remove them from the soil to examine nodule count and total plant biomass.
The experiment design that I have set up for pot placement in the greenhouse was done as a randomised complete block design with 13 blocks and 5 rows/treatments, my question is how do I take these 15 random samples from the experiment without destroying blocks and thereby making it impossible to do the statistical analysis that I plan to do? The reason for the design is to account for spatial variation in the greenhouse such as light exposure.
The figures I have made are my two suggestions for how I could do this. In figure 1 I show the design and placement of the pots, each letter A-E is one treatment (randomly assigned to a pot number for each block) and the numbers are the numbers that were assigned to those pots. In figure 2 I remove 3 randomly assigned full blocks from the experiment for destructive samples, and in figure 3 I took 3 random samples from each of the treatment groups independently and removed them.
I feel as if removing complete blocks like in figure 2 would be the best option, because then I would not be affecting the other blocks as happens in figure 3, the disadvantage of this option is that the samples are less spread and it would perhaps not account too well for variability in the greenhouse for the destructive samples, but do well for the non-destructive ones.
Figure 3 probably does well in accounting for spatial variability in the destructive samples, but it will be destroying the blocks for the finished plants and therefore it seems to me like it would not be possible to do a proper statistical analysis on the finished samples of which I care most about.
How can I solve this? I would also appreciate a suggestion on how to analyse these results if it is possible. Is there any other way I could take destructive samples like this without ruining it? I feel like growing the destructive samples separate from the main group in the greenhouse (but close to each other) would not account enough for variability.
My supervisors have said that taking the destructive samples will not affect the growth of the surrounding plants as long as I remove the pots and not clip the plants.
Eirik Lågeide Since each plant is in a pot of its own, as I assume, it should not matter if you keep the plants to be destructively sampled together, grouped apart from the main pots which you intend to keep until the end. I would also suggest growing the experiment in the field, where you can sample from the main plots themselves; plucking out the plants at the proper places has to be planned. You could build an above-ground rhizotron, maybe 5-6 ft of soil above ground level contained by bricks all around, and grow your treatments within it both for sampling and to full maturity. Remove the bricks from one side and carefully remove the soil with a steady jet of water, which will help free the plants from the soil without damaging the roots. This contraption mimics the open field for a more realistic environment. I understand you are using pots to handle the plants more easily.
Question
I want to compare a series of new treatments with an existing standard treatment.
How do I use an augmented block design in evaluating a single cross that has been advanced to F4 in rice?
Question
In the experiment I have
Treatments:
1. Light intensity with three levels (no shade, single shade, double shade)
2. Nutrient with three levels (without nutrient, NPK (chemical), VC (organic))
3. Media with two levels (without topsoil, with topsoil)
The data recorded are growth performance measures, including morphological, physiological and biochemical characteristics.
Which analysis should I use for this experiment? I want to analyse whether the treatments given will affect the growth performance. Can I use ANOVA or any other more suitable analysis?
Well, I think Rabin Thapa is right. First the design should be finalised, which could be a split-split plot, strip-split plot, or factorial RBD. Then the analysis can be done as per the design used to conduct the experiment.
Question
Dear everyone,
do you have any experience on the analysis of yield stability or yield risk based on simulated crop yield data? I would like to use a modelling approach and simulate grain yields (of cereal crops like winter wheat) depending on different agronomic practices, soil profiles and weather data. Based on the simulated grain yield data, I would like to calculate stability parameters (like Shukla`s stability variance index, adjusted CV, etc.). The problem is that I don`t have an underlying field trial design with randomisation or real replicates - I`ll get only one yield value per environment (soil profile*year) for each agronomic practice (like cropping sequence, fertilisation variant, etc.). Do you have any experience with that problem?
With many thanks for your help in advance!
Janna
Hi Janna,
personally I don't see why you think that not having replicates is a problem in your case! When you analyse stability, you actually want to see the performance or yield of a crop in different environments. Replicates in field experiments may give you some more robustness, but not different environments, so you would have to average them anyway, or add a factor to take them into account to avoid pseudo-replication in your analysis. If you look at the stability index developed by Thomas Döring and colleagues, it applies to different sites/environments, but I don't think you need replicates to use it. Then, of course, your results will depend on the modelling assumptions, but this would be the case even if you had some random factors to create artificial replicates.
But there is perhaps a big issue I have overlooked..?!
Good luck with your work
Kind regards
Question
I performed a photo-interpretation where I laid randomly selected points on aerial imagery and classified them into land cover classes (e.g. tree, grass, etc.). I then estimated the percent cover or amount of cover for each cover class and their standard errors (SE). I carried out the land cover assessment for the years 2010 and 2019. I am interested to know if there is a difference in a particular cover class (e.g. tree cover) between the two years.
As an example, tree cover values (i.e. area (ha) ± SE) for the years 2010 and 2019 were estimated to be 3.61 ± 0.23 and 2.45 ± 0.20, respectively. I don't have replicates; all I have is the two values to compare, and I want to know if the decrease in tree cover is significant.
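One common approximation in this situation is a two-sample z-test on the two estimates, treating them as approximately normal and independent, with the standard error of the difference equal to sqrt(SE1² + SE2²). A minimal Python sketch using the values above (this assumes the 2010 and 2019 samples are independent):

```python
from math import sqrt
from scipy.stats import norm

# Tree cover estimates (ha) and standard errors for the two years
est_2010, se_2010 = 3.61, 0.23
est_2019, se_2019 = 2.45, 0.20

diff = est_2010 - est_2019                 # 1.16 ha decrease
se_diff = sqrt(se_2010**2 + se_2019**2)    # SE of the difference, assuming independence
z = diff / se_diff                         # roughly 3.8
p = 2 * norm.sf(abs(z))                    # two-sided p-value, well below 0.05
print(z, p)
```

If the same sample points were re-interpreted in both years, the independence assumption fails and a paired approach (e.g. McNemar's test on the point-level transitions) would be more appropriate.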
following
Question
Hi, everyone. What's the difference, if any, between calculating the coefficient of variation of a group of data arranged in an experimental design with different treatments (for ANOVA) and of a "free" group of data? I suppose there might be a difference in the case of ANOVA, because the treatments may affect the evaluated variable between groups of replications of the same treatment.
The coefficient of variation for a model can be calculated with the differences between predicted and actual values from the model, following a similar procedure as for CV for a "free" group which uses the differences between the values and the mean.
You can compare these by creating a model that has one predictor equal to the mean of measurement variable. Note that if you use a definition of CV as standard deviation / mean, if the formula for sample standard deviation is used, the model method and free group method will be slightly different. If you use the population standard deviation, they should be the same.
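The equivalence described above can be checked numerically. A small sketch (hypothetical data) comparing the "free group" CV with the CV of an intercept-only model whose single prediction is the mean; with the population standard deviation (ddof=0) the two coincide exactly:

```python
import numpy as np

x = np.array([4.1, 5.0, 4.6, 5.3, 4.8])

# "Free group" CV: population standard deviation / mean
cv_free = x.std(ddof=0) / x.mean()

# "Model" CV: RMSE of an intercept-only model (all predictions = mean) / mean
pred = np.full_like(x, x.mean())
rmse = np.sqrt(np.mean((x - pred) ** 2))
cv_model = rmse / x.mean()

print(cv_free, cv_model)
assert np.isclose(cv_free, cv_model)  # identical when ddof=0 is used
```

With the sample standard deviation (ddof=1) the free-group CV is slightly larger, as noted above.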
Question
Coefficient of determination R^2
Coefficient of Determination and Nash-Sutcliffe Efficiency are based on least squares, so they give greater weight to the peaks. A high CoD can indicate that the peaks fit well but not the recessions. You can get a better idea of the fit of the recessions by taking logs. There are other measures you can use if you trawl through the literature; I used a combination of NSE and logNSE to indicate the fit of my models when working on my PhD. What value to pick is a very good question - the answer depends on so many factors and is often a case of as good as you can get given the uncertainties involved. It is probably good practice to determine your limits of acceptability before you start to model, then throw out any models that don't comply - again, trawl the literature, particularly the work of Keith Beven. Good luck.
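For reference, NSE and its log variant are straightforward to compute. A sketch with made-up flow values (the small eps is an assumption to guard against zero flows):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def log_nse(obs, sim, eps=1e-6):
    """NSE on log-transformed flows; emphasizes recessions / low flows."""
    return nse(np.log(np.asarray(obs, float) + eps),
               np.log(np.asarray(sim, float) + eps))

obs = [10, 50, 200, 80, 20, 5]   # hypothetical observed flows
sim = [12, 45, 190, 90, 25, 4]   # hypothetical simulated flows
print(nse(obs, sim), log_nse(obs, sim))
```

A perfect fit gives NSE = 1; values reported together (NSE for peaks, logNSE for recessions) give a more balanced picture of model fit.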
Question
Dear Scientists,
Please, can you help me understand why my data give no value for the interaction (see attached) in XLSTAT when running the two-way ANOVA? The Type I, II and III tables give empty results, as attached, and I am not able to understand what is wrong.
any answer is welcome
Thank you
You fitted fixed effects for 67 labs. That eats up all your degrees of freedom. You should consider a mixed model, fitting "labID" as a random effect.
Question
I am curious whether 5 replicates can give an accurate result for the experiment. Is having 5 replications acceptable? In your own opinion or experience, what is the minimum number of replications that must be used in a study?
My study is about cassava grown in a greenhouse.
The probability of achieving a reasonable result depends on the true standard error (s) per unit, the number of replication (r), and experimental error (residual) degree of freedom (dfE) (Cochran and Cox 1957).
It thus depends on the number of treatments, the availability of material, the experimental unit, and expenses. In the case of limited resources, as few as 2 replications can be used, provided that the number of treatments is high (say > 20) and in turn the number of dfE is high (say > 15).
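As an illustration of that rule of thumb: for a randomized complete block design the error degrees of freedom are (t - 1)(r - 1), so with many treatments even 2 replications can give an adequate dfE. A tiny sketch:

```python
def rcbd_error_df(treatments: int, replications: int) -> int:
    """Residual (error) degrees of freedom for a randomized complete block design."""
    return (treatments - 1) * (replications - 1)

# With 21 treatments and only 2 replications, dfE = 20, comfortably above
# the rule-of-thumb minimum of ~15 mentioned above.
print(rcbd_error_df(21, 2))  # → 20

# With 5 treatments, 5 replications are needed to reach a similar dfE.
print(rcbd_error_df(5, 5))   # → 16
```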
Question
Dear All,
I am using ASREML-R to fit unstructured (UN) and factor analytic (FA) model to explore complex structure of genotype by environment interaction in multienvironment yield data. There are 160 genotypes across 30 environments. It is highly unbalanced data. I used two-stage approach so my data is in two-way GxE table of mean with a single observation for every genotype by environment combination.
The unstructured model is fitted as
UN <- asreml(fixed = fyld~env,
random =~us(env):id(gen),
rcov =~units,
data = met)
The warning message received is
158 singularities in Average Information Matrix
Exit status: -158 - Singularity in Average Information Matrix
As for FA model, it is fitted as
FA <- asreml(fixed = fyld~env,
random =~fa(env,1):(gen),
rcov =~units,
data=met)
LogLikelihood not converged
DF loglik AIC BIC
1 61 -2265.464 4652.928 4951.688
And some output estimate follow.
I am thinking of fitting an FA model of order 2 (FA2), but I am worried that FA1 has not even converged. Is there anything wrong with my script? Please advise me on what to do.
Hi Moshood,
You are doing a two-stage analysis. This means that you have to incorporate weights into your model as 1/se(pred), where pred is the predictions from your first stage. Then you need to use the family statement in asreml-R, as
family = asr_gaussian(dispersion=1).
I believe that is the reason you are finding problems. With the fixed error variance and with the weight statement you should be able to fit FA1 and even FA2. My only concern is how unbalanced your data are. In your 30 environments you need to have at least 2-4 genotypes in common for each pair of sites; otherwise they are effectively disconnected, and hence there is no way to estimate a correlation between those sites.
Good luck
Question
I have fitted a binary logistic regression model for my thesis. Can anyone help me to provide some documents that will help me to know more about comments on the model?
Yes. I have edited the question. Do you have any suggestions please?
Question
Dear Scientists,
Greetings
Please, could anyone give me an alternative to analyse data generated from an augmented Block design layout?
The following well-known software packages are not working! Does anyone know the reasons? I urgently need your help!
Here are the packages/links:
Indian Agricultural Research Institute, New Delhi
•Statistical Package for Augmented Designs (SPAD)
•SAS macro called augment.sas
CIMMYT – SAS macro called UNREPLICATE
•Developed in 2000 – uses some older SAS syntax
Regards
None of your links worked, so maybe explain what you are trying to achieve. Have you thought of using R, which is freely available and well supported?
There are augmented block designs in the R package agricolae .
These are designs for two types of treatments: the check (control) treatments and the new (augmented) treatments. The checks are replicated in complete randomized blocks, while the new treatments are assigned at random, each appearing in only one block. It is understood that the checks are of greater interest; the standard error of a difference involving them is much smaller than that between two new treatments in different blocks.
Question
Dear Scientist,
I kindly plead for anyone who has an idea on this to clarify it for me. I ran a two-way ANOVA with data collected from two locations and got no location × genotype interaction; then a reviewer requested that I present the replication-within-location effect and the triplicate-within-sample effect.
Please, can anyone give me an idea on this? I use XLSTAT to run the analysis.
Thank you
The attached image is of model parameters. Presenting this may also be helpful but this is not what the reviewer is asking. The reviewer is asking for differences between pairs of species within location A and differences between pairs of species within location B.
You have not done anything wrong. All you need to do is to provide the output of pair-wise comparisons from XLSTAT.
For example, at site A the species differences are:
Species 1 - Species 2 = 10.24 - 16.29 = -6.05. If you get this from XLSTAT, it will also give you standard errors, confidence intervals, z-values and p-values.
Question
I have recently collected data on a study area divided into three sites; each site was divided into four categories, each category was subdivided into 5 transects, each transect into 5 quadrats, and each quadrat into 2 subquadrats.
The data collected were about the natural regeneration of Balanites species.
Considering that my design is a stratified sampling, I would appreciate any advice on the best method, procedure and statistical package for the analysis of such data. I have tried two-way ANOVA with XLSTAT, but it is not clear to me.
Regards!
Dear Prof. David Eugene Booth,
Thank you, also for the helpful additional information on multilevel regression.
Kind Regards,
Lana Dobrindt
Question
Hi all,
I have a data set including soil respiration and carbon amount. After plotting a scatter plot of respiration vs carbon amount, I can see a kind of positive relationship between them. I want to check whether this relationship is statistically valid. But my problem is that soil respiration has many more data points than the amount of carbon. Precisely, respiration was collected for 5 treatments, each with 4 replicates, sampled on 5 days, so in total 100 samples were recorded. Carbon amount: 5 treatments, each with 3 replicates, so in total 15 samples were recorded. It looks impossible to do a normal linear regression. My question is: is there any statistical way to measure their relationship with such a limited data set?
Many thanks,
Xin
Thank you all for the kind answers. As I understand it, to quantify a relationship, x and y should always be paired? So if I have more x than y, is it impossible to know their relationship?
I made the scatter plot based on their means, not all the measured samples. Is this wrong?
Question
I am trying to compare soil carbon concentrations from different studies.
The problem is that most of them use different sampling depths:
0-10cm
0-15cm
0-20cm
0-30cm
...
Is there an acceptable technique to mathematically adjust concentrations to other sampling depths?
I understand that it will be a rough estimate since it depends on the distribution of C.
The concentration of C can decrease rapidly with depth, or conversely decrease only slowly...
Is it tolerable to define a model distribution to convert concentrations and subsequently make comparisons?
(Example for model distributions on the attached picture)
Dear Thomas,
I strongly suggest not converting. Soil organic carbon concentration does typically decrease with increasing soil depth, but the degree varies strongly between soil types, so there is no general rule or equation that you can apply. However, converting carbon concentrations into carbon stocks (stock = carbon concentration × thickness of soil layer/horizon × soil density) works well and is widely accepted in soil science.
In case soil density (or bulk density) has not been reported, you can apply a pedotransfer function introduced by Post & Kwon 2000 (doi: 10.1046/j.1365-2486.2000.00308.x). Land use change may have an impact on soil mass, which you can address by applying an approach introduced by Ellert & Bettany 1995 (doi: 10.4141/cjss95-075).
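The stock conversion described above is simple arithmetic. A minimal sketch (hypothetical concentrations and bulk densities), converting to kg C per m² so that layers of different thickness become comparable:

```python
def carbon_stock(conc_g_per_kg: float, thickness_m: float,
                 bulk_density_kg_m3: float) -> float:
    """Soil organic carbon stock (kg C per m^2) for one layer:
    stock = concentration * layer thickness * bulk density."""
    return conc_g_per_kg / 1000 * thickness_m * bulk_density_kg_m3

# Two hypothetical studies with different sampling depths, compared as stocks:
stock_a = carbon_stock(15.0, 0.10, 1300)  # 0-10 cm layer → 1.95 kg C/m^2
stock_b = carbon_stock(12.0, 0.30, 1400)  # 0-30 cm layer → 5.04 kg C/m^2
print(stock_a, stock_b)
```

Note the unit bookkeeping: (g C / kg soil) × (kg soil / m³) × m gives g C / m²; dividing by 1000 yields kg C / m².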
BR Nils
Question
I wish to analyze the effect of tillage, nitrogen and cover crops on weed communities. The design can be described as the following:
- 4 blocks
- main plot factor: tillage (nested in blocks)
- sub plot factor: nitrogen (nested in tillage)
- sub sub plot factor: cover crops  (nested in nitrogen)
Hence, in a classical univariate mixed model (lme syntax), the error structure would be defined as (1|block/tillage/nitrogen) if only one sample is present at the sub sub plot level.
Now, I wish to analyse in R the effects of tillage*nitrogen*cover crops (all simple effects, all second order interactions and the third order interaction) on weed communities through partial CCA (to filter out the block effect).
I believe the following CCA model is correct (in {vegan}):
cca2013=cca(log(cover_2013+1)~tillage*N*CC+Condition(block),data=cover_2013)
And from what I gathered on the {permute} vignette,  the permutation design should be defined the following way:
h<- how(within = Within(type = "free"), plots = Plots(strata = interaction(block,tillage), type = "free"), blocks = block)
Is this correct? Am I not missing an additional level of nestedness?
Finally, the significance of each term would be tested the following way:
anova(cca2013,by="terms",permutations=h)
I would be grateful if someone could provide their expertise.
Sincerely,
I don't think your model structure is accounting for the type of permutation you want. I recommend you read Winkler et al (2015) to make sure which effects you really want to test using your experimental design. Once you decided, you can build a small randomization script to build the null models for your hypothesis.
An alternative is to explore the function nested.anova.dbrda() in the BiodiversityR package.
I hope this helps.
Cheers
Winkler AM, Webster MA, Vidaurre D, Nichols TE, Smith SM (2015) Multi-level block permutation. Neuroimage 123:253–268. Also check that the Condition() term you are using in your model is working properly.
Question
It often happens that when data are analyzed by an independent t-test, the results are significantly different; however, ANOVA with a post hoc test gives a p-value very different from the t-test and non-significant. Which would be the better tool to analyze such data?
Tukey's HSD controls the family-wise error rate (FWER); individual t-tests don't. So if you want to control the FWER, you must use Tukey's HSD. If not, then individual t-tests may still not be a good option: it's almost always better to use a pooled variance estimate. Such tests are known as Fisher's LSD. If you think that pooling variances is not OK for your data, then you should ask yourself whether comparing means (and doing t-tests in general) really makes sense.
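The difference is easy to see on simulated data. A sketch using scipy and statsmodels (the group values are made up): Tukey's adjusted p-values are larger than the corresponding unadjusted t-test p-values, which is exactly the FWER control at work.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
# Four groups of 8 observations with slightly different true means
values = np.concatenate([rng.normal(10 + i, 2, size=8) for i in range(4)])
labels = np.repeat(["A", "B", "C", "D"], 8)

# Tukey's HSD adjusts p-values to control the family-wise error rate
print(pairwise_tukeyhsd(values, labels, alpha=0.05))

# An unadjusted t-test on a single pair will generally report a smaller p-value
t, p = stats.ttest_ind(values[labels == "A"], values[labels == "D"])
print(p)
```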
Question
I can't find any material that provides the procedure to calculate the contribution of each character to the total divergence. If anybody has materials, please share them with me.
You can use PCA in XLSTAT, R, and many other packages.
Question
Example experiment: I spray a chemical with 3 concentrations on some plants and record their survival rate. The result is as followed:
Concentration (mg/l)    Survival rate (3 replicates)
1                       100%; 100%; 100%
2                       90%; 80%; 90%
3                       50%; 40%; 30%
So with these kind of data, what type of statistical test and post hoc test should I use to analyse them?
By definition, it seems Fisher's exact test (or the chi-square test) fits these data, but the format isn't exactly the same (these tests use counts rather than proportions). So can I use Fisher's exact test here, do I need to transform the data format, and what post hoc test do I use for pairwise comparisons among treatments?
Duy Minh Pham, for the simple chi-square or Cochran-Armitage tests, each plant counts as an observation, so you have lots of replication with that approach. In this approach, your "replicates" are just for convenience: you are simply clumping the plants into groups of 10 to make things easy. There's no meaning to the replicates; it's not the case that, e.g., one replicate is from one clone and another is from another.
In an alternative approach, you could use each replicate as an observation. Here you might use correlation of Survival(%) vs. Concentration. Because the trend in your data is so clear, this will also work for you, even though that gives you only 9 observations.
I attached some code in R just to make these ideas a little more concrete. A couple of notes: 1) I used Spearman correlation for the correlation. 2) I used Monte Carlo simulation in the Cochran-Armitage test due to low cell counts.
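I can't reproduce the attached R code here, but a rough Python analogue of the two approaches is sketched below (it assumes 10 plants per replicate, so the percentages become counts; that assumption is mine, not stated in the question):

```python
import numpy as np
from scipy import stats

# Approach 1: each plant is an observation. Pooled survived/died counts
# per concentration, out of 30 plants each (assuming 10 per replicate).
table = np.array([[30,  0],    # conc 1: 100%, 100%, 100%
                  [26,  4],    # conc 2: 90%, 80%, 90%
                  [12, 18]])   # conc 3: 50%, 40%, 30%
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)  # association is highly significant

# Approach 2: each replicate is an observation; test the trend by rank correlation
conc = [1, 1, 1, 2, 2, 2, 3, 3, 3]
surv = [100, 100, 100, 90, 80, 90, 50, 40, 30]
rho, p_trend = stats.spearmanr(conc, surv)
print(rho, p_trend)  # strong negative rank correlation
```

For post hoc pairwise comparisons under approach 1, one option is a 2×2 Fisher's exact test for each pair of concentrations with a multiplicity correction (e.g. Bonferroni or Holm).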
Question
I ran through one paper having similar kind of data I have but couldn't understand the statistical approaches they used. Can somebody provide useful information about over-dispersion (any examples)?
In many distribution models, the variance is not a "free" parameter. In such distributions, the variance depends (somehow) on the mean. A very famous example is the Poisson distribution, which is used to model the count of "events" observed in a given "interval", where the process is known to "produce" these events at a given constant rate λ. The mean (expected value) of this distribution is just λ (events per interval). The variance of this distribution is also λ. Thus, the higher the rate, the more events per interval will be expected, and the higher the variance will be.
Now if one observes events from a real-world process and assumes that this is a process producing events with a constant rate, then one should get data where mean and variance are (quite) similar. However, in real data we often find that the variance of such count data is considerably larger than the mean. This indicates that our assumption that the rate is constant is not adequate. The fact that the variance is larger than the mean is called "over-dispersion".
The presence of overdispersion tells us that there is additional uncertainty in the rate itself. This can be accommodated in a probability model: if this extra variation in the rate is plugged into the Poisson distribution, the result is the negative binomial distribution, which can handle over-dispersed data much better than the Poisson distribution.
Starting from here, you can find more information in Wikipedia and numerous examples all over the web, if you google for "over-dispersion", "poisson", "negative binomial", "analysis of count data" etc.
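The rate-mixing construction described above can be simulated directly. A quick numpy sketch (arbitrary parameters) showing variance ≈ mean for Poisson counts but variance well above the mean for a gamma-mixed Poisson, which is exactly a negative binomial:

```python
import numpy as np

rng = np.random.default_rng(42)

# Plain Poisson counts: variance is approximately equal to the mean
pois = rng.poisson(lam=5, size=10000)

# Overdispersed counts: the rate itself varies between observations
# (gamma-mixed Poisson = negative binomial; mean rate is still 2 * 2.5 = 5)
rates = rng.gamma(shape=2, scale=2.5, size=10000)
nb = rng.poisson(rates)

print(pois.mean(), pois.var())  # both near 5
print(nb.mean(), nb.var())      # mean near 5, variance clearly larger
```

Here the theoretical negative binomial variance is mean + mean²/shape = 5 + 25/2 = 17.5, so the overdispersion is obvious in the sample.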
Question
The price of aged Basmati rice (often sold by well-known wholesalers like Daawat, Indiagate, Lalmahal, etc.) is quite a bit higher (3 to 4 times) than that of freshly harvested rice in the Indian market, as natural ageing enhances and intensifies its taste, aroma, and cooking characteristics.
So should we recommend that farmers keep part of their produce and sell it later to get a higher price?
And what are the ways to store these grains long-term at the farmer's place without their quality deteriorating?
Respected Dr Saab Manpreet Jaidka
I think we are well aware of these facts and figures,
and also know that the rice-wheat cropping system is deeply rooted across a major geographical area of both Haryana and Punjab.
But my main concern was what our universities should recommend to farmers, so that a considerable portion goes into the pockets of farmers rather than of other stakeholders who just sit in their air-conditioned premises all day, stealing the farmers' share.
Question
I have found in a book (Statistical Procedures for Agricultural Research by Gomez and Gomez) that the authors used a correction factor to estimate the total sum of squares. I am interested to know what this correction factor means and why it is used in ANOVA. Could anyone please explain this? Thanks in advance for your help.
The correction factor is defined as:
CF = (grand total of observed values)² / (total number of observed values)
The sum of squares (SS) used in ANOVA is actually the sum of squared deviations of the observed values from their mean. After algebraic simplification, the SS turns out to equal the raw sum of squares minus the correction factor. The correction factor therefore lets you compute the SS from the raw sum of squares instead of computing the squared deviations from the mean directly.
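A quick numeric check of that identity, with made-up observations:

```python
x = [12.0, 15.0, 11.0, 14.0, 13.0]
n = len(x)

cf = sum(x) ** 2 / n                    # correction factor: (grand total)^2 / n
ss_raw = sum(v * v for v in x)          # raw sum of squares
ss_via_cf = ss_raw - cf                 # shortcut used by Gomez & Gomez

mean = sum(x) / n
ss_deviations = sum((v - mean) ** 2 for v in x)   # definition of the total SS

print(ss_via_cf, ss_deviations)         # both equal 10.0 for these data
assert abs(ss_via_cf - ss_deviations) < 1e-9
```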
Question
If someone conducted an experimental trial of any field crop, repeated for 2 subsequent years
• With 3 different dates of sowing (Factor 1),
• & 3 different varieties (Factor 2),
• replicated 4 times in split plot design
then how can one present the findings without simply doing a pooled analysis, i.e. by comparing the results taking "year" as one of the factors during the statistical analysis?
Because in most publications it is quite apparent that, when researchers take similar observations during repeated trials, they usually take the average of the parameters under observation and go for the statistical analysis of their pooled data.
Respected Dr. R. Hadria
Thank you for your suggestions.
But Principal Component Analysis is somewhat similar to factor analysis (a data reduction technique), where a large number of variables with too many observations, which are hard to study and interpret properly, are reduced to a few variables in a more meaningful form. Even the extraction method used by SPSS in factor analysis is Principal Component Analysis.
It is generally applied in Genetic and plant breeding studies where experiment is conducted to test a number of traits/characters from wider gene pools.
I was concerned about the statistical analysis of the simple growth, yield and phenological observations that are generally taken by agro-meteorologists and agronomists during their repeated trials over the years.
Question
I have performed an experiment where treatment effects are represented by a - sign for a decrease in mean brood area in the honey bee colony and a + sign for an increase in brood area. I tried to perform the arcsine transformation, but it didn't work.
Use binary logistic regression for percentage data; no transformation is required in such cases.
Question
Hello,
I recently analysed count data (in this case the number of stems per tree) using a GLM. The analysis was carried out on both log-transformed and square-root-transformed data. Although most report that the square root transformation should be used on count data, I got a model with a higher R² and lower error values with the log-transformed data. Which would you recommend I report?
Regards
Glen
How do I apply a square root transformation in the Statistix 8.1 software?
Question
I have a short question regarding data analysis. I have 2 data sets (two data frames). The first one is a signal which is yes/no/maybe, and the second one is the response to this signal, in percentage. The percentage increases after each yes and maybe (like rainfall and plant disease level). The signal response (second data set) is delayed in time. I need to find the relationship between these two data sets (signal and response). What is the best method to do this in R?
PS. I tried cross-correlation, but the results are not clear.
Thank you all
Example
Date Signal Response
01.02.2018 / no (as 0) / 0%
01.03.2018 / yes (as 2) / 0%
01.04.2018 / no (as 0) / 0%
01.05.2018 / maybe (as 1) / 2%
etc....
Maybe just calculate the Pearson or Spearman correlation:
signal response
0 0
2 0
0 0
1 2
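Since the response is delayed, one simple extension of the plain correlation is to shift the response by a few candidate lags and see which lag gives the strongest rank correlation. A sketch in Python with made-up data where the response trails the signal by one step (the R equivalent would use cor(..., method = "spearman") on shifted vectors, or ccf()):

```python
import numpy as np
from scipy.stats import spearmanr

signal = np.array([0, 2, 0, 1, 0, 2, 0])      # 0 = no, 1 = maybe, 2 = yes
response = np.array([0, 0, 2, 0, 1, 0, 2])    # hypothetical: lags signal by one step

# Rank correlation of the signal against the response shifted by each candidate lag
for lag in range(3):
    rho, p = spearmanr(signal[:len(signal) - lag], response[lag:])
    print(lag, rho)
```

For this toy series the correlation peaks at lag 1, recovering the built-in delay; on real data the lag with the strongest correlation suggests the response time.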
Question
Hello,
I am performing research to check the impact of different storage bags on the damage parameters of maize due to the red flour beetle at different storage times.
Treatments: 8 storage bags (SB)
Infestation time: 0, 15, 30, 45, 60, 75, 90, 105, 120, 135 and 150 days
Parameters: Percent grain damage (GD%), Final adult density (FAD), Percent weight loss (PWL) and Weight of insect feeding residues (WIFR).
How should I statistically analyse the data from this experiment?
Data of one parameter is also attached with this question.
Thanks
Yes, your ANOVA is incorrect; you have a two-way repeated measures design instead. You can run this in SPSS. Details for running it in SPSS can be found here:
Best wishes, David Booth
Question
I want to improve my knowledge in this area. I would like indications of good books that you have studied.
Hello
1. Richard Johnson and Dean Wichern (2007). Applied Multivariate Statistical Analysis.
2. Joseph Hair, William Black, Barry Babin, Rolph Anderson and Ronald Tatham (2006). Multivariate Data Analysis.
3. Norman R. Draper, Harry Smith (1998). Applied Regression Analysis, 3rd Edition, Wiley.
Regards,
Zuhair
Question
Does PCA have to be done to eliminate items?
Dear Shirelle
PCA transforms a number of correlated variables into a smaller number of uncorrelated variables.
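As a small illustration of that statement (simulated data, numpy only): three variables driven by one common factor collapse to essentially a single principal component, computed here via SVD of the centered data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three correlated variables built from one underlying factor plus small noise
factor = rng.normal(size=(100, 1))
X = np.hstack([factor + rng.normal(scale=0.1, size=(100, 1)) for _ in range(3)])

# PCA via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(explained)  # the first component carries almost all the variance
```

Whether to drop items based on PCA loadings is a separate judgment call; PCA itself only re-expresses the data in uncorrelated components.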
Question
One-way ANOVA requires n=30. However, with two hormones (BAP and 2-iP) tested at different concentrations (0, 1, 2, 3, and 4 mg/L), in order to achieve 30 samples or replicates we can proceed in several ways:
A) Do ALL concentrations of TWO hormones with 30 samples at one time,
B) Do ALL concentrations of TWO hormones with ONLY 3 samples but repeated 10 times,
C) Do ALL concentrations of TWO hormones with ONLY 5 samples but repeated 6 times,
D) Do ALL concentrations of TWO hormones with ONLY 10 samples but repeated 3 times,
E) Do ONE of the concentrations of ONE hormone first for 30 samples, experiment is repeated by changing the concentrations or the type of hormone used.
May I know which method is the most suitable? Why?
I was told that, by repeating the experiment more times, I can no longer use one-way ANOVA even if the assumptions are fulfilled (n=30, normality, and equal variance among the populations), and that by increasing the replicates I would have to use a more complicated analysis like two-way ANOVA or multivariate analysis. Can any experts help me with this as well?
The study objectives, hypothesis (or research questions) as well as the study design should be clearly stated to be able to answer your question
Question
Dear all,
do you know if:
1 - can I run an RDA with negative (taxa) values (as delta Control - Treatment)?
2 - Do I have to use the function decostand function on these delta values before performing the RDA?
3 - Shall I use the Bray-Curtis distance (dist="bray") in the RDA function?
Best
Alessandro
I have no idea yet. Good question...
Question
I am working on the evaluation of some tomato cultivars. I am wondering if you can give some suggestions on how genotypic and phenotypic correlation coefficient calculation and path coefficient analysis work in a statistical program. It would be really nice if you could suggest a statistical program that is convenient for this calculation and analysis.
Dear Chea,
As per my knowledge, no software gives better results than Indostat; it is better you go with Indostat.
Question
Good day,
I need to evaluate a total of 26 rice varieties in terms of their morpho-agronomic characteristics and yield components.
Among the 26 varieties, 18 are irrigated lowland varieties and 8 are upland varieties.
In RCBD, similar experimental units are grouped into blocks/replicates to control variation in an experiment (spatial effects of glasshouse).
In my case, the size of the pot, the soil volume, the number of plants per pot and the replicates are the same. However, the soil type and fertilizer application are different for irrigated lowland and upland.
Therefore, I have few questions to ask to clear my doubt.
1. Is this experiment still consider as a single factor (genetic material) experiment?
2. Can we group all the 26 rice varieties (including lowland and upland) in a block?
3. Or two blocks are needed for lowland and upland, respectively? (Lowland block) (Upland block)
4. If there are two blocks, would it affect the statistical analysis? - Analysis of variance (ANOVA) and Duncan’s multiple range test (DMRT) -Genotypic and phenotypic correlation coefficients
In my 18 lowland rice varieties, I have 7 which are high yield and 11 which are low yield. My research questions are:
1. Do the morpho-agronomic characteristics and yield components of high-yielding lowland rice accessions differ from those of low-yielding lowland rice accessions? (High-yield lowland vs low-yield lowland)
2. Do the morpho-agronomic characteristics and yield components of high-yielding lowland rice accessions differ from those of upland rice accessions? (High-yield lowland vs upland)
3. Are the morpho-agronomic characteristics and yield components of low-yielding lowland rice accessions similar to those of the upland rice accessions? (Low-yield lowland vs upland)
Can I still compare the results obtained from different blocks if these are the questions that I would like to address in my study?
Thank you
I think no
Question
To study the effect of fibrolytic enzymes on straw degradability, I have as variables enzyme mixtures (n=2), straw types (n=4), and enzyme levels (n=4).
The enzyme levels are not the same for both enzyme mixtures.
You are welcome.
Question
A field study of 4 levels of biochar (Factor A) and 4 levels of nitrogen fertilizer (Factor B), giving 16 treatment combinations, is underway. The field study was set up in an RCBD and replicated thrice; however, I have a problem in selecting the statistical model to use for data analysis.
I intend to use the general linear model (GLM), but I am not sure whether this model is the best fit for the analysis. I need suggestions and guidance.
Hello Dr. Segun,
You can apply two models in MSTAT-C to analyse this experiment:
1. Two Factor Randomized Complete Block Design (RCBD 2 Factor - Model a) ... With one error + Replications.
2. Randomized Complete Block Design for factor A with factor B a split plot on A (RCBD 2 Factor - Model b) ...With two errors + Replications.
I prefer model (b) for your experiment. Kindly find the attached photos of the layout and ANOVA for models (a) and (b).
Best wishes
Nehal
Question