Available online at www.isas.org.in/jisas
JOURNAL OF THE INDIAN SOCIETY OF
AGRICULTURAL STATISTICS 64(2) 2010 313-321
Modeling Unstructured Heterogeneity along with Spatially Correlated
Errors in Field Trials
M. Singh1*, Y.P. Chaubey1, A. Sarker2 and D. Sen1
1Department of Mathematics and Statistics,
Concordia University, Montreal, Quebec, Canada
2International Center for Agricultural Research in the
Dry Areas (ICARDA), P.O. Box 5466, Aleppo, Syria
Received 11 August 2009; Revised 21 July 2010; Accepted 23 July 2010
SUMMARY
In this paper we consider the analysis of two experimental data sets for evaluating lentil genotypes. One of these data sets comes from an incomplete block design and the other from a complete block design. Incomplete blocks contribute to the reduction of experimental error, and spatially correlated plot errors can be modeled using an autoregressive scheme that may lead to further improvement in the assessment of the genotypes. Such an approach has been applied in several other studies to model linear trends and spatially correlated errors. However, the assumption of a constant error variance restricts the scope of the analysis in many agricultural field trials, and in other situations in general, where heterogeneity of error variances is a reality. In this study, we approach the problem by first fitting a model with constant error variance and generating the residuals. We then apply K-means clustering to the squared residuals to group the experimental units with similar squared residuals. Next, we allow the error variances to vary with the group of experimental units, without requiring any spatial restrictions on the error variances. The number of heterogeneous error variances and the experimental units belonging to the heterogeneous clusters are obtained using the AIC values, followed by a group-merging scheme based on insignificant changes in the residual maximum log-likelihood values. The final models with heterogeneous variances were used to evaluate the precision of the comparisons of genotype means. We found a substantial improvement in the efficiency of the pair-wise comparisons over the other methods of analysis. We recommend the application of this procedure in any general situation permitting unstructured heterogeneity.
Keywords: Heterogeneous error variances, Spatially correlated errors, Variogram, Clustering, Field trials.
1. INTRODUCTION
Control of field variability is normally achieved by applying blocking methods, where experience with the obvious landscape configuration guides the formation of the blocks for assigning the treatments, such as genotypes of a crop, to the field plots, i.e. the experimental units. Furthermore, the design may consist of complete blocks or incomplete blocks allowing a certain degree of balance under a constant error variance model. Such approaches are discussed in standard texts on design and analysis of experiments (see, for example, Fisher 1990, Cochran and Cox 1992, Cox and Reid 2000 and Hinkelmann and Kempthorne 2007). In the context of field experiments, the experimental units on a rectangular layout would generally be correlated due to their fixed physical proximity and, in addition, local fertility trends might be present. Analysis approaches for these situations have been developed to account for blocking effects and correlated errors in space and time (see Gilmour et al. 1997, Cullis and Gleeson 1991 and Grondona et al. 1996). Various criteria, such as the Akaike information criterion (AIC), have been used for the selection of appropriate covariance models in these areas (see Wolfinger 1996 and Singh et al. 2003).
*Corresponding author: M. Singh
E-mail address: smurari@mathstat.concordia.ca
314 M. Singh et al. / Journal of the Indian Society of Agricultural Statistics 64(2) 2010 313-321
The approaches used to capture spatial variability in field trials have been found useful in enhancing the breeding efficiency of crop variety improvement programs (Sarker et al. 2001, Malhotra et al. 2004). The underlying models in most of these analyses have assumed homogeneous error variances across all the plots of the layout. We believe that, in reality, experimental errors need not be homoscedastic even after accounting for various local fertility trends and/or autocorrelations across various directions in the layout. This may be due to a variety of reasons. In field trials, lack of homogeneity may be attributed to ineffective cover cropping in the preceding season, or to the farmers' fields used for experimentation having received crop management inputs only in the plots or sections of the field where they were needed. In a well-designed blocked experiment, the uniform application of the management practices over the whole of a block might have been overlooked or ignored. It is also possible that the prevalence or distribution of underground parasites, such as Orobanche in legume fields or Striga in sorghum fields, follows an irregular pattern and makes the nearest-neighbor adjustment unreliable (Wilkinson et al. 1983). Therefore, it is essential to allow for heterogeneous error variances in field trials in addition to accounting for the other factors. The heterogeneous error variances need not follow any spatial structure on the field layout. The general objective of this study, therefore, is to address the unstructured heterogeneity of error variances in the evaluation of variety trials and to apply the approach to lentil data.
The identification of the sources and the structure of heterogeneity is based on residuals from the fitted model found most suitable when the heterogeneity was ignored. The squared residuals were used to form clusters or groups of homoscedastic experimental units and to identify the structure of homogeneity, if any, using an empirical or non-parametric approach. The use of squared residuals for studying the homogeneity of variances has also been found robust to departures from normality (Levene 1960). Since no clear structure is expected in the residuals, a non-hierarchical approach such as K-means clustering can be applied to obtain the prevailing clusters of homoscedastic units. Other methods of clustering could also be used (Everitt et al. 2001). The most appropriate number of clusters can be determined from the trend of the criterion values and the change in the log-likelihood value for the heterogeneous models. This study uses data from two lentil trials with relatively high coefficients of variation, which are described in Section 2. The statistical methods for identifying the structure of heterogeneous errors are given in Section 3, computational details appear in Section 4 and results are summarized in Section 5.
2. EXPERIMENTAL DATA
Two trials consisting of genetic materials for a preliminary yield trial (PYT) and an advanced yield trial (AYT) were evaluated in block designs at an experimental station of the International Center for Agricultural Research in the Dry Areas (ICARDA) at Breda in northern Syria. Data on seed yield were examined. Trial 1, a PYT, had 25 genotypes and was evaluated in a square lattice with 4 replications on a 4 × 25 rectangular layout in 2005. In field trials, the coefficient of variation (CV) is normally used as an indicator or measure of experimental error variability. The analysis as a randomized complete block design resulted in a CV of 51% for seed yield. Trial 2, an AYT, was conducted in randomized complete blocks with 30 genotypes and 3 replications on a 3 × 30 layout in 2003 and gave a CV of 41% for seed yield. In the PYT, the plot size was 4 m × 1.5 m, and in the AYT it was 4 m × 3 m, with a standard row-to-row distance of 30 cm for the lentil crop. However, at maturity, the actual harvest area per plot was 4.5 m² and 9 m² for the PYT and the AYT, respectively. The analysis was based on the net harvested area per plot.
3. MODELING HETEROGENEITY OF
SPATIALLY CORRELATED ERRORS
The two data sets were first analyzed by fitting the spatial models described in Singh et al. (2003) to select the AIC-best model out of the group of models generated by various combinations of complete or incomplete blocks; fixed linear trend, random cubic spline or no trend; and first-order autocorrelated errors along rows and columns or independent errors. In the two trials, the best model for seed yield was found to be randomized complete blocks with first-order autoregressive errors along rows (that is, across the columns within a row). At this stage, each model was based on the assumption of homogeneous error variances. In order to examine any possible indication of heterogeneity, the residuals obtained from the fitted models above can be plotted and their variograms can be examined as well
(see Sarker et al. 2001 for details on obtaining the variograms). Figs. 1–4 exhibit 3D plots of the residuals and their variograms for the two trials. We noticed no clear spatial patterns in the residuals (Figs. 1 and 2). This can be expected, since we screened various models accounting for the presence of linear trends in the field layout, and the residuals are computed from the best models obtained using Singh et al. (2003). Another way to explore the variability is in terms of the variograms, which indicate the presence of different levels of variability between the residuals over the layouts. For instance, in Fig. 3, the variogram of the residuals in the PYT (2005) indicates variation in the variances of the plot residuals: 0.40–0.65 for plots within 2 plot-units, fluctuating values within 0.4–0.6 for distances of 3–22 units, and variation from 0.2 to 0.6 for plots separated by more than 22 plot-units. There is no clear spatial pattern to allow modelling of the variogram, which takes different values for nearly the same distances. In Fig. 4 (AYT, 2003), the variogram indicates different levels of variances: less than 0.3 for distances within 5 plot-units, fairly constant between 0.3 and 0.4 for distances of 5 to 23 plot-units, while a higher value of nearly 0.48 and low values close to 0.2 are observed for distances exceeding 23 plot-units. Here also, there is no clear spatial pattern for distances of more than 23 units. Thus, these cases support the need to examine non-spatial or unstructured heterogeneity in the plot error variances.

Fig. 1. 3D plot of the residuals from RCB-AR model analysis of seed yields in the preliminary yield trials (2005) in 25 genotypes (RCB-AR model: The model incorporates random replication effects and first-order autoregressive plot-errors across columns)

Fig. 2. 3D plot of the residuals from RCB-AR model analysis of seed yields in the advanced yield trials (2003) in 30 genotypes (RCB-AR model: The model incorporates random replication effects and first-order autoregressive plot-errors across columns)

Fig. 3. Variogram of the residuals from RCB-AR model analysis of seed yields in the preliminary yield trials (2005) in 25 genotypes (RCB-AR model: The model incorporates random replication effects and first-order autoregressive plot-errors across columns)
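The residual variograms discussed above can be illustrated with a minimal Python sketch. It is not the procedure of Sarker et al. (2001); it assumes the classical empirical semivariance, half the average squared difference between residuals of plots a given distance apart, with distances measured in plot-units and rounded to the nearest integer. The function name and the rounding rule are assumptions made for this illustration.

```python
from collections import defaultdict
from math import hypot


def empirical_variogram(resid):
    """Empirical semivariance of residuals laid out on a rectangular grid.

    resid: list of rows, each a list of residuals (one per plot).
    Returns {lag: semivariance}, where lag is the Euclidean distance between
    two plots in plot-units, rounded to the nearest integer (an assumption
    of this sketch).
    """
    cells = [(r, c, v) for r, row in enumerate(resid) for c, v in enumerate(row)]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(cells)):
        r1, c1, v1 = cells[i]
        for j in range(i + 1, len(cells)):
            r2, c2, v2 = cells[j]
            lag = round(hypot(r1 - r2, c1 - c2))
            sums[lag] += (v1 - v2) ** 2
            counts[lag] += 1
    # Semivariance: half the mean squared difference at each lag.
    return {lag: sums[lag] / (2 * counts[lag]) for lag in sorted(sums)}


# Toy residuals on a 1 x 4 layout (illustrative values, not trial data).
toy_variogram = empirical_variogram([[0.0, 1.0, 0.0, 1.0]])
```

A variogram that neither rises smoothly nor levels off with distance, as described for Figs. 3 and 4, is what motivates looking for heterogeneity that is not tied to plot position.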
In addition to the visual approach of exploring heterogeneity in the above data, we also applied the method presented in Chaubey (1981) for detecting the presence of heterogeneity of variances in the data. The residuals were ordered by their absolute values. Variances were computed from these ordered residuals for (a) two groups formed from the lowest and highest 50% of the residuals, and (b) three groups, using the lowest and highest 33% of the residuals. The F-test was used with the residual degrees of freedom allotted equally among the groups. As can be seen in Table 1, there is an indication of the presence of heterogeneous error variances in the data.
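The grouping of ordered residuals described above can be sketched as follows. This is a simplified illustration and not the full procedure of Chaubey (1981): it assumes the residuals have mean zero, so each group variance is taken as the mean square of that group, and it returns only the ratio of the highest to the lowest group variance. The P-value would then be read from an F distribution with the residual degrees of freedom allotted equally among the groups, as in Table 1. The function names are hypothetical.

```python
def ordered_group_variances(residuals, n_groups):
    """Order residuals by absolute value, split them into n_groups
    consecutive groups and return the mean square of each group."""
    ordered = sorted(residuals, key=abs)
    size = len(ordered) // n_groups
    groups = [ordered[g * size:(g + 1) * size] for g in range(n_groups - 1)]
    groups.append(ordered[(n_groups - 1) * size:])  # last group takes any remainder
    # Residuals are treated as having mean zero, so the mean square is used.
    return [sum(r * r for r in g) / len(g) for g in groups]


def variance_ratio(residuals, n_groups):
    """Ratio of the highest-group to the lowest-group variance."""
    variances = ordered_group_variances(residuals, n_groups)
    return variances[-1] / variances[0]


# Toy residuals (illustrative values, not trial data).
toy_ratio = variance_ratio([0.1, -0.2, 0.3, -0.4, 2.0, -3.0], n_groups=2)
```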
In the presence of heterogeneity, the next question is to identify the groups of experimental units with heterogeneous error variances. For this purpose, we use the following two-step procedure.
Step-1: Formation of clusters: Based on the best model selected using Singh et al. (2003), we first applied K-means clustering to its squared residuals using the criterion which maximizes the between-group sum of squares. The number of groups, set a priori, varied over K = 2, ..., 10. The change in the criterion value was noted for successive values of K, the number of groups or clusters of the experimental units. The value of K for which the change was not substantial was considered as the potential number of clusters. For each set of clusters of the experimental units, we modeled the data using the spatial errors as per the best model and a random factor where the error variances were allowed to vary with the cluster of units obtained for the chosen value of K. For example, if K = 3, there were three error variances, $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$. For such a fitted model, we computed the likelihood in terms of the -2ln(REML) value (REML: residual maximum likelihood) and the successive change in its value with a unit increase in K. At this stage, the groupings obtained for increasing numbers of groups are unlikely to be nested; therefore, we cannot apply a test of significance (such as chi-square) to the decrease in -2ln(REML). However, we can use the Akaike information criterion (AIC) to decide on the number of groups; a smaller AIC is better. We used Genstat (Payne et al. 2009) for the computation, which produces a quantity called deviance that is equal to -2ln(REML), ignoring a constant which depends on the fixed-effect terms. We used the quantity AICD, which expresses AIC in terms of the deviance: AICD = deviance + 2q, where q is the number of covariance parameters (Singh et al. 2003).

Fig. 4. Variogram of the residuals from RCB-AR model analysis of seed yields in the advanced yield trials (2003) in 30 genotypes (RCB-AR model: The model incorporates random replication effects and first-order autoregressive plot-errors across columns)

Table 1. Preliminary indication of heterogeneity of error variances using approximate F-tests in the data on seed yields of the two trials at Breda, Syria

(a) Trial 1: Preliminary yield trial, 2005
Two groups:     $s_1^2$ = 10.01   $s_2^2$ = 20.26   $F_{36,36} = s_2^2/s_1^2$ = 2.02   P-value = 0.0204
Three groups:   $s_1^2$ = 10.00   $s_3^2$ = 30.23   $F_{24,24} = s_3^2/s_1^2$ = 3.02   P-value = 0.0046

(b) Trial 2: Advanced yield trial, 2003
Two groups:     $s_1^2$ = 10.00   $s_2^2$ = 20.21   $F_{29,29} = s_2^2/s_1^2$ = 2.02   P-value = 0.0330
Three groups:   $s_1^2$ = 10.00   $s_3^2$ = 36.85   $F_{13,14} = s_3^2/s_1^2$ = 3.02   P-value = 0.0093

Note: F-test is based on Chaubey (1981) adapted to the fitted models.
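Step-1 can be sketched in Python under stated assumptions. The clustering below is a plain one-dimensional Lloyd's algorithm with deterministic, quantile-based starting centers; the starting rule, the function names and the toy data are assumptions of this sketch and are not taken from the paper, which does not say which K-means implementation was used. The second function simply restates AICD = deviance + 2q.

```python
def kmeans_1d(values, k, n_iter=100):
    """Cluster scalar values (e.g. squared residuals) into k groups.

    Returns (labels, centers, within_ss). Minimizing the within-group sum
    of squares is equivalent to maximizing the between-group sum of squares.
    """
    srt = sorted(values)
    n = len(srt)
    # Deterministic starting centers at evenly spaced quantiles (assumption).
    centers = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    within_ss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, centers, within_ss


def aicd(deviance, q):
    """AIC expressed in terms of the deviance; q = number of covariance parameters."""
    return deviance + 2 * q


# Toy squared residuals (illustrative values, not trial data).
toy_labels, toy_centers, toy_within_ss = kmeans_1d(
    [0.01, 0.02, 0.03, 1.0, 1.1, 4.0, 4.2], k=3)

# Deviance and q as reported for Trial 1 with K = 2 in Table 2.
aicd_trial1_k2 = aicd(48.06, 5)
```

In practice the labels returned for each K would define the levels of the grouping factor in the mixed model, and the refitted deviance for each K would be passed to `aicd` to compare the candidate numbers of groups.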
Step-2: Fusion of the clusters: Step-1 provides a number of clusters, say K, with heterogeneous error variances ($\sigma_j^2$, j = 1, ..., K). Let the deviance at this step be $D_0$. The error variance estimates were arranged in order, and we merged the two clusters whose error variance estimates were closest. The model was then fitted with the reduced number of clusters (K − 1) and the deviance was computed, say $D_1$. Since the fusion of the clusters presents a nested structure of the units, it is possible to test the hypothesis of equality of the variance components of the two merged clusters. Under equality of the variances, the difference $D_1 - D_0$ will follow a chi-square distribution with 1 degree of freedom. If the observed difference is greater than the chi-square value at the chosen level of significance, then the number of clusters K available at Step-1 is taken as final, and the estimation of the genotype effects proceeds with the K error variances. If the observed difference is smaller than the chi-square value, then the K − 1 merged clusters are considered for further analysis, repeating the process of fusing the clusters with the closest error variance estimates and evaluating the change in the deviance against the chi-square value with 1 degree of freedom.
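The fusion loop of Step-2 can be sketched as below. The model refit is represented by a hypothetical callback `refit(groups)` that returns the variance estimates and the deviance for a given grouping; in the paper this refit is done by REML in Genstat. Each merge is tested against chi-square with 1 degree of freedom, as the text describes, using the identity that the upper tail of a chi-square variate with 1 degree of freedom equals erfc(sqrt(x/2)). The example values are the Trial 2 estimates and deviances reported in Table 3(b); the 5% significance level is an assumption.

```python
from math import erfc, sqrt


def chi2_sf_1df(x):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0)) if x > 0 else 1.0


def fuse_clusters(groups, variances, deviance, refit, alpha=0.05):
    """Repeatedly merge the two clusters with the closest variance estimates
    until a merge increases the deviance significantly."""
    groups, variances = list(groups), list(variances)
    while len(groups) > 1:
        order = sorted(range(len(groups)), key=lambda i: variances[i])
        # Adjacent pair (in variance order) with the smallest gap.
        a, b = min(zip(order, order[1:]),
                   key=lambda p: variances[p[1]] - variances[p[0]])
        merged = [g for i, g in enumerate(groups) if i not in (a, b)]
        merged.append(tuple(sorted(groups[a] + groups[b])))
        new_variances, new_deviance = refit(merged)
        if chi2_sf_1df(new_deviance - deviance) < alpha:
            break  # significant increase: keep the current grouping
        groups, variances, deviance = merged, list(new_variances), new_deviance
    return groups, variances, deviance


def refit_trial2(groups):
    """Stand-in for a REML refit: looks up the values reported in Table 3(b)."""
    fits = {
        frozenset({(1, 2), (3,)}): ({(1, 2): 1.0642, (3,): 0.0961}, 3.88),
        frozenset({(1, 2, 3)}): ({(1, 2, 3): 0.501}, 43.54),
    }
    var_by_group, dev = fits[frozenset(groups)]
    return [var_by_group[g] for g in groups], dev


final_groups, final_variances, final_deviance = fuse_clusters(
    [(1,), (2,), (3,)], [1.515, 0.973, 0.0998], 5.28, refit_trial2)
```

With these inputs the loop merges groups 1 and 2 and then stops, which agrees with the two heterogeneous groups retained for Trial 2.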
4. ESTIMATION OF THE VARIANCE-
COVARIANCE PARAMETERS
We present here a general model and a computational procedure for estimation of the variance-covariance parameters. Let y = ($y_{ijk}$) be the vector of responses (yields), where $y_{ijk}$ is the response from the plot receiving the i-th genotype (treatment) in the k-th incomplete block of the j-th replication of the design used. The vector y can equivalently be denoted by y = ($y_{RC}$), where R, C denote the row and column coordinates of the plot associated with the indices i, j, k. The model for $y_{ijk}$ is given by:
$y_{ijk} = \mu + \pi_j + \beta_{jk} + \tau_i + \varepsilon_{RC}$
where $\mu$ is the general mean, $\pi_j$ is the effect of replication j, $\beta_{jk}$ is the effect of block k in replication j, $\tau_i$ is the effect of treatment i, and the $\varepsilon_{RC}$ are random errors with an auto-covariance structure along/across rows/columns. Let N be the number of experimental units. The N errors, presented as the vector $\varepsilon = (\varepsilon_{RC})$, may have heterogeneous variances $\sigma_l^2$ (l = 1, ..., K), where K is the number of clusters of the N experimental units. The diagonal matrix of variances for the N errors can be written as $\sigma_d^2$, using the associated $\sigma_l^2$ for a given plot. Further, suppose that the model selection using Singh et al. (2003) resulted in autocorrelated errors across columns, with correlations expressed as $\mathrm{corr}(\varepsilon_{RC}, \varepsilon_{R'C'}) = \phi^{|C - C'|}$; then the above model can be written more compactly as:
$y = X\alpha + Z\beta + \varepsilon$
where X is the design matrix associated with the factors whose effects are assumed fixed, $\alpha$ say, consisting of the genotype effects $\tau_i$ and $\mu$, and Z is the design matrix for the factors whose effects are assumed random, $\beta$ say, consisting of the replication effects $\pi_j$ etc. The variance-covariance matrix of the plot-error vector $\varepsilon$ can be written as
$R = \sigma_d \, (\mathrm{corr}(\varepsilon_{RC}, \varepsilon_{R'C'})) \, \sigma_d$
where $\sigma_d$ is the diagonal matrix of error standard deviations corresponding to $\sigma_d^2$.
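To make the structure of R concrete, the following sketch builds it for a small layout. It assumes, as one reading of the model above, that errors in different rows are uncorrelated and that errors within a row have correlation $\phi^{|C - C'|}$; each entry is then the product of the two plot standard deviations and the correlation. The function name, argument layout and toy values are assumptions of this illustration.

```python
from math import sqrt


def error_covariance(group, sigma2, phi):
    """Build R = S * Corr * S for plots on a rectangular layout.

    group:  list of rows; group[r][c] is the cluster label of plot (r, c)
    sigma2: dict mapping cluster label -> error variance of that cluster
    phi:    first-order autocorrelation across columns within a row
    Plots are ordered row by row; different rows are assumed uncorrelated.
    """
    plots = [(r, c) for r, row in enumerate(group) for c in range(len(row))]
    sd = [sqrt(sigma2[group[r][c]]) for r, c in plots]
    R = []
    for i, (r1, c1) in enumerate(plots):
        row = []
        for j, (r2, c2) in enumerate(plots):
            corr = phi ** abs(c1 - c2) if r1 == r2 else 0.0
            row.append(sd[i] * corr * sd[j])
        R.append(row)
    return R


# Toy 1 x 3 layout with two clusters (illustrative values, not estimates).
toy_R = error_covariance([[1, 2, 1]], {1: 1.0, 2: 4.0}, phi=0.5)
```

The diagonal of the result carries the cluster variances, while the off-diagonal entries decay with column separation, so the variances are unstructured over positions even though the correlations are spatial.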
The computation of the estimates of the parameters associated with the fixed effects $\alpha$, the variance components of the factors in $\beta$ and the correlation parameter $\phi$ is available in various computing software such as GENSTAT and SAS. Generally, the matrix R has a structure of correlations and variances. In the two datasets, while the correlations between the plot errors $\varepsilon_{RC}$ have a spatial structure, the (plot) error variances do not. For example, there is neither an assumed structure in terms of $\sigma_l^2$ over the positions of the units, nor are the variances totally unstructured, as there are only K distinct variances for the N units. Let REP, GENO, ROWS and COLS stand for the replication, genotype (treatment), row and column factors and YIELD for the response variate. Let HGROUP stand for the factor with K levels representing the units with heterogeneous variances. The key Genstat directives to compute the variances, autocorrelation and standard errors are:
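"Fixed genotype effects; random replication effects plus the plot-error term"
Vcomponents [Fixed=GENO] REP+HGROUP.ROWS.COLS; constraints=positive
"Plot-error structure: diagonal variances over HGROUP, autoregressive correlation over COLS"
VStructure [Term=HGROUP.ROWS.COLS] diag, AR; Factor=HGROUP, COLS
"Fit the model to YIELD by REML"
Reml [prin=m,c,w,mean,d; workspace=50; maxcycle=150; pse=d] YIELD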
The above code produces a common $\sigma_e^2$ (error variance) and the other variances as ratios $d_l$, or $\sigma_l^2$, where $\sigma_l^2 = (d_l + 1)\,\sigma_e^2$ signifies the error variance corresponding to the l-th cluster, which varies with the level of the grouping factor HGROUP, l = 1, ..., K.
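As a small worked example of this relation, the sketch below converts the ratios $d_l$ into cluster variances using the Trial 1 estimates reported in Table 5 ($\sigma_e^2$ = 0.15; $d_1$ = 0.00, $d_2$ = 12.93, $d_3$ = 4.62). The function name is hypothetical. The results are close to the group variances 0.15, 2.12 and 0.85 listed in Table 3(a)(i); the small differences presumably reflect rounding of the reported estimates.

```python
def cluster_variances(sigma2_e, ratios):
    """Error variance of each cluster: sigma_l^2 = (d_l + 1) * sigma_e^2."""
    return [(d + 1.0) * sigma2_e for d in ratios]


# Trial 1 estimates as reported in Table 5.
trial1_variances = cluster_variances(0.15, [0.00, 12.93, 4.62])
```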
5. RESULTS & DISCUSSION
Following the test by Chaubey (1981), Table 1 gives estimates of error variances based on ordered absolute residuals for assumed two and three groups. As can be noted from the computed F-values for both data sets, there is an indication of heterogeneity in the error variances. This supports our venture to explore the heterogeneous clusters of units. Table 2 gives the distribution of experimental units with homogeneous error variances obtained using K-means clustering, together with the AICD values (AIC expressed in terms of the deviance, see Singh et al. 2003). It may be seen that the number of heterogeneous groups inferred at this step is 3 for each of the two trials. Table 3 provides the estimates of the variance components at Step-1 (i.e. when selected using the AIC criterion) and Step-2 (i.e. when the closest groups were fused and tested for the change in deviance against chi-square). For Trial 1, fusion of the two closest clusters resulted in a significant increase in deviance (P < 0.001); therefore, the three heterogeneous groups with 60, 13 and 27 units were retained in the models for the evaluation of the genotypes. For Trial 2, the three clusters obtained from Step-1 were fused into two clusters with an insignificant change in the deviance. When merged again (now into a single group), there was a significant increase in the deviance, implying the presence of only two heterogeneous groups of units.

Table 2. Clusters of experimental units with heterogeneous error variances on seed yield data in the two trials

K    Cluster sizes    Criterion value   Change in criterion value   q   Deviance   AICD
(a) Trial 1: Preliminary yield trial, 2005
1    100                                                            3   71         77
2    78, 22           10.23                                         5   48.06      58.06
3*   60, 13, 27       5.75              4.48                        6   33.76      45.71
4    5, 49, 30, 16    2.85              2.90                        7   32.02      46.02
(b) Trial 2: Advanced yield trial, 2003
1    90                                                             3   43.54      49.54
2    12, 78           7.64                                          5   17.33      27.33
3*   10, 19, 61       4.19              3.45                        6   5.82       17.28
4    2, 8, 19, 61     1.28              -2.91                       7   NC

Note: * indicates that the corresponding clusters were identified with heterogeneous error variances. q = number of covariance parameters. AICD = AIC (Akaike information criterion) expressed in terms of deviance (Singh et al. 2003).

Table 3. Number of experimental units and estimates of variance components for various groups when the heterogeneous groups were selected using AIC criterion or fused using the change in the deviance, and corresponding deviance from the fitted model for seed yield data from the two trials conducted at Breda, Syria.

(a) Trial 1: Preliminary yield trial, 2005
(i) Overall grouping: Deviance = 33.76, DF = 68
Group (l)   No. of Units ($N_l$)   $\hat{\sigma}_l^2$
1           60                     0.15
2           13                     2.12
3           27                     0.85
(ii) Groups 1 and 3 merged: Deviance = 54.3, DF = 69
Group (l)   No. of Units ($N_l$)   $\hat{\sigma}_l^2$
1           87                     0.371
2           13                     2.587
Change in deviance = 20.54, DF = 1, P-value < 0.001

(b) Trial 2: Advanced yield trial, 2003
(i) Overall grouping: Deviance = 5.28, DF = 54
Group (l)   No. of Units ($N_l$)   $\hat{\sigma}_l^2$
1           10                     1.515
2           19                     0.973
3           61                     0.0998
(ii) Groups 1 and 2 merged: Deviance = 3.88, DF = 55
Group (l)   No. of Units ($N_l$)   $\hat{\sigma}_l^2$
1           29                     1.0642
2           61                     0.0961
Change in deviance = 1.4, DF = 1, P-value = 1.00
(iii) All the groups merged: Deviance = 43.54, DF = 57
Group (l)   No. of Units ($N_l$)   $\hat{\sigma}_l^2$
1           90                     0.501
Change in deviance = 39.66, DF = 2, P-value < 0.001

Note: DF = degrees of freedom associated with the deviance (residuals).

Table 4. Position of experimental units grouped (1-3) according to heterogeneous error variances on the rectangular layouts for the two trials conducted at Breda, Syria

Trial 1: Seed yield (Preliminary yield trial, 2005)
Using three heterogeneous groups selected on AIC criterion.
        Columns
Rows    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
1       1  3  1  1  1  3  3  1  1  1  1  1  1  2  1
2       1  1  3  1  3  1  1  2  1  1  3  2  1  1  1
3       1  3  1  3  2  1  3  1  1  3  3  2  3  1  1
4       1  1  1  1  3  3  1  1  1  1  2  2  3  3  1
        Columns
Rows   16 17 18 19 20 21 22 23 24 25
1       3  1  1  3  2  1  3  1  3  3
2       1  1  1  2  1  1  2  1  1  1
3       1  3  3  1  3  1  1  1  3  1
4       3  1  2  3  2  1  1  1  1  2

Trial 2: Seed yield (Advanced yield trial, 2003)
(a) Using three heterogeneous groups selected on AIC criterion
        Columns
Rows    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
1       2  1  2  3  3  3  2  3  3  2  2  2  1  3  3
2       3  3  3  3  3  2  1  1  3  3  2  3  3  2  3
3       3  3  3  1  3  3  3  2  3  3  3  3  3  3  2
        Columns
Rows   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1       2  3  3  3  3  1  1  2  1  1  2  3  3  3  3
2       2  3  3  3  3  2  3  3  3  3  3  2  3  2  3
3       3  3  3  3  3  3  3  3  3  3  1  2  3  3  3
(b) Merged to two heterogeneous groups
        Columns
Rows    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
1       1  1  1  2  2  2  1  2  2  1  1  1  1  2  2
2       2  2  2  2  2  1  1  1  2  2  1  2  2  1  2
3       2  2  2  1  2  2  2  1  2  2  2  2  2  2  1
        Columns
Rows   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1       1  2  2  2  2  1  1  1  1  1  1  2  2  2  2
2       1  2  2  2  2  1  2  2  2  2  2  1  2  1  2
3       2  2  2  2  2  2  2  2  2  2  1  1  2  2  2
Further, the spatial distribution of the experimental plots is exhibited on the layout schema (Table 4) for the various heterogeneous groups obtained at Step-1 and/or at the final stage of the formation of heterogeneous clusters. In these two trials, nearly 60% of the units have the lowest level of error variability. The positions of the units from the other clusters are reasonably spread throughout the field layout. Using the chosen combination of autocorrelation (spatial errors) and heterogeneous variances for the errors in the model, the estimates of the various variances and autocorrelation parameters are given in Table 5. Table 5 also exhibits the P-value for equality of the genotype effects based on the Wald statistic and the average variance of the estimated difference of pair-wise genotype effects. The efficiency (%) values are given in comparison with the standard randomized complete block design model. It may be noted that the best models without heterogeneity components fail to detect statistically significant differences in genotype effects in Trial 1 (P-value 0.055), while the P-value is 0.014 for Trial 2. The introduction of heterogeneous error variances clearly shows an enhanced significance level (P-value ≤ 0.001) for genotype main effects in both cases. For the spatial models, reductions of 49% and 60% in the average variance of the difference of the genotype effects for Trials 1 and 2, respectively, relative to the corresponding models with homogeneous error variances, can be considered substantial. While the spatial models for Trials 1 and 2 are more efficient than the RCB model even without heterogeneity of error variances, incorporation of heterogeneity of error variances in the model has drastically improved the efficiency of the pairwise comparisons of the genotypes. The efficiencies were found to be 214% and 336% for Trials 1 and 2, respectively.

Table 5. Estimates of variance components, Wald test statistic and significance level, average estimated variance of pair-wise genotype comparisons and efficiency of the design-analysis models for seed yield data from the two trials conducted at Breda, Syria

(a) Trial 1: Preliminary yield trial, 2005
Variance components   Estimates                       WStat    DF   P-value   Av. var.   Eff(%)
RCB, homogeneous      $\sigma_e^2$ = 0.630 ± 0.1036   36.07    24   0.094     0.3194     100
Homogeneous           $\sigma_e^2$ = 0.64 ± 0.111     40.36    24   0.055     0.2953     108
                      $\phi$ = 0.27 ± 0.128
Heterogeneous         $\sigma_e^2$ = 0.15 ± 0.035     74.88    24   0.001     0.1495     214
                      $d_1$ = 0.00 ± 0.00
                      $d_2$ = 12.93 ± 6.78
                      $d_3$ = 4.62 ± 2.29
                      $\phi$ = 0.56 ± 0.246

(b) Trial 2: Advanced yield trial, 2003
Variance components   Estimates                       WStat    DF   P-value   Av. var.   Eff(%)
RCB, homogeneous      $\sigma_e^2$ = 0.470 ± 0.0871   43.41    29   0.096     0.3132     100
Homogeneous           $\sigma_e^2$ = 0.501 ± 0.109    60.84    29   0.014     0.2346     134
                      $\phi$ = 0.46 ± 0.116
Heterogeneous         $\sigma_e^2$ = 0.0822 ± 0.0304  194.32   29   < 0.001   0.0933     336
                      $d_1$ = 12.27 ± 6.87
                      $d_2$ = 0.198 ± 0.4151
                      $\phi$ = 0.80 ± 0.121

Note: WStat = Wald statistic for testing equality of genotype effects (assumed fixed). DF = Degrees of freedom of the genotype. Av. var. = Average variance of difference of estimated effects between a pair of genotypes. AIC = Akaike information criterion. P-value = P-value based on the Wald test. Eff(%) = Percent efficiency over RCB (randomized complete block design) model.
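The efficiency and reduction figures quoted above can be checked with a short calculation. The formulas are an assumption of this sketch, since the paper does not state them explicitly: efficiency is taken as 100 times the ratio of the RCB average variance to the average variance under the model in question, and the reduction as the relative drop from the homogeneous spatial model to the heterogeneous one. With the average variances reported in Table 5, these definitions reproduce the printed values.

```python
def efficiency_pct(av_var_rcb, av_var_model):
    """Percent efficiency of a model relative to the RCB model."""
    return 100.0 * av_var_rcb / av_var_model


def reduction_pct(av_var_reference, av_var_model):
    """Percent reduction in average variance relative to a reference model."""
    return 100.0 * (1.0 - av_var_model / av_var_reference)


# Average variances of pair-wise differences as reported in Table 5.
trial1 = {"rcb": 0.3194, "homogeneous": 0.2953, "heterogeneous": 0.1495}
trial2 = {"rcb": 0.3132, "homogeneous": 0.2346, "heterogeneous": 0.0933}

eff_trial1 = round(efficiency_pct(trial1["rcb"], trial1["heterogeneous"]))
eff_trial2 = round(efficiency_pct(trial2["rcb"], trial2["heterogeneous"]))
red_trial1 = round(reduction_pct(trial1["homogeneous"], trial1["heterogeneous"]))
red_trial2 = round(reduction_pct(trial2["homogeneous"], trial2["heterogeneous"]))
```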
The evaluation of these trials supports the need for examining the presence of heterogeneous errors in the experimental units in field trials, and shows clearly that considerable improvement can be made by identifying them and accounting for them at the analysis stage. Such an approach can easily be incorporated in most data analysis situations involving spatial, temporal or even unstructured experimental units and would, therefore, enhance the efficiency of the associated plant breeding process.
ACKNOWLEDGEMENT
The authors are thankful to Professor Sudhir Gupta for his efficient handling of the paper and to a referee for his critical reading and constructive comments. The research of the second author was partially supported by the author's Discovery Grant from the Natural Sciences and Engineering Research Council of Canada.
REFERENCES
Chaubey, Y.P. (1981). Testing the equality of variances of two
linear models. Canad. J. Statist., 9, 119-127.
Cochran, W.G. and Cox, G.M. (1992). Experimental Designs.
Wiley Classic Edition (2nd ed.), John Wiley and Sons,
New York.
Cox, D.R. and Reid, N. (2000). The Theory of the Design of
Experiments. Boca Raton, CRC Press, Florida.
Cullis, B.R. and Gleeson, A.C. (1991). Spatial analysis of
field experiments: an extension to two dimensions.
Biometrics, 47, 1449-1460.
Fisher, R.A. (1990). Statistical Methods, Experimental Design
and Scientific Inference. J.H. Bennett (Ed.), Oxford
University Press, Oxford.
Gilmour, A.R., Cullis, B.R. and Verbyla, A.P. (1997).
Accounting for natural and extraneous variation in the
analysis of field experiments. J. Ag. Biol. Environ.
Statist., 2, 269-293.
Gomez, K.A. and Gomez, A.A. (1984). Statistical Procedures
for Agricultural Research, (2nd ed). John Wiley and
Sons, New York.
Grondona, M.O., Crossa, J., Fox, P.N. and Pfeiffer, W.H.
(1996). Analysis of variety yield trials using two-
dimensional separable ARIMA processes. Biometrics,
52, 763-770.
Hinkelmann, K. and Kempthorne, O. (2007). Design and
analysis of experiments. In: Introduction to Experimental
Design. John Wiley and Sons, New York.
Malhotra, R.S., Singh, M. and Erskine, W. (2004).
Application of spatial variability models in enhancing
precision and efficiency of selection in chickpea trials.
J. Ind. Soc. Agril. Statist., 57, 71-83.
Payne, R.W., Harding, S.A., Murray, D.A., Soutar, D.M.,
Baird, D.B., Glaser, A.I., Channing, I.C., Welham, S.J.,
Gilmour, A.R., Thompson, R. and Webster, R. (2009).
The Guide to GenStat Release 12, Part 2: Statistics. VSN
International, Hemel Hempstead.
Singh, M., Malhotra, R.S., Ceccarelli, S., Sarker, A., Grando,
S. and Erskine, W. (2003). Spatial variability models to
improve dry land field trials. J. Exp. Agric., 39, 151-160.
Sarker, A., Singh, M. and Erskine, W. (2001). Efficiency of
spatial methods in yield trials in lentils (Lens culinaris ssp.
culinaris). J. Agric. Sci., 137, 427-438.
Wilkinson, G.N., Eckert, S.R., Hancock, T.W. and Mayo, O.
(1983). Nearest-neighbour (NN) analysis of field
experiments (with Discussion). J. Roy. Statist. Soc., B45,
151-211.
Levene, H. (1960). Robust tests for equality of variances. In
Contributions to Probability and Statistics: Essays in
Honor of Harold Hotelling, I. Olkin, et al. (eds.)
Stanford University Press, CA, pp. 278-292.
Everitt, B.S., Landau, S. and Leese, M. (2001). Cluster
Analysis. Edward Arnold, London.
Wolfinger, R.D. (1996). Heterogeneous variance-covariance
structures for repeated measures. J. Ag. Biol. Environ.
Statist., 1, 205-230.
... These analysis approaches were based on the assumption of homogeneous error variances. Singh et al. [14] argued that the probably more common situation is that field heterogeneity is present amongst plot errors with different (heterogeneous) error variances. These errors need not have any obvious structured spatial pattern. ...
... Here, we first describe the method used by Singh et al. [14], and then illustrate its application with an example of a Lentil-YT with 16 genotypes. ...
... An examination of the graphs (or the table of deviance) established that in all four cases, there was a presence of at least two heterogeneous variances of plot error, as detected by a statistically significant difference in the deviances, when compared with the homogeneous variance model (Table 5). Singh et al. [14] provided a systematic approach for exploring the number of heterogeneous variances. The effect of accounting for the heterogeneity of variances has demonstrated a substantial gain in efficiency of the experimental design, and the enhanced analysis method used in all four trials led to reduced SED values, and hence increased the power of genotypic discrimination. ...
Application of an experimental design based on blocking of homogeneous experimental units is an objective approach to reduce experimental error. Often, the experimental design and analysis efforts are considered to be executed satisfactory as long as the coefficient of variation is estimated to be below 10 %, say for yield, as a generally accepted guide for a well-run trial. Most of the statistical analyses of data are based on the assumption of a homogeneous variance of plot errors. In field plot experimentation, the question of heterogeneity of error variances has been addressed here. This study introspects a set of four such supposedly well-run trials in lentil and chickpea conducted in lattice designs. The presence of, in fact, heterogeneous error variances exhibits a distribution of coefficient of variation as opposed to a single index to measure heterogeneity of a field. This further suggests that it is more realistic to view the distribution of the heritability of a trait and its genetic advance due to selection, when assessing them, even in a single field. Objectives of the study were to examine the possibility of heterogeneous error variances, identify the associated experimental units and estimate the predicted means of the genotypes and gain due to selection. We show that accounting heterogeneous variances allows a significant increase in the efficiency of genotypic comparisons and the power of genotypic discrimination, higher heritability and genetic advance. Thus, it is likely to help shorten the breeding cycle for an expected genetic gain, and is in principle relevant to all experiments that use replications, involving crops or not.
... Experimental errors have been modeled with correlated error structures (Cullis and Gleeson, 1991; Singh et al., 2003) and heterogeneous error variances (Singh et al., 2010). The clusters of experimental units with error variances varying with the clusters ...
... were obtained by k-means clustering of squared residuals (Singh et al., 2010, 2012). This method of cluster formation normally leads to a high probability of Type I error where one detects heterogeneous variances even in the absence of heterogeneity of error variances between the clusters. ...
... The method used by Singh et al. (2010), which builds on a spatial model, is based on nonhierarchical clustering schemes, in which the number of groups of heterogeneous variances has to be specified a priori. This method is likely to result in a large Type I error, that is, with a probability higher than a chosen level of significance; the specified number of groups is likely to be heterogeneous even if all the plots have the same error variance. ...
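The clustering step described in these excerpts can be illustrated with a minimal sketch. It assumes plot residuals from a constant-variance fit are already available; the residual values, the function name `kmeans_1d`, and the quantile-based starting centers are illustrative assumptions, not taken from Singh et al. (2010). The number of groups `k` must be chosen in advance, which is the feature the excerpts above criticise.

```python
def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's algorithm on scalars. Returns (labels, centers)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Assumption: deterministic start, centers spread evenly over the sorted values.
    if k == 1:
        centers = [sum(ordered) / len(ordered)]
    else:
        step = (len(ordered) - 1) / (k - 1)
        centers = [ordered[round(i * step)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical plot residuals from a homogeneous-variance model (made-up numbers).
residuals = [0.1, -0.2, 0.15, 1.9, -2.1, 0.05, 2.0, -0.1]
squared = [r * r for r in residuals]

# Group plots by the size of their squared residuals; k is fixed a priori.
labels, centers = kmeans_1d(squared, k=2)
```

In this toy input the three plots with large residuals form one group and the remaining five form the other. The group labels would then index separate error variances in a refitted model. Because the groups are formed from the same residuals that are later tested, a formal test of heterogeneity on them is prone to the inflated Type I error noted in the excerpts.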
Appropriate experimental designs provide an efficient estimate of cultivar performance by allowing better control of experimental error. The standard approach assumes homogeneity of experimental error variances across plots. However, a given trial may involve genotypes having different scales of response and/or associated with plots with different error variances. This study on cereal and legume trials detected the heterogeneity of the error variances associated with genotypes or groups of genotypes. A hierarchical clustering method, using absolute plot residuals, was used to group the genotypes. The heterogeneity of the error variances was determined using the difference in the deviances resulting from models based on an assumed homogeneity and heterogeneity of the error variances and on an adjusted level of significance for the chosen threshold for cluster formation. The efficiency of pairwise genotype comparisons, based on heterogeneous error variances, ranged from 105 to 116% for the cereals and from 124 to 254% for the legumes. Variations in the estimates of heritability and genetic gain resulting from selection were substantial. Accounting for the heterogeneous error variances, on average, led to higher estimates of heritability and genetic gain in each of the crop trials examined. This study found that heterogeneous variance, varying with the groups of genotypes, was frequent. The approach presented here is recommended for the analysis of cultivar trials.
... In all these models, the plot error variances were assumed constant. Singh et al. (2010) examined heterogeneity among the error variances in addition to the spatial error models discussed by Gilmour et al. (1997) and Singh et al. (2003). Competition effects (Durban et al., 2001) and spatial variability and within-row interplot competition models (Stringer et al., 2011) have been introduced for field trials. ...
... In field trials, experimental error variability is controlled in a number of ways, including by placement of blocks, use of possible covariates and models describing spatial variability (Fisher, 1935; Papadakis, 1937; Piepho et al., 2008; Singh et al., 2010). The effectiveness of a component of error variability is specific to the field where the experiment has been laid out. ...
Lentil (Lens culinaris Medikus subsp. culinaris) is an important staple pulse and rich source of protein, especially to the economically resource-poor consumers of the developing world. Experimental and analytic technology and statistical tools are needed to enhance lentil breeding progress. One of the concerns of field experimentation is to design experiments in suitable block designs and model the data to account for any left-over trend in the field layout and for correlations in the plot errors. Elite breeding lines of lentil developed through conventional breeding methods at ICARDA were evaluated in three contrasting environments in northern Syria and Lebanon during 1999–2005. This study examines the data on seed and straw yields from 226 trials conducted in randomized complete block (RCB) and in square lattice designs. Suitable models incorporating blocking structures, linear trends and spatially correlated plot-errors were fitted to the individual datasets. The results indicated that the spatial analysis model, which accounts for the spatial pattern of the field, was better than the commonly used RCB design model. The spatial analyses gave substantial increases in precision of predicted means for the genotypes. An average efficiency of pairwise genotype means comparison over RCB was 141% for seed yield and 158% for straw yield from the trials conducted in incomplete blocks and where found superior to RCB. It also enhanced estimates of broad sense heritability on mean-basis, with an average of 72% for seed and 70% for straw yield under the superior models, compared to 62 and 55% for RCB model, respectively. The percentage genetic gain due to selection at 10% intensity was 26% for seed and 20% for straw yield based on those models, which were 2–3% higher than those from the RCB model. 
In general, it is recommended to continue the use of incomplete block designs for variety trials in lentils and use the most suitable spatial pattern for statistical analysis to assist field crop breeders to enhance precision in selection of desirable genotypes. These results are consistent with findings of a number of other variety trials in lentil.
... It also suggested the probability of existing spatial heterogeneity in an experiment seems to be higher in RCBD as compared with the incomplete design (simple lattice). Similar result was reported by Singh et al. (2010) where the incomplete design contributed to more error reduction, the spatial correlated plot error was best modeled using autoregressive scheme and showed good genotype mean adjustments in lentil than the traditional linear model. ...
... The linear models are as described in standard approaches in textbooks (Cochran and Cox, 1957; Hinkelmann and Kempthorne, 2005). Singh et al. (2010, 2012, 2013) modeled the heterogeneity of error variances in field situations. The error variances are usually assumed constant, while in reality they may vary over the layout. ...
Block designs are normally used in evaluation of crop varieties. The responses or yield data arising from designed trials in a crop variety improvement program are generally analyzed using linear mixed models under the frequentist paradigm. Such analysis ignores information on the genotypic parameters available from previous similar trials. Another approach with a relatively wider inferential framework is Bayesian, which integrates the prior information with the likelihood of current data. While the Bayesian approach has been implemented in numerous situations, stepwise presentation of its application in routine crop variety trials is not available. Illustrated with a dataset from a resolvable incomplete block design, this study provides a working tool for Bayesian analysis based on priors available from a series of crop variety trials. The posterior estimates of predicted values of mean of genotypes and precision, coefficient of variation, heritability and genetic gain due to selection were obtained. The a posteriori mean of experimental error variance, coefficient of variation and genotypic variance were lower for the Bayesian than the frequentist approach. The precision of a posteriori means was higher than that of predicted means under the frequentist approach. Accounting for incomplete blocks, rather than ignoring them, using a Bayesian approach showed a large reduction in estimates of error variance components, and large increases in heritability and genetic gain. The current a posteriori distributions also serve as updated priors for future analysis. The step-by-step procedure presented here is recommended for routine analysis of variety trials.
... Such heterogeneity in general is less likely to be modeled by transformation or the use of a link function in generalized linear/non-linear model. Some attempts have been made at ICARDA to account for unstructured heterogeneity between clusters of plots for uncorrelated/correlated plots errors (Singh et al. 2010, 2012) and between clusters of genotypes (Singh et al. 2013). These approaches include a level of similarity to form the clusters and a level of significance to control Type I error on heterogeneity of variances. ...
Food security is essential to maintain the human population and the environment. The strategies for food security must take into consideration the human population dynamics and dynamics of natural resources, plant and animal populations on which human survival depends. It is essential to look into effective options for search and conservation of crop and animal germplasm, along with necessary genetic modifications to develop new seeds and suitable embedding of the new seeds in cropping systems for high and sustainable productivity. At almost every significant stage of technological development, statistical designs to gather valid and efficient evidence (data) and methods of assessment of the evidence are needed to form decisions on the technologies. This presentation is restricted to the crop and cropping system aspects of experiments. The statistical issues related to the following would be discussed: (1) search and maintenance of diverse crop germplasm, (2) evaluation of crop germplasm, (3) cropping systems related to conservation agriculture and crop rotations, and (4) need of crop actuary. In the context of germplasm conservation two aspects of maintaining the germplasm, mini-core selection and focused identification of germplasm strategy will be discussed. The issues related to field experimentation for evaluating moderate to large number of genotypes and methods of statistical data analysis with a view to exploit genotype – environment interaction will be presented. The challenges of design and analysis of cropping systems and evaluating the data in terms of productivity and sustainability indicators will be discussed. The role of a crop actuary to mitigate riskiness due to the effect of weather uncertainty on crop production and productivity will also be discussed in brief.
Spatial variability in field trials is a reality. A proportion of this is accounted for as inter-block variability by using block (complete or incomplete) designs. A large amount of spatial variability still remains unaccounted for, however, and this may lead to erroneous conclusions. To capture this inexplicable variation (which is mainly due to intra-block variation), yield data from a series of variety yield trials, using cereals and legumes, were analysed using various spatial models. The most suitable of these, selected on the basis of the Akaike Information Criterion, were used to assess the relative performance of genotypes. Although incomplete-block designs have been found to be effective in variety trials, spatial models have added considerable value to trials with legumes and cereals. The ‘best’ spatial models gave efficiency values of over 330% in winter-sown chickpea (Cicer arietinum), 140% in lentil (Lens esculenta), and 150% in barley (Hordeum spp.) trials. Furthermore, the use of these best models resulted in a change in the ranking of genotypes (on the basis of mean yield), which resulted, therefore, in a different set of genotypes being selected for high yield. It is recommended that: (i) incomplete block designs be used in variety trials; (ii) the Akaike Information Criterion be used to select the best spatial model; and (iii) genotypes be selected after the use of this model. The selected model would account most effectively for spatial variability in the field trials, improve selection of the most desirable genotypes and, therefore, improve the efficiency of breeding programmes.
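Selecting a spatial model by the Akaike Information Criterion can be sketched as follows. The model names, log-likelihood values, and parameter counts are hypothetical placeholders, and the sketch uses the common definition AIC = -2 log L + 2p; how the cited studies count p (for example, variance parameters only under residual maximum likelihood) is not stated here and is left as an assumption.

```python
def aic(log_lik, n_params):
    """AIC = -2 * log-likelihood + 2 * number of estimated parameters."""
    return -2.0 * log_lik + 2.0 * n_params


# Hypothetical candidates: name -> (maximised log-likelihood, parameter count).
# These numbers are made up for illustration only.
candidates = {
    "RCB": (-120.4, 2),
    "RCB + AR1 along rows": (-115.1, 3),
    "RCB + AR1 x AR1": (-114.8, 4),
}

scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}

# The preferred model is the one with the smallest AIC.
best = min(scores, key=scores.get)
```

With these placeholder values the one-dimensional autoregressive model is preferred: the extra parameter of the two-dimensional model raises the log-likelihood by only 0.3, which does not offset the penalty of 2 per parameter. Log-likelihoods are comparable in this way only when the candidate models share the same fixed-effects structure under residual maximum likelihood.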
We identify three major components of spatial variation in plot errors from field experiments and extend the two-dimensional spatial procedures of Cullis and Gleeson (1991) to account for them. The components are nonstationary, large-scale (global) variation across the field, stationary variation within the trial (natural variation or local trend), and extraneous variation that is often induced by experimental procedures and is predominantly aligned with rows and columns. We present a strategy for identifying a model for the plot errors that uses a trellis plot of residuals, a perspective plot of the sample variogram and, where possible, likelihood ratio tests to identify which components are present. We demonstrate the strategy using two illustrative examples. We conclude that although there is no one model that adequately fits all field experiments, the separable autoregressive model is dominant. However, there is often additional identifiable variation present.
This article provides a unified discussion of a useful collection of heterogeneous covariance structures for repeated-measures data. The collection includes heterogeneous versions of the compound symmetry and first-order autoregressive structures, the Huynh-Feldt structure, the independent-increments structure, correlated random coefficients models, the first-order antedependence model, and a simplified factor-analytic construction. These structures significantly broaden the arsenal of covariance models available to statistical researchers for accounting for and explaining variability. The structures have a simple and interpretable parameterization which is well-suited for likelihood-based estimation, testing, and selection, and they are easily fit with commercial software. Two examples are used to illustrate the following advantages of considering this class of heterogeneous covariance structures: (1) Parameter interpretability is retained by avoiding transformations. (2) The choice between the traditional univariate and multivariate approaches, involving "parsimony means power," is generalized, thus enabling more accurate inferences on mean-model parameters. (3) Alternatives to and connections with the "empirical sandwich" estimator are provided. (4) Scientific discovery is enhanced.
The one-dimensional spatial analysis procedure proposed by Gleeson and Cullis (1987, Biometrics 43, 277-288) is extended to two dimensions using the subclass of separable lattice processes to model the errors. Residual maximum likelihood estimation of the models is described and diagnostics for testing model adequacy are derived. Results from the analysis of 24 sets of uniformity data indicate the frequent need for a two-dimensional analysis even when the plot shape is highly rectangular. These results also indicate the potential gain from using a two-dimensional spatial analysis rather than a row + column analysis. An example is presented of the analysis of a field experiment on tobacco.