
Comparing Simulated and Measured Values Using Mean Squared Deviation and Its Components

Abstract and Figures

When output (x) of a mechanistic model is compared with measurement (y), it is common practice to calculate the correlation coefficient between x and y, and to regress y on x. There are, however, problems in this approach. The assumption of the regression, that y is linearly related to x, is not guaranteed and is unnecessary for the x-y comparison. The correlation and regression coefficients are not explicitly related to other commonly used statistics [e.g., root mean squared deviation (RMSD)]. We present an approach based on the mean squared deviation (MSD = RMSD²) and show that it is better suited to the x-y comparison than regression. Mean squared deviation is the sum of three components: squared bias (SB), squared difference between standard deviations (SDSD), and lack of correlation weighted by the standard deviations (LCS). To show examples, the MSD-based analysis was applied to simulation vs. measurement comparisons in literature, and the results were compared with those from regression analysis. The analysis of MSD clearly identified the simulation vs. measurement contrasts with larger deviation than others; the correlation–regression approach tended to focus on the contrasts with lower correlation and regression line far from the equality line. It was also shown that results of the MSD-based analysis were easier to interpret than those of regression analysis. This is because the three MSD components are simply additive and all constituents of the MSD components are explicit. This approach will be useful to quantify the deviation of calculated values obtained with this model from measurements.
KOBAYASHI & SALAM: COMPARING SIMULATION AND MEASUREMENT USING MEAN SQUARED DEVIATION
MODELING
Comparing Simulated and Measured Values Using Mean Squared
Deviation and its Components
Kazuhiko Kobayashi* and Moin Us Salam
K. Kobayashi, National Institute of Agro-Environmental Sciences, 3-1-1 Kannondai, Tsukuba, Ibaraki 305-8604, Japan; M.U. Salam, Rice FACE Project, Japan Science and Technology Corp.–National Institute of Agro-Environmental Sciences, 3-1-1 Kannondai, Tsukuba, Ibaraki 305-8604, Japan. Research partly supported by Core Research for Evolutional Science and Technology (CREST) of Japan Science and Technology Corp. (JST). Received 25 Jan. 1999. *Corresponding author (clasman@niaes.affrc.go.jp).

Published in Agron. J. 92:345–352 (2000).

Abbreviations: CV, coefficient of variation; KS, experiment location in Kansas; LCS, lack of correlation weighted by the standard deviations; MD, mean of the deviations; MSD, mean squared deviation; MSV, mean squared variation; NE, experiment location in Nebraska; NY, experiment location in New York; RMSD, root mean squared deviation; RMSE, root mean squared error; SB, squared bias; $\mathrm{SD}_m$, standard deviation of the measurement; $\mathrm{SD}_s$, standard deviation of the simulation; SDSD, squared difference between standard deviations.

In simulating crop growth with a process-based model, comparison between the model output and the measurement is an important activity to test the model accuracy and locate room for further improvements. The comparison is often based on correlation between the calculated and measured values, and regression of measured on calculated values. For example, when Kiniry et al. (1997) compared measured and simulated yields of maize for 10 yr from 1983 to 1992 at nine locations in the USA, measured yields were plotted against simulated ones, the correlation coefficient was calculated, and regression lines were fitted. This has been common practice in the comparison between calculated and measured values (e.g., Teo et al., 1992; Chapman et al., 1993; and Retta et al., 1996). In this approach, the correlation is a criterion of the predictive accuracy of the model along with the requirements for the regression line (i.e., the intercept is not significantly different from zero and the slope is not significantly different from unity). Statistical testing of these requirements for the regression line has been established (Mayer et al., 1994).

The correlation–regression approach is very common in fitting an empirical model to data obtained from experiments or surveys. The model parameters are adjusted to give the best fit of the empirical model to measurement. Software packages are readily available to plot the data and fit the model. This approach is hence familiar and convenient for most scientists, and may be a reason why this approach is adopted for the comparison between calculated values and measurement when the model is mechanistic rather than empirical. As shown below, however, regression is not ideal for this type of comparison, where comparison between the calculated values and measurements rather than fitting of the model to the measurement is of concern.

Henceforth, simulated value for a growth trait is denoted as x, and measured value is denoted as y. It is
AGRONOMY JOURNAL, VOL. 92, MARCH–APRIL 2000
assumed that y is the sum of the true mean ($\mu$) and the random error ($\varepsilon$) associated with the measurement, namely

$$y = \mu + \varepsilon \qquad [1]$$

In regressing y on x, a linear relationship is assumed between x and $\mu$, namely

$$\mu = ax + b \qquad [2]$$

where a is the slope and b is the y-intercept of the regression line. Then, the null hypothesis

$$H_0: a = 1 \text{ and } b = 0 \qquad [3]$$

is tested against the alternative hypothesis (Mayer et al., 1994)

$$H_1: a \neq 1 \text{ and/or } b \neq 0 \qquad [4]$$

These hypotheses can be translated into the relationships between x (simulated value) and $\mu$ (true mean value)

$$H_0: \mu = x \qquad [5]$$

$$H_1: \mu = ax + b \text{ and } (a \neq 1 \text{ and/or } b \neq 0) \qquad [6]$$

By contrast, in the direct comparison between x and y, the null ($H_0$) and alternative ($H_1$) hypotheses are

$$H_0: \mu = x \qquad [7]$$

$$H_1: \mu \neq x \qquad [8]$$

The difference between the regression and the direct comparison is in the alternative hypotheses (Eq. [6] vs. [8]). The regression analysis assumes the linear relationship between x and $\mu$ under the alternative as well as the null hypotheses, but this assumption is not guaranteed and should not be taken for granted. If each measurement is based on replicated measurements, the variance of the error term ($\varepsilon$ of Eq. [1]) can be estimated independently from the assumption of the linear relationship (Draper and Smith, 1981, p. 33–38). The error variance is then used to test the assumption. If the linear assumption (Eq. [2]) is rejected, the linear regression is inadequate. A curvilinear relationship may be sought, but it is possible that no continuous function fits the relationship between x and y. Note, however, that the user's concern lies more in the comparison between x and y than in the functional relationship between the two. The direct comparison between x and y can always be made by testing the equality hypothesis (Eq. [7]) against the nonequality hypothesis (Eq. [8]).

A more relevant criterion for the direct comparison than regression is the deviation (d) of the model output (x) from the measurement (y), namely

$$d = x - y \qquad [9]$$

When the comparison is made for n measurements, d can be computed for each measurement, namely

$$d_i = x_i - y_i, \quad \text{for } i = 1, 2, \ldots, n \qquad [10]$$

where $x_i$ and $y_i$ are the simulated and measured values, respectively, for the i-th data. The n deviations can be summarized with statistics of the overall deviation. The most commonly used statistics among those based on the deviation is the RMSD (e.g., Jamieson et al., 1998), namely

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2} \qquad [11]$$

Another commonly used criterion is the MD, namely

$$\mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i) \qquad [12]$$

In literature, RMSD is often referred to as root mean squared error (RMSE) (Retta et al., 1996), and MD is often called bias (Retta et al., 1996; Jamieson et al., 1998). Of these two statistics, RMSD represents the mean distance between simulation and measurement; MD is the difference between the means of simulation and measurement. Root mean squared deviation and MD thus represent different aspects of the overall deviation, but the relationship between the two has not been well defined.

In literature, these deviation-based statistics are often used in conjunction with correlation and regression coefficients (Addiscott and Whitmore, 1987; Retta et al., 1996; Kiniry et al., 1997; Jamieson et al., 1998). Although these different statistics may represent somewhat different aspects of the model–measurement discrepancy, it is not clear how the different statistics relate to each other, and if these statistics cover all aspects of the discrepancy sufficiently. It is also noteworthy that, as shown before, the deviation-based statistics (e.g., RMSD) and the correlation-based statistics (e.g., the correlation coefficient) are not really consistent with each other in their assumptions.

Our objective is to present a framework for the simulation vs. measurement comparison. The framework is based on the deviation (Eq. [9] and [10]), yet includes the correlation coefficient as a constituent.

DERIVATION OF THE METHOD

The difference between the simulation and the measurement is calculated with the MSD as

$$\mathrm{MSD} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2 \qquad [13]$$

Mean squared deviation is the square of RMSD (Eq. [11]); i.e., $\mathrm{MSD} = \mathrm{RMSD}^2$. The lower the value of MSD, the closer the simulation is to the measurement. The MSD can be partitioned into two components, namely

$$\mathrm{MSD} = (\bar{x} - \bar{y})^2 + \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 \qquad [14]$$

where $\bar{x}$ and $\bar{y}$ are the means of $x_i$ and $y_i$ ($i = 1, 2, \ldots, n$), respectively. The first term of the right side of Eq. [14] represents the bias of the simulation from the measurement and is denoted as SB, namely

$$\mathrm{SB} = (\bar{x} - \bar{y})^2 \qquad [15]$$
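As a minimal sketch of Eq. [11] to [15], assuming the simulated and measured values are held in two equal-length Python lists, the overall statistics and the two-term partition of Eq. [14] might be computed as below. The function name `deviation_statistics` and the sample numbers are hypothetical and are not taken from the paper; the second term of Eq. [14] is stored under the key `MSV`, the name it is given in Eq. [16].

```python
import math


def deviation_statistics(x, y):
    """Overall deviation statistics for paired simulated (x) and measured (y) values."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    d = [xi - yi for xi, yi in zip(x, y)]      # Eq. [10]: d_i = x_i - y_i
    msd = sum(di ** 2 for di in d) / n         # Eq. [13]
    rmsd = math.sqrt(msd)                      # Eq. [11]
    md = sum(d) / n                            # Eq. [12]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sb = (x_bar - y_bar) ** 2                  # Eq. [15]
    # Second term on the right side of Eq. [14]
    msv = sum(((xi - x_bar) - (yi - y_bar)) ** 2 for xi, yi in zip(x, y)) / n
    return {"RMSD": rmsd, "MD": md, "MSD": msd, "SB": sb, "MSV": msv}


# Hypothetical data, for illustration only (not from the paper)
simulated = [4.0, 5.5, 6.1, 7.2, 8.0]
measured = [4.4, 5.0, 6.8, 7.0, 9.1]
s = deviation_statistics(simulated, measured)

# The partition of Eq. [14] and the identity SB = MD^2 hold up to rounding
assert math.isclose(s["MSD"], s["SB"] + s["MSV"], rel_tol=1e-9)
assert math.isclose(s["SB"], s["MD"] ** 2, rel_tol=1e-9)
```

The divisor is n throughout, as in Eq. [11] to [14]; using n − 1 instead would break the exact additivity checked by the assertions.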
Squared bias is the square of the MD (Eq. [12]); i.e., $\mathrm{SB} = \mathrm{MD}^2$.

The second term of the right side of Eq. [14] is the difference between the simulation and the measurement with respect to the deviation from the means (i.e., $x_i - \bar{x}$ and $y_i - \bar{y}$) and is denoted as mean squared variation (MSV), namely

$$\mathrm{MSV} = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 \qquad [16]$$

A bigger MSV indicates that the model failed to simulate the variability of the measurement around the mean. Note that these two components, SB and MSV, are orthogonal and can be addressed separately.

Mean squared variation can be further partitioned into two components as shown below. For the partitioning, standard deviation of the simulation is denoted as $\mathrm{SD}_s$, that of the measurement is denoted as $\mathrm{SD}_m$, and correlation coefficient between the simulation and measurement is denoted as r, namely

$$\mathrm{SD}_s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad [17]$$

$$\mathrm{SD}_m = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2} \qquad [18]$$

$$r = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right] \Big/ (\mathrm{SD}_m \mathrm{SD}_s) \qquad [19]$$

After some rearrangement, MSV in Eq. [16] can be rewritten as

$$\mathrm{MSV} = (\mathrm{SD}_s - \mathrm{SD}_m)^2 + 2\,\mathrm{SD}_s \mathrm{SD}_m (1 - r) \qquad [20]$$

The first term of the right side of Eq. [20], called SDSD here, is the difference in the magnitude of fluctuation between the simulation and measurement, namely

$$\mathrm{SDSD} = (\mathrm{SD}_s - \mathrm{SD}_m)^2 \qquad [21]$$

A larger SDSD indicates that the model failed to simulate the magnitude of fluctuation among the n measurements. The second term of the right side of Eq. [20] is essentially the lack of positive correlation weighted by the standard deviations, and is denoted here as LCS, namely

$$\mathrm{LCS} = 2\,\mathrm{SD}_s \mathrm{SD}_m (1 - r) \qquad [22]$$

A bigger LCS means that the model failed to simulate the pattern of the fluctuation across the n measurements.

With all the above terms combined, the MSV and MSD can be written as

$$\mathrm{MSV} = \mathrm{SDSD} + \mathrm{LCS} \qquad [23]$$

$$\mathrm{MSD} = \mathrm{SB} + \mathrm{SDSD} + \mathrm{LCS} \qquad [24]$$

where SB, SDSD, and LCS are given in Eq. [15], [21], and [22], respectively. Equation [22] indicates the role of the correlation coefficient, r, in LCS and hence in MSD. With all other components fixed, a larger r would reduce MSD and hence increase the model accuracy. However, r is only one of the components of MSD, and the other components may be more influential in determining MSD than r. For example, when the SB is the major component of MSD, maximizing r does not much improve the model accuracy.

In Eq. [24], it should be noted that SDSD (Eq. [21]) and LCS (Eq. [22]) are not entirely independent. They share the same constituents, $\mathrm{SD}_s$ (Eq. [17]) and $\mathrm{SD}_m$ (Eq. [18]). Hence a bigger $\mathrm{SD}_s$ would increase both SDSD and LCS, if $\mathrm{SD}_s > \mathrm{SD}_m$. Through some rearrangement, the relative sizes of SDSD and LCS can be evaluated (see Appendix A for details). Across a major portion of the possible range of r and the ratio ($\alpha$) of $\mathrm{SD}_s$ to $\mathrm{SD}_m$, LCS contributes more to MSD than SDSD does, although there are some combinations of r and $\alpha$ that make SDSD greater than LCS (see Appendix A).

The above components of MSD can be calculated from the coefficients of regression. As in Eq. [2], a is the slope of the regression line and b is the y-intercept. Because the least-squares line of y on x passes through ($\bar{x}$, $\bar{y}$), so that $\bar{y} = a\bar{x} + b$, and its slope is $a = r\,\mathrm{SD}_m/\mathrm{SD}_s$, the components of MSD are given as

$$\mathrm{SB} = \left[(1 - 1/a)\,\bar{y} + (b/a)\right]^2 \qquad [25]$$

$$\mathrm{SDSD} = \left[1 - (r/a)\right]^2 \mathrm{SD}_m^2 \qquad [26]$$

$$\mathrm{LCS} = 2(r/a)(1 - r)\,\mathrm{SD}_m^2 \qquad [27]$$

In the special case when a = 1 and b = 0,

$$\mathrm{SB} = 0 \qquad [28]$$

$$\mathrm{SDSD} = (1 - r)^2\,\mathrm{SD}_m^2 \qquad [29]$$

$$\mathrm{LCS} = 2r(1 - r)\,\mathrm{SD}_m^2 \qquad [30]$$

$$\mathrm{MSD} = (1 - r^2)\,\mathrm{SD}_m^2 \qquad [31]$$

Thus, comparison based on correlation becomes equivalent to the comparison based on MSD in this special case, as long as the comparison is made within the same data set (i.e., fixed $\mathrm{SD}_m$).

Examples

The MSD-based approach presented above is here applied to published results of comparison between model simulation and measurement. Note that our intention is not to reanalyze the results against the original work, but to show examples of this approach compared with the correlation–regression approach.

Example 1

The results of Kiniry et al. (1997), as mentioned before, are analyzed below for the comparison of simulated and measured maize yields across 10 yr within each of the nine USA locations. The MSD and its components were calculated from r, $\bar{x}$, $\bar{y}$, and the coefficient of variation (CV) published in their paper. Figure 1 illustrates the correlation-based comparison among the nine locations, and Fig. 2 is the MSD-based comparison. For ease of comparison between the two approaches, Fig. 1 shows the lack of fit of the regression ($1 - r^2$) on the main vertical axis on the left, and the correlation coefficient (r) on the auxiliary vertical axis on the right.
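The three-way partition of Eq. [24] can be sketched from either raw paired data or published summary statistics, the route used for Example 1. The sketch below is illustrative only: the function names, the assumption that CV is reported as a percentage of the mean, and the sample data are all hypothetical. Standard deviations use the divisor n (Eq. [17] and [18]); a published CV based on n − 1 would give slightly different values.

```python
import math


def components_from_summary(x_bar, y_bar, sd_s, sd_m, r):
    """SB, SDSD, LCS and their sum MSD from summary statistics (Eq. [15], [21], [22], [24])."""
    sb = (x_bar - y_bar) ** 2
    sdsd = (sd_s - sd_m) ** 2
    lcs = 2.0 * sd_s * sd_m * (1.0 - r)
    return {"SB": sb, "SDSD": sdsd, "LCS": lcs, "MSD": sb + sdsd + lcs}


def sd_from_cv(mean, cv_percent):
    """Standard deviation from a mean and a CV, assuming CV is given in percent."""
    return mean * cv_percent / 100.0


def summary_statistics(x, y):
    """Means, standard deviations (divisor n) and r for paired data (Eq. [17] to [19])."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sd_s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    sd_m = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    r = cov / (sd_s * sd_m)  # assumes neither series is constant
    return x_bar, y_bar, sd_s, sd_m, r


def components_from_regression(a, b, r, y_bar, sd_m):
    """Same components from the regression of y on x (slope a, intercept b), Eq. [25] to [27]."""
    sb = ((1.0 - 1.0 / a) * y_bar + b / a) ** 2
    sdsd = (1.0 - r / a) ** 2 * sd_m ** 2
    lcs = 2.0 * (r / a) * (1.0 - r) * sd_m ** 2
    return {"SB": sb, "SDSD": sdsd, "LCS": lcs, "MSD": sb + sdsd + lcs}


# Hypothetical data, for illustration only (not from the paper)
simulated = [4.0, 5.5, 6.1, 7.2, 8.0]
measured = [4.4, 5.0, 6.8, 7.0, 9.1]
x_bar, y_bar, sd_s, sd_m, r = summary_statistics(simulated, measured)
parts = components_from_summary(x_bar, y_bar, sd_s, sd_m, r)

# Check against the direct definition of MSD (Eq. [13])
msd_direct = sum((xi - yi) ** 2 for xi, yi in zip(simulated, measured)) / len(simulated)
assert math.isclose(parts["MSD"], msd_direct, rel_tol=1e-9)

# Check against the regression form, using the least-squares slope and intercept
a = r * sd_m / sd_s
b = y_bar - a * x_bar
reg = components_from_regression(a, b, r, y_bar, sd_m)
assert all(math.isclose(parts[k], reg[k], rel_tol=1e-9, abs_tol=1e-12) for k in parts)
```

Keeping `components_from_summary` separate from the raw-data step means the same function serves both cases: it can be fed values derived from data or values read from a published table, for example `sd_from_cv(mean, cv)` in place of `sd_s`.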
Fig. 1. Comparison of the correlation between the simulated and measured maize yields at nine locations in different states of the USA. The left vertical axis is the lack of fit of the regression ($1 - r^2$), and the right vertical axis is the correlation coefficient (r). Results from (A) CERES-Maize model and (B) ALMANAC model are shown for locations in the states of Minnesota (MN), New York (NY), Iowa (IA), Illinois (IL), Nebraska (NE), Missouri (MO), Kansas (KS), Louisiana (LA), and Texas (TX). Data are from Kiniry et al. (1997).

Fig. 2. Comparison of the mean squared deviation (MSD) and its components, lack of correlation weighted by the standard deviations (LCS), squared difference between standard deviations (SDSD), and squared bias (SB), for the (A) CERES-Maize model and (B) ALMANAC model for nine USA locations. Locations are in the states of Minnesota (MN), New York (NY), Iowa (IA), Illinois (IL), Nebraska (NE), Missouri (MO), Kansas (KS), Louisiana (LA), and Texas (TX). Data are from Kiniry et al. (1997).

In the correlation-based comparison (Fig. 1), the location in the state of New York (NY) shows the biggest difference between the simulation and measurement for both models. The lack of fit of the regression ($1 - r^2$) is greatest (i.e., correlation is lowest) for this location. Slope of the regression lines is significantly <1 for both models: a = 0.22 for CERES-Maize and a = 0.09 for ALMANAC (Tables 6 and 7 of Kiniry et al., 1997). These statistics show clear differences between the model outputs and measurements; hence, causes of the difference were further investigated (Kiniry et al., 1997).

By contrast, on the basis of MSD, NY is among those with smaller MSD (Fig. 2) than other locations. This is because $\mathrm{SD}_m$ is smallest (0.58) for NY among the nine locations, and $\mathrm{SD}_s$ is also small (0.59 in CERES-Maize and 0.93 in ALMANAC). With these small $\mathrm{SD}_m$ and $\mathrm{SD}_s$ values, LCS (Eq. [22]) is small and so is MSD (Fig. 2). The other MSD components, SB (Eq. [15]) and SDSD (Eq. [21]), are negligible (Fig. 2). In short, the model–measurement deviation for this location is small, because both measured and simulated yields show only small variability across the 10 yr, and the model simulated the measured yields with little bias. The low correlation may be a result only of the small year-to-year variability rather than a deviation between the model outputs and measurements.

The location in the state of Kansas (KS) is in contrast with NY. The lack of regression fit is smaller (i.e., the correlation coefficient is much larger) (0.825 in CERES-Maize and 0.714 in ALMANAC), for KS than for NY (Fig. 1). The slope of the regression line is not significantly different from 1, and the y-intercept is not significantly different from 0 for either model (Tables 6 and 7 of Kiniry et al., 1997). It therefore appears that the models fit the measurement better for this location compared with NY. Nevertheless, MSD for KS is larger than that for NY, in particular with ALMANAC (Fig. 2). This is because both $\mathrm{SD}_m$ (1.86) and $\mathrm{SD}_s$ (1.68 in CERES-Maize and 2.34 in ALMANAC) are larger for KS than for NY; hence LCS and MSD are also larger, with MSD components SB and SDSD being almost negligible. Thus, despite the relatively high correlation between the simulation and measurement, MSD is larger for this location than for NY. The above contrast between KS and NY indicates that the overall deviation is not only dependent on the correlation, but also on the variability of the measurement and simulation.

In most locations, LCS is the major component of MSD (Fig. 2), but there are exceptions. The location in Nebraska (NE) shows clear distinction between the two models. For CERES-Maize, MSD is smallest for NE among the nine locations, but second largest for ALMANAC (Fig. 2). This is partly because of the larger $\mathrm{SD}_s$ (2.27) in ALMANAC than $\mathrm{SD}_m$ (1.14), and hence the large SDSD (Eq. [21]). This indicates that, for NE, ALMANAC is overly sensitive to the environmental fluctuations responsible for the year-to-year variability of the maize yield.

Thus, the MSD-based comparison enables the user to locate the simulation vs. measurement contrasts that have larger deviations than others, and to further analyze causes of the large deviations. By contrast, the correlation-based approach tends to focus on the low correlation and the deviation of the regression line from the equality line rather than the deviation of the model outputs from the measurement. Kiniry et al. (1997) calculated RMSD (Eq. [11]) and MD (Eq. [12]) in addition to the correlation and regression coefficients. Although they used those deviation-based statistics only to say that the models simulated the measurements reasonably well, they could have used RMSD to evaluate the deviation of the simulated values from the measurements.
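The KS and NE contrasts in Example 1 can be retraced with the standard deviations and correlation coefficients quoted in the text, applying Eq. [21] and [22] directly. The sketch below uses only those quoted values; the helper names are hypothetical. SB is left out because the location means are not quoted here, and LCS is computed for KS only because r is quoted for KS alone.

```python
def sdsd(sd_s, sd_m):
    """Squared difference between standard deviations, Eq. [21]."""
    return (sd_s - sd_m) ** 2


def lcs(sd_s, sd_m, r):
    """Lack of correlation weighted by the standard deviations, Eq. [22]."""
    return 2.0 * sd_s * sd_m * (1.0 - r)


# Kansas (KS): SD_m = 1.86; SD_s and r as quoted in the text for each model
ks_sd_m = 1.86
ks = {"CERES-Maize": {"sd_s": 1.68, "r": 0.825},
      "ALMANAC": {"sd_s": 2.34, "r": 0.714}}

ks_results = {
    model: {"SDSD": sdsd(v["sd_s"], ks_sd_m), "LCS": lcs(v["sd_s"], ks_sd_m, v["r"])}
    for model, v in ks.items()
}

# Nebraska (NE), ALMANAC: SD_s = 2.27 against SD_m = 1.14
ne_almanac_sdsd = sdsd(2.27, 1.14)

for model, res in ks_results.items():
    print(f"KS {model}: SDSD = {res['SDSD']:.3f}, LCS = {res['LCS']:.3f}")
print(f"NE ALMANAC: SDSD = {ne_almanac_sdsd:.3f}")
```

With these inputs LCS comes to roughly 1.09 for CERES-Maize and 2.49 for ALMANAC at KS, against SDSD of about 0.03 and 0.23, which agrees with the statement that LCS dominates there. For NE with ALMANAC, SDSD alone is about 1.28, in squared yield units.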
Example 2

Jamieson et al. (1998) compared outputs from five different models of wheat growth with measurements under different irrigation regimes in a field experiment in New Zealand. Their published values of aboveground biomass and grain weights at the end of the season are used in the comparison between the simulation and measurement below. Possibly because of the rounding error in the published values, the correlation coefficient r and other statistics calculated here might be somewhat different from those published (Jamieson et al., 1998). Note, however, that our purpose in using these data is not to reanalyze the published results, but to give an example of MSD-based analysis compared with regression analysis of the simulation vs. measurement comparison.

Aboveground biomass dry weight has a large SB component (Eq. [15]), which is the major component of MSD for all the models except Sirius (Fig. 3). This is especially true for the model AFRCWHEAT2, which shows the largest MSD among the models. The mean aboveground biomass estimated with this model is only 14.2 t ha$^{-1}$; the mean measured value is 20.1 t ha$^{-1}$.

Fig. 3. Mean squared deviation (MSD) and its components, lack of correlation weighted by the standard deviations (LCS), squared difference between standard deviations (SDSD), and squared bias (SB), in a comparison of five wheat models in simulating biomass dry weight under different irrigation regimes in a field experiment in New Zealand. Data are from Jamieson et al. (1998).

For comparison with the above results from MSD analysis, the measured aboveground biomass weight was regressed on the simulated weight using the REG procedure of the SAS/STAT software (SAS Inst., 1988). The results show that the AFRCWHEAT2 model had the highest correlation coefficient (0.92) among the models, all of which are highly correlated (r ≥ 0.88) with the measurement. Nevertheless, the null hypothesis (slope = 1 and intercept = 0) (Eq. [3]) of the regression of the measurement on the simulation is rejected for AFRCWHEAT2 (P = 0.0002), CERES-Wheat (P = 0.007), SUCROS2 (P = 0.003), and SWHEAT (P = 0.019). The null hypothesis is maintained only for the Sirius model (P = 0.484). These results of regression analysis indicated that all the models except Sirius have difficulty simulating the measurements. The cause of this discrepancy becomes clearer by testing the slope and the intercept separately. For the AFRCWHEAT2 model, the intercept is >0 (P = 0.001) and the slope is <1 (P = 0.024). The intercept is also >0 (P = 0.015) with the SUCROS2 model. The intercept is also somewhat >0 (P = 0.075) with the CERES-Wheat model. The >0 intercepts suggest that these models underestimated the measurement.

The regression results are thus consistent with the results of the MSD-based analysis, but it is not clear from the regression how much of this underestimation contributed to the large RMSD (Eq. [11]). Jamieson et al. (1998) compared MD (Eq. [12]) and RMSD to show that the models' underestimation does account for the major part of RMSD. This is in effect the same as the comparison based on MSD and SB, since $\mathrm{RMSD}^2 = \mathrm{MSD}$ and $\mathrm{MD}^2 = \mathrm{SB}$ as shown earlier.

The results for grain yield (Fig. 4) are in sharp contrast to the aboveground biomass results. The comparison on grain weight indicated that SDSD (Eq. [21]) and LCS (Eq. [22]), rather than SB, are the major components of MSD. Among the models, SWHEAT shows the largest MSD, for which SDSD (Eq. [21]) is the dominant component. This is because, in this model, $\mathrm{SD}_s$ is only 0.80 t ha$^{-1}$, which is smaller than $\mathrm{SD}_m$ = 2.15 t ha$^{-1}$. It is obvious that this model is not sensitive enough to the difference among the irrigation regimes in the field experiment.

Fig. 4. Mean squared deviation (MSD) and its components, lack of correlation weighted by the standard deviations (LCS), squared difference between standard deviations (SDSD), and squared bias (SB), in a comparison of five wheat models in simulating grain yield under different irrigation regimes in a field experiment in New Zealand. Data are from Jamieson et al. (1998).
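The regression test of Eq. [3] used for comparison in this example can be sketched as a standard general-linear-hypothesis F test, in which the residual sum of squares under the equality line y = x is simply n × MSD. This is an illustrative reconstruction, not the authors' SAS code: whether it reproduces the exact output of the REG procedure is an assumption, and the function name and data are hypothetical. The P-value uses the closed form of the F distribution with 2 numerator degrees of freedom, so no statistics library is needed.

```python
import math


def joint_equality_f_test(x, y):
    """F test of H0: slope = 1 and intercept = 0 for the least-squares regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx                      # slope of y on x
    b = y_bar - a * x_bar              # y-intercept
    sse_full = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    sse_null = sum((yi - xi) ** 2 for xi, yi in zip(x, y))   # equals n * MSD
    df = n - 2
    # Two restrictions are tested jointly; assumes the fit is not perfect (sse_full > 0)
    f = ((sse_null - sse_full) / 2.0) / (sse_full / df)
    # Survival function of F(2, df): P(F > f) = (1 + 2f/df)^(-df/2)
    p = (1.0 + 2.0 * f / df) ** (-df / 2.0)
    return {"slope": a, "intercept": b, "F": f, "P": p}


# Hypothetical data in which the simulation (x) underestimates the measurement (y)
simulated = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [3.1, 3.9, 5.2, 5.8, 7.1]
result = joint_equality_f_test(simulated, measured)
print(result)
```

In this made-up case the slope is close to 1 and the intercept is clearly positive, so the test rejects equality, yet the F statistic by itself does not say how much of n × MSD is bias; the SB share of MSD gives that directly.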
By comparison, in the correlation–regression ap-
For SUCROS2, which shows the second largest MSD,
proach, multiple criteria—correlation coefficient, the
by comparison, SB and LCS, rather than SDSD, are
slope and y-intercept of the regression line, and, often,
responsible for the large MSD.
RMSD (Eq. [11]) and MD (Eq. [12])—are presented
The above analysis, based on MSD and its compo-
simultaneously (e.g., Retta et al., 1996; Kiniry et al.,
nents, gives essentially the same results as the Jamieson
1997). Analysis based on these statistics of deviation
et al. (1998) analysis, which was based on RMSD and
along with regression analysis may give results similar
comparison of the range of the simulated and measured
to those obtained by the MSD-based analysis, if the
grain yields. This makes sense since both the range and
interpretation is made carefully. It is not easy, however,
the standard deviation are measures of dispersion of
to use these multiple criteria in combination, because
data, and, hence, the comparison of the ranges should
they are not explicitly related to each other.
give results similar to those based on the comparison
Thus, for direct comparison between model output
of the standard deviations (SDSD). Note, however, that
and measurement, the MSD-based analysis is better
SDSD and MSD are explicitly related to each other in
suited than the commonly practiced correlation–
Eq. [24], whereas the range and RMSD are not.
regression analysis. In some cases, however, the user’s
The measured grain yield was regressed on simulated
concern may not lie in the direct comparison but in
grain yield (Eq. [2]) using the REG procedure of the
a different aspect of the simulation vs. measurement
SAS/STAT system (SAS Inst., 1988). The results
contrast. When variability of the simulation around the
showed that all the models were highly correlated with
mean is of more concern than deviation (Eq. [9]), then
the measurement r 0.92, but that the regression lines
MSV (Eq. [16]) should be the criterion of the compari-
for some models deviated from the equality line (y
son. Since MSV is the sum of SDSD and LCS, differ-
x). The null hypothesis (slope 1 and intercept 0)
ences in MSV can be analyzed with respect to the two
(Eq. [3]) was rejected for SWHEAT (P 0.020) and
components. Squared difference between standard devi-
SUCROS2 (P 0.050). Sirius was close to the rejection
ations and LCS can be further analyzed with their con-
(P 0.064). Investigating the slope (a) and the intercept
stituents, SD
s
,SD
m
and r. Or, if the pattern of the fluctua-
(b) (Eq. [2]) separately, it is found that a 1 with
tion is the major concern, r is the primary measure of the
SWHEAT (P 0.008) and that b 0 with SWHEAT
comparison between the simulation and measurement.
(P 0.008) and SUCROS2 (P 0.029). For SWHEAT,
Thus, MSD or a part of it can be used for comparison
the slope (2.56) being 1 suggests that the model is less
of model output and measurement depending on the
sensitive to the variability among the irrigation regimes
user’s major concern.
than the measurement. This is consistent with the results
We have not addressed the statistical test of the equal-
of the MSD-based comparison. Although the intercept
ity hypothesis (Eq. [7]) against the nonequality hypothe-
(10.73) for SWHEAT was significantly 0, this should
sis (Eq. [8]). This can be done, if one has an estimate
not be misinterpreted as indicating overestimation of
of the error variance from repeated measurements (see
the model. This negative intercept is only a result of
Appendix B). The error variance could also be esti-
the slope being much larger than unity. With the MSD-
mated with the protocol based on resampling, as pro-
based comparison, no such misinterpretation is likely,
posed by Wallach and Goffinet (1989) and applied to
since each component was distinct from others in its
simulation vs. measurement comparison by Colson et
meaning.
al. (1995).
In this example, unlike in the previous example, the
However, it must be noted that, on most occasions,
simulation vs. measurement comparisons were made
we know the null hypothesis is wrong, no matter what
within the same data set. The results are therefore similar whether the comparison is based on MSD or correlation and regression. However, the interpretation of the results is more straightforward in the MSD-based comparison than in the correlation–regression analysis.

DISCUSSION

As shown in the examples, the simulation vs. measurement comparison based on MSD is straightforward, with MSD indicating the overall deviation of the model output from the measurement, and the MSD components representing different aspects of the overall deviation. The components are simply additive (Eq. [24]); hence the user can identify the major component and investigate it further with its constituents. Standard deviations of the simulation and the measurement are useful for investigating SDSD and LCS. Correlation coefficient (r) is also important when LCS is the major contributor to MSD. For a large SB, the user would compare the mean of the simulated values $\bar{x}$ with that of the measured values $\bar{y}$.

the significance test indicates. The model does deviate from the reality because of the simplifications inherent in any simulation model. Such simplifications or omission of details are inevitable or even necessary in the modeling of a complex real system. The deviation of the model from the reality would result in the difference between the simulated and the true values. If this difference is smaller than the measurement error, the null hypothesis (Eq. [7]) is maintained. However, theoretically, we could reduce the measurement error by increasing the number of replications, and could eventually detect the difference between the simulated and the true values, and reject the null hypothesis. Therefore, the relevant question is not whether the model is right or wrong, but how much the model output differs from the measurement and why. The model’s performance can be discussed only relatively, not absolutely (Oreskes et al., 1994). The MSD-based analysis we present here would be useful to quantify the deviation of model output from measurement, and to locate possible cause(s) of the deviation.
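The additive decomposition and the significance test discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names `msd_components` and `f_statistic` are hypothetical. The standard deviations are assumed to use the 1/n (population) form, which is what makes SB + SDSD + LCS reproduce MSD exactly. The test statistic follows Eq. [B-9] of Appendix B and inherits its assumptions: each measurement is the mean of m replicated, normally distributed observations with a common error variance.

```python
import math


def msd_components(x, y):
    """Decompose the mean squared deviation of simulated x from measured y."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Assumption: population (1/n) moments, so that MSD = SB + SDSD + LCS holds exactly.
    sd_s = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_m = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    msd = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n
    sb = (mean_x - mean_y) ** 2    # squared bias
    sdsd = (sd_s - sd_m) ** 2      # squared difference between standard deviations
    # LCS = 2 SD_s SD_m (1 - r), written with the covariance so that a
    # constant series (r undefined) does not cause a division by zero.
    lcs = 2.0 * (sd_s * sd_m - cov)
    return {"MSD": msd, "SB": sb, "SDSD": sdsd, "LCS": lcs}


def f_statistic(x, z):
    """F value of Eq. [B-9]; z[i] holds the m replicates behind measurement i."""
    n = len(x)
    if n == 0 or n != len(z):
        raise ValueError("x and z must be non-empty and of equal length")
    m = len(z[0])
    if m < 2 or any(len(row) != m for row in z):
        raise ValueError("each measurement needs the same number (>= 2) of replicates")
    y = [sum(row) / m for row in z]    # each measurement is the mean of its replicates
    sse = sum((zij - yi) ** 2 for row, yi in zip(z, y) for zij in row)
    if sse == 0:
        raise ValueError("replicates show no variation; F is undefined")
    msd = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n
    f_value = msd / (sse / (n * m * (m - 1)))
    return f_value, n, n * (m - 1)     # statistic and its two degrees of freedom


if __name__ == "__main__":
    print(msd_components([2.0, 4.0, 6.0, 8.0], [2.0, 6.0, 6.0, 10.0]))
    print(f_statistic([1.0, 2.0], [[0.0, 2.0], [1.0, 5.0]]))
```

For the made-up paired values in the first call, MSD is 2 and SB is 1, so bias accounts for half of the overall deviation and SDSD and LCS share the rest. The F value returned by the second function would then be compared with a critical value such as F[0.05, n, n(m − 1)], taken from a table or a statistics library.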
ACKNOWLEDGMENTS

The authors gratefully acknowledge Dr. T. Miwa of the National Institute of Agro-Environmental Sciences, Tsukuba, Japan, and Dr. L. Thalib of Griffith University, Nathan, Australia, for their comments on the first draft of this paper. The two anonymous reviewers are also gratefully acknowledged for their comments, which helped the authors improve the manuscript.
APPENDIX A

By denoting the ratio of $SD_s$ to $SD_m$ as $\alpha$, namely

$$\alpha = SD_s / SD_m \qquad \text{[A-1]}$$

MSV (Eq. [20] and [23]) can be rewritten as

$$\begin{aligned}
\mathrm{MSV} &= \mathrm{SDSD} + \mathrm{LCS} && \text{[23]} \\
&= (SD_s - SD_m)^2 + 2\,SD_s\,SD_m\,(1 - r) && \text{[20]} \\
&= SD_m^2\,\alpha\,[(1 - \alpha)^2/\alpha + 2(1 - r)] && \text{[A-2]}
\end{aligned}$$

The ratio of SDSD to LCS is equal to the ratio of the two terms in the brackets of the right side of Eq. [A-2], namely

$$\mathrm{SDSD}/\mathrm{LCS} = [(1 - \alpha)^2/\alpha]\,/\,[2(1 - r)] \qquad \text{[A-3]}$$

Figure A-1 depicts the ratio SDSD/LCS on the $\alpha$–$r$ coordinate in the range

$$0.2 \le r \le 1 \quad \text{and} \quad 0.2 \le SD_s/SD_m \le 3.2$$

which should cover practically the whole domain of the $\alpha$–$r$ combination. Note that change of the ratio is quite nonuniform, but that the ratio is $< 1$ (i.e., SDSD $<$ LCS) for a major portion of the domain. Exceptions are the region with high correlation (e.g., $r > 0.9$) and the region with $\alpha < 0.3$ or $\alpha > 2.8$ (Fig. A-1).

Fig. A-1. Ratio of squared difference between standard deviations (SDSD) to lack of correlation weighted by the standard deviations (LCS) as a function of standard deviation of the simulation ($SD_s$)/standard deviation of the measurement ($SD_m$) and correlation coefficient ($r$). $\alpha$ indicates $SD_s/SD_m$.

APPENDIX B

It is assumed that each of the $n$ measurements ($y_i$) is the mean of $m$ replicated observations, and that each observation ($z_{ij}$) in $i$-th measurement is a sample from a normal distribution with mean $\mu_i$ and variance $\sigma^2$, namely

$$z_{ij} \sim N(\mu_i, \sigma^2) \qquad \text{[B-1]}$$

$$y_i = \sum_{j=1}^{m} z_{ij}/m, \quad i = 1, 2, \ldots, n \qquad \text{[B-2]}$$

Then $y_i$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2/m$, and the sum of squared deviation for each measurement divided by the error variance has a chi-squared distribution with $m - 1$ degrees of freedom (Mood et al., 1974, p. 240–246), namely

$$y_i \sim N(\mu_i, \sigma^2/m), \qquad \text{[B-3]}$$

and

$$\sum_{j=1}^{m} (z_{ij} - y_i)^2/\sigma^2 \sim \chi^2(m - 1) \qquad \text{[B-4]}$$

The sum of squared error (SSE) across the $n$ measurements divided by the error variance also has a chi-squared distribution with $n(m - 1)$ degrees of freedom, namely

$$\sum_{i=1}^{n} \sum_{j=1}^{m} (z_{ij} - y_i)^2/\sigma^2 = \mathrm{SSE}/\sigma^2 \sim \chi^2[n(m - 1)] \qquad \text{[B-5]}$$

On the other hand, sum of the squared deviation of $y_i$ from the true mean, $\mu_i$, divided by its variance, $\sigma^2/m$ (Eq. [B-3]), also has a chi-squared distribution of $n$ degrees of freedom, namely

$$\sum_{i=1}^{n} (\mu_i - y_i)^2/(\sigma^2/m) \sim \chi^2(n) \qquad \text{[B-6]}$$

Under the equality hypothesis (Eq. [7]) (i.e., $H_0$: $\mu_i = x_i$), Eq. [B-6] can be written as

$$\sum_{i=1}^{n} (x_i - y_i)^2/(\sigma^2/m) \sim \chi^2(n) \qquad \text{[B-7]}$$

The left side terms of both Eq. [B-5] and [B-7] have chi-square distributions and are independent from each other. The ratio of the two terms divided by their degrees of freedom is distributed as an F distribution with $n$ and $n(m - 1)$ degrees of freedom (Mood et al., 1974, p. 246–249), namely

$$\frac{\sum_{i=1}^{n} (x_i - y_i)^2/(n\sigma^2/m)}{\mathrm{SSE}/[n\sigma^2(m - 1)]} \sim F[n, n(m - 1)] \qquad \text{[B-8]}$$

The $n\sigma^2$ in both numerator and denominator is omitted, and the square sum of $x_i - y_i$ is replaced with $n\,\mathrm{MSD}$ (Eq. [13]) to yield

$$\frac{\mathrm{MSD}}{\mathrm{SSE}/[nm(m - 1)]} \sim F[n, n(m - 1)] \qquad \text{[B-9]}$$

This value is compared to the critical values {e.g., $F[0.05, n, n(m - 1)]$} to test the null hypothesis.

REFERENCES

Addiscott, T.M., and A.P. Whitmore. 1987. Computer simulation of changes in soil mineral nitrogen and crop nitrogen during autumn, winter and spring. J. Agric. Sci. (Cambridge) 109:141–157.
Chapman, S.C., G.L. Hammer, and H. Meinke. 1993. A sunflower simulation model: I. Model development. Agron. J. 85:725–735.
Colson, J., D. Wallach, A. Bouniols, J.B. Denis, and J.W. Jones. 1995. Mean squared error of yield prediction by SOYGRO. Agron. J. 87:397–402.
Draper, N., and H. Smith. 1981. Applied regression analysis. 2nd ed. John Wiley & Sons, New York.
Jamieson, P.D., J.R. Porter, J. Goudriaan, J.T. Ritchie, H. van Keulen, and W. Stol. 1998. A comparison of the models AFRCWHEAT2, CERES-Wheat, Sirius, SUCROS2 and SWHEAT with measurements from wheat grown under drought. Field Crops Res. 55:23–44.
Kiniry, J.R., J.R. Williams, R.L. Vanderlip, J.D. Atwood, D.C. Reicosky, J. Mulliken, W.J. Cox, H.J. Mascagni, S.E. Hollinger, and W.J. Wiebold. 1997. Evaluation of two models for nine U.S. locations. Agron. J. 89:421–426.
Mayer, D.G., M.A. Stuart, and A.J. Swain. 1994. Regression of real-world data on model output: An appropriate overall test of validity. Agric. Syst. 45:93–104.
Mood, A.M., F.A. Graybill, and D.C. Boes. 1974. Introduction to the theory of statistics. 3rd ed. McGraw-Hill, New York.
Oreskes, N., K. Shrader-Frechette, and K. Belitz. 1994. Verification, validation, and confirmation of numerical models in the earth sciences. Science (Washington, DC) 263:641–646.
Retta, A., R.L. Vanderlip, R.A. Higgins, and L.J. Moshier. 1996. Application of SORKAM to simulate shattercane growth using forage sorghum. Agron. J. 88:596–601.
SAS Institute. 1988. SAS/STAT user’s guide. 6.03 ed. SAS Inst., Cary, NC.
Teo, Y.H., C.A. Beyrouty, and E.E. Gbur. 1992. Evaluating a model for predicting nutrient uptake by rice during vegetative growth. Agron. J. 84:1064–1070.
Wallach, D., and B. Goffinet. 1989. Mean squared error of prediction as a criterion for evaluating and comparing system models. Ecol. Modell. 44:299–306.
... Performance of the 'B-M Model' was analyzed statistically using three approaches: (i) correlationregression approach (Kobayashi and Salam, 2000;Gauch et al., 2003) (ii) paired mean testing approach (predicted value versus observed value) (Mead et al., 2002) and (iii) a deviation approach (predicted value minus observed value) (Kobayashi and Salam, 2000). ...
... Performance of the 'B-M Model' was analyzed statistically using three approaches: (i) correlationregression approach (Kobayashi and Salam, 2000;Gauch et al., 2003) (ii) paired mean testing approach (predicted value versus observed value) (Mead et al., 2002) and (iii) a deviation approach (predicted value minus observed value) (Kobayashi and Salam, 2000). ...
... For the deviation approach, two deviation statistics were used. The first deviation statistic was the root mean squared deviation (RMSD), which is the average product of deviations for each 'data-point pair' in two datasets (Kobayashi and Salam, 2000). The second one was the mean squared deviation (MSD). ...
Conference Paper
Full-text available
Quantifying knowledge on agriculture can have many benefits to stakeholders. While many knowledge-based systems exist in modern days for farmers' decision support, specific models are lacking on how knowledge traits can impact on agricultural production systems. This study employed modelling technique, supported by field data, to provide a clear understanding and quantifying how knowledge management in production practices can contribute to rice productivity in the environmentally stressed southwest Bangladesh. This research accounted for 'Boro' rice as the target crop and 'BRRI dhan28' as the test variety. The 'B-M Model' was developed following the principle and procedure from published literature, 'brainstorming' and data from field survey. Three knowledge management trait (KMT) were defined and quantified as the inputs of the model. Those are: self-experience and observation (SEO), extension advisory services (EAS) and accessed information sources (AIS). The yield influencing process (YIP), the intermediate state variable of the model, was deduced by accounting for the two dominant agronomic practices, seedling age for transplanting and triple superphosphate (TSP) application. 'Knowledge drives farmers' practice change which in turn influences yield' was composed as the theoretical framework of the 'B-M Model'. The model performed strongly against independently collected field data set. Across the 180 farmers' data, the average relative rice yield (RRY) predicted by the model (0.705) and observed in the field (0.716) was close (root mean squired deviation (RMSD) = 0.018). The difference between predicted and observed RRY was not statistically different (LSD = 0.03), indicating the model fully captured the field data. A regression of predicted and observed RRY explained 96% variance in observation, further proving the model's strength in estimating RRY in wider range of farmers' rice yield. 
In a normative analysis, the practicality and usefulness of the model to stakeholders were simulated for understanding of how much achievable yield could be expected by changing farmers' knowledge pool (the sum of three KMT) on rice production practices, and at what combination(s) of KMT to be considered at strategic hierarchy to materialize a targeted achievable yield. To best of the knowledge, a model quantifying rice yield in relation to knowledge management trait does not exist in literature. Upon successful testing under diverse yield scenarios using multiple and sophisticated statistical tools that enhanced credibility of the model, it is concluded that the model has the potential to be used for identifying quantitative pathways of farmers' knowledge acquisition for practice change leading to improved productivity of rice in the southwest region of Bangladesh.
... Mainly, estimates that do not repeat are compared with the pairs estimates, but without repetition. The parameters for the utility of the BRCANE model applied for analysis assessment techniques are: a) deviation of the median -SB (Kobayashi & Salam, 2000), b) Square root of the variance of the error -RMSV (Kobayashi & Salam, 2000); c) "Root mean squared error"-RMSE (Fox, 1981); d) "General standard deviation" -GSD (Jørgensen et al., 1991); e) "Modeling efficiency" -EF (Greenwood et al., 1985); f) "Índex of. agreement"-d (Willmott & Wicks, 1980); g) "Mean bias error" -MBE (Addiscott & Whitmore, 1987) and h) "Coefficient of residual mass" -CRM (Loague & Green, 1991).The total dry matter production results were corrected to include the application of a linear equation for variety in Australian conditions, correlated with yield estimated by the models APSIM and CANEGRO / DSSAT and BRCANE. ...
... Mainly, estimates that do not repeat are compared with the pairs estimates, but without repetition. The parameters for the utility of the BRCANE model applied for analysis assessment techniques are: a) deviation of the median -SB (Kobayashi & Salam, 2000), b) Square root of the variance of the error -RMSV (Kobayashi & Salam, 2000); c) "Root mean squared error"-RMSE (Fox, 1981); d) "General standard deviation" -GSD (Jørgensen et al., 1991); e) "Modeling efficiency" -EF (Greenwood et al., 1985); f) "Índex of. agreement"-d (Willmott & Wicks, 1980); g) "Mean bias error" -MBE (Addiscott & Whitmore, 1987) and h) "Coefficient of residual mass" -CRM (Loague & Green, 1991).The total dry matter production results were corrected to include the application of a linear equation for variety in Australian conditions, correlated with yield estimated by the models APSIM and CANEGRO / DSSAT and BRCANE. ...
Article
Full-text available
Resumo Este estudo apresenta a construção de um modelo ecofisiológico-matemático (BrCane) para predizer a produtividade potencial-sem restrições nutricionais ou de água, a fim de analisar a sustentabilidade da expansão do cultivo de cana-de-açúcar em novas áreas para produção de etanol. A arquitetura do modelo BRCANE foi concebida para uma planta tipo C4, onde a evolução mensal da biomassa foi estimada em função da temperatura do ar e da radiação incidente. Nas simulações apresentadas a produção de biomassa levou em conta a taxa bruta de fotossíntese subtraídas as perdas para respiração de manutenção, senescência de folhas e morte de perfilhos durante o ciclo da cultura. O modelo BRCANE também foi usado para descrever o comportamento fisiológico em função das condições ambientais relacionadas ao tempo termal. A implementação de tais condições permitiu ajustar os resultados das simulações a resultados experimentais disponíveis na literatura. As estimativas de biomassa foram comparadas com dados obtidos durante o ciclo da cultura em experimentos de campo com irrigação (Cultivares RB72 454, NA 56-79, CB 41-76, CB47-355, CP51-22, Q138 e Q141) no Estado de São Paulo (Brasil) e em Bundaberg e Queensland (Austrália) e os resultados foram expressos em toneladas de colmo por hectare (Mg ha-1), por meio de uma relação linear para cada variedade (R 2 = 0,89**) e superiores aos obtidos pelos modelos APSIM (R 2 =0,78*) e CANEGRO (R 2 = 0,71*). O modelo apresentou resultados consistentes com dados experimentais para crescimento e produção de biomassa no ciclo da cultura da cana-de-açúcar, oriundo de canaviais paulistas (Brasil) e de Bundaberg (Austrália). Palavras-chave: modelo ecofisiológico, Ecologia, Fisiologia vegetal, bioenergia. Abstract A model of sugarcane was built to simulate the potential yield (without nutrition and water restrictions) for sustainability analysis of new expanded cultivation areas to ethanol or sugar production or climate change impact studies. 
The potential yield in terms of dry matter of sugarcane was adjusted to estimate the carbon dioxide absorption. As photosynthetic pathway C4 plant, in relation with air temperature and solar radiation to calculate a monthly production of dry matter (DM) was calculated during the crop cycle. The DM take in account gross photosynthetic rate subtracting loses by maintenance respiration, senescence of leafs, and tillers during the cycle. The BRCANE is a dynamic simulation model, it isbuild by mathematical equations which describe the physiological behaviour due to environment conditions averaging the thermal variables, model was calibrated which constants that they was obtained through adjusts of literature results and it was validated with experimental data. The simulated DM by the model was contrasted with data which obtained during the cycle from experimental irrigated field (cultivars RB72 454, NA 56-79, CB 41-76, CB 47-355, CP 51-22, Q138, and Q141), in the São Paulo State (Brazil) and in Bundaberg SES, Queensland (Australia). The results of total DM were modified in stalk tons per hectare (Mg ha-1) through linear equation for each cultivar, with regression coefficients higher than 0,89**(R 2) and higher than those obtained by APSIM (R 2 = 0.78*) and CANEGRO (R 2 = 0.71*) models. The model showed consistent simulations for DM during the crop cycle, as well as on simulated yield.
... The performance of the framework was tested in two ways. Firstly, using a deviation approach (prediction minus observation) (Kobayashi & Salam 2000) by employing statistics viz., bias, root mean squared deviation (RMSD) and the coefficient of variation of the RMSD. Secondly, using 'an envelope of acceptable precision' around the reference zero line (when deviation between measurement and prediction is zero) as proposed by Mitchell & Sheehy (1997). ...
Thesis
Full-text available
Annual pasture legumes (APLs) are important in Western Australian farming systems, with subterranean clovers and annual medics being dominant. However, due to potential environmental, economic and biological constraints of these species, alternatives have been sought, with a second generation of new species being introduced since 1991. Despite the views of researchers about the advantages in WA conditions of the newly released annual pasture legumes over traditional pastures, there is a perception by some industry decision makers that their level of adoption has been lower than expected. However, there was not a good method for evaluating the level of adoption. The aim of this study was therefore to enhance understanding of how to improve the fit of new annual pasture legumes in Western Australian farming systems, taking two pastures, French serradella (Ornithopus sativus) cv. Cadiz and Biserrula (Biserrula Pelecinus) cv. Casbah (Hereafter, will be referred to as Cadiz and Casbah.), as examples. The objectives of the study were implemented in four steps. In step one, a framework, built on a three-tier hierarchy (broad adoption potential or BAP, broad attainable adoption potential or BAAP, and maximum attainable adoption potential or MAAP) was developed based on the agro-ecological suitability of the annual pasture legumes. BAP was calculated from the amount of suitable land in terms of soil and rainfall requirements for an APL. The BAAP was calculated by multiplying BAP with two coefficients related to the proportion of cropping area within a geographic region, and the crop-pasture ratio within the cropping area. The MAAP was calculated by multiplying BAAP with a coefficient related to the certainty of a successful pasture-growing season. This coefficient was derived from a Microsoft-Excell®-based Climate Reliability Calculator particularly developed for this study. 
The broad attainable adoption potentials (BAAP) for Cadiz and Casbah were calculated as 1.67 M ha and 1.18 M ha, respectively. These figures were about 81% less than the calculated broad adoption potential (BAP). The maximum attainable adoption potentials (MAAP) for Cadiz and Casbah in Western Australian cropping-belt were calculated as 0.99 and 0.89 M ha, respectively. In step two, a survey was conducted to understand the salient issues that farmers consider in relation to adopting a new annual pasture legume for their farming systems. An open-ended question was used for them for the attributes they desired for their ‘dream’ pasture. Questions were also asked about their experiences of strengths and weaknesses for Cadiz and Casbah. Responses were analysed using the principles of ‘grounded theory’. Furthermore, based on farmers’ perceptions, an APL-characteristics framework was developed for Western Australia. The framework consisted of six attributes of a pasture. They are, in order of importance calculated from the percent of farmers responses: superiority in establishment and growth (79%), ability in supplying quality feed (49%), improved potential in controlling weeds (38%), adaptability in broader agro-ecological horizon (36%), tolerant to major insect-pests (20%), and inexpensive (15%). Many farmers desired a combination of these components rather than just a single component. The two test APLs, Cadiz and Casbah, were compared under this framework based on the responses of the farmers. In the third step, using farmers’ perceptions of the salient attributes and other variables, an empirical model was developed to predict the likely adoption of any annual pasture legume in Western Australian farming systems. The model consisted of the product of two components, AAAR and TRMAP. The AAAR was the averaged annual adoption rate (as the percentage of all pastures grown in Western Australia) of the APL. 
TRMAP is the time, in years, required to reach the maximum adoption potential of the APL. The AAAR was related to the agronomic characteristics of the APL (the three most wanted characteristics by farmers, i.e. establishment and growth, feed supply and quality and weed control) and an ‘inter-competition’ factor, whereas the TRMAP was attributed to its scope of adaptation. Both AAAR and TRAMP were essentially regression models. The model performed well when tested independently for Cadiz and Casbah using inputs from two different sources, i.e. breeders and farmers. In the final step, the model was applied to predict the adoption of Cadiz and Casbah using inputs from breeders and farmers in order to understand what level of adoption breeders would have expected and to what extent farmers would support the breeders’ view. Results showed that breeders were expecting Cadiz and Casbah would be adopted in about 32% and 22% of their potential areas (MAAP) compared to the achieved adoption of 23% for Cadiz and 20% for Casbah, respectively. On the other hand, model output using farmers’ evaluation scores indicated that the adoption would be 20% for Cadiz and 19% for Casbah, which is much closer to the achieved adoption level. The difference between breeders’ expectation and farmers’ evaluation on adoption potential of Cadiz and Casbah was due to differences in evaluation scores provided by the two groups on different pasture characteristics in relation to establishment and growth, weed control and feed supply and quality. Some of the pasture characteristics desired by the farmers, such as reliable regeneration, seed settings, easy establishment, general vigor, good chemical tolerance, good feed supply and quality, suitable for wide range of soils, good insect tolerance are not commonly present when Cadiz and Casbah are grown in the farming environments. 
Two issues for further consideration if the adoption levels of Cadiz and Casbah were to be increased in WA farming systems are: decreasing the knowledge gap among farmers on tactical management of APLs though extension, and improved pasture characteristics through the breeding/selection process. Furthermore, this study designed a system consisting of three major components: the maximum attainable adoption potential (MAAP), the annual pasture legume characteristics framework (APL-characteristics for Western Australia) and achievable adoption potential (AAP). This system acts as a common platform - where breeders, farmers, extension specialists and policy makers could work as a team towards improving the fit of annual pasture legumes, and potentially other crops if the required supporting information was collected, in Western Australian farming systems.
... Proportion of mean square prediction error in ECT % and ER % indicates the reasons for predicted differences from observed values. Equations with lower ER % tend to keep consistent mean bias among predicted values (Kobayashi and Salam, 2000). Indeed, equations with lower ER % also presented lower RSR within datasets. ...
Article
Accurate estimates of methane (CH 4) production by cattle in different contexts are essential to developing mitigation strategies in different regions. We aimed to: (i) compile a database of CH 4 emissions from Brazilian cattle studies, (ii) evaluate prediction precision and accuracy of extant proposed equations for cattle and (iii) develop specialized equations for predicting CH 4 emissions from cattle in tropical conditions. Data of nutrient intake, diet composition and CH 4 emissions were compiled from in vivo studies using open-circuit respiratory chambers, SF 6 technique or the GreenFeed ® system. A final dataset containing intake, diet composition, digestibility and CH 4 emissions (677 individual animal observations, 40 treatment means) obtained from 38 studies conducted in Brazil was used. The dataset was divided into three groups: all animals (GEN), lactating dairy cows (LAC) and growing cattle and non-lactating dairy cows (GCNL). A total of 54 prediction equations available in the literature were evaluated. A total of 96 multiple linear models were developed for predicting CH 4 production (MJ/day). The predictor variables were DM intake (DMI), gross energy (GE) intake, BW, DMI as proportion of BW, NDF concentration, ether extract (EE) concentration, dietary proportion of concentrate and GE digestibility. Model selection criteria were significance (P < 0.05) and variance inflation factor lower than three for all predictors. Each model performance was evaluated by leave-one-out cross-validation. The Intergovernmental Panel on Climate Change (2006) Tier 2 method performed better for GEN and GCNL than LAC and overpredicted CH 4 production for all datasets. Increasing complexity of the newly developed models resulted in greater performance. The GCNL had a greater number of equations with expanded possibilities to correct for diet characteristics such as EE and NDF concentrations and dietary proportion of concentrate. 
For the LAC dataset, equations based on intake and animal characteristics were developed. The equations developed in the present study can be useful for accurate and precise estimation of CH 4 emissions from cattle in tropical conditions. These equations could improve accuracy of greenhouse gas inventories for tropical countries. The results provide a better understanding of the dietary and animal characteristics that influence the production of enteric CH 4 in tropical production systems. Implications This study aimed to evaluate and develop equations for predicting methane emissions from beef and dairy cattle in the tropics. The results support the use of prediction equations for lactating dairy cows, growing cattle and non-lactating dairy cows in Brazilian conditions. We developed prediction equations with different complexity levels. The results may be of interest to the scientific community for estimating energy losses during rumen fermentation, measuring the impacts of livestock on greenhouse gas emissions and developing mitigation strategies. Although the data are from Brazilian studies, the results may be applicable to other tropical regions. *These two authors contributed equally to this work. a Present address: EMBRAPA Dairy Cattle, Rua Eugênio do Nascimento, 610,
... La MSE a été décomposée comme proposé par Kobayashi et Salam (2000) : SDSD représente la différence entre la gamme de variation des valeurs mesurées et celle des valeurs prédites. Un SDSD élevé signifiera que l'amplitude des valeurs prédites diffère de celle des valeurs observées (la gamme de variation n'est pas la même). ...
... The goodness of fit of the model was evaluated through the relative root mean square error (RRMSE) (Kobayashi and Salam, 2000), which is a common criterion to quantify the mean difference between simulation and measurement: Chapitre IV : Dissection des processus écophysiologiques sous-jacents à la teneur en sucres du fruit de tomate par une approche modélisatrice 71 where i y is the observed value, i ŷ is the corresponding predicted value, N is the number of observed data, and y is the mean of all measured values. The smaller the value of RRMSE, the better is the goodness of fit. ...
... This is a basic concept by Aitken (1973) and Willmott (1981). More recently, Kobayashi and Salam (2000) developed the same concept to have residuals disaggregated into erratic and systematic components. They used the root mean square variation (RMSV) to indicate how much the model fails to estimate the variability of the measures around the mean, together with derived measures such as simulation bias (SB), square differences of the standard deviations (SDSD) and lack of positive correlation weighted by the standard deviations (LCS). ...
Book
The potential of mathematical models is widely acknowledged for examining components and interactions of natural systems, estimating the changes and uncertainties on outcomes, and fostering communication between scientists with different backgrounds and between scientists, managers and the community. For favourable reception of models, a systematic accrual of a good knowledge base is crucial for both science and decision-making.As the roles of models grow in importance, there is an increase in the need for appropriate methods with which to test their quality and performance. For biophysical models, the heterogeneity of data and the range of factors influencing usefulness of their outputs often make it difficult for full analysis and assessment. As a result, modelling studies in the domain of natural sciences often lack elements of good modelling practice related to model validation, that is correspondence of models to its intended purpose.Here we review validation issues and methods currently available for assessing the quality of biophysical models. The review covers issues of validation purpose, the robustness of model results, data quality, model prediction and model complexity. The importance of assessing input data quality and interpretation of phenomena is also addressed. Details are then provided on the range of measures commonly used for validation. Requirements for a methodology for assessment during the entire model-cycle are synthesised. Examples are used from a variety of modelling studies which mainly include agronomic modelling, e.g. crop growth and development, climatic modelling, e.g. climate scenarios, and hydrological modelling, e.g. soil hydrology, but the principles are essentially applicable to any area. It is shown that conducting detailed validation requiresmulti-faceted knowledge, and poses substantial scientific and technical challenges. 
Special emphasis is placed on using combined multiple statistics to expand our horizons in validation whilst also tailoring the validation requirements to the specific objectives of the application.
... Nous avons ensuite présenté l'erreur quadratique moyenne des modèles (Mean Square Error ou MSE) et sa décomposition comme proposée par Kobayashi et Salam (2000). ...
Thesis
Le changement climatique (CC) influence le développement des maladies des plantes et leur répartitiongéographique. Ainsi, pour adapter l’agriculture à ces nouvelles conditions, des outils de compréhensionet de prévision du fonctionnement d'une large gamme de pathogènes sont nécessaires. Ce travail a eucomme objectif de développer un modèle générique, mécaniste et dynamique de maladies fongiquesaériennes à coupler à un modèle de fonctionnement des cultures.Une analyse bibliographique a permis d’identifier les principaux types de réponses des pathogènes auxfacteurs du système plante-climat-pathogène. Selon les processus épidémiologiques concernés, desfonctions de réponse génériques ont été identifiées ou développées. Le modèle construit, MDMA, simulel’enchaînement de monocycles épidémiques à l’échelle du cycle cultural via une structure modulaire etcalcule la dynamique de sévérité de maladie (% de surface malade).Le modèle MDMA couple au modèle STICS a été ensuite appliqué à la rouille brune (Puccinia triticina)du blé (Triticum aestivum) et au mildiou (Plasmopara viticola) de la vigne (Vitis vinifera). Sa calibrationet évaluation sur des données observées ont montré son aptitude à reproduire des dynamiques de sévéritéde ces deux pathosystèmes.Enfin, une étude d’impacts du CC sur ces deux pathosystèmes a été réalisée sur trois sites représentatifsdes grands climats français, Avignon, Bordeaux et Dijon, à partir du scénario d’émission de gaz à effetde serre A1B. Le modèle a permis de comprendre l’évolution des principales étapes de la maladie, endistinguant l’effet direct de l’effet indirect (via la plante) du climat. Ainsi, une augmentation générale desévérité de rouille brune due à l’augmentation des températures favorisant la réalisation des cyclesépidémiques hivernaux est attendue. La sévérité du mildiou tend à diminuer en raison de la baisse depluviométrie. 
Cette étude a mis en évidence le besoin d’une connaissance approfondie du fonctionnementdes pathosytèmes face aux nouvelles conditions climatiques attendues.
... The contribution of each explanatory variable to the total variance of the model was calculated. The models were compared in terms of adjusted R 2 , residual means square errors of the model (RMSE), and the mean square error of prediction (MSPE) [7]: Relationships between carcass tissue composition and indicators are presented in Table 2. Hot carcass weight was more highly correlated with the muscle weight than with adipose tissue weight (Model 7 vs. 1) explaining a greater proportion of the variance (74 vs. 3%). Relationships were generally improved when indicators were included as additional covariate. ...
Book
The most accurate determination of beef carcass quality involves dissection of cut or entire carcass. This, however, is very costly and cumbersome. An alternative is to determine easily obtainable indicators. The objective of this study was to derive quantitative relationships between indirect carcass indicators and measured carcass tissue composition. A meta-analysis was applied on 25 published trials with cattle. The selected indicators were USDA yield grade, fat thickness, marbling, ribeye area and carcass conformation and fatness scores. The USDA yield grade was the most highly related to changes in carcass adipose tissue and muscle weights. Fat thickness was also related to changes in both adipose tissue and muscle weights. Other indicators were less correlated with changes in carcass tissue composition. The relationships obtained in this study depend on measurement accuracy and different deposition kinetics for adipose tissue.
Article
Verification and validation of numerical models of natural systems is impossible. This is because natural systems are never closed and because model results are always nonunique. Models can be confirmed by the demonstration of agreement between observation and prediction, but confirmation is inherently partial. Complete confirmation is logically precluded by the fallacy of affirming the consequent and by incomplete access to natural phenomena. Models can only be evaluated in relative terms, and their predictive value is always open to question. The primary value of models is heuristic.
Article
In dryland farming systems, opportunities to improve sunflower (Helianthus annuus L.) yields are mostly associated with management decisions made at planting. Dynamic crop simulation models can assist in making such decisions. This study reports the structure of QSUN, a simple and mechanistic crop model for sunflower, and how it accounts for the dynamic interaction of the crop with the soil and aerial environment. The model incorporates several recent approaches to simulation of crop growth in dryland conditions. QSUN estimates growth, development, and yield of a sunflower crop. Daily inputs of temperature and photoperiod drive a phenology submodel to predict stages of emergence, bud visibility, 50% anthesis, and maturity [...]
Article
Shattercane [Sorghum bicolor (L.) Moench] is a serious weed problem. Few methods are available to assess the impact of shattercane competition on crop growth and yield. This study evaluated model sensitivity to variations in light, water, and plant density and assessed the applicability of the SORKAM model for simulating forage sorghum (used to represent shattercane) growth and development. Sensitivity analysis of SORKAM to photosynthetically active radiation (50, 75, and 100% of actual); water (50, 75, and 100% of actual precipitation, and no water stress); and plant densities of 3, 6, 12, 24, 48, and 96 plants m⁻² was performed. 'Rox Orange' forage sorghum (used to represent shattercane) was grown at three densities, under irrigated and dryland environments over a 3-yr period. Experiments were conducted on Eudora (coarse-silty, mixed, mesic Fluventic Hapludoll) and Kahola silt loam (fine-silty, mixed, mesic Cumulic Hapludoll) soils, in Manhattan, KS. The sensitivity analysis showed that SORKAM responded to a broad range of light, water, and plant density environments, with significant light by water interaction for biomass, and light by density interaction for biomass, leaf area index (LAI), and tillering. Growth of Rox Orange simulated by SORKAM from inputs of weather, soil, and plant data was compared with measured values. There was good agreement between simulated and measured LAI through 60 d after planting (r² = 0.82, intercept not significantly different from 0, and slope = 0.83). Accurate estimation of LAI during this period of vegetative growth is important, because LAI determines the competition between a crop and shattercane for light and water. Agreement of measured and simulated biomass was good for low plant densities (slope and intercept not significantly different from 1 and 0, with a high r² of 0.78), with progressively greater underestimation at higher densities. Tiller number was generally underestimated, with no apparent correlation to model under- or overestimation of LAI or biomass. The sensitivity analysis and the comparison of measured and simulated LAI indicate that the SORKAM model can be used to assess Rox Orange (shattercane) competition.
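The r², slope, and intercept quoted above are the usual regression summaries of a simulated vs. measured comparison. As a rough illustration only, the sketch below computes them alongside the three MSD components named in the main article's abstract (SB, SDSD, LCS). The function name `msd_components` and the data are hypothetical and are not taken from the SORKAM study. Standard deviations are assumed to be population values (divided by n), under which the three components add up exactly to MSD.

```python
import math

def msd_components(sim, obs):
    """Return MSD, its components, and regression summaries (obs on sim)."""
    if len(sim) != len(obs) or len(sim) < 2:
        raise ValueError("sim and obs must have the same length (>= 2)")
    n = len(sim)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    # Population standard deviations (divide by n, an assumption of this sketch).
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    if sd_s == 0 or sd_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)

    msd = sum((s - o) ** 2 for s, o in zip(sim, obs)) / n
    sb = (mean_s - mean_o) ** 2          # squared bias
    sdsd = (sd_s - sd_o) ** 2            # squared difference between SDs
    lcs = 2 * sd_s * sd_o * (1 - r)      # lack of correlation weighted by SDs

    slope = cov / sd_s ** 2              # regression of measured on simulated
    intercept = mean_o - slope * mean_s
    return {"MSD": msd, "SB": sb, "SDSD": sdsd, "LCS": lcs,
            "r2": r ** 2, "slope": slope, "intercept": intercept}

# Hypothetical simulated and measured values, for illustration only.
sim = [1.0, 2.1, 3.2, 4.0, 5.3]
obs = [1.2, 1.9, 3.0, 4.4, 5.0]
parts = msd_components(sim, obs)
for name, value in parts.items():
    print(f"{name}: {value:.4f}")
```

Because the components are additive, each one can be read directly as a share of MSD, which is not possible with the slope, intercept, and r² taken separately.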
Article
Since submerged soils create a unique chemical environment from which roots take nutrients, models that estimate nutrient uptake for upland crops may differ from those for rice (Oryza sativa L.). A mathematical model that had been used to predict nutrient uptake by corn (Zea mays L.), soybean [Glycine max (L.) Merr.], and sorghum (Sorghum bicolor L.) was evaluated for predicting uptake by rice under flooded soil conditions. Our objectives were to evaluate such a model for rice and to conduct greenhouse and growth chamber studies to measure the soil and plant parameters required by the model. Sensitivity analyses were also conducted to determine the parameters most influencing N, P, and K uptake. Two soils [Crowley silt loam (Typic Albaqualfs) and Calhoun silt loam (Typic Glossaqualfs)] and three rice cultivars (Lemont, Katy, and Mars) were selected for this study. Linear regression of predicted vs. observed uptake for N, P, and K gave satisfactory fits to the data with R² values of 0.48, 0.99, and 0.88, respectively. The slopes of the lines for predicted vs. observed uptake for N and K were 1.01 and 0.94, respectively, and the relationships did not depend upon either soil or cultivar. However, the prediction of P did depend on cultivar. The slopes of the lines for predicted vs. observed P uptake by Mars, Lemont, and Katy were 1.81, 1.08, and 1.02, respectively. The parameters most affecting N uptake were root growth rate, N concentration in the soil solution, and the half-distance between root axes. However, parameters most influencing P and K uptake were root growth rate, root radius, and maximal influx rate. Published with permission of the Arkansas Agric. Exp. Stn.
Article
Yield prediction is often one of the major intended uses of a crop simulation model. It is therefore important to evaluate how well a model performs as a predictor. The purpose of this study was to evaluate and analyze how well the SOYGRO model predicts soybean yield, using as a criterion the mean squared error of prediction (MSEP). The four target populations for prediction were irrigated or unirrigated plots at one location in France, for each of two varieties. The model parameters are estimated from the irrigated plots. The estimated MSEP values are on the order of 1 (t ha⁻¹)² for all the target populations. For comparison, we defined an AVERAGE model. This model uses the average observed irrigated yield for each cultivar as the predictor of unobserved yields. AVERAGE was a better predictor than SOYGRO for the irrigated populations, while SOYGRO was better for the unirrigated populations. It seems that SOYGRO has sufficient built-in biological realism to extrapolate more reasonably than the AVERAGE model from irrigated to unirrigated conditions; however, SOYGRO does not make as effective use of the data used for parameter estimation as does AVERAGE. Contribution from INRA with the support of the Centre Technique Interprofessionnel des Oleagineux Metropolitain and in cooperation with the Univ. of Florida.
Article
The computer model described simulates changes in soil mineral nitrogen and crop uptake of nitrogen by computing on a daily basis the amounts of N leached, mineralized, nitrified and taken up by the crop. Denitrification is not included at present. The leaching submodel divides the soil into layers, each of which contains mobile and immobile water. It needs points from the soil moisture characteristic, measured directly or derived from soil survey data; it also needs daily rainfall and evaporation. The mineralization and nitrification submodel assumes pseudo-zero order kinetics and depends on the net mineralization rate in the topsoil and the daily soil temperature and moisture content, the latter being computed in the leaching submodel. The crop N uptake and dry-matter production submodel is a simple function driven by degree days of soil temperature and needs in addition only the sowing date and the date the soil returns to field capacity, the latter again being computed in the leaching submodel. A sensitivity analysis was made, showing the effects of 30% changes in the input variables on the simulated amounts of soil mineral N and crop N present in spring when decisions on N fertilizer rates have to be made. Soil mineral N was influenced most by changes in rainfall, soil water content, mineralization rate and soil temperature, whilst crop N was affected most by changes in soil temperature, rainfall and sowing date. The model has so far been applied only to winter wheat growing through autumn, winter and spring but it should be adaptable to other crops and to a full season. The model was validated by comparing its simulations with measurements of soil mineral N, dry matter and the amounts of N taken up by winter wheat in experiments made at seven sites during 5 years. The simulations were assessed graphically and with the aid of several statistical summaries of the goodness of fit. The agreement was generally very good; over all years 72% of all simulations of soil mineral N to 90 cm depth were within 20 kg N/ha of the soil measurements; also 78% of the simulations of crop nitrogen uptake were within 15 kg N/ha and 63% of the simulated yields of dry matter were within 25 g/m² of the amounts measured. All correlation coefficients were large, positive, and highly significant, and on average no statistically significant differences were found between simulation and measurement either for soil mineral N or for crop N uptake.
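One of the goodness-of-fit summaries quoted above is the share of simulations that fall within a fixed tolerance of the measurement (for example, within 20 kg N/ha). A minimal sketch of that summary follows. The function name `fraction_within` and the data are hypothetical, and the tolerance is assumed to be inclusive, which the abstract does not specify.

```python
def fraction_within(sim, obs, tol):
    """Fraction of simulated values lying within +/- tol of the measurement."""
    if len(sim) != len(obs) or not sim:
        raise ValueError("sim and obs must be non-empty and of equal length")
    # Assumption of this sketch: a deviation exactly equal to tol counts as a hit.
    hits = sum(1 for s, o in zip(sim, obs) if abs(s - o) <= tol)
    return hits / len(sim)

# Hypothetical soil mineral N values (kg N/ha), for illustration only.
simulated = [50, 62, 85, 40]
measured = [45, 90, 80, 55]
share = fraction_within(simulated, measured, tol=20)
print(f"{share:.0%} of simulations within 20 kg N/ha")
```

This summary is easy to read but depends on the chosen tolerance, and it says nothing about whether the misses come from bias or from scatter.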
Article
Model evaluation is an essential aspect of the process of development of system models. When the main purpose of the model is prediction, a reasonable criterion of model quality is the mean squared error of prediction. This criterion is defined here, and it is shown how it can be estimated from available data in a number of situations, including the situation where the parameters of the model are adjusted to the data. An example of the use of this criterion for choosing between alternative models is presented.
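As a rough illustration of using this criterion to choose between alternative models, the sketch below estimates MSEP as the mean squared difference between observations and predictions. It assumes the observations were not used to adjust either model; the abstract notes that the adjusted-parameter case needs separate treatment, which this sketch does not attempt. The function name `estimate_msep`, the model labels, and all numbers are hypothetical.

```python
def estimate_msep(predicted, observed):
    """Naive MSEP estimate: mean squared difference on independent data."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for p, o in zip(predicted, observed)) / len(observed)

# Hypothetical yields (t/ha) from plots not used for parameter estimation.
observed = [2.0, 3.0, 2.5, 3.5]
model_a = [2.2, 2.7, 2.6, 3.1]   # predictions of a first candidate model
model_b = [2.5, 2.5, 2.5, 2.5]   # predictions of a second, constant model

msep_a = estimate_msep(model_a, observed)
msep_b = estimate_msep(model_b, observed)
better = "model_a" if msep_a < msep_b else "model_b"
print(f"MSEP a = {msep_a:.3f}, MSEP b = {msep_b:.3f}, preferred: {better}")
```

If the same data had been used to fit the parameters, this naive estimate would tend to be optimistic, so the comparison is only meaningful on independent data.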
Article
The predictions of five simulation models were compared with data from a winter sown wheat experiment performed in a mobile automatic rainshelter at Lincoln, New Zealand in 1991/1992, where observed grain yields ranged from 3.6 to 9.9 t ha⁻¹. Four of the five models predicted the yield of the fully irrigated treatment to within 10%, and SWHEAT underestimated by more than 20%. The same four models also predicted the grain yield response to varying water supply with reasonable accuracy, but SWHEAT again underestimated the yield reduction with increasing drought. However, the performance of all the models in predicting both the time course and final amount of aboveground biomass, of leaf area index (LAI) and evapotranspiration, varied substantially. These variations were associated with their diverging assumptions about the effects of root distribution and soil dryness on the ability of the crops to extract water, the value of the ratio of water supply to water demand at which stress begins to reduce leaf area development, and photosynthetic, or light-use, efficiency (LUE). All the models predicted, to varying degrees, that a reduction in photosynthetic efficiency or LUE was an important contributor to reductions in the rate of biomass accumulation. In contrast, analysis of the experimental data indicated that this factor was a minor contributor to the reduction, and variation in light interception, associated with changes in LAI, was the major cause.