KOBAYASHI & SALAM: COMPARING SIMULATION AND MEASUREMENT USING MEAN SQUARED DEVIATION
MODELING
Comparing Simulated and Measured Values Using Mean Squared
Deviation and its Components
Kazuhiko Kobayashi* and Moin Us Salam
ABSTRACT

When output (x) of a mechanistic model is compared with measurement (y), it is common practice to calculate the correlation coefficient between x and y, and to regress y on x. There are, however, problems in this approach. The assumption of the regression, that y is linearly related to x, is not guaranteed and is unnecessary for the x–y comparison. The correlation and regression coefficients are not explicitly related to other commonly used statistics [e.g., root mean squared deviation (RMSD)]. We present an approach based on the mean squared deviation (MSD = RMSD²) and show that it is better suited to the x–y comparison than regression. Mean squared deviation is the sum of three components: squared bias (SB), squared difference between standard deviations (SDSD), and lack of correlation weighted by the standard deviations (LCS). To show examples, the MSD-based analysis was applied to simulation vs. measurement comparisons in the literature, and the results were compared with those from regression analysis. The analysis of MSD clearly identified the simulation vs. measurement contrasts with larger deviation than others; the correlation–regression approach tended to focus on the contrasts with lower correlation and a regression line far from the equality line. It was also shown that results of the MSD-based analysis were easier to interpret than those of regression analysis, because the three MSD components are simply additive and all constituents of the MSD components are explicit. This approach will be useful to quantify the deviation of calculated values from measurements.

In simulating crop growth with a process-based model, comparison between the model output and the measurement is an important activity to test the model accuracy and locate room for further improvements. The comparison is often based on correlation between the calculated and measured values, and regression of measured on calculated values. For example, when Kiniry et al. (1997) compared measured and simulated yields of maize for 10 yr from 1983 to 1992 at nine locations in the USA, measured yields were plotted against simulated ones, the correlation coefficient was calculated, and regression lines were fitted. This has been common practice in the comparison between calculated and measured values (e.g., Teo et al., 1992; Chapman et al., 1993; Retta et al., 1996). In this approach, the correlation is a criterion of the predictive accuracy of the model, along with the requirements for the regression line (i.e., the intercept is not significantly different from zero and the slope is not significantly different from unity). Statistical testing of these requirements for the regression line has been established (Mayer et al., 1994).

The correlation–regression approach is very common in fitting an empirical model to data obtained from experiments or surveys. The model parameters are adjusted to give the best fit of the empirical model to measurement. Software packages are readily available to plot the data and fit the model. This approach is hence familiar and convenient for most scientists, which may be a reason why it is adopted for the comparison between calculated values and measurement when the model is mechanistic rather than empirical. As shown below, however, regression is not ideal for this type of comparison, where the comparison between the calculated values and the measurements, rather than the fitting of the model to the measurement, is of concern. Henceforth, the simulated value for a growth trait is denoted as x, and the measured value is denoted as y.
K. Kobayashi, National Institute of Agro-Environmental Sciences, 3-1-1 Kannondai, Tsukuba, Ibaraki 305-8604, Japan; M.U. Salam, Rice FACE Project, Japan Science and Technology Corp.–National Institute of Agro-Environmental Sciences, 3-1-1 Kannondai, Tsukuba, Ibaraki 305-8604, Japan. Research partly supported by Core Research for Evolutional Science and Technology (CREST) of Japan Science and Technology Corp. (JST). Received 25 Jan. 1999. *Corresponding author (clasman@niaes.affrc.go.jp). Published in Agron. J. 92:345–352 (2000).

Abbreviations: CV, coefficient of variation; KS, experiment location in Kansas; LCS, lack of correlation weighted by the standard deviations; MD, mean of the deviations; MSD, mean squared deviation; MSV, mean squared variation; NE, experiment location in Nebraska; NY, experiment location in New York; RMSD, root mean squared deviation; RMSE, root mean squared error; SB, squared bias; SD_m, standard deviation of the measurement; SD_s, standard deviation of the simulation; SDSD, squared difference between standard deviations.
It is assumed that y is the sum of the true mean (μ) and the random error (ε) associated with the measurement, namely

$$y = \mu + \varepsilon \qquad [1]$$

In regressing y on x, a linear relationship is assumed between x and μ, namely

$$\mu = ax + b \qquad [2]$$

where a is the slope and b is the y-intercept of the regression line. Then, the null hypothesis

$$H_0:\; a = 1 \text{ and } b = 0 \qquad [3]$$

is tested against the alternative hypothesis (Mayer et al., 1994)

$$H_1:\; a \neq 1 \text{ and/or } b \neq 0 \qquad [4]$$

These hypotheses can be translated into the relationships between x (simulated value) and μ (true mean value):

$$H_0:\; \mu = x \qquad [5]$$

$$H_1:\; \mu = ax + b \text{ and } (a \neq 1 \text{ and/or } b \neq 0) \qquad [6]$$

By contrast, in the direct comparison between x and y, the null (H₀) and alternative (H₁) hypotheses are

$$H_0:\; \mu = x \qquad [7]$$

$$H_1:\; \mu \neq x \qquad [8]$$

The difference between the regression and the direct comparison lies in the alternative hypotheses (Eq. [6] vs. [8]). The regression analysis assumes the linear relationship between x and μ under the alternative as well as the null hypothesis, but this assumption is not guaranteed and should not be taken for granted. If each measurement is based on replicated observations, the variance of the error term (ε of Eq. [1]) can be estimated independently of the assumption of the linear relationship (Draper and Smith, 1981, p. 33–38). The error variance is then used to test the assumption. If the linear assumption (Eq. [2]) is rejected, the linear regression is inadequate. A curvilinear relationship may be sought, but it is possible that no continuous function fits the relationship between x and y. Note, however, that the user's concern lies more in the comparison between x and y than in the functional relationship between the two. The direct comparison between x and y can always be made by testing the equality hypothesis (Eq. [7]) against the nonequality hypothesis (Eq. [8]).

A more relevant criterion for the direct comparison than regression is the deviation (d) of the model output (x) from the measurement (y), namely

$$d = x - y \qquad [9]$$

When the comparison is made for n measurements, d can be computed for each measurement, namely

$$d_i = x_i - y_i, \quad i = 1, 2, \ldots, n \qquad [10]$$

where x_i and y_i are the simulated and measured values, respectively, for the i-th data. The n deviations can be summarized with statistics of the overall deviation. The most commonly used statistic among those based on the deviation is the RMSD (e.g., Jamieson et al., 1998), namely

$$\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2} \qquad [11]$$

Another commonly used criterion is the MD, namely

$$\mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i) \qquad [12]$$

In the literature, RMSD is often referred to as root mean squared error (RMSE) (Retta et al., 1996), and MD is often called bias (Retta et al., 1996; Jamieson et al., 1998). Of these two statistics, RMSD represents the mean distance between simulation and measurement, whereas MD is the difference between the means of simulation and measurement. Root mean squared deviation and MD thus represent different aspects of the overall deviation, but the relationship between the two has not been well defined.

In the literature, these deviation-based statistics are often used in conjunction with correlation and regression coefficients (Addiscott and Whitmore, 1987; Retta et al., 1996; Kiniry et al., 1997; Jamieson et al., 1998). Although these different statistics may represent somewhat different aspects of the model–measurement discrepancy, it is not clear how the different statistics relate to each other, nor whether they cover all aspects of the discrepancy sufficiently. It is also noteworthy that, as shown above, the deviation-based statistics (e.g., RMSD) and the correlation-based statistics (e.g., the correlation coefficient) are not really consistent with each other in their assumptions.

Our objective is to present a framework for the simulation vs. measurement comparison. The framework is based on the deviation (Eq. [9] and [10]), yet includes the correlation coefficient as a constituent.

DERIVATION OF THE METHOD

The difference between the simulation and the measurement is calculated with the MSD as

$$\mathrm{MSD} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2 \qquad [13]$$

Mean squared deviation is the square of RMSD (Eq. [11]); i.e., MSD = RMSD². The lower the value of MSD, the closer the simulation is to the measurement. The MSD can be partitioned into two components, namely

$$\mathrm{MSD} = (\bar{x} - \bar{y})^2 + \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 \qquad [14]$$

where x̄ and ȳ are the means of x_i and y_i (i = 1, 2, ..., n), respectively. The first term on the right side of Eq. [14] represents the bias of the simulation from the measurement and is denoted as SB, namely

$$\mathrm{SB} = (\bar{x} - \bar{y})^2 \qquad [15]$$
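As a numerical sketch of the deviation statistics in Eq. [10] through [13], the following Python function computes MD, RMSD, and MSD for paired simulated and measured values. The sample yields are hypothetical, invented for illustration only, and are not taken from any of the cited studies.

```python
import math

def deviation_stats(x, y):
    """Return (MD, RMSD, MSD) for paired simulated (x) and measured (y) values.

    MD   = (1/n) * sum(x_i - y_i)              (Eq. [12])
    RMSD = sqrt((1/n) * sum((x_i - y_i)**2))   (Eq. [11])
    MSD  = (1/n) * sum((x_i - y_i)**2)         (Eq. [13])
    """
    n = len(x)
    d = [xi - yi for xi, yi in zip(x, y)]  # deviations d_i, Eq. [10]
    md = sum(d) / n
    msd = sum(di * di for di in d) / n
    rmsd = math.sqrt(msd)
    return md, rmsd, msd

# Hypothetical simulated (x) and measured (y) yields, t/ha
x = [5.2, 6.1, 4.8, 7.0, 5.5]
y = [5.0, 6.5, 4.2, 6.8, 6.0]
md, rmsd, msd = deviation_stats(x, y)
assert abs(msd - rmsd ** 2) < 1e-12  # MSD is the square of RMSD
```

The final assertion reflects the identity MSD = RMSD² stated in the text.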
Squared bias is the square of the MD (Eq. [12]); i.e., SB = MD².

The second term on the right side of Eq. [14] is the difference between the simulation and the measurement with respect to the deviation from the means (i.e., x_i − x̄ and y_i − ȳ) and is denoted as the mean squared variation (MSV), namely

$$\mathrm{MSV} = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 \qquad [16]$$

A bigger MSV indicates that the model failed to simulate the variability of the measurement around the mean. Note that these two components, SB and MSV, are orthogonal and can be addressed separately.

Mean squared variation can be further partitioned into two components as shown below. For the partitioning, the standard deviation of the simulation is denoted as SD_s, that of the measurement as SD_m, and the correlation coefficient between the simulation and measurement as r, namely

$$\mathrm{SD}_{\mathrm{s}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad [17]$$

$$\mathrm{SD}_{\mathrm{m}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2} \qquad [18]$$

$$r = \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right] \Big/ \left(\mathrm{SD}_{\mathrm{m}}\,\mathrm{SD}_{\mathrm{s}}\right) \qquad [19]$$

After some rearrangement, MSV in Eq. [16] can be rewritten as

$$\mathrm{MSV} = (\mathrm{SD}_{\mathrm{s}} - \mathrm{SD}_{\mathrm{m}})^2 + 2\,\mathrm{SD}_{\mathrm{s}}\,\mathrm{SD}_{\mathrm{m}}(1 - r) \qquad [20]$$

The first term on the right side of Eq. [20], called SDSD here, is the difference in the magnitude of fluctuation between the simulation and measurement, namely

$$\mathrm{SDSD} = (\mathrm{SD}_{\mathrm{s}} - \mathrm{SD}_{\mathrm{m}})^2 \qquad [21]$$

A larger SDSD indicates that the model failed to simulate the magnitude of fluctuation among the n measurements. The second term on the right side of Eq. [20] is essentially the lack of positive correlation weighted by the standard deviations, and is denoted here as LCS, namely

$$\mathrm{LCS} = 2\,\mathrm{SD}_{\mathrm{s}}\,\mathrm{SD}_{\mathrm{m}}(1 - r) \qquad [22]$$

A bigger LCS means that the model failed to simulate the pattern of the fluctuation across the n measurements.

With all the above terms combined, the MSV and MSD can be written as

$$\mathrm{MSV} = \mathrm{SDSD} + \mathrm{LCS} \qquad [23]$$

$$\mathrm{MSD} = \mathrm{SB} + \mathrm{SDSD} + \mathrm{LCS} \qquad [24]$$

where SB, SDSD, and LCS are given in Eq. [15], [21], and [22], respectively. Equation [22] indicates the role of the correlation coefficient, r, in LCS and hence in MSD. With all other components fixed, a larger r would reduce MSD and hence increase the model accuracy. However, r is only one of the components of MSD, and the other components may be more influential in determining MSD than r. For example, when SB is the major component of MSD, maximizing r does not much improve the model accuracy.

In Eq. [24], it should be noted that SDSD (Eq. [21]) and LCS (Eq. [22]) are not entirely independent. They share the same constituents, SD_s (Eq. [17]) and SD_m (Eq. [18]). Hence a bigger SD_s would increase both SDSD and LCS, if SD_s > SD_m. Through some rearrangement, the relative sizes of SDSD and LCS can be evaluated (see Appendix A for details). Across a major portion of the possible range of r and the ratio (α) of SD_s to SD_m, LCS contributes more to MSD than SDSD does, although there are some combinations of r and α that make SDSD greater than LCS (see Appendix A).

The above components of MSD can be calculated from the coefficients of the regression. As in Eq. [2], a is the slope of the regression line and b is the y-intercept. Then, the components of MSD are given as

$$\mathrm{SB} = \left[(1 - 1/a)\bar{y} + (b/a)\right]^2 \qquad [25]$$

$$\mathrm{SDSD} = \left[1 - (r/a)\right]^2 \mathrm{SD}_{\mathrm{m}}^2 \qquad [26]$$

$$\mathrm{LCS} = 2(r/a)(1 - r)\,\mathrm{SD}_{\mathrm{m}}^2 \qquad [27]$$

In the special case when a = 1 and b = 0,

$$\mathrm{SB} = 0 \qquad [28]$$

$$\mathrm{SDSD} = (1 - r)^2\,\mathrm{SD}_{\mathrm{m}}^2 \qquad [29]$$

$$\mathrm{LCS} = 2r(1 - r)\,\mathrm{SD}_{\mathrm{m}}^2 \qquad [30]$$

$$\mathrm{MSD} = (1 - r^2)\,\mathrm{SD}_{\mathrm{m}}^2 \qquad [31]$$

Thus, comparison based on correlation becomes equivalent to comparison based on MSD in this special case, as long as the comparison is made within the same data set (i.e., fixed SD_m).

Examples

The MSD-based approach presented above is here applied to published results of comparison between model simulation and measurement. Note that our intention is not to reanalyze the results against the original work, but to show examples of this approach compared with the correlation–regression approach.

Example 1

The results of Kiniry et al. (1997), as mentioned before, are analyzed below for the comparison of simulated and measured maize yields across 10 yr within each of the nine USA locations. The MSD and its components were calculated from r, x̄, ȳ, and the coefficient of variation (CV) published in their paper. Figure 1 illustrates the correlation-based comparison among the nine locations, and Fig. 2 is the MSD-based comparison. For ease of comparison between the two approaches, Fig. 1 shows the lack of fit of the regression (1 − r²) on the main vertical axis on the left, and the correlation coefficient (r) on the auxiliary vertical axis on the right.
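The decomposition in Eq. [15] through [24] can be sketched in Python as follows. This is a minimal illustration with invented data, using the divisor-n (population-form) standard deviations of Eq. [17] and [18].

```python
import math

def msd_components(x, y):
    """Partition MSD into SB, SDSD, and LCS (Eq. [15], [21], [22], [24])."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sd_s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)  # Eq. [17]
    sd_m = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / n)  # Eq. [18]
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    r = cov / (sd_s * sd_m)               # Eq. [19]
    sb = (xbar - ybar) ** 2               # squared bias, Eq. [15]
    sdsd = (sd_s - sd_m) ** 2             # Eq. [21]
    lcs = 2.0 * sd_s * sd_m * (1.0 - r)   # Eq. [22]
    return sb, sdsd, lcs

# Hypothetical simulated (x) and measured (y) values
x = [5.2, 6.1, 4.8, 7.0, 5.5]
y = [5.0, 6.5, 4.2, 6.8, 6.0]
sb, sdsd, lcs = msd_components(x, y)
msd = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)  # Eq. [13]
assert abs(sb + sdsd + lcs - msd) < 1e-12  # additivity, Eq. [24]
```

The assertion verifies the additivity of Eq. [24]; sb here also equals the square of MD, matching the SB = MD² identity noted in the text.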
Fig. 1. Comparison of the correlation between the simulated and measured maize yields at nine locations in different states of the USA. The left vertical axis is the lack of fit of the regression (1 − r²), and the right vertical axis is the correlation coefficient (r). Results from (A) CERES-Maize model and (B) ALMANAC model are shown for locations in the states of Minnesota (MN), New York (NY), Iowa (IA), Illinois (IL), Nebraska (NE), Missouri (MO), Kansas (KS), Louisiana (LA), and Texas (TX). Data are from Kiniry et al. (1997).

Fig. 2. Comparison of the mean squared deviation (MSD) and its components, lack of correlation weighted by the standard deviations (LCS), squared difference between standard deviations (SDSD), and squared bias (SB), for the (A) CERES-Maize model and (B) ALMANAC model for nine USA locations. Locations are in the states of Minnesota (MN), New York (NY), Iowa (IA), Illinois (IL), Nebraska (NE), Missouri (MO), Kansas (KS), Louisiana (LA), and Texas (TX). Data are from Kiniry et al. (1997).
In the correlation-based comparison (Fig. 1), the location in the state of New York (NY) shows the biggest difference between the simulation and measurement for both models. The lack of fit of the regression (1 − r²) is greatest (i.e., correlation is lowest) for this location. The slope of the regression lines is significantly <1 for both models: a = 0.22 for CERES-Maize and a = 0.09 for ALMANAC (Tables 6 and 7 of Kiniry et al., 1997). These statistics show clear differences between the model outputs and measurements; hence, causes of the difference were further investigated (Kiniry et al., 1997).

By contrast, on the basis of MSD, NY is among those with smaller MSD (Fig. 2) than other locations. This is because SD_m is smallest (0.58) for NY among the nine locations, and SD_s is also small (0.59 in CERES-Maize and 0.93 in ALMANAC). With these small SD_m and SD_s values, LCS (Eq. [22]) is small and so is MSD (Fig. 2). The other MSD components, SB (Eq. [15]) and SDSD (Eq. [21]), are negligible (Fig. 2). In short, the model–measurement deviation for this location is small, because both measured and simulated yields show only small variability across the 10 yr, and the model simulated the measured yields with little bias. The low correlation may be a result only of the small year-to-year variability rather than a deviation between the model outputs and measurements.

The location in the state of Kansas (KS) is in contrast with NY. The lack of regression fit is smaller (i.e., the correlation coefficient is much larger: 0.825 in CERES-Maize and 0.714 in ALMANAC) for KS than for NY (Fig. 1). The slope of the regression line is not significantly different from 1, and the y-intercept is not significantly different from 0, for either model (Tables 6 and 7 of Kiniry et al., 1997). It therefore appears that the models fit the measurement better for this location compared with NY. Nevertheless, MSD for KS is larger than that for NY, in particular with ALMANAC (Fig. 2). This is because both SD_m (1.86) and SD_s (1.68 in CERES-Maize and 2.34 in ALMANAC) are larger for KS than for NY; hence LCS and MSD are also larger, with the MSD components SB and SDSD being almost negligible. Thus, despite the relatively high correlation between the simulation and measurement, MSD is larger for this location than for NY. The above contrast between KS and NY indicates that the overall deviation is not only dependent on the correlation, but also on the variability of the measurement and simulation.

In most locations, LCS is the major component of MSD (Fig. 2), but there are exceptions. The location in Nebraska (NE) shows a clear distinction between the two models. For CERES-Maize, MSD is smallest for NE among the nine locations, but second largest for ALMANAC (Fig. 2). This is partly because of the larger SD_s (2.27) in ALMANAC than SD_m (1.14), and hence the large SDSD (Eq. [21]). This indicates that, for NE, ALMANAC is overly sensitive to the environmental fluctuations responsible for the year-to-year variability of the maize yield.

Thus, the MSD-based comparison enables the user to locate the simulation vs. measurement contrasts that have larger deviations than others, and to further analyze causes of the large deviations. By contrast, the correlation-based approach tends to focus on the low correlation and the deviation of the regression line from
the equality line, rather than the deviation of the model outputs from the measurement. Kiniry et al. (1997) calculated RMSD (Eq. [11]) and MD (Eq. [12]) in addition to the correlation and regression coefficients. Although they used those deviation-based statistics only to say that the models simulated the measurements reasonably well, they could have used RMSD to evaluate the deviation of the simulated values from the measurements.
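The KS vs. NY contrast can be reproduced with hypothetical numbers: a site where both series vary little yields a small LCS (and MSD) even when r is low, whereas a well-correlated but highly variable site can have the larger MSD. The yields below are invented for illustration and are not the Kiniry et al. (1997) data.

```python
import math

def msd_and_r(x, y):
    """Return (MSD, r) for paired simulated and measured values (Eq. [13], [19])."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sd_s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
    sd_m = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / n)
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    r = cov / (sd_s * sd_m)
    msd = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n
    return msd, r

# Site A: little year-to-year variability, low (here negative) correlation
xa = [5.0, 5.2, 4.9, 5.1, 5.0]
ya = [5.1, 4.9, 5.2, 5.0, 5.1]
# Site B: large variability, well correlated, but sizable scatter
xb = [3.0, 7.5, 4.5, 8.5, 5.0]
yb = [4.0, 6.0, 5.5, 9.5, 3.5]

msd_a, r_a = msd_and_r(xa, ya)
msd_b, r_b = msd_and_r(xb, yb)
assert r_b > r_a      # Site B is the better-correlated site...
assert msd_b > msd_a  # ...yet its overall deviation (MSD) is larger
```

The assertions mirror the text's point that overall deviation depends on the variability of the two series, not only on their correlation.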
Example 2
Jamieson et al. (1998) compared outputs from five different models of wheat growth with measurements under different irrigation regimes in a field experiment in New Zealand. Their published values of aboveground biomass and grain weights at the end of the season are used in the comparison between the simulation and measurement below. Possibly because of rounding error in the published values, the correlation coefficient r and other statistics calculated here might be somewhat different from those published (Jamieson et al., 1998). Note, however, that our purpose in using these data is not to reanalyze the published results, but to give an example of MSD-based analysis compared with regression analysis of the simulation vs. measurement comparison.

Fig. 3. Mean squared deviation (MSD) and its components, lack of correlation weighted by the standard deviations (LCS), squared difference between standard deviations (SDSD), and squared bias (SB), in a comparison of five wheat models in simulating biomass dry weight under different irrigation regimes in a field experiment in New Zealand. Data are from Jamieson et al. (1998).

Aboveground biomass dry weight has a large SB component (Eq. [15]), which is the major component of MSD for all the models except Sirius (Fig. 3). This is especially true for the model AFRCWHEAT2, which shows the largest MSD among the models. The mean aboveground biomass estimated with this model is only 14.2 t ha⁻¹; the mean measured value is 20.1 t ha⁻¹.

For comparison with the above results from the MSD analysis, the measured aboveground biomass weight was regressed on the simulated weight using the REG procedure of the SAS/STAT software (SAS Inst., 1988). The results show that the AFRCWHEAT2 model had the highest correlation coefficient (0.92) among the models, all of which are highly correlated (r > 0.88) with the measurement. Nevertheless, the null hypothesis (slope = 1 and intercept = 0) (Eq. [3]) of the regression of the measurement on the simulation is rejected for AFRCWHEAT2 (P = 0.0002), CERES-Wheat (P = 0.007), SUCROS2 (P = 0.003), and SWHEAT (P = 0.019). The null hypothesis is maintained only for the Sirius model (P = 0.484). These results of regression analysis indicated that all the models except Sirius have difficulty simulating the measurements. The cause of this discrepancy becomes clearer by testing the slope and the intercept separately. For the AFRCWHEAT2 model, the intercept is >0 (P = 0.001) and the slope is <1 (P = 0.024). The intercept is also >0 (P = 0.015) with the SUCROS2 model, and somewhat >0 (P = 0.075) with the CERES-Wheat model. The >0 intercepts suggest that these models underestimated the measurement.

The regression results are thus consistent with the results of the MSD-based analysis, but it is not clear from the regression how much of this underestimation contributed to the large RMSD (Eq. [11]). Jamieson et al. (1998) compared MD (Eq. [12]) and RMSD to show that the models' underestimation does account for the major part of RMSD. This is in effect the same as the comparison based on MSD and SB, since RMSD² = MSD and MD² = SB as shown earlier.

The results for grain yield (Fig. 4) are in sharp contrast to the aboveground biomass results. The comparison on grain weight indicated that SDSD (Eq. [21]) and LCS (Eq. [22]), rather than SB, are the major components of MSD. Among the models, SWHEAT shows the largest MSD, for which SDSD (Eq. [21]) is the dominant component. This is because, in this model, SD_s is only 0.80 t ha⁻¹, which is less than SD_m = 2.15 t ha⁻¹. It is obvious that this model is not sensitive enough to the difference among the irrigation regimes in the field experiment.

Fig. 4. Mean squared deviation (MSD) and its components, lack of correlation weighted by the standard deviations (LCS), squared difference between standard deviations (SDSD), and squared bias (SB), in a comparison of five wheat models in simulating grain yield under different irrigation regimes in a field experiment in New Zealand. Data are from Jamieson et al. (1998).
For SUCROS2, which shows the second largest MSD, by comparison, SB and LCS, rather than SDSD, are responsible for the large MSD.

The above analysis, based on MSD and its components, gives essentially the same results as the Jamieson et al. (1998) analysis, which was based on RMSD and comparison of the range of the simulated and measured grain yields. This makes sense, since both the range and the standard deviation are measures of dispersion of data; hence, the comparison of the ranges should give results similar to those based on the comparison of the standard deviations (SDSD). Note, however, that SDSD and MSD are explicitly related to each other in Eq. [24], whereas the range and RMSD are not.

The measured grain yield was regressed on simulated grain yield (Eq. [2]) using the REG procedure of the SAS/STAT system (SAS Inst., 1988). The results showed that all the models were highly correlated with the measurement (r > 0.92), but that the regression lines for some models deviated from the equality line (y = x). The null hypothesis (slope = 1 and intercept = 0) (Eq. [3]) was rejected for SWHEAT (P = 0.020) and SUCROS2 (P = 0.050). Sirius was close to the rejection (P = 0.064). Investigating the slope (a) and the intercept (b) (Eq. [2]) separately, it is found that a ≠ 1 with SWHEAT (P = 0.008) and that b ≠ 0 with SWHEAT (P = 0.008) and SUCROS2 (P = 0.029). For SWHEAT, the slope (2.56) being >1 suggests that the model is less sensitive to the variability among the irrigation regimes than the measurement. This is consistent with the results of the MSD-based comparison. Although the intercept (−10.73) for SWHEAT was significantly <0, this should not be misinterpreted as indicating overestimation by the model. This negative intercept is only a result of the slope being much larger than unity. With the MSD-based comparison, no such misinterpretation is likely, since each component is distinct from the others in its meaning.

In this example, unlike in the previous example, the simulation vs. measurement comparisons were made within the same data set. The results are therefore similar whether the comparison is based on MSD or correlation and regression. However, the interpretation of the results is more straightforward in the MSD-based comparison than in the correlation–regression analysis.

DISCUSSION

As shown in the examples, the simulation vs. measurement comparison based on MSD is straightforward, with MSD indicating the overall deviation of the model output from the measurement, and the MSD components representing different aspects of the overall deviation. The components are simply additive (Eq. [24]); hence the user can identify the major component and investigate it further with its constituents. The standard deviations of the simulation and the measurement are useful for investigating SDSD and LCS. The correlation coefficient (r) is also important when LCS is the major contributor to MSD. For a large SB, the user would compare the mean of the simulated values x̄ with that of the measured values ȳ.

By comparison, in the correlation–regression approach, multiple criteria—the correlation coefficient, the slope and y-intercept of the regression line, and, often, RMSD (Eq. [11]) and MD (Eq. [12])—are presented simultaneously (e.g., Retta et al., 1996; Kiniry et al., 1997). Analysis based on these statistics of deviation along with regression analysis may give results similar to those obtained by the MSD-based analysis, if the interpretation is made carefully. It is not easy, however, to use these multiple criteria in combination, because they are not explicitly related to each other.

Thus, for direct comparison between model output and measurement, the MSD-based analysis is better suited than the commonly practiced correlation–regression analysis. In some cases, however, the user's concern may not lie in the direct comparison but in a different aspect of the simulation vs. measurement contrast. When variability of the simulation around the mean is of more concern than deviation (Eq. [9]), then MSV (Eq. [16]) should be the criterion of the comparison. Since MSV is the sum of SDSD and LCS, differences in MSV can be analyzed with respect to the two components. Squared difference between standard deviations and LCS can be further analyzed with their constituents, SD_s, SD_m, and r. Or, if the pattern of the fluctuation is the major concern, r is the primary measure of the comparison between the simulation and measurement. Thus, MSD or a part of it can be used for comparison of model output and measurement, depending on the user's major concern.

We have not addressed the statistical test of the equality hypothesis (Eq. [7]) against the nonequality hypothesis (Eq. [8]). This can be done, if one has an estimate of the error variance from repeated measurements (see Appendix B). The error variance could also be estimated with the protocol based on resampling, as proposed by Wallach and Goffinet (1989) and applied to simulation vs. measurement comparison by Colson et al. (1995).

However, it must be noted that, on most occasions, we know the null hypothesis is wrong, no matter what the significance test indicates. The model does deviate from the reality because of the simplifications inherent in any simulation model. Such simplifications or omission of details are inevitable, or even necessary, in the modeling of a complex real system. The deviation of the model from the reality would result in a difference between the simulated and the true values. If this difference is smaller than the measurement error, the null hypothesis (Eq. [7]) is maintained. However, theoretically, we could reduce the measurement error by increasing the number of replications, and could eventually detect the difference between the simulated and the true values, and reject the null hypothesis. Therefore, the relevant question is not whether the model is right or wrong, but how much the model output differs from the measurement and why. The model's performance can be discussed only relatively, not absolutely (Oreskes et al., 1994). The MSD-based analysis we present here would be useful to quantify the deviation of model output from measurement, and to locate possible cause(s) of the deviation.
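The equality test from replicated measurements (Appendix B, Eq. [B-9]) can be sketched as follows. The observations here are hypothetical; the function only computes the F statistic, which would then be compared with tabulated critical values of F[n, n(m − 1)].

```python
def equality_f_statistic(x, z):
    """F statistic of Eq. [B-9]: MSD / (SSE / (n*m*(m-1))).

    x: list of the n simulated values.
    z: list of n lists, each holding the m replicated observations
       behind one measurement y_i (the mean of its replicates, Eq. [B-2]).
    """
    n = len(x)
    m = len(z[0])
    y = [sum(zi) / m for zi in z]                              # Eq. [B-2]
    msd = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n      # Eq. [13]
    # SSE: within-measurement sum of squares, n*(m-1) degrees of freedom
    sse = sum((zij - yi) ** 2 for zi, yi in zip(z, y) for zij in zi)
    return msd / (sse / (n * m * (m - 1)))

# Hypothetical example: n = 3 measurements, m = 4 replicates each
x = [5.0, 6.2, 4.1]
z = [[4.8, 5.1, 5.3, 4.9],
     [6.0, 6.4, 6.1, 6.3],
     [4.5, 4.0, 4.2, 4.3]]
f = equality_f_statistic(x, z)
# f is compared with the critical value of F[n, n(m - 1)] = F[3, 9]
```

No distribution functions are needed for the statistic itself; a large f relative to the critical value would reject the equality hypothesis (Eq. [7]).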
ACKNOWLEDGMENTS

The authors gratefully acknowledge Dr. T. Miwa of National Institute of Agro-Environmental Sciences, Tsukuba, Japan, and Dr. L. Thalib of Griffith University, Nathan, Australia, for their comments on the first draft of this paper. The two anonymous reviewers are also gratefully acknowledged for their comments, which helped the authors improve the manuscript.
APPENDIX A

By denoting the ratio of SD_s to SD_m as α, namely

α = SD_s/SD_m   [A-1]

MSV (Eq. [20] and [23]) can be rewritten as

MSV = SDSD + LCS   [23]
    = (SD_s − SD_m)² + 2·SD_s·SD_m·(1 − r)   [20]
    = SD_m²·α·[(1 − α)²/α + 2(1 − r)]   [A-2]

The ratio of SDSD to LCS is equal to the ratio of the two terms in the brackets on the right side of Eq. [A-2], namely

SDSD/LCS = [(1 − α)²/α]/[2(1 − r)]   [A-3]

Figure A-1 depicts the ratio SDSD/LCS on the α–r coordinates in the range −0.2 < r < 1 and 0.2 < SD_s/SD_m < 3.2, which should cover practically the whole domain of α–r combinations. Note that the change of the ratio is quite nonuniform, but that the ratio is <1 (i.e., SDSD < LCS) for a major portion of the domain. Exceptions are the region with high correlation (e.g., r > 0.9) and the regions with α < 0.3 or α > 2.8 (Fig. A-1).

Fig. A-1. Ratio of squared difference between standard deviations (SDSD) to lack of correlation weighted by the standard deviations (LCS) as a function of standard deviation of the simulation (SD_s)/standard deviation of the measurement (SD_m) and correlation coefficient (r). α indicates SD_s/SD_m.

APPENDIX B

It is assumed that each of the n measurements (y_i) is the mean of m replicated observations, and that each observation (z_ij) in the i-th measurement is a sample from a normal distribution with mean μ_i and variance σ², namely

z_ij ~ N(μ_i, σ²)   [B-1]

y_i = Σ_{j=1}^{m} z_ij/m,  i = 1, 2, ..., n   [B-2]

Then y_i has a normal distribution with mean μ_i and variance σ²/m, namely

y_i ~ N(μ_i, σ²/m)   [B-3]

and the sum of squared deviations for each measurement divided by the error variance has a chi-squared distribution with m − 1 degrees of freedom (Mood et al., 1974, p. 240–246), namely

Σ_{j=1}^{m} (z_ij − y_i)²/σ² ~ χ²(m − 1)   [B-4]

The sum of squared error (SSE) across the n measurements divided by the error variance also has a chi-squared distribution, with n(m − 1) degrees of freedom, namely

Σ_{i=1}^{n} Σ_{j=1}^{m} (z_ij − y_i)²/σ² = SSE/σ² ~ χ²[n(m − 1)]   [B-5]

On the other hand, the sum of the squared deviations of y_i from the true mean, μ_i, divided by its variance, σ²/m (Eq. [B-3]), also has a chi-squared distribution, with n degrees of freedom, namely

Σ_{i=1}^{n} (μ_i − y_i)²/(σ²/m) ~ χ²(n)   [B-6]

Under the equality hypothesis (Eq. [7]) (i.e., H₀: μ_i = x_i), Eq. [B-6] can be written as

Σ_{i=1}^{n} (x_i − y_i)²/(σ²/m) ~ χ²(n)   [B-7]

The left-side terms of both Eq. [B-5] and [B-7] have chi-squared distributions and are independent of each other. The ratio of the two terms, each divided by its degrees of freedom, follows an F distribution with n and n(m − 1) degrees of freedom (Mood et al., 1974, p. 246–249), namely

[Σ_{i=1}^{n} (x_i − y_i)²/(nσ²/m)] / {SSE/[nσ²(m − 1)]} ~ F[n, n(m − 1)]   [B-8]

The nσ² in both numerator and denominator cancels, and the sum of squares of x_i − y_i is replaced with n·MSD (Eq. [13]) to yield

MSD/{SSE/[nm(m − 1)]} ~ F[n, n(m − 1)]   [B-9]

This value is compared with the critical values {e.g., F[0.05, n, n(m − 1)]} to test the null hypothesis.

REFERENCES

Addiscott, T.M., and A.P. Whitmore. 1987. Computer simulation of changes in soil mineral nitrogen and crop nitrogen during autumn, winter and spring. J. Agric. Sci. (Cambridge) 109:141–157.
AGRONOMY JOURNAL, VOL. 92, MARCH–APRIL 2000
Chapman, S.C., G.L. Hammer, and H. Meinke. 1993. A sunflower simulation model: I. Model development. Agron. J. 85:725–735.
Colson, J., D. Wallach, A. Bouniols, J.B. Denis, and J.W. Jones. 1995. Mean squared error of yield prediction by SOYGRO. Agron. J. 87:397–402.
Draper, N., and H. Smith. 1981. Applied regression analysis. 2nd ed. John Wiley & Sons, New York.
Jamieson, P.D., J.R. Porter, J. Goudriaan, J.T. Ritchie, H. van Keulen, and W. Stol. 1998. A comparison of the models AFRCWHEAT2, CERES-Wheat, Sirius, SUCROS2 and SWHEAT with measurements from wheat grown under drought. Field Crops Res. 55:23–44.
Kiniry, J.R., J.R. Williams, R.L. Vanderlip, J.D. Atwood, D.C. Reicosky, J. Mulliken, W.J. Cox, H.J. Mascagni, S.E. Hollinger, and W.J. Wiebold. 1997. Evaluation of two models for nine U.S. locations. Agron. J. 89:421–426.
Mayer, D.G., M.A. Stuart, and A.J. Swain. 1994. Regression of real-world data on model output: An appropriate overall test of validity. Agric. Syst. 45:93–104.
Mood, A.M., F.A. Graybill, and D.C. Boes. 1974. Introduction to the theory of statistics. 3rd ed. McGraw-Hill, New York.
Oreskes, N., K. Shrader-Frechette, and K. Belitz. 1994. Verification, validation, and confirmation of numerical models in the earth sciences. Science (Washington, DC) 263:641–646.
Retta, A., R.L. Vanderlip, R.A. Higgins, and L.J. Moshier. 1996. Application of SORKAM to simulate shattercane growth using forage sorghum. Agron. J. 88:596–601.
SAS Institute. 1988. SAS/STAT user's guide. 6.03 ed. SAS Inst., Cary, NC.
Teo, Y.H., C.A. Beyrouty, and E.E. Gbur. 1992. Evaluating a model for predicting nutrient uptake by rice during vegetative growth. Agron. J. 84:1064–1070.
Wallach, D., and B. Goffinet. 1989. Mean squared error of prediction as a criterion for evaluating and comparing system models. Ecol. Modell. 44:299–306.
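The F test derived in Appendix B (Eq. [B-9]) can also be sketched numerically. The following is an illustrative example only (assuming Python with NumPy and SciPy; the data, the number of measurements n, the number of replicates m, and the error standard deviation are all invented): it generates replicated observations under the null hypothesis, forms the test statistic of Eq. [B-9], and compares it with the 5% critical value of F[n, n(m − 1)].

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n, m = 8, 4                      # n measurements, m replicates each (assumed values)
mu = rng.uniform(4.0, 8.0, n)    # true means of the n measurements
z = mu[:, None] + rng.normal(0.0, 0.5, (n, m))  # replicates z_ij (Eq. [B-1])
y = z.mean(axis=1)               # measurement means y_i (Eq. [B-2])
x = mu                           # simulated values; here x_i = mu_i, so H0 holds

msd = np.mean((x - y) ** 2)          # MSD (Eq. [13])
sse = np.sum((z - y[:, None]) ** 2)  # sum of squared error across measurements

# Eq. [B-9]: MSD / {SSE/[n m (m - 1)]} ~ F[n, n(m - 1)] under H0: mu_i = x_i
f_stat = msd / (sse / (n * m * (m - 1)))
f_crit = stats.f.ppf(0.95, n, n * (m - 1))

# Reject H0 (model deviates beyond measurement error) if f_stat > f_crit.
print(f_stat, f_crit)
```

Because the simulated values equal the true means here, the statistic should exceed the critical value only with the nominal 5% probability; with real model output, a large f_stat indicates deviation beyond what measurement error alone can explain.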