Root mean square error (RMSE) or mean absolute error (MAE)?

Abstract

Both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed in model evaluation studies. Willmott and Matsuura (2005) have suggested that the RMSE is not a good indicator of average model performance and might be a misleading indicator of average error and thus the MAE would be a better metric for that purpose. Their paper has been widely cited and may have influenced many researchers in choosing MAE when presenting their model evaluation statistics. However, we contend that the proposed avoidance of RMSE and the use of MAE is not the solution to the problem. In this technical note, we demonstrate that the RMSE is not ambiguous in its meaning, contrary to what was claimed by Willmott et al. (2009). The RMSE is more appropriate to represent model performance than the MAE when the error distribution is expected to be Gaussian. In addition, we show that the RMSE satisfies the triangle inequality requirement for a distance metric.
Geosci. Model Dev., 7, 1247–1250, 2014
www.geosci-model-dev.net/7/1247/2014/
doi:10.5194/gmd-7-1247-2014
© Author(s) 2014. CC Attribution 3.0 License.
Root mean square error (RMSE) or mean absolute error (MAE)? –
Arguments against avoiding RMSE in the literature
T. Chai (1,2) and R. R. Draxler (1)
(1) NOAA Air Resources Laboratory (ARL), NOAA Center for Weather and Climate Prediction,
5830 University Research Court, College Park, MD 20740, USA
(2) Cooperative Institute for Climate and Satellites, University of Maryland, College Park, MD 20740, USA
Correspondence to: T. Chai (tianfeng.chai@noaa.gov)
Received: 10 February 2014 – Published in Geosci. Model Dev. Discuss.: 28 February 2014
Revised: 27 May 2014 – Accepted: 2 June 2014 – Published: 30 June 2014
Abstract. Both the root mean square error (RMSE) and the
mean absolute error (MAE) are regularly employed in model
evaluation studies. Willmott and Matsuura (2005) have sug-
gested that the RMSE is not a good indicator of average
model performance and might be a misleading indicator of
average error, and thus the MAE would be a better metric for
that purpose. While some concerns over using RMSE raised
by Willmott and Matsuura (2005) and Willmott et al. (2009)
are valid, the proposed avoidance of RMSE in favor of MAE
is not the solution. Citing the aforementioned papers, many
researchers chose MAE over RMSE to present their model
evaluation statistics, even where presenting or adding the RMSE
measures could have been more beneficial. In this technical note, we
demonstrate that the RMSE is not ambiguous in its mean-
ing, contrary to what was claimed by Willmott et al. (2009).
The RMSE is more appropriate to represent model perfor-
mance than the MAE when the error distribution is expected
to be Gaussian. In addition, we show that the RMSE satis-
fies the triangle inequality requirement for a distance metric,
whereas Willmott et al. (2009) indicated that sums-of-squares-based
statistics do not satisfy this rule. Finally, we
discuss some circumstances where using the RMSE is
more beneficial. However, we do not contend that the RMSE
is superior to the MAE. Instead, a combination of metrics,
including but certainly not limited to RMSEs and MAEs, is
often required to assess model performance.
1 Introduction
The root mean square error (RMSE) has been used as a stan-
dard statistical metric to measure model performance in me-
teorology, air quality, and climate research studies. The mean
absolute error (MAE) is another useful measure widely used
in model evaluations. While they have both been used to
assess model performance for many years, there is no con-
sensus on the most appropriate metric for model errors. In
the field of geosciences, many present the RMSE as a stan-
dard metric for model errors (e.g., McKeen et al., 2005;
Savage et al., 2013; Chai et al., 2013), while a few others
choose to avoid the RMSE and present only the MAE, cit-
ing the ambiguity of the RMSE claimed by Willmott and
Matsuura (2005) and Willmott et al. (2009) (e.g., Taylor
et al., 2013; Chatterjee et al., 2013; Jerez et al., 2013). While
the MAE gives the same weight to all errors, the RMSE pe-
nalizes variance as it gives errors with larger absolute values
more weight than errors with smaller absolute values. When
both metrics are calculated, the RMSE is by definition never
smaller than the MAE. For instance, Chai et al. (2009) presented
both the mean absolute errors (MAEs) and the rms errors (RMSEs)
of model NO2 column predictions compared to SCIAMACHY
satellite observations. The ratio of RMSE to MAE
ranged from 1.63 to 2.29 (see Table 1 of Chai et al., 2009).
Using hypothetical sets of four errors, Willmott and
Matsuura (2005) demonstrated that while keeping the MAE
as a constant of 2.0, the RMSE varies from 2.0 to 4.0.
They concluded that the RMSE varies with the variability
of the error magnitudes, the total-error or average-error
magnitude (MAE), and the sample size n. They further
Published by Copernicus Publications on behalf of the European Geosciences Union.
1248 T. Chai and R. R. Draxler: RMSE or MAE
demonstrated an inconsistency between MAEs and RMSEs
using 10 combinations of 5 pairs of global precipitation data.
They summarized that the RMSE tends to become increasingly
larger than the MAE (but not necessarily in a monotonic
fashion) as the distribution of error magnitudes becomes
more variable. The RMSE tends to grow larger than
the MAE with $n^{1/2}$ since its lower limit is fixed at the MAE
and its upper limit ($n^{1/2} \cdot \mathrm{MAE}$) increases with $n^{1/2}$. Further,
Willmott et al. (2009) concluded that the sums-of-squares-based
error statistics such as the RMSE and the standard error
have inherent ambiguities and recommended the use of
alternatives such as the MAE.
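The bounds quoted above, MAE ≤ RMSE ≤ $n^{1/2} \cdot \mathrm{MAE}$, can be checked with a short Python sketch. The four-element error sets below are illustrative choices that all have MAE = 2.0. They are assumptions made for this example, not the sets used by Willmott and Matsuura (2005).

```python
import math

def mae(errors):
    """Mean absolute error of a sequence of errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error of a sequence of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical error sets, each with MAE = 2.0 (illustrative only).
error_sets = [
    [2.0, 2.0, 2.0, 2.0],   # no variability: RMSE equals MAE
    [1.0, 1.0, 1.0, 5.0],
    [0.0, 0.0, 4.0, 4.0],
    [0.0, 0.0, 0.0, 8.0],   # all error in one sample: RMSE = sqrt(n) * MAE
]

for errors in error_sets:
    n = len(errors)
    m, r = mae(errors), rmse(errors)
    # RMSE is bounded below by MAE and above by sqrt(n) * MAE.
    assert m <= r <= math.sqrt(n) * m + 1e-12
    print(f"MAE = {m:.2f}, RMSE = {r:.2f}")
```

With n = 4 the upper bound is 2 · MAE = 4.0, which matches the range of 2.0 to 4.0 reported for the hypothetical sets of four errors.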
As every statistical measure condenses a large number of
data into a single value, it only provides one projection of the
model errors emphasizing a certain aspect of the error char-
acteristics of the model performance. Willmott and Matsuura
(2005) have simply proved that the RMSE is not equivalent
to the MAE, and one cannot easily derive the MAE value
from the RMSE (and vice versa). Similarly, one can readily
show that, for several sets of errors with the same RMSE, the
MAE would vary from set to set.
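The converse statement can be illustrated in the same way. In the sketch below, the three error sets are hypothetical and were chosen only so that each has RMSE = 2.0; the MAE then differs from set to set.

```python
import math

# Hypothetical four-element error sets, all with RMSE = 2.0 (illustrative only).
error_sets = [
    [2.0, 2.0, 2.0, 2.0],
    [0.0, 0.0, math.sqrt(8.0), math.sqrt(8.0)],
    [0.0, 0.0, 0.0, 4.0],
]

def summarize(errors):
    """Return (MAE, RMSE) for a sequence of errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse

results = [summarize(errors) for errors in error_sets]
for mae, rmse in results:
    print(f"RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```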
Since statistics are just a collection of tools, researchers
must select the most appropriate tool for the question being
addressed. Because the RMSE and the MAE are defined dif-
ferently, we should expect the results to be different. Some-
times multiple metrics are required to provide a complete
picture of error distribution. When the error distribution is
expected to be Gaussian and there are enough samples, the
RMSE has an advantage over the MAE to illustrate the error
distribution.
The objective of this note is to clarify the interpretation of
the RMSE and the MAE. In addition, we demonstrate that
the RMSE satisfies the triangle inequality requirement for a
distance metric, whereas Willmott and Matsuura (2005) and
Willmott et al. (2009) have claimed otherwise.
2 Interpretation of RMSE and MAE
To simplify, we assume that we already have n samples of
model errors calculated as ($e_i$, $i = 1, 2, \ldots, n$). The
uncertainties brought in by observation errors or the method
used to compare model and observations are not considered
here. We also assume the error sample set is unbiased. The
RMSE and the MAE are calculated for the data set as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i| \qquad (1)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}. \qquad (2)$$
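Equations (1) and (2) translate directly into code. The following is a minimal sketch. The sign convention (error = predicted minus observed), the function names, and the sample values are assumptions made for illustration.

```python
import math

def model_errors(predicted, observed):
    """Errors e_i = predicted_i - observed_i (sign convention assumed here)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return [p - o for p, o in zip(predicted, observed)]

def mae(errors):
    """Eq. (1): mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Eq. (2): square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

predicted = [2.5, 0.0, 2.0, 8.0]   # hypothetical model values
observed = [3.0, -0.5, 2.0, 7.0]   # hypothetical observations
e = model_errors(predicted, observed)
print(f"MAE = {mae(e):.3f}, RMSE = {rmse(e):.3f}")
```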
The underlying assumption when presenting the RMSE is
that the errors are unbiased and follow a normal distribution.
Table 1. RMSEs and MAEs of randomly generated pseudo-errors
with a zero mean and unit variance Gaussian distribution. Five sets
of errors of size n are generated with different random seeds.

n            RMSEs                           MAEs
4            0.92, 0.65, 1.48, 1.02, 0.79    0.70, 0.57, 1.33, 1.16, 0.76
10           0.81, 1.10, 0.83, 0.95, 1.01    0.65, 0.89, 0.72, 0.84, 0.78
100          1.05, 1.03, 1.03, 1.00, 1.04    0.82, 0.81, 0.79, 0.78, 0.78
1000         1.04, 0.98, 1.01, 1.00, 1.00    0.82, 0.78, 0.80, 0.80, 0.81
10 000       1.00, 0.98, 1.01, 1.00, 1.00    0.79, 0.79, 0.79, 0.81, 0.80
100 000      1.00, 1.00, 1.00, 1.00, 1.00    0.80, 0.80, 0.80, 0.80, 0.80
1 000 000    1.00, 1.00, 1.00, 1.00, 1.00    0.80, 0.80, 0.80, 0.80, 0.80

Thus, using the RMSE or the standard error (SE)$^1$ helps to
provide a complete picture of the error distribution.
Table 1 shows RMSEs and MAEs for randomly generated
pseudo-errors with zero mean and unit variance Gaussian
distribution. When the sample size reaches 100 or above,
using the calculated RMSEs one can reconstruct the error
distribution close to its “truth” or “exact solution”, with its
standard deviation within 5 % of its true value (i.e., SE = 1). When
there are more samples, reconstructing the error distribution
using RMSEs will be even more reliable. The MAE here is
the mean of the half-normal distribution (i.e., the average of
the positive subset of a population of normally distributed
errors with zero mean). Table 1 shows that the MAEs converge
to 0.8, close to the expected value of $\sqrt{2/\pi} \approx 0.798$. It
should be noted that all statistics are less useful when there
are only a limited number of error samples. For instance, Ta-
ble 1 shows that neither the RMSEs nor the MAEs are robust
when only 4 or 10 samples are used to calculate those values.
In those cases, presenting the values of the errors themselves
(e.g., in tables) is probably more appropriate than calculat-
ing any of the statistics. Fortunately, there are often hundreds
of observations available to calculate model statistics, unlike
the examples with n=4 (Willmott and Matsuura, 2005) and
n=10 (Willmott et al., 2009).
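An experiment of the kind summarized in Table 1 can be sketched with the Python standard library. The random number generator, the seeds (0 to 4), and the largest sample size used here are assumptions, so the printed values will not reproduce Table 1 digit for digit; only the convergence behavior should be comparable.

```python
import math
import random

def rmse_and_mae(n, seed):
    """Draw n pseudo-errors from N(0, 1) and return (RMSE, MAE)."""
    rng = random.Random(seed)
    sum_sq = 0.0
    sum_abs = 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        sum_sq += e * e
        sum_abs += abs(e)
    return math.sqrt(sum_sq / n), sum_abs / n

# Expected MAE of a zero-mean, unit-variance Gaussian: sqrt(2/pi), about 0.798.
expected_mae = math.sqrt(2.0 / math.pi)

for n in (4, 10, 100, 1000, 10_000, 100_000):
    # Seeds 0..4 are arbitrary choices for this sketch.
    pairs = [rmse_and_mae(n, seed) for seed in range(5)]
    rmses = ", ".join(f"{r:.2f}" for r, _ in pairs)
    maes = ", ".join(f"{m:.2f}" for _, m in pairs)
    print(f"{n:>7d}  RMSEs: {rmses}  MAEs: {maes}")
```

For small n the two statistics scatter widely from seed to seed, while for large n the RMSE approaches 1 and the MAE approaches $\sqrt{2/\pi}$.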
Condensing a set of error values into a single number, ei-
ther the RMSE or the MAE, removes a lot of information.
The best statistical metrics should provide not only a performance
measure but also a representation of the error distribution.
The MAE is suitable to describe uniformly distributed
errors. Because model errors are likely to have a normal dis-
tribution rather than a uniform distribution, the RMSE is a
better metric to present than the MAE for such a type of data.
$^1$ For unbiased error distributions, the standard error (SE) is
equivalent to the RMSE as the sample mean is assumed to be
zero. For an unknown error distribution, the SE of mean is the
square root of the “bias-corrected sample variance”. That is,
$\mathrm{SE} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (e_i - \bar{e})^2}$, where $\bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i$.
3 Triangle inequality of a metric
Both Willmott and Matsuura (2005) and Willmott et al.
(2009) emphasized that sums-of-squares-based statistics do
not satisfy the triangle inequality. An example is given in
a footnote of Willmott et al. (2009). In the example, it is
given that $d(a, c) = 4$, $d(a, b) = 2$, and $d(b, c) = 3$, where
$d(x, y)$ is a distance function. The authors stated that $d(x, y)$
as a “metric” should satisfy the “triangle inequality” (i.e.,
$d(a, c) \le d(a, b) + d(b, c)$). However, they did not specify
what a, b, and c represent here before arguing that the sum
of squared errors does not satisfy the “triangle inequality”
because $4 \le 2 + 3$, whereas $4^2 > 2^2 + 3^2$. In fact, this example
represents the mean square error (MSE), which cannot be
used as a distance metric, rather than the RMSE.
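Following a certain order, the errors $e_i$, $i = 1, \ldots, n$ can
be written into an n-dimensional vector $\boldsymbol{\epsilon}$. The L1-norm and
L2-norm are closely related to the MAE and the RMSE,
respectively, as shown in Eqs. (3) and (4):

$$|\boldsymbol{\epsilon}|_1 = \left( \sum_{i=1}^{n} |e_i| \right) = n \cdot \mathrm{MAE} \qquad (3)$$

$$|\boldsymbol{\epsilon}|_2 = \sqrt{\left( \sum_{i=1}^{n} e_i^2 \right)} = \sqrt{n} \cdot \mathrm{RMSE}. \qquad (4)$$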
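The two relations can be verified numerically. The error vector below is a hypothetical example; the factor linking the L2-norm to the RMSE is $\sqrt{n}$, which follows from comparing Eq. (2) with the definition of the L2-norm.

```python
import math

errors = [0.5, -1.5, 2.0, -1.0]   # hypothetical error vector
n = len(errors)

# Vector norms of the error vector.
l1_norm = sum(abs(e) for e in errors)
l2_norm = math.sqrt(sum(e * e for e in errors))

# MAE and RMSE computed from their definitions, Eqs. (1) and (2).
mae = sum(abs(e) for e in errors) / n
rmse = math.sqrt(sum(e * e for e in errors) / n)

# Eq. (3): the L1-norm equals n times the MAE.
assert math.isclose(l1_norm, n * mae)
# Eq. (4): the L2-norm equals sqrt(n) times the RMSE.
assert math.isclose(l2_norm, math.sqrt(n) * rmse)
print(f"L1 = {l1_norm:.4f}, L2 = {l2_norm:.4f}")
```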
All vector norms satisfy $|X + Y| \le |X| + |Y|$ and $|-X| = |X|$
(see, e.g., Horn and Johnson, 1990). It is trivial to
prove that the distance between two vectors measured by
Lp-norm would satisfy $|X - Y|_p \le |X|_p + |Y|_p$. With three
n-dimensional vectors, X, Y, and Z, we have

$$|X - Y|_p = |(X - Z) - (Y - Z)|_p \le |X - Z|_p + |Y - Z|_p. \qquad (5)$$
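For n-dimensional vectors and the L2-norm, Eq. (5) can
be written as

$$\sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \le \sqrt{\sum_{i=1}^{n} (x_i - z_i)^2} + \sqrt{\sum_{i=1}^{n} (y_i - z_i)^2}, \qquad (6)$$

which is equivalent to

$$\sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2} \le \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - z_i)^2} + \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - z_i)^2}. \qquad (7)$$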
This proves that RMSE satisfies the triangle inequality re-
quired for a distance function metric.
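As a numerical sanity check of this result (not a substitute for the proof), the sketch below tests the RMSE form of the triangle inequality on random vectors and contrasts it with a one-dimensional case in which the MSE violates the inequality. The vector length, the number of trials, the seed, and the counterexample values are all arbitrary assumptions.

```python
import math
import random

def rmse_distance(x, y):
    """RMSE between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def mse_distance(x, y):
    """MSE between two equal-length vectors (not a distance metric)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

rng = random.Random(0)   # arbitrary seed
n = 50                   # arbitrary vector length
violations = 0
for _ in range(1000):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Small tolerance guards against floating-point round-off.
    if rmse_distance(x, y) > rmse_distance(x, z) + rmse_distance(y, z) + 1e-12:
        violations += 1
print("RMSE triangle-inequality violations:", violations)

# One-dimensional counterexample for the MSE: 4 > 1 + 1.
x, y, z = [0.0], [2.0], [1.0]
print("MSE(x, y) =", mse_distance(x, y),
      "MSE(x, z) + MSE(y, z) =", mse_distance(x, z) + mse_distance(y, z))
```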
4 Summary and discussion
We have shown that the RMSE is not ambiguous in its meaning,
and it is more appropriate to use than the MAE when model
errors follow a normal distribution. In addition, we demon-
strate that the RMSE satisfies the triangle inequality required
for a distance function metric.
The sensitivity of the RMSE to outliers is the most common
concern with the use of this metric. In fact, the existence
of outliers and their probability of occurrence are well
described by the normal distribution underlying the use of the
RMSE. Table 1 shows that with enough samples ($n \ge 100$),
including those outliers, one can closely reconstruct the error
distribution. In practice, it might be justifiable to throw
out the outliers that are several orders of magnitude larger than the other
samples when calculating the RMSE, especially if the num-
ber of samples is limited. If the model biases are severe, one
may also need to remove the systematic errors before calcu-
lating the RMSEs.
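One common way to remove a systematic error is to subtract the mean error (bias) before computing the RMSE. The sketch below assumes that approach and uses hypothetical error values; it also checks the identity RMSE$^2$ = bias$^2$ + (bias-removed RMSE)$^2$, which holds for any error sample.

```python
import math

errors = [1.2, 0.7, 1.9, 0.4, 1.3]   # hypothetical errors with a positive bias
n = len(errors)

# Systematic part of the error: the sample mean.
bias = sum(errors) / n
centered = [e - bias for e in errors]

rmse = math.sqrt(sum(e * e for e in errors) / n)
centered_rmse = math.sqrt(sum(c * c for c in centered) / n)

# The squared RMSE splits into a bias term and a bias-removed term.
assert math.isclose(rmse ** 2, bias ** 2 + centered_rmse ** 2)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}, "
      f"bias-removed RMSE = {centered_rmse:.3f}")
```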
One distinct advantage of RMSEs over MAEs is that RM-
SEs avoid the use of absolute value, which is highly unde-
sirable in many mathematical calculations. For instance, it
might be difficult to calculate the gradient or sensitivity of
the MAEs with respect to certain model parameters. Further-
more, in the data assimilation field, the sum of squared er-
rors is often defined as the cost function to be minimized by
adjusting model parameters. In such applications, penalizing
large errors through the defined least-square terms proves to
be very effective in improving model performance. Under the
circumstances of calculating model error sensitivities or data
assimilation applications, MAEs are definitely not preferred
over RMSEs.
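The difference in differentiability can be made concrete with a one-parameter example. The linear model y = a·x, the data values, and the function names below are hypothetical. The sum of squared errors has a smooth gradient and a closed-form minimizer, whereas the sum of absolute errors only has a piecewise-constant subgradient that jumps wherever a residual changes sign.

```python
def sse_gradient(a, x, y):
    """d/da of sum_i (a*x_i - y_i)^2; smooth and defined everywhere."""
    return sum(2.0 * xi * (a * xi - yi) for xi, yi in zip(x, y))

def sae_subgradient(a, x, y):
    """A subgradient of sum_i |a*x_i - y_i|.

    The derivative is undefined where a residual is exactly zero;
    sign(0) is taken as 0 here, which is one valid subgradient choice.
    """
    def sign(v):
        return (v > 0) - (v < 0)
    return sum(xi * sign(a * xi - yi) for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]   # hypothetical inputs
y = [1.1, 1.9, 3.2]   # hypothetical observations

# Setting the SSE gradient to zero gives a closed-form least-squares solution.
a_ls = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(f"least-squares a = {a_ls:.4f}, "
      f"SSE gradient there = {sse_gradient(a_ls, x, y):.2e}")

# The absolute-error subgradient stays constant between sign changes.
for a in (1.0, 1.05, 1.2):
    print(f"a = {a:.2f}, SAE subgradient = {sae_subgradient(a, x, y)}")
```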
An important aspect of the error metrics used for model
evaluations is their capability to discriminate among model
results. The more discriminating measure that produces
higher variations in its model performance metric among dif-
ferent sets of model results is often the more desirable. In this
regard, the MAE might be affected by a large amount of aver-
age error values without adequately reflecting some large er-
rors. Giving higher weighting to the unfavorable conditions,
the RMSE usually is better at revealing model performance
differences.
In many of the model sensitivity studies that use only
RMSE, a detailed interpretation is not critical because varia-
tions of the same model will have similar error distributions.
When evaluating different models using a single metric, dif-
ferences in the error distributions become more important.
As we stated in the note, the underlying assumption when
presenting the RMSE is that the errors are unbiased and fol-
low a normal distribution. For other kinds of distributions,
more statistical moments of model errors, such as mean, vari-
ance, skewness, and flatness, are needed to provide a com-
plete picture of the model error variation. Some approaches
that emphasize resistance to outliers or insensitivity to non-
normal distributions have been explored by other researchers
(Tukey, 1977; Huber and Ronchetti, 2009).
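The four moments named above can be computed as in the following sketch. Two assumptions are made: population (1/n) moments are used, and "flatness" is interpreted as the fourth standardized moment (kurtosis), which equals 3 for a Gaussian distribution. The sample values are hypothetical.

```python
import math

def error_moments(errors):
    """Return (mean, variance, skewness, flatness) of an error sample.

    Uses population (1/n) central moments; flatness is the fourth
    standardized moment (an interpretation assumed for this sketch).
    """
    n = len(errors)
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    skewness = m3 / m2 ** 1.5
    flatness = m4 / m2 ** 2
    return mean, m2, skewness, flatness

errors = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical symmetric sample
mean, variance, skewness, flatness = error_moments(errors)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, "
      f"skewness = {skewness:.2f}, flatness = {flatness:.2f}")
```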
As stated earlier, any single metric provides only one pro-
jection of the model errors and, therefore, only emphasizes
a certain aspect of the error characteristics. A combination
of metrics, including but certainly not limited to RMSEs and
MAEs, is often required to assess model performance.
Acknowledgements. This study was supported by NOAA grant
NA09NES4400006 (Cooperative Institute for Climate and
Satellites – CICS) at the NOAA Air Resources Laboratory in
collaboration with the University of Maryland.
Edited by: R. Sander
References
Chai, T., Carmichael, G. R., Tang, Y., Sandu, A., Heckel, A.,
Richter, A., and Burrows, J. P.: Regional NOx emission inversion
through a four-dimensional variational approach using SCIAMACHY
tropospheric NO2 column observations, Atmos. Environ.,
43, 5046–5055, 2009.
Chai, T., Kim, H.-C., Lee, P., Tong, D., Pan, L., Tang, Y., Huang,
J., McQueen, J., Tsidulko, M., and Stajner, I.: Evaluation of the
United States National Air Quality Forecast Capability experimental
real-time predictions in 2010 using Air Quality System
ozone and NO2 measurements, Geosci. Model Dev., 6, 1831–1850,
doi:10.5194/gmd-6-1831-2013, 2013.
Chatterjee, A., Engelen, R. J., Kawa, S. R., Sweeney, C., and Michalak,
A. M.: Background error covariance estimation for atmospheric
CO2 data assimilation, J. Geophys. Res., 118, 10140–10154,
2013.
Horn, R. A. and Johnson, C. R.: Matrix Analysis, Cambridge Uni-
versity Press, 1990.
Huber, P. and Ronchetti, E.: Robust statistics, Wiley New York,
2009.
Jerez, S., Pedro Montavez, J., Jimenez-Guerrero, P., Jose Gomez-
Navarro, J., Lorente-Plazas, R., and Zorita, E.: A multi-physics
ensemble of present-day climate regional simulations over the
Iberian Peninsula, Clim. Dynam., 40, 3023–3046, 2013.
McKeen, S. A., Wilczak, J., Grell, G., Djalalova, I., Peck-
ham, S., Hsie, E., Gong, W., Bouchet, V., Menard, S., Mof-
fet, R., McHenry, J., McQueen, J., Tang, Y., Carmichael, G. R.,
Pagowski, M., Chan, A., Dye, T., Frost, G., Lee, P., and
Mathur, R.: Assessment of an ensemble of seven real-
time ozone forecasts over eastern North America during
the summer of 2004, J. Geophys. Res., 110, D21307,
doi:10.1029/2005JD005858, 2005.
Savage, N. H., Agnew, P., Davis, L. S., Ordóñez, C., Thorpe, R.,
Johnson, C. E., O’Connor, F. M., and Dalvi, M.: Air quality mod-
elling using the Met Office Unified Model (AQUM OS24-26):
model description and initial evaluation, Geosci. Model Dev., 6,
353–372, doi:10.5194/gmd-6-353-2013, 2013.
Taylor, M. H., Losch, M., Wenzel, M., and Schroeter, J.: On the
sensitivity of field reconstruction and prediction using empirical
orthogonal functions derived from gappy data, J. Climate, 26,
9194–9205, 2013.
Tukey, J. W.: Exploratory Data Analysis, Addison-Wesley, 1977.
Willmott, C. and Matsuura, K.: Advantages of the Mean Abso-
lute Error (MAE) over the Root Mean Square Error (RMSE)
in assessing average model performance, Clim. Res., 30, 79–82,
2005.
Willmott, C. J., Matsuura, K., and Robeson, S. M.: Ambiguities
inherent in sums-of-squares-based error statistics, Atmos. Env-
iron., 43, 749–752, 2009.
... Therefore, the length ratios of the shear walls at both ends of a PSW graph edge are the output features of the GNN models, which have two dimensions. The L1 loss function, i.e., mean absolute error [62], as expressed in Eq. (3) is adopted. ...
Article
Structural scheme design of shear wall structures is important because it is the first stage that guides the project along its entire structural design process and significantly impacts the subsequent design stages. Design methods for shear wall layouts based on deep generative algorithms have been proposed and achieved some success. However, current generative algorithms rely on pixel images to design shear wall layouts, which have many model parameters and require intensive calculations. Moreover, it is challenging to use pixel image-based methods to reflect the topological characteristics of structures and connect them with the subsequent design stages. The above defects can be effectively solved by representing a shear wall structure in graph data form and adopting graph neural networks (GNNs), which have a robust topological-characteristic-extraction capability. However, there is no existing research using GNN methods in the design of shear wall structures owing to the lack of graph representation methods and high-quality structural graph data for shear walls. Therefore, this study develops an intelligent design method for shear wall layouts based on GNNs. Two graph representation methods for a shear wall structure—graph edge representation and graph node representation—are examined. A data augmentation method for shear wall structures in graph data form is established to enhance the universality of the GNN performance. An evaluation method for both graph representation methods is developed. Case studies show that the shear wall layout designed using the established GNN method is highly similar to the design by experienced engineers.
Article
Full-text available
Permasalahan dalam menentukan keputusan strategi pemasaran dan manajemen persediaan sering terjadi pada setiap perusahaan. Pada penyedia layanan kecantikan permasalahan ini mengakibatkan penimbunan persediaan bahan treatment. Hal ini disebabkan oleh tidak rasionalnya keputusan decision maker dalam manajemen persediaan, terlalu ambisi dalam memenuhi permintaan sebanyak-banyaknya, sedangkan gairah pelanggan sedang mengalami lesu kemudian bertolak belakang dengan tawaran penyedia layanan kecantikan. Efek penimbunan bahan treatment dapat mengancam kualitas bahan treatment kemudian berimbas pada kerugian perusahaan. Adapun hal yang dapat mencegah penimbunan terjadi adalah pengetahuan peramalan permintaan. Pengetahuan peramalan permintaan dapat memberikan informasi tentang estimasi permintaan pelanggan pada masa depan. Kemudian decision maker dapat membuat keputusan terbaik dalam manajemen persediaan secara efisien, sehingga aplikasi pendukung keputusan sangat dibutuhkan. Data permintaan merupakan time-series dengan pola horizontal, sehingga metode peramalan yang digunakan adalah Simple Moving Average dan Single Exponential Smoothing. Aplikasi pendukung keputusan ini berbasis web aplikasi dengan bahasa pemrograman PHP orientasi objek dan framework tambahan Laravel. Hasil penelitian yang telah dilakukan didapati bahwa metode Simple Moving Average dan Single Exponential Smoothing dengan nilai interval=2 dan nilai alpha=0.2 dapat menghasilkan nilai akurasi peramalan RMSE paling baik. Dan juga didapati bahwa metode Simple Moving Average lebih unggul pada peramalan permintaan BB Glow Platinum, Eyelash Single, dan Facial Acne.
Article
This study aimed to validate a 7-sensor inertial measurement unit system against optical motion capture to estimate bilateral lower-limb kinematics. Hip, knee, and ankle sagittal plane peak angles and range of motion (ROM) were compared during bodyweight squats and countermovement jumps in 18 participants. In the bodyweight squats, left peak hip flexion (intraclass correlation coefficient [ICC] = .51), knee extension (ICC = .68) and ankle plantar flexion (ICC = .55), and hip (ICC = .63) and knee (ICC = .52) ROM had moderate agreement, and right knee ROM had good agreement (ICC = .77). Relatively higher agreement was observed in the countermovement jumps compared to the bodyweight squats, moderate to good agreement in right peak knee flexion (ICC = .73), and right (ICC = .75) and left (ICC = .83) knee ROM. Moderate agreement was observed for right ankle plantar flexion (ICC = .63) and ROM (ICC = .51). Moderate agreement (ICC > .50) was observed in all variables in the left limb except hip extension, knee flexion, and dorsiflexion. In general, there was poor agreement for peak flexion angles, and at least moderate agreement for joint ROM. Future work will aim to optimize methodologies to increase usability and confidence in data interpretation by minimizing variance in system-based differences and may also benefit from expanding planes of movement.
Article
The on-line air quality model AQUM (Air Quality in the Unified Model) is a limited-area forecast configuration of the Met Office Unified Model which uses the UKCA (UK Chemistry and Aerosols) sub-model. AQUM has been developed with two aims: as an operational system to deliver regional air quality forecasts and as a modelling system to conduct air quality studies to inform policy decisions on emissions controls. This paper presents a description of the model and the methods used to evaluate the performance of the forecast system against the automated UK surface network of air quality monitors. Results are presented of evaluation studies conducted for a year-long period of operational forecast trials and several past cases of poor air quality episodes. The results demonstrate that AQUM tends to over-predict ozone (~8 μg m−3 mean bias for the year-long forecast), but has a good level of responsiveness to elevated ozone episode conditions, a characteristic which is essential for forecasting poor air quality episodes. AQUM is shown to have a negative bias for PM10, while for PM2.5 the negative bias is much smaller in magnitude. An analysis of speciated PM2.5 data during an episode of elevated particulate matter (PM) suggests that the PM bias occurs mainly in the coarse component. The sensitivity of model predictions to lateral boundary conditions (LBCs) has been assessed by using LBCs from two different global reanalyses and by comparing the standard, single-nested configuration with a configuration having an intermediate European nest. We conclude that, even with a much larger regional domain, the LBCs remain an important source of model error for relatively long-lived pollutants such as ozone. To place the model performance in context we compare AQUM ozone forecasts with those of another forecasting system, the MACC (Monitoring Atmospheric Composition and Climate) ensemble, for a 5-month period. An analysis of the variation of model skill with forecast lead time is presented and the insights this provides to the relative sources of error in air quality modelling are discussed.
Article
This work assesses the influence of the model physics in present-day regional climate simulations. It is based on a multi-physics ensemble of 30-year-long MM5 hindcast simulations performed over a complex and climatically heterogeneous domain, the Iberian Peninsula. The ensemble consists of eight members that result from combining different parametrization schemes for modeling the Planetary Boundary Layer, the cumulus and the microphysics processes. The analysis is made at the seasonal time scale and focuses on mean values and interannual variability of temperature and precipitation. The objectives are (1) to evaluate and characterize differences among the simulations attributable to changes in the physical options of the regional model, and (2) to identify the most suitable parametrization schemes and understand the underlying mechanisms that cause some schemes to perform better than others. The results confirm the paramount importance of the model physics, showing that the spread among the various simulations is of comparable magnitude to the spread obtained in similar multi-model ensembles. This suggests that most of the spread obtained in multi-model ensembles could be attributable to the different physical configurations employed in the various models. Second, we find that no single ensemble member outperforms the others in every situation. Nevertheless, some particular schemes display a better performance. On the one hand, the non-local MRF PBL scheme reduces the cold bias of the simulations throughout the year compared to the local Eta model. The reason is that the former simulates deeper mixing layers. On the other hand, the Grell parametrization scheme for cumulus produces a smaller amount of precipitation in the summer season compared to the more complex Kain-Fritsch scheme by reducing the overestimation in the simulated frequency of the convective precipitation events. Consequently, the interannual variability of precipitation (temperature) diminishes (increases), which implies a better agreement with the observations in both cases. Although these features improve in general the accuracy of the simulations, controversial nuances are also highlighted.
Article
The National Air Quality Forecast Capability (NAQFC) project provides the US with operational and experimental real-time ozone predictions using two different versions of the three-dimensional Community Multi-scale Air Quality (CMAQ) modeling system. Routine evaluation using near-real-time AIRNow ozone measurements through 2011 showed better performance of the operational ozone predictions. In this work, quality-controlled and quality-assured Air Quality System (AQS) ozone and nitrogen dioxide (NO2) observations are used to evaluate the experimental predictions in 2010. It is found that both ozone and NO2 are overestimated over the contiguous US (CONUS), with annual biases of +5.6 and +5.1 ppbv, respectively. The annual root mean square errors (RMSEs) are 15.4 ppbv for ozone and 13.4 ppbv for NO2. For both species the overpredictions are most pronounced in the summer. The locations of the AQS monitoring sites are also utilized to stratify comparisons by the degree of urbanization. Comparisons for six predefined US regions show the highest annual biases for ozone predictions in the Southeast (+10.5 ppbv) and for NO2 in the Lower Middle (+8.1 ppbv) and Pacific Coast (+7.1 ppbv) regions. The spatial distributions of the NO2 biases in August show distinctively high values in the Los Angeles, Houston, and New Orleans areas. In addition to the standard statistics metrics, daily maximum eight-hour ozone categorical statistics are calculated using the current US ambient air quality standard (75 ppbv) and another lower threshold (70 ppbv). Using the 75 ppbv standard, the hit rate and proportion of correct over CONUS for the entire year are 0.64 and 0.96, respectively. Summertime biases show distinctive weekly patterns for ozone and NO2. Diurnal comparisons show that ozone overestimation is most severe in the morning, from 07:00 to 10:00 local time. For NO2, the morning predictions agree with the AQS observations reasonably well, but nighttime concentrations are overpredicted by around 100%.
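The categorical statistics mentioned in this abstract can be illustrated with a short sketch. It assumes the usual forecast-verification definitions (hit rate = hits / (hits + misses); proportion correct = (hits + correct negatives) / total) and treats an exceedance as a value strictly above the threshold. The cited paper may define these details differently, and the sample values are invented.

```python
def categorical_scores(forecasts, observations, threshold):
    """Return (hit_rate, proportion_correct) for threshold exceedance.

    Assumption: an exceedance is a value strictly greater than the threshold.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecasts, observations):
        forecast_exceeds = f > threshold
        observed_exceeds = o > threshold
        if forecast_exceeds and observed_exceeds:
            hits += 1
        elif observed_exceeds:
            misses += 1
        elif forecast_exceeds:
            false_alarms += 1
        else:
            correct_negatives += 1
    total = hits + misses + false_alarms + correct_negatives
    observed_events = hits + misses
    # The hit rate is undefined when no exceedance was observed.
    hit_rate = hits / observed_events if observed_events else float("nan")
    proportion_correct = (hits + correct_negatives) / total
    return hit_rate, proportion_correct

# Hypothetical daily maximum 8-hour ozone values in ppbv.
observed = [80, 60, 78, 50, 76, 40]
forecast = [82, 77, 70, 55, 79, 30]

hit_rate, proportion_correct = categorical_scores(forecast, observed, threshold=75)
print("hit rate:", hit_rate)
print("proportion correct:", proportion_correct)
```

Because exceedances are rare in most air quality records, the proportion correct is dominated by correct negatives and can be high even when the hit rate is modest, which is consistent with the 0.96 and 0.64 values reported above.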
Article
In any data assimilation framework, the background error covariance statistics play the critical role of filtering the observed information and determining the quality of the analysis. For atmospheric CO2 data assimilation, however, the background errors cannot be prescribed via traditional forecast or ensemble-based techniques as these fail to account for the uncertainties in the carbon emissions and uptake, or for the errors associated with the CO2 transport model. We propose an approach where the differences between two modeled CO2 concentration fields, based on different but plausible CO2 flux distributions and atmospheric transport models, are used as a proxy for the statistics of the background errors. The resulting error statistics: (1) vary regionally and seasonally to better capture the uncertainty in the background CO2 field, and (2) have a positive impact on the analysis estimates by allowing observations to adjust predictions over large areas. A state-of-the-art four-dimensional variational (4D-VAR) system developed at the European Centre for Medium-Range Weather Forecasts (ECMWF) is used to illustrate the impact of the proposed approach for characterizing background error statistics on atmospheric CO2 concentration estimates. Observations from the Greenhouse gases Observing SATellite “IBUKI” (GOSAT) are assimilated into the ECMWF 4D-VAR system along with meteorological variables, using both the new error statistics and those based on a traditional forecast-based technique. Evaluation of the four-dimensional CO2 fields against independent CO2 observations confirms that the performance of the data assimilation system improves substantially in the summer, when significant variability and uncertainty in the fluxes are present.
Article
The on-line air quality model AQUM (Air Quality in the Unified Model) is a limited-area forecast configuration of the Met Office Unified Model which uses the UKCA (UK Chemistry and Aerosols) sub-model. AQUM has been developed with two aims: as an operational system to deliver regional air quality forecasts and as a modelling system to enable air quality studies to be conducted to inform policy decisions relating to emissions controls. This paper presents a description of the model and the methods used to evaluate the performance of the forecast system. Results are presented of evaluation studies conducted for a year-long period of operational forecast trials and several past cases of poor air quality episodes. To place the model performance in context we compare AQUM ozone forecasts with those of another forecasting system, the MACC ensemble, for a 5-month period. The results demonstrate that AQUM has a large dynamic range of modelled ozone levels and has a good level of responsiveness to elevated ozone episode conditions - a characteristic which is essential for forecasting poor air quality episodes. An analysis of the variation of model skill with forecast lead-time is presented and the insights this provides to the relative sources of error in air quality modelling are discussed.
Article
Empirical Orthogonal Function (EOF) Analysis is commonly used in the climate sciences and elsewhere to describe, reconstruct, and predict highly dimensional data fields. When data contain a high percentage of missing values (i.e. “gappy”), alternate approaches must be used in order to correctly derive EOFs. The aims of this paper are to assess the accuracy of several EOF approaches in the reconstruction and prediction of gappy data fields, using the Galapagos Archipelago as a case study example. EOF approaches included least-squares estimation via a covariance matrix decomposition (LSEOF), “Data Interpolating Empirical Orthogonal Functions” (DINEOF), and a novel approach called “Recursively-Subtracted Empirical Orthogonal Functions” (RSEOF). Model-derived data of historical surface Chlorophyll a concentrations and sea surface temperature, combined with a mask of gaps from historical remote sensing estimates, allowed for the creation of “true” and “observed” fields by which to gauge the performance of EOF approaches. Only DINEOF and RSEOF were found to be appropriate for gappy data reconstruction and prediction. DINEOF proved to be the superior approach in terms of accuracy, especially for noisy data with a high estimation error, although RSEOF may be preferred for larger data fields due to its relatively faster computation time.
Article
The real-time forecasts of ozone (O3) from seven air quality forecast models (AQFMs) are statistically evaluated against observations collected during July and August of 2004 (53 days) through the Aerometric Information Retrieval Now (AIRNow) network at roughly 340 monitoring stations throughout the eastern United States and southern Canada. One of the first ever real-time ensemble O3 forecasts, created by combining the seven separate forecasts with equal weighting, is also evaluated in terms of standard statistical measures, threshold statistics, and variance analysis. The ensemble based on the mean of the seven models and the ensemble based on the median are found to have significantly more temporal correlation to the observed daily maximum 1-hour average and maximum 8-hour average O3 concentrations than any individual model. However, root-mean-square errors (RMSE) and skill scores show that the usefulness of the uncorrected ensembles is limited by positive O3 biases in all of the AQFMs. The ensembles and AQFM statistical measures are reevaluated using two simple bias correction algorithms for forecasts at each monitor location: subtraction of the mean bias and a multiplicative ratio adjustment, where corrections are based on the full 53 days of available comparisons. The impact the two bias correction techniques have on RMSE, threshold statistics, and temporal variance is presented. For the threshold statistics a preferred bias correction technique is found to be model dependent and related to whether the model overpredicts or underpredicts observed temporal O3 variance. All statistical measures of the ensemble mean forecast, and particularly the bias-corrected ensemble forecast, are found to be insensitive to the results of any particular model. The higher correlation coefficients, low RMSE, and better threshold statistics for the ensembles compared to any individual model point to their preference as a real-time O3 forecast.
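The two bias correction techniques named in this abstract can be sketched as below. The abstract does not give the exact formulas, so this assumes the additive correction subtracts the mean forecast-minus-observation difference and the multiplicative correction scales forecasts by the ratio of the mean observation to the mean forecast. Function names and sample values are hypothetical.

```python
def subtract_mean_bias(forecasts, observations):
    """Additive correction: remove the mean (forecast - observation) bias."""
    bias = sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)
    return [f - bias for f in forecasts]

def apply_ratio_adjustment(forecasts, observations):
    """Multiplicative correction: scale by mean(observed) / mean(forecast).

    Assumption: the ratio is formed from the two means over the full period.
    """
    mean_observed = sum(observations) / len(observations)
    mean_forecast = sum(forecasts) / len(forecasts)
    ratio = mean_observed / mean_forecast
    return [f * ratio for f in forecasts]

# Hypothetical O3 values (ppbv) at one monitor, with a positive forecast bias.
observed = [50.0, 60.0, 70.0]
forecast = [60.0, 70.0, 80.0]

additive = subtract_mean_bias(forecast, observed)
multiplicative = apply_ratio_adjustment(forecast, observed)
print("additive:", additive)
print("multiplicative:", multiplicative)
```

Both corrections match the observed mean, but they treat variability differently: subtraction leaves the forecast variance unchanged, while the ratio adjustment rescales it. This is one plausible reading of why the preferred technique depends on whether a model over- or underpredicts the observed temporal variance.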
Article
Commonly used sums-of-squares-based error or deviation statistics—like the standard deviation, the standard error, the coefficient of variation, and the root-mean-square error—often are misleading indicators of average error or variability. Sums-of-squares-based statistics are functions of at least two dissimilar patterns that occur within data. Both the mean of a set of error or deviation magnitudes (the average of their absolute values) and their variability influence the value of a sum-of-squares-based error measure, which confounds clear assessment of its meaning. Interpretation problems arise, according to Paul Mielke, because sums-of-squares-based statistics do not satisfy the triangle inequality. We illustrate the difficulties in interpreting and comparing these statistics using hypothetical data, and recommend the use of alternate statistics that are based on sums of error or deviation magnitudes.
Article
The relative abilities of 2 dimensioned statistics, the root-mean-square error (RMSE) and the mean absolute error (MAE), to describe average model-performance error are examined. The RMSE is of special interest because it is widely reported in the climatic and environmental literature; nevertheless, it is an inappropriate and misinterpreted measure of average error. RMSE is inappropriate because it is a function of 3 characteristics of a set of errors, rather than of one (the average error). RMSE varies with the variability within the distribution of error magnitudes and with the square root of the number of errors (n^(1/2)), as well as with the average-error magnitude (MAE). Our findings indicate that MAE is a more natural measure of average error, and (unlike RMSE) is unambiguous. Dimensioned evaluations and inter-comparisons of average model-performance error, therefore, should be based on MAE.
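The dependence of RMSE on the spread of error magnitudes, and the bounds MAE ≤ RMSE ≤ n^(1/2) · MAE, can be checked with a small sketch. The two error sets are hypothetical and chosen so that they share the same MAE.

```python
import math

def mae(errors):
    """Mean absolute error: the average of the error magnitudes."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root-mean-square error: square root of the mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Two hypothetical error sets with the same MAE but different spread.
uniform_errors = [2.0, -2.0, 2.0, -2.0]  # all magnitudes equal
uneven_errors = [0.0, 0.0, 0.0, 8.0]     # one large error, the rest zero

for name, errors in [("uniform", uniform_errors), ("uneven", uneven_errors)]:
    upper_bound = math.sqrt(len(errors)) * mae(errors)
    print(name, "MAE =", mae(errors), "RMSE =", rmse(errors),
          "sqrt(n) * MAE =", upper_bound)
```

When all error magnitudes are equal, RMSE equals MAE (the lower bound). When all of the error is concentrated in a single value, RMSE reaches n^(1/2) · MAE (the upper bound). Whether that sensitivity to large errors is a defect or a useful property is the point of contention between this abstract and the Chai and Draxler note.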