
Geosci. Model Dev., 7, 1247–1250, 2014

www.geosci-model-dev.net/7/1247/2014/

doi:10.5194/gmd-7-1247-2014

© Author(s) 2014. CC Attribution 3.0 License.

Root mean square error (RMSE) or mean absolute error (MAE)? –

Arguments against avoiding RMSE in the literature

T. Chai1,2 and R. R. Draxler1

1NOAA Air Resources Laboratory (ARL), NOAA Center for Weather and Climate Prediction,

5830 University Research Court, College Park, MD 20740, USA

2Cooperative Institute for Climate and Satellites, University of Maryland, College Park, MD 20740, USA

Correspondence to: T. Chai (tianfeng.chai@noaa.gov)

Received: 10 February 2014 – Published in Geosci. Model Dev. Discuss.: 28 February 2014

Revised: 27 May 2014 – Accepted: 2 June 2014 – Published: 30 June 2014

Abstract. Both the root mean square error (RMSE) and the

mean absolute error (MAE) are regularly employed in model

evaluation studies. Willmott and Matsuura (2005) have sug-

gested that the RMSE is not a good indicator of average

model performance and might be a misleading indicator of

average error, and thus the MAE would be a better metric for

that purpose. While some concerns over using RMSE raised

by Willmott and Matsuura (2005) and Willmott et al. (2009)

are valid, the proposed avoidance of RMSE in favor of MAE

is not the solution. Citing the aforementioned papers, many

researchers chose MAE over RMSE to present their model

evaluation statistics when presenting or adding the RMSE

measures could be more beneﬁcial. In this technical note, we

demonstrate that the RMSE is not ambiguous in its mean-

ing, contrary to what was claimed by Willmott et al. (2009).

The RMSE is more appropriate to represent model perfor-

mance than the MAE when the error distribution is expected

to be Gaussian. In addition, we show that the RMSE satis-

ﬁes the triangle inequality requirement for a distance metric,

whereas Willmott et al. (2009) indicated that the sums-of-

squares-based statistics do not satisfy this rule. In the end, we discuss some circumstances in which using the RMSE is more beneficial. However, we do not contend that the RMSE is superior to the MAE. Instead, a combination of metrics, including but certainly not limited to RMSEs and MAEs, is often required to assess model performance.

1 Introduction

The root mean square error (RMSE) has been used as a stan-

dard statistical metric to measure model performance in me-

teorology, air quality, and climate research studies. The mean

absolute error (MAE) is another useful measure widely used

in model evaluations. While they have both been used to

assess model performance for many years, there is no con-

sensus on the most appropriate metric for model errors. In

the ﬁeld of geosciences, many present the RMSE as a stan-

dard metric for model errors (e.g., McKeen et al., 2005;

Savage et al., 2013; Chai et al., 2013), while a few others

choose to avoid the RMSE and present only the MAE, cit-

ing the ambiguity of the RMSE claimed by Willmott and

Matsuura (2005) and Willmott et al. (2009) (e.g., Taylor

et al., 2013; Chatterjee et al., 2013; Jerez et al., 2013). While

the MAE gives the same weight to all errors, the RMSE pe-

nalizes variance as it gives errors with larger absolute values

more weight than errors with smaller absolute values. When

both metrics are calculated, the RMSE is by deﬁnition never

smaller than the MAE. For instance, Chai et al. (2009) presented both the mean errors (MAEs) and the rms errors (RMSEs) of model NO2 column predictions compared to SCIAMACHY satellite observations. The ratio of RMSE to MAE ranged from 1.63 to 2.29 (see Table 1 of Chai et al., 2009).

Using hypothetical sets of four errors, Willmott and

Matsuura (2005) demonstrated that while keeping the MAE

as a constant of 2.0, the RMSE varies from 2.0 to 4.0.

They concluded that the RMSE varies with the variability of the error magnitudes, with the total-error or average-error magnitude (MAE), and with the sample size n. They further

Published by Copernicus Publications on behalf of the European Geosciences Union.


demonstrated an inconsistency between MAEs and RMSEs

using 10 combinations of 5 pairs of global precipitation data.

They summarized that the RMSE tends to become increas-

ingly larger than the MAE (but not necessarily in a mono-

tonic fashion) as the distribution of error magnitudes be-

comes more variable. The RMSE tends to grow larger than

the MAE with $n^{1/2}$, since its lower limit is fixed at the MAE and its upper limit ($n^{1/2}\cdot$MAE) increases with $n^{1/2}$. Further,

Willmott et al. (2009) concluded that the sums-of-squares-

based error statistics such as the RMSE and the standard er-

ror have inherent ambiguities and recommended the use of

alternates such as the MAE.
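This demonstration is straightforward to reproduce. The four-error sets below are our own hypothetical examples (not necessarily the exact sets used by Willmott and Matsuura, 2005): each has MAE = 2.0, while the RMSE spans its full range from the lower limit (the MAE itself) to the upper limit ($n^{1/2}\cdot$MAE = 4.0).

```python
import numpy as np

def mae(e):
    """Mean absolute error of a sample of errors."""
    return float(np.mean(np.abs(e)))

def rmse(e):
    """Root mean square error of a sample of errors."""
    return float(np.sqrt(np.mean(np.square(e))))

# Each set has MAE = 2.0 with n = 4, yet the RMSE varies
# from 2.0 (all magnitudes equal) up to sqrt(4) * 2.0 = 4.0
# (all error concentrated in one sample).
error_sets = [
    [2, 2, 2, 2],   # uniform magnitudes: RMSE = MAE = 2.0
    [1, 1, 3, 3],   # RMSE ~ 2.24
    [0, 0, 4, 4],   # RMSE ~ 2.83
    [0, 0, 0, 8],   # one large error: RMSE = 4.0
]
for e in error_sets:
    print(f"errors={e}  MAE={mae(e):.2f}  RMSE={rmse(e):.2f}")
```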

As every statistical measure condenses a large number of

data into a single value, it only provides one projection of the

model errors emphasizing a certain aspect of the error char-

acteristics of the model performance. Willmott and Matsuura

(2005) have simply proved that the RMSE is not equivalent

to the MAE, and one cannot easily derive the MAE value

from the RMSE (and vice versa). Similarly, one can readily

show that, for several sets of errors with the same RMSE, the

MAE would vary from set to set.

Since statistics are just a collection of tools, researchers

must select the most appropriate tool for the question being

addressed. Because the RMSE and the MAE are deﬁned dif-

ferently, we should expect the results to be different. Some-

times multiple metrics are required to provide a complete

picture of error distribution. When the error distribution is

expected to be Gaussian and there are enough samples, the

RMSE has an advantage over the MAE to illustrate the error

distribution.

The objective of this note is to clarify the interpretation of

the RMSE and the MAE. In addition, we demonstrate that

the RMSE satisﬁes the triangle inequality requirement for a

distance metric, whereas Willmott and Matsuura (2005) and

Willmott et al. (2009) have claimed otherwise.

2 Interpretation of RMSE and MAE

To simplify, we assume that we already have n samples of model errors, $e_i$ ($i = 1, 2, \ldots, n$). The uncertainties introduced by observation errors, or by the method used to compare model and observations, are not considered here. We also assume the error sample set is unbiased. The RMSE and the MAE are calculated for the data set as

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i| \tag{1} \]

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}. \tag{2} \]

The underlying assumption when presenting the RMSE is

that the errors are unbiased and follow a normal distribution.

Table 1. RMSEs and MAEs of randomly generated pseudo-errors with a zero mean and unit variance Gaussian distribution. Five sets of errors of size n are generated with different random seeds.

n          RMSEs                           MAEs
4          0.92, 0.65, 1.48, 1.02, 0.79    0.70, 0.57, 1.33, 1.16, 0.76
10         0.81, 1.10, 0.83, 0.95, 1.01    0.65, 0.89, 0.72, 0.84, 0.78
100        1.05, 1.03, 1.03, 1.00, 1.04    0.82, 0.81, 0.79, 0.78, 0.78
1000       1.04, 0.98, 1.01, 1.00, 1.00    0.82, 0.78, 0.80, 0.80, 0.81
10 000     1.00, 0.98, 1.01, 1.00, 1.00    0.79, 0.79, 0.79, 0.81, 0.80
100 000    1.00, 1.00, 1.00, 1.00, 1.00    0.80, 0.80, 0.80, 0.80, 0.80
1 000 000  1.00, 1.00, 1.00, 1.00, 1.00    0.80, 0.80, 0.80, 0.80, 0.80

Thus, using the RMSE or the standard error (SE)^1 helps to

provide a complete picture of the error distribution.

Table 1 shows RMSEs and MAEs for randomly generated

pseudo-errors with zero mean and unit variance Gaussian

distribution. When the sample size reaches 100 or above, using the calculated RMSEs one can reconstruct the error distribution close to its “truth” or “exact solution”, with its standard deviation within 5 % of the truth (i.e., SE = 1). When

there are more samples, reconstructing the error distribution

using RMSEs will be even more reliable. The MAE here is

the mean of the half-normal distribution (i.e., the average of

the positive subset of a population of normally distributed

errors with zero mean). Table 1 shows that the MAEs converge to 0.8, an approximation to the expected value $\sqrt{2/\pi} \approx 0.798$. It

should be noted that all statistics are less useful when there

are only a limited number of error samples. For instance, Ta-

ble 1 shows that neither the RMSEs nor the MAEs are robust

when only 4 or 10 samples are used to calculate those values.

In those cases, presenting the values of the errors themselves

(e.g., in tables) is probably more appropriate than calculat-

ing any of the statistics. Fortunately, there are often hundreds

of observations available to calculate model statistics, unlike

the examples with n=4 (Willmott and Matsuura, 2005) and

n=10 (Willmott et al., 2009).
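The convergence behavior in Table 1 is easy to reproduce (a sketch using NumPy; the random seed and sample sizes are our own choices):

```python
import numpy as np

# Pseudo-errors drawn from a zero-mean, unit-variance Gaussian.
# As n grows, the RMSE converges to the true standard deviation (1.0)
# and the MAE to the half-normal mean sqrt(2/pi) ~ 0.798.
rng = np.random.default_rng(seed=1)  # seed is arbitrary
for n in [4, 10, 100, 1000, 100_000]:
    e = rng.standard_normal(n)
    rmse = np.sqrt(np.mean(e ** 2))
    mae = np.mean(np.abs(e))
    print(f"n={n:>6}  RMSE={rmse:.3f}  MAE={mae:.3f}")

print("half-normal mean:", np.sqrt(2 / np.pi))  # ~0.798
```

As in Table 1, the statistics computed from only 4 or 10 samples scatter widely around their limits, while the large-sample values are stable.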

Condensing a set of error values into a single number, ei-

ther the RMSE or the MAE, removes a lot of information.

The best statistics metrics should provide not only a perfor-

mance measure but also a representation of the error distribu-

tion. The MAE is suitable to describe uniformly distributed

errors. Because model errors are likely to have a normal dis-

tribution rather than a uniform distribution, the RMSE is a

better metric to present than the MAE for such a type of data.

^1 For unbiased error distributions, the standard error (SE) is equivalent to the RMSE, as the sample mean is assumed to be zero. For an unknown error distribution, the SE of the mean is the square root of the “bias-corrected sample variance”; that is,

\[ \mathrm{SE} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(e_i - \bar{e}\right)^2}, \qquad \bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i. \]


3 Triangle inequality of a metric

Both Willmott and Matsuura (2005) and Willmott et al.

(2009) emphasized that sums-of-squares-based statistics do

not satisfy the triangle inequality. An example is given in

a footnote of Willmott et al. (2009). In the example, it is given that d(a, c) = 4, d(a, b) = 2, and d(b, c) = 3, where d(x, y) is a distance function. The authors stated that d(x, y) as a “metric” should satisfy the “triangle inequality” (i.e., d(a, c) ≤ d(a, b) + d(b, c)). However, they did not specify what a, b, and c represent here before arguing that the sum of squared errors does not satisfy the “triangle inequality” because 4 ≤ 2 + 3, whereas $4^2 > 2^2 + 3^2$. In fact, this example represents the mean square error (MSE), which cannot be used as a distance metric, rather than the RMSE.
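The arithmetic of that footnote example can be made explicit (a minimal sketch):

```python
# Distances from the footnote example of Willmott et al. (2009).
d_ac, d_ab, d_bc = 4, 2, 3

# As root (RMSE-like) distances they satisfy the triangle inequality:
print(d_ac <= d_ab + d_bc)             # True: 4 <= 5

# but their squares -- MSE-like quantities -- do not:
print(d_ac**2 <= d_ab**2 + d_bc**2)    # False: 16 > 13
```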

Following a certain order, the errors $e_i$, $i = 1, \ldots, n$, can be written as an n-dimensional vector $E$. The $L_1$-norm and $L_2$-norm are closely related to the MAE and the RMSE, respectively, as shown in Eqs. (3) and (4):

\[ \|E\|_1 = \sum_{i=1}^{n} |e_i| = n \cdot \mathrm{MAE} \tag{3} \]

\[ \|E\|_2 = \sqrt{\sum_{i=1}^{n} e_i^2} = \sqrt{n} \cdot \mathrm{RMSE}. \tag{4} \]

All vector norms satisfy $\|X+Y\| \le \|X\| + \|Y\|$ and $\|-X\| = \|X\|$ (see, e.g., Horn and Johnson, 1990). It is trivial to prove that the distance between two vectors measured by the $L_p$-norm satisfies $\|X-Y\|_p \le \|X\|_p + \|Y\|_p$. With three n-dimensional vectors $X$, $Y$, and $Z$, we have

\[ \|X - Y\|_p = \|(X - Z) - (Y - Z)\|_p \le \|X - Z\|_p + \|Y - Z\|_p. \tag{5} \]

For n-dimensional vectors and the $L_2$-norm, Eq. (5) can be written as

\[ \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \le \sqrt{\sum_{i=1}^{n}(x_i - z_i)^2} + \sqrt{\sum_{i=1}^{n}(y_i - z_i)^2}, \tag{6} \]

which is equivalent to

\[ \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2} \le \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - z_i)^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - z_i)^2}. \tag{7} \]

This proves that the RMSE satisfies the triangle inequality required for a distance metric.
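Eq. (7) can also be verified numerically by brute force (a sketch; the vector dimension and number of trials are arbitrary choices):

```python
import numpy as np

def rmse_dist(a, b):
    """RMSE between two vectors, i.e., the L2 distance scaled by 1/sqrt(n)."""
    return np.sqrt(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
n = 50                        # vector dimension (arbitrary)
for _ in range(10_000):       # random triples of vectors
    x, y, z = rng.standard_normal((3, n))
    # Eq. (7): RMSE(x, y) <= RMSE(x, z) + RMSE(y, z)
    assert rmse_dist(x, y) <= rmse_dist(x, z) + rmse_dist(y, z) + 1e-12

print("Eq. (7) held for all 10 000 random triples")
```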

4 Summary and discussion

We have shown that the RMSE is not ambiguous in its meaning, and that it is more appropriate to use than the MAE when model

errors follow a normal distribution. In addition, we demon-

strate that the RMSE satisﬁes the triangle inequality required

for a distance function metric.

The sensitivity of the RMSE to outliers is the most com-

mon concern with the use of this metric. In fact, the exis-

tence of outliers and their probability of occurrence is well

described by the normal distribution underlying the use of the

RMSE. Table 1 shows that with enough samples (n≥100),

including those outliers, one can closely re-construct the er-

ror distribution. In practice, it might be justifiable to discard outliers that are several orders of magnitude larger than the other samples when calculating the RMSE, especially if the number of samples is limited. If the model biases are severe, one

may also need to remove the systematic errors before calcu-

lating the RMSEs.

One distinct advantage of RMSEs over MAEs is that RMSEs avoid the use of the absolute value function, which is undesirable in many mathematical calculations because it is not differentiable everywhere. For instance, it

might be difﬁcult to calculate the gradient or sensitivity of

the MAEs with respect to certain model parameters. Further-

more, in the data assimilation ﬁeld, the sum of squared er-

rors is often deﬁned as the cost function to be minimized by

adjusting model parameters. In such applications, penalizing

large errors through the deﬁned least-square terms proves to

be very effective in improving model performance. Under the

circumstances of calculating model error sensitivities or data

assimilation applications, MAEs are deﬁnitely not preferred

over RMSEs.
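The differentiability issue can be illustrated directly (a sketch; the error values are arbitrary):

```python
import numpy as np

# Why least-squares cost functions are convenient: the derivative of
# e^2 is the smooth linear function 2e, while the derivative of |e| is
# sign(e), which jumps from -1 to +1 across e = 0 (np.sign returns 0
# there, but the true derivative is undefined at that point).
e = np.linspace(-1.0, 1.0, 5)   # sample errors, including zero
grad_squared = 2.0 * e          # d(e^2)/de, defined everywhere
grad_absolute = np.sign(e)      # d|e|/de away from zero

print(grad_squared)    # [-2. -1.  0.  1.  2.]
print(grad_absolute)   # [-1. -1.  0.  1.  1.]
```

A gradient-based minimizer can follow the smooth least-squares gradient all the way to the minimum, which is part of why data assimilation cost functions are defined as sums of squares.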

An important aspect of the error metrics used for model

evaluations is their capability to discriminate among model

results. The more discriminating measure that produces

higher variations in its model performance metric among dif-

ferent sets of model results is often the more desirable. In this

regard, the MAE may be dominated by a large number of average-sized error values without adequately reflecting a few large errors. By giving higher weight to these unfavorable cases,

the RMSE usually is better at revealing model performance

differences.

In many of the model sensitivity studies that use only

RMSE, a detailed interpretation is not critical because varia-

tions of the same model will have similar error distributions.

When evaluating different models using a single metric, dif-

ferences in the error distributions become more important.

As we stated in the note, the underlying assumption when

presenting the RMSE is that the errors are unbiased and fol-

low a normal distribution. For other kinds of distributions,

more statistical moments of model errors, such as mean, vari-

ance, skewness, and ﬂatness, are needed to provide a com-

plete picture of the model error variation. Some approaches

that emphasize resistance to outliers or insensitivity to non-

normal distributions have been explored by other researchers

(Tukey, 1977; Huber and Ronchetti, 2009).

As stated earlier, any single metric provides only one pro-

jection of the model errors and, therefore, only emphasizes

a certain aspect of the error characteristics. A combination


of metrics, including but certainly not limited to RMSEs and

MAEs, are often required to assess model performance.

Acknowledgements. This study was supported by NOAA grant

NA09NES4400006 (Cooperative Institute for Climate and

Satellites – CICS) at the NOAA Air Resources Laboratory in

collaboration with the University of Maryland.

Edited by: R. Sander

References

Chai, T., Carmichael, G. R., Tang, Y., Sandu, A., Heckel, A., Richter, A., and Burrows, J. P.: Regional NOx emission inversion through a four-dimensional variational approach using SCIAMACHY tropospheric NO2 column observations, Atmos. Environ., 43, 5046–5055, 2009.

Chai, T., Kim, H.-C., Lee, P., Tong, D., Pan, L., Tang, Y., Huang,

J., McQueen, J., Tsidulko, M., and Stajner, I.: Evaluation of the

United States National Air Quality Forecast Capability experi-

mental real-time predictions in 2010 using Air Quality System

ozone and NO2 measurements, Geosci. Model Dev., 6, 1831–

1850, doi:10.5194/gmd-6-1831-2013, 2013.

Chatterjee, A., Engelen, R. J., Kawa, S. R., Sweeney, C., and Micha-

lak, A. M.: Background error covariance estimation for atmo-

spheric CO2 data assimilation, J. Geophys. Res., 118, 10140–

10154, 2013.

Horn, R. A. and Johnson, C. R.: Matrix Analysis, Cambridge Uni-

versity Press, 1990.

Huber, P. and Ronchetti, E.: Robust Statistics, Wiley, New York, 2009.

Jerez, S., Pedro Montavez, J., Jimenez-Guerrero, P., Jose Gomez-

Navarro, J., Lorente-Plazas, R., and Zorita, E.: A multi-physics

ensemble of present-day climate regional simulations over the

Iberian Peninsula, Clim. Dynam., 40, 3023–3046, 2013.

McKeen, S. A., Wilczak, J., Grell, G., Djalalova, I., Peck-

ham, S., Hsie, E., Gong, W., Bouchet, V., Menard, S., Mof-

fet, R., McHenry, J., McQueen, J., Tang, Y., Carmichael, G. R.,

Pagowski, M., Chan, A., Dye, T., Frost, G., Lee, P., and

Mathur, R.: Assessment of an ensemble of seven real-

time ozone forecasts over eastern North America during

the summer of 2004, J. Geophys. Res., 110, D21307,

doi:10.1029/2005JD005858, 2005.

Savage, N. H., Agnew, P., Davis, L. S., Ordóñez, C., Thorpe, R.,

Johnson, C. E., O’Connor, F. M., and Dalvi, M.: Air quality mod-

elling using the Met Ofﬁce Uniﬁed Model (AQUM OS24-26):

model description and initial evaluation, Geosci. Model Dev., 6,

353–372, doi:10.5194/gmd-6-353-2013, 2013.

Taylor, M. H., Losch, M., Wenzel, M., and Schroeter, J.: On the

sensitivity of ﬁeld reconstruction and prediction using empirical

orthogonal functions derived from gappy data, J. Climate, 26,

9194–9205, 2013.

Tukey, J. W.: Exploratory Data Analysis, Addison-Wesley, 1977.

Willmott, C. and Matsuura, K.: Advantages of the Mean Abso-

lute Error (MAE) over the Root Mean Square Error (RMSE)

in assessing average model performance, Clim. Res., 30, 79–82,

2005.

Willmott, C. J., Matsuura, K., and Robeson, S. M.: Ambiguities

inherent in sums-of-squares-based error statistics, Atmos. Env-

iron., 43, 749–752, 2009.
