Kobe University Repository : Kernel

Title: Testing homogeneity of Japanese CPI forecasters

Author(s): Ashiya, Masahiro

Citation: Journal of Forecasting, 29(5): 435-441

Issue date: 2010-08

Resource Type: Journal Article

Resource Version: author

URL: http://www.lib.kobe-u.ac.jp/handle_kernel/90001225

Create Date: 2011-12-31


Testing homogeneity of Japanese CPI forecasters*

Masahiro ASHIYA +

February 2009

JEL Classification Codes: E37; C53; E17.

Keywords: Macroeconomic Forecast; Forecast evaluation; Analysis of variance.

* I gratefully acknowledge financial support from a Grant-in-Aid for Encouragement of Young Scientists from the Japanese Ministry of Education, Science and Culture.

+ Faculty of Economics, Kobe University, 2-1 Rokko-dai, Nada, Kobe, 657-8501, Japan;

E-mail: ashiya@econ.kobe-u.ac.jp


This paper investigates whether some forecasters consistently outperform others, using Japanese CPI forecast data of 42 forecasters over the past 18 quarters. It finds that the accuracy rankings of zero-, one-, two-, and five-month forecasts differ significantly from those that would be expected if all forecasters had equal forecasting ability. Moreover, the rankings of their relative forecast levels also differ significantly from random rankings.

JEL Classification Codes: E37; C53; E17.

Keywords: Macroeconomic Forecast; Forecast evaluation; Analysis of variance.


1. Introduction

The world economy has gone through tumultuous changes over the past few years: the rise and fall of commodity prices, and the boom and bust of the financial markets. These unprecedented shocks have made the task of macroeconomic forecasters formidable. As a result, many of them failed to foresee the volatile fluctuations of output and prices. This experience leads us to the following questions: Were all forecasters equally successful (or unsuccessful) in this period? Was there any significant difference in their forecast accuracy?

We answer this question using the monthly data of 42 Japanese consumer price index (CPI)

forecasters from April 2004 through August 2008.

Section 2 explains the data. Section 3 evaluates the homogeneity of forecast accuracy. It finds that forecasting ability is unequal among the forecasters. More precisely, the accuracy rankings of zero-, one-, two-, and five-month forecasts differ significantly from those that would be expected if all forecasters had equal forecasting ability. This result is contrary to the findings of Batchelor (1990), Batchelor and Dua (1990a, b), Kolb and Stekler (1996), and Ashiya (2006).

Section 4 investigates the biases of the relative forecast levels of the individual forecasters. It finds that the rankings of the relative forecast levels differ significantly from random rankings for three-, four-, and five-month forecasts. That is, forecasters differ systematically in their forecast levels. This result is consistent with the findings of Batchelor and Dua (1990b) and Ashiya (2006). Section 5 concludes the paper.

2. Data

The Economic Planning Association has conducted a monthly survey of professional

forecasters, “ESP Forecast Survey,” since April 2004. We use the forecast data of the

consumer price index (CPI) through August 2008. We select the data of the 42 forecasters (out of 44) who participated in 18 surveys or more (the excluded forecasters participated in five surveys).

Let $CPI_t$ be the CPI of month $t$. Then the rate of change over the year, $p_t$, is computed by the following equation:

$$p_t \equiv \frac{CPI_t - CPI_{t-12}}{CPI_{t-12}} \times 100 .$$

The quarterly average change over the year is calculated as the simple arithmetic mean of $p_t$. More specifically, the quarterly average change over the year from month $t-2$ to month $t$, $q_t$, is defined as

$$q_t \equiv \left( p_{t-2} + p_{t-1} + p_t \right) / 3 .$$
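These definitions can be sketched in a few lines of Python; the CPI series below is a hypothetical illustration, not the actual Japanese index:

```python
# Year-over-year CPI inflation p_t and its quarterly average q_t,
# following the definitions in the text. Indices are months.

def yoy_rate(cpi, t):
    """p_t = (CPI_t - CPI_{t-12}) / CPI_{t-12} * 100."""
    return (cpi[t] - cpi[t - 12]) / cpi[t - 12] * 100

def quarterly_avg(cpi, t):
    """q_t = (p_{t-2} + p_{t-1} + p_t) / 3."""
    return sum(yoy_rate(cpi, s) for s in (t - 2, t - 1, t)) / 3

# Hypothetical, slowly rising index over 24 months.
cpi = [100 + 0.05 * m for m in range(24)]
print(round(quarterly_avg(cpi, 23), 3))
```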

Let $f^i_{t-k,t}$ be the $k$-month-ahead forecast of forecaster $i$ with respect to $q_t$, which is released in month $t-k$. The forecast error is defined as $FE^i_{t-k,t} \equiv f^i_{t-k,t} - q_t$. Its absolute value is $AFE^i_{t-k,t} \equiv \left| f^i_{t-k,t} - q_t \right|$.

We analyze zero through five-month-ahead forecasts in this paper. Zero-month forecasts

and three-month forecasts are released in March, June, September, and December. One-month

forecasts and four-month forecasts are released in February, May, August, and November.

Two-month forecasts and five-month forecasts are released in January, April, July, and

October. The sample period of each forecast series is as follows: from the first quarter of 2004

through the second quarter of 2008 (18 quarters) for zero-month forecast; from the second

quarter of 2004 through the second quarter of 2008 (17 quarters) for one-month forecast; from

the second quarter of 2004 through the third quarter of 2008 (18 quarters) for two-month

forecast and three-month forecast; from the third quarter of 2004 through the third quarter of

2008 (17 quarters) for four-month forecast and five-month forecast.

Table 1 presents the values of several traditional measures of forecast accuracy for the

individual forecasters. The first row in Table 1 shows the summary statistics (average,

standard deviation, minimum, and maximum) of the mean absolute error (MAE). As for

zero-month forecast, the average of the MAE among forecasters is 0.072 percentage points,

and the MAE of the best forecaster is zero. The second row shows the summary statistics of

the root mean square error (RMSE).

The third row of Table 1 shows the summary statistics of the modified Theil's U, constructed as the ratio of the RMSE of each forecaster to the RMSE of the "same-as-the-last-month" forecast. More specifically, define $T^i_{t-k,t}$ as the set of quarters in which forecaster $i$ released $f^i_{t-k,t}$. Let $U^i_{t-k,t}$ be the Theil's U of forecaster $i$ for $f^i_{t-k,t}$. Then $U^i_{t-k,t}$ is defined as

$$U^i_{t-k,t} = \left[ \frac{\sum_{t \in T^i_{t-k,t}} \left( f^i_{t-k,t} - q_t \right)^2}{\sum_{t \in T^i_{t-k,t}} \left( p_{t-k-1} - q_t \right)^2} \right]^{1/2} \quad \text{for } k = 0, 1, \ldots, 5 .$$

If $U^i_{t-k,t} > 1$, then forecaster $i$ is inferior to the "same-as-the-last-month" forecast. Table 1 shows that $U^i_{t-k,t}$ is on average 2.269 for the zero-month forecast and 1.104 for the one-month forecast.
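The modified Theil's U can be sketched as the ratio of two RMSEs; the actuals, forecasts, and naive benchmark values below are made-up illustrations, not the survey data:

```python
import math

def theils_u(forecasts, naive, actual):
    """RMSE of the forecaster divided by the RMSE of the naive
    "same-as-the-last-month" benchmark over the same quarters."""
    num = math.sqrt(sum((f - q) ** 2 for f, q in zip(forecasts, actual)) / len(actual))
    den = math.sqrt(sum((n - q) ** 2 for n, q in zip(naive, actual)) / len(actual))
    return num / den  # > 1 means worse than the naive benchmark

actual = [0.2, 0.3, 0.1, 0.4]            # realized q_t (illustrative)
naive = [0.1, 0.2, 0.3, 0.1]             # last observed monthly rate (illustrative)
forecasts = [0.25, 0.28, 0.15, 0.35]     # one forecaster's forecasts (illustrative)
print(theils_u(forecasts, naive, actual) < 1.0)
```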

These descriptive statistics seem to indicate that there are some differences in forecasting


ability among the forecasters. However, they are not appropriate measures of forecast

accuracy for the following reason. To evaluate the ability of the forecasters, we must take into

account that some periods are more difficult to forecast than others. The variance of the

forecast errors tends to be larger in these difficult-to-forecast periods. It follows that the level

of the MAE (or the RMSE) is mainly determined by the performance in the

difficult-to-forecast periods. The next section employs a ranking-based test to deal with this

problem.

3. Tests for homogeneity of forecast accuracy

This section evaluates whether all forecasters had equal forecasting ability. Following Kolb

and Stekler (1996), we first employ the non-parametric test of ranking developed by Skillings

and Mack (1981), which is robust to changes in the variance of the forecast variables. Then

we consider the alternative method of O’Brien (1990). Both methods produce the same result:

forecasters are not equal in their forecasting abilities.

3.1 The methodology of the ranking test

To test whether all forecasters had equal forecasting ability, we consider the accuracy ranking

of the forecasters. Skillings and Mack (1981) generalize Friedman’s (1937) distribution-free

test and develop the following non-parametric test applicable to unbalanced panels.

Suppose the panel data consist of $N$ forecasters and $M$ quarters. Let $N_t$ ($\leq N$) be the number of forecasters that release forecasts in the $t$-th quarter. Let $r^i_t \in \{1, \ldots, N_t\}$ be the rank of the absolute forecast error of forecaster $i$ in the $t$-th quarter, for $t \in \{1, \ldots, M\}$. If ties occur, we use average ranks. If forecaster $i$ does not participate in the $t$-th quarter, we assume $r^i_t = 0.5 \left( N_t + 1 \right)$.
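The average-rank convention for ties can be sketched as follows (a hypothetical helper, not code from the paper):

```python
def average_ranks(errors):
    """Rank values from smallest (rank 1) to largest; ties receive
    the average of the ranks they would otherwise occupy."""
    n = len(errors)
    ranks = [0.0] * n
    order = sorted(range(n), key=lambda i: errors[i])
    i = 0
    while i < n:
        j = i
        while j + 1 < n and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(average_ranks([0.2, 0.1, 0.2, 0.4]))  # the two 0.2s share rank 2.5
```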

We define the adjusted rank of forecaster $i$ in the $t$-th quarter, $A^i_t$, as

$$A^i_t \equiv \left( \frac{12}{N_t + 1} \right)^{0.5} \left( r^i_t - \frac{N_t + 1}{2} \right) . \qquad (1)$$

The first term of $A^i_t$ compensates for the difference in observations. The second term measures relative performance. A negative (positive) $A^i_t$ indicates that the rank of forecaster $i$ in the $t$-th quarter is above (below) the median.

The sum of the adjusted ranks,

$$A^i \equiv \sum_{t=1}^{M} A^i_t , \qquad (2)$$

indicates the relative performance over the sample period. If $A^i$ is close to zero, the forecast accuracy of forecaster $i$ is on average similar to that of the others. If $A^i$ is significantly smaller (larger) than zero, the forecast accuracy of forecaster $i$ is on average better (worse) than that of the others.

Let $A \equiv \left( A^1, \ldots, A^N \right)'$. If $A$ is significantly different from the zero vector, then the forecasters vary in forecasting ability. To test this hypothesis, we consider the covariance matrix, $V$, of the random vector $A$. Define $m_{ij}$ as the number of quarters containing forecasts from both forecasters $i$ and $j$. Then the elements of $V$, $\sigma_{ij}$, are defined as

$$\sigma_{ij} = \begin{cases} -m_{ij} & \text{if } i \neq j \\ \sum_{k \neq i} m_{ik} & \text{if } i = j \end{cases} .$$

Let $V_{11}$ be the upper left $N-1$ by $N-1$ submatrix of $V$, and let $V_{11}^{-1}$ be the inverse of $V_{11}$. Define $\hat{A} \equiv \left( A^1, \ldots, A^{N-1} \right)'$. Skillings and Mack (1981) show that, under the null hypothesis that there is no difference in forecasting ability, the statistic

$$S \equiv \hat{A}' V_{11}^{-1} \hat{A} \qquad (3)$$

has an asymptotic chi-squared distribution with $N-1$ degrees of freedom. A significantly large $S$ indicates that the forecasters were not equal in their forecasting abilities.
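For a small, fully balanced panel (every forecaster ranked in every quarter), the statistic in equation (3) can be computed directly; the rank matrix below is an illustrative assumption, and in this balanced case S coincides with Friedman's statistic:

```python
import numpy as np

# Rows: quarters; columns: forecasters; entries: accuracy ranks (1 = best).
ranks = np.array([
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
    [2, 1, 3],
])
M, N = ranks.shape
Nt = np.full(M, N)  # every forecaster participates in every quarter

# Adjusted ranks A_t^i of equation (1), summed over quarters as in (2).
A = (np.sqrt(12.0 / (Nt + 1))[:, None]
     * (ranks - (Nt + 1)[:, None] / 2)).sum(axis=0)

# Covariance matrix: sigma_ij = -m_ij (i != j), sum_{k != i} m_ik (i = j),
# where m_ij = number of quarters shared by forecasters i and j (= M here).
m = np.full((N, N), M)
V = (-m).astype(float)
np.fill_diagonal(V, m.sum(axis=1) - np.diag(m))

# S of equation (3): first N-1 components against the upper-left submatrix.
S = A[:-1] @ np.linalg.inv(V[:-1, :-1]) @ A[:-1]
print(round(float(S), 2))
```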

3.2 The results of the ranking test

Table 2 presents the values of the S-statistic calculated from the absolute forecast errors of the 42 forecasters. The first row is the result for the zero-month forecast. We obtain $S = 113.98$, which is significant at the 0.01 level (the P-value is in the second column). The second row shows that $S = 55.64$ for the one-month forecast, which is significant at the 0.10 level. The third row shows that $S = 59.75$ for the two-month forecast, which is significant at the 0.05 level. The sixth row shows that $S = 54.05$ for the five-month forecast, which is significant at the 0.10 level.

These results demonstrate that the accuracy rankings of zero-, one-, two-, and five-month forecasts differ significantly from those that would be expected if all forecasters had equal forecasting ability. That is, forecasting ability is unequal among the forecasters. This conclusion is in stark contrast to the results of Batchelor (1990), Batchelor and Dua (1990a, b), Kolb and Stekler (1996), and Ashiya (2006).

3.3 The regression test for homogeneity

To confirm the above result, we estimate the following fixed-effects model used by O'Brien (1990) and Ashiya (2006). Let $dumi(j)$ ($j = 1, \ldots, 41$) be the individual dummy:

$$dumi(j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} ,$$

and let $dumt(s)$ ($s = 1, \ldots, 17$) be the quarter dummy:

$$dumt(s) = \begin{cases} 1 & \text{if } s = t \\ 0 & \text{otherwise} \end{cases} .$$

The regression we consider is

$$AFE^i_{t-k,t} = \alpha + \sum_{j=1}^{41} \beta_j \cdot dumi(j) + \sum_{s=1}^{17} \gamma_s \cdot dumt(s) + u^i_t . \qquad (4)$$

If $\beta_j$ is significantly smaller (larger) than zero, the absolute forecast error of forecaster $j$ is smaller (larger) than that of the others, conditional on the quarter-specific effects. The null hypothesis is $\beta_1 = \cdots = \beta_{41} = 0$; i.e., the forecasters are homogeneous in average forecast accuracy.

Table 3 presents the results of the F-test on the coefficients of the individual dummies in equation (4). It shows that the individual effects are jointly significant at the 0.05 level for every forecast span (i.e., zero- through five-month forecasts). Hence we have clear evidence that forecasters differ systematically in forecast accuracy.
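The joint F-test on the individual dummies in equation (4) can be sketched by comparing restricted and unrestricted sums of squared residuals; the tiny panel below is synthetic, not the survey data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_fc, n_q = 3, 4                          # 3 forecasters, 4 quarters
fc = np.repeat(np.arange(n_fc), n_q)      # forecaster index per observation
qt = np.tile(np.arange(n_q), n_fc)        # quarter index per observation
# Synthetic absolute errors with a built-in forecaster effect.
afe = 0.1 + 0.05 * fc + rng.normal(0, 0.01, n_fc * n_q)

def ssr(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

const = np.ones(len(afe))
dumi = (fc[:, None] == np.arange(1, n_fc)).astype(float)  # drop forecaster 0
dumt = (qt[:, None] == np.arange(1, n_q)).astype(float)   # drop quarter 0
X_full = np.column_stack([const, dumi, dumt])   # unrestricted model
X_restr = np.column_stack([const, dumt])        # restriction: all beta_j = 0

df1 = n_fc - 1
df2 = len(afe) - X_full.shape[1]
F = ((ssr(afe, X_restr) - ssr(afe, X_full)) / df1) / (ssr(afe, X_full) / df2)
print(F > 1.0)  # the built-in forecaster effect is easily detected here
```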

4. The test for homogeneity of the forecast level

This section examines whether some forecasters consistently release extremely large (or

extremely small) forecasts. To address this question, the observed distribution of the level of

their forecasts is compared with the distribution expected if their relative forecast levels each

quarter were purely random.

First, we employ the ranking-based test of Skillings and Mack (1981). Table 4 shows the values of the S-statistic calculated from equation (3). We find that the rankings of the relative forecast levels differ significantly from random rankings for three-, four-, and five-month forecasts.

Next we consider the fixed-effects model of equation (4), substituting $f^i_{t-k,t}$ for $AFE^i_{t-k,t}$. Table 5 shows the results of the F-test on the coefficients of the individual dummies. We find that the individual effects are jointly significant at the 0.10 level for two- and three-month forecasts and at the 0.01 level for four- and five-month forecasts. These results indicate that some forecasters tend to produce relatively large forecasts, while others tend to produce relatively small forecasts, during the sample period.

5. Conclusions

This paper has used the monthly survey of 42 Japanese CPI forecasters from April 2004 to

August 2008, and has tested the hypothesis that all forecasters are equal in forecasting ability.

This hypothesis was rejected by the ranking test for zero-, one-, two-, and five-month forecasts. Furthermore, it was rejected by the panel regression for zero- through five-month forecasts. This result stands in striking contrast to the past literature.

One qualification is that our result relies on the crucial assumption that forecasters aim to

minimize their forecast errors. There are various reasons for rational forecasters to announce

forecasts different from the conditional expected value. Ashiya (2009) finds that the Japanese

GDP forecasters in industries that emphasize publicity tend to make less accurate but more

extreme forecasts in order to gain publicity for their firms. Whether this “publicity effect” can

explain our result is an important topic for future research.


References

Ashiya, M. (2006) “Forecast Accuracy and Product Differentiation of Japanese Institutional

Forecasters.” International Journal of Forecasting, 22, 395-401.

Ashiya, M. (2009) “Strategic Bias and Professional Affiliations of Macroeconomic

Forecasters.” Journal of Forecasting, 28, 120-130.

Batchelor, R.A. (1990) “All Forecasters Are Equal.” Journal of Business and Economic

Statistics, 8, 143-144.

Batchelor, R.A., and Dua, P. (1990a) “Forecaster Ideology, Forecasting Technique, and the

Accuracy of Economic Forecasts.” International Journal of Forecasting, 6, 3-10.

Batchelor, R.A., and Dua, P. (1990b) “Product Differentiation in the Economic Forecasting

Industry.” International Journal of Forecasting, 6, 311-316.

Friedman, M. (1937) “The Use of Ranks to Avoid the Assumption of Normality Implicit in the

Analysis of Variance.” Journal of the American Statistical Association, 32, 675-701.

Kolb, R.A., and Stekler, H.O. (1996) “How Well Do Analysts Forecast Interest Rates?”

Journal of Forecasting, 15, 385-394.

O’Brien, P.C. (1990) “Forecast Accuracy of Individual Analysts in Nine Industries.” Journal

of Accounting Research, 28(2), 286-304.

Skillings, J.H., and Mack, G.A. (1981) “On the Use of a Friedman-Type Statistic in Balanced

and Unbalanced Block Designs.” Technometrics, 23, 171-177.


Table 1: The descriptive statistics

                          Average   Std. Dev.   Minimum   Maximum
Zero-month forecast
  MAE                      0.072     0.056       0.000     0.270
  RMSE                     0.108     0.071       0.000     0.383
  $U^i_{t,t}$              2.269     1.522       0.000     8.573
One-month forecast
  MAE                      0.107     0.054       0.033     0.325
  RMSE                     0.149     0.071       0.058     0.453
  $U^i_{t-1,t}$            1.104     0.899       0.608     5.228
Two-month forecast
  MAE                      0.184     0.072       0.067     0.471
  RMSE                     0.250     0.090       0.082     0.615
  $U^i_{t-2,t}$            1.062     0.307       0.577     2.051
Three-month forecast
  MAE                      0.225     0.061       0.100     0.341
  RMSE                     0.300     0.089       0.114     0.450
  $U^i_{t-3,t}$            0.959     0.273       0.546     1.874
Four-month forecast
  MAE                      0.268     0.075       0.083     0.380
  RMSE                     0.361     0.105       0.108     0.501
  $U^i_{t-4,t}$            0.811     0.200       0.510     1.519
Five-month forecast
  MAE                      0.362     0.122       0.133     0.817
  RMSE                     0.478     0.151       0.183     0.979
  $U^i_{t-5,t}$            0.962     0.170       0.558     1.395

MAE: mean absolute error.
RMSE: root mean square error.
$U^i_{t-k,t} = \left[ \sum_{t \in T^i_{t-k,t}} \left( f^i_{t-k,t} - q_t \right)^2 \Big/ \sum_{t \in T^i_{t-k,t}} \left( p_{t-k-1} - q_t \right)^2 \right]^{1/2}$ for $k = 0, 1, \ldots, 5$.


Table 2: The ranking test for homogeneity of the absolute forecast error

k      S        P-value

0 113.98 0.000 ***

1 55.64 0.063 *

2 59.75 0.029 **

3 47.62 0.221

4 43.00 0.386

5 54.05 0.083 *

***: Significant at the 0.01 level.

**: Significant at the 0.05 level.

*: Significant at the 0.10 level.


Table 3: The regression test for homogeneity of the absolute forecast error

k      F-test   P-value

0 6.020 0.000 ***

1 3.326 0.000 ***

2 3.086 0.000 ***

3 1.946 0.001 ***

4 1.603 0.012 **

5 2.194 0.000 ***

***: Significant at the 0.01 level.

**: Significant at the 0.05 level.

*: Significant at the 0.10 level.


Table 4: The ranking test for homogeneity of the forecast level

k      S        P-value

0 32.73 0.818

1 26.82 0.958

2 45.20 0.301

3 60.85 0.024 **

4 64.54 0.011 **

5 90.62 0.000 ***

***: Significant at the 0.01 level.

**: Significant at the 0.05 level.

*: Significant at the 0.10 level.


Table 5: The regression test for homogeneity of the forecast level

k      F-test   P-value

0 0.886 0.675

1 0.877 0.690

2 1.360 0.071 *

3 1.352 0.075 *

4 1.697 0.005 ***

5 2.515 0.000 ***

***: Significant at the 0.01 level.

**: Significant at the 0.05 level.

*: Significant at the 0.10 level.