
Proceedings of Statistics Canada Symposium 2001

Achieving Data Quality in a Statistical Agency: A Methodological Perspective

EVALUATION OF SMALL AREA ESTIMATION METHODS – AN APPLICATION TO UNEMPLOYMENT ESTIMATES FROM THE UK LFS

Gary Brown¹, Ray Chambers², Patrick Heady¹, Dick Heasman¹

ABSTRACT

This paper describes joint research by the ONS and Southampton University on the evaluation of several

different approaches to the local estimation of ILO unemployment. The need to compare estimators with

different underlying assumptions has led to a focus on evaluation methods that are (partly at least) model-

independent. Model fit diagnostics that have been considered include various residual procedures, cross-

validation, predictive validation, consistency with marginals, and consistency with direct estimates within single

cells. These have been used to compare different model-based estimators with each other and with direct

estimators.

KEY WORDS: Diagnostics; Estimates; Bias; Standard errors; Confidence intervals.

1. INTRODUCTION

A small area estimation methodology can be thought of as a model plus fitting method for the small area

values of interest coupled with an estimation method based on the fitted model. Basic properties that we

require of such a methodology are:

1. The expected values defined by the model underlying the small area values should be “good”. That is,

they should explain a significant proportion of the variation in the small area values of interest. Note

that for models that include random effects, these do not contribute to the expected value.

2. The values for the model-based estimates derived from the fitted model should be consistent with the

unbiased direct survey estimates, where these are available. That is, they should provide an

approximation to the direct estimates that is consistent with these values being "close" to the expected

values of the direct estimates.

3. The model-based small area estimates should have mean squared errors significantly lower than the

variances of corresponding direct estimates.

4. The changes over time in the model-based estimates for a particular small area should be more stable

than the corresponding changes in the direct estimates over the same time.

5. The model-based estimates for a particular small area should be acceptable to informed users from that

small area.

This paper does not attempt to cover all of the above agenda. Clearly, standard model-fitting diagnostics

can be used to assess property 1 – and we have also restricted the discussion to indicators that relate to a

single point in time, thus excluding point 4. The most important omission, however, is point 5. Despite the

fact that user-consultation is not discussed here, ONS takes the process of consultation very seriously

indeed – both as a way of ensuring public acceptance, and as a valuable input to improving the estimates

themselves.

¹Office for National Statistics, 1 Drummond Gate, London, SW1V 2QQ, U.K.

²Department of Social Statistics, University of Southampton, SO17 1BJ, U.K.


This paper is about the preliminary internal evaluation work needed to select a suitable small area estimator

in situations where there are a number of competing small area models that are not necessarily nested and

there is some doubt about the assumptions underpinning all of these models. In particular, we discuss four

diagnostics that we have found useful in this regard. These assess the bias and goodness of fit of the

estimation method, the coverage of the confidence intervals generated by the method and the calibration

error of the method. All are based on the crucial assumption that the direct estimates of the small area

values of interest are unbiased (but highly variable) and the confidence intervals associated with these

estimates achieve their nominal coverage levels.

These diagnostics have been developed in the process of investigating small area estimators for both

unemployment and a range of other socio-economic variables. The theory behind these estimators is

described in Ambler et al (2001) and ONS (2001). In the following section we describe the diagnostics in

more detail, applying them to a small area unemployment estimator from Ambler et al (2001). In practice

several diagnostics are used at each stage of the model selection process.

2. DIAGNOSTICS

2.1 A bias diagnostic

The direct estimates are unbiased estimates of the "truth" – if the truth were known and plotted on the X axis of a graph, with direct estimates as Y, the regression line would fall on the 45° line. We plot the model estimates as X, in place of the "truth", and see how close the regression line is to Y=X. This provides a visual illustration of bias and, by comparing the regression line with Y=X, a parametric significance test for the bias of the model estimates³.

The diagnostic is based on the following idea. If the model-based estimates are "close" to the small area

values of interest, then unbiased direct estimators should behave like random variables whose expected

values correspond to the values of the model-based estimates. That is, the model-based estimates should be

unbiased predictors of the direct estimates. As a check for such predictive (i.e. conditional) bias in the

model-based estimates, we plot appropriately scaled values of these estimates (X-axis) against similarly

scaled direct estimates (Y-axis) and then test whether the OLS (ordinary least squares) regression line fitted

to these points is significantly different from the identity line³.

When there is significant variation in small area sizes this test typically requires an initial transformation of

both the direct and model-based estimates so that the homoskedasticity assumption underpinning the OLS

fitting method is satisfied. Such a transformation can be identified using standard methods. In our

unemployment example below, a square root transformation was used, since the estimates relate to counts

of unemployed people in the small areas of interest.
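As a concrete illustration, the scatterplot test described above can be sketched in Python. This is a hypothetical implementation (function and variable names are ours, not from the paper): it regresses square-root direct estimates on square-root model-based estimates and t-tests the fitted intercept and slope against 0 and 1 respectively.

```python
import numpy as np
from scipy import stats

def bias_diagnostic(model_est, direct_est):
    """OLS regression of transformed direct estimates on transformed
    model-based estimates, tested against the identity line Y = X.

    Hypothetical sketch: inputs are arrays of small area counts, so a
    square root transformation is used to stabilise the residual variance.
    """
    x = np.sqrt(np.asarray(model_est, dtype=float))
    y = np.sqrt(np.asarray(direct_est, dtype=float))
    n = len(x)

    res = stats.linregress(x, y)
    resid = y - (res.intercept + res.slope * x)
    s2 = np.sum(resid**2) / (n - 2)          # residual variance estimate
    sxx = np.sum((x - x.mean())**2)
    se_slope = np.sqrt(s2 / sxx)
    se_int = np.sqrt(s2 * (1.0 / n + x.mean()**2 / sxx))

    # t-tests of intercept = 0 and slope = 1 against the identity line
    p_int = 2 * stats.t.sf(abs(res.intercept / se_int), df=n - 2)
    p_slope = 2 * stats.t.sf(abs((res.slope - 1.0) / se_slope), df=n - 2)
    return {"intercept": (res.intercept, se_int, p_int),
            "slope": (res.slope, se_slope, p_slope)}
```

As footnote 3 warns, these significance values ignore the dependence between the X and Y values, so they should be read as indicative.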

The use of this diagnostic is straightforward when the focus of interest is on small area totals since

unbiased direct estimators of such totals are typically available. The use of transformations to stabilise the

residual variance in the plot will of course introduce a slight bias, but we feel that this is acceptable.

However the issue becomes more complex when the focus of interest is on small area proportions, because

the denominator of the direct estimator of such a proportion is typically a random variable and so the

proportion is in effect a ratio estimator and hence biased. We have adopted two different strategies in this case:

(I) Concentrate on the numerator of the estimated proportion - the estimated small area total - since this can be estimated without bias.

(II) Compare the direct and model-based estimators of the proportion and accept that the resulting ratio bias may slightly distort the interpretation of the diagnostic.

³The calculated significance values do not allow for the fact that the X values are derived from the same

data as the Y values. This will often make the true rejection probability of the test lower than its face value

– in which case an apparent rejection would be more significant than it seemed.


The relative attractiveness of these two options depends in part on characteristics of the sample and of the

population of the small areas of interest. In the case of strategy (I) there is a danger that, if these population

sizes vary a great deal, the pattern shown in the scatterplot will owe more to this variability than to the

biasedness or otherwise of the model-based estimators of the small area proportions. In the case of strategy

(II) the lower the coefficient of variation of the denominator of the small area proportion, the lower the risk

of serious bias in the direct estimate of the proportion, and hence the more applicable the diagnostic.

Finally, if the model underlying the small area estimates is actually fitted using proportions, strategy (II)

can also be interpreted in the following way. It provides a way of looking for bias due to model

misspecification - but cannot be expected to discover any bias in the direct estimates which were used in

fitting the model.

Example

In this example the variable of interest is the number of individuals who are unemployed according to the

International Labour Organisation definition (“ILO unemployed”) in 406 LAD/UAs (Local Authority

Districts and Unitary Authorities) in Great Britain. Direct estimates for this variable are available annually

from the Local Area Data Base of the UK Labour Force Survey (LFS). A number of logistic models for the

probability of being unemployed in a LAD/UA were fitted to these data and used to define model-based

estimates for these small areas. See Ambler et al (2001) for further details. Here we focus on the modified

Fay-Herriot approach concentrated upon in that paper. The model includes covariates defined by the

claimant count in six age by sex cells within each LAD/UA (the claimant count is the number of people

who claim unemployment benefits), as well as age by sex effects, LAD/UA regional effects and LAD/UA

socio-economic classification effects. In addition, the model includes an extra area level effect, defined by

the logit of the total LAD/UA claimant count (as a proportion of the LAD/UA population), which is used as a

measure of the overall economic activity within the LAD/UA, and consequently reflects an individual's

opportunity to obtain employment in that area.

The OLS regression parameters, with standard errors in brackets, from the bias scatterplot for the five years

1995/1996 to 1999/2000 are given in Table 1. None of the regression lines show a significant difference

from Y=X. A visual illustration is given in Figure 1 for 1999/2000, where the Y=X and regression lines

show very little disparity.

Year        Intercept        Slope
1995/1996    0.463 (1.062)   0.989 (0.014)
1996/1997   -1.354 (1.060)   1.010 (0.014)
1997/1998   -0.832 (1.108)   1.003 (0.016)
1998/1999   -0.615 (1.104)   0.999 (0.017)
1999/2000   -1.927 (1.135)   1.017 (0.018)

Table 1 OLS regression parameters from bias scatterplots


Figure 1 Bias scatterplot for 1999/2000 with Y=X and regression lines fitted (X: sqrt(model-based estimate), Y: sqrt(direct estimate))

The interpretation changes when scatterplots and regression lines are fitted for the estimated proportions of individuals who are ILO unemployed for LAD/UAs. Figure 2 shows the scatterplot for 1999/2000, with regression line Y = -0.0145 (0.009) + 1.059 (0.048) X. This shows more disparity from Y=X, and hence more possible bias, although the evidence is still not statistically significant.

Figure 2 Bias scatterplot for proportions for 1999/2000 with Y=X and regression lines fitted (X: sqrt(model-based estimate), Y: sqrt(direct estimate))

2.2 A goodness of fit diagnostic

We want our model estimates to be close to the direct estimates when the direct estimates are good. We inversely weight their squared difference by their variance and sum over all areas – this sum gives more weight to differences from good direct estimates than from bad. We test this sum against the $\chi^2$ distribution to provide a parametric significance test of bias of the model estimates relative to their precision.


As a check for unconditional bias in the model-based estimates we use a Wald goodness of fit statistic to

test whether there is a significant difference between the expected values of the direct estimates and the

model-based estimates.

In order to describe this test, we assume that the variable of interest is unemployment status and the

available data consist of direct estimates (from the LFS) of the population proportion unemployed by age

and sex in each small area, together with model-based estimates of the population proportion unemployed

in each small area. Let i denote age-sex class and j denote small area. We assume that age-sex classes are

finely enough defined so that within an age-sex class in a small area there is little or no variation in the sample weights. This allows us to define an "average" weight for all individuals in age-sex class i in small area j,

(1)  $w_{ij} = \dfrac{\sum_{k \in s(ij)} w_k}{n_{ij}}$

and to therefore approximate the direct estimate $\hat{z}_{ij}$ of the population proportion unemployed in age-sex class i in small area j by

(2)  $\hat{z}_{ij} \approx \dfrac{w_{ij}\sum_{k \in s(ij)} U_k}{w_{ij} n_{ij}} = \dfrac{\sum_{k \in s(ij)} U_k}{n_{ij}} = \hat{p}_{ij}$

where $s(ij)$ denotes the individuals in age-sex class i who are in the survey sample in small area j, $U_k$ takes the value 1 if individual k is unemployed and zero otherwise, and $w_k$ is the survey sample weight attached to individual k. Note that $\hat{p}_{ij}$ is just the sample proportion of interest in age-sex class i and small area j. The corresponding approximation to the direct estimate of the proportion of unemployed in the small area is then

(3)  $\hat{z}_j \approx \dfrac{\sum_i w_{ij} n_{ij} \hat{p}_{ij}}{\sum_i w_{ij} n_{ij}} = \dfrac{\sum_i w_{ij} u_{ij}}{\sum_i w_{ij} n_{ij}}$

where $u_{ij}$ denotes the total number of unemployed in the sample in age-sex class i in small area j. Since the sample design of the LFS is essentially simple random sampling within a small area, we can model the sample counts $\{u_{ij}\}$ and $\{n_{ij}\}$ as realisations of correlated multinomial random variables. In particular, let $\pi_{ij}$ be the probability that a randomly chosen individual in age-sex class i and small area j has the characteristic of interest and $\phi_{ij}$ be the probability that a randomly chosen individual in small area j is in age-sex class i. Then

(4)  $E(u_{ij}) = n_j \pi_{ij}\phi_{ij}$, $\mathrm{var}(u_{ij}) = n_j \pi_{ij}\phi_{ij}(1 - \pi_{ij}\phi_{ij})$, $\mathrm{cov}(u_{ij}, u_{i'j}) = -n_j \pi_{ij}\phi_{ij}\pi_{i'j}\phi_{i'j}$

(5)  $E(n_{ij}) = n_j \phi_{ij}$, $\mathrm{var}(n_{ij}) = n_j \phi_{ij}(1 - \phi_{ij})$, $\mathrm{cov}(n_{ij}, n_{i'j}) = -n_j \phi_{ij}\phi_{i'j}$

and

(6)  $\mathrm{cov}(u_{ij}, n_{ij}) = n_j \pi_{ij}\phi_{ij}(1 - \phi_{ij})$, $\mathrm{cov}(u_{ij}, n_{i'j}) = -n_j \pi_{ij}\phi_{ij}\phi_{i'j}$.


First order approximations to the expected value and variance of $\hat{z}_j$ are

(7)  $E(\hat{z}_j) \approx \dfrac{\sum_i w_{ij}\pi_{ij}\phi_{ij}}{\sum_i w_{ij}\phi_{ij}} = \zeta_j$,  $\mathrm{var}(\hat{z}_j) \approx \dfrac{\mathbf{w}_j^T\,\mathrm{var}(\mathbf{u}_j - \zeta_j\mathbf{n}_j)\,\mathbf{w}_j}{(\mathbf{w}_j^T E(\mathbf{n}_j))^2}$

where $\mathbf{u}_j$, $\mathbf{n}_j$ and $\mathbf{w}_j$ are the vectors with components $\{u_{ij}\}$, $\{n_{ij}\}$ and $\{w_{ij}\}$ respectively. Furthermore, the components of $\mathrm{var}(\mathbf{u}_j - \zeta_j\mathbf{n}_j)$ are given by

(8)  $\mathrm{var}(u_{ij} - \zeta_j n_{ij}) = n_j\left[\pi_{ij}\phi_{ij}(1 - \pi_{ij}\phi_{ij}) - 2\zeta_j\pi_{ij}\phi_{ij}(1 - \phi_{ij}) + \zeta_j^2\phi_{ij}(1 - \phi_{ij})\right]$

and

(9)  $\mathrm{cov}(u_{ij} - \zeta_j n_{ij},\ u_{i'j} - \zeta_j n_{i'j}) = -n_j\phi_{ij}\phi_{i'j}(\pi_{ij} - \zeta_j)(\pi_{i'j} - \zeta_j)$.

Typically, small area-level model-based and direct survey estimates will be approximately uncorrelated. Consequently, a Wald statistic for testing the small area-level goodness-of-fit of a model-based set of estimates of interest is

(10)  $W = \sum_j \dfrac{(\hat{z}_j - \hat{\zeta}_j)^2}{\hat{V}(\hat{z}_j) + \hat{V}(\hat{\zeta}_j)}$

where $\hat{\zeta}_j$ is the model-based estimate of the proportion of the small area j population that are unemployed, $\hat{V}(\hat{\zeta}_j)$ is its estimated variance and

(11)  $\hat{V}(\hat{z}_j) \approx \dfrac{\mathbf{w}_j^T\,\hat{V}(\mathbf{u}_j - \hat{\zeta}_j\mathbf{n}_j)\,\mathbf{w}_j}{(\mathbf{w}_j^T \hat{E}(\mathbf{n}_j))^2} = \dfrac{\mathbf{w}_j^T\,\hat{V}(\mathbf{u}_j - \hat{\zeta}_j\mathbf{n}_j)\,\mathbf{w}_j}{\hat{N}_j^2}$

where $\hat{N}_j$ is the survey estimate of the population of the small area and $\hat{V}(\mathbf{u}_j - \hat{\zeta}_j\mathbf{n}_j)$ is a matrix with diagonal components

(12)  $n_{ij}\left[\hat{\pi}_{ij}\left(1 - \dfrac{\hat{\pi}_{ij} n_{ij}}{n_j}\right) - 2\hat{\zeta}_j\hat{\pi}_{ij}\dfrac{n_j - n_{ij}}{n_j} + \hat{\zeta}_j^2\dfrac{n_j - n_{ij}}{n_j}\right]$

and off-diagonal components

(13)  $-\dfrac{n_{ij} n_{i'j}}{n_j}\left(\hat{\pi}_{ij} - \hat{\zeta}_j\right)\left(\hat{\pi}_{i'j} - \hat{\zeta}_j\right)$.

Here $\hat{\pi}_{ij}$ is the model-based estimate of the proportion unemployed in age-sex class i in small area j.
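The variance estimate (11), with components (12) and (13), can be sketched for a single small area as follows. This is a hypothetical Python illustration (function and variable names are ours): it builds the estimated covariance matrix and forms the quadratic form of (11).

```python
import numpy as np

def var_direct_estimate(n_ij, w_ij, pi_hat, zeta_hat, N_hat):
    """Estimated variance of the approximate direct estimate for one
    small area j, following equations (11)-(13).

    Hypothetical inputs: per age-sex class sample counts n_ij, average
    weights w_ij, model-based class proportions pi_hat, the model-based
    area proportion zeta_hat, and the survey population estimate N_hat.
    """
    n_ij = np.asarray(n_ij, float)
    w_ij = np.asarray(w_ij, float)
    pi_hat = np.asarray(pi_hat, float)
    n_j = n_ij.sum()

    # off-diagonal components, equation (13)
    d = pi_hat - zeta_hat
    V = -np.outer(n_ij * d, n_ij * d) / n_j
    # diagonal components, equation (12)
    diag = n_ij * (pi_hat * (1 - pi_hat * n_ij / n_j)
                   - 2 * zeta_hat * pi_hat * (n_j - n_ij) / n_j
                   + zeta_hat**2 * (n_j - n_ij) / n_j)
    np.fill_diagonal(V, diag)

    # equation (11): quadratic form over the squared population estimate
    return w_ij @ V @ w_ij / N_hat**2
```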


Under the hypothesis that the model-based estimates are equal to the expected values of the direct estimates, and provided the sample sizes in the small areas are sufficient to justify central limit assumptions, W will have a $\chi^2$ distribution with degrees of freedom equal to the number of small areas in the population.
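Given per-area direct and model-based estimates with their estimated variances, the Wald statistic (10) and its $\chi^2$ p-value can be computed as in this hypothetical sketch (names are ours, not from the paper):

```python
import numpy as np
from scipy import stats

def wald_statistic(z_direct, zeta_model, var_direct, var_model):
    """Wald goodness-of-fit statistic of equation (10): squared differences
    between direct and model-based estimates, weighted by the sum of their
    estimated variances and summed over small areas.

    Assumes the two sets of estimates are approximately uncorrelated.
    """
    z = np.asarray(z_direct, float)
    zeta = np.asarray(zeta_model, float)
    W = np.sum((z - zeta)**2 /
               (np.asarray(var_direct, float) + np.asarray(var_model, float)))
    df = len(z)  # one degree of freedom per small area
    return W, stats.chi2.sf(W, df)
```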

Example

We continue with the model-based approach introduced earlier for the same five years. The goodness of fit statistics are in Table 2. None of the statistics shows evidence against the $\chi^2$ distribution; in fact, the fit seems almost too good. This may be an artefact of including the estimated between-LAD/UA variance in the mean squared error of the model-based estimates, which errs on the side of caution.

Year        W        p-value
1995/1996   349.84   0.98
1996/1997   358.80   0.96
1997/1998   376.21   0.85
1998/1999   349.59   0.98
1999/2000   377.85   0.84

Table 2 Goodness of fit statistic values with p-values

2.3 A coverage diagnostic

95% confidence intervals for the direct estimates should contain the "truth" 95% of the time, and so should the confidence intervals surrounding the model-based estimates. We adjust both sets of intervals so that their chance of overlapping should be 95%, and count how often they actually do overlap. Assuming that the estimated coverage of the direct confidence intervals is correct, comparing the counts to the Binomial distribution provides a non-parametric significance test of the bias of model estimates relative to their precision.

This diagnostic evaluates the validity of the confidence intervals generated by the model-based small area

estimation procedure. It assumes that valid 95 percent confidence intervals for the small area values of

interest can be generated from the direct estimates. The basic idea then is to measure the overlap between

these direct confidence intervals and corresponding 95 percent confidence intervals generated by the

model-based estimation procedure. However, since the degree of overlap between two independent 95

percent confidence intervals for the same quantity will be higher than 95 percent, it is necessary to first

modify the nominal coverage levels of the confidence intervals being compared in order to ensure a

nominal 95 percent overlap.

This modification is based on the fact that if X and Y are two independent normal random variables with the same mean but different standard deviations, $\sigma_X$ and $\sigma_Y$ respectively, and if $z(\alpha)$ is such that the probability that a standard normal variable takes values greater than $z(\alpha)$ is $\alpha/2$, then a sufficient condition for there to be probability $\alpha$ that the two intervals $X \pm z(\beta)\sigma_X$ and $Y \pm z(\beta)\sigma_Y$ do not overlap is when

(14)  $z(\beta) = z(\alpha)\left(1 + \dfrac{\sigma_X}{\sigma_Y}\right)^{-1}\sqrt{1 + \dfrac{\sigma_X^2}{\sigma_Y^2}}$.

(The intervals fail to overlap exactly when $|X - Y| > z(\beta)(\sigma_X + \sigma_Y)$, and $X - Y$ is normal with mean zero and standard deviation $\sqrt{\sigma_X^2 + \sigma_Y^2}$.)

Consequently, this diagnostic takes $z(\alpha) = 1.96$, calculates $z(\beta)$ using the above formula, with $\sigma_X$ replaced by the estimated standard error of the model-based estimate and $\sigma_Y$ replaced by the estimated standard error of the direct estimate, and then computes the overlap proportion between the corresponding $z(\beta)$-based confidence intervals generated by the two estimation methodologies. Nominally, for $z(\alpha) = 1.96$, this overlap proportion should be 95 percent. Note that $z(\beta) = z(\alpha)$ when $\sigma_X = 0$.
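A hypothetical sketch of the overlap computation (names are ours, not from the paper): it applies equation (14) area by area, counts non-overlapping interval pairs, and compares the count with the nominal non-overlap rate via a Binomial test.

```python
import numpy as np
from scipy import stats

def coverage_diagnostic(model_est, se_model, direct_est, se_direct, alpha=0.05):
    """Adjusted-overlap coverage check.

    Hypothetical inputs: per-area model-based and direct estimates with
    their estimated standard errors. Returns the count of areas whose
    adjusted intervals fail to overlap, the number of areas, and a
    Binomial p-value against the nominal alpha non-overlap rate.
    """
    model_est = np.asarray(model_est, float)
    direct_est = np.asarray(direct_est, float)
    sx = np.asarray(se_model, float)   # sigma_X: model-based standard error
    sy = np.asarray(se_direct, float)  # sigma_Y: direct standard error

    z_alpha = stats.norm.isf(alpha / 2)  # 1.96 for alpha = 0.05
    # equation (14): adjusted critical value for a nominal 95% overlap
    z_beta = z_alpha * np.sqrt(1 + sx**2 / sy**2) / (1 + sx / sy)

    no_overlap = ((model_est + z_beta * sx < direct_est - z_beta * sy) |
                  (direct_est + z_beta * sy < model_est - z_beta * sx))
    k, n = int(no_overlap.sum()), len(no_overlap)
    return k, n, stats.binomtest(k, n, alpha).pvalue
```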

This diagnostic can also be used to assess the need to include a small area random effect in the model, by

just looking at the proportion of direct estimate-based confidence intervals that cover the model-based

estimates of the expected values of the small area quantities of interest. Ideally, if the model-based

estimator is essentially the small area quantity of interest, then around 5% of the small areas will record

such noncoverage. However, if small area level random effects are present (i.e. a multilevel model is more

appropriate than a single level model) then more than 5% of small areas will necessarily show

noncoverage. Used in this way, this diagnostic can be interpreted in two ways, as a test for bias in a single

level model, or as a test for whether a multilevel model is needed – the interpretation depending on whether

a single level model is known to be sufficient or not.
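The random-effects check just described can be sketched as follows (a hypothetical illustration; names are ours): count the direct-estimate confidence intervals that fail to cover the model-based estimates and compare the non-coverage rate with the nominal 5%.

```python
import numpy as np
from scipy import stats

def noncoverage_check(model_est, direct_est, se_direct, alpha=0.05):
    """Proportion of direct-estimate confidence intervals failing to cover
    the model-based estimates, compared against the nominal rate.

    Hypothetical sketch: if the model-based estimates were the expected
    values of the small area quantities, about alpha of the areas should
    show non-coverage; a clear excess suggests an area-level random
    effect (i.e. a multilevel model) is needed.
    """
    model_est = np.asarray(model_est, float)
    direct_est = np.asarray(direct_est, float)
    se_direct = np.asarray(se_direct, float)

    z = stats.norm.isf(alpha / 2)
    miss = ((model_est < direct_est - z * se_direct) |
            (model_est > direct_est + z * se_direct))
    k, n = int(miss.sum()), len(miss)
    return k / n, stats.binomtest(k, n, alpha).pvalue
```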

Example

We continue with the model-based approach introduced earlier for the same five years. Non-coverage totals and percentages are shown in Table 3 (we filter out zero direct estimates of unemployment). For 1995/1996 and 1998/1999 there is significant evidence to reject 5% non-coverage. However, this means we have over-coverage, i.e. the mean squared error of the model-based estimates is too large. As this errs towards giving conservative confidence intervals it is not a major cause for concern.

Year        Non-coverage     Percentage   p-value
1995/1996   11 out of 406    2.7%         0.03
1996/1997   13 out of 406    3.2%         0.11
1997/1998   17 out of 406    4.2%         0.54
1998/1999   11 out of 406    2.7%         0.03
1999/2000   13 out of 406    3.2%         0.11

Table 3 Non-coverage totals with percentages and p-values

2.4 A calibration diagnostic

Calculating how much modelled estimates differ from direct estimates when aggregated to larger domains shows us whether any particular larger domain is estimated worse than the others. For example, this may show that a model poorly estimates large urban areas whilst estimating large rural areas well. This provides some evidence regarding spatial bias/autocorrelation of the model estimates. However, the value of the evidence depends on the size of the domains in question.

The final diagnostic we consider is the amount of scaling required to calibrate a set of model-based small area estimates. This measure is based on what is typically a key requirement for small area estimates - that they sum to direct estimates at appropriate levels of aggregation. We refer to this property as calibration. The basis for this requirement is simple. Large sample sizes at higher levels of aggregation mean that the direct estimates can be considered to be accurate at these levels. Consequently, given two sets of model-based estimates, one that agrees with the direct estimates under appropriate aggregation and one that does not, we prefer the former. In practice, since model-based small area estimates are calibrated, usually by appropriate scaling, checking calibration after such scaling is irrelevant. However, by calculating the relative difference between the aggregated model-based estimates prior to this calibration and the aggregated direct estimates, we obtain a measure of how accurate the aggregated model-based estimates are, and a means to compare different models.

An interesting issue to consider when using this diagnostic is deciding the calibration level. Since the

aggregated direct estimates to which the aggregated model-based estimates are being compared are

themselves subject to sampling variation, it is inappropriate to calibrate at too low a level of aggregation. It

is important to identify this "cut-off" size when considering what calibration to perform.
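A hypothetical sketch of the calibration diagnostic (names are ours, not from the paper): aggregate both sets of estimates to each larger domain and report the percentage increase the uncalibrated model-based aggregate needs to match the direct aggregate.

```python
import numpy as np

def calibration_diagnostic(model_est, direct_est, domain):
    """Percentage scaling needed for uncalibrated model-based estimates
    to match aggregated direct estimates within each larger domain.

    Hypothetical inputs: per-area estimate arrays and an array of domain
    labels (e.g. Government Office Region) for each small area.
    """
    model_est = np.asarray(model_est, float)
    direct_est = np.asarray(direct_est, float)
    domain = np.asarray(domain)

    out = {}
    for d in np.unique(domain):
        m = model_est[domain == d].sum()
        z = direct_est[domain == d].sum()
        # percentage increase so the model aggregate equals the direct one
        out[d] = 100.0 * (z - m) / m
    return out
```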


Example

We continue with the model-based approach introduced earlier for the same five years. Our model-based

estimates are required to be consistent with direct estimates at three margins: 6 National age-sex

breakdowns; 12 Government Office Regions; 7 Socio-Economic classifications. We calculate how deviant

the uncalibrated model-based estimates are from these margins, i.e. how much calibration is required to

achieve consistency. The results are in Table 4, in terms of the percentage increases needed in the model-based estimates in each margin. Although none of the percentages is major, the 5th category in the National age-sex margin consistently requires the largest amount of calibration - this category is women aged 50+, for

whom the relationship between ILO unemployment and claimant count is known to be different from other

age-sex categories. Overall the calibration required is increasing over time (as can be seen by counting the

number of values over 1% per year). Clearly, future performance of the model will need to be monitored,

although at present these percentage differences are not excessive.

National age-sex (6 categories):
1995/1996:  0.4  -0.6   0.3   0.3   1.8  -1.1
1996/1997: -0.1  -0.4   0.5   0.3   2.2  -0.4
1997/1998: -0.3  -1.2   0.0   0.6   2.8   0.8
1998/1999: -0.1  -0.4   0.7   1.1   2.2  -0.6
1999/2000:  0.1  -0.9   0.7   0.8   3.9   1.6

Government Office Region (12 categories):
1995/1996:  0.9   0.4  -0.1   1.1   1.2  -0.1   0.6  -0.1   0.7   0.3   0.2   0.3
1996/1997: -0.8   0.5  -0.1   1.1  -0.5   1.4   1.5   0.8  -0.8   0.5   0.5   0.6
1997/1998:  0.7   0.5  -0.3   1.5   0.4   1.0   0.5  -0.8   0.0   1.3   0.2   0.0
1998/1999:  0.0   0.6   1.3   0.5   0.3   1.0   0.6  -0.2   1.6   1.2   0.1   0.0
1999/2000:  1.8   1.1   1.2   0.7   0.0   0.7   0.0   1.0   0.9   2.1  -0.3   0.7

Socio-Economic Classification (7 categories):
1995/1996:  0.3  -0.3   0.5   0.4   0.6   0.8  -0.5
1996/1997:  0.2   0.1   0.6   0.6   0.6   0.6  -0.3
1997/1998:  0.6   0.3   0.0  -0.2   0.6   1.0  -2.1
1998/1999:  1.3   1.0   1.2  -0.6   0.1   0.9   1.7
1999/2000:  1.0   1.3   0.7   1.0   0.4  -0.2   2.3

Table 4 Percentage increases needed in the model-based estimates, by margin, to achieve consistency with direct estimates

3. CONCLUDING REMARKS

In the previous section we presented four diagnostics that we have found useful both for assessing the "fit" of a set of model-based small area estimates and for comparing competing estimation methods (and models). However, there are a number of other diagnostics that are currently under development. The most

relevant is a test of the robustness of the small area model to slight changes in the sample data. One

approach is via cross-validation, splitting the sample data into smaller subsets, fitting the same model to

each, and deriving a corresponding set of small area estimates. If the subsets are large enough to be

representative of the population we would expect similar models to result and similar estimates to be

obtained from each subset. The major problem here is deciding how to split the original sample data, and

whether reweighting is appropriate. With unit level sample data from each of the small areas of interest,

this should be reasonably straightforward. Unfortunately however, this is not always the case (e.g. the LFS

example, where the data consist of direct estimates by age, sex and small area). We would welcome

comments both on the methods we are currently using, and on ways in which we could add to our

diagnostic repertoire.


REFERENCES

Ambler, R., Caplan, D., Chambers, R., Kovacevic, M. and Wang, S. (2001), “Combining

unemployment benefits data and LFS data to estimate ILO unemployment for small areas: An

application of a modified Fay-Herriot method”, Proceedings of the International Association of

Survey Statisticians, Meeting of the International Statistical Institute, Seoul, August 2001.

ONS (2001), “Small Area Estimation Project”, unpublished report, London, U.K.: Office for National

Statistics.