
METRON - International Journal of Statistics
2008, vol. LXVI, n. 1, pp. 21-49

HARM JAN BOONSTRA – JAN A. VAN DEN BRAKEL – BART BUELENS – SABINE KRIEG – MARC SMEETS

Towards small area estimation at Statistics Netherlands

Summary - Official releases produced by Statistics Netherlands are predominantly based on design-based or model-assisted estimation procedures. In the case of small sample sizes, however, model-based procedures can be used to improve the precision of these design-based estimators. In this paper two lines of model-based small area estimation are discussed: the use of linear mixed models to borrow strength from other areas and multivariate time series models to borrow strength from both previous time periods and other areas. The different approaches are applied to the estimation of annual and monthly unemployment figures. It is discussed how these model-based approaches can be further developed before they are implemented in the regular survey process to compile official releases.

Key Words - Labour Force Survey; Linear mixed models; Model-based estimation; Official statistics; Structural time series models.

1. Introduction

The purpose of survey sampling is to obtain statistical information about a finite population, by selecting a probability sample from this population, measuring the required information about the units in this sample and estimating finite population parameters such as means, totals and ratios. The statistical inference in the traditional design-based approach is based on the stochastic structure induced by the sampling design. Parameter and variance estimators are derived under the concept of repeatedly drawing samples from a finite population according to the same sampling design. A well known design-based estimator is the Horvitz-Thompson (HT) estimator, developed by Narain (1951), and Horvitz and Thompson (1952) for unequal probability sampling from finite populations without replacement. The precision of the HT estimator can be improved by exploiting correlations with auxiliary variables known for the complete population, resulting in model-assisted generalized regression (GREG) estimators (Särndal et al., 1992).

Received November 2007 and revised February 2008.


In the model-based approach the probability structure of the sampling design plays a less pronounced role, since the inference is based on the probability structure of an assumed statistical model, see e.g. Valliant et al. (2000). Statistics Netherlands is, like many other European national statistical institutes (NSIs), rather reserved in the application of such model-based estimation procedures. The prevailing opinion is that official statistics are preferably based on empirical evidence and as little as possible on model assumptions. As a result, traditional design-based or model-assisted procedures like GREG estimators are generally applied for producing official statistics.

The property of (approximate) design-unbiasedness of design-based estimators is useful for large sample sizes, giving a form of robustness to the resulting estimates, but is often incompatible with reliable estimates for smaller sample sizes. For small sample sizes, design-unbiasedness generally goes hand in hand with large design-variances. The sampling design can be improved to obtain more reliable estimates for some purposes, such as estimation for small subpopulations, see e.g. Marker (2001) and Rao (2003), Section 2.6. However, most surveys are multi-purpose, and there is often a trade-off between efficiencies for the various purposes. In any case, it appears that not enough can be gained from adapting sampling designs alone; one also needs to bring in relevant information from other sources to improve the estimation. This information can be included using a model-based procedure. The estimation of subpopulation parameters for which insufficient data are available to apply design-based or model-assisted procedures is the realm of small area estimation (SAE). Rao (2003) gives a comprehensive overview of SAE procedures.

Models can be used to borrow strength from various sources of information. In a typical small area setting in official statistics there may be the following sources of information:

• A survey is often conducted regularly, e.g. every month or every year. Survey data from preceding periods, generally summarized in the form of estimates, provide valuable information for estimation in the current period.

• The small area estimands exhibit some degree of similarity to each other, i.e. information about the estimand in one area provides some information about the estimands in other areas as well. In the case of geographic areas, a model with a spatial correlation structure can account for the relative locations of the areas, in the sense that nearby areas are expected to be more similar than areas far apart.

• Auxiliary information related to the characteristics of interest is often available from registrations. The availability of data from registrations is rising. Such information can be used in the form of covariates in the model. Auxiliary information may be available at the unit or area level, or both.


Several developments make model-based procedures increasingly attractive and relevant to NSIs for the production of official statistics (Chambers et al., 2006). There is a growing demand for detailed statistics at a level where sample sizes are small. Small sample sizes also arise in the production of timely short-term economic indicators. Since there is a strong demand for these timely indicators, many NSIs work with provisional releases based on the data obtained in the first part of the data collection period. Finally, there is a persistent pressure for NSIs to reduce costs and response burden for businesses by replacing survey data with register data.

This paper describes two applications of SAE to the Dutch Labour Force Survey (LFS). A short description of the LFS is provided in Section 2. Section 3 focuses on the use of mixed models to borrow strength from other areas (geographic subpopulations) to produce annual municipal unemployment figures. Section 4 focuses on the use of structural time series modeling to borrow strength from other time periods and domains (socio-demographic subpopulations) to estimate monthly unemployment rates. For small sample sizes there can be no large sample robustness, and it becomes more important to evaluate the assumptions underlying the model. Possible approaches are described in Section 5. The paper concludes with a discussion about the use of model-based procedures in official statistics in Section 6.

2. The Dutch Labour Force Survey

The objective of the LFS is to provide reliable information about the labour market. Until September 1999, the LFS was conducted as a cross-sectional survey. In October 1999, the LFS changed to a rotating panel design in which the respondents are interviewed five times at quarterly intervals.

Each month a sample of addresses is selected through a stratified two-stage cluster design. Strata are formed by geographic regions. Municipalities are considered as primary sampling units and addresses as secondary sampling units. All households residing at an address, up to a maximum of three, are included in the sample. During the period that the LFS was conducted as a cross-sectional survey, the gross sample size averaged about 10,000 addresses monthly. Since the LFS has to provide accurate outcomes on unemployment, addresses that occur in the register of the Employment Exchange were oversampled. Furthermore, addresses with only persons aged 65 years and over are undersampled, since most target parameters of the LFS concern people aged 15 through 64 years. Since the changeover to the rotating panel design, the gross sample size has averaged about 8,000 addresses monthly and the oversampling of addresses that occur in the register of the Employment Exchange has stopped.


Under the cross-sectional design and in the first wave of the panel, data are collected by means of computer assisted personal interviewing (CAPI). In the four subsequent waves of the panel, data are collected by means of computer assisted telephone interviewing (CATI). During these re-interviews a condensed questionnaire is applied to establish changes in the labour market position of the household members aged 15 years and over. When a household member cannot be contacted, proxy interviewing by members of the same household is allowed in each wave.

The weighting procedure of the LFS is based on the GREG estimator (Särndal et al., 1992). The inclusion probabilities reflect the oversampling and undersampling of addresses described above as well as the different response rates between geographic regions. The weighting scheme is based on a combination of different socio-demographic categorical variables.

The population aged 15 through 65 is divided into three groups, namely the employed labour force, the unemployed labour force and the group that does not belong to the labour force. The population fractions belonging to these groups are important parameters of the LFS. Another important parameter is the unemployment rate, which is defined as the ratio of the unemployed labour force to the labour force. Because the monthly sample size of the LFS is too small to publish reliable monthly figures using the GREG estimator, moving averages over the preceding three months are published. The yearly sample size is also too small to produce reliable annual figures for separate municipalities. Therefore, annual figures are only produced for large municipalities. In the next sections, model-based estimation procedures are applied to improve annual municipal unemployment figures and monthly unemployment figures.

3. Borrowing strength over space

As mentioned in the introduction, design-based or direct estimates for small areas have large sampling variances and can be improved using explicit models in which the individual areas are linked in some way. The resulting model-based small area estimates thereby borrow strength from data about other areas. In addition, it is important to include relevant covariates at the area or unit level in the model to further improve the estimates.

The focus in this section is on the estimation of municipal unemployment fractions, based on a full year's LFS data. The LFS data are reasonably well spread out over the year, so these fractions can be thought of as time-averages over the year. Several models and corresponding small area estimators are compared in a simulation study. More information on these models, estimators and their mean squared errors (MSEs) can be found in Rao (2003).


3.1. The basic area level model

A popular model for SAE is the basic area level or type A model, also known as the Fay-Herriot model (Fay and Herriot, 1979). The data in this model are direct estimates $\hat{\theta}_i$, supposed to be (approximately) unbiased for the municipal unemployment fractions $\theta_i$, and corresponding variance estimates $\psi_i$. The area index $i$ runs from 1 to the number $m$ of areas (municipalities). The complete type A model is
$$\hat{\theta}_i = \theta_i + \epsilon_i, \qquad \epsilon_i \stackrel{ind}{\sim} N(0, \psi_i), \qquad (1)$$
$$\theta_i = \beta' Z_i + v_i, \qquad v_i \stackrel{iid}{\sim} N(0, \sigma_v^2), \qquad (2)$$
in which $Z_i$ is a vector of known covariates for the $i$th area, $\beta$ is the corresponding vector of fixed effects, and $v_i$ are random area effects.

As $\hat{\theta}_i$, $i = 1, \ldots, m$, are the input data for the area level model, (1) can be viewed as a measurement equation with errors $\epsilon_i$, which in this case are mainly due to sampling. The second line (2) of the model is the structural part, which links the areas through the common coefficients $\beta$. The model parts can be combined into $\hat{\theta}_i = \beta' Z_i + v_i + \epsilon_i$, which can be recognized as a linear mixed model, and estimation proceeds using the method of Empirical Best Linear Unbiased Prediction (EBLUP). The EBLUPs for $\theta_i$ based on the type A model are
$$\hat{\theta}_i^A = \hat{\beta}' Z_i + \hat{v}_i = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i) \hat{\beta}' Z_i, \qquad \hat{\beta} = \left( \sum_{i=1}^m \hat{\gamma}_i Z_i Z_i' \right)^{-1} \sum_{i=1}^m \hat{\gamma}_i Z_i \hat{\theta}_i, \qquad (3)$$
where $\hat{\gamma}_i = \hat{\sigma}_v^2 / (\psi_i + \hat{\sigma}_v^2)$ are the estimated ratios of model variance to total variance. Various methods exist to estimate $\sigma_v^2$ (Rao, 2003). In this study the Fay-Herriot moments estimator is used, which in Datta et al. (2005) is shown to perform well compared to some other estimators, in particular with respect to the MSE estimators for the small area predictors.

In contrast to direct estimators, model-based estimators are even defined for areas where there is no survey data at all. For such areas (3) reduces to $\hat{\beta}' Z_i$, which is the limit of $\hat{\theta}_i^A$ as $\psi_i \to \infty$.

The basic area level model is appealing to most survey statisticians since complex sampling designs can be taken into account via the input estimates $\hat{\theta}_i$ and $\psi_i$. The $\hat{\theta}_i$ can be design-unbiased HT estimators, or approximately design-unbiased GREG estimators. The estimates (3) can then be viewed as model-based improvements of the direct design-based estimates, and they inherit the property of design-consistency, since $\hat{\gamma}_i \to 1$ as the area sample size $n_i$ grows large. Design-consistency is a useful property for areas with relatively large sample sizes.
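As a concrete illustration, the EBLUP (3) combined with a moments-type estimating equation for $\sigma_v^2$ fits in a few lines of numpy. This is our own illustrative sketch, not the implementation used at Statistics Netherlands; the function name and starting value are assumptions.

```python
import numpy as np

def fay_herriot_eblup(theta_d, psi, Z, n_iter=100, tol=1e-10):
    """EBLUP under the basic area level (type A) model.

    theta_d : (m,) direct estimates, psi : (m,) their design variances,
    Z : (m, p) area level covariates.  sigma2_v is estimated by solving the
    moments equation sum_i w_i e_i^2 = m - p, with w_i = 1/(psi_i + sigma2_v)
    and e_i the GLS residuals, by Newton-type iteration."""
    m, p = Z.shape

    def gls_beta(sigma2_v):
        w = 1.0 / (psi + sigma2_v)
        return np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * theta_d))

    sigma2_v = max(np.var(theta_d) - psi.mean(), 0.0)  # crude starting value
    for _ in range(n_iter):
        w = 1.0 / (psi + sigma2_v)
        resid = theta_d - Z @ gls_beta(sigma2_v)
        f = np.sum(w * resid**2) - (m - p)
        step = f / np.sum(w**2 * resid**2)  # Newton step, dbeta/dsigma2 ignored
        sigma2_v = max(sigma2_v + step, 0.0)
        if abs(step) < tol:
            break
    gamma = sigma2_v / (psi + sigma2_v)
    beta = gls_beta(sigma2_v)
    eblup = gamma * theta_d + (1.0 - gamma) * (Z @ beta)
    return eblup, beta, sigma2_v, gamma
```

Note that weighting by $\hat{\gamma}_i$ in (3) and by $1/(\psi_i + \hat{\sigma}_v^2)$ gives the same $\hat{\beta}$, since the two differ only by the constant factor $\hat{\sigma}_v^2$.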

When (part of the) auxiliary information is defined and available at the unit level, there is a choice between using population or sample aggregates as covariates in (2). Since both response and covariate sample aggregates are affected by the same sampling and non-response mechanisms, it seems more natural to use the sample aggregates in the model. In that case, denoting by $z_i$ and $Z_i$ the vectors of sample and population aggregates for area $i$, respectively, the EBLUP for $\theta_i$ becomes
$$\hat{\theta}_i^{A*} = \hat{\beta}' Z_i + \hat{v}_i = \hat{\gamma}_i \left( \hat{\theta}_i + \hat{\beta}' (Z_i - z_i) \right) + (1 - \hat{\gamma}_i) \hat{\beta}' Z_i, \qquad \hat{\beta} = \left( \sum_{i=1}^m \hat{\gamma}_i z_i z_i' \right)^{-1} \sum_{i=1}^m \hat{\gamma}_i z_i \hat{\theta}_i. \qquad (4)$$

The direct component $\hat{\theta}_i$ is effectively replaced by $\hat{\theta}_i + \hat{\beta}'(Z_i - z_i)$ with coefficients estimated at the area level. Provided that $\hat{\theta}_i$ and $z_i$ are (approximately) design-unbiased for $\theta_i$ and $Z_i$, respectively, this estimator, known as a survey regression (SREG) estimator (Battese et al., 1988), is approximately design-unbiased for $\theta_i$. Alternatively, one may use SREG estimators, fitted at either the area or unit level, directly as the input estimates $\hat{\theta}_i$ in the type A model.

It is important to provide reliable variance estimates $\psi_i$ for the estimates $\hat{\theta}_i$, since the weights $\hat{\gamma}_i$ in the EBLUPs depend directly on them. Individual estimates of the design variances of the direct estimates may be highly unstable, i.e. have large design variances themselves, due to the small area sample sizes. A simple method to stabilize the variance estimates is to use a common pooled sample variance $S_p^2$ of the response variable over areas, instead of individual area variances $S_i^2$, so that
$$\psi_i = \frac{1 - n_i/N_i}{n_i} S_p^2, \qquad \text{where} \quad S_p^2 = \frac{1}{n - m} \sum_{i=1}^m (n_i - 1) S_i^2,$$
where $N_i$ is the population size in area $i$, $n_i$ the sample size, and $n = \sum_{i=1}^m n_i$.
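The pooled smoothing of the variance estimates can be sketched as follows; the helper and its argument names are our own, and every area is assumed to be sampled.

```python
import numpy as np

def pooled_variance_psi(y, area, N):
    """Smoothed variance estimates psi_i = (1 - n_i/N_i)/n_i * S2_p, with
    S2_p the pooled within-area sample variance.

    y : (n,) unit level responses, area : (n,) area labels 0..m-1,
    N : (m,) area population sizes."""
    m = len(N)
    n_i = np.bincount(area, minlength=m)
    # within-area sample variances S2_i (ddof=1); zero where n_i < 2
    S2 = np.zeros(m)
    for i in range(m):
        yi = y[area == i]
        if len(yi) > 1:
            S2[i] = yi.var(ddof=1)
    n = n_i.sum()
    S2_p = np.sum((n_i - 1) * S2) / (n - m)   # pooled variance
    psi = (1.0 - n_i / N) / n_i * S2_p        # with finite population correction
    return psi, S2_p
```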

3.2. Unit level models

Area estimates based on unit-level models are obtained by fitting the model to the data and using the fitted model to predict the response for unobserved units. The data now consist of the binary unemployment variable $y_{ij}$ for persons $j \in s_i$, the sample in area $i$. The area estimates based on a model $M$ are
$$\hat{\theta}_i^M = \frac{1}{N_i} \left( n_i \bar{y}_i + \sum_{j \in r_i} \hat{y}_{ij} \right), \qquad (5)$$
where $\bar{y}_i$ is the sample area mean, $r_i$ is the index set for unsampled units in area $i$, and $\hat{y}_{ij}$ are predictions based on the fitted model. Since measurement errors are ignored, the response values $y_{ij}$ for $j \in s_i$ represent themselves in the first term of (5). However, the sampling fractions in the LFS are small (on the order of 1% for the data of a year), so that (5) is well approximated by $\frac{1}{N_i} \sum_{j \in U_i} \hat{y}_{ij}$, where $U_i = s_i \cup r_i$ denotes the population in area $i$. Optimal predicted values under squared error loss are $\hat{y}_{ij} = E(y_{ij} \mid y)$, the conditional expectations given the data $y$. Restriction to the class of linear predictors gives the BLUP, which is optimal under normality.

The basic unit level or type B model in SAE is the nested error regression model of Battese et al. (1988). It is given by
$$y_{ij} = \beta' x_{ij} + v_i + \epsilon_{ij}, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, N_i, \qquad v_i \stackrel{iid}{\sim} N(0, \sigma_v^2), \quad \epsilon_{ij} \stackrel{iid}{\sim} N(0, \sigma_e^2), \qquad (6)$$
where $x_{ij}$ is a $p$-vector of covariates for individual $j$ in area $i$, $\beta$ is the corresponding $p$-vector of fixed effects, $v_i$ are random area effects, and $\epsilon_{ij}$ are residual errors. The type B model is again a linear mixed model, and EBLUPs are given by
$$\hat{\theta}_i^B = \hat{\gamma}_i \left( \bar{y}_i + \hat{\beta}' (\bar{X}_i - \bar{x}_i) \right) + (1 - \hat{\gamma}_i) \hat{\beta}' \bar{X}_i, \qquad \hat{\gamma}_i = \frac{\hat{\sigma}_v^2}{\hat{\sigma}_v^2 + \hat{\sigma}_e^2 / n_i}, \qquad \hat{\beta} = \left( X' \hat{V}^{-1} X \right)^{-1} X' \hat{V}^{-1} y. \qquad (7)$$

Here $\bar{x}_i$ is a $p$-vector of sample means for area $i$, $\bar{X}_i$ is the corresponding vector of population means, $X$ is the full $n \times p$ matrix of covariates, $y$ is the $n$-vector of response values, and $\hat{V} = \widehat{\mathrm{cov}}(y) = \hat{\sigma}_e^2 I_n + \hat{\sigma}_v^2 \oplus_{i=1}^m J_{n_i}$, where $I_n$ is the $n$-dimensional identity matrix, $J_{n_i}$ the $n_i \times n_i$ matrix with all elements 1, and $\oplus_{i=1}^m J_{n_i}$ the block diagonal matrix with the $J_{n_i}$ as diagonal blocks. Maximum likelihood estimates for $\sigma_e^2$ and $\sigma_v^2$ are used. The hierarchical Bayesian approach of integrating (7) over the posterior density of the variance parameters (Datta and Ghosh, 1991) is an alternative. Due to the large number of municipalities (over 400), this density is found to be quite sharply peaked, and the resulting small area estimates are very similar to the EBLUP estimates. This is also true for the corresponding MSE estimates. The contribution of the uncertainty in the variance parameters to the MSEs of the small area estimates is negligible in this application.

Note the similarity between (7) and (4), especially when the variance estimates $\psi_i$ in the latter are pooled. The main difference is that the fixed effects in (7) are estimated at the unit instead of the area level. One advantage of modeling at the unit level is that many more degrees of freedom are available to fit a model, so that more variables available from registrations can be used as covariates.
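The EBLUP (7) can be computed without ever forming the full $n \times n$ matrix $\hat{V}$, by exploiting its block structure ($V_i^{-1} = (I_{n_i} - (\gamma_i/n_i) J_{n_i})/\sigma_e^2$ per area). The sketch below, with our own function and argument names, takes the variance components as given (in the paper they are ML estimates):

```python
import numpy as np

def type_b_eblup(y, X, area, X_bar_pop, sigma2_v, sigma2_e):
    """EBLUP (7) under the nested error regression (type B) model.

    y : (n,) responses, X : (n, p) unit covariates, area : (n,) labels
    0..m-1 (every area sampled), X_bar_pop : (m, p) population means."""
    m, p = X_bar_pop.shape
    n_i = np.bincount(area, minlength=m)
    gamma = sigma2_v / (sigma2_v + sigma2_e / n_i)
    # GLS normal equations X' V^{-1} X beta = X' V^{-1} y, accumulated per area
    A = np.zeros((p, p))
    b = np.zeros(p)
    for i in range(m):
        idx = area == i
        Xi, yi = X[idx], y[idx]
        xs, ys = Xi.sum(axis=0), yi.sum()
        A += (Xi.T @ Xi - gamma[i] / n_i[i] * np.outer(xs, xs)) / sigma2_e
        b += (Xi.T @ yi - gamma[i] / n_i[i] * xs * ys) / sigma2_e
    beta = np.linalg.solve(A, b)
    y_bar = np.array([y[area == i].mean() for i in range(m)])
    x_bar = np.array([X[area == i].mean(axis=0) for i in range(m)])
    eblup = gamma * (y_bar + (X_bar_pop - x_bar) @ beta) \
        + (1.0 - gamma) * (X_bar_pop @ beta)
    return eblup, beta, gamma
```

With $\sigma_v^2 = 0$ the weights $\hat{\gamma}_i$ vanish and the EBLUP reduces to the synthetic estimate with ordinary least squares coefficients.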

Without the area effects $v_i$, model (6) is a simple linear regression model. The EBLUPs based on models without random effects are often called synthetic estimators in the survey sampling literature. The synthetic estimate based on the linear regression model for area $i$ is
$$\hat{\theta}_i^S = \hat{\beta}' \bar{X}_i, \qquad \hat{\beta} = \left( X' X \right)^{-1} X' y. \qquad (8)$$
Here, as in the last term of (7), the sample part $n_i \bar{y}_i$ in (5) has been approximated by the sum of the fitted values.

Since the variable unemployed is binary, a unit level model for binary data may be more appropriate. A possible model for binary data is
$$y_{ij} \stackrel{ind}{\sim} \mathrm{Be}_{y_{ij}}(p_{ij}), \qquad \mathrm{logit}(p_{ij}) = \beta' x_{ij} + v_i, \qquad v_i \stackrel{iid}{\sim} N(0, \sigma_v^2), \qquad (9)$$
where $\mathrm{Be}_z(p)$ denotes the Bernoulli distribution for $z$ with parameter $0 \le p \le 1$, and $\mathrm{logit}(p_{ij}) \equiv \log\left( \frac{p_{ij}}{1 - p_{ij}} \right)$. This model is known as the logistic-normal model, a member of the family of generalized linear mixed models. Early references on the use of the logistic-normal model for SAE are MacGibbon and Tomberlin (1989) and Malec et al. (1997). Empirical Bayes predictions $\hat{\theta}_i^{LN}$ for the small area means are
$$\hat{\theta}_i^{LN} = \frac{1}{N_i} \left( n_i \bar{y}_i + \sum_{j \in r_i} p_{ij}(\hat{\beta}, \hat{v}) \right), \qquad p_{ij}(\hat{\beta}, \hat{v}) = \mathrm{logit}^{-1}(\hat{\beta}' x_{ij} + \hat{v}_i) = \frac{1}{1 + \exp(-\hat{\beta}' x_{ij} - \hat{v}_i)}, \qquad (10)$$
where estimates of $\beta$ and $v_i$ are plugged in. The penalized quasi-likelihood method (Breslow and Clayton, 1993) is used to fit the logistic-normal model.

Prediction for non-linear models is computationally more cumbersome than for linear models for two reasons. First, the fitting of non-linear models is generally more difficult. Second, the sum over the non-sampled units in (10) cannot be simplified in terms of population and sample means as in (7) and (8) for linear models. It can at best be reduced to a sum over all unique configurations of auxiliary vectors $x_{ij}$ occurring in the population of area $i$.

If the random effects $v_i$ in (9) are set to zero, a standard logistic regression model is obtained. The resulting small area estimates are called logistic synthetic estimates.
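Given fitted $\hat{\beta}$ and $\hat{v}_i$ (from a PQL fit, which is not reproduced here), the prediction step (10) is a straightforward aggregation. A sketch with our own argument names:

```python
import numpy as np

def logistic_normal_area_means(beta, v, X_pop, area_pop, y_s, area_s, N):
    """Empirical Bayes area predictions (10) with plugged-in estimates.

    X_pop, area_pop : covariates and area labels of the NON-sampled units
    r_i; y_s, area_s : sampled responses and labels; N : (m,) area sizes."""
    m = len(N)
    eta = X_pop @ beta + v[area_pop]          # linear predictor per unit
    p = 1.0 / (1.0 + np.exp(-eta))            # logit^{-1}
    pred_sum = np.bincount(area_pop, weights=p, minlength=m)
    samp_sum = np.bincount(area_s, weights=y_s, minlength=m)
    return (samp_sum + pred_sum) / N
```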

3.3. Benchmarking

It is often desirable to benchmark the model-based small area estimates so that they add up to direct estimates for large areas. For example, model-based estimates can be benchmarked to the single national level GREG estimate. Also the estimates for the unemployment, employment and not in the labour force (nilf) fractions, which are estimated using separate univariate models, are subject to the constraint that they add up to 1. This results in the following restrictions:
$$\frac{1}{N} \sum_{i=1}^m N_i \hat{\theta}_{i;a}^{M;adj} = \hat{\theta}_a^{GREG} \qquad \text{for } a = 1, 2, 3, \qquad (11)$$
$$\sum_{a=1}^3 \hat{\theta}_{i;a}^{M;adj} = 1 \qquad \text{for } i = 1, \ldots, m, \qquad (12)$$
where index $a$ runs over the categories unemployed, employed and nilf.

The $m \times 3$ table of small area estimates $\hat{\theta}_{i;a}^M$ can be benchmarked to satisfy (11) and (12) using the criterion of weighted least squares with the inverse MSE estimates as weights (Battese et al., 1988). Stacking all $3m$ area estimates in a vector $\hat{\theta}^M$ and writing the $3 + m - 1 = m + 2$ restrictions (one is redundant) as $R \hat{\theta}^{M;adj} = r$ with $(m+2) \times 3m$ matrix $R$ and $(m+2)$-vector $r$, the method of Lagrange multipliers gives
$$\hat{\theta}^{M;adj} = \hat{\theta}^M + V R' (R V R')^{-1} (r - R \hat{\theta}^M),$$
where $V$ is a $3m \times 3m$ covariance matrix, taken to be diagonal with the model-based MSE estimates as elements.
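The adjustment itself is a single linear algebra step. A minimal sketch (our own helper; the caller must supply a restriction matrix of full row rank, i.e. with the redundant restriction dropped so that $R V R'$ is invertible):

```python
import numpy as np

def benchmark_adjust(theta, mse, R, r):
    """Weighted least squares benchmarking of stacked small area estimates:
    theta_adj = theta + V R'(R V R')^{-1}(r - R theta), with V = diag(mse),
    so that R theta_adj = r holds exactly."""
    VRt = mse[:, None] * R.T                  # V R'
    lam = np.linalg.solve(R @ VRt, r - R @ theta)
    return theta + VRt @ lam
```

Estimates with larger MSE absorb a larger share of the adjustment, which is the intended behaviour of the inverse-MSE weighting.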

3.4. Results from a simulation study

A simulation study is undertaken to compare the small area estimators discussed. Population data for the simulation are constructed by replicating the LFS sample data of one year, consisting of 44,687 households, to the population level using the inclusion weights, see Särndal et al. (1992), Section 11.6. Only first wave (CAPI) data are considered. The coordination of monthly samples is such that nearly all municipalities are sampled within a year. Therefore, the sampling design over a year can be regarded as stratified with respect to municipality. The simulation results are based on 100 samples. Each sample is a simple random sample of 44,687 households stratified with respect to the 444 municipalities.

Unemployment fractions for all municipalities are estimated using both design-based and model-based methods. The design-based estimators considered are HT, GREG, and SREG estimators. They share the property of (approximate) design-unbiasedness. The model-based estimators considered are those based on the models described previously. The main covariate used in this study is registered unemployment, which is a good explanatory variable for unemployment as measured in the LFS, even though there are large differences in definition between the two.

The main simulation results are displayed in Figure 1. Results are shown for the SREG estimator, and the model-based estimators based on linear regression, type A, type A with variances pooled, type B, and logit-normal models. The displayed simulation measures are (1) the square root of the MSE over the 100 simulation runs, (2) the square root of the simulation mean of the model-based MSE estimates, and (3) the coverage of estimated 95% confidence intervals. These measures have been computed for all individual areas, but shown are only their averages over three groups of municipalities, with small, moderate and large population sizes, respectively.

With regard to all simulation measures considered, the SREG estimator performs better than the other design-based estimators HT and GREG; the latter are not shown in the figures. The simulation standard errors of both GREG and HT estimates are approximately 10% larger than those of the SREG estimates. It is concluded that the model-based small area estimators, with the exception of synthetic estimators, perform better than the design-based ones. As expected, the difference is largest for the small municipalities. The synthetic estimates perform somewhat better than the SREG estimates in terms of MSE. However, the MSE estimates based on the fixed effects model are far too low, and hence coverage of estimated confidence intervals is very low. This relatively poor performance is partly due to the rather large dispersion of area unemployment fractions in our simulation population, presumably much larger than in the real population. This is a consequence of the way sample data have been used to construct the population, even though municipalities with no unemployed have been excluded. Nevertheless, standard model-based MSE estimates for the synthetic small area estimates are clearly not very robust.

Due to the simple structure of auxiliary information used, the logistic synthetic estimates are essentially the same as the synthetic estimates based on the linear regression model. Also, the simulation results for the alternative type A model estimates (4) are similar to those for the type A model.

[Figure 1 about here. Top: simulation root mean squared errors (RMSEs) for 6 estimators (survey regression, synthetic, type A, type A with pooled variances, type B, logistic-normal), averaged over municipalities with 0-20000, 20000-50000, and over 50000 inhabitants. Bottom: average MSE estimates corresponding to the 6 estimators. The numbers on top of the bars denote 95% coverage percentages.]

The use of SREG estimates fitted at the unit level as input estimates to the type A model does reduce the simulation MSEs somewhat, and in combination with pooled variance estimates the results are very similar to those of the type B model, as expected.

Pooling the variances in the type A area level model has a clear positive effect: MSEs are somewhat smaller, and the coverage is improved. Overall, the models with random area effects yield the best results in this simulation study, both in terms of MSEs of the point estimates and width and coverage of estimated confidence intervals. Type A (with pooled variances), type B and logit-normal models perform quite similarly. In particular, in this study of municipal unemployment fractions with registered unemployment as the only covariate, there is no compelling reason for using the logistic-normal model instead of a linear mixed model.

Figure 1 displays the simulation results before benchmarking. In a separate step, small area estimates are benchmarked to satisfy (11) and (12). Since aggregation yields estimates that are already in very close agreement with the constraints, benchmarking turns out to have only very small effects on the small area estimates. The only exception is the type A model without variance pooling, which yields estimates with a significant downward bias for unemployment at the national level. This bias disappears after pooling the variance estimates of the direct area estimates.

3.5. Software tool for SAE

To make model-based SAE available for the production of official statistics, Statistics Netherlands has started to develop a tool to support the required computations. This is a conveniently accessible tool, not a set of cumbersome scripts. As a first method the EBLUP based on the basic area level model is implemented. This method is well accepted in the literature and sophisticated enough to provide accurate small area estimates. Moreover, it can be deployed in a process that naturally follows the weighting of survey data, from which design-based estimates are derived as input.

The software tool can be launched from within SPSS and presents a graphical user interface in which various settings can be entered, such as the option to pool variance estimates. While the tool is implemented in the programming languages Visual Basic and C#, it interacts with the SPSS software to provide an integrated system to the user. The software makes direct estimates, fits the model using the Fay-Herriot moments estimator, and calculates EBLUP estimates, which, if requested, are made consistent with direct estimates at aggregated levels. Besides the calibrated EBLUP small area estimates, the SPSS output table contains MSE estimates, direct estimates and corresponding variance estimates. Estimated model parameters are given as well. At this stage the tool is prototype software. It is anticipated that future research outcomes will lead to modifications and enhancements of the software, such as the addition of alternative models. Once the SAE methodology is an accepted approach in the production of official statistics at Statistics Netherlands, the prototype can be used as a base for building a production grade software tool to be used in regular operations.


4. Borrowing strength over time and space

Most surveys conducted by NSIs operate continuously in time and are based on cross-sectional or rotating panel designs. SAE procedures that borrow strength from data collected in the past as well as cross-sectional data from other small domains are particularly interesting in such situations. The LFS, for example, is conducted continuously in time, and the monthly unemployment rate is correlated with the unemployment rate in the preceding periods. Therefore it is efficient to use data observed in preceding periods to improve the estimator for this parameter through time series modeling. This approach dates back to Scott and Smith (1974), who proposed to consider the true value of the finite population parameter as a realization of a stochastic process that can be described with a time series model.

The common approach to borrow strength over time and space is to allow for random domain and random time effects in a linear mixed model and apply a composite estimator like the EBLUP. Rao and Yu (1994) extended the area level model with an AR(1) model to combine cross-sectional data with information observed in preceding periods. In EURAREA (2004) linear mixed models that allow for spatial and temporal autocorrelation in the random terms are proposed for area and unit level models. A different approach is followed by Pfeffermann and Burck (1990) and Pfeffermann and Bleuer (1993). They combine time series data with cross-sectional data by modeling the correlation between the parameters of the separate domains in a multivariate structural time series model. Pfeffermann and Burck (1990) show how the Kalman filter recursions under particular state-space models can be restructured, like the EBLUP estimators, as a weighted average of a design-based estimator and a synthetic regression type estimator based on information observed in preceding sample surveys and other small domains.

4.1. Multivariate structural time series models for monthly unemployment rates

In this section the state-space approach for repeated surveys of Pfeffermann

and Burck (1990) and Pfeffermann and Bleuer (1993) is applied to develop

model-based estimates for the monthly unemployment rates for a classiﬁcation

of gender by age in six domains. The state-space approach is applied because

it can handle a very ﬂexible and powerful class of models that account for

trend, seasonal effects and auxiliary information. This approach has a high

practical value for ofﬁcial statistics. First, Pfeffermann and Burck (1990) and

Pfeffermann and Tiller (2006) made this estimation procedure more robust

against model misspeciﬁcation by benchmarking the sum of the small area

estimates to the direct estimates at an aggregated level. Second, seasonally

adjusted parameter estimates and their estimation errors are obtained as a by-


product of this estimation procedure. Third, this approach accounts for the

complexity of the sampling design, since the GREG estimates are the input

data for the model. Finally, this approach can be extended to models that

account for the rotation group bias and for the autocorrelation between the

panels of a rotating panel design (Pfeffermann, 1991 and Pfeffermann et al.,

1998). Key references to early papers that develop the state-space approach for

repeated surveys are Tam (1987) and Tiller (1992). In this paper, a relatively

simple model is discussed, where only the ﬁrst wave of the rotating panel

design of the LFS is used, and where no auxiliary information is included. An

application of a model that uses all waves of the LFS is given in Van den

Brakel and Krieg (2007).

Let θ̂_{i,t} denote the GREG estimate of the true unemployment rate θ_{i,t} of domain i and month t, based on the monthly samples for the following six domains: (1) Men, 15-24 years, (2) Women, 15-24 years, (3) Men, 25-44 years, (4) Women, 25-44 years, (5) Men, 45-64 years, (6) Women, 45-64 years. Each month a vector θ̂_t = (θ̂_{1,t}, ..., θ̂_{6,t})′ is observed. The time series of this vector is decomposed into a stochastic trend, a stochastic seasonal component and an irregular component, i.e.

θ̂_t = L_t + S_t + ε_t,

with L_t = (L_{1,t}, ..., L_{6,t})′ the vector of trends, S_t = (S_{1,t}, ..., S_{6,t})′ the vector of seasonal effects and ε_t = (ε_{1,t}, ..., ε_{6,t})′ the vector of irregular components.

For the stochastic trends, the so-called smooth trend model is assumed, i.e.

L_{i,t} = L_{i,t−1} + R_{i,t−1},   R_{i,t} = R_{i,t−1} + η_{R,i,t},   E(η_{R,i,t}) = 0,

Cov(η_{R,i,t}, η_{R,i′,t′}) = σ²_{R,i}    if t = t′ and i = i′,
                            = ζ_{R,i,i′}  if t = t′ and i ≠ i′,
                            = 0           if t ≠ t′.                         (13)

The parameters L_{i,t} and R_{i,t} are referred to as the trend and the slope parameters, respectively. For the seasonal components a trigonometric model is assumed:

S_{i,t} = Σ_{h=1}^{6} S_{i,t,h},

S_{i,t,h} = S_{i,t−1,h} cos(λ_h) + S*_{i,t−1,h} sin(λ_h) + η_{S,i,t,h},

S*_{i,t,h} = −S_{i,t−1,h} sin(λ_h) + S*_{i,t−1,h} cos(λ_h) + η*_{S,i,t,h},

λ_h = hπ/6,   for h = 1, ..., 6,

with E(η_{S,i,t,h}) = 0,

Cov(η_{S,i,t,h}, η_{S,i′,t′,h′}) = σ²_{S,i,h}  if t = t′ and i = i′ and h = h′,
                                 = 0           if t ≠ t′ or i ≠ i′ or h ≠ h′.      (14)


The assumptions of (14) also apply to the error terms η*_{S,i,t,h}. The variances of the error terms within each harmonic are equal, i.e. σ²_{S,i,h} = σ*²_{S,i,h}, where σ*²_{S,i,h} denotes the variance of η*_{S,i,t,h}. Furthermore it is assumed that Cov(η_{S,i,t,h}, η*_{S,i′,t′,h′}) = 0 for all i, i′, t, t′, h and h′. The irregular components ε_{i,t} are modeled as uncorrelated white noise processes. These irregular components contain the survey errors of the GREG estimates and the unexplained variation of the stochastic process used to model the true population parameter θ_{i,t}.
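As an illustration outside the paper, the smooth trend model (13) with a common slope correlation (the assumption later used for model TSM2) can be simulated in a few lines. All hyperparameter values below are hypothetical, chosen only to show how the cross-domain correlation ζ_{R,i,i′} = ρ σ_{R,i} σ_{R,i′} makes the trends of the domains move together.

```python
import numpy as np

rng = np.random.default_rng(1)
n_domains, n_months = 6, 120
rho, sigma_R = 0.85, 0.0005  # hypothetical values of rho and sigma_R,i

# Covariance of the slope disturbances eta_{R,i,t} across domains, cf. (13):
# sigma_R^2 on the diagonal, zeta = rho * sigma_R^2 off the diagonal.
cov = sigma_R**2 * ((1 - rho) * np.eye(n_domains)
                    + rho * np.ones((n_domains, n_domains)))

L = np.zeros((n_months, n_domains))  # trends L_{i,t}
R = np.zeros((n_months, n_domains))  # slopes R_{i,t}
for t in range(1, n_months):
    L[t] = L[t - 1] + R[t - 1]
    R[t] = R[t - 1] + rng.multivariate_normal(np.zeros(n_domains), cov)

# The slope disturbances are strongly correlated across domains,
# so the simulated trends move up and down more or less simultaneously.
dR = np.diff(R, axis=0)
print(np.corrcoef(dR[:, 0], dR[:, 1])[0, 1])  # close to rho
```

Setting rho to 0 reproduces the independence assumption of TSM1, under which each domain evolves on its own.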

The trend and the seasonal components of the time series model describe

how the unemployment rate in month t is related to the unemployment rates

in the preceding months for each particular domain. This shows how sample

information obtained in preceding periods is used to improve the estimates

for the unemployment rates in month t. The model uses sample information

from other domains through the correlation between the slope parameters of

the trend. This ensures that the trends in the estimated unemployment rates for

the different domains change more or less simultaneously, depending on the

estimated correlation. In a similar way, it is possible to model the correlation

between the seasonal components. Under the trigonometric model this will

result in a rather complex model with a large number of hyperparameters.

A more rigid approach, assuming that the seasonal components for different

domains are equal, can be applied instead.

The standard way to proceed, is to express the model in state-space repre-

sentation, assume that the error terms are normally distributed, and apply the

Kalman ﬁlter to obtain optimal estimates for the monthly unemployment rates

as well as the trend and the seasonal components in these series, see Harvey

(1989) or Durbin and Koopman (2001) for details. The analysis is conducted

with software developed in Ox in combination with the subroutines of SsfPack

(beta 3) (Doornik, 1998 and Koopman et al., 1999).
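The computations in the paper are carried out in Ox with SsfPack; purely as an illustration of the Kalman filter recursions for this model class, a minimal filter for a univariate smooth-trend-plus-noise model (state (L_t, R_t)′, observation y_t = L_t + ε_t) might be sketched as follows. The data and hyperparameter values are hypothetical.

```python
import numpy as np

def kalman_filter(y, var_eta, var_eps):
    """Kalman filter for a univariate smooth trend plus noise model:
    L_t = L_{t-1} + R_{t-1},  R_t = R_{t-1} + eta_t,  y_t = L_t + eps_t."""
    T = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition matrix
    Z = np.array([1.0, 0.0])                # observation vector
    Q = np.diag([0.0, var_eta])             # state disturbance covariance
    a = np.zeros(2)                         # initial state (L, R)
    P = np.eye(2) * 1e6                     # large initial state variance
    filtered = []
    for y_t in y:
        # prediction step
        a = T @ a
        P = T @ P @ T.T + Q
        # measurement update
        F = Z @ P @ Z + var_eps             # innovation variance
        K = P @ Z / F                       # Kalman gain
        a = a + K * (y_t - Z @ a)
        P = P - np.outer(K, Z @ P)
        filtered.append(Z @ a)
    return np.array(filtered)

# Hypothetical noisy observations around a slowly rising unemployment rate.
rng = np.random.default_rng(0)
t = np.arange(100)
y = 0.05 + 0.0002 * t + rng.normal(0, 0.01, size=100)
est = kalman_filter(y, var_eta=1e-8, var_eps=1e-4)
# The filtered series is much smoother than the observed series.
```

In practice the hyperparameters are estimated by maximum likelihood and a diffuse initialisation is used; the multivariate models of Section 4.1 stack the six domains into one state vector.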

4.2. Results

In this section, two models are applied to the series of the GREG estimates

of the monthly unemployment rates from 1996 through 2006. The ﬁrst model,

abbreviated as TSM1, only borrows strength over time, since it assumes a

univariate model for each domain by taking ζ

R,i,i

′

= 0in(13). The second

model, abbreviated as TSM2, assumes equal correlations between the slopes

of the smooth trend model, thus ζ

R,i,i

′

= ρ

R

σ

R,i

σ

R,i

′

in (13), and therefore

borrows strength both over time and space.

In Figure 2, the smoothed Kalman ﬁlter estimates for the slopes of the

six domains are plotted for TSM1 and TSM2. The ﬁgures illustrate that the

trend in the unemployment rate is declining during the ﬁrst three years, since

the slope parameters take negative values. From 2001 until 2005 the trend gradually increases, since the slope parameters take positive values during this period. From the second half of 2005 the trend in the unemployment rate is decreasing again. It follows from Figure 2a that the slopes in the univariate models for the six domains move more or less simultaneously up and down.

Figure 2. Smoothed estimates of slope parameters for models TSM1 (panel a) and TSM2 (panel b), for domains 1-6. [Plots not reproduced.]

Towards small area estimation at Statistics Netherlands 37

Model TSM2 takes advantage of this, by explicitly allowing for correlation

between the slopes. The maximum likelihood estimate for the correlation, ρ_R, equals 0.85. As can be seen in Figure 2b, the slopes for the domains are more

similar under this model.

The estimated seasonal patterns are almost equal under both models. For

domains 1, 4, 5, and 6, the models ﬁnd rather ﬂexible seasonal patterns. For

domains 2 and 3, the seasonal patterns are more or less time independent.

The seasonal patterns of domains 1 and 3 under model TSM1 are plotted as

examples in Figure 3.

Figure 3. Smoothed estimates of seasonal components for domains 1 (panel a) and 3 (panel b). [Plots not reproduced.]

In Figure 4 the GREG estimates and the ﬁltered estimates based on the two

models are compared for domains 2, 5, and 6. The ﬁltered estimates are

used because they are based on the complete set of information that would be

available in the regular production process to produce a model-based estimate

for month t, directly after ﬁnishing the data collection for that month. The

ﬁltered estimates under both models partly follow the ﬂuctuations in the GREG

series, since these ﬂuctuations are considered as time dependent seasonal effects

under the assumed model. A substantial part of the irregularities in the series

of the GREG estimates, however, is flattened out, since these are considered as survey errors under both models. The series of the filtered estimates are at the level of the series of the GREG estimates, indicating that there is no

obvious bias in the ﬁltered estimates.

Comparing the ﬁltered estimates obtained under both models shows that

the time dependent seasonal patterns are the same. Small differences occur

in the level of the series. The correlation between the slopes results in small

adjustments in the level of the estimated monthly unemployment rates. For

example, the estimated monthly unemployment rates in domain 6 are slightly

adjusted downward during the last two years of the series.


Figure 4. GREG estimates and filtered estimates of TSM1 and TSM2 for domains 2 (panels a, b), 5 (panels c, d) and 6 (panels e, f); the left panels compare GREG with TSM1, the right panels compare TSM1 with TSM2. [Plots not reproduced.]

In Figure 5 the standard errors of the ﬁltered estimates are compared with

the standard errors of the GREG estimates for domains 2 and 6. In Table I

the averages of the standard errors over the 12 months of 2006 are given for

the six domains. The variance of the GREG estimates is approximated with

the variance of the ratio of the GREG estimators for total unemployment and

total labour force under multistage sampling where the households are used

as the PSUs. The standard errors of the ﬁltered estimates of TSM1 are much

smaller than the standard errors of the GREG estimates. This illustrates that borrowing strength from preceding time periods increases the precision of the estimates substantially.

Figure 5. Standard errors of GREG estimates and filtered estimates for domains 2 (panel a) and 6 (panel b). [Plots not reproduced.]

Comparing the standard errors of TSM1 and TSM2 shows that the preci-

sion can be further improved by using information from other domains. The

additional gain, however, is relatively small compared to the reduction in the

standard errors that is obtained by borrowing strength from the past. The standard errors of the Kalman filter estimates do not reflect the bias due to model misspecification. This requires a careful model evaluation and selection, which is addressed in Section 5.

Table I: Standard errors of the GREG and filtered estimates for the two models, mean over the 12 months of 2006.

Estimator   Domain 1   Domain 2   Domain 3   Domain 4   Domain 5   Domain 6
GREG        0.0189     0.0239     0.0060     0.0080     0.0070     0.0101
TSM1        0.0097     0.0108     0.0031     0.0043     0.0033     0.0056
TSM2        0.0094     0.0096     0.0029     0.0041     0.0031     0.0054

5. Model evaluation and selection

It is always important to assess the assumptions underlying an estimation

procedure. Model assessment is particularly important in small area estima-

tion, since the data are sparse so that estimates are more sensitive to model

choice. A large literature exists on model comparison; see e.g. the overview

in Sorensen (2004) and references therein. The literature on model compari-

son and diagnostics speciﬁc to small area estimation is smaller, but growing.

Section 6 of Jiang and Lahiri (2006) provides a short overview and some key

references.

Measures for model selection usually weigh some form of goodness-of-ﬁt

against model complexity. However, goodness-of-ﬁt measures may not always

reﬂect the actual purpose of a study. Some types of lack of ﬁt may not

be important for the purpose of the study while other types may yield poor

results even for what seems “a small amount of lack of ﬁt”. For example,

models that yield good estimates for the overall population mean or total may

not be adequate for small area estimates. Cross-validation (CV) is a more

direct measure of the predictive power of a model that can be used to compare

models. Individual models can also be assessed using certain model diagnostics

that indicate whether a model is appropriate in some relevant aspect. Brown

et al. (2001) describe several diagnostics speciﬁc to small area estimation.

Harvey (1989) and Durbin and Koopman (2001) discuss several diagnostics for

state-space time series models. Model diagnostics should ideally indicate in

what directions a deﬁcient model can be improved.

Not all knowledge about a system and the data collection mechanism can

be accounted for in a model; the model would become too complicated. For

example, measurement errors are usually ignored in survey sampling. Neglect-

ing measurement errors in the model building process may have a large impact

on the results, however. Consider the realistic situation that each small area

is assigned to a single interviewer. In a study with large interviewer effects,

a model with random area effects would capture such effects as real whereas


in a model without area effects the interviewer effects would largely average

out. In this situation, a synthetic estimator may actually work better than an

EBLUP based on a mixed model.

Models should be evaluated not only with regard to the small area estimates

they produce but also with regard to their standard errors. For example, models

without area effects may yield reasonable small area estimates, but sometimes

produce far too small standard errors, as observed for the synthetic estimates

in the simulation study of Section 3.4.

These considerations indicate that model evaluation and selection is a com-

plicated process, for which no single standard procedure exists. In the context

of small area estimation, and survey sampling in general, this process not

only involves subject matter knowledge about the variables of interest, but also

knowledge about the data collection process. Information used in the sampling

design is usually relevant to the main characteristics of interest. Design-based

estimators incorporate this information directly, most importantly by using in-

verse inclusion probabilities as weights. In order to avoid selection bias in

model-based procedures, such information should also be taken into account in

the modeling effort. If possible, relevant variables related to inclusion proba-

bilities or non-response propensities should be included as covariates, see e.g.

Gelman et al. (2003), Chapter 7. In the following two subsections, some pre-

liminary results about the use of model selection measures and diagnostics are

described for the application of SAE to the LFS.

5.1. Mixed models

A simulation study as described in Section 3.4 is a useful instrument to

select an appropriate model, especially when qualitatively different models are

compared, such as normal linear against speciﬁc discrete data models, and

unit against area level models. However, such studies are time consuming and

it is generally impossible to create study populations that are realistic in all

important aspects. One usually has to rely on model diagnostics and model

comparison measures, evaluated using the sample data at hand.

For the selection of a model within a certain class, such as the class of

normal linear mixed models, or the selection of a suitable set of covariates,

several model comparison measures exist. Widely used model comparison

measures are AIC and BIC, which combine a measure of goodness-of-ﬁt (the

log-likelihood) and a penalty term for the complexity of the model, which in

the case of linear regression models is simply the number of parameters p in

the model. These measures are given by

AIC = −2L + 2p,   BIC = −2L + p log(n),   (15)

where L is the log-likelihood at the parameter estimates and n is the number

of observations (units or areas, depending on the model). In mixed models


the number p of model degrees of freedom is more difﬁcult to determine,

because random effects should contribute something between 0 and their total

number as their variance varies from 0 (complete pooling) to ∞ (no pooling).

A practical solution to this problem in the case of linear mixed models is to

use the effective number of degrees of freedom deﬁned by the trace of the

hat matrix H, which takes the data to the fitted values, i.e. ŷ = Hy, see

Hastie et al. (2003), Chapter 7. This reduces to the number of ﬁxed effects

in the case of complete pooling and to the total number of ﬁxed plus random

effects in the case of no pooling, and is in between in the case of partial

pooling.
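As a toy numerical illustration (not from the paper), consider a ridge-type linear smoother, in which the penalty plays the role of the inverse random effect variance and the fixed part is omitted for brevity: tr(H) moves from the full number of coefficients (no pooling) towards zero (complete pooling of all penalized effects) as the penalty grows.

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 40, 5
X = rng.normal(size=(m, p))  # hypothetical design matrix

def effective_df(X, lam):
    # Effective degrees of freedom of the penalized fit y_hat = H y,
    # with hat matrix H = X (X'X + lam I)^{-1} X'.
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    return np.trace(H)

print(effective_df(X, 0.0))   # equals p = 5: no shrinkage ("no pooling")
print(effective_df(X, 1e6))   # near 0: coefficients shrunk away
print(effective_df(X, 5.0))   # somewhere in between: partial pooling
```

In a linear mixed model the same computation applies, with the penalty on the random effects determined by the estimated variance components.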

Another widely applicable measure is CV. For area level models the use

of CV is perhaps not so obvious, since in that case CV measures how well

the direct input estimates, themselves subject to large variances, are predicted

by the model. Nevertheless, CV appears to be a useful measure also for

type A models. In that case a CV measure is given by (see the notation of

Section 3.1)

CV = Σ_{i=1}^{m} w_i (θ̂_i − θ̂^{A(−i)}_i)² = Σ_{i=1}^{m} w_i ((θ̂_i − θ̂^A_i) / (1 − H_ii))²,      (16)

where θ̂^{A(−i)}_i = Z′_i β̂^{(−i)}, with β̂^{(−i)} the coefficients estimated from the data excluding the i-th area, w_i are adjustable weights, and

H_ii = γ̂_i + (1 − γ̂_i) γ̂_i Z′_i (Σ_j γ̂_j Z_j Z′_j)^{−1} Z_i

is the i-th diagonal element of the hat matrix. The second equality in (16) can be derived using the matrix inversion lemma, provided that a common value of σ²_v (the one based on all areas) is used in all m “minus one” fits. Its practical value is that no additional model fitting is necessary to compute the CV measure.
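The second equality in (16) is an instance of the standard leave-one-out identity for linear fits. A quick numerical check of that identity in its simplest setting, an unweighted least squares fit on hypothetical data (the paper applies the same algebra to the area level model with a common σ²_v):

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 30, 3
Z = rng.normal(size=(m, p))
theta = Z @ np.array([1.0, -0.5, 0.2]) + rng.normal(0, 0.3, size=m)

H = Z @ np.linalg.solve(Z.T @ Z, Z.T)  # hat matrix
fit = H @ theta                        # fitted values using all areas

# Direct leave-one-out residuals: refit m times, each time excluding area i.
loo = np.empty(m)
for i in range(m):
    keep = np.arange(m) != i
    beta_i = np.linalg.lstsq(Z[keep], theta[keep], rcond=None)[0]
    loo[i] = theta[i] - Z[i] @ beta_i

# Shortcut: (theta_i - fit_i) / (1 - H_ii), no refitting needed.
shortcut = (theta - fit) / (1 - np.diag(H))
assert np.allclose(loo, shortcut)
```

The m separate refits are thus replaced by a single fit plus the diagonal of the hat matrix, which is what makes (16) cheap to evaluate.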

In a case study, AIC, BIC and CV are compared. Municipal unemployment

fractions are estimated using the type A model with 43 different combinations

of certain demographic covariates, the smallest model being the constant and

the largest model ethnicity × (degree of urbanisation + household composition)

with a total of 56 fixed effects. It is found that BIC is very similar to CV with weights w_i chosen proportional to the estimated variance ratio γ̂_i, defined below equation (3). Another finding is that AIC does not penalize enough, since it chooses the most complex model, which actually is among the models with the largest true MSEs. BIC and especially CV achieve a higher correlation with the true errors over the set of models. An advantage of CV in this case is that the weights w_i can be chosen to better reflect the objectives. For example, the choice of constant weights is appropriate when all areas are considered equally important, whereas the aforementioned choice w_i ∝ γ̂_i gives more weight to areas that can be better estimated using direct estimates, which are the larger municipalities in this case.


Examining residuals is an important part of model diagnosis. For example,

in the case of municipal estimates one may ﬁnd residual spatial correlation,

indicating that the model would beneﬁt from including some form of spatial

correlation. In a further extension of the simulation study, small improvements

of municipal unemployment estimates are observed after adding an exponential

spatial correlation structure to the basic unit level model. The improvements

are largest (10 % reduction in standard errors, on average) when degree of

urbanisation is included as a covariate. However, if registered unemployment is

used as a covariate, the improvements due to the exponential spatial correlation

structure are found to be negligible.

In Brown et al. (2001), a coverage diagnostic is developed as a model

evaluation method for SAE procedures. From the design-based estimates and

their variance estimates, approximate 95 % conﬁdence intervals can be formed.

If the model estimates are similar to the true population values, these intervals

should cover the model estimates in around 95 % of the cases. Substantially

smaller rates can indicate that the model estimates are strongly biased. Larger

rates, on the other hand, can be an indication that the model overﬁts the data.
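In code, the coverage diagnostic amounts to counting how often the design-based intervals contain the model estimates. A minimal sketch with simulated inputs (all data below are hypothetical, constructed only to contrast a nearly unbiased model with a strongly biased one):

```python
import numpy as np

def coverage_rate(direct, se_direct, model, z=1.96):
    """Fraction of approximate 95% design-based confidence intervals
    (direct +/- z * se) that contain the model-based estimates."""
    lower = direct - z * se_direct
    upper = direct + z * se_direct
    return np.mean((model >= lower) & (model <= upper))

rng = np.random.default_rng(4)
m = 200
truth = rng.uniform(0.02, 0.10, size=m)        # true area parameters
se = np.full(m, 0.01)                          # design standard errors
direct = truth + rng.normal(0, se)             # design-based estimates
good_model = truth + rng.normal(0, 0.002, m)   # nearly unbiased model
biased_model = truth + 0.03                    # strongly biased model

print(coverage_rate(direct, se, good_model))    # close to 0.95
print(coverage_rate(direct, se, biased_model))  # well below 0.95
```

The caveats in the text apply: the direct intervals themselves may undercover for small samples, and correlation between direct and model-based estimates inflates the observed rate.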

In the simulation study of Section 3.4 the synthetic estimator does not per-

form as well as the estimators based on models with random effects, especially

concerning the corresponding MSE estimates. Using design-based SREG esti-

mates and variance estimates to form approximate 95 % conﬁdence intervals,

the coverage diagnostic for the synthetic estimates is only 78 %, much lower

than for the estimates based on models with random effects. Here this indicates

overshrinkage of the synthetic estimates, which can be overcome by including

random area effects. In using such a coverage diagnostic, one should keep in

mind that the approximate 95 % design-based conﬁdence intervals themselves

may have lower than nominal coverage of the true population parameter due to

the small sample sizes, that the model-based estimates are also subject to un-

certainty, and that the design-based and model-based estimates can be strongly

correlated. Another diagnostic is the difference between model-based estimates

and direct estimates at an aggregate level. A large difference may be corrected

by benchmarking the model-based estimates, or by adjusting the model in an

appropriate way.

5.2. Structural time series models

Alternative variants of the time series models applied in Section 4 are

considered. Competing models assume that the variances for the harmonics

in the trigonometric seasonal model are equal or use the well known dummy

variable seasonal model, i.e.

11

h=0

S

i,t− h

= η

S,i,t

with E(η

S,i,t

) = 0. These

models result in less ﬂexible seasonal patterns in this particular application. The

more ﬂexible the seasonal patterns, the more past observations are discounted


in constructing a seasonal pattern in the model estimates for the unemployment

rate. The dummy variable seasonal model as well as the trigonometric model

with equal variances for the harmonics make more use of sample information

observed in the past, which results in smaller standard errors for the ﬁltered

estimates of the monthly unemployment rates. In Krieg and Van den Brakel

(2007), several models using the dummy variable seasonal model are applied

to the series of Section 4.2.

The underlying assumptions of the state-space model are that the distur-

bances of the measurement and system equations are normally distributed and

serially independent with constant variances. Under these assumptions, the

prediction errors (θ̂_{i,t} − θ̂^{TSMx}_{i,t|t−1}), for x = 1 or 2, are also normally distributed and serially independent with constant variance, where θ̂^{TSMx}_{i,t|t−1} denotes the one-step forecast for time period t using the information observed until time period t − 1. There are different diagnostics available in the literature to test to

riod t − 1. There are different diagnostics available in the literature to test to

what extent these assumptions are tenable, see Durbin and Koopman (2001),

Section 2.12. These diagnostic tests indicate that the prediction errors of the

dummy variable seasonal model contain more autocorrelation compared to the

trigonometric model with separate variances for the harmonics.

The coverage diagnostics described in Section 5.1 are almost equal for

TSM1 and TSM2 and vary between 93 % and 99 % for the individual domains.

Therefore this diagnostic does not indicate that the model estimates are biased

nor that they are too close to the GREG estimates. The coverage diagnostics

for the dummy variable seasonal models are similar, see Krieg and Van den

Brakel (2007). Coverage rates for subsets of the series can also provide valuable

information. For example coverage rates for the same months over the different

years (all Januaries etc.) can be used to check whether the seasonal patterns

are modeled adequately.

The mean of the absolute values of the prediction errors or the mean of the

squared prediction errors can be used as a form of CV to measure the devia-

tion of the model forecasts from the GREG estimates. These measures indicate

that TSM2 performs slightly better than TSM1. The prediction errors obtained

with the trigonometric seasonal models are also slightly smaller compared to

the dummy variable seasonal models (Krieg and Van den Brakel, 2007). The

trigonometric models have slightly larger standard errors for the filtered estimates, but they better meet the stated model assumptions and have slightly smaller prediction errors; they are therefore preferred over the dummy variable seasonal models in this application.
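These comparisons reduce to simple summaries of the one-step-ahead prediction errors. A sketch with two hypothetical error series, one behaving like white noise (as the model assumptions require) and one autocorrelated (as found here for the dummy variable seasonal model):

```python
import numpy as np

def prediction_error_summary(errors):
    """Mean absolute error, mean squared error and lag-1 autocorrelation
    of a series of one-step-ahead prediction errors."""
    e = np.asarray(errors, dtype=float)
    mae = np.mean(np.abs(e))
    mse = np.mean(e**2)
    ec = e - e.mean()
    r1 = np.sum(ec[1:] * ec[:-1]) / np.sum(ec**2)
    return mae, mse, r1

rng = np.random.default_rng(5)
n = 240
white = rng.normal(0, 0.005, size=n)  # errors of a well-specified model
ar = np.zeros(n)                      # autocorrelated errors (misspecified)
for t in range(1, n):
    ar[t] = 0.6 * ar[t - 1] + rng.normal(0, 0.005)

mae_w, mse_w, r1_w = prediction_error_summary(white)
mae_a, mse_a, r1_a = prediction_error_summary(ar)
# r1_a is clearly positive, flagging residual autocorrelation,
# while r1_w is close to zero.
```

Formal tests such as those in Durbin and Koopman (2001), Section 2.12, refine this by giving critical values for the autocorrelation statistics.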

The tests for heteroscedasticity indicate that the assumption of constant

variance in the prediction errors is violated for domains 1 and 5 under models

TSM1 and TSM2; the variance of the prediction errors is larger in the second

part of these series. This heteroscedasticity can be partially diminished by taking

the variances of the disturbances in the measurement equation proportional to


the inverse of the sample size. This does not result in signiﬁcant changes in

the ﬁltered estimates of the unemployment rates.

AIC and BIC can be used to compare and select time series models.

Their use in the context of state space modeling is, however, not straight-

forward. Standard expressions for these criteria only penalize the number of

hyperparameters in the state space model, see e.g. Harvey (1989), Section 2.6.

Deterministic components in the state vector are not penalized. Nor does

this penalty account for the increased model complexity if state variables for

separate domains share the same hyperparameter. For example this standard

expression gives the same penalty to the model where all domains have the

same seasonal component and to a model where all domains are allowed to

have separate seasonal components, but share the same hyperparameters. One

strategy is to penalize the number of hyperparameters and the number of non-stationary elements in the state vector, see Harvey (1989), Section 5.5. This

strategy, however, still does not account for the fact that large values for the

variance parameters of the state variables increase the effective number of de-

grees of freedom of the model. Indeed, large values for these hyperparameters

allow large adjustments of the state variables and increase the effective number

of degrees of freedom as in the case of random components in linear mixed

models. The estimates of the state-space models of Section 4 are, given the

hyperparameters, linear expressions in the data, i.e. of the form Hy, where y is

the data vector containing the observed time series θ̂_{i,t}. Therefore the effective number of degrees of freedom, defined by the trace of H, is an alternative

choice for the effective number of model parameters that can be used in model

selection criteria, such as AIC and BIC, Hastie et al. (2003), Chapter 7.

6. Discussion

As emphasized in the introduction, Statistics Netherlands is rather reserved

in the application of model-based estimation procedures for producing ofﬁcial

statistics. Several properties of the GREG estimators make them very attractive

to produce ofﬁcial releases in a regular production environment where there is

generally limited time available for the analysis phase. First, they are robust in

the sense that model-misspeciﬁcation does not compromise design-consistency

for large sample sizes. Second, they are often used to produce one set of

weights for the estimation of all target parameters of a multi-purpose sample

survey. This is not only convenient but also enforces consistency between the

marginal totals of different publication tables.

A major drawback of the GREG estimator is its relatively large design

variance in the case of small sample sizes. In such situations, model-based

estimation procedures might be used to produce sufﬁciently reliable statistics,


since they have much smaller variances. The price that is paid for this variance

reduction is that these model-based estimators are more or less design-biased.

Model-misspeciﬁcation easily results in severely biased estimates, particularly

when design features are not taken into account. Careful model selection

and validation is a central part of the application of these model-dependent

procedures. To facilitate the use of model-based procedures in ofﬁcial statistics,

methods that have some built-in mechanism against model-misspeciﬁcation are

preferable. The benchmark approach discussed in Section 3.3, for example,

reduces the sensitivity to model-misspeciﬁcation in the context of linear mixed

models. A similar approach for time series models is developed by Pfeffermann

and Burck (1990) and Pfeffermann and Tiller (2006).

In this paper linear mixed models are applied to estimate annual munic-

ipal unemployment fractions and multivariate structural time series models to

estimate monthly unemployment rates. As expected, the precision of the direct

estimators can be improved considerably under both approaches. It appears that

most is gained by using sample information observed in the past. This makes

the time series approach very attractive for repeated surveys, which are widely

used at NSIs. The time series approach also ﬁts in a framework for producing

preliminary timely releases. At the start of the data collection period, the time

series model yields forecasts for the population parameters (this is sometimes

called nowcasting). When new survey data become available, timely prelimi-

nary and ﬁnal estimates can be produced, taking advantage of data collected in

the past and in neighbouring areas. If the number of areas or domains becomes

large, as in the case of municipal unemployment ﬁgures, the dimensions of the

multivariate time series models are likely to result in estimation problems. In

such cases the linear mixed models might be preferable. The linear mixed

models considered in this paper can be improved by taking advantage of sam-

ple information from the past. The formal approach is to allow for temporal

autocorrelation in the random terms of the linear models (Rao and Yu, 1994

and EURAREA, 2004). A simpler alternative is to use estimates obtained in

preceding periods as auxiliary information in the model. Another possibility is

to apply univariate time series models to make model-based estimates that bor-

row strength over time. Subsequently, these estimates and corresponding MSE

estimates are used as the input for an area level model to borrow strength over

space. An interesting topic for future research is to compare the performances

of the multivariate time series models with linear mixed models that include

both random area and time effects.
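
The nowcasting step mentioned above can be illustrated with the simplest structural time series model, a local level model filtered by the Kalman recursions. The series and the variances below are simulated and purely illustrative, not LFS estimates:

```python
import numpy as np

# Simulated monthly direct estimates: a slowly drifting unemployment rate
rng = np.random.default_rng(0)
T = 36
level = 0.05 + np.cumsum(rng.normal(0, 0.001, T))  # true signal (random walk)
y = level + rng.normal(0, 0.004, T)                # direct estimates + survey error

# Kalman filter for the local level model:
#   y_t = mu_t + e_t,   mu_t = mu_{t-1} + n_t
sigma2_e, sigma2_n = 0.004**2, 0.001**2  # illustrative variances
a, p = y[0], sigma2_e                    # initial state and state variance
for t in range(1, T):
    p = p + sigma2_n                     # prediction step
    k = p / (p + sigma2_e)               # Kalman gain
    a = a + k * (y[t] - a)               # update with the new survey data
    p = (1 - k) * p

# Before the month T+1 data arrive, the filtered level is the nowcast
nowcast = a
print(f"nowcast for next month: {nowcast:.4f}")
```

When the new monthly data become available, one further update step turns the nowcast into a preliminary estimate, exactly the sequence of releases described above.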

Another preliminary ﬁnding for the unit level model is that including an

exponential spatial correlation structure improves the standard errors. These

findings are in line with EURAREA (2004), where the use of spatial correlation

structures in linear mixed models is explored. Pratesi and Salvati (2008)

also use simultaneous autoregressive models and conditional autoregressive


models to deﬁne spatial correlation structures in an area level model and report

important model improvements in a simulation study. Therefore the use of

spatial correlation structures is another possibility to improve the linear mixed

models for the Dutch LFS.
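
Under the exponential structure referred to above, the correlation between the random effects of areas i and j decays as exp(-d_ij/phi) with the distance d_ij between the areas. A small sketch with hypothetical centroids and an assumed range parameter:

```python
import numpy as np

# Hypothetical centroids (in km) of four municipalities
coords = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 40.0], [50.0, 50.0]])
phi = 20.0       # range parameter (km), assumed value
sigma2_v = 1.0   # variance of the random area effects

# Pairwise distances and the exponential correlation matrix
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
corr = np.exp(-d / phi)
cov_v = sigma2_v * corr  # covariance of spatially correlated area effects

# Nearby areas receive strongly correlated effects,
# distant areas nearly independent ones
```

Plugging `cov_v` into the covariance structure of the random area effects lets nearby municipalities borrow strength from each other more heavily than distant ones.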

It is concluded that there is a case for basing ofﬁcial statistics on model-

based procedures in situations where design-based estimators do not result

in sufﬁciently reliable estimates. Such releases should be accompanied by

appropriate methodology and quality descriptions, where the underlying model

assumptions are stated explicitly.

The statistical theory of model-based SAE is rather complex and the avail-

able software at NSIs is often not suitable to conduct the required calculations

in a straightforward manner in a production environment. To facilitate the use

of SAE in survey processes, a user friendly software tool that can be launched

from SPSS is currently being developed. So far the basic area level model has

been implemented. A similar tool to support the structural time series approach

is desired.
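
The basic area level model mentioned above is the Fay and Herriot (1979) model, in which each direct estimate is shrunk towards a synthetic regression prediction with a weight determined by its sampling variance. A minimal sketch, with hypothetical inputs and the model variance taken as known (in practice it is estimated, for instance by REML):

```python
import numpy as np

# Per-area direct estimates with their estimated MSEs, and a
# covariate-based synthetic part x_i' beta (all numbers hypothetical)
theta_hat = np.array([0.048, 0.065, 0.052, 0.080])  # direct estimates
psi = np.array([4e-5, 9e-5, 2e-5, 2.5e-4])          # their MSE estimates
synthetic = np.array([0.050, 0.060, 0.050, 0.070])  # regression predictions

sigma2_v = 5e-5  # variance of the random area effects (assumed known here)

# Fay-Herriot composite: shrink the noisy direct estimates
# towards the synthetic part
gamma = sigma2_v / (sigma2_v + psi)
eblup = gamma * theta_hat + (1 - gamma) * synthetic

# Areas with a large sampling variance (small samples) are shrunk
# more strongly towards the model-based synthetic estimate
```

The shrinkage weight `gamma` goes to one for areas with precise direct estimates and to zero for the smallest areas, which is what makes the model attractive precisely where the design-based estimator fails.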

Acknowledgments

The authors wish to thank Professor D. Pfeffermann for his advice during this project, and

Professor S.J. Koopman for making the beta version of SsfPack 3 available. The views expressed in

this paper are those of the authors and do not necessarily reﬂect the policies of Statistics Netherlands.

REFERENCES

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988) An error components model for

prediction of county crop areas using survey and satellite data, Journal of the American

Statistical Association, 83, 28–36.

Van den Brakel, J. A. and Krieg, S. (2007) Modelling Rotation Group Bias and Survey Errors in

the Dutch Labour Force Survey, In: Proceedings of the Section on Survey Research Methods,

American Statistical Association, 2675–2682.

Breslow, N. E. and Clayton, D. G. (1993) Approximate Inference in Generalized Linear Mixed

Models, Journal of the American Statistical Association, 88, 9-25.

Brown, G., Chambers, R., Heady, P., and Heasman, D. (2001) Evaluation of Small Area Estima-

tion Methods - an Application to Unemployment estimates from the UK LFS, In: Proceedings

of Statistics Canada Symposium, 2001.

Chambers, R., Van Den Brakel, J. A., Hedlin, D., Lehtonen, R., and Zhang, Li-Chun (2006)

Future Challenges of Small Area Estimation, Statistics in Transition, 7, 759–769.

Datta, G. S. and Ghosh, M. (1991) Bayesian Prediction in Linear Models: Applications to Small

Area Estimation, The Annals of Statistics, 19, 1748–1770.

Datta, G. S., Rao, J. N. K., and Smith, D. D. (2005) On measuring the variability of small area

estimators under a basic area level model, Biometrika, 92, 183–196.

Doornik, J. A. (1998) Object-Oriented Matrix Programming using Ox 2.0,Timberlake Consultants

Press, London.

48 H. J. BOONSTRA – J. A. VAN DEN BRAKEL – B. BUELENS – S. KRIEG – M. SMEETS

Durbin, J. and Koopman, S. J. (2001) Time series analysis by state space methods, Oxford University

Press, Oxford.

EURAREA (2004) Project reference volume, deliverable D7.1.4, Technical report, EURAREA consortium.

Fay, R. E. and Herriot, R. A. (1979) Estimates of income for small places: an application of

James-Stein procedures to Census data, Journal of the American Statistical Association, 74,

269–277.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003) Bayesian Data Analysis,

Chapman & Hall/CRC, London.

Harvey, A. C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge

University Press, Cambridge.

Hastie, T., Tibshirani, R., and Friedman, J. H. (2003) The Elements of Statistical Learning,

Springer, New York.

Horvitz, D. G. and Thompson, D. J. (1952) A generalization of sampling without replacement

from a ﬁnite universe, Journal of the American Statistical Association, 47, 663–685.

Jiang, J. and Lahiri, P. (2006) Mixed Model Prediction and Small Area Estimation, Test, 15, 1–96.

Koopman, S. J., Shephard, N., and Doornik, J. A. (1999) Statistical Algorithms for Models in

State Space using SsfPack 2.2, Econometrics Journal, 2, 113–166.

Krieg, S. and Van Den Brakel, J. A. (2007) Model evaluation for multivariate structural time

series models for the Dutch Labour Force Survey, In: Proceedings of the Section on Survey

Research Methods, American Statistical Association, 2767–2774.

MacGibbon, B. and Tomberlin, T. J. (1989) Small Area Estimates of Proportions Via Empirical

Bayes Techniques, Survey Methodology, 15, 237–252.

Malec, D., Sedransk, J., Moriarity, C. L., and Leclere, F. B. (1997) Small Area Inference for

Binary Variables in the National Health Interview Survey, Journal of the American Statistical

Association, 92, 815–826.

Marker, D. A. (2001) Producing Small Area Estimates From National Surveys: Methods for

Minimizing Use of Indirect Estimators, Survey Methodology, 27, 183–188.

Narain, R. (1951) On sampling without replacement with varying probabilities, Journal of the Indian

Society of Agricultural Statistics, 3, 169–174.

Pfeffermann, D. (1991) Estimation and Seasonal Adjustment of Population Means Using Data from

Repeated Surveys, Journal of Business & Economic Statistics, 9, 163–175.

Pfeffermann, D. and Bleuer, S. R. (1993) Robust Joint Modelling of Labour Force Series of Small

Areas, Survey Methodology, 19, 149–163.

Pfeffermann, D. and Burck, L. (1990) Robust Small Area Estimation combining Time Series and

Cross-sectional Data, Survey Methodology, 16, 217–237.

Pfeffermann, D., Feder, M., and Signorelli, D. (1998) Estimation of Autocorrelations of Survey

Errors with Application to Trend Estimation in Small Areas, Journal of Business & Economic

Statistics, 16, 339–348.

Pfeffermann, D. and Tiller, R. (2006) Small Area Estimation with State Space Models Subject

to Benchmark Constraints, Journal of the American Statistical Association, 101, 1387–1397.

Pratesi, M. and Salvati, N. (2008) Small area estimation: the EBLUP estimator based on spatially

correlated random area effects, Statistical Methods and Applications, 17, 113–141.

Rao, J. N. K. (2003) Small Area Estimation, Wiley, New York.

Rao, J. N. K. and Yu, M. (1994) Small-area estimation by combining time-series and cross-sectional

data, The Canadian Journal of Statistics, 22, 511–528.


Särndal, C-E., Swensson, B., and Wretman, J. (1992) Model Assisted Survey Sampling, Springer,

New York.

Sorensen, D. (2004) An Introductory Overview of Model Comparison and Related Topics,

http://www.dcam.upv.es/acteon/docs/modselmaster.pdf.

Tam, S. M. (1987) Analysis of Repeated Surveys using a Dynamic Linear Model, International

Statistical Review, 55, 63–73.

Tiller, R. B. (1992) Time Series Modelling of Sample Survey Data from the U.S. Current Population

Survey, Journal of Official Statistics, 8, 149–166.

Valliant, R., Dorfman, A. H., and Royall, R. M. (2000) Finite Population Sampling and

Inference: A Prediction Approach, Wiley, New York.

HARM JAN BOONSTRA

JAN A. VAN DEN BRAKEL

BART BUELENS

SABINE KRIEG

MARC SMEETS

Department of Statistical Methods

Statistics Netherlands

P.O. Box 4481

6401 CZ Heerlen

The Netherlands

hbta@cbs.nl