METRON - International Journal of Statistics
2008, vol. LXVI, n. 1, pp. 21-49
HARM JAN BOONSTRA JAN A. VAN DEN BRAKEL
BART BUELENS SABINE KRIEG MARC SMEETS
Towards small area estimation
at Statistics Netherlands
Summary -Official releases produced by Statistics Netherlands are predominantly
based on design-based or model-assisted estimation procedures. In the case of small
sample sizes, however, model-based procedures can be used to improve the precision
of these design-based estimators. In this paper two lines of model-based small area
estimation are discussed: the use of linear mixed models to borrow strength from
other areas and multivariate time series models to borrow strength from both previous
time periods and other areas. The different approaches are applied to the estimation
of annual and monthly unemployment figures. It is discussed how these model-based
approaches can be further developed before they are implemented in the regular survey
process to compile official releases.
Key Words - Labour Force Survey; Linear mixed models; Model-based estimation;
Official statistics; Structural time series models.
1. Introduction
The purpose of survey sampling is to obtain statistical information about
a finite population, by selecting a probability sample from this population,
measuring the required information about the units in this sample and estimating
finite population parameters such as means, totals and ratios. The statistical
inference in the traditional design-based approach is based on the stochastic
structure induced by the sampling design. Parameter and variance estimators
are derived under the concept of repeatedly drawing samples from a finite
population according to the same sampling design. A well known design-based
estimator is the Horvitz-Thompson (HT) estimator, developed by Narain (1951),
and Horvitz and Thompson (1952) for unequal probability sampling from finite
populations without replacement. The precision of the HT estimator can be
improved by exploiting correlations with auxiliary variables known for the
complete population, resulting in model-assisted generalized regression (GREG)
estimators (Särndal et al., 1992).
Received November 2007 and revised February 2008.
In the model-based approach the probability structure of the sampling de-
sign plays a less pronounced role, since the inference is based on the proba-
bility structure of an assumed statistical model, see e.g. Valliant et al. (2000).
Statistics Netherlands is, like many other European national statistical institutes
(NSIs), rather reserved in the application of such model-based estimation pro-
cedures. The prevailing opinion is that official statistics are preferably based on
empirical evidence and as little as possible on model assumptions. As a result,
traditional design-based or model-assisted procedures like GREG estimators are
generally applied for producing official statistics.
The property of (approximate) design-unbiasedness of design-based esti-
mators is useful for large sample sizes, giving a form of robustness to the
resulting estimates, but is often incompatible with reliable estimates for smaller
sample sizes. For small sample sizes, design-unbiasedness generally goes hand
in hand with large design-variances. The sampling design can be improved
to obtain more reliable estimates for some purposes, such as estimation for
small subpopulations, see e.g. Marker (2001) and Rao (2003), Section 2.6.
However, most surveys are multi-purpose, and there is often a trade-off be-
tween efficiencies for the various purposes. In any case, it appears that not enough can be gained from adapting sampling designs alone; one also needs to bring in relevant information from other sources to improve the estimation. This information can be included using a model-based procedure. The
estimation of subpopulation parameters for which insufficient data are avail-
able to apply design-based or model-assisted procedures is the realm of small
area estimation (SAE). Rao (2003) gives a comprehensive overview of SAE
procedures.
Models can be used to borrow strength from various sources of information.
In a typical small area setting in official statistics there may be the following
sources of information:
- A survey is often conducted regularly, e.g. every month or every year. Survey data from preceding periods, generally summarized in the form of estimates, provide valuable information for estimation in the current period.
- The small area estimands exhibit some degree of similarity to each other, i.e. information about the estimand in one area provides some information about the estimands in other areas as well. In the case of geographic areas, a model with a spatial correlation structure can account for the relative locations of the areas, in the sense that nearby areas are expected to be more similar than areas far apart.
- Auxiliary information related to the characteristics of interest is often available from registrations, and the availability of data from registrations is rising. Such information can be used in the form of covariates in the model. Auxiliary information may be available at the unit or area level, or both.
Several developments make model-based procedures increasingly attractive
and relevant to NSIs for the production of official statistics (Chambers et al.,
2006). There is a growing demand for detailed statistics at a level where
sample sizes are small. Small sample sizes also arise in the production of
timely short-term economic indicators. Since there is a strong demand for
these timely indicators, many NSIs work with provisional releases based on
the data obtained in the first part of the data collection period. Finally, there is
a persistent pressure for NSIs to reduce costs and response burden for businesses
by replacing survey data with register data.
This paper describes two applications of SAE to the Dutch Labour Force
Survey (LFS). A short description of the LFS is provided in Section 2. Sec-
tion 3 focuses on the use of mixed models to borrow strength from other areas
(geographic subpopulations) to produce annual municipal unemployment fig-
ures. Section 4 focuses on the use of structural time series modeling to borrow
strength from other time periods and domains (socio-demographic subpopula-
tions) to estimate monthly unemployment rates. For small sample sizes there
can be no large sample robustness, and it becomes more important to evaluate
the assumptions underlying the model. Possible approaches are described in
Section 5. The paper concludes with a discussion about the use of model-based
procedures in official statistics in Section 6.
2. The Dutch Labour Force Survey
The objective of the LFS is to provide reliable information about the labour
market. Until September 1999, the LFS was conducted as a cross-sectional
survey. In October 1999, the LFS changed to a rotating panel design where
the respondents are interviewed five times at quarterly intervals.
Each month a sample of addresses is selected through a stratified two-
stage cluster design. Strata are formed by geographic regions. Municipalities
are considered as primary sampling units and addresses as secondary sampling
units. All households residing on an address, up to a maximum of three, are
included in the sample. During the period that the LFS was conducted as a
cross-sectional survey, the gross sample size averaged about 10,000 addresses
monthly. Since the LFS has to provide accurate outcomes on unemployment,
addresses that occur in the register of the Employment Exchange were over-
sampled. Furthermore, addresses with only persons aged 65 years and over are
undersampled, since most target parameters of the LFS concern people aged
15 through 64 years. Since the change to a rotating panel design, the gross sample size has averaged about 8,000 addresses monthly, and the oversampling of addresses that occur in the register of the Employment Exchange has been discontinued.
Under the cross-sectional design and in the first wave of the panel, data are
collected by means of computer assisted personal interviewing (CAPI). In the
four subsequent waves of the panel, data are collected by means of computer
assisted telephone interviewing (CATI). During these re-interviews a condensed
questionnaire is applied to establish changes in the labour market position of
the household members aged 15 years and over. When a household member
cannot be contacted, proxy interviewing is allowed by members of the same
household in each wave.
The weighting procedure of the LFS is based on the GREG estimator
(Särndal et al., 1992). The inclusion probabilities reflect the oversampling and
undersampling of addresses described above as well as the different response
rates between geographic regions. The weighting scheme is based on a com-
bination of different socio-demographic categorical variables.
The population aged 15 through 65 is divided into three groups, namely
the employed labour force, the unemployed labour force and the group that
does not belong to the labour force. The population fractions belonging to
these groups are important parameters of the LFS. Another important parame-
ter is the unemployment rate, which is defined as the ratio of the unemployed
labour force and the labour force. Because the monthly sample size of the
LFS is too small to publish reliable monthly figures using the GREG estima-
tor, moving averages over the preceding three months are published. Also the
yearly sample size is too small to produce reliable annual figures for separate
municipalities. Therefore, annual figures are only produced for large munici-
palities. In the next sections, model-based estimation procedures are applied to
improve annual municipal unemployment figures and monthly unemployment
figures.
3. Borrowing strength over space
As mentioned in the introduction, design-based or direct estimates for small
areas have large sampling variances and can be improved using explicit models
in which the individual areas are linked in some way. The resulting model-
based small area estimates thereby borrow strength from data about other areas.
Besides, relevant covariates at the area or unit level are important to include
in the model to further improve the estimates.
The focus in this section is on the estimation of municipal unemployment
fractions, based on a full year’s LFS data. The LFS data are reasonably well
spread out over the year, so these fractions can be thought of as time-averages
over the year. Several models and corresponding small area estimators are
compared in a simulation study. More information on these models, estimators
and their mean squared errors (MSEs) can be found in Rao (2003).
3.1. The basic area level model
A popular model for SAE is the basic area level or type A model, also
known as the Fay-Herriot model (Fay and Herriot, 1979). The data in this model are direct estimates $\hat{\theta}_i$, supposed to be (approximately) unbiased for the municipal unemployment fractions $\theta_i$, and corresponding variance estimates $\psi_i$. The area index $i$ runs from 1 to the number $m$ of areas (municipalities). The complete type A model is

$$\hat{\theta}_i = \theta_i + \epsilon_i , \qquad \epsilon_i \stackrel{ind}{\sim} N(0, \psi_i) , \qquad (1)$$

$$\theta_i = \beta' Z_i + v_i , \qquad v_i \stackrel{iid}{\sim} N(0, \sigma_v^2) , \qquad (2)$$

in which $Z_i$ is a vector of known covariates for the $i$th area, $\beta$ is the corresponding vector of fixed effects, and $v_i$ are random area effects.
As $\hat{\theta}_i$, $i = 1, \ldots, m$, are the input data for the area level model, (1) can be viewed as a measurement equation with errors $\epsilon_i$, which in this case are mainly due to sampling. The second line (2) of the model is the structural part, which links the areas through the common coefficients $\beta$. The model parts can be combined into $\hat{\theta}_i = \beta' Z_i + v_i + \epsilon_i$, which can be recognized as a linear mixed model, and estimation proceeds using the method of Empirical Best Linear Unbiased Prediction (EBLUP). The EBLUPs for $\theta_i$ based on the type A model are
$$\hat{\theta}_i^A = \hat{\beta}' Z_i + \hat{v}_i = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\, \hat{\beta}' Z_i , \qquad \hat{\beta} = \left( \sum_{i=1}^m \hat{\gamma}_i Z_i Z_i' \right)^{-1} \sum_{i=1}^m \hat{\gamma}_i Z_i \hat{\theta}_i , \qquad (3)$$

where $\hat{\gamma}_i = \hat{\sigma}_v^2 / (\psi_i + \hat{\sigma}_v^2)$ are the estimated ratios of model variance to total variance. Various methods exist to estimate $\sigma_v^2$ (Rao, 2003). In this study the Fay-Herriot moments estimator is used, which in Datta et al. (2005) is shown to perform well compared to some other estimators, in particular with respect to the MSE estimators for the small area predictors.
In contrast to direct estimators, model-based estimators are even defined for areas where there is no survey data at all. For such areas (3) reduces to $\hat{\beta}' Z_i$, which is the limit of $\hat{\theta}_i^A$ as $\psi_i \to \infty$.
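Purely as an illustration (not part of the original paper), the composite form of (3) can be sketched numerically. The following Python fragment uses a simple fixed-point iteration for the Fay-Herriot moments estimator of $\sigma_v^2$; the function name, data and iteration scheme are ours.

```python
import numpy as np

def fay_herriot_eblup(theta_hat, psi, Z, n_iter=100):
    """Type A EBLUPs (3) with a moments-type estimator for sigma2_v.

    theta_hat : (m,) direct area estimates
    psi       : (m,) their (design) variance estimates
    Z         : (m, p) area-level covariates
    """
    m, p = Z.shape
    sigma2_v = psi.mean()                       # crude starting value
    for _ in range(n_iter):
        w = 1.0 / (psi + sigma2_v)              # proportional to gamma_i
        # weighted least squares for beta (the sigma2_v factor cancels)
        beta = np.linalg.solve((Z.T * w) @ Z, (Z.T * w) @ theta_hat)
        resid2 = (theta_hat - Z @ beta) ** 2
        # moments equation: sum resid2 / (psi + sigma2_v) = m - p
        sigma2_v = sigma2_v * (resid2 * w).sum() / (m - p)
    gamma = sigma2_v / (psi + sigma2_v)
    eblup = gamma * theta_hat + (1 - gamma) * (Z @ beta)
    return eblup, beta, sigma2_v, gamma
```

Each EBLUP is a convex combination of the direct estimate and the synthetic part $\hat{\beta}' Z_i$, with weight $\hat{\gamma}_i$ determined by the relative sizes of $\psi_i$ and $\hat{\sigma}_v^2$.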
The basic area level model is appealing to most survey statisticians since
complex sampling designs can be taken into account via the input estimates $\hat{\theta}_i$ and $\psi_i$. The $\hat{\theta}_i$ can be design-unbiased HT estimators, or approximately design-unbiased GREG estimators. The estimates (3) can then be viewed as
model-based improvements of the direct design-based estimates, and they inherit the property of design-consistency, since $\hat{\gamma}_i \to 1$ as the area sample size $n_i$ grows large. Design-consistency is a useful property for areas with relatively large sample sizes.
When (part of the) auxiliary information is defined and available at the
unit level, there is a choice between using population or sample aggregates
as covariates in (2). Since both response and covariate sample aggregates are
affected by the same sampling and non-response mechanisms, it seems more
natural to use the sample aggregates in the model. In that case, denoting by $z_i$ and $Z_i$ the vectors of sample and population aggregates for area $i$, respectively, the EBLUP for $\theta_i$ becomes
$$\hat{\theta}_i^A = \hat{\beta}' Z_i + \hat{v}_i = \hat{\gamma}_i \left( \hat{\theta}_i + \hat{\beta}' (Z_i - z_i) \right) + (1 - \hat{\gamma}_i)\, \hat{\beta}' Z_i , \qquad \hat{\beta} = \left( \sum_{i=1}^m \hat{\gamma}_i z_i z_i' \right)^{-1} \sum_{i=1}^m \hat{\gamma}_i z_i \hat{\theta}_i . \qquad (4)$$
The direct component $\hat{\theta}_i$ is effectively replaced by $\hat{\theta}_i + \hat{\beta}'(Z_i - z_i)$ with coefficients estimated at the area level. Provided that $\hat{\theta}_i$ and $z_i$ are (approximately) design-unbiased for $\theta_i$ and $Z_i$, respectively, this estimator, known as a survey regression (SREG) estimator (Battese et al., 1988), is approximately design-unbiased for $\theta_i$. Alternatively, one may use SREG estimators, fitted at either the area or unit level, directly as the input estimates $\hat{\theta}_i$ in the type A model.
It is important to provide reliable variance estimates $\psi_i$ for the estimates $\hat{\theta}_i$, since the weights $\hat{\gamma}_i$ in the EBLUPs are directly dependent on them. Individual estimates of the design variances of the direct estimates may be highly unstable, i.e. have large design variances themselves, due to the small area sample sizes. A simple method to stabilize the variance estimates is to use a common pooled sample variance $S_p^2$ of the response variable over areas, instead of individual area variances $S_i^2$, so that

$$\psi_i = \frac{1 - n_i/N_i}{n_i}\, S_p^2 , \qquad \text{where } S_p^2 = \frac{1}{n - m} \sum_{i=1}^m (n_i - 1) S_i^2 ,$$

where $N_i$ is the population size in area $i$, $n_i$ the sample size, and $n = \sum_{i=1}^m n_i$.
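As a small illustration (ours, not from the paper), the pooled variance estimates can be computed directly from unit-level data. The function below assumes every area is sampled with at least one unit.

```python
import numpy as np

def pooled_area_variances(y, area, N):
    """Pooled variance estimates psi_i = (1 - n_i/N_i)/n_i * S2_p,
    with S2_p = sum_i (n_i - 1) S2_i / (n - m).

    y    : (n,) unit-level responses
    area : (n,) integer area labels 0..m-1 (each area sampled at least once)
    N    : (m,) area population sizes
    """
    m = len(N)
    n_i = np.bincount(area, minlength=m).astype(float)
    # within-area sums of squares, i.e. (n_i - 1) * S2_i
    ss = np.array([((y[area == i] - y[area == i].mean()) ** 2).sum()
                   for i in range(m)])
    S2_p = ss.sum() / (n_i.sum() - m)
    return (1.0 - n_i / N) / n_i * S2_p
```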
3.2. Unit level models
Area estimates based on unit-level models are obtained by fitting the model
to the data and using the fitted model to predict the response for unobserved
units. The data now consist of the binary unemployment variable $y_{ij}$ for persons $j \in s_i$, the sample in area $i$. The area estimates based on a model $M$ are

$$\hat{\theta}_i^M = \frac{1}{N_i} \left( n_i \bar{y}_i + \sum_{j \in r_i} \hat{y}_{ij} \right) , \qquad (5)$$

where $\bar{y}_i$ is the sample area mean, $r_i$ is the index set for unsampled units in area $i$, and $\hat{y}_{ij}$ are predictions based on the fitted model. Since measurement errors are ignored, the response values $y_{ij}$ for $j \in s_i$ represent themselves in the first term of (5). However, the sampling fractions in the LFS are small (on the order of 1 % for the data of a year), so that (5) is well approximated by $\frac{1}{N_i} \sum_{j \in U_i} \hat{y}_{ij}$, where $U_i = s_i \cup r_i$ denotes the population in area $i$. Optimal predicted values under squared error loss are $\hat{y}_{ij} = E(y_{ij} \mid y)$, the conditional expectations given the data $y$. Restriction to the class of linear predictors gives the BLUP, which is optimal under normality.
The basic unit level or type B model in SAE is the nested error regression
model of Battese et al. (1988). It is given by
$$y_{ij} = \beta' x_{ij} + v_i + \epsilon_{ij} , \qquad i = 1, \ldots, m , \quad j = 1, \ldots, N_i ,$$
$$v_i \stackrel{iid}{\sim} N(0, \sigma_v^2) , \qquad \epsilon_{ij} \stackrel{iid}{\sim} N(0, \sigma_e^2) , \qquad (6)$$

where $x_{ij}$ is a $p$-vector of covariates for individual $j$ in area $i$, $\beta$ is the corresponding $p$-vector of fixed effects, $v_i$ are random area effects, and $\epsilon_{ij}$ are residual errors. The type B model is again a linear mixed model, and EBLUPs are given by

$$\hat{\theta}_i^B = \hat{\gamma}_i \left( \bar{y}_i + \hat{\beta}' (X_i - \bar{x}_i) \right) + (1 - \hat{\gamma}_i)\, \hat{\beta}' X_i , \qquad \hat{\gamma}_i = \frac{\hat{\sigma}_v^2}{\hat{\sigma}_v^2 + \hat{\sigma}_e^2 / n_i} , \qquad \hat{\beta} = (X' \hat{\Sigma}^{-1} X)^{-1} X' \hat{\Sigma}^{-1} y . \qquad (7)$$
Here $\bar{x}_i$ is a $p$-vector of sample means for area $i$, $X_i$ is the corresponding vector of population means, $X$ is the full $n \times p$ matrix of covariates, $y$ is the $n$-vector of response values, and $\hat{\Sigma} = \widehat{\mathrm{cov}}(y) = \hat{\sigma}_e^2 I_n + \hat{\sigma}_v^2 \bigoplus_{i=1}^m J_{n_i}$, where $I_n$ is the $n$-dimensional identity matrix, $J_{n_i}$ the $n_i \times n_i$ matrix with all elements 1, and $\bigoplus_{i=1}^m J_{n_i}$ the block diagonal matrix with $J_{n_i}$ the block diagonal elements. Maximum likelihood estimates for $\sigma_e^2$ and $\sigma_v^2$ are used. The hierarchical Bayesian approach of integrating (7) over the posterior density for the variance parameters (Datta and Ghosh, 1991) is an alternative. Due to the large number of municipalities (over 400), this density is found to be quite sharply peaked, and the resulting small area estimates are very similar to the EBLUP estimates. This is also true for the corresponding MSE estimates. The contribution of the uncertainty in the variance parameters to the MSEs of the small area estimates is negligible in this application.
Note the similarity between (7) and (4), especially when the variance estimates $\psi_i$ in the latter are pooled. The main difference is that the fixed effects in (7) are estimated at unit instead of area level. One advantage of modeling at the unit level is that many more degrees of freedom are available to fit a model, so that more variables available from registrations can be used as covariates.
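For illustration only (this sketch is ours, not the paper's implementation), the type B EBLUPs (7) can be computed for given variance components; in the paper $\sigma_e^2$ and $\sigma_v^2$ are estimated by maximum likelihood. The GLS estimate of $\beta$ is obtained via the standard random-effects transformation rather than by inverting $\hat{\Sigma}$ explicitly.

```python
import numpy as np

def type_b_eblup(y, X, area, X_pop, sigma2_e, sigma2_v):
    """EBLUPs (7) under the nested error regression model (6),
    for given variance components. Every area is assumed sampled.

    y     : (n,) unit responses        X     : (n, p) unit covariates
    area  : (n,) area labels 0..m-1    X_pop : (m, p) population means X_i
    """
    m = X_pop.shape[0]
    n_i = np.bincount(area, minlength=m).astype(float)
    ybar = np.array([y[area == i].mean() for i in range(m)])
    xbar = np.array([X[area == i].mean(axis=0) for i in range(m)])
    # OLS on (y_ij - lam_i * ybar_i) equals GLS under model (6)
    lam = 1.0 - np.sqrt(sigma2_e / (sigma2_e + n_i * sigma2_v))
    beta = np.linalg.lstsq(X - lam[area][:, None] * xbar[area],
                           y - lam[area] * ybar[area], rcond=None)[0]
    gamma = sigma2_v / (sigma2_v + sigma2_e / n_i)
    theta = gamma * (ybar + (X_pop - xbar) @ beta) + (1 - gamma) * (X_pop @ beta)
    return theta, beta
```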
Without the area effects $v_i$, model (6) is a simple linear regression model. The EBLUPs based on models without random effects are often called synthetic estimators in the survey sampling literature. The synthetic estimate based on the linear regression model for area $i$ is

$$\hat{\theta}_i^S = \hat{\beta}' X_i , \qquad \hat{\beta} = (X' X)^{-1} X' y . \qquad (8)$$

Here, as in the last term of (7), the sample part $n_i \bar{y}_i$ in (5) has been approximated by the sum of the fitted values.
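The synthetic estimator (8) amounts to an ordinary regression fitted on all sample units and evaluated at the area-level population means. A minimal sketch (ours, not from the paper):

```python
import numpy as np

def synthetic_estimates(y, X, X_pop):
    """Synthetic estimates (8): fit a fixed-effects regression on the
    pooled sample, then evaluate it at population means X_pop (m, p)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # (X'X)^{-1} X'y
    return X_pop @ beta
```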
Since the variable unemployed is binary, a unit level model for binary data
may be more appropriate. A possible model for binary data is
$$y_{ij} \stackrel{ind}{\sim} \mathrm{Be}_{y_{ij}}(p_{ij}) , \qquad \mathrm{logit}(p_{ij}) = \beta' x_{ij} + v_i , \qquad v_i \stackrel{iid}{\sim} N(0, \sigma_v^2) , \qquad (9)$$

where $\mathrm{Be}_z(p)$ denotes the Bernoulli distribution for $z$ with parameter $0 \le p \le 1$, and $\mathrm{logit}(p_{ij}) = \log\left( p_{ij} / (1 - p_{ij}) \right)$. This model is known as the logistic-normal model, a member of the family of generalized linear mixed models. Early references on the use of the logistic-normal model for SAE are MacGibbon and Tomberlin (1989) and Malec et al. (1997). Empirical Bayes predictions $\hat{\theta}_i^{LN}$ for the small area means are
$$\hat{\theta}_i^{LN} = \frac{1}{N_i} \left( n_i \bar{y}_i + \sum_{j \in r_i} p_{ij}(\hat{\beta}, \hat{v}) \right) , \qquad p_{ij}(\hat{\beta}, \hat{v}) = \mathrm{logit}^{-1}(\hat{\beta}' x_{ij} + \hat{v}_i) = \frac{1}{1 + \exp(-\hat{\beta}' x_{ij} - \hat{v}_i)} , \qquad (10)$$

where estimates of $\beta$ and $v_i$ are plugged in. The penalized quasi-likelihood method (Breslow and Clayton, 1993) is used to fit the logistic-normal model.
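Given fitted $\hat{\beta}$ and $\hat{v}$, the prediction step (10) is straightforward. The sketch below (ours; the fitting step itself is omitted, since in the paper it is done by penalized quasi-likelihood) sums the predicted probabilities over the non-sampled units.

```python
import numpy as np

def logistic_normal_estimates(ybar, n_i, N_i, X_r, area_r, beta_hat, v_hat):
    """Empirical Bayes area estimates (10) under the logistic-normal
    model (9), with beta_hat and v_hat taken as given.

    ybar, n_i, N_i : (m,) sample means, sample sizes, population sizes
    X_r, area_r    : covariates and area labels of the NON-sampled units r_i
    """
    eta = X_r @ beta_hat + v_hat[area_r]
    p = 1.0 / (1.0 + np.exp(-eta))                  # logit^{-1}
    sum_p = np.bincount(area_r, weights=p, minlength=len(N_i))
    return (n_i * ybar + sum_p) / N_i
```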
Prediction for non-linear models is computationally more cumbersome than
for linear models for two reasons. First, the fitting of non-linear models is generally more difficult. Second, the sum over the non-sampled units in (10) cannot be simplified in terms of population and sample means as in (7) and (8) for linear models. It can at best be reduced to a sum over all unique configurations of auxiliary vectors $x_{ij}$ occurring in the population of area $i$.
If the random effects $v_i$ in (9) are set to zero, a standard logistic regression model is obtained. The resulting small area estimates are called logistic synthetic estimates.
3.3. Benchmarking
It is often desirable to benchmark the model-based small area estimates so
that they add up to direct estimates for large areas. For example, model-based
estimates can be benchmarked to the single national level GREG estimate. Also
the estimates for unemployment, employment and not in the labour force (nilf)
fractions, which are estimated using separate univariate models, are subject to
the constraint that they add up to 1. This results in the following restrictions:
$$\frac{1}{N} \sum_{i=1}^m N_i\, \hat{\theta}_{i;a}^{M;\mathrm{adj}} = \hat{\theta}_a^{GREG} \quad \text{for } a = 1, 2, 3 , \qquad (11)$$

$$\sum_{a=1}^3 \hat{\theta}_{i;a}^{M;\mathrm{adj}} = 1 \quad \text{for } i = 1, \ldots, m , \qquad (12)$$

where index $a$ runs over the categories unemployed, employed and nilf. The $m \times 3$ table of small area estimates $\hat{\theta}_{i;a}^M$ can be benchmarked to satisfy (11) and (12) using the criterion of weighted least squares with the inverse MSE estimates as weights (Battese et al., 1988). Stacking all $3m$ area estimates in a vector $\hat{\theta}^M$ and writing the $3 + m - 1 = m + 2$ restrictions (one is redundant) as $R\, \hat{\theta}^{M;\mathrm{adj}} = r$, with $R$ an $(m+2) \times 3m$ matrix and $r$ an $(m+2)$-vector, the method of Lagrange multipliers gives

$$\hat{\theta}^{M;\mathrm{adj}} = \hat{\theta}^M + V R' \left( R V R' \right)^{-1} \left( r - R\, \hat{\theta}^M \right) ,$$

where $V$ is a $3m \times 3m$ covariance matrix, taken to be diagonal with the model-based MSE estimates as elements.
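The adjustment formula above is a one-line computation; the following sketch (ours) assumes the redundant restriction has been dropped, so that $R$ has full row rank and $R V R'$ is invertible.

```python
import numpy as np

def benchmark(theta, V, R, r):
    """Weighted-least-squares benchmarking of Section 3.3: move theta as
    little as possible in the V^{-1} metric so that R @ theta_adj = r."""
    return theta + V @ R.T @ np.linalg.solve(R @ V @ R.T, r - R @ theta)
```

Estimates that already satisfy the constraints are left untouched, since the correction term is proportional to $r - R\hat{\theta}^M$.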
3.4. Results from a simulation study
A simulation study is undertaken to compare the small area estimators
discussed. Population data for the simulation are constructed by replicating the
LFS sample data of one year, consisting of 44,687 households, to the population
level using the inclusion weights, see Särndal et al. (1992), Section 11.6. Only
first wave (CAPI) data are considered. The coordination of monthly samples
is such that nearly all municipalities are sampled within a year. Therefore,
the sampling design over a year can be regarded as stratified with respect to
municipality. The simulation results are based on 100 samples. Each sample
is a simple random sample of 44,687 households stratified with respect to the
444 municipalities.
Unemployment fractions for all municipalities are estimated using both
design-based and model-based methods. The design-based estimators consid-
ered are HT, GREG, and SREG estimators. They share the property of (ap-
proximate) design-unbiasedness. The model-based estimators considered are
those based on the models described previously. The main covariate used in
this study is registered unemployment, which is a good explanatory variable for
unemployment as measured in the LFS, even though there are large differences
in definition between the two.
The main simulation results are displayed in Figure 1. Results are shown
for the SREG estimator, and the model-based estimators based on linear regres-
sion, type A, type A with variances pooled, type B, and logit-normal models.
The displayed simulation measures are (1) the square root of the MSE over
the 100 simulation runs, (2) the square root of the simulation mean of the
model-based MSE estimates, and (3) the coverage of estimated 95 % confi-
dence intervals. These measures have been computed for all individual areas,
but shown are only their averages over three groups of municipalities, with
small, moderate and large population sizes, respectively.
With regard to all simulation measures considered, the SREG estimator
performs better than the other design-based estimators HT and GREG; the latter
are not shown in the figures. The simulation standard errors of both GREG and
HT estimates are approximately 10 % larger than those of the SREG estimates.
It is concluded that the model-based small area estimators, with the exception
of synthetic estimators, perform better than the design-based ones. As expected,
the difference is largest for the small municipalities. The synthetic estimates
perform somewhat better than the SREG estimates in terms of MSE. However,
the MSE estimates based on the fixed effects model are far too low, and hence
coverage of estimated confidence intervals is very low. This relatively poor
performance is partly due to the rather large dispersion of area unemployment
fractions in our simulation population, presumably much larger than in the real
population. This is a consequence of the way sample data have been used
to construct the population, even though municipalities with no unemployed
have been excluded. Nevertheless, standard model-based MSE estimates for
the synthetic small area estimates are clearly not very robust.
Due to the simple structure of auxiliary information used, the logistic
synthetic estimates are essentially the same as the synthetic estimates based on
the linear regression model. Also, the simulation results for the alternative type
A model estimates (4) are similar to those for the type A model. The use of
[Figure 1: two bar charts of average simulation RMSE (top) and average estimated RMSE (bottom) for the estimators Survey regression, Synthetic, Type A, Type A with pooled variances, Type B and Logistic-normal, in three groups of municipalities: 0-20000, 20000-50000 and over 50000 inhabitants.]

Figure 1. Top: simulation root mean squared errors (RMSEs) for 6 estimators. Bottom: average MSE estimates corresponding to the 6 estimators. The numbers on top of the bars denote 95 % coverage percentages.
SREG estimates fitted at the unit level as input estimates to the type A model
does reduce the simulation MSEs somewhat, and in combination with pooled
variance estimates the results are very similar to those of the type B model, as
expected.
Pooling the variances in the type A area level model has a clear positive
effect: MSEs are somewhat smaller, and the coverage is improved. Overall,
the models with random area effects yield the best results in this simulation
study, both in terms of MSEs of the point estimates and width and coverage
of estimated confidence intervals. Type A (with pooled variances), type B and
logit-normal models perform quite similarly. In particular, in this study of
municipal unemployment fractions with registered unemployment as the only
covariate, there is no compelling reason for using the logistic-normal model
instead of a linear mixed model.
Figure 1 displays the simulation results before benchmarking. In a separate
step, small area estimates are benchmarked to satisfy (11) and (12). Since
aggregation yields estimates that are already in very close agreement with
the constraints, benchmarking turns out to have only very small effects on
the small area estimates. The only exception is the type A model without
variance pooling, which yields estimates with a significant downward bias for
unemployment at the national level. This bias disappears after pooling the
variance estimates of the direct area estimates.
3.5. Software tool for SAE
To make model-based SAE available for the production of official statistics,
Statistics Netherlands has started to develop a tool to support the required
computations. This is a conveniently accessible tool, not a set of cumbersome
scripts. As a first method the EBLUP based on the basic area level model is
implemented. This method is well accepted in the literature and sophisticated
enough to provide accurate small area estimates. Moreover, it can be deployed
in a process that naturally follows the weighting of survey data, from which
design-based estimates are derived as input.
The software tool can be launched from within SPSS and presents a graph-
ical user interface in which various settings can be entered, such as the option
to pool variance estimates. While the tool is implemented in the programming
languages Visual Basic and C#, it interacts with the SPSS software to pro-
vide an integrated system to the user. The software makes direct estimates,
fits the model using the Fay-Herriot moments estimator, and calculates EBLUP
estimates, which, if requested, are made consistent with direct estimates at ag-
gregated levels. Besides the calibrated EBLUP small area estimates, the SPSS
output table contains MSE estimates, direct estimates and corresponding vari-
ance estimates. Estimated model parameters are given as well. At this stage the
tool is prototype software. It is anticipated that future research outcomes will
lead to modifications and enhancements of the software, such as the addition
of alternative models. Once the SAE methodology is an accepted approach in
the production of official statistics at Statistics Netherlands, the prototype can
be used as a base for building a production grade software tool to be used in
regular operations.
4. Borrowing strength over time and space
Most surveys conducted by NSIs operate continuously in time and are
based on cross-sectional or rotating panel designs. SAE procedures that borrow
strength from data collected in the past as well as cross-sectional data from
other small domains are particularly interesting in such situations. The LFS, for
example, is conducted continuously in time and the monthly unemployment rate
is correlated with the unemployment rate in the preceding periods. Therefore it
is efficient to use data observed in preceding periods to improve the estimator
for this parameter through time series modeling. This approach dates back
to Scott and Smith (1974), who proposed to consider the true value of the
finite population parameter as a realization of a stochastic process that can be
described with a time series model.
The common approach to borrow strength over time and space is to allow
for random domain and random time effects in a linear mixed model and
apply a composite estimator like the EBLUP. Rao and Yu (1994) extended the
area level model with an AR(1) model to combine cross-sectional data with
information observed in preceding periods. In EURAREA (2004) linear mixed
models that allow for spatial and temporal autocorrelation in the random terms
are proposed for area and unit level models. A different approach is followed
by Pfeffermann and Burck (1990) and Pfeffermann and Bleuer (1993). They
combine time series data with cross-sectional data by modeling the correlation
between the parameters of the separate domains in a multivariate structural
time series model. Pfeffermann and Burck (1990) show how the Kalman filter
recursions under particular state-space models can be restructured, like the
EBLUP estimators, as a weighted average of a design-based estimator and a
synthetic regression type estimator based on information observed in preceding
sample surveys and other small domains.
4.1. Multivariate structural time series models for monthly unemployment rates
In this section the state-space approach for repeated surveys of Pfeffermann
and Burck (1990) and Pfeffermann and Bleuer (1993) is applied to develop
model-based estimates for the monthly unemployment rates for a classification
of gender by age in six domains. The state-space approach is applied because
it can handle a very flexible and powerful class of models that account for
trend, seasonal effects and auxiliary information. This approach has a high
practical value for official statistics. First, Pfeffermann and Burck (1990) and
Pfeffermann and Tiller (2006) made this estimation procedure more robust
against model misspecification by benchmarking the sum of the small area
estimates to the direct estimates at an aggregated level. Second, seasonally
adjusted parameter estimates and their estimation errors are obtained as a by-product of this estimation procedure. Third, this approach accounts for the
complexity of the sampling design, since the GREG estimates are the input
data for the model. Finally, this approach can be extended to models that
account for the rotation group bias and for the autocorrelation between the
panels of a rotating panel design (Pfeffermann, 1991 and Pfeffermann et al.,
1998). Key references to early papers that develop the state-space approach for
repeated surveys are Tam (1987) and Tiller (1992). In this paper, a relatively
simple model is discussed, where only the first wave of the rotating panel
design of the LFS is used, and where no auxiliary information is included. An
application of a model that uses all waves of the LFS is given in Van den
Brakel and Krieg (2007).
Let $\hat\theta_{i,t}$ denote the GREG estimate for the true unemployment rate $\theta_{i,t}$
of domain $i$ and month $t$ based on monthly samples for the following six
domains: (1) Men, 15-24 years, (2) Women, 15-24 years, (3) Men, 25-44 years,
(4) Women, 25-44 years, (5) Men, 45-64 years, (6) Women, 45-64 years. Each
month a vector $\hat\theta_t = (\hat\theta_{1,t}, \ldots, \hat\theta_{6,t})'$ is observed. The time series of this vector
is decomposed into a stochastic trend, a stochastic seasonal component and an
irregular component, i.e.
$$\hat\theta_t = L_t + S_t + \epsilon_t,$$
with $L_t = (L_{1,t}, \ldots, L_{6,t})'$ the vector of trends, $S_t = (S_{1,t}, \ldots, S_{6,t})'$ the vector of seasonal effects and $\epsilon_t = (\epsilon_{1,t}, \ldots, \epsilon_{6,t})'$ the vector of irregular components.
For the stochastic trends, the so-called smooth trend model is assumed, i.e.
$$L_{i,t} = L_{i,t-1} + R_{i,t-1}, \qquad R_{i,t} = R_{i,t-1} + \eta_{R,i,t}, \qquad E(\eta_{R,i,t}) = 0,$$
$$\mathrm{Cov}(\eta_{R,i,t}, \eta_{R,i',t'}) =
\begin{cases}
\sigma^2_{R,i} & \text{if } t = t' \text{ and } i = i', \\
\zeta_{R,i,i'} & \text{if } t = t' \text{ and } i \neq i', \\
0 & \text{if } t \neq t'.
\end{cases} \tag{13}$$
The parameters $L_{i,t}$ and $R_{i,t}$ are referred to as the trend and the slope parameters,
respectively. For the seasonal components a trigonometric model is assumed:
$$S_{i,t} = \sum_{h=1}^{6} S_{i,t,h},$$
$$S_{i,t,h} = S_{i,t-1,h} \cos(\lambda_h) + S^{*}_{i,t-1,h} \sin(\lambda_h) + \eta_{S,i,t,h},$$
$$S^{*}_{i,t,h} = -S_{i,t-1,h} \sin(\lambda_h) + S^{*}_{i,t-1,h} \cos(\lambda_h) + \eta^{*}_{S,i,t,h},$$
$$\lambda_h = h\pi/6, \quad \text{for } h = 1, \ldots, 6,$$
with
$$E(\eta_{S,i,t,h}) = 0, \qquad
\mathrm{Cov}(\eta_{S,i,t,h}, \eta_{S,i',t',h'}) =
\begin{cases}
\sigma^2_{S,i,h} & \text{if } t = t', \; i = i' \text{ and } h = h', \\
0 & \text{if } t \neq t', \; i \neq i' \text{ or } h \neq h'.
\end{cases} \tag{14}$$
The assumptions of (14) also apply to the error terms $\eta^{*}_{S,i,t,h}$. The variances of the error terms within each harmonic are equal, i.e. $\sigma^2_{S,i,h} = \sigma^{*2}_{S,i,h}$, where $\sigma^{*2}_{S,i,h}$ denotes the variance of $\eta^{*}_{S,i,t,h}$. Furthermore it is assumed that $\mathrm{Cov}(\eta_{S,i,t,h}, \eta^{*}_{S,i',t',h'}) = 0$ for all $i$, $i'$, $t$, $t'$, $h$ and $h'$. The irregular components $\epsilon_{i,t}$ are modeled as uncorrelated white noise processes. These irregular components contain the survey errors of the GREG estimates and the unexplained variation of the stochastic process used to model the true population parameter $\theta_{i,t}$.
The trend and the seasonal components of the time series model describe
how the unemployment rate in month $t$ is related to the unemployment rates
in the preceding months for each particular domain. This shows how sample
information obtained in preceding periods is used to improve the estimates
for the unemployment rates in month $t$. The model uses sample information
from other domains through the correlation between the slope parameters of
the trend. This ensures that the trends in the estimated unemployment rates for
the different domains change more or less simultaneously, depending on the
estimated correlation. In a similar way, it is possible to model the correlation
between the seasonal components. Under the trigonometric model this will
result in a rather complex model with a large number of hyperparameters.
A more rigid approach, assuming that the seasonal components for different
domains are equal, can be applied instead.
The standard way to proceed, is to express the model in state-space repre-
sentation, assume that the error terms are normally distributed, and apply the
Kalman filter to obtain optimal estimates for the monthly unemployment rates
as well as the trend and the seasonal components in these series, see Harvey
(1989) or Durbin and Koopman (2001) for details. The analysis is conducted
with software developed in Ox in combination with the subroutines of SsfPack
(beta 3) (Doornik, 1998 and Koopman et al., 1999).
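To make the filtering recursion concrete, the following is a minimal numpy sketch of a Kalman filter for a single-domain smooth trend model with an irregular component, i.e. the univariate core of TSM1 without the seasonal part. It is not the Ox/SsfPack implementation used for the analysis; the variance parameters, fixed here for illustration, would in practice be estimated by maximum likelihood, and the initialization is an approximate diffuse prior.

```python
import numpy as np

def kalman_filter_smooth_trend(y, var_irregular, var_slope):
    """Filter a univariate series under the smooth trend model
    L_t = L_{t-1} + R_{t-1},  R_t = R_{t-1} + eta_t,  y_t = L_t + eps_t.
    Returns filtered level estimates and their variances."""
    T = np.array([[1.0, 1.0],
                  [0.0, 1.0]])          # transition: (L, R) -> (L + R, R)
    Z = np.array([[1.0, 0.0]])          # observation picks out the level L_t
    Q = np.diag([0.0, var_slope])       # only the slope is stochastic
    H = np.array([[var_irregular]])     # measurement (survey) error variance

    a = np.zeros(2)                     # initial state
    P = np.eye(2) * 1e6                 # approximate diffuse initial variance
    levels, level_vars = [], []
    for yt in y:
        a = T @ a                       # prediction step
        P = T @ P @ T.T + Q
        F = Z @ P @ Z.T + H             # innovation variance
        K = P @ Z.T / F                 # Kalman gain
        a = a + (K * (yt - Z @ a)).ravel()   # update step
        P = P - K @ Z @ P
        levels.append(a[0])
        level_vars.append(P[0, 0])
    return np.array(levels), np.array(level_vars)

# Demo on a noisy linear trend: the filtered level tracks the underlying line
rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.05 * t + rng.normal(0.0, 0.5, size=t.size)
levels, level_vars = kalman_filter_smooth_trend(y, var_irregular=0.25,
                                                var_slope=1e-4)
```

On a series with a stable underlying slope, the filtered level settles on the signal while the filtered level variance falls below the measurement error variance, which is the variance reduction exploited by TSM1.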
4.2. Results
In this section, two models are applied to the series of the GREG estimates
of the monthly unemployment rates from 1996 through 2006. The first model,
abbreviated as TSM1, only borrows strength over time, since it assumes a
univariate model for each domain by taking $\zeta_{R,i,i'} = 0$ in (13). The second
model, abbreviated as TSM2, assumes equal correlations between the slopes
of the smooth trend model, thus $\zeta_{R,i,i'} = \rho_R\,\sigma_{R,i}\,\sigma_{R,i'}$ in (13), and therefore
borrows strength both over time and space.
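Under TSM2 the slope disturbances thus have variance $\sigma^2_{R,i}$ on the diagonal and covariance $\rho_R\,\sigma_{R,i}\,\sigma_{R,i'}$ off the diagonal. A small sketch of how this covariance block can be assembled (the function name and inputs are illustrative):

```python
import numpy as np

def slope_cov(sigmas, rho):
    """Covariance of the slope disturbances across domains under TSM2:
    sigma_i^2 on the diagonal, rho * sigma_i * sigma_j off the diagonal."""
    s = np.asarray(sigmas, dtype=float)
    C = rho * np.outer(s, s)        # equal-correlation off-diagonal structure
    np.fill_diagonal(C, s ** 2)     # domain-specific variances
    return C

# Hypothetical slope standard deviations for three domains
C = slope_cov([0.001, 0.002, 0.003], rho=0.85)
```

For $|\rho_R| < 1$ the resulting matrix is positive definite, so it can serve directly as the state disturbance covariance block in the state-space form.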
In Figure 2, the smoothed Kalman filter estimates for the slopes of the
six domains are plotted for TSM1 and TSM2. The figures illustrate that the
trend in the unemployment rate is declining during the first three years, since
the slope parameters take negative values. From 2001 until 2005 the trend
[Figure 2. Smoothed estimates of the slope parameters for models TSM1 (panel a) and TSM2 (panel b), plotted against time (1998-2006) for domains 1 through 6.]
gradually increases, since the slope parameters take positive values during this
period. From the second half of 2005 the trend in the unemployment rate is
decreasing again. It follows from Figure 2a that the slopes in the univariate
models for the six domains move more or less simultaneously up and down.
Model TSM2 takes advantage of this, by explicitly allowing for correlation
between the slopes. The maximum likelihood estimate for the correlation, $\rho_R$,
equals 0.85. As can be seen in Figure 2b, the slopes for the domains are more
similar under this model.
The estimated seasonal patterns are almost equal under both models. For
domains 1, 4, 5, and 6, the models find rather flexible seasonal patterns. For
domains 2 and 3, the seasonal patterns are more or less time independent.
The seasonal patterns of domains 1 and 3 under model TSM1 are plotted as
examples in Figure 3.
[Figure 3. Smoothed estimates of the seasonal components for domains 1 (panel a) and 3 (panel b).]
In Figure 4 the GREG estimates and the filtered estimates based on the two
models are compared for domains 2, 5, and 6. The filtered estimates are
used because they are based on the complete set of information that would be
available in the regular production process to produce a model-based estimate
for month t, directly after finishing the data collection for that month. The
filtered estimates under both models partly follow the fluctuations in the GREG
series, since these fluctuations are considered as time dependent seasonal effects
under the assumed model. A substantial part of the irregularities in the series
of the GREG estimates, however, are flattened out, since they are considered
as survey errors under both models. The series of the filtered estimates are
at the level of the series of the GREG estimates indicating that there is no
obvious bias in the filtered estimates.
Comparing the filtered estimates obtained under both models shows that
the time dependent seasonal patterns are the same. Small differences occur
in the level of the series. The correlation between the slopes results in small
adjustments in the level of the estimated monthly unemployment rates. For
example, the estimated monthly unemployment rates in domain 6 are slightly
adjusted downward during the last two years of the series.
[Figure 4. GREG estimates and filtered estimates of TSM1 and TSM2 for domains 2, 5, and 6; the left-hand panels compare the GREG and TSM1 series, the right-hand panels compare the TSM1 and TSM2 series.]
In Figure 5 the standard errors of the filtered estimates are compared with
the standard errors of the GREG estimates for domains 2 and 6. In Table I
the averages of the standard errors over the 12 months of 2006 are given for
the six domains. The variance of the GREG estimates is approximated with
the variance of the ratio of the GREG estimators for total unemployment and
total labour force under multistage sampling where the households are used
as the PSUs. The standard errors of the filtered estimates of TSM1 are much
smaller than the standard errors of the GREG estimates. This illustrates that
[Figure 5. Standard errors of the GREG estimates and the filtered estimates of TSM1 and TSM2 for domains 2 (panel a) and 6 (panel b).]
borrowing strength from preceding time periods increases the precision of the
estimates substantially.
Comparing the standard errors of TSM1 and TSM2 shows that the precision can be further improved by using information from other domains. The additional gain, however, is relatively small compared to the reduction in the standard errors that is obtained by borrowing strength from the past.

Table I: Standard errors of the GREG estimates and the filtered estimates of the two models, averaged over the 12 months of 2006.

Estimator   Domain 1   Domain 2   Domain 3   Domain 4   Domain 5   Domain 6
GREG        0.0189     0.0239     0.0060     0.0080     0.0070     0.0101
TSM1        0.0097     0.0108     0.0031     0.0043     0.0033     0.0056
TSM2        0.0094     0.0096     0.0029     0.0041     0.0031     0.0054

The standard errors of the Kalman filter estimates do not reflect the bias due to model
misspecification. This requires a careful model evaluation and selection, which
is addressed in Section 5.
5. Model evaluation and selection
It is always important to assess the assumptions underlying an estimation
procedure. Model assessment is particularly important in small area estima-
tion, since the data are sparse so that estimates are more sensitive to model
choice. A large literature exists on model comparison; see e.g. the overview
in Sorensen (2004) and references therein. The literature on model compari-
son and diagnostics specific to small area estimation is smaller, but growing.
Section 6 of Jiang and Lahiri (2006) provides a short overview and some key
references.
Measures for model selection usually weigh some form of goodness-of-fit
against model complexity. However, goodness-of-fit measures may not always
reflect the actual purpose of a study. Some types of lack of fit may not
be important for the purpose of the study while other types may yield poor
results even for what seems “a small amount of lack of fit”. For example,
models that yield good estimates for the overall population mean or total may
not be adequate for small area estimates. Cross-validation (CV) is a more
direct measure of the predictive power of a model that can be used to compare
models. Individual models can also be assessed using certain model diagnostics
that indicate whether a model is appropriate in some relevant aspect. Brown
et al. (2001) describe several diagnostics specific to small area estimation.
Harvey (1989) and Durbin and Koopman (2001) discuss several diagnostics for
state-space time series models. Model diagnostics should ideally indicate in
what directions a deficient model can be improved.
Not all knowledge about a system and the data collection mechanism can
be accounted for in a model; the model would become too complicated. For
example, measurement errors are usually ignored in survey sampling. Neglect-
ing measurement errors in the model building process may have a large impact
on the results, however. Consider the realistic situation that each small area
is assigned to a single interviewer. In a study with large interviewer effects,
a model with random area effects would capture such effects as real whereas
in a model without area effects the interviewer effects would largely average
out. In this situation, a synthetic estimator may actually work better than an
EBLUP based on a mixed model.
Models should be evaluated not only with regard to the small area estimates
they produce but also with regard to their standard errors. For example, models
without area effects may yield reasonable small area estimates, but sometimes
produce far too small standard errors, as observed for the synthetic estimates
in the simulation study of Section 3.4.
These considerations indicate that model evaluation and selection is a com-
plicated process, for which no single standard procedure exists. In the context
of small area estimation, and survey sampling in general, this process not
only involves subject matter knowledge about the variables of interest, but also
knowledge about the data collection process. Information used in the sampling
design is usually relevant to the main characteristics of interest. Design-based
estimators incorporate this information directly, most importantly by using in-
verse inclusion probabilities as weights. In order to avoid selection bias in
model-based procedures, such information should also be taken into account in
the modeling effort. If possible, relevant variables related to inclusion proba-
bilities or non-response propensities should be included as covariates, see e.g.
Gelman et al. (2003), Chapter 7. In the following two subsections, some pre-
liminary results about the use of model selection measures and diagnostics are
described for the application of SAE to the LFS.
5.1. Mixed models
A simulation study as described in Section 3.4 is a useful instrument to
select an appropriate model, especially when qualitatively different models are
compared, such as normal linear against specific discrete data models, and
unit against area level models. However, such studies are time consuming and
it is generally impossible to create study populations that are realistic in all
important aspects. One usually has to rely on model diagnostics and model
comparison measures, evaluated using the sample data at hand.
For the selection of a model within a certain class, such as the class of
normal linear mixed models, or the selection of a suitable set of covariates,
several model comparison measures exist. Widely used model comparison
measures are AIC and BIC, which combine a measure of goodness-of-fit (the
log-likelihood) and a penalty term for the complexity of the model, which in
the case of linear regression models is simply the number of parameters p in
the model. These measures are given by
$$\mathrm{AIC} = -2\hat{L} + 2p, \qquad \mathrm{BIC} = -2\hat{L} + \log(n)\,p, \tag{15}$$
where $\hat{L}$ is the log-likelihood at the parameter estimates and $n$ is the number
of observations (units or areas, depending on the model). In mixed models
the number $p$ of model degrees of freedom is more difficult to determine,
because random effects should contribute something between 0 and their total
number as their variance varies from 0 (complete pooling) to $\infty$ (no pooling).
A practical solution to this problem in the case of linear mixed models is to
use the effective number of degrees of freedom defined by the trace of the
hat matrix $H$, which takes the data to the fitted values, i.e. $\hat{y} = Hy$, see
Hastie et al. (2003), Chapter 7. This reduces to the number of fixed effects
in the case of complete pooling and to the total number of fixed plus random
effects in the case of no pooling, and is in between in the case of partial
pooling.
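As an illustration, the sketch below computes this effective number of degrees of freedom for an area-level smoother of the form $\tilde\theta = \Gamma\hat\theta + (I - \Gamma)Z\hat\beta$, with $\hat\beta$ the GLS estimate weighted by the $\gamma_i$; the covariates and dimensions are simulated, not taken from the LFS application. With a common $\gamma$ the trace equals $m\gamma + (1-\gamma)p$, interpolating between $p$ (complete pooling) and $m$ (no pooling):

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 50, 4                       # number of areas, number of fixed effects
Z = rng.standard_normal((m, p))    # hypothetical area-level covariates

def effective_df(Z, gamma):
    """Trace of the hat matrix mapping direct estimates to EBLUPs,
    theta_tilde = Gamma theta_hat + (I - Gamma) Z beta_hat,
    with beta_hat the GLS estimate weighted by gamma_i."""
    m = Z.shape[0]
    G = np.diag(gamma)
    A = np.linalg.inv(Z.T @ G @ Z)
    H = G + (np.eye(m) - G) @ Z @ A @ Z.T @ G
    return np.trace(H)

# With a common gamma the trace reduces to m*gamma + (1 - gamma)*p:
for g in (0.001, 0.5, 0.999):
    edf = effective_df(Z, np.full(m, g))
    print(round(edf, 3), round(m * g + (1 - g) * p, 3))
```

The same quantity can then be substituted for $p$ in (15) to penalize a partially pooled mixed model.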
Another widely applicable measure is CV. For area level models the use
of CV is perhaps not so obvious, since in that case CV measures how well
the direct input estimates, themselves subject to large variances, are predicted
by the model. Nevertheless, CV appears to be a useful measure also for
type A models. In that case a CV measure is given by (see the notation of
Section 3.1)
$$\mathrm{CV} = \sum_{i=1}^{m} w_i \left( \hat\theta_i - \tilde\theta_i^{A(i)} \right)^2
= \sum_{i=1}^{m} w_i \left( \frac{\hat\theta_i - \tilde\theta_i^{A}}{1 - H_{ii}} \right)^2, \tag{16}$$
where $\tilde\theta_i^{A(i)} = Z_i' \hat\beta^{(i)}$, with $\hat\beta^{(i)}$ the coefficients estimated from the data excluding
the $i$th area, $w_i$ are adjustable weights and
$$H_{ii} = \hat\gamma_i + (1 - \hat\gamma_i)\,\hat\gamma_i\, Z_i' \Big( \sum_j \hat\gamma_j Z_j Z_j' \Big)^{-1} Z_i$$
is the $i$th diagonal element of the hat matrix. The second equality in (16) can
be derived using the matrix inversion lemma, provided that a common value
of $\hat\sigma^2_v$ (the one based on all areas) is used in all $m$ "minus one" fits. Its
practical value is that no additional model fitting is necessary to compute the
CV measure.
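The second equality in (16) can also be checked numerically. The sketch below compares explicit leave-one-area-out refits with the hat-matrix shortcut on simulated area-level data, keeping a common set of weights $\hat\gamma_i$ across all fits, as the derivation requires (all data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
m, p = 40, 3
Z = rng.standard_normal((m, p))                  # hypothetical covariates
theta_hat = Z @ np.array([1.0, -0.5, 0.2]) + rng.standard_normal(m)
gamma = rng.uniform(0.2, 0.8, m)                 # hypothetical variance ratios

def wls_beta(Z, y, w):
    """Weighted least squares coefficients with weights w."""
    ZtW = Z.T * w
    return np.linalg.solve(ZtW @ Z, ZtW @ y)

beta = wls_beta(Z, theta_hat, gamma)
eblup = gamma * theta_hat + (1 - gamma) * (Z @ beta)
A = np.linalg.inv((Z.T * gamma) @ Z)
H_diag = gamma + (1 - gamma) * gamma * np.einsum('ij,jk,ik->i', Z, A, Z)

# Left-hand side of (16): explicit leave-one-area-out refits
lhs = np.empty(m)
for i in range(m):
    keep = np.arange(m) != i
    b_i = wls_beta(Z[keep], theta_hat[keep], gamma[keep])
    lhs[i] = theta_hat[i] - Z[i] @ b_i

# Right-hand side of (16): hat-matrix shortcut, no refitting needed
rhs = (theta_hat - eblup) / (1 - H_diag)
print(np.allclose(lhs, rhs))
```

The two sides agree up to floating-point error, confirming that the CV measure can be computed from a single fit.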
In a case study, AIC, BIC and CV are compared. Municipal unemployment
fractions are estimated using the type A model with 43 different combinations
of certain demographic covariates, the smallest model being the constant and
the largest model ethnicity × (degree of urbanisation + household composition)
with a total of 56 fixed effects. It is found that BIC is very similar to CV with
weights $w_i$ chosen proportional to the estimated variance ratio $\hat\gamma_i$, defined below
equation (3). Another finding is that AIC does not penalize enough, since it
chooses the most complex model, which actually is among the models with the
largest true MSEs. BIC and especially CV achieve a higher correlation with
the true errors over the set of models. An advantage of CV in this case is that
the weights $w_i$ can be chosen to better reflect the objectives. For example, the
choice of constant weights is appropriate when all areas are considered equally
important, whereas the aforementioned choice $w_i \propto \hat\gamma_i$ gives more weight to
areas that can be better estimated using direct estimates, which are the larger
municipalities in this case.
Examining residuals is an important part of model diagnosis. For example,
in the case of municipal estimates one may find residual spatial correlation,
indicating that the model would benefit from including some form of spatial
correlation. In a further extension of the simulation study, small improvements
of municipal unemployment estimates are observed after adding an exponential
spatial correlation structure to the basic unit level model. The improvements
are largest (10 % reduction in standard errors, on average) when degree of
urbanisation is included as a covariate. However, if registered unemployment is
used as a covariate, the improvements due to the exponential spatial correlation
structure are found to be negligible.
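As an aside on the form of such a structure: under an exponential model the correlation between two areas decays as $\exp(-d_{ij}/\phi)$ in the distance $d_{ij}$ between them, with range parameter $\phi$. A minimal sketch (the coordinates and $\phi$ are hypothetical inputs, e.g. municipality centroids):

```python
import numpy as np

def exp_spatial_corr(coords, phi):
    """Exponential spatial correlation matrix: rho_ij = exp(-d_ij / phi)."""
    coords = np.asarray(coords, dtype=float)
    # pairwise Euclidean distances between area centroids
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / phi)

# Example: two areas 5 distance units apart, range parameter 5
R = exp_spatial_corr([[0.0, 0.0], [3.0, 4.0]], phi=5.0)
```

Such a matrix replaces the identity correlation of the independent area effects, so that nearby areas borrow more strength from each other than distant ones.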
In Brown et al. (2001), a coverage diagnostic is developed as a model
evaluation method for SAE procedures. From the design-based estimates and
their variance estimates, approximate 95 % confidence intervals can be formed.
If the model estimates are similar to the true population values, these intervals
should cover the model estimates in around 95 % of the cases. Substantially
smaller rates can indicate that the model estimates are strongly biased. Larger
rates, on the other hand, can be an indication that the model overfits the data.
In the simulation study of Section 3.4 the synthetic estimator does not per-
form as well as the estimators based on models with random effects, especially
concerning the corresponding MSE estimates. Using design-based SREG esti-
mates and variance estimates to form approximate 95 % confidence intervals,
the coverage diagnostic for the synthetic estimates is only 78 %, much lower
than for the estimates based on models with random effects. Here this indicates
overshrinkage of the synthetic estimates, which can be overcome by including
random area effects. In using such a coverage diagnostic, one should keep in
mind that the approximate 95 % design-based confidence intervals themselves
may have lower than nominal coverage of the true population parameter due to
the small sample sizes, that the model-based estimates are also subject to un-
certainty, and that the design-based and model-based estimates can be strongly
correlated. Another diagnostic is the difference between model-based estimates
and direct estimates at an aggregate level. A large difference may be corrected
by benchmarking the model-based estimates, or by adjusting the model in an
appropriate way.
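The coverage diagnostic described above is straightforward to compute from the direct estimates, their standard errors and the model estimates; a minimal sketch with illustrative names:

```python
import numpy as np

def coverage_rate(direct, direct_se, model_est, z=1.96):
    """Share of areas whose design-based 95% interval around the direct
    estimate covers the corresponding model-based estimate."""
    direct = np.asarray(direct, dtype=float)
    direct_se = np.asarray(direct_se, dtype=float)
    model_est = np.asarray(model_est, dtype=float)
    covered = np.abs(model_est - direct) <= z * direct_se
    return covered.mean()
```

Rates far below the nominal 95% suggest bias such as overshrinkage, while rates close to 100% may signal overfitting, subject to the caveats above about the quality of the design-based intervals themselves.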
5.2. Structural time series models
Alternative variants of the time series models applied in Section 4 are
considered. Competing models assume that the variances for the harmonics
in the trigonometric seasonal model are equal or use the well-known dummy
variable seasonal model, i.e. $\sum_{h=0}^{11} S_{i,t-h} = \eta_{S,i,t}$ with $E(\eta_{S,i,t}) = 0$. These
models result in less flexible seasonal patterns in this particular application. The
more flexible the seasonal patterns, the more past observations are discounted
in constructing a seasonal pattern in the model estimates for the unemployment
rate. The dummy variable seasonal model as well as the trigonometric model
with equal variances for the harmonics make more use of sample information
observed in the past, which results in smaller standard errors for the filtered
estimates of the monthly unemployment rates. In Krieg and Van den Brakel
(2007), several models using the dummy variable seasonal model are applied
to the series of Section 4.2.
The underlying assumptions of the state-space model are that the distur-
bances of the measurement and system equations are normally distributed and
serially independent with constant variances. Under these assumptions, the
prediction errors $(\hat\theta_{i,t} - \hat\theta^{\,\mathrm{TSM}x}_{i,t|t-1})$, for $x = 1$ or $2$, are also normally distributed
and serially independent with constant variance, where $\hat\theta^{\,\mathrm{TSM}x}_{i,t|t-1}$ denotes the one-step forecast for time period $t$ using the information observed until time period $t-1$. There are different diagnostics available in the literature to test to
what extent these assumptions are tenable, see Durbin and Koopman (2001),
Section 2.12. These diagnostic tests indicate that the prediction errors of the
dummy variable seasonal model contain more autocorrelation compared to the
trigonometric model with separate variances for the harmonics.
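A standard diagnostic of this kind is a portmanteau statistic such as the Ljung-Box statistic applied to the standardized prediction errors; a small self-contained sketch:

```python
import numpy as np

def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

def ljung_box(x, max_lag=10):
    """Ljung-Box portmanteau statistic; approximately chi-squared with
    max_lag degrees of freedom when x is white noise."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k)
                             for k in range(1, max_lag + 1))
```

A value above the chi-squared critical value (18.31 for 10 degrees of freedom at the 5% level) indicates residual autocorrelation in the prediction errors and hence a violation of the serial independence assumption.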
The coverage diagnostics described in Section 5.1 are almost equal for
TSM1 and TSM2 and vary between 93 % and 99 % for the individual domains.
Therefore this diagnostic does not indicate that the model estimates are biased
nor that they are too close to the GREG estimates. The coverage diagnostics
for the dummy variable seasonal models are similar, see Krieg and Van den
Brakel (2007). Coverage rates for subsets of the series can also provide valuable
information. For example, coverage rates for the same months over the different
years (all Januaries, etc.) can be used to check whether the seasonal patterns
are modeled adequately.
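Such subset coverage rates take only a few lines to compute from a month index and a per-observation coverage indicator (an illustrative sketch):

```python
import numpy as np

def coverage_by_month(months, covered):
    """Coverage rate per calendar month, e.g. all Januaries pooled
    over the years of the series."""
    months = np.asarray(months)
    covered = np.asarray(covered, dtype=float)
    return {int(m): covered[months == m].mean() for m in np.unique(months)}
```

Months whose coverage is systematically low would point at a seasonal component that is modeled inadequately for those months.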
The mean of the absolute values of the prediction errors or the mean of the
squared prediction errors can be used as a form of CV to measure the devia-
tion of the model forecasts from the GREG estimates. These measures indicate
that TSM2 performs slightly better than TSM1. The prediction errors obtained
with the trigonometric seasonal models are also slightly smaller compared to
the dummy variable seasonal models (Krieg and Van den Brakel, 2007). The
trigonometric models have slightly larger standard errors for the filtered estimates, but they better meet the stated model assumptions and have slightly smaller prediction errors; they are therefore preferred over the dummy variable seasonal models in this application.
The tests for heteroscedasticity indicate that the assumption of constant
variance in the prediction errors is violated for domains 1 and 5 under models
TSM1 and TSM2; the variance of the prediction errors is larger in the second
part of these series. This heteroscedasticity can be partially diminished by taking
the variances of the disturbances in the measurement equation proportional to
the inverse of the sample size. This does not result in significant changes in
the filtered estimates of the unemployment rates.
AIC and BIC can be used to compare and select time series models.
Their use in the context of state space modeling is, however, not straight-
forward. Standard expressions for these criteria only penalize the number of
hyperparameters in the state space model, see e.g. Harvey (1989), Section 2.6.
Deterministic components in the state vector are not penalized. Nor does
this penalty account for the increased model complexity if state variables for
separate domains share the same hyperparameter. For example this standard
expression gives the same penalty to the model where all domains have the
same seasonal component and to a model where all domains are allowed to
have separate seasonal components, but share the same hyperparameters. One
strategy is to penalize the number of hyperparameters and the number of non-stationary elements in the state vector, see Harvey (1989), Section 5.5. This
strategy, however, still does not account for the fact that large values for the
variance parameters of the state variables increase the effective number of de-
grees of freedom of the model. Indeed, large values for these hyperparameters
allow large adjustments of the state variables and increase the effective number
of degrees of freedom as in the case of random components in linear mixed
models. The estimates of the state-space models of Section 4 are, given the
hyperparameters, linear expressions in the data, i.e. of the form $Hy$, where $y$ is
the data vector containing the observed time series $\hat\theta_{i,t}$. Therefore the effective
number of degrees of freedom, defined by the trace of $H$, is an alternative
choice for the effective number of model parameters that can be used in model
selection criteria, such as AIC and BIC, see Hastie et al. (2003), Chapter 7.
6. Discussion
As emphasized in the introduction, Statistics Netherlands is rather reserved
in the application of model-based estimation procedures for producing official
statistics. Several properties of the GREG estimators make them very attractive
to produce official releases in a regular production environment where there is
generally limited time available for the analysis phase. First, they are robust in
the sense that model-misspecification does not compromise design-consistency
for large sample sizes. Second, they are often used to produce one set of
weights for the estimation of all target parameters of a multi-purpose sample
survey. This is not only convenient but also enforces consistency between the
marginal totals of different publication tables.
A major drawback of the GREG estimator is its relatively large design
variance in the case of small sample sizes. In such situations, model-based
estimation procedures might be used to produce sufficiently reliable statistics,
since they have much smaller variances. The price that is paid for this variance
reduction is that these model-based estimators are more or less design-biased.
Model-misspecification easily results in severely biased estimates, particularly
when design features are not taken into account. Careful model selection
and validation is a central part of the application of these model-dependent
procedures. To facilitate the use of model-based procedures in official statistics,
methods that have some built-in mechanism against model-misspecification are
preferable. The benchmark approach discussed in Section 3.3, for example,
reduces the sensitivity to model-misspecification in the context of linear mixed
models. A similar approach for time series models is developed by Pfeffermann
and Burck (1990) and Pfeffermann and Tiller (2006).
In this paper linear mixed models are applied to estimate annual munic-
ipal unemployment fractions and multivariate structural time series models to
estimate monthly unemployment rates. As expected, the precision of the direct
estimators can be improved considerably under both approaches. It appears that
most is gained by using sample information observed in the past. This makes
the time series approach very attractive for repeated surveys, which are widely
used at NSIs. The time series approach also fits in a framework for producing
preliminary timely releases. At the start of the data collection period, the time
series model yields forecasts for the population parameters (this is sometimes
called nowcasting). When new survey data become available, timely prelimi-
nary and final estimates can be produced, taking advantage of data collected in
the past and in neighbouring areas. If the number of areas or domains becomes
large, as in the case of municipal unemployment figures, the dimensions of the
multivariate time series models are likely to result in estimation problems. In
such cases the linear mixed models might be preferable. The linear mixed
models considered in this paper can be improved by taking advantage of sam-
ple information from the past. The formal approach is to allow for temporal
autocorrelation in the random terms of the linear models (Rao and Yu, 1994
and EURAREA, 2004). A simpler alternative is to use estimates obtained in
preceding periods as auxiliary information in the model. Another possibility is
to apply univariate time series models to make model-based estimates that bor-
row strength over time. Subsequently, these estimates and corresponding MSE
estimates are used as the input for an area level model to borrow strength over
space. An interesting topic for future research is to compare the performances
of the multivariate time series models with linear mixed models that include
both random area and time effects.
Another preliminary finding for the unit level model is that including an
exponential spatial correlation structure improves the standard errors. These
findings are in line with EURAREA (2004) where the use of spatial correla-
tion structures in linear mixed models is explored. Pratesi and Salvati (2008)
also use simultaneous autoregressive models and conditional autoregressive
models to define spatial correlation structures in an area level model and report
important model improvements in a simulation study. Therefore the use of
spatial correlation structures is another possibility to improve the linear mixed
models for the Dutch LFS.
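An exponential spatial correlation structure of the kind referred to above specifies corr(v_i, v_j) = exp(-d_ij / phi) for random area effects of areas at distance d_ij. The sketch below constructs such a correlation matrix; the centroid coordinates and the range parameter phi are hypothetical values chosen purely for illustration.

```python
import math

def exp_spatial_corr(coords, phi):
    """Correlation matrix with corr(i, j) = exp(-d_ij / phi), where d_ij is
    the Euclidean distance between area centroids and phi > 0 is the range."""
    n = len(coords)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            corr[i][j] = math.exp(-d / phi)
    return corr

# Three hypothetical area centroids (e.g. in km); phi controls how quickly
# the correlation between area effects decays with distance.
centroids = [(0.0, 0.0), (10.0, 0.0), (50.0, 50.0)]
R = exp_spatial_corr(centroids, phi=25.0)
```

Nearby areas then receive strongly correlated random effects, so that an area with a small sample borrows most strength from its closest neighbours.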
It is concluded that there is a case for basing official statistics on model-
based procedures in situations where design-based estimators do not result
in sufficiently reliable estimates. Such releases should be accompanied by
appropriate methodology and quality descriptions, where the underlying model
assumptions are stated explicitly.
The statistical theory of model-based SAE is rather complex and the avail-
able software at NSIs is often not suitable to conduct the required calculations
in a straightforward manner in a production environment. To facilitate the use
of SAE in survey processes, a user-friendly software tool that can be launched
from SPSS is currently being developed. So far the basic area level model has
been implemented. A similar tool to support the structural time series approach
is desired.
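As a rough indication of what the basic area level model involves, the sketch below computes Fay and Herriot (1979) type composite estimates with the regression coefficient and variance components treated as known. In practice these quantities are estimated from the data; the coefficient, variances and area figures used here are invented for illustration and do not come from the implemented tool.

```python
def fay_herriot_eblup(direct, x, beta, sigma2_v, psi):
    """BLUP under the basic area level model:
    direct_i = theta_i + e_i,  theta_i = beta * x_i + v_i,
    with design variance var(e_i) = psi_i and model variance var(v_i) = sigma2_v.
    Returns composite estimates gamma_i * direct_i + (1 - gamma_i) * synthetic_i."""
    estimates = []
    for y_i, x_i, psi_i in zip(direct, x, psi):
        synthetic = beta * x_i                   # regression-synthetic estimate
        gamma = sigma2_v / (sigma2_v + psi_i)    # shrinkage weight
        estimates.append(gamma * y_i + (1.0 - gamma) * synthetic)
    return estimates

# Illustrative unemployment rates (%) for three areas: direct estimates, a
# registered-unemployment covariate, and design variances of the direct estimates.
est = fay_herriot_eblup(direct=[6.0, 4.0, 8.0],
                        x=[5.5, 4.5, 7.0],
                        beta=1.0,
                        sigma2_v=0.5,
                        psi=[0.2, 1.0, 2.0])
```

The weight gamma_i shrinks imprecise direct estimates (large psi_i) towards the synthetic prediction, while precise direct estimates are left largely unchanged.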
Acknowledgments
The authors wish to thank Professor D. Pfeffermann for his advice during this project, and
Professor S.J. Koopman for making the beta version of Ssfpack 3 available. The views expressed in
this paper are those of the authors and do not necessarily reflect the policies of Statistics Netherlands.
REFERENCES
Battese, G. E., Harter, R. M., and Fuller, W. A. (1988) An error components model for
prediction of county crop areas using survey and satellite data, Journal of the American
Statistical Association, 83, 28–36.
Brakel, J. A. Van Den and Krieg, S. (2007) Modelling Rotation Group Bias and Survey Errors in
the Dutch Labour Force Survey, In: Proceedings of the Section on Survey Research Methods,
American Statistical Association, 2675–2682.
Breslow, N. E. and Clayton, D. G. (1993) Approximate Inference in Generalized Linear Mixed
Models, Journal of the American Statistical Association, 88, 9-25.
Brown, G., Chambers, R., Heady, P., and Heasman, D. (2001) Evaluation of Small Area Estima-
tion Methods - an Application to Unemployment estimates from the UK LFS, In: Proceedings
of Statistics Canada Symposium, 2001.
Chambers, R., Van Den Brakel, J. A., Hedlin, D., Lehtonen, R., and Zhang, Li-Chun (2006)
Future Challenges of Small Area Estimation, Statistics in Transition, 7, 759–769.
Datta, G. S. and Ghosh, M. (1991) Bayesian Prediction in Linear Models: Applications to Small
Area Estimation, The Annals of Statistics, 19, 1748–1770.
Datta, G. S., Rao, J. N. K., and Smith, D. D. (2005) On measuring the variability of small area
estimators under a basic area level model, Biometrika, 92, 183–196.
Doornik, J. A. (1998) Object-Oriented Matrix Programming using Ox 2.0, Timberlake Consultants
Press, London.
Durbin, J. and Koopman, S. J. (2001) Time series analysis by state space methods, Oxford University
Press, Oxford.
EURAREA (2004) Project reference volume, deliverable D7.1.4, Technical report, EURAREA consor-
tium.
Fay, R. E. and Herriot, R. A. (1979) Estimates of income for small places: an application of
James-Stein procedures to Census data, Journal of the American Statistical Association, 74,
269–277.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003) Bayesian Data Analysis,
Chapman & Hall/CRC, London.
Harvey, A. C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge
University Press, Cambridge.
Hastie, T., Tibshirani, R., and Friedman, J. H. (2003) The Elements of Statistical Learning,
Springer, New York.
Horvitz, D. G. and Thompson, D. J. (1952) A generalization of sampling without replacement
from a finite universe, Journal of the American Statistical Association, 47, 663–685.
Jiang, J. and Lahiri, P. (2006) Mixed Model Prediction and Small Area Estimation, Test, 15, 1–96.
Koopman, S. J., Shephard, N., and Doornik, J. A. (1999) Statistical Algorithms for Models in
State Space using SsfPack 2.2, Econometrics Journal, 2, 113–166.
Krieg, S. and Van Den Brakel, J. A. (2007) Model evaluation for multivariate structural time
series models for the Dutch Labour Force Survey, In: Proceedings of the Section on Survey
Research Methods, American Statistical Association, 2767–2774.
Macgibbon, B. and Tomberlin, T. J. (1989) Small Area Estimates of Proportions Via Empirical
Bayes Techniques, Survey Methodology, 15, 237–252.
Malec, D., Sedransk, J., Moriarity, C. L., and Leclere, F. B. (1997) Small Area Inference for
Binary Variables in the National Health Interview Survey, Journal of the American Statistical
Association, 92, 815–826.
Marker, D. A. (2001) Producing Small Area Estimates From National Surveys: Methods for
Minimizing Use of Indirect Estimators, Survey Methodology, 27, 183–188.
Narain, R. (1951) On sampling without replacement with varying probabilities, Journal of the Indian
Society of Agricultural Statistics, 3, 169–174.
Pfeffermann, D. (1991) Estimation and Seasonal Adjustment of Population Means Using Data from
Repeated Surveys, Journal of Business & Economic Statistics, 9, 163–175.
Pfeffermann, D. and Bleuer, S. R. (1993) Robust Joint Modelling of Labour Force Series of Small
Areas, Survey Methodology, 19, 149–163.
Pfeffermann, D. and Burck, L. (1990) Robust Small Area Estimation combining Time Series and
Cross-sectional Data, Survey Methodology, 16, 217–237.
Pfeffermann, D., Feder, M., and Signorelli, D. (1998) Estimation of Autocorrelations of Survey
Errors with Application to Trend Estimation in Small Areas, Journal of Business & Economic
Statistics, 16, 339–348.
Pfeffermann, D. and Tiller, R. (2006) Small Area Estimation with State Space Models Subject
to Benchmark Constraints, Journal of the American Statistical Association, 101, 1387–1397.
Pratesi, M. and Salvati, N. (2008) Small area estimation: the EBLUP estimator based on spatially
correlated random area effects, Statistical Methods and Applications, 17, 113–141.
Rao, J. N. K. (2003) Small Area Estimation, Wiley, New York.
Rao, J. N. K. and Yu, M. (1994) Small-area estimation by combining time-series and cross-sectional
data, The Canadian Journal of Statistics, 22, 511–528.
Särndal, C-E., Swensson, B., and Wretman, J. (1992) Model Assisted Survey Sampling, Springer,
New York.
Sorensen, D. (2004) An Introductory Overview of Model Comparison and Related Topics,
http://www.dcam.upv.es/acteon/docs/modselmaster.pdf.
Tam, S. M. (1987) Analysis of Repeated Surveys using a Dynamic Linear Model, International
Statistical Review, 55, 63–73.
Tiller, R. B. (1992) Time Series Modelling of Sample Survey Data from the U.S. Current Population
Survey, Journal of Official Statistics, 8, 149–166.
Valliant, R., Dorfman, A. H., and Royall, R. M. (2000) Finite Population Sampling and
Inference: A Prediction Approach, Wiley, New York.
HARM JAN BOONSTRA
JAN A. VAN DEN BRAKEL
BART BUELENS
SABINE KRIEG
MARC SMEETS
Department of Statistical Methods
Statistics Netherlands
P.O. Box 4481
6401 CZ Heerlen
The Netherlands
hbta@cbs.nl