©2018 Royal Statistical Society 0964–1998/19/182003
J. R. Statist. Soc. A (2019)
182, Part 1, pp. 3–35
The contributions of paradata and features of
respondents, interviewers and survey agencies to
panel co-operation in the Survey of Health, Ageing
and Retirement in Europe
Johanna Bristle,
Max Planck Institute for Social Law and Social Policy, Munich, Germany
Martina Celidoni and Chiara Dal Bianco
University of Padua, Italy
and Guglielmo Weber
University of Padua, Italy, and Institute for Fiscal Studies, London, UK
[Received July 2015. Final revision June 2018]
Summary. The paper deals with panel co-operation in a cross-national, fully harmonized face-
to-face survey. Our outcome of interest is panel co-operation in the fourth wave of the Survey
of Health, Ageing and Retirement in Europe. Following a multilevel approach, we focus on the
contribution of paradata at three levels: fieldwork strategies at the survey agency level, features
of the (current) interviewer and paradata describing respondents’ interview experience from the
previous wave. Our results highlight the importance of respondents’ prior interview experience,
and of interviewers’ quality of work and experience. We also find that survey agency practice
matters: daily communication between fieldwork co-ordinators and interviewers is positively
associated with panel co-operation.
Keywords: Attrition; Field practices; Interviewer effects; Panel data; Paradata
1. Introduction
The issue of retention in panel surveys is of paramount importance, particularly when the focus
is on slow, long-term processes such as aging. Lack of retention of subjects in longitudinal
surveys, which is also known as attrition, accumulates over waves and particularly harms the
panel dimension of the data.
Survey participation depends on location, contact and co-operation of the sample unit
(Lepkowski and Couper, 2002). In this paper, we investigate the determinants of panel
co-operation—interview completion given location and contact—in the fourth wave of the Sur-
vey of Health, Ageing and Retirement in Europe (SHARE) given participation in the third wave.
We focus on panel co-operation since location and contact are less problematic in a later panel
wave.
Address for correspondence: Johanna Bristle, Max Planck Institute for Social Law and Social Policy, Amalien-
strasse 33, 80799 Munich, Germany.
E-mail: bristle@mea.mpisoc.mpg.de
As recommended by the literature on the determinants of non-response behaviour, we exploit
information at different levels: individual and household characteristics, interviewer traits and
survey design features. A contribution of this paper is its use of information that is gathered at
the interviewer level in a harmonized, multicountry survey. A further, novel, contribution lies
in our investigation of the role of survey agency practices and variability.
We use a three-level logit model to estimate the determinants of retention and the variance that
is attributable to each level: respondent, interviewer and survey agency. This model accounts for
correlation in probabilities of co-operation for respondents who were interviewed by the same
interviewer and interviewers working for the same survey agency. Given the limited number
of survey agencies at the third level, we also provide a simulation exercise to document how
estimates behave in finite samples similar to the sample that we use.
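The three-level structure can be illustrated with a small simulation of the data-generating process that such a random-intercept logit model assumes. The group sizes, variance components and the single respondent-level covariate below are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 10 agencies, 40 interviewers each, 25 respondents each
n_agency, n_iver, n_resp = 10, 40, 25

# Random intercepts at the survey agency and interviewer levels
u_agency = rng.normal(0.0, 0.3, size=n_agency)          # assumed s.d. 0.3
u_iver = rng.normal(0.0, 0.6, size=(n_agency, n_iver))  # assumed s.d. 0.6

# One respondent-level covariate, e.g. the share of item non-response in the
# previous wave, with an assumed negative fixed effect
x = rng.uniform(0.0, 1.0, size=(n_agency, n_iver, n_resp))
beta0, beta1 = 1.5, -0.8

# Linear predictor and co-operation probability for every respondent
eta = beta0 + beta1 * x + u_agency[:, None, None] + u_iver[:, :, None]
p = 1.0 / (1.0 + np.exp(-eta))
coop = rng.random(p.shape) < p  # simulated co-operation outcomes

print(f"overall co-operation rate: {coop.mean():.3f}")
```

Fitting the model to such data would recover the fixed effects and the two variance components; the point of the sketch is only the nesting of respondents within interviewers within agencies.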
The multilevel model that we estimate uses survey data as well as additional paradata that
are obtained as a ‘by-product of the data collection process capturing information about that
process’ (Durrant and Kreuter, 2013). In SHARE, paradata are available on all three levels.
Although paradata at the individual or interviewer level have been used in this strand of
the literature, information at the survey agency level has not been taken into account to ex-
plain participation. One possible reason for this gap could be that, in cross-national research,
information at the survey agency level may not be available or harmonized across countries
(Blom et al., 2008) so comparability is limited. SHARE, which provides harmonized informa-
tion on elderly individuals at the European level, collected such data in wave 4. This additional
source of information gives us the opportunity to investigate the nature of non-response also at
the survey agency level.
Our approach is theoretically based on the framework of survey participation by Groves
and Couper (1998), in which the factors that are expected to influence survey participation are
divided into two major areas: ‘out of researcher control’ and ‘under researcher control’. In this
paper, we are particularly interested in the factors that can be influenced by the researcher,
namely survey agency fieldwork strategies, the features of the interviewer and the respondent–
interviewer interaction.
We find that variables at all three levels affect the probability of retention. Respondent
and interviewer characteristics play an important role. Respondent co-operation decisions are
affected by their previous interview experience: for instance, item non-response in a previous
wave reduces the likelihood of co-operation in a later wave. As far as interviewer character-
istics are concerned, we find that previous experience with working as a SHARE interviewer
matters more than sociodemographic characteristics, such as age, gender or education. Fur-
ther, interviewers who perform well on survey tasks that require diligence are more successful
in gaining co-operation. Regarding survey-agency-related controls, we find that having contact
with interviewers every day increases the chances of gaining respondents’ co-operation. This
result may highlight the importance of communication between survey agency co-ordinators
and interviewers, but may also point to other factors at the survey agency level that affect re-
spondents’ co-operation (such as the relative importance that the survey agency attaches to
SHARE).
The structure of the paper is as follows. Section 2 reviews the literature and Section 3 presents
the features of the available data with a special focus on paradata and the outcome variable.
Section 4 presents the empirical strategy, Section 5 comments on the empirical results and
Section 6 concludes.
The programs that were used to analyse the data can be obtained from
http://wileyonlinelibrary.com/journal/rss-datasets
2. Previous findings
Panel studies are affected by attrition of subjects, which can bias parameter estimates because
of potential differences between those who stay in the panel and those who drop out. It is by
now standard in the literature to conduct exploratory analyses to understand how to prevent
unit non-response during fieldwork. Literature on the determinants of survey participation has
recently proposed the use of paradata to gain a better understanding of response behaviour
(e.g. Kreuter (2013) and Kreuter et al. (2010)). However, even though paradata represent a
rich source of new information, little attention has been paid for instance to indicators such as
keystroke data (Couper and Kreuter, 2013) as well as additional information at higher levels,
e.g. at the country or survey agency level.
High levels of heterogeneity might be explained by differences in survey characteristics, in
population characteristics or in data collection practices. This was highlighted by Blom (2012)
who examined country differences in contact rates in the European Social Survey—a survey that
is similar to SHARE in its attempt to achieve ex ante harmonization across several European
countries, but different from SHARE since it lacks the longitudinal dimension. By conducting
counterfactual analysis, Blom attributed the differences in contact rates to differential survey
characteristics (mostly related to interviewers’ contact strategies), population characteristics and
coefficients. Like Blom (2012), we investigate the drivers of variability at the country level, but
we are interested in panel co-operation—rather than contact—and use multilevel analysis as our
empirical strategy. Most studies using cross-national data refrain from investigating the country
level because of a small number of countries or the unavailability of harmonized information at
this level. An exception is Lipps and Benson (2005), who analysed contact strategies in the first
wave of SHARE by using a multilevel model also taking into account the country level but did
not find significant between-country differences. However, the response process in later waves of
a panel might differ from the response process in the baseline wave because of survey agencies’
accrued organizational experience or respondents’ self-selection into later waves (Lepkowski
and Couper, 2002). An advantage of using the fourth wave of SHARE, as we do, is that we can
exploit additional harmonized information collected at the survey agency level to understand
better whether different fieldwork practices can explain heterogeneity in panel co-operation at
the survey agency level, given a common survey topic. In SHARE, the countries and survey
agencies mostly overlap; however, since in two countries (Belgium and France) more than one
survey agency collected the data, we shall use the term survey agency, instead of country, for
the third (highest) level.
Taking the role of the interviewer into account is vital for attrition analyses in face-to-face
surveys. In the literature, results regarding interviewer continuity across waves are mixed. For
example, Hill and Willis (2001) found a strong and significant positive association between response
rate and interviewer continuity, Lynn et al. (2014) found that continuity positively affects co-
operation in some situations, whereas other studies (Campanelli and O’Muircheartaigh, 1999;
Nicoletti and Peracchi, 2005; Pickery et al., 2001) have found insignificant effects. These findings
have been questioned as not only respondents attrit, but interviewers might attrit non-randomly
from surveys as well (Campanelli and O’Muircheartaigh, 2002). In the multicountry setting of
SHARE, the selection and assignment of interviewers is subject to supervision by the sur-
vey agencies. Although survey guidelines recommend interviewer continuity, we cannot link
interviewers across waves. On the basis of Vassallo et al. (2015), we decided to focus on the
current (wave 4) interviewer. (Whereas Pickery et al. (2001) stated that the previous interviewer
is more relevant, a more recent study (Vassallo et al., 2015) showed that taking into account
both previous and current wave interviewer within a multiple-membership model does not
improve on the simpler two-level model that controls only for the current wave interviewer
random effect.)
The literature has highlighted that isolating interviewer effects from area effects might be
problematic when there is no fully interpenetrated design, i.e. random assignment of sample units
to interviewers (Campanelli and O’Muircheartaigh, 1999; Durrant et al., 2010; Vassallo et al.,
2015). The lack of interpenetration is likely in face-to-face surveys, such as SHARE, in which the
interviewer generally operates in limited geographical areas. Therefore, if there are geographical
patterns in co-operation, these could appear as interviewer effects. It should be noted that
Vassallo et al. (2015) did not find significant area effects, after controlling for interviewer and
household level effects in a cross-classified model, which is in line with findings by Campanelli
and O’Muircheartaigh (1999) and Durrant et al. (2010). Given the lack of interpenetrated
assignment in SHARE, following standard practice, we include among our controls some area
indicators (living in an urban or rural area) to capture area effects. Unfortunately, more detailed
information about the area where respondents live is not available in waves 3 and 4. (Additional
area characteristics have been collected in wave 5 for all respondents, but in wave 6 only for the
refreshment sample.)
In our analysis, we consider interviewer attributes such as age, gender and experience with the
survey that were collected by the agencies and interviewer work quality indicators (interviewer
average number of contacts and rounding indicators) that we compute. (We construct indicators
of rounding behaviour in measurements following Korbmacher and Schröder (2013).) In fact,
interviewer sociodemographic characteristics and experience (overall or within a specific survey)
are typically included when explaining interviewer level variance (West and Blom, 2017). The
literature has also documented that interviewers with higher contact rates achieve higher co-
operation rates (O’Muircheartaigh and Campanelli, 1999; Pickery and Loosveldt, 2002; Blom
et al., 2011; Durrant and D’Arrigo, 2014). On the basis of the literature on ‘satisficing be-
haviour’ in surveys (Krosnick, 1991), the underlying hypothesis concerning interviewer effects
is that those who are diligent in specific tasks during the interview are more engaged and more
successful in gaining co-operation than are interviewers who show less diligent interviewing be-
haviour. Diligent interviewers are those who fulfil their task thoroughly to optimize the quality
of their interviews, whereas less diligent interviewers use ‘satisficing strategies’, such as skipping
introductions or rounding measurements, to minimize effort.
Lugtig (2014) highlighted four mechanisms of attrition at the respondent level, namely shocks
(e.g. moving or health decline), habit (consistent participation pattern), absence of commitment
and panel fatigue. Paradata can especially help in capturing commitment and panel fatigue
to single out respondents who are at risk of future attrition due to non-co-operation. This
can be based on interviewer assessments, e.g. willingness to answer or whether the respondent
asked for clarification, or directly derived from the interview data, e.g. item non-response. The
latter in particular is a good predictor of participation in later waves. According to the theory
of a latent co-operation continuum (Burton et al., 1999), in fact, item non-response—not provid-
ing valid answers to some questions—is a precursor of unit non-response—not providing any
answers—in the following wave. This theory finds empirical support in Loosveldt et al. (2002).
The length of interview also contributes to shaping the past interview experience. In
longitudinal surveys, the length of interview in an earlier wave might affect the decision to
participate in later waves. On the one hand, a longer interview can be seen as a burden and
affect co-operation negatively; on the other hand, the length might also measure the respon-
dent’s motivation and commitment to the survey and therefore can have a positive influence on
co-operation. Findings in the literature concerning effects of interview length on panel attri-
tion in interviewer-administered settings are mixed, with some showing a positive association
(Fricker et al., 2012; Hill and Willis, 2001) with co-operation and some not finding any effect
(Lynn, 2013; Sharp and Frankel, 1983). Branden et al. (1995) disentangled the wave-specific
influence of interview length by taking the longitudinal perspective into account. They found
that long interviews are positively correlated with co-operation during the first waves of a panel,
but the association vanishes in later waves.
3. Data
3.1. Survey of Health, Ageing and Retirement in Europe and sample selection
SHARE is a multidisciplinary harmonized European survey, targeting individuals aged over
50 years and their partners, and represents the principal source of data to describe and in-
vestigate the causes and consequences of the aging process for the European population (see
Börsch-Supan et al. (2013)). SHARE was conducted for the first time in 2004–2005 (wave
1) in 11 European countries (Austria, Belgium, Denmark, France, Germany, Greece, Italy,
the Netherlands, Spain, Sweden and Switzerland) and Israel. In the second wave Poland, the
Czech Republic and Ireland joined SHARE and additional refreshment samples were added to
ensure representativeness of the targeted population. Wave 3, called ‘SHARELIFE’, which was
conducted between 2008 and 2009, differed from the standard waves, since it collected the life
histories of individuals who participated in wave 1 or wave 2. The fourth wave of SHARE, which
started in 2011, is a regular wave (see Malter and Börsch-Supan (2013)).
The regular wave main questionnaire is composed of about 20 modules, each focusing on a
specific topic, e.g. demographics, mental and physical health, cognitive functions, employment
and pensions. The questionnaire of SHARELIFE differed from the standard waves, since it had
very few questions on the current condition (the variables related to the current condition are
household income, health status, economic status and current income from employment, self-
employment and pensions) but focused on gathering information regarding the life histories of
individuals who participated in wave 1 or wave 2 (Schröder, 2011). We exploit mainly the third
and the fourth wave of SHARE by investigating co-operation in wave 4 given participation in
SHARELIFE and given contact in wave 4. The two waves are not completely comparable given
the rather special content of the third wave, but the choice was driven mainly by the availability
of paradata. The particular sample definition that we refer to implies that we must be cautious
when extending our results.
Both standard and retrospective SHARE interviews were conducted via face-to-face, com-
puter-assisted personal interviews (CAPIs). Not every eligible household member was asked to
answer every module of the standard CAPI questionnaire: selected household members served
as family, financial or household respondents. These individuals answered questions about chil-
dren and social support, financial issues or household features on behalf of the couple or the
household. This means that the length of the questionnaire varied between respondents by de-
sign, which must be taken into account when analysing participation. An advantage of using
SHARELIFE is that the differences between the types of respondents are limited since there is
a distinction only between the first and second respondent on the basis of very few questions
on the household’s current economic situation (e.g. household income). In all SHARE waves
there is also the possibility of conducting a shorter proxy interview for cognitively impaired
respondents. A proxy can answer on behalf of the eligible individual for most of the modules.
We describe our sample definition more precisely in Table 1. The number of individuals who
were interviewed in SHARELIFE is 20106. We do not consider Greece and Ireland as these
countries did not participate in wave 4. We also excluded France as interviewer information was
unavailable and Poland because of a lack of survey agency practices information. We deleted
Table 1. Sample definition

Number of observations released in SHARELIFE†        20106

Sample restrictions
  Not part of assigned wave 4 sample                   144
  Household not contacted in wave 4                    522
  Deceased in wave 4                                   599
Linkage restrictions
  Non-linked with interviewer information              994
Incomplete-data restrictions
  Missing data at interviewer level                     15
  Missing data at respondent level                     887

Final number of respondents                          16945
Final number of interviewers                           643
Final number of survey agencies                         11

†Without Greece, Ireland, France and Poland.
144 cases that were not part of the assigned, longitudinal sample for fieldwork wave 4, e.g.
because of legal restrictions or changes in eligibility. We do not consider individuals from the
longitudinal sample whose households were not contacted in wave 4 (522 cases) given that our
focus is on co-operation, and we excluded individuals who died between waves (599 cases).
When linking the various sources of data, interviewer information was not linkable for 5.3% of
the total sample (994 cases). The proportion of non-linked observations exceeds 10% in Austria
and Sweden, but some unresolvable cases remained in all countries. (The sample of non-linked
observations presents higher proportions of singles and women—and a higher average (but
identical median) respondent age. Given the high prevalence of such observations in Austria
and Sweden, we checked that dropping either country from the estimation sample does not affect
parameter estimates in a significant way.) Furthermore, we do not have complete information
on interviewers in wave 4 for 15 cases; wave 3 missing data concern 887 individuals, distributed
among all the countries. (Missing information is especially related to questions of the interviewer
module regarding the area and type of building. In this module interviewers answer a few
questions about the interview situation without the respondent present.)
3.2. Collection and preparation of paradata in the Survey of Health, Ageing and
Retirement in Europe
The collection of paradata is greatly facilitated by computer-assisted sample management tools
and interview instruments. In this section we describe the sources of data in SHARE
and the preparation of the variables that we derive from them.
For sample management SHARE uses a tailor-made sample management system. This pro-
gram is installed on each interviewer’s laptop and enables the interviewers to manage their as-
signed subsample. The success of a cross-national study such as SHARE depends heavily on the
way in which the data are collected in the various countries. Therefore, using a harmonized tool
for collecting interview data as well as contact data is crucial to ensure the comparability of the
results. The sample management system tool enables interviewers to register every contact with
a household or individual respondent and to enter result codes for every contact attempt (e.g. no
contact, contact—try again, or refusal). These data were also used by Lipps and Benson (2005) to
analyse contact strategies in the first wave of SHARE. Among the information that was collected
through the sample management system tool, we use the average number of contacts that inter-
viewers registered before obtaining household co-operation or the final refusal. Furthermore,
the sample definition is partly constructed on the basis of contact information (see Table 1).
While the interview is conducted, additional paradata are collected by tracking keystroke
data. Here, every time a key is pressed on the keyboard of the laptop, this is registered and
stored by the software in a text file. From these text files, time stamps at the item level can be
computed. Additionally, the keystrokes record the number of times that an item was accessed,
back-ups, whether a remark was made and the remark itself. We compute the interview length
of wave 3 based on those files. In contrast with commonly used time stamps at the beginning and
the end of the whole interview, this approach provides a precise and adequate length measure
that is net of longer interruptions of the interview. To control for the potential effect of the length
of the interview on co-operation propensity, we include it and its square term to account for
possible non-linear effects as well. Controlling for the length of the interview helps to take into
account the fact that SHARE interviews vary by design because of the complex structure of the
questionnaire. Additionally, we use keystroke information to construct a variable for interviewer
quality that is used in the robustness section. We first compute the median reading time, by
interviewer, for section introductions that are relatively long, such as social network, activities,
financial transfers and income from work and pensions. If this value is lower than the country
(and language) 25th percentile in at least one case, then we define a ‘short introduction’ dummy
variable. This variable should capture interviewers who are likely to skip section introductions.
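The ‘short introduction’ flag described above can be sketched as follows. The reading times and column names are hypothetical, and the real computation also conditions on language:

```python
import pandas as pd

# Hypothetical keystroke-derived reading times (seconds) for two section
# introductions, four interviewers in two countries
times = pd.DataFrame({
    "country":     ["A"] * 4 + ["B"] * 4,
    "interviewer": [1, 1, 2, 2, 3, 3, 4, 4],
    "section":     ["social_network", "activities"] * 4,
    "read_secs":   [12.0, 15.0, 2.0, 3.0, 10.0, 11.0, 9.0, 14.0],
})

# Median reading time per interviewer for each section introduction
med = (times.groupby(["country", "interviewer", "section"])["read_secs"]
            .median()
            .reset_index())

# Country-level 25th percentile of those medians, per section
med["p25"] = med.groupby(["country", "section"])["read_secs"] \
                .transform(lambda s: s.quantile(0.25))

# Dummy: interviewer's median falls below the cut-off in at least one case
short_intro = (med.assign(short=med["read_secs"] < med["p25"])
                  .groupby("interviewer")["short"]
                  .any())
print(short_intro.to_dict())
```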
Furthermore, as paradata at the respondent level, we include information that is derived
from the CAPI interviews in wave 3, in particular the percentage of item non-response to mon-
etary items. The questions that were considered to construct this variable are household income
(HH017), value of the property (AC019), first monthly wage for employed individuals (RE021)
or first monthly work income (RE023) for self-employed individuals, current wage if the re-
spondent is still in employment (RE027), current income if the respondent is still self-employed
(RE029), pension benefit (RE036), wage at the end of the main job if retired (RE041) and income
at the end of the main job if retired and worked as self-employed (RE043). Such questions on
monetary values can be both sensitive and difficult (Loosveldt et al., 2002; Moore et al., 2000).
The respondent might perceive them as burdensome or uncomfortable to answer. Previous em-
pirical research showed that the item non-response to income questions can predict participation
(Loosveldt et al., 2002; Nicoletti and Peracchi, 2005). The public release of SHARE also contains
a section in which interviewers are asked to evaluate the reluctance of respondents (interviewer
module). Related to this, we include dummy variables indicating whether the interviewer reported
a high level of willingness to answer and whether the respondent asked for clarification. Furthermore,
information on the area (urban versus rural) is derived from the interviewer module.
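The respondent-level item non-response share can be sketched as below. The values are invented, NaN stands in for a ‘don’t know’ or refusal, and routing (items a respondent was never asked) is ignored for simplicity:

```python
import numpy as np
import pandas as pd

# Hypothetical wave 3 answers to three of the monetary items listed above;
# np.nan marks an item the respondent did not answer
w3 = pd.DataFrame({
    "HH017": [30000.0, np.nan, 25000.0],   # household income
    "AC019": [np.nan,  np.nan, 150000.0],  # value of the property
    "RE021": [1200.0,  np.nan, np.nan],    # first monthly wage
})

# Proportion of monetary items left unanswered, per respondent
inr_share = w3.isna().mean(axis=1)
print(inr_share.round(2).tolist())  # [0.33, 1.0, 0.33]
```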
Additionally, interviewer information and survey agency fieldwork strategies were gathered
and delivered by the survey agencies for wave 4. The interviewer information includes demo-
graphics (year of birth, education and gender) and previous experience in conducting SHARE
interviews (a dummy that takes value 1 if the interviewer has already participated in at least one
previous wave of SHARE). Interviewers’ level of education is not available for all countries. For
those survey agencies that provided this information, we apply the 1997 ‘International standard
classification of education’ (ISCED) to harmonize the country-specific answers. (We exploit
this information to run robustness analysis with the subsample of agencies that provided the
education information.)
Among interviewer controls, we also add a measure of work quality, following Korbmacher
and Schröder (2013). We try to capture interviewers’ quality on the basis of the grip strength
[Fig. 1. Frequency of grip strength values. Histogram omitted; axes: grip strength value (0–70) against frequency (0–2500).]
test that SHARE proposes in every wave. The test consists of measuring respondents’ grip
strength twice for each hand by using a dynamometer. In the CAPI, interviewers are explic-
itly told to record a value between 0 and 100, without rounding numbers to multiples of 5
and 10. ‘Previous waves showed that multiples of 5 and 10 were recorded more than statis-
tically expected’ (Korbmacher and Schröder, 2013); in Fig. 1 we report the wave 4 pattern
of grip strength measurement. If an interviewer’s percentage of multiples of 5 and 10 lies
outside the 90% confidence interval centred on the statistically expected value of 20.8%,
then that interviewer is not measuring grip strength properly. We identify interviewers who
round too often by defining a dummy that takes value 1 if the percentage exceeds the upper
bound of the confidence interval and 0 otherwise. We also generate another dummy variable
for those interviewers who do not report enough multiples of 5 and 10 (the percentage
falls short of the lower bound), as they may be strategically concealing inaccurate
measurements.
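The two rounding dummies can be sketched as follows, using a normal-approximation 90% interval around the expected share (21 of the 101 integers 0–100 are multiples of 5, i.e. about 20.8%; multiples of 10 are a subset). The exact interval construction is an assumption:

```python
import numpy as np

def rounding_flags(measurements, p0=0.208, z=1.645):
    """Return (too_many, too_few) rounding flags for one interviewer.

    Flags the interviewer if the share of grip strength values that are
    multiples of 5 falls outside a 90% normal-approximation interval
    centred on the expected share p0."""
    m = np.asarray(measurements)
    share = np.mean(m % 5 == 0)
    half = z * np.sqrt(p0 * (1.0 - p0) / m.size)
    return bool(share > p0 + half), bool(share < p0 - half)

# Hypothetical heavy rounder: 30 of 50 recorded values are multiples of 5
heavy = [5, 10, 15] * 10 + [23, 41, 37, 52] * 5
too_many, too_few = rounding_flags(heavy)
print(too_many, too_few)  # True False
```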
Additional information is gathered at the survey agency level about fieldwork strategies.
Topics that are covered are recruitment, training, contacting respondents, translation, technical
support, interview content, sampling process, management of interviewers and duration of
fieldwork. (Unfortunately information on interviewers’ pay is not available in wave 4 of SHARE.)
Those data are collected mostly by means of open-ended questions, but some questions have a
drop-down list. Open questions are difficult to handle within a multicountry framework. For this
reason we focus on questions with standard answering options that show some variability. We
consider especially the following questions: ‘Who decides which project is prioritized, assuming
that interviewers work on several projects simultaneously?’ with ‘interviewer, agency or both’ as
possible answers and ‘How often are you in contact with your interviewers about the SHARE
study?’ with the following answering options: ‘less than once a month, once a month, several
times a month, once a week, several times a week or every day’. We define two variables:
Table 2. Descriptive statistics of the variables at the respondent level (N = 16945)†

Variable                                  Mean    SD      Min    Max    Description
Co-operation                              0.84    0.37    0      1      Co-operation in wave 4 (outcome)
Female                                    0.56    0.50    0      1      Gender (reference: male)
Age                                       66.75   9.60    34     100    Age of respondent in years
Being in poor health                      0.38    0.48    0      1      Self-reported poor health
Any proxy                                 0.06    0.24    0      1      A proxy helped in answering the questionnaire
Single                                    0.23    0.42    0      1      Marital status
Years of education                        10.74   4.47    0      25
Household income—1st quartile             0.32    0.47    0      1      Household income, 1st quartile by country
Household income—2nd quartile             0.24    0.42    0      1      Household income, 2nd quartile by country
Household income—3rd quartile             0.27    0.44    0      1      Household income, 3rd quartile by country
Working                                   0.30    0.46    0      1      If R declares to be employed or self-employed
Living in an urban area                   0.23    0.42    0      1      Small town or rural area (reference: urban)
Living in a (semi-)detached house         0.70    0.46    0      1      Living in a (semi-)detached house (reference: flat)
Interrupted response pattern              0.06    0.24    0      1      Interviewed in wave 1 and wave 3 but not in wave 2
Item non-response to monetary questions   0.20    0.28    0      1      Proportion of item non-response to monetary items in wave 3
Length of interview                       0.90    0.37    0.26   2.54   Length of interview in wave 3 (in hours)
Willingness to answer                     0.93    0.26    0      1      Willingness to answer in wave 3
Did not ask for clarification             0.84    0.37    0      1      Did not ask for clarification in wave 3

†Data: SHARELIFE release 6.0.0, SHARE wave 4 release 6.0.0 and SHARE paradata waves 3 and 4.
priority agency, which takes value 1 if the survey agency decides the priority of projects (four
out of 11 survey agencies do), and daily contact, which equals 1 if the survey agency has daily
contact with the interviewers (two out of 11 survey agencies do). We cannot distinguish the
direction of this communication: whether agencies check on interviewers frequently, or
interviewers contact the agency regularly (with questions or for reporting). An overview of all the variables that were used for the
analysis with descriptive statistics can be found in Tables 2–4.
3.3. Attrition and co-operation in Survey of Health, Ageing and Retirement in Europe
wave 4
After describing the features of SHARE and the paradata that were used, we present in greater
detail the response behaviour patterns in wave 4 for those who participated in SHARELIFE:
the sample in which we are interested. Our sample of analysis differs slightly from the panel
sample because we do not consider those who were interviewed in wave 1 or 2 who did not
participate in wave 3 (SHARELIFE).
The standard distinction in the survey participation process is in terms of location, contact
and co-operation (Lepkowski and Couper, 2002):
(a) location of the sample unit means finding geographically eligible individuals at a given
address,
(b) contact means reaching an eligible sample unit by telephone or face-to-face visits and
(c) co-operation is the completion of the interview.

Given that step (a) is usually less problematic in a panel (Lepkowski and Couper, 2002) and we
cannot test it, the final response rate will be the product of the contact and co-operation rates,
at least in simplified terms.

Table 3. Descriptive statistics of the variables at the interviewer level (N = 643)†

Variable                               Mean    SD      Min.    Max.    Description
Age                                    55.07   11.52   19      79      Age of interviewer in years
Female                                 0.63    0.48    0       1       Gender (reference: male)
Experience                             0.68    0.47    0       1       Interviewer's experience with
                                                                       working on previous SHARE waves
Contacts                               2.41    0.73    0.20    7.11    Interviewer-specific mean of
                                                                       contacts with household until
                                                                       co-operation or refusal
Rounding to a multiple of 5 for grip   0.35    0.48    0       1       If the interviewer's percentage of
  strength measure (too many)                                          rounding is below or above
                                                                       respectively the lower or upper
                                                                       cut-off of the 90% confidence
                                                                       interval centred near the
                                                                       statistically expected value of
                                                                       20.8%
Rounding to a multiple of 5 for grip   0.03    0.16    0       1
  strength measure (too few)
Short introductions                    0.52    0.50    0       1       Interviewer has at least 1 short
                                                                       introduction (i.e. time recorded
                                                                       lower than a country-specific
                                                                       median)
Interviewer education (ISCED 5–6)      0.37    0.48    0       1       Interviewer has tertiary education
                                                                       (restricted sample)

†Data: SHARE wave 4 release 6.0.0 and SHARE interviewer information wave 4.

Table 4. Descriptive statistics of the variables at the agency level (N = 11)†

Variable          Mean    SD      Min.    Max.    Description
Priority agency   0.36    0.48    0       1       Agency decides the priority of projects
Daily contact     0.18    0.38    0       1       Agency monitors and has contact with the
                                                  interviewers daily

†Data: SHARE agency information wave 4.
Kneip (2013) reported household contact rates for the panel sample of SHARE wave 4
that are consistently above 90% with an average of about 95% across all countries, whereas
household co-operation, which varies between about 60% and about 90%, shows greater
variation across countries. Hence, the retention rates, which combine contact and co-operation,
vary between 56% and about 90%. (All the rates that were calculated by Kneip (2013) are
constructed according to American Association for Public Opinion Research standards.)

This highlights that establishing contact was not an issue in the panel sample for most
countries, and non-contact seems to be a very limited phenomenon compared with other surveys,
such as the European Community Household Panel, for which Nicoletti and Peracchi (2005)
analysed participation, modelling contact and co-operation as sequential events. In our case,
the very limited number of individuals in non-contacted households (2.6%) leads us to ignore
the contact phase and to focus exclusively on co-operation instead.

[Fig. 2. Proportions of co-operation in wave 4 by survey agency, with 95% confidence intervals:
AT, Austria; BE–Fr, Belgium–Wallonia; BE–Fl, Belgium–Flanders; CH, Switzerland; CZ, Czech
Republic; DE, Germany; DK, Denmark; ES, Spain; IT, Italy; NL, the Netherlands; SE, Sweden]

Fig. 2 presents the percentage of contacted individuals who co-operated in wave 4. It highlights
some heterogeneity among survey agencies, with rather high co-operation rates (85% or more) in
Switzerland and Italy, and lower co-operation rates, below 80%, in the Czech Republic, Germany
and Sweden. (These numbers are our own calculations based on our sample restrictions. For the
official rates, refer to Kneip (2013).)
4. Empirical strategy
We estimate a multilevel logit model to investigate correlates of subject co-operation while
accounting for correlations in probabilities between respondents. (We estimate the multilevel
logit model with the Stata command melogit, using mode-curvature adaptive Gauss–Hermite
quadrature integration. The estimation results are stable when the number of integration points
is increased.) This estimation strategy specifies the hierarchical structure of the data and
enables us to avoid underestimation of standard errors and therefore incorrect inference (Couper
and Kreuter, 2013; Goldstein, 2011). Given that we are interested in understanding how
different levels contribute to explaining co-operation, we start by estimating a random-intercept
model (null model). We then enrich this baseline model specification by stepwise inclusion of
covariates at the individual, interviewer and survey agency level. This bottom-up procedure has
the advantage of keeping the model simple (Hox, 2010). Our outcome of interest is co-operation,
denoted by y_ijk, which takes the value 1 if respondent i, interviewed by interviewer j of survey
agency k, participates in wave 4 conditionally on having participated in wave 3.
The null model can be specified as

    logit(p_ijk | β0, u_jk, v_k) = β0 + u_jk + v_k,                              (4.1)

and the values of y, conditional on the random components, are independent draws from a
Bernoulli random variable with probabilities p_ijk, i.e. y_ijk | u_jk, v_k ~ Bernoulli(p_ijk).
In equation (4.1) the two random terms u_jk and v_k are interviewer-specific and survey-agency-
specific random effects, with u_jk ~ N(0, σ²_u) and v_k ~ N(0, σ²_v) respectively (Skrondal and
Rabe-Hesketh, 2004). In a logit model the error variance at the first level, σ²_e, is fixed to π²/3,
to fix the scale (Rabe-Hesketh and Skrondal, 2005). Thus, in the multilevel extension no level 1
variance will be estimated.
We then compare models 1–4, in which the covariates at the three different levels are introduced
in a stepwise procedure, with the null model, to understand the role of each group of variables
in reducing heterogeneity at the different levels.
The first model specification (model 1) includes a set of controls for individual level socio-
demographic characteristics xijk . Among these variables we include SHARELIFE information
on demographics, such as age, gender, years of education (and its square), marital status, em-
ployment status, health status (including a control for proxy interview), controls for household
income (dummy variables for the top three equivalent household income quartiles), a dummy
taking value 1 if the respondent lives in a detached or semi-detached house to control for the
type of residential building, and a binary indicator for living in an urban or rural area to capture
area effects. Although additional area-related controls would be desirable in the absence of in-
terpenetrated assignment, further information about the area where respondents live is available
only from wave 5 onwards.
In the second model specification (model 2) we add a set of paradata indicators (z_ijk) at
the individual level. We include a dummy variable controlling for interrupted participation in
previous waves, in particular whether the individual was interviewed only in wave 1 (but not
in wave 2). To account for the influence of previous interview duration, the wave 3 interview
length in hours and its square are added. At this stage we also include the percentage of item
non-response to monetary questions, the willingness to answer and whether the respondent
asked for clarification.
In the third model specification (model 3) we include controls at the interviewer level, s_jk,
specifically interviewer age and gender, interviewer experience and the average number of con-
tacts per household registered by the interviewer (before the interview or the final refusal). We
also include interviewer quality indicators: a dummy that identifies the interviewers who round
least and another dummy for the interviewers who round most on grip strength measurement.
Finally, model 4 controls for a survey agency level covariate, tk, indicating daily communication
between the interviewers and the survey agency.
The complete model, model 4, is specified as

    logit(p_ijk | β, u_jk, v_k) = β0 + β1′x_ijk + β2′z_ijk + β3′s_jk + β4 t_k + u_jk + v_k,    (4.2)

where the x_ijk- and z_ijk-vectors are individual level sociodemographic and paradata controls,
s_jk is a vector of interviewers' covariates and t_k is a survey agency control.
As already pointed out, in the logistic model the variance of the lowest level residuals is
fixed at a constant. The main consequence is that in each of the models the underlying scale
is standardized to the same standard logistic distribution, meaning that the residual variance
cannot decrease when adding controls to the model. Moreover, the value of the regression
coefficients that are associated with the controls included and the value of the higher level
variances are rescaled. As a consequence, it is not possible to compare the null model parameters
with those of the subsequent enriched model specifications or to investigate how variance
components change.
Hox (2010) extended the rescaling procedure of Fielding (2004) to the multilevel setting and
suggested the construction of scaling factors to be applied to parameters of the fixed part and
random effects to make the changes in these variables directly interpretable. In the case of a
multilevel logistic regression model, the scale correction factor is given by √(σ²_0/σ²_m) for the
parameters of the fixed part and by σ²_0/σ²_m for variance components. The numerator is the
total variance of the null model, σ²_0 = σ²_e + σ²_u + σ²_v, and the denominator is the total
variance of model m (m = 1, …, 4) including the first-level predictor variables,
σ²_m = σ²_F + σ²_e + σ²_u + σ²_v = σ²_F + σ²_0, with σ²_F the variance of the linear predictor
of model m obtained by using the coefficients of the predictors of the fixed part of the equation.
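As a numerical check on this rescaling, the factor implied by model 1 in Table 7 (where the fixed individual-level variance 3.29 is scaled to 3.226) should map the unscaled interviewer- and agency-level variances into their scaled counterparts. A minimal sketch:

```python
def scaled_components(sigma2_e, sigma2_u, sigma2_v, scale):
    """Rescale all variance components by the factor sigma2_0 / sigma2_m."""
    return (sigma2_e * scale, sigma2_u * scale, sigma2_v * scale)

# Model 1 in Table 7: the individual-level variance 3.29 is scaled to 3.226,
# so the implied factor sigma2_0 / sigma2_m is:
scale = 3.226 / 3.29
e, u, v = scaled_components(3.29, 1.184, 0.093, scale)
print(round(u, 3), round(v, 3))  # 1.161 0.091, as reported in Table 7
```

The fixed-part coefficients would be rescaled by `scale ** 0.5` under the same logic.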
One important issue when dealing with multilevel models is to assess the accuracy of model
parameter estimates, which is influenced both by the number of observations within groups and
by the number of groups. Given our model formulation, the former is not a relevant issue at the
third level but it could be at the second level: for some interviewers the number of interviews
is particularly low. We address this issue in Section 5.3 by restricting our analysis only to in-
terviewers with at least six interviews. Regarding the number of groups, the second level has a
sufficiently high number of interviewers to ensure accuracy of parameter estimates. However, we
might have inaccurate results due to the low number of survey agencies (our third-level units).
We address this problem with a simulation study to understand the finite sample behaviour of
estimates from a three-level logit model when the hierarchical structure of the data is similar to
the structure of our sample of analysis. Results and discussion are presented in Appendix A.
5. Results
5.1. Predictors from multilevel analysis
We report in Table 5 the estimated coefficients for the stepwise model specifications, in which we
add respondent, interviewer and survey agency controls. The effects for each set of variables are
described in the following subsection. We comment mainly on our preferred model specification,
i.e. the complete-model specification reported in the last column.
As in Durrant and Steele (2009), we comment on our results while referring to some socio-
psychological concepts and theories that have been proposed in the literature, bearing in mind
that there is an imperfect match between theoretical constructs and variables used.
Table 5 shows that the respondent characteristics are highly predictive of co-operation in wave
4. Both gender and age influence co-operation in wave 4. According to our estimates, age has
a non-linear effect on the probability of co-operation. Both regressors age and age squared are
statistically significant: up to about 68 years of age there is a positive association after which it
becomes negative—this is controlling for health conditions. Previous research found lower rates
of participation among the elderly and interpreted this result as support of the social isolation
theory (Krause, 1993). Individuals might decide to underuse their social support network
because they are embarrassed or stigmatized, or they may reject aid from others because they
feel uncomfortable when assistance is provided. Isolation might translate also into a lack of
survey participation behaviour and may explain the negative age effect that we find for older
respondents.
If the respondent reported being in poor health in wave 3, this has a negative and statistically
significant effect on the probability of co-operation. This is not surprising but at the same
Table 5. Estimated multilevel models including respondent, interviewer and agency characteristics (dependent variable: co-operation)†

                                          Model 0       Model 1       Model 2       Model 3        Model 4
                                          (intercept    (respondent)  (respondent   (interviewer)  (agency)
                                          only)                       paradata)

Respondent characteristics
Female                                                  0.088‡        0.099§        0.099§         0.100§
                                                        (0.048)       (0.049)       (0.049)        (0.049)
Age                                                     0.183§§       0.170§§       0.171§§        0.171§§
                                                        (0.029)       (0.029)       (0.029)        (0.029)
Age squared                                             −0.001§§      −0.001§§      −0.001§§       −0.001§§
                                                        (0.000)       (0.000)       (0.000)        (0.000)
Being in poor health                                    −0.218§§      −0.195§§      −0.195§§       −0.194§§
                                                        (0.050)       (0.051)       (0.051)        (0.051)
Single                                                  0.182§§       0.212§§       0.212§§        0.211§§
                                                        (0.068)       (0.069)       (0.069)        (0.069)
Any proxy                                               −0.316§§      −0.193§       −0.196§        −0.194§
                                                        (0.095)       (0.097)       (0.097)        (0.097)
Years of education                                      0.047§        0.039‡        0.040‡         0.045§
                                                        (0.021)       (0.021)       (0.021)        (0.021)
Years of education squared                              −0.002§§      −0.002§§      −0.002§§       −0.003§§
                                                        (0.001)       (0.001)       (0.001)        (0.001)
Household income—1st quartile                           0.164§        0.236§§       0.236§§        0.234§§
                                                        (0.077)       (0.078)       (0.078)        (0.078)
Household income—2nd quartile                           0.403§§       0.383§§       0.378§§        0.376§§
                                                        (0.077)       (0.078)       (0.078)        (0.078)
Household income—3rd quartile                           0.146‡        0.157§        0.158§         0.153§
                                                        (0.077)       (0.078)       (0.078)        (0.078)
Living in a (semi-)detached house                       0.283§§       0.300§§       0.296§§        0.301§§
                                                        (0.057)       (0.058)       (0.058)        (0.058)
Working                                                 −0.105        −0.066        −0.065         −0.065
                                                        (0.067)       (0.067)       (0.067)        (0.067)
Living in an urban area                                 0.021         0.020         0.047          0.042
                                                        (0.073)       (0.074)       (0.073)        (0.073)

Paradata at the respondent level
Interrupted response pattern                                          −0.991§§      −0.973§§       −0.977§§
  (interviewed in wave 1 but                                          (0.083)       (0.083)        (0.083)
  not in wave 2)
Item non-response to monetary                                         −0.521§§      −0.527§§       −0.527§§
  questions                                                           (0.089)       (0.089)        (0.088)
Length of interview (h)                                               1.170§§       1.149§§        1.147§§
                                                                      (0.274)       (0.273)        (0.272)
Length of interview squared (h)                                       −0.368§§      −0.361§§       −0.362§§
                                                                      (0.114)       (0.114)        (0.114)
Willingness to answer                                                 0.444§§       0.441§§        0.451§§
                                                                      (0.090)       (0.090)        (0.090)
Did not ask for clarification                                         0.257§§       0.261§§        0.264§§
                                                                      (0.068)       (0.068)        (0.068)

Interviewer characteristics (wave 4)
Age                                                                                 −0.007         −0.004
                                                                                    (0.005)        (0.005)
Female                                                                              0.072          0.060
                                                                                    (0.105)        (0.104)
Experience (previous SHARE waves)                                                   0.627§§        0.642§§
                                                                                    (0.113)        (0.109)
Contacts                                                                            −0.161§        −0.134‡
                                                                                    (0.068)        (0.069)
Rounding to a multiple of 5 for grip                                                −0.216§        −0.238§
  strength measure (too many)                                                       (0.105)        (0.105)
Rounding to a multiple of 5 for grip                                                −0.788§§       −0.768§§
  strength measure (too few)                                                        (0.230)        (0.228)

Agency control variables
Daily contact                                                                                      0.714§§
                                                                                                   (0.153)

Constant                                  1.872§§       −4.651§§      −5.445§§      −5.009§§       −5.388§§
                                          (0.098)       (1.023)       (1.043)       (1.079)        (1.075)
σ²_u (interviewer level)                  1.174         1.184         1.198         1.000          1.006
σ²_v (agency level)                       0.073         0.093         0.086         0.089          0.007
N                                         16 945        16 945        16 945        16 945         16 945

†Standard errors are in parentheses; p-values for fixed effect covariate significance refer to Wald-type tests.
‡p < 0.05. §p < 0.01. §§p < 0.001.
time it is inconvenient for a survey on health and ageing. In the case of very bad health
conditions, SHARE allows proxy interviews: the indicator any proxy highlights a negative
association with co-operation, suggesting again that health is an important determinant of
attrition. We shall investigate later whether the health effect changes with interviewer attributes.
The literature finds that single-person households are less likely to co-operate and explains this
result referring to social isolation theory (Goyder, 1987; Groves and Couper, 1998). According
to this theory alienation or isolation from society are predictors of non-response. We find the
opposite in our analysis of retention: compared with couples, singles who have already co-
operated in past waves are more likely to participate in the next wave.
In the survey research literature, according to the theory of social exchange (Goyder, 1987;
Groves et al., 1992), socio-economic status has a non-linear effect on co-operation: low and
high socio-economic status groups are less likely to co-operate than average. We include four
indicators of socio-economic status: years of education (and its square), household income
quartile dummies, living in a (semi-)detached house as a proxy for wealth, and employment
status. Education might be positively correlated with retention as those with higher education
might appreciate the value of research more (Groves and Couper, 1998). Years of education is
statistically significant and has a non-linear effect on retention. Income quartiles are significant
as well. Compared with individuals having high household income (fourth quartile), wave 3
respondents with lower household income are more likely to participate in wave 4. Also in this
case we find a non-linear effect (the second quartile dummy has the largest estimated coefficient).
Living in a detached or semi-detached house increases the chances of co-operation in wave 4. (As
missing information is especially related to questions of the IV module regarding the area and
type of building, we run our analysis including those observations by adding binary indicators
for missing information. The results, which are available on request, do not change.) This is in
line with previous research that found lower co-operation among people living in flats (Goyder,
1987; Groves and Couper, 1998) and may suggest the presence of a wealth effect on retention.
Socio-economic conditions seem to be relevant for co-operation in a later panel wave.
Compared with individuals in a non-working condition (retired, unemployed, sick or disabled
and homemakers), workers do not have a statistically different probability of co-operating in
the next wave. (We obtained similar results when including additional non-working condition
dummies (retired, unemployed and disabled).) It seems that work-related time constraints do not
matter once individuals have enrolled in the panel, which is different from what has been found
by Durrant and Steele (2009) using cross-sectional data. Time constraints theory considers the
fact that a rather long and detailed questionnaire—that has the advantage of collecting a rich
set of information—requires quite some time to answer all the questions. This might create
problems when respondents are still in employment and must be kept in mind when examining
statistics such as employment rates later in life, for which survey participation or even attrition
could be an issue. Other factors, such as the characteristics of the area where the respondent
lives, might play a role in predicting (continued) co-operation. Living in an urban area in our
case is not significant.
In addition to this standard set of respondent characteristics, we use respondent level para-
data. Compared with continuous participation, individuals with interrupted response patterns
are less likely to participate again. As interrupted participation might signal a subgroup of
respondents who are difficult to retain, we report in Section 5.3 how the effect of such an in-
dicator changes when interacted with interviewer attributes (such as experience with SHARE
fieldwork). We can also observe that a very good or good level of willingness to answer and not
having asked for clarification during the interview in wave 3 are highly significant predictors
of higher probability of co-operation in wave 4. As already explained earlier, we show that the
percentage of missing information in monetary amount questions is a significant predictor of
co-operation failure in wave 4. This result is consistent with the theory of a latent co-operation
continuum (Burton et al., 1999).
As paradata at the respondent level, we also use the length of the whole interview in wave 3.
Both the length of the interview in hours and its square are highly statistically significant,
showing an inverse-u-shaped effect; therefore, interview length has a positive association with
co-operation up to a certain point, roughly 1.6 h, when the probability of co-operating starts
to decrease. Pace is an alternative way of capturing the potential burden that is experienced by
respondents in the previous wave. Here, we define pace as the ratio of length to the number
of items asked and thus accounts for differences in instrument length by respondent type (for
applications see Korbmacher and Schr ¨
oder (2013) and Loosveldt and Beullens (2013)). In the
case of SHARELIFE the number of items asked is similar across respondent types, and this may
explain why the results do not substantially change when we replace length with pace. Using
interview pace rather than length does not change our results (estimates are available on request).
This is in line with previous findings and supports the argument that longer interviews are—at
least up to a certain point—a proxy for pleasant talkative interviews instead of a respondent
burden. We should note that interview length measures the combined interviewer–respondent
interaction and is therefore not exogenous to the interview process (Watson and Wooden, 2009).
To identify the causal effect of interview length one would probably require an experimental
setting, which is out of scope for this paper.
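The turning point of roughly 1.6 h quoted above can be checked directly from the model 4 coefficients in Table 5 (1.147 for length, −0.362 for its square): the quadratic β1·x + β2·x² peaks at −β1/(2β2).

```python
def turning_point(b_linear, b_quadratic):
    """Vertex of the quadratic b_linear * x + b_quadratic * x**2."""
    return -b_linear / (2.0 * b_quadratic)

# Model 4 coefficients for interview length and its square (Table 5)
tp = turning_point(1.147, -0.362)
print(round(tp, 2))  # 1.58, i.e. roughly 1.6 hours
```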
To understand the variation at the interviewer level, we add some sociodemographic con-
trols, age and gender, a variable indicating experience in previous SHARE waves, the average
number of contacts per interviewer and two dummies capturing interviewers’ quality based on
grip strength rounding behaviour. Age and gender do not significantly affect co-operation in
wave 4, whereas experience does play a role; more precisely, having experience with previous
SHARE waves increases the likelihood of retaining respondents in the survey. Results con-
cerning interviewer experience are consistent over different studies, leading to the conclusion
that experience is positively associated with gaining co-operation (West and Blom, 2017). (See
Groves and Couper (1998), Hox and de Leeuw (2002), Jäckle et al. (2013) and Lipps and Pollien
(2011).) However, it is still unclear what drives the effect, i.e. whether this is a selection effect
(bad interviewers quit; Jäckle et al. (2013)) or a learning effect (interviewers improve their skills
in approaching resistance over time; Lemay and Durand (2002)). Durrant et al. (2010) showed
that experience in terms of the skill level acquired matters more than the time spent on the job.
Our results are partly in line with the previous findings by Jäckle et al. (2013) on the effect of
experience, measured in years working for the survey agency. Regarding the average number of
contacts, we see that an interviewer who on average registers many contacts is less likely to gain
co-operation. A high average number of contacts can be an indicator of interviewer quality, i.e.
such interviewers are less persuasive, or it can be seen as a measure of workload complexity,
as interviewers with difficult case-loads end up trying more times.
It can also be noted that the two variables measuring interviewer quality in terms of diligent
interviewing behaviour are significant, with signs as predicted previously. If interviewers rounded
grip strength scores more or less often than expected in wave 3, then gaining co-operation in
wave 4 is less likely than in cases in which the rounding percentage is as expected. This finding
is in accordance
with Korbmacher and Schröder (2013) on consent to record linkage. Whereas rounding too often
is a clear indication of poor compliance with quality standards, rounding too little is probably
due to interviewers strategically avoiding multiples of 5 to prevent being accused of cheating.
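The paper does not spell out exactly how the cut-offs around the expected 20.8% rounding share are constructed. One plausible reading, assuming a normal approximation to the binomial with n grip strength measurements per interviewer, is sketched below; the function name and the example values of n are ours, not the authors'.

```python
import math

EXPECTED_SHARE = 0.208  # statistically expected share of multiples of 5
Z_90 = 1.645            # two-sided 90% normal critical value

def rounding_flags(n_measurements, share_rounded):
    """Classify an interviewer as rounding 'too many', 'too few' or as expected,
    using a normal approximation to the binomial sampling distribution."""
    half_width = Z_90 * math.sqrt(EXPECTED_SHARE * (1 - EXPECTED_SHARE) / n_measurements)
    if share_rounded > EXPECTED_SHARE + half_width:
        return "too many"
    if share_rounded < EXPECTED_SHARE - half_width:
        return "too few"
    return "as expected"

print(rounding_flags(100, 0.45))  # an interviewer with 45% rounded values
print(rounding_flags(100, 0.00))  # an interviewer with no multiples of 5 at all
```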
The final set of covariates in Table 5 is related to harmonized information that is collected
at the survey agency level to gain knowledge on the correlation between survey agency strate-
gies and co-operation. In this model specification we consider the variable daily contact that
captures the frequency of communication between the survey agencies and their interviewers.
We find that having daily contact with interviewers increases the chances of obtaining the co-
operation of respondents. This result hints at the importance of communication between survey
agency co-ordinators and interviewers to conduct surveys successfully. We report in Table 6 a
model specification (the last column) in which both third-level variables (priority agency and
daily contact) are included among the controls, together with two model specifications (the
second and third columns) in which the third-level predictors are instead included one at a time.
(We consider whether the priority of the projects is decided by the survey agency, compared with
situations in which interviewers can totally or partly choose how to organize their work. This
can be seen as a variable capturing the extent to which interviewers are autonomous and free to
choose between several projects on which they are currently working (e.g. working on SHARE
or working on another survey on a specific day).) Priority agency is never significant. Degrees-
of-freedom considerations lead us to be parsimonious in the level 3 specification, and therefore
we decided not to include priority agency in the main specification. (Our simulations show that
parsimony is a key issue in reducing bias in the estimation of the level 3 variance; see Appendix A
for additional details.)
5.2. Variance component analysis
Table 7 reports the results of various specifications of random-intercept models, without and
with covariates, in terms of estimated variance components, intraclass correlations and model
fit statistics. (A similar approach can be found in Blom et al. (2011) about interviewer effects on
non-response in the European Social Survey. Although the approach is similar, we refrain from
Table 6. Estimated multilevel models including alternative sets of agency characteristics (dependent variable: co-operation)†

                                          Results including at the third level:
                                          daily contact   priority agency   Both controls

Respondent characteristics
Female                                    0.100§          0.099§            0.100§
                                          (0.049)         (0.049)           (0.049)
Age                                       0.171§§         0.171§§           0.171§§
                                          (0.029)         (0.029)           (0.029)
Age squared                               −0.001§§        −0.001§§          −0.001§§
                                          (0.000)         (0.000)           (0.000)
Being in poor health                      −0.194§§        −0.195§§          −0.192§§
                                          (0.051)         (0.051)           (0.051)
Single                                    0.211§§         0.211§§           0.210§§
                                          (0.069)         (0.069)           (0.069)
Any proxy                                 −0.194§         −0.195§           −0.192§
                                          (0.097)         (0.097)           (0.097)
Years of education                        0.045§          0.041‡            0.047§
                                          (0.021)         (0.021)           (0.021)
Years of education squared                −0.003§§        −0.002§§          −0.003§§
                                          (0.001)         (0.001)           (0.001)
Household income—1st quartile             0.234§§         0.236§§           0.233§§
                                          (0.078)         (0.078)           (0.078)
Household income—2nd quartile             0.376§§         0.378§§           0.375§§
                                          (0.078)         (0.078)           (0.078)
Household income—3rd quartile             0.153§          0.157§            0.150‡
                                          (0.078)         (0.078)           (0.078)
Living in a (semi-)detached house         0.301§§         0.297§§           0.303§§
                                          (0.058)         (0.058)           (0.058)
Working                                   −0.066          −0.065            −0.065
                                          (0.067)         (0.067)           (0.067)
Living in an urban area                   0.042           0.048             0.044
                                          (0.073)         (0.073)           (0.073)

Paradata at the respondent level
Interrupted response pattern              −0.977§§        −0.972§§          −0.975§
  (interviewed in wave 1 but              (0.083)         (0.083)           (0.083)
  not in wave 2)
Item non-response to monetary             −0.527§§        −0.526§§          −0.527§§
  questions                               (0.088)         (0.089)           (0.088)
Length of interview (h)                   1.147§§         1.152§§           1.151§§
                                          (0.272)         (0.273)           (0.272)
Length of interview squared (h)           −0.362§§        −0.362§§          −0.363§§
                                          (0.114)         (0.114)           (0.114)
Willingness to answer                     0.451§§         0.441§§           0.450§§
                                          (0.090)         (0.090)           (0.090)
Did not ask for clarification             0.264§§         0.260§§           0.264§§
                                          (0.068)         (0.068)           (0.068)

Interviewer characteristics (wave 4)
Age                                       −0.004          −0.007            −0.004
                                          (0.005)         (0.005)           (0.005)
Female                                    0.060           0.075             0.064
                                          (0.104)         (0.105)           (0.104)
Experience (previous SHARE waves)         0.642§§         0.611§§           0.608§§
                                          (0.109)         (0.114)           (0.113)
Contacts                                  −0.133‡         −0.161§           −0.128‡
                                          (0.069)         (0.068)           (0.070)
Rounding to a multiple of 5 for grip      −0.239§         −0.219§           −0.247§
  strength measure (too many)             (0.105)         (0.105)           (0.105)
Rounding to a multiple of 5 for grip      −0.769§§        −0.785§§          −0.750§§
  strength measure (too few)              (0.228)         (0.230)           (0.229)

Agency control variables
Daily contact                             0.714§§                           0.689§§
                                          (0.153)                           (0.141)
Priority decided by survey agency                         0.180             0.136
                                                          (0.211)           (0.116)

Constant                                  −5.389§§        −5.062§§          −5.433§§
                                          (1.075)         (1.081)           (1.074)
σ²_u (interviewer level)                  1.006           1.000             1.008
σ²_v (agency level)                       0.007           0.081             0.001
N                                         16 945          16 945            16 945

†Standard errors are in parentheses; p-values for fixed effect covariate significance refer to Wald-type tests.
‡p < 0.05. §p < 0.01. §§p < 0.001.
comparing the findings across SHARE and the European Social Survey here. Non-response
processes can differ substantially between cross-sectional co-operation and co-operation in a
later wave of a panel.) The definitions of level 2 (ICCj) and level 3 (ICCk) intraclass correlations
in a three-level logit model are provided in Appendix A.
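Although the formal definitions are left to Appendix A, the intraclass correlations reported in Table 7 can be reproduced from the latent-variable variance shares, with the level 1 variance fixed at π²/3 as noted in Section 4; a minimal sketch using the model 0 components:

```python
import math

def iccs(var_u, var_v, var_e=math.pi ** 2 / 3):
    """Intraclass correlations implied by the latent-variable variance shares:
    each level's share of the total variance var_e + var_u + var_v."""
    total = var_e + var_u + var_v
    return var_u / total, var_v / total

icc_j, icc_k = iccs(1.174, 0.073)  # model 0 components from Table 7
print(round(icc_j, 3), round(icc_k, 3))    # 0.259 0.016
print(round(100 * (1 - icc_j - icc_k), 1))  # 72.5, the individual-level share
```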
Looking at the intraclass correlations, we note that survey agencies contribute about 1.6% of
the variation, whereas interviewers account for about 25.9% (model 0 in Table 7). On the basis
of the adjusted likelihood ratio test, we reject the null hypothesis that the third-level variance
component is 0. The test statistic takes the value 10.90, and it is asymptotically distributed as
an equal mixture of a χ² distribution with 0 degrees of freedom and a χ² distribution with 1
degree of freedom (Self and Liang, 1987). The intraclass correlations in Table 7 suggest that
most (72.5%, i.e. 100(1 − ICC_j − ICC_k) in model 0) of the
variation in co-operation is at the individual level.
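The p-value implied by the mixture distribution above can be computed with the standard library alone: the χ² component with 0 degrees of freedom is a point mass at 0, so the mixture p-value is half the χ²₁ tail, which is available through the complementary error function.

```python
import math

def mixture_pvalue(lr_stat):
    """p-value of a likelihood ratio statistic under an equal mixture of
    chi-square(0) and chi-square(1), as in Self and Liang (1987)."""
    # chi-square(1) survival function: P(Z**2 > x) = erfc(sqrt(x / 2))
    tail_chi2_1 = math.erfc(math.sqrt(lr_stat / 2.0))
    return 0.5 * tail_chi2_1

p = mixture_pvalue(10.90)
print(p < 0.001)  # True: the third-level variance component is significant
```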
Table 7 also reports the Akaike information criterion AIC as a measure of goodness of fit for each successive model specification. Reductions in AIC indicate improvements in model fit. An examination of the log-likelihoods yields similar conclusions, whereby the full model is to be preferred. The likelihood ratio test shows that adding respondent level paradata improves the model significantly and reduces the scaled variance at the respondent level by 3%. (The percentage change in the scaled variance at the respondent level is defined as (3.125 - 3.226)/3.125, following the approach of Couper and Kreuter (2013). Percentage changes in the other scaled
variance components are computed accordingly.) If we compare model 2 and model 3, in which we introduce interviewer characteristics, it can be seen that this set of interviewer level fixed effects accounts for a modest proportion of the variation at that level. Comparing the variance σ²_u between model 2 and model 3, we see that about 19% of the variation is captured by interviewer age, gender, experience, average number of contacts and rounding behaviour. The likelihood ratio test reveals that adding interviewer characteristics as predictors of co-operation results in a statistically significant improvement in model fit (p<0.0001).

Table 7. Estimated variance components, intraclass correlations and model fit statistics for various model specifications of the multilevel models of co-operation†

                                  Model 0      Model 1      Model 2      Model 3      Model 4
                                  (intercept   (respondent) (respondent  (interviewer)(agency)
                                  only)                     paradata)
Not scaled
  σ²_e (individual level)         3.29
  σ²_u (interviewer level)        1.174        1.184        1.198        1.000        1.006
  σ²_v (agency level)             0.073        0.093        0.086        0.089        0.007
Scaled
  σ²_e (individual level)         3.29         3.226        3.125        3.048        3.031
  σ²_u (interviewer level)        1.174        1.161        1.138        0.926        0.927
  σ²_v (agency level)             0.073        0.091        0.082        0.082        0.006
Intraclass correlation (scaled variances)
  ICC_j (interviewer level)       0.259        0.260        0.262        0.228        0.234
  ICC_k (agency level)            0.016        0.020        0.019        0.020        0.002
Log-likelihood                    -6917.251    -6835.011    -6694.445    -6667.654    -6661.432
Likelihood ratio test against                  164.48       281.13       53.58        12.44
  previous column model                        (14; 0.000)  (6; 0.000)   (6; 0.000)   (1; 0.000)
  (degrees of freedom; p-value)
Model fit statistic AIC           13840.5      13704.02     13434.89     13393.31     13382.86

†Observations: 16945 respondents, 643 interviewers, 11 agencies; ICC, intraclass correlation; AIC, Akaike information criterion.
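These comparisons follow directly from the Table 7 entries; the Python sketch below (illustrative only, ours) reproduces the model 2 versus model 3 likelihood ratio statistic and the share of interviewer level variance captured by the added covariates.

```python
# Log-likelihoods and scaled interviewer-level variances from Table 7.
ll_model2, ll_model3 = -6694.445, -6667.654
var_u_model2, var_u_model3 = 1.138, 0.926

# Likelihood ratio statistic: twice the log-likelihood improvement
# (6 degrees of freedom for the added interviewer characteristics).
lr_stat = 2 * (ll_model3 - ll_model2)

# Share of interviewer-level variation captured by the added covariates.
share_explained = (var_u_model2 - var_u_model3) / var_u_model2

print(round(lr_stat, 2))          # 53.58, as reported
print(round(share_explained, 2))  # about 0.19
```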
Finally, in model 4 we add survey agency fieldwork strategies. The inclusion of survey-agency-related variables captures a large part of the variation at the third level; comparing σ²_v between model 3 and model 4, we note that we can explain about 90% of the variation. However, we need to take into account the fact that the variation at the survey agency level is, in total, rather small in comparison with the variance at the interviewer level. We recall that the survey agencies contribute about 1.6% of the variation, whereas interviewers account for about 25.9%. According to the likelihood ratio test, adding the survey agency characteristic as a predictor of co-operation improves the model fit (p<0.0001).
Our results should be interpreted cautiously because the accuracy of higher level parameter estimates might be problematic in the context of multilevel models, particularly when the number of groups is small. In Appendix A we present simulation analyses along these lines.
5.3. Cross-level interactions and robustness analysis
In this subsection we show that our results are robust to the inclusion of cross-level interactions
and to various changes in model specification.
Considering cross-level interactions allows us to investigate non-co-operation for certain subgroups of respondents who are difficult to interview for several reasons, e.g. individuals in bad health, employed, living alone or with an 'unpleasant' previous interview experience. We focus on how the effect of individual characteristics differs according to interviewer attributes.
As Groves and Couper (1998) suggested, interviewers with more experience are more able to gain co-operation in problematic situations (e.g. resistance). Therefore, we first investigate whether interviewers' experience can mitigate the negative association of respondent bad health, marital status and previous interview indicators (item non-response and interrupted response patterns) with co-operation. We find statistically significant interaction effects only for interrupted response patterns. To clarify: in the last column of Table 5, experience has a positive coefficient of 0.642 and interrupted response pattern a negative coefficient of 0.977; in the second column of Table 8, experience has a positive coefficient of 0.716, interrupted response pattern a negative coefficient of 0.601 and their interaction a negative coefficient of 0.675. Thus, experience per se is predictive of retention, but experienced interviewers are less likely than inexperienced interviewers to gain co-operation when the respondent has an interrupted history of participation. A possible explanation is that experienced interviewers put more effort where they expect higher rewards, and do not work as hard at regaining co-operation where they know that respondents are more difficult to keep in the sample.
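On the logit scale the two estimates combine additively; a quick illustrative calculation (ours, using the coefficients from the second column of Table 8) shows how the experience effect almost vanishes for respondents with an interrupted participation history.

```python
import math

# Coefficients from the second column of Table 8 (logit scale).
beta_experience = 0.716    # interviewer experience, main effect
beta_interaction = -0.675  # experience x interrupted response pattern

# Effect of interviewer experience by respondent participation history.
effect_continuous = beta_experience                      # uninterrupted history
effect_interrupted = beta_experience + beta_interaction  # interrupted history

print(round(effect_interrupted, 3))            # 0.041: close to zero
print(round(math.exp(effect_continuous), 2))   # odds ratio about 2.05
print(round(math.exp(effect_interrupted), 2))  # odds ratio about 1.04
```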
Although the gender of the interviewer is generally not significant in explaining co-operation (West and Blom, 2017), we investigate whether it plays a role for at least some respondents. We include in the model cross-level interactions between interviewer gender (female) and respondent characteristics (such as bad health, marital status and previous interview indicators) to see whether being interviewed by a female changes the propensity to participate. We find statistically significant effects for marital status: the positive correlation between being single and co-operation in the baseline model specification (Table 5, model 4) seems to be mainly driven by singles interviewed by female interviewers (see the third column of Table 8).
Finally, on the basis of the evidence that sociodemographic similarities between respondent and interviewer increase the propensity to co-operate (West and Blom, 2017), we test whether matching based on age and gender affects co-operation. We find that the nearness in age between interviewer and respondent, measured as the distance between interviewer and respondent age, has an insignificant effect on co-operation. We find a similarly insignificant result for gender concordance.
We further ran robustness analyses by redefining the estimation sample, the list of covariates
and the number of levels considered.
We redefine our estimation sample along three dimensions. First, we look at the effect of carrying out the analysis at the household, rather than the individual, level (the fourth column of Table 8). Although Durrant and Steele (2009) highlighted that co-operation is a complex social phenomenon that is explained by individual rather than household characteristics, we show that household level estimates are in line with individual level estimates. Next, in the fifth column of Table 8, we drop interviewers with fewer than six interviews. This second model specification addresses the potential inaccuracy of the estimates when the group sizes are small (Hox, 2010). The results do not change. We use this model specification to
perform the goodness-of-fit test that was proposed by Perera et al. (2016) and fail to reject the null hypothesis that the specified model fits the data well. (Perera et al. (2016) developed the goodness-of-fit test for a two-level model. Therefore, in performing the test we treat our model specification as if it were a two-level model. The computer code to perform the test is available from http://wileyonlinelibrary.com/journal/rss-datasets.) Lastly, in the sixth column of Table 8, we drop proxy interviews to check whether this rather particular subsample of SHARE respondents drives our baseline results, but we find that this is not so.

Table 8. Robustness analysis: multilevel model estimates (dependent variable: co-operation)†

Model specifications: (1) cross-level interaction, interrupted response pattern × interviewer experience; (2) cross-level interaction, single × female interviewer; (3) household level; (4) number of interviews > 5; (5) no proxy interviews; (6) interviewer education; (7) two-level model, grouped countries.

                                          (1)        (2)        (3)        (4)        (5)        (6)        (7)
Respondent characteristics
  Female                                  0.102§     0.098§     0.167§§    0.108§     0.120§     0.069      0.100§
                                         (0.049)    (0.049)    (0.059)    (0.049)    (0.051)    (0.054)    (0.049)
  Age                                     0.173§§    0.171§§    0.207§§    0.172§§    0.166§§    0.171§§    0.171§§
                                         (0.029)    (0.029)    (0.036)    (0.029)    (0.031)    (0.033)    (0.029)
  Age squared                            -0.001§§   -0.001§§   -0.001§§   -0.001§§   -0.001§§   -0.001§§   -0.001§§
                                         (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)
  Being in poor health                   -0.195§§   -0.194§§   -0.184§§   -0.200§§   -0.181§§   -0.182§§   -0.195§§
                                         (0.051)    (0.051)    (0.060)    (0.051)    (0.053)    (0.056)    (0.051)
  Single                                  0.206§§    0.066      0.341§§    0.200§§    0.229§§    0.211§§    0.212§§
                                         (0.069)    (0.098)    (0.079)    (0.070)    (0.072)    (0.077)    (0.069)
  Proxy                                  -0.192§    -0.193§    -0.355§§   -0.183‡               -0.157     -0.194§
                                         (0.097)    (0.097)    (0.123)    (0.098)               (0.104)    (0.097)
  Years of education                      0.045§     0.045§     0.004      0.046§     0.042‡     0.046§     0.047§
                                         (0.021)    (0.021)    (0.026)    (0.021)    (0.022)    (0.023)    (0.021)
  Years of education squared             -0.003§§   -0.003§§   -0.000     -0.003§§   -0.002§§   -0.003§§   -0.003§§
                                         (0.001)    (0.001)    (0.001)    (0.001)    (0.001)    (0.001)    (0.001)
  Household income, 1st quartile          0.236§§    0.235§§    0.114      0.238§§    0.213§§    0.268§§    0.235§§
                                         (0.078)    (0.078)    (0.096)    (0.078)    (0.081)    (0.085)    (0.078)
  Household income, 2nd quartile          0.384§§    0.378§§    0.268§§    0.393§§    0.380§§    0.400§§    0.376§§
                                         (0.078)    (0.078)    (0.084)    (0.078)    (0.080)    (0.084)    (0.078)
  Household income, 3rd quartile          0.154§     0.154§     0.130      0.162§     0.153‡     0.195§     0.154§
                                         (0.078)    (0.078)    (0.084)    (0.078)    (0.080)    (0.086)    (0.078)
  Living in a (semi-)detached house       0.299§§    0.301§§    0.274§§    0.313§§    0.303§§    0.327§§    0.303§§
                                         (0.058)    (0.058)    (0.067)    (0.058)    (0.060)    (0.063)    (0.058)
  Working                                -0.065     -0.065      0.001     -0.050     -0.055     -0.146§    -0.063
                                         (0.067)    (0.067)    (0.082)    (0.068)    (0.069)    (0.075)    (0.067)
  Living in an urban area                 0.046      0.040      0.024      0.033      0.051      0.068      0.041
                                         (0.073)    (0.073)    (0.082)    (0.074)    (0.076)    (0.081)    (0.073)
Paradata at the respondent level
  Interrupted response pattern           -0.601§§   -0.977§§   -0.924§§   -0.974§§   -0.946§§   -0.979§§   -0.978§§
    (interviewed in wave 1 but           (0.124)    (0.083)    (0.090)    (0.083)    (0.087)    (0.089)    (0.083)
    not in wave 2)
  Item non-response to monetary          -0.523§§   -0.529§§   -0.570§§   -0.537§§   -0.498§§   -0.546§§   -0.526§§
    questions                            (0.088)    (0.088)    (0.107)    (0.089)    (0.094)    (0.097)    (0.088)
  Length of interview (h)                 1.122§§    1.152§§    0.651§§    1.088§§    1.179§§    1.267§§    1.164§§
                                         (0.272)    (0.272)    (0.154)    (0.275)    (0.286)    (0.297)    (0.274)
  Length of interview squared (h)        -0.350§§   -0.364§§   -0.110§§   -0.336§§   -0.356§§   -0.435§§   -0.367§§
                                         (0.114)    (0.114)    (0.039)    (0.115)    (0.120)    (0.125)    (0.114)
  Willingness to answer                   0.451§§    0.452§§    0.383§§    0.457§§    0.506§§    0.370§§    0.454§§
                                         (0.090)    (0.090)    (0.111)    (0.091)    (0.097)    (0.096)    (0.090)
  Did not ask for clarification           0.264§§    0.262§§    0.285§§    0.260§§    0.279§§    0.302§§    0.265§§
                                         (0.068)    (0.068)    (0.080)    (0.069)    (0.072)    (0.074)    (0.068)
Interviewers' characteristics (wave 4)
  Age                                    -0.004     -0.004     -0.002     -0.004     -0.004     -0.005     -0.003
                                         (0.005)    (0.005)    (0.004)    (0.005)    (0.005)    (0.005)    (0.005)
  Female                                  0.055      0.005      0.100      0.048      0.052      0.050      0.056
                                         (0.104)    (0.107)    (0.100)    (0.107)    (0.105)    (0.120)    (0.104)
  Interviewer education (ISCED 5–6)                                                             0.037
                                                                                               (0.127)
  Experience with working on              0.716§§    0.644§§    0.688§§    0.690§§    0.623§§    0.663§§    0.634§§
    previous SHARE waves                 (0.110)    (0.109)    (0.104)    (0.112)    (0.111)    (0.125)    (0.109)
  Contacts                               -0.133‡    -0.134‡    -0.101     -0.142‡    -0.130‡    -0.145‡    -0.123‡
                                         (0.069)    (0.069)    (0.063)    (0.074)    (0.071)    (0.075)    (0.065)
  Rounding to a multiple of 5 for        -0.241§    -0.237§    -0.255§    -0.277§§   -0.239§    -0.199‡    -0.249§
    grip strength measure (too many)     (0.105)    (0.105)    (0.101)    (0.107)    (0.107)    (0.119)    (0.105)
  Rounding to a multiple of 5 for        -0.785§§   -0.772§§   -0.763§§   -0.901§§   -0.809§§   -0.779§§   -0.763§§
    grip strength measure (too few)      (0.228)    (0.228)    (0.224)    (0.259)    (0.232)    (0.248)    (0.228)
Interactions
  Single × female interviewer                        0.236§
                                                    (0.115)
  Interrupted response pattern ×         -0.675§§
    interviewer experience               (0.165)
Agency control variables
  Daily contact                           0.712§§    0.711§§    0.664§§    0.707§§    0.695§§    0.591§§    0.665§§
                                         (0.151)    (0.152)    (0.134)    (0.149)    (0.154)    (0.216)    (0.150)
  Southern countries                                                                                        0.116
                                                                                                           (0.172)
  Central countries                                                                                         0.053
                                                                                                           (0.122)
  Constant                               -5.477§§   -5.344§§   -6.787§§   -5.484§§   -5.333§§   -5.343§§   -5.543§§
                                         (1.076)    (1.076)    (1.312)    (1.082)    (1.131)    (1.221)    (1.085)
  σ²_u (interviewer level)                1.006      1.005      0.763      1.003      1.012      1.072      1.011
  σ²_v (agency level)                     0.006      0.007     <0.001      0.004      0.007      0.012
  N                                      16945      16945      11890      16713      15913      13574      16945

†Standard errors are in parentheses; p-values for fixed effect covariates significance refer to Wald-type tests. Household level model specification: the interview length is defined as the sum of the single-interview lengths; participation in the previous waves is defined at the household level.
‡p<0.05.
§p<0.01.
§§p<0.001.
Our results are quite robust to the inclusion of further interviewer level controls. For a subgroup of interviewers, we have information on education (interviewer education (ISCED 5–6) is a dummy that takes the value 1 if the interviewer has tertiary education), and in the seventh column of Table 8 we show that adding this variable does not change our results. We do not report estimation results for a model specification that includes a 'short introduction' variable (results are available on request). This variable is an additional quality indicator intended to capture interviewers who are likely to skip section introductions. To ensure harmonization, interviewers are instructed to read the whole CAPI question carefully. However, some interviewers do not follow this instruction: when we examine keystroke data on section introductions, we find interviewers who read them very quickly. The variable is insignificant and its inclusion leaves the other parameter estimates unchanged.
The results regarding the effects of survey agency practices on the conditional mean of the dependent variable are also robust to the way that we treat level 3 variability. Although the simulation exercise confirms the robustness of our baseline result regarding the positive effect of the third-level control, we present in the final column of Table 8 a two-level model with controls for groups of countries. (We group countries as follows: the dummy Southern countries takes the value 1 for Italy and Spain and 0 otherwise; the dummy Central countries takes the value 1 for Belgium, Switzerland, Germany, the Czech Republic and Austria; and Northern countries, the reference group, equals 1 for Denmark, Sweden and the Netherlands.) The third-level variable daily contact remains highly significant.
6. Conclusions
Panel co-operation has been a long-standing issue in survey research, with several studies seeking to identify the factors that affect subject attrition in panel surveys. Our analysis, based on observational data, focuses especially on the role of paradata in providing additional information to predict co-operation in a later wave of a panel. We are especially interested in the factors affecting co-operation propensity that are 'under the researcher's control': survey agency fieldwork strategies, the features of interviewers and the respondent–interviewer interaction. We investigate which paradata from SHARE waves 3 and 4 help to predict co-operation in wave 4 regarding
(a) the way that the previous interview was conducted,
(b) the characteristics of the wave 4 interviewer and
(c) agency level fieldwork indicators.
Using multilevel models, we find that factors at all three levels (respondent, interviewer and survey agency) influence co-operation.
Panel respondents may base their co-operation decision on the way that their previous interview was conducted. We find corroborating evidence for this: for instance, item non-response to monetary questions predicts co-operation in the next wave; respondents who answered most of the monetary items are more likely to participate in wave 4 than those who refused to answer a considerable number of questions. The length of the interview is another factor that is associated with co-operation in wave 4. We find that very long interviews are associated with lower participation in later waves. However, as long as the total length of the interview is less than 1.6 h, which holds for the vast majority of our cases, longer interviews are positively associated with future co-operation, possibly reflecting the respondent's interest in the survey or the quality of the interviewer–respondent interaction. This finding shows the difficulty of deriving implications for questionnaire development from interview length.
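The 1.6 h threshold is simply the turning point of the estimated quadratic in interview length; as a quick check (illustrative Python, ours, using the coefficients from the first column of Table 8, which are close to the full model estimates):

```python
# Quadratic in interview length (hours), first column of Table 8:
# logit contribution = 1.122 * length - 0.350 * length**2.
beta_length = 1.122
beta_length_sq = -0.350

# The association with co-operation switches sign at the vertex of the parabola.
turning_point = -beta_length / (2 * beta_length_sq)

print(round(turning_point, 2))  # about 1.6 hours
```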
As far as interviewer characteristics are concerned, we find that previous experience of working as a SHARE interviewer matters more than sociodemographic characteristics, such as age, gender or education. This is in line with the literature: interviewers' gender and age have generally been found to be weak or insignificant determinants of co-operation, whereas experience does play a role, although the mechanisms behind it are still not well understood (West and Blom, 2017). Interviewers who perform well on survey tasks that require diligence are also more successful in gaining co-operation. This again reflects the importance of high quality training and of selecting diligent individuals as interviewers. Although the interviewer work quality indicators that we have are statistically significant, they account, together with sociodemographic characteristics, for only a modest percentage of the variance at the interviewer level. Important determinants, which were not considered here because of lack of information, are for instance interviewer continuity (Watson and Wooden, 2009; Lynn et al., 2014), socio-economic status, general attitudes, own behaviour, expectations and more comprehensive measures of job experience.
Finally, regarding survey-agency-related controls, we find that having contact with interviewers every day increases the chances of gaining respondents' co-operation. This result may highlight the importance of communication between survey agency co-ordinators and interviewers in conducting surveys successfully, but it may also point to other factors at the survey agency level that affect respondents' co-operation (such as the relative importance that the survey agency attaches to SHARE compared with other surveys that it is managing at the same time). The limited number of survey agencies in our sample and the paucity of agency indicators prevent us from using more agency level covariates and limit our ability to ascertain which is the correct explanation. To investigate the role of survey agency controls further, one should probably use the most recent SHARE waves (7 and 8), which cover a much larger number of countries. Ideally, more detailed quantitative paradata at the agency level should also be collected.
We have also investigated cross-level interactions: the most interesting finding is that the
interviewer’s experience is generally predictive of retention, except when the respondent has an
interrupted history of participation. A possible explanation is that experienced interviewers put
in less effort when they expect lower chances of success. We also find significant interaction
effects between the interviewer’s gender (female) and respondent marital status (being single),
and this may be used to devise a profitable assignment strategy.
Our analysis provides a description of response behaviour in SHARE for a specific, relatively
early wave. Even in this setting, we have shown that an interrupted participation pattern makes
retention less likely. The response process in later waves might depend on previous participation
in more complex ways. To investigate this one should consider the whole longitudinal gross
sample, i.e. all the individuals who have been interviewed at least once, as this would allow the
separation of retention and recovery. The underlying mechanisms for subsequent participation
on the one hand (retention) and interrupted participation on the other hand (recovery) might
differ. We leave this to future research.
Acknowledgements
We are grateful for comments and suggestions made by participants at the conference of the European Survey Research Association, the Panel Survey Methods Workshop and the seminar of the Munich Center for the Economics of Aging, as well as by referees and the Joint Editor. We gratefully acknowledge discussions with Thorsten Kneip, Julie Korbmacher, Omar Paccagnella and Annette Scherpenzeel. This paper uses data from SHARE wave 1, wave 2, wave 3 (SHARELIFE) and wave 4 release 6.0.0, as of March 31st, 2017 (digital object identifier (DOI) 10.6103/SHARE.w1.600; DOI 10.6103/SHARE.w2.600; DOI 10.6103/SHARE.w3.600; DOI 10.6103/SHARE.w4.600). The SHARE data collection has been primarily funded by the European Commission through the fifth framework programme (project QLK6-CT-2001-00360 in the thematic programme 'Quality of life'), through the sixth framework programme (projects SHARE-I3, RII-CT-2006-062193, COMPARE, CIT5-CT-2005-028857 and SHARELIFE, CIT4-CT-2006-028812) and through the seventh framework programme (SHARE-PREP 211909, SHARE-LEAP 227822 and SHARE M4 261982). Additional funding from the US National Institute on Aging (U01 AG09740-13S2, P01 AG005842, P01 AG08291, P30 AG12815, R21 AG025169, Y1-AG-4553-01, IAG BSR06-11 and OGHA 04-064) and the German Ministry of Education and Research as well as from various national sources is gratefully acknowledged (see www.share-project.org for a full list of funding institutions).
Appendix A: Simulation study
Multilevel model estimation is generally based on a maximum likelihood approach, and standard errors are derived under the assumption of an asymptotic normal distribution of the estimator. There are several simulation studies which assess the finite sample performance of multilevel models when the outcome is continuous (see Maas and Hox (2005) for a recent review), but fewer analyses exist for discrete response multilevel models. Moreover, these results are mainly for two-level binary models (see Paccagnella (2011) for a literature review), with the exception of a recent study by Kim et al. (2013).
The main conclusions of the simulation analyses for binary multilevel models are that parameter estimates are downward biased whenever there are few observations per group (Rodríguez and Goldman, 1995), and when the number of groups is small, in particular when considering higher level covariates and variance components (Bryan and Jenkins, 2016; Paccagnella, 2011). (Fewer than 30 groups lead to unacceptable downward biases in the parameter estimates of a two-level logit model according to Bryan and Jenkins (2016). Similar results are obtained in Paccagnella (2011).) Results for standard error bias exhibit the same pattern. Paccagnella (2011) investigated the accuracy of model estimates in the case of a two-level logit model and concluded that the bias in the fixed part of the model is negligible even with 10 clusters, but the number of clusters should increase significantly to ensure accuracy of the variance components estimate. Moreover, his simulation results show that the bias in the variance estimate is higher when the second-level ICC is lower. Kim et al. (2013) focused on the comparison of estimation performance in both two- and three-level models when using different methods and statistical packages but did not investigate the role of group size and number on estimation accuracy. (Simulation results for the three-level specification are based on data sets in which there are 50 level 1 units, 10 level 2 units and 30 level 3 units throughout.)
If the two-level logit model results extend to a three-level framework, two features of our model specification are likely to imply inaccurate estimates: on the one hand, the small number of level 3 groups, i.e. the number of survey agencies; on the other hand, the small ICC at the third level. Given the lack of simulation results on binary response three-level models, we study the finite sample properties of a multilevel logit model with a simulation exercise in which the hierarchical structure of the generated data sets replicates the structure of our survey data set.
Following Goldstein and Rasbash (1996), we specify our baseline model as follows:

\[
\operatorname{logit}\{p_{ijk} \mid (Z_{ijk}; \beta, u_{jk}, v_k)\}
= \beta_0 + \beta_1 X_{1ijk} + \beta_2 D_{1ijk} + \beta_3 X_{2jk} + \beta_4 D_{2jk} + \beta_5 D_{3k} + u_{jk} + v_k,
\qquad
Y_{ijk} \mid u_{jk}, v_k \sim \operatorname{Bernoulli}(p_{ijk}),
\tag{A.1}
\]

where the controls Z_ijk are continuous and binary variables, X and D respectively, and the random effects are independent and normally distributed, u_jk ~ N(0, σ²_u) and v_k ~ N(0, σ²_v). We consider two other model specifications: a null model and a model without level 3 controls.
The baseline model specification presented in equation (A.1) replicates the full model specification estimated in the last column of Table 5. However, for simplicity we include only two controls at level 1 and at level 2, a continuous and a binary control, (X_1ijk, X_2jk) and (D_1ijk, D_2jk) respectively, and only one binary control at the third level, which has the same distribution as the daily contact binary variable in our model.
According to Davis and Scott (1995), the intraclass correlations at level 2 and level 3 in a multilevel logit model are defined as

\[
\mathrm{ICC}_j = \frac{\sigma_u^2}{\sigma_e^2 + \sigma_u^2 + \sigma_v^2},
\qquad
\mathrm{ICC}_k = \frac{\sigma_v^2}{\sigma_e^2 + \sigma_u^2 + \sigma_v^2},
\]

where σ²_e = π²/3. Varying the level 2 and level 3 variances in ranges consistent with those estimated in Tables 5 and 7, we explore the finite sample behaviour of the estimates for a range of values of the intraclass correlations: ICC_j ∈ [0.19, 0.26] and ICC_k ∈ [0.01, 0.035].
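These definitions are straightforward to evaluate numerically; the Python sketch below (ours, illustrative) reproduces the model 0 intraclass correlations of Table 7 from the unscaled variance components.

```python
import math

def three_level_logit_icc(var_u, var_v):
    """Level 2 and level 3 intraclass correlations in a three-level logit
    model, with the individual-level variance fixed at pi^2 / 3
    (Davis and Scott, 1995)."""
    var_e = math.pi ** 2 / 3  # about 3.29
    total = var_e + var_u + var_v
    return var_u / total, var_v / total

# Variance components of the intercept-only model (model 0 in Table 7).
icc_j, icc_k = three_level_logit_icc(var_u=1.174, var_v=0.073)

print(round(icc_j, 3))  # 0.259: interviewer level
print(round(icc_k, 3))  # 0.016: agency level
```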
The true values of the parameters are reported in Table 9. Parameters are kept constant across model specifications apart from the level 2 and level 3 variances.
In particular, we investigate the finite sample behaviour of the estimates when the number of level 3 groups, N_k, and the variance of the random effect at the third level, σ²_v, are small. In the simulations, the following conditions vary:
(a) N_k takes values in {5, 10, 15, 20, 25};
(b) σ²_v takes values in {0.05, 0.1, 0.15} and the level 2 variance σ²_u takes values in {0.8, 1, 1.2}, in line with the model estimates in Table 7.
To replicate the variability in group size that is observed in the data, we allow for heterogeneity in the number of observations within each of the three levels. More precisely, the number of level 2 units within each level 3 group can take five values (S_jk = {30, 45, 60, 75, 90}), which reproduces the variability in the number of interviewers per survey agency in the data. The number of level 1 units within each level 2 group can take five values (S_ijk = {10, 20, 30, 40, 70}), which replicates the distribution of the number of respondents per interviewer in the data. (These sets of five values are replicated according to the number of level 3 and level 2 groups.)
Following Paccagnella (2011), for each combination of the level 2 and level 3 variances we generate R = 1000 simulated data sets. To generate the covariates we simulate from five standard independent normal distributions. The binary variables at levels 1 and 2 take the value 1 if the underlying continuous variable is positive and 0 otherwise. The binary variable at level 3 is obtained from the underlying standard normal distribution by imposing that the mean of the binary variable is 0.17, as for daily contact.
The random components u_jk and v_k are obtained with R random draws from two independent normal distributions with mean 0 and variances σ²_u and σ²_v respectively.
Using the regression coefficients of Table 9, the generated regressors and the random components, we compute π_ijk = logit(p_ijk) and derive p_ijk by applying the inverse logit function. Finally, each value of the dependent variable Y_ijk is a random draw from a Bernoulli distribution with probability p_ijk.
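A minimal sketch of this data-generating process is given below (illustrative Python using only the standard library, rather than the Stata code actually used for the simulations; function and variable names are ours).

```python
import math
import random

def simulate_dataset(n_agencies, var_u=1.0, var_v=0.05, seed=0):
    """Simulate one three-level data set following the recipe in the text:
    agencies (level 3) -> interviewers (level 2) -> respondents (level 1)."""
    rng = random.Random(seed)
    # True coefficients beta_0 ... beta_5 (Table 9).
    b = [1.00, 0.8, -0.3, -0.7, 0.4, -0.2]
    # Group sizes replicating the observed heterogeneity, cycled across groups.
    interviewers_per_agency = [30, 45, 60, 75, 90]
    respondents_per_interviewer = [10, 20, 30, 40, 70]
    # Threshold giving the level 3 dummy a mean of 0.17 (as for daily contact):
    # P(Z > 0.954) is about 0.17 for a standard normal Z.
    z17 = 0.954

    data = []
    for k in range(n_agencies):
        v_k = rng.gauss(0.0, math.sqrt(var_v))            # agency random effect
        d3 = 1.0 if rng.gauss(0.0, 1.0) > z17 else 0.0    # level 3 binary control
        for j in range(interviewers_per_agency[k % 5]):
            u_jk = rng.gauss(0.0, math.sqrt(var_u))       # interviewer random effect
            x2 = rng.gauss(0.0, 1.0)                      # level 2 continuous control
            d2 = 1.0 if rng.gauss(0.0, 1.0) > 0 else 0.0  # level 2 binary control
            for _ in range(respondents_per_interviewer[j % 5]):
                x1 = rng.gauss(0.0, 1.0)                      # level 1 continuous
                d1 = 1.0 if rng.gauss(0.0, 1.0) > 0 else 0.0  # level 1 binary
                eta = (b[0] + b[1] * x1 + b[2] * d1 + b[3] * x2
                       + b[4] * d2 + b[5] * d3 + u_jk + v_k)
                p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
                y = 1 if rng.random() < p else 0  # Bernoulli draw
                data.append((k, j, y, p))
    return data

data = simulate_dataset(n_agencies=10)
print(len(data))  # 20400 respondents when there are 10 agencies
```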
To perform our simulation exercise we use Stata 15. The multilevel logit models are estimated with the
melogit command. The integration method that was used to integrate the approximated likelihood over
the random effects is mode curvature adaptive Gauss–Hermite quadrature with seven integration points.
To assess the accuracy of the estimates of the model parameters and of their standard errors, we report three summary measures: relative parameter bias, non-coverage rate and relative standard error bias (Paccagnella, 2011; Bryan and Jenkins, 2016; Vassallo et al., 2017). The relative parameter bias is computed as the percentage difference between the estimated and the true parameters. The non-coverage rate (Maas and Hox, 2005) is used to assess the accuracy of the standard errors: it is the average over model replications of a binary indicator that takes the value 1 if the true parameter value lies outside the 95% estimated confidence interval. The estimates are accurate if the relative parameter bias is close to 0 and the non-coverage rate is close to 5%. Given that the non-coverage rate might reflect both parameter bias and standard error bias, following Bryan and Jenkins (2016) and Rodríguez and Goldman (1995), we also compute the standard error bias by comparing the 'analytical' standard error (the average of the estimated standard errors over the replications) with the 'empirical' standard error (the standard deviation of the estimated parameters over the R replications) (Greene, 2004).

Table 9. True parameters' values used in the simulation analysis

  Parameter   True value
  β0           1.00
  β1           0.8
  β2          -0.3
  β3          -0.7
  β4           0.4
  β5          -0.2

Table 10. Results from baseline model simulations when the level 2 variance σ²_u is set to 1 and the level 3 variance σ²_v to 0.05

                      N_k = 5    N_k = 10   N_k = 15   N_k = 20   N_k = 25
Relative parameter bias (%)
  β0                  -0.39       0.30      -0.37      -0.13       0.64
  β1                   0.26      -0.01      -0.11      -0.03       0.01
  β2                   0.33      -0.27       0.09       0.10       0.04
  β3                  -0.28       0.16      -0.13       0.26       0.14
  β4                   0.11      -0.31       0.27      -0.08      -1.30
  β5                  -0.26       0.00      -2.36      -4.50       2.21
  σ²_u                -0.75      -0.58      -0.56      -0.30      -0.14
  σ²_v               -44.29     -29.31     -17.29     -12.81     -10.70
Non-coverage rate
  β0                   0.13†      0.10†      0.10†      0.07†      0.08†
  β1                   0.05       0.06       0.05       0.05       0.04
  β2                   0.04       0.05       0.06       0.05       0.06
  β3                   0.05       0.05       0.04       0.05       0.04
  β4                   0.05       0.06       0.05       0.05       0.04
  β5                   0.21†      0.12†      0.08†      0.09†      0.07†
  σ²_u                 0.09†      0.05       0.06       0.05       0.06
  σ²_v                 0.43†      0.29†      0.21†      0.17†      0.15†
Relative standard error bias (%)
  β0                 -17.60     -11.28     -10.10      -5.90      -2.19
  β1                  -0.85      -2.99      -0.51       2.02       3.34
  β2                   4.15       0.32      -1.73       0.22      -3.11
  β3                   1.84      -1.07       0.14      -2.10       1.34
  β4                   2.76      -3.93       1.57       0.56      -0.63
  β5                 -13.38     -13.92     -10.42     -12.06      -6.85
  σ²_u                -5.65       0.80       0.69       1.60      -2.12
  σ²_v               -17.30     -14.25      -8.50      -6.66      -6.47

†Significantly different from 0.05 at the 5% level of significance.
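The three summary measures can be sketched as follows (illustrative Python; function names are ours):

```python
def relative_bias(estimates, true_value):
    """Relative parameter bias (%): percentage difference between the
    average estimate over replications and the true value."""
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - true_value) / true_value

def noncoverage_rate(lower_bounds, upper_bounds, true_value):
    """Share of replications whose 95% confidence interval misses the
    true parameter value (target: 0.05)."""
    misses = sum(1 for lo, hi in zip(lower_bounds, upper_bounds)
                 if not (lo <= true_value <= hi))
    return misses / len(lower_bounds)

def relative_se_bias(analytical_ses, estimates):
    """Relative standard error bias (%): the 'analytical' standard error
    (average of the estimated SEs) against the 'empirical' standard error
    (standard deviation of the estimates over replications)."""
    analytical = sum(analytical_ses) / len(analytical_ses)
    mean_est = sum(estimates) / len(estimates)
    empirical = (sum((e - mean_est) ** 2 for e in estimates)
                 / (len(estimates) - 1)) ** 0.5
    return 100.0 * (analytical - empirical) / empirical

# Toy example with three replications of an estimate of beta = 0.8.
print(round(relative_bias([0.7, 0.9, 0.8], 0.8), 1))            # 0.0
print(noncoverage_rate([0.7, 0.9, 0.6], [0.9, 1.1, 0.8], 0.8))  # about 0.33
```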
In Table 10 we report simulation results for the case in which σ²_u = 1 and σ²_v = 0.05, the scenario closest to our full model specification (the last column of Table 5). Focusing on the scenario with 10 groups at the third level, the relative bias is close to 0 for most parameters, with the exception of the level 3 variance, σ²_v, which is 29.31% downward biased. The non-coverage rate is significantly different from 0.05 for both β5 (the coefficient of the level 3 dummy control) and σ²_v. This is the result of both parameter bias and standard error bias: according to the bottom panel of Table 10, the standard error of β5 is underestimated by 13.92% and the standard error of σ²_v is underestimated by 14.25%.
Generally, both parameter and standard error biases decrease as the number of level 3 groups increases (with the sole exception of the relative parameter bias of β5), but they remain far from the target value even with 25 groups. In the case of σ²_v the non-coverage rate is as high as 0.15 even with 25 groups.

32 J. Bristle, M. Celidoni, C. Dal Bianco and G. Weber

Table 11. Results from null model simulations when the level 2 variance σ²_u is set to 1.2 and the level 3 variance σ²_v to 0.15

Parameter       Results for the following values of Nk:
                   5       10       15       20       25
Relative parameter bias (%)
β0              -0.69     0.51    -0.40     0.37     0.58
σ²_u             0.13     0.18    -0.08    -0.05     0.14
σ²_v           -24.31   -13.86    -6.27    -4.56    -4.06
Non-coverage rate
β0               0.14†    0.10†    0.09†    0.08†    0.07
σ²_u             0.06     0.05     0.06     0.05     0.06
σ²_v             0.31†    0.22†    0.14†    0.13†    0.11†
Relative standard error bias (%)
β0             -14.93    -9.29    -8.60    -6.20    -1.29
σ²_u            -3.66    -0.23    -1.33     2.79    -2.62
σ²_v            -6.73    -7.64    -5.61    -2.41    -2.65
†Significantly different from 0.05 at the 5% level of significance.
By varying σ²_u and σ²_v, and thus the ICC, the downward bias of σ²_v ranges from 20% to 29%, and it is lower when the ICC at the third level is higher. Results of these further simulations are available on request. Given the simulation results, in our application in Table 5 we are likely to underestimate the third-level variance of the full model by about 29%.
The simulation results reveal that the distribution of the estimated level 3 control, β5, shows large variability for all values of Nk. It is worth stressing, however, that the coefficient of daily contact would remain statistically significant in our application even if its relative bias were equal to the 10th or the 90th percentile of the relative bias distribution, and after accounting for the 14% underestimation of the standard error.
In addition to the baseline model specification in equation (A.1), we replicate the simulation exercise
(by varying the level 2 and 3 variances as in the baseline scenario) for two alternative model specifications:
a null model (as in model 0 of Table 5) and a model without the binary level 3 control (as in model 3 of
Table 5). The rationale is to understand whether estimation accuracy changes with the number and level
of controls that are included and to provide some evidence on how we should expect the parameter bias
to change, varying the model specification as in Table 5.
In Table 11 we report the simulation results for the null model with σ²_u = 1.2 and σ²_v = 0.15 (values that are close to those estimated in the second column of Table 5). The results are very similar when the model without the level 3 control is considered instead. The downward bias in the estimation of the level 3 variance is reduced by about 50%, and the same is true for the standard error bias. In particular, when we let the ICC vary within the specified range, with 10 level 3 units the negative parameter bias of σ²_v is between 11% and 18% and the standard error bias is between 6% and 9%. This provides a rule of thumb to gauge the downward bias in higher-level variances for the various model specifications reported in Table 7.
This result is somewhat intuitive: a 'large' sample size is needed to ensure consistency and efficiency of regression parameter estimates, and the same logic extends to level 3 parameters. We need a large number of groups, i.e. more information to exploit, to estimate additional level 3 effects reliably (Bryan and Jenkins, 2016).
We should point out that the 95% confidence interval that we use to derive the non-coverage rate is obtained from the inversion of the Wald test (as is normally done in Stata). Berkhof and Snijders (2001) showed that the Wald test has low power in the context of variance component tests and should not be used to test for variance component significance. In fact, the Wald test relies on the assumption of asymptotic normality of the maximum likelihood estimator, and this is problematic when the random-effect variance is considered, in particular if its value is close to 0, as 0 lies on the boundary of the parameter space (Maas and Hox, 2005).
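A toy numerical example (hypothetical values, not estimates from the paper) makes the boundary problem concrete: inverting the Wald test gives an interval symmetric around the point estimate, so for a variance estimated close to 0 the interval spills outside the parameter space:

```python
# Wald 95% interval: estimate +/- 1.96 * standard error.
z = 1.96
var_hat, se_hat = 0.05, 0.04    # hypothetical variance estimate and SE
wald_ci = (var_hat - z * se_hat, var_hat + z * se_hat)
lower_outside = wald_ci[0] < 0  # a variance can never be negative
```

Here the lower bound is about -0.028, an impossible value for a variance; this is the inaccuracy that score-test-based intervals are designed to avoid.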
Bottai (2003) examined the asymptotic behaviour of confidence intervals in the case in which information is zero at a critical point of the parameter space. He compared several ways to derive confidence intervals (inversion of the log-likelihood ratio test, of the Wald test and of the score test) and found that the score-test-based confidence intervals, which use expected rather than observed information, perform best. As stressed in Bottai and Orsini (2004), the problem of inference about the variance of the random effect can be accommodated in this more general framework because, when the variance component is 0, the score function is identically 0 and the information is zero.
Bottai and Orsini (2004) developed the Stata routine xtvc, which allows testing the null hypothesis that the random-effect variance equals a specific value (including 0) and computes 'corrected' confidence intervals based on the inversion of the score test. This routine works for random-effects linear regression models and can be used after the xtreg command in Stata. The simulation results presented in their paper show that the observed rejection rate is close to the nominal 5% level, regardless of the number of groups considered. The confidence interval obtained by inverting the score test is 'slightly shifted to include greater values' (Bottai and Orsini (2004), page 432) with respect to the Wald confidence interval.
In our simulations we use Wald-based confidence intervals, as is normally done in the literature—see
for example the recent contribution by Vassallo et al. (2017)—even though they may be inaccurate as
the variance at level 3 is set to relatively small values. (Note that in our main model specification we
test variance component significance by using the adjusted likelihood ratio test (Section 5.2).) Possible
strategies to assess the level of inaccuracy would be extending Bottai and Orsini’s routine to multilevel
logit models, adopting a parametric bootstrapping strategy, or relying on alternative estimation procedures
such as the Bayesian Markov chain Monte Carlo algorithm.
Generalization of Bottai and Orsini's (2004) routine to multilevel logit models requires working with a marginal likelihood (in which the random effects are integrated out) that in this case has no closed form. This makes such a procedure computationally expensive. A parametric bootstrap method could be used to construct 95% confidence intervals for the variance components (Kuk, 1995; Goldstein, 1996), but this would be even more computationally expensive. (For each iteration of the simulation process, samples would be drawn from the model evaluated at the current parameter estimates; the model would then be re-estimated on each sample and the confidence intervals constructed from the distribution of the parameter estimates.)
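The recipe in parentheses can be sketched in code. The following is an illustrative Python mock-up (not the authors' implementation) for a much simpler setting: a balanced two-level linear random-intercept model with an ANOVA-type moment estimator of the level 2 variance; all parameter values, group counts and the number of bootstrap draws are invented for the example:

```python
import random
import statistics

def simulate(beta0, var_u, var_e, n_groups, group_size, rng):
    """Draw a balanced sample from y_ij = beta0 + u_j + e_ij."""
    data = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, var_u ** 0.5)
        data.append([beta0 + u + rng.gauss(0.0, var_e ** 0.5)
                     for _ in range(group_size)])
    return data

def estimate(data):
    """ANOVA-type moment estimates of (beta0, var_u, var_e)."""
    n = len(data[0])
    means = [statistics.mean(g) for g in data]
    grand = statistics.mean(means)
    msb = n * sum((m - grand) ** 2 for m in means) / (len(data) - 1)
    msw = statistics.mean(statistics.variance(g) for g in data)
    return grand, max(0.0, (msb - msw) / n), msw

rng = random.Random(42)
data = simulate(beta0=0.0, var_u=1.0, var_e=1.0,
                n_groups=30, group_size=20, rng=rng)
b0_hat, var_u_hat, var_e_hat = estimate(data)

# Parametric bootstrap: redraw samples from the fitted model,
# re-estimate on each, and read the interval off the percentiles
# of the bootstrapped variance estimates.
boot = []
for _ in range(200):
    boot.append(estimate(simulate(b0_hat, var_u_hat, var_e_hat,
                                  30, 20, rng))[1])
boot.sort()
ci_95 = (boot[4], boot[194])   # approximate 2.5th and 97.5th percentiles
```

Unlike the Wald interval, this interval respects the boundary at 0 by construction; the cost is 200 extra model fits, which is what makes the approach so expensive when each fit requires numerical evaluation of a multilevel logit likelihood.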
Alternatively, a Bayesian Markov chain Monte Carlo algorithm (with non-informative priors to ease comparability with maximum likelihood) could be used to perform the entire analysis. Such an algorithm, which would also entail an extra computational burden to achieve convergence, would in fact directly provide interval estimates for the parameters based on their posterior distributions. We know from Rodríguez and Goldman (2001) that in three-level logistic models parameter estimates using full maximum likelihood and Bayesian estimation are similar when the random-effects variances are large. To the best of our knowledge, a Bayesian Markov chain Monte Carlo procedure for the three-level logit model in the case where at least one variance is small has not been implemented in the literature; we therefore leave further investigation of this issue to future research.
To the extent that we can draw from the existing literature, we can expect the confidence intervals for
random-effects variances based on the Wald test to be smaller and shifted towards 0 (see for example
Turner et al. (2001) and Browne and Draper (2006)).
References
Berkhof, J. and Snijders, T. (2001) Variance component testing in multilevel models. J. Educ. Behav. Statist.,26,
133–152.
Blom, A. G. (2012) Explaining cross-country differences in survey contact rates: application of decomposition
methods. J. R. Statist. Soc. A, 175, 217–242.
Blom, A. G., de Leeuw, E. D. and Hox, J. J. (2011) Interviewer effects on nonresponse in the European Social
Survey. J. Off. Statist.,27, 359–377.
Blom, A. G., Lynn, P. and Jäckle, A. (2008) Understanding cross-national differences in unit non-response: the
role of contact data. Working Paper 2008-01. Institute for Social and Economic Research, University of Essex,
Colchester.
Börsch-Supan, A., Brandt, M., Hunkler, C., Kneip, T., Korbmacher, J., Malter, F., Schaan, B., Stuck, S. and
Zuber, S. (2013) Data resource profile: the Survey of Health, Ageing and Retirement in Europe (SHARE). Int.
J. Epidem.,42, 992–1001.
Bottai, M. (2003) Confidence regions when the Fisher information is zero. Biometrika,90, 73–84.
Bottai, M. and Orsini, N. (2004) Confidence intervals for the variance component of random-effects linear models.
Stata J.,4, 429–435.
Branden, L., Gritz, R. M. and Pergamit, M. R. (1995) The effect of interview length on attrition in the National
Longitudinal Survey of Youth. Report NLS 95-28. Bureau of Labor Statistics, Washington DC.
Browne, W. J. and Draper, D. (2006) A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Baysn Anal., 1, 473–514.
Bryan, M. L. and Jenkins, S. P. (2016) Multilevel modelling of country effects: a cautionary tale. Eur. Sociol. Rev.,
32, 3–22.
Burton, J., Laurie, H. and Moon, N. (1999) Don’t ask me nothin’ about nothin’, I just might tell you the truth—the
interaction between unit nonresponse and item nonresponse. Int. Conf. Survey Nonresponse, Portland.
Campanelli, P. and O’Muircheartaigh, C. (1999) Interviewers, interviewer continuity, and panel survey response.
Qual. Quant.,33, 59–76.
Campanelli, P. and O’Muircheartaigh, C. (2002) The importance of experimental control in testing the impact of
interviewer continuity on panel survey nonresponse. Qual. Quant.,36 129–144.
Couper, M. P. and Kreuter, F. (2013) Using paradata to explore item level response times in surveys. J. R. Statist.
Soc. A, 176, 271–286.
Davis, P. and Scott, A. (1995) The effect of interviewer variance on domain comparisons. Surv. Methodol.,21,
99–106.
Durrant, G. B. and D’Arrigo, J. (2014) Doorstep interactions and interviewer effects on the process leading to
cooperation or refusal. Sociol. Meth. Res.,43, 490–518.
Durrant, G. B., Groves, R. M., Staetsky, L. and Steele, F. (2010) Effects of interviewer attitudes and behaviors
on refusal in household surveys. Publ. Opin. Q.,74, 1–36.
Durrant, G. and Kreuter, F. (2013) The use of paradata in social survey research. J. R. Statist. Soc. A, 176,
1–3.
Durrant, G.B. and Steele, F. (2009) Multilevel modelling of refusal and non-contact in household surveys: evidence
from six UK Government surveys. J. R. Statist. Soc. A, 172, 361–381.
Fielding, A. (2004) Scaling for residual variance components of ordered category responses in generalised linear
mixed multilevel models: quality and quantity. Eur. J. Methodol.,38, 425–433.
Fricker, S., Creech, B., Davis, J., Gonzalez, J., Tan, L. and To, N. (2012) Does length really matter?: Exploring
the effects of a shorter interview on data quality, nonresponse, and respondent burden. Federal Committee on
Statistical Methodology Research Conf., Washington DC.
Goldstein, H. (1996) Consistent estimators for multilevel generalised linear models using an iterated bootstrap.
Multilev. Modllng Newslett.,8, 3–6.
Goldstein, H. (2011) Multilevel Statistical Models, 4th edn. Chichester: Wiley.
Goldstein, H. and Rasbash, J. (1996) Improved approximations for multilevel models with binary responses. J. R.
Statist. Soc. A, 159, 505–513.
Goyder, J. (1987) The Silent Minority: Nonrespondents on Sample Surveys. Boulder: Westview.
Greene, W. (2004) The behaviour of the maximum likelihood estimator of limited dependent variable models in
the presence of fixed effects. Econmetr. J.,7, 98–119.
Groves, R. M., Cialdini, R. B. and Couper, M. (1992) Understanding the decision to participate in a survey. Publ.
Opin. Q.,56, 475–495.
Groves, R. M. and Couper, M. P. (1998) Nonresponse in Household Interview Surveys. New York: Wiley.
Hill, D. H. and Willis, R. J. (2001) Reducing panel attrition: a search for effective policy instruments. J. Hum.
Resour.,36, 416–438.
Hox, J. J. (2010) Multilevel Analysis: Techniques and Application, 2nd edn. New York: Routledge.
Hox, J. J. and de Leeuw, E. (2002) The influence of interviewers’ attitude and behavior on household survey
nonresponse: an international comparison. In Survey Nonresponse (eds R. M. Groves, D. A. Dillman, J. L.
Eltinge and R. J. A. Little). New York: Wiley.
Jäckle, A., Lynn, P., Sinibaldi, J. and Tipping, S. (2013) The effect of interviewer experience, attitudes, personality and skills on respondent co-operation with face-to-face surveys. Surv. Res. Meth., 7, 1–15.
Kim, Y., Choi, Y.-K. and Emery, S. (2013) Logistic regression with multiple random effects: a simulation study
of estimation methods and statistical packages. Am. Statistn,63, 171–182.
Kneip, T. (2013) Survey participation in the fourth wave of SHARE. In SHARE Wave 4: Innovations and Methodology (eds F. Malter and A. Börsch-Supan), pp. 140–155. Munich: Munich Center for the Economics of Aging.
Korbmacher, J. M. and Schröder, M. (2013) Consent when linking survey data with administrative records: the
role of the interviewer. Surv. Res. Meth.,7, 115–131.
Krause, N. (1993) Neighbourhood deterioration and social isolation in later life. Int. J. Agng Hum. Devlpmnt,36,
9–38.
Kreuter, F. (2013) Improving Surveys with Paradata: Analytic Uses of Process Information. Hoboken: Wiley.
Kreuter, F., Couper, M. P. and Lyberg, L. E. (2010) The use of paradata to monitor and manage survey data
collection. Proc. Surv. Res. Meth. Sect. Am. Statist. Ass., 282–296.
Krosnick, J. A. (1991) Response strategies for coping with the cognitive demands of attitude measures in surveys.
Appl. Cogn. Psychol.,5, 213–236.
Kuk, A. Y. C. (1995) Asymptotically unbiased estimation in generalized linear models with random effects. J. R.
Statist. Soc. B, 57, 395–407.
Lemay, M. and Durand, C. (2002) The effect of interviewer attitude on survey cooperation. Bull. Methodol. Sociol., 76, 27–44.
Lepkowski, J. M. and Couper, M. P. (2002) Nonresponse in the second wave of longitudinal household surveys.
In Survey Nonresponse (eds R. M. Groves, D. A. Dillman, J. L. Eltinge and R. J. A. Little). New York: Wiley.
Lipps, O. and Benson, G. (2005) Cross national contact strategies. Proc. Surv. Res. Meth. Sect. Am. Statist. Ass.
Lipps, O. and Pollien, A. (2011) Effects of interviewer experience on components of nonresponse in the European
Social Survey. Fld Meth.,23, 156–172.
Loosveldt, G. and Beullens, K. (2013) The impact of respondents and interviewers on interview speed in face-to-
face interviews. Socl Sci. Res.,42, 1422–1430.
Loosveldt, G., Pickery, J. and Billiet, J. (2002) Item nonresponse as a predictor of unit nonresponse in a panel
survey. J. Off. Statist.,18, 545–557.
Lugtig, P. (2014) Panel attrition: separating stayers, fast attriters, gradual attriters, and lurkers. Sociol. Meth. Res.,
14, 699–723.
Lynn, P. (2013) Longer interviews may not affect subsequent survey participation propensity. Understanding Soci-
ety Working Paper Series 2013-07. Institute for Social and Economic Research, University of Essex, Colchester.
Lynn P., Kaminska, O. and Goldstein, H. (2014) Panel attrition: how important is it to keep the same interviewer?
J. Off. Statist.,30, 434–457.
Maas, C. and Hox, J. (2004) Robustness issues in multilevel regression analysis. Statist. Neerland.,58, 127–137.
Malter, F. and Börsch-Supan, A. (eds) (2013) SHARE Wave 4: Innovations & Methodology. Munich: Munich
Center for the Economics of Aging.
Moore, J., Stinson, L. and Welniak, E. (2000) Income measurement error in surveys: a review. J. Off. Statist.,16,
331–361.
Nicoletti, C. and Peracchi, F. (2005) Survey response and survey characteristics: microlevel evidence from the
European Community Household Panel. J. R. Statist. Soc. A, 168, 763–781.
O’Muircheartaigh, C. and Campanelli, P. (1999) A multilevel exploration of the role of interviewers in survey
non-response. J. R. Statist. Soc. A, 162, 437–446.
Paccagnella, O. (2011) Sample size and accuracy of estimates in multilevel models: new simulation results. Method-
ology,7, no. 3, 111–120.
Perera, A. A. P. N. M., Sooriyarachchi, M. R. and Wickramsuriya, S. L. (2016) A goodness of fit test for the
multilevel logistic model. Communs Statist. Simuln Computn,45, 643–659.
Pickery, J. and Loosveldt, G. (2002) A multilevel multinomial analysis of interviewer effects on various components
of unit nonresponse. Qual. Quant.,36, 427–437.
Pickery, J., Loosveldt, G. and Carton, A. (2001) The effects of interviewer and respondent characteristics on
response behavior in panel surveys: a multilevel approach. Sociol. Meth. Res.,29, 509–523.
Rabe-Hesketh, S. and Skrondal, A. (2005) Multilevel and Longitudinal Modeling using Stata, 2nd edn. College
Station: Stata Press.
Rodríguez, G. and Goldman, N. (1995) An assessment of estimation procedures for multilevel models with binary
responses. J. R. Statist. Soc. A, 158, 73–89.
Rodríguez, G. and Goldman, N. (2001) Improved estimation procedures for multilevel models with binary responses: a case-study. J. R. Statist. Soc. A, 164, 339–355.
Self, S. G. and Liang, K. Y. (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio
tests under nonstandard conditions. J. Am. Statist. Ass.,82, 605–610.
Sharp, L. M. and Frankel, J. (1983) Respondent burden: a test of some common assumptions. Publ. Opin. Q.,47,
36–53.
Schröder, M. (2011) Retrospective data collection in the Survey of Health, Ageing and Retirement in Europe:
SHARELIFE methodology. Munich Center for the Economics of Aging, Munich.
Skrondal, A. and Rabe-Hesketh, S. (2004) Generalized Latent Variable Modeling: Multilevel, Longitudinal and
Structural Equation Models. Boca Raton: Chapman and Hall–CRC.
Turner, R. M., Omar, R. Z. and Thompson, S. G. (2001) Bayesian methods of analysis for cluster randomized
trials with binary outcome data. Statist. Med.,20, 453–472.
Vassallo, R., Durrant, G. and Smith, P. W. F. (2017) Separating interviewer and area effects by using a cross-
classified multilevel logistic model: simulation findings and implications for survey designs. J. R. Statist. Soc.
A, 180, 531–550.
Vassallo, R., Durrant, G. B., Smith, P. W. F. and Goldstein, H. (2015) Interviewer effects on non-response propen-
sity in longitudinal surveys: a multilevel modelling approach. J. R. Statist. Soc. A, 178, 83–99.
Watson, N. and Wooden, M. (2009) Identifying factors affecting longitudinal survey response. In Methodology
of Longitudinal Surveys (ed. P. Lynn). Chichester: Wiley.
West, B. T. and Blom, A. G. (2017) Explaining interviewer effects: a research synthesis. J. Surv. Statist. Methodol.,
5, 175–211.
... With its harmonized collection of data in many European countries, SHARE is unique and offers many opportunities to analyze dynamic processes in the European societies. Although previous research has shown that attrition occurs in the SHARE panel (Bergmann et al. 2019), little research has investigated in more detail the changes in the composition of the initially recruited panel sample over time (e.g., Bristle et al. 2019). Moreover, little is known about the relation between attrition and the changes in the panel composition over waves when mortality is particularly considered. ...
... Researchers have used these individual characteristics in almost all models for their substantive analyses based on SHARE data (SHARE-ERIC 2018). Additionally, some of these variables have been found to predict attrition in SHARE (Bristle et al. 2019). As Bristle et al. (2019) showed that item nonresponse to financial questions in SHARE negatively affected cooperation in the next wave, we supplemented the income quartiles with an additional category indicating that respondents did not answer the household income question. ...
... Additionally, some of these variables have been found to predict attrition in SHARE (Bristle et al. 2019). As Bristle et al. (2019) showed that item nonresponse to financial questions in SHARE negatively affected cooperation in the next wave, we supplemented the income quartiles with an additional category indicating that respondents did not answer the household income question. ...
Article
Full-text available
Attrition is a frequently observed phenomenon in panel studies. The loss of panel members over time can hamper the analysis of panel survey data. Based on data from the Survey of Health, Ageing and Retirement in Europe (SHARE), this study investigates changes in the composition of the initially recruited first-wave sample in a multi-national face-to-face panel survey of an older population over waves. By inspecting retention rates and R-indicators, we found that, despite declining retention rates, the composition of the initially recruited panel sample in Wave 1 remained stable after the second wave. Thus, after the second wave there is no further large decline in representativeness with regard to the first wave sample. Changes in the composition of the sample after the second wave over time were due mainly to mortality-related attrition. Non-mortality-related attrition had a slight effect on the changes in sample composition with regard to birth in survey country, area of residence, education, and social activities. Our study encourages researchers to investigate further the impact of mortality- and non-mortality-related attrition in multi-national surveys of older populations.
... Various types of paradata-process data compiled through subject recruitment and respondents' navigation of survey instruments and web portals-have advanced understanding of response behavior in panel studies of adults (Bristle et al., 2019;Callegaro, 2013;Kocar & Biddle, 2019;McClain et al., 2019). In addition to recruitment indicators, such as modes and number of contacts to obtain assent that are customarily generated through face-to-face and computer-assisted telephone interviews (CATI), digitally administered surveys produce access paradata that can advance understanding of compliance in web-administered surveys. ...
... Irrespective of target population, survey administration mode, or panel duration, there is evidence that noncompliance and attrition risks accumulate over waves in both CATI and web-administered surveys (Boys et al., 2003;Bristle et al., 2019;Coyne et al., 2017;Lugtig, 2014;Wagner et al., 2019). The mechanisms generating nonresponse and attrition in panel studies change over the span of the study for many reasons, including extraordinary personal events (e.g., job losses, medical emergencies, and relocation disruptions), response fatigue, and topic salience, among others (Barber et al., 2016;Groves et al., 2004;Kocar & Biddle, 2019;Lugtig, 2014). ...
... To address whether, how much, and in what ways adolescents participate in a mobile-optimized diary study about romantic relationships, we draw upon insights about topic salience (Barber et al., 2016;Groves et al., 2004;Schoeni et al., 2013), established findings about the social and economic correlates of survey participation (Couper, 2017;Groves et al., 2001;Lugtig, 2014;Wen et al., 2017), and recent insights about the power of paradata to understand response behavior in adults (Bristle et al., 2019;Callegaro, 2013;Kocar & Biddle, 2019;McClain et al., 2019). Specifically, we hypothesize that there is a trade-off between the amount of effort expended recruiting subjects and both the likelihood of enrolling in the diary study and longitudinal persistence. ...
Article
Full-text available
We analyze recruitment, access, and longitudinal response paradata from a yearlong intensive longitudinal study (mDiary) that used a mobile-optimized web app to administer 25 biweekly diaries to youth recruited from a birth cohort study. Analyses investigate which aspects of teen recruitment experiences are associated with enrollment and longitudinal response patterns; whether compliance behavior of teens who require multiple invitations to enroll differs from that of teens who enroll on the first invitation, and what personal and social circumstances are associated with different longitudinal compliance patterns. Latent class analysis (LCA) is used to derive longitudinal compliance classes. mDiary’s person-survey response rate of 70% is noteworthy considering reports that response rates for smartphone studies trail those administered via telephone or personal computers. Conditional on agreeing to participate, teens with texting capability were over 6 times as likely to enroll as their peers lacking access, and they also completed six to seven more diaries. Youth who required multiple prods to register not only were less likely to enroll than their peers who registered at the first invitation but also tended to attrite early. Compared with teens who completed all 25 surveys, those who attrited early had less access to texting capability, home Internet service, and also had low-education mothers. Consistent with studies of adults, nonparticipants were disproportionately Black males from socioeconomically disadvantaged backgrounds.
... The analysis of the data set showed that in 2018 (ESS9) there were the most refusals by the residences of apartment buildings, residents of individual buildings refused much less often, so there is reason to investigate these factors in more depth. The result confirms the findings [19] on the increased likelihood of successful interviews with respondents living in individual houses or duplexes. This is consistent with previous studies that have found a low level of cooperation of people living in apartments [4,20] and may indicate the impact of wealth on the ability to provide certain housing. ...
... This is in line with previous studies [20,21], which indicate a link between the type of home and the likelihood of participating in the survey, in particular less cooperation of respondents living in apartment buildings may indicate the effect of social status [17,18] as a latent variable. In addition, the low quality of the model can be explained by the nonlinearity of communicationrespondents with low and high social status are less likely to cooperate [19]. ...
... On the other hand, it can be seen that characteristics that are known to be strongly correlated with attrition, such as level of education or migrant background (e.g. Bristle, Celidoni, Dal Bianco, & Weber, 2019;Uhrig, 2008;Watson & Wooden, 2009), show the largest differences. ...
Article
Full-text available
Longitudinal surveys aim to correctly represent the population of interest over time. In this respect, panel attrition, i.e. the systematic drop-out of sample members, is a major challenge for maintaining long-running panel surveys. A second problem might arise when some sample members die during the life of the panel. This holds in particular for panel surveys that consider (mainly) older people, because here the overall mortality rate is higher than in studies including all age groups. Distinguishing between mortality and other forms of attrition hence is crucial as the death of respondents in a longitudinal survey is a natural process that needs to be considered in order to maintain representativeness of the panel sample. If mortality is not taken into account properly, attrition analyses might overestimate the effect of systematic drop-outs for variables that are highly correlated with mortality, such as age or health of the respondents. Therefore, lacking information on the reason why a former respondent cannot be contacted anymore and thus on the vital status is a huge problem in many longitudinal studies that further increases from wave to wave. Using the Survey of Health, Ageing and Retirement in Europe (SHARE), three methods are implemented in this paper to examine the extent of missing death reports. The first method randomly assigns people with unknown vital status to death. The second method uses mortality rates form life-expectancy tables to extrapolate the expected number of deaths among the panel members with unknown vital status. The third method models deaths from data internal to the survey. The correction methods are compared to the original, uncorrected sample and the implications for analyses of died sample members as well as attrition analyses are explored.
... However, evidence from non-experimental research shows that interview length has either no or a positive relationship with participation, which may be due to a pleasant interaction between interviewer and respondent (Branden et al., 1995). However, only recently, Bristle et al. (2019) found evidence that the interaction with an interviewer is only perceived as pleasant up to a certain point in time. ...
Article
Selective nonresponse can introduce bias in longitudinal surveys. The present study examines the role of cognitive skills (more specifically, literacy skills), as measured in large-scale assessment surveys, in selective nonresponse in longitudinal surveys. We assume that low-skilled respondents perceive the cognitive assessment as a higher burden than higher-skilled respondents because they are more likely to experience negative feelings. We hypothesize that low-skilled respondents are more likely than high-skilled respondents to refuse to participate in a follow-up wave. We analysed data from two assessment surveys in Germany with similar study designs, target populations, and assessment instruments. Results provide support for our hypothesis. Consistently across both surveys, respondents with the lowest literacy skills had a higher probability of refusal than those with the highest literacy skills. This difference persisted even after controlling for several established predictors of nonresponse, including education.
... Our instruments are eligibility for early and statutory (normal) retirement pension. 38 We construct indicators of rounding behaviour in measurements following Korbmacher and Schröder (2013); see also Bristle et al. (2019). We find that retirement has a negative effect on the frequency of fruit or vegetable consumption for men and it has no effect for women. ...
Article
Full-text available
This paper investigates the effect of retirement on healthy eating using data drawn from the Survey of Health, Ageing and Retirement in Europe (SHARE). We estimate the causal effect of retiring from work on daily fruit or vegetable consumption by exploiting policy changes in eligibility rules for early and statutory retirement. Our results show that changes in eating behaviour upon retirement are gender‐specific: retirement induces men to reduce healthy eating; it has no effect on women. We further show that, for men, retirement increases the probability of becoming obese.
Article
Errors in household finance survey data collection can lead to inaccuracies in population estimates. Manual case-by-case revision has traditionally been used to identify and edit potential errors and omissions in the data, such as omitted or misreported assets, income, and debts. Selective editing strategies aim at reducing the editing burden by prioritizing cases through a scoring function. However, the application of traditional selective editing strategies to household finance survey data is challenging due to their underlying assumptions. Using data from the Spanish Survey of Household Finances, we develop a machine learning approach to classify data during the editing phase into cases affected by severe errors and omissions. We compare the performance of several supervised classification algorithms and find that a Gradient Boosting Trees classifier outperforms the competitors. We then use the resulting score to prioritize cases and consider data editing efforts into the choice of an optimal classification threshold.
Article
Although cross-cultural surveys increasingly use open-ended questions to obtain detailed information on respondents’ attitudes, the issue of coding quality is rarely addressed. These questions are always challenging but even more so in multilingual, cross-cultural research contexts, as the different survey languages make response coding more difficult and costly. In this paper, we examine coding issues of open-ended questions and the impact of translation on coding results by comparing codings of translated responses (two-step approach with translation and coding) with codings of the same responses in the original languages (one-step approach using bilingual coders). We draw on data from the project CICOM, specifically respondents’ answers in English and Spanish to open-ended questions about the meaning of left and right. Our goal is to determine whether the coding approach makes a difference to data quality and to identify error sources in the process. The good news is that both coding approaches resulted in good-quality data. We identify several error sources related, first, to respondents’ short answers; second, to the translation process; and third, to the coding process. The response context and the cultural background of translators and coders appear to be important.
Article
This paper analyzes the effect of work disability on the job involvement of workers aged 50–65 living in Europe. We elicit a measure of job involvement from a question asking respondents to think about their job and declare whether they would like to retire as early as they can. We exploit objective health indicators and anchoring vignettes to enhance the comparability across individuals of work disability self-assessments. Individuals’ evaluations of their health-related work limitations are found to be mildly affected by justification bias but to depend on individual heterogeneity in reporting behaviour. Work disability significantly reduces the job involvement of workers. After controlling for individual fixed effects and an extensive set of time-varying covariates, moving from the first to the third quartile of the work disability distribution is associated with an 8% increase (4 percentage points) in the probability of desiring to retire as soon as possible. The effect is larger for blue-collar workers. Justification bias and heterogeneity in reporting behaviour do not alter the magnitude of these effects.
Article
Full-text available
This book presents recent advances in the statistical treatment of data involving several levels of aggregation. The most interesting part addresses event-history models from a multilevel perspective, but this raises many problems in demography. Surveys more detailed than the usual event-history surveys would be needed, ones that simultaneously take into account the various social contexts in which individuals live.
Article
Full-text available
Cross-classified multilevel models deal with data pertaining to two different non-hierarchical classifications. It is unclear how much interpenetration is needed for a cross-classified multilevel model to work well and to estimate the two higher-level effects reliably. The paper investigates this question and the properties of cross-classified multilevel logistic models under various survey conditions. The effects of different membership allocation schemes, total sample sizes, group sizes, number of groups, overall rates of response and the variance partitioning coefficient on the properties of the estimators and the power of the Wald test are considered. The work is motivated by an application to separate area and interviewer effects on survey non-response which are often confounded. The results indicate that limited interviewer dispersion (around three areas per interviewer) provides sufficient interpenetration for good estimator properties. Further dispersion yields only very small or negligible gains in the properties. Interviewer dispersion also acts as a moderating factor on the effect of the other simulation factors (sample size, the ratio of interviewers to areas, the overall probability and the variance values) on the properties of the estimators and test statistics. The results also indicate that a higher number of interviewers for a set number of areas and a set total sample size improves these properties. © 2016 The Royal Statistical Society and Blackwell Publishing Ltd.
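The cross-classified structure the abstract above studies can be made concrete with a small simulation: households belong to both an interviewer and an area, interviewers are not nested within areas, and each interviewer is dispersed over roughly three areas (the degree of interpenetration the paper finds sufficient). All group counts, variances, and the intercept below are hypothetical choices for illustration.

```python
# Illustrative simulation of a cross-classified design: households indexed
# by interviewer AND area, with each interviewer allocated to ~3 areas.
# Sizes and random-effect variances are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

n_areas, n_interviewers, hh_per_cell = 30, 10, 20
areas_per_interviewer = 3  # the interpenetration level discussed above

area_effect = rng.normal(0, 0.5, n_areas)        # area random effects
intv_effect = rng.normal(0, 0.5, n_interviewers)  # interviewer random effects

rows = []
for i in range(n_interviewers):
    # each interviewer works a distinct set of areas (membership allocation)
    assigned = rng.choice(n_areas, size=areas_per_interviewer, replace=False)
    for a in assigned:
        eta = -0.5 + area_effect[a] + intv_effect[i]  # logit of co-operation
        p = 1 / (1 + np.exp(-eta))
        y = rng.binomial(1, p, hh_per_cell)           # household outcomes
        rows.extend((i, a, yi) for yi in y)

data = np.array(rows)  # columns: interviewer id, area id, co-operation (0/1)
print("households simulated:", len(data))
print("overall co-operation rate: %.2f" % data[:, 2].mean())
```

Fitting a cross-classified logistic model to such data would then attempt to separate the two variance components; the simulation makes plain why interpenetration matters, since an interviewer confined to one area leaves the two effects confounded by design.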
Article
We explore the potential of Bayesian hierarchical modelling for the analysis of cluster randomized trials with binary outcome data, and apply the methods to a trial randomized by general practice. An approximate relationship is derived between the intracluster correlation coefficient (ICC) and the between‐cluster variance used in a hierarchical logistic regression model. By constructing an informative prior for the ICC on the basis of available information, we are thus able implicitly to specify an informative prior for the between‐cluster variance. The approach also provides us with a credible interval for the ICC for binary outcome data. Several approaches to constructing informative priors from empirical ICC values are described. We investigate the sensitivity of results to the prior specified and find that the estimate of intervention effect changes very little in this data set, while its interval estimate is more sensitive. The Bayesian approach allows us to assume distributions other than normality for the random effects used to model the clustering. This enables us to gain insight into the robustness of our parameter estimates to the classical normality assumption. In a model with a more complex variance structure, Bayesian methods can provide credible intervals for a difference between two variance components, in order for example to investigate whether the effect of intervention varies across clusters. We compare our results with those obtained from classical estimation, discuss the relative merits of the Bayesian framework, and conclude that the flexibility of the Bayesian approach offers some substantial advantages, although selection of prior distributions is not straightforward. Copyright © 2001 John Wiley & Sons, Ltd.
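One widely used approximate link between the ICC and the between-cluster variance in a hierarchical logistic model is the latent-threshold formula ICC ≈ σ²_b / (σ²_b + π²/3), where π²/3 is the variance of the standard logistic distribution. The paper's own derivation may differ in detail, so the sketch below is illustrative only; it shows how an informative prior centred on a given ICC implicitly pins down a prior for σ²_b.

```python
# Latent-threshold approximation linking ICC and between-cluster variance
# on the logit scale. Treat this as a common textbook approximation, not
# necessarily the exact relationship derived in the paper above.
import math

LOGISTIC_VAR = math.pi ** 2 / 3  # variance of the standard logistic


def icc_from_variance(sigma2_b):
    """ICC implied by between-cluster variance sigma2_b (logit scale)."""
    return sigma2_b / (sigma2_b + LOGISTIC_VAR)


def variance_from_icc(icc):
    """Between-cluster variance implied by a target ICC (inverse map)."""
    return icc * LOGISTIC_VAR / (1 - icc)


# An informative prior for the ICC centred at, say, 0.05 implicitly
# specifies a prior centred at the corresponding sigma2_b:
sigma2 = variance_from_icc(0.05)
print(round(sigma2, 4))                      # implied between-cluster variance
print(round(icc_from_variance(sigma2), 4))   # round-trip check -> 0.05
```

This inversion is what lets empirical ICC values from earlier trials be translated into a prior on the variance component of the hierarchical model.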
Book
This book unifies and extends latent variable models, including multilevel or generalized linear mixed models, longitudinal or panel models, item response or factor models, latent class or finite mixture models, and structural equation models. Following a gentle introduction to latent variable modeling, the authors clearly explain and contrast a wide range of estimation and prediction methods from biostatistics, psychometrics, econometrics, and statistics. They present exciting and realistic applications that demonstrate how researchers can use latent variable modeling to solve concrete problems in areas as diverse as medicine, economics, and psychology. The examples considered include many nonstandard response types, such as ordinal, nominal, count, and survival data. Joint modeling of mixed responses, such as survival and longitudinal data, is also illustrated. Numerous displays, figures, and graphs make the text vivid and easy to read.
Article
Obtaining estimates that are nearly unbiased has proven to be difficult when random effects are incorporated into a generalized linear model. In this paper, we propose a general method of adjusting any conveniently defined initial estimates to result in estimates which are asymptotically unbiased and consistent. The method is motivated by iterative bias correction and can be applied in principle to any parametric model. A simulation‐based approach of implementing the method is described and the relationship of the method proposed with other sampling‐based methods is discussed. Results from a small scale simulation study show that the method proposed can lead to estimates which are nearly unbiased even for the variance components while the standard errors are only slightly inflated. A new analysis of the famous salamander mating data is described which reveals previously undetected between‐animal variation among the male salamanders and results in better prediction of mating outcomes.
Article
Contents: An Introduction to Survey Participation; A Conceptual Framework for Survey Participation; Data Resources for Testing Theories of Survey Participation; Influences on the Likelihood of Contact; Influences of Household Characteristics on Survey Cooperation; Social Environmental Influences on Survey Participation; Influences of the Interviewers; When Interviewers Meet Householders: The Nature of Initial Interactions; Influences of Householder-Interviewer Interactions on Survey Cooperation; How Survey Design Features Affect Participation; Practical Survey Design Acknowledging Nonresponse; References; Index.
Article
This paper aims to identify the interviewer characteristics that influence survey cooperation. A multilevel cross-classified logistic model with random interviewer effects is used to account for clustering of households within interviewers due to unmeasured interviewer attributes, and for the cross-classification of interviewers within areas. We find that interviewer confidence and attitudes play an important role in explaining between-interviewer variation in refusal rates. We also find evidence of interaction effects between the interviewer and householder, for example with respect to gender and educational level, supporting the notion of similarity of interviewers and respondents generating higher cooperation. The results are discussed with respect to potential implications for survey practice and design.
Article
A rich and diverse literature exists on the effects that human interviewers can have on different aspects of the survey data collection process. This research synthesis uses the Total Survey Error (TSE) framework to highlight important historical developments and advances in the study of interviewer effects on a variety of important survey process outcomes, including sample frame coverage, contact and recruitment of potential respondents, survey measurement, and data processing. Included in the scope of the synthesis is research literature that has focused on explaining variability among interviewers in these effects and the different types of variable errors that they can introduce, which can ultimately affect the efficiency of survey estimates. We first consider common tasks with which human interviewers are often charged and then use the TSE framework to organize and synthesize the literature discussing the variable errors that interviewers can introduce when attempting to execute each task. Based on our synthesis, we identify key gaps in knowledge and then use these gaps to motivate an organizing model for future research investigating explanations for interviewer effects on different aspects of the survey data collection process.