
©2018 Royal Statistical Society 0964–1998/19/182003

J. R. Statist. Soc. A (2019) 182, Part 1, pp. 3–35

The contributions of paradata and features of respondents, interviewers and survey agencies to panel co-operation in the Survey of Health, Ageing and Retirement in Europe

Johanna Bristle,

Max Planck Institute for Social Law and Social Policy, Munich, Germany

Martina Celidoni and Chiara Dal Bianco

University of Padua, Italy

and Guglielmo Weber

University of Padua, Italy, and Institute for Fiscal Studies, London, UK

[Received July 2015. Final revision June 2018]

Summary. The paper deals with panel co-operation in a cross-national, fully harmonized face-to-face survey. Our outcome of interest is panel co-operation in the fourth wave of the Survey of Health, Ageing and Retirement in Europe. Following a multilevel approach, we focus on the contribution of paradata at three levels: fieldwork strategies at the survey agency level, features of the (current) interviewer and paradata describing respondents' interview experience from the previous wave. Our results highlight the importance of respondents' prior interview experience, and of interviewers' quality of work and experience. We also find that survey agency practice matters: daily communication between fieldwork co-ordinators and interviewers is positively associated with panel co-operation.

Keywords: Attrition; Field practices; Interviewer effects; Panel data; Paradata

1. Introduction

The issue of retention in panel surveys is of paramount importance, particularly when the focus

is on slow, long-term processes such as aging. Lack of retention of subjects in longitudinal

surveys, which is also known as attrition, accumulates over waves and particularly harms the

panel dimension of the data.

Survey participation depends on location, contact and co-operation of the sample unit

(Lepkowski and Couper, 2002). In this paper, we investigate the determinants of panel

co-operation—interview completion given location and contact—in the fourth wave of the Survey of Health, Ageing and Retirement in Europe (SHARE) given participation in the third wave.

We focus on panel co-operation since location and contact are less problematic in a later panel

wave.

Address for correspondence: Johanna Bristle, Max Planck Institute for Social Law and Social Policy, Amalienstrasse 33, 80799 Munich, Germany.

E-mail: bristle@mea.mpisoc.mpg.de


As recommended by the literature on the determinants of non-response behaviour, we exploit

information at different levels: individual and household characteristics, interviewer traits and

survey design features. A contribution of this paper is its use of information that is gathered at

the interviewer level in a harmonized, multicountry survey. A further, novel, contribution lies

in our investigation of the role of survey agency practices and variability.

We use a three-level logit model to estimate the determinants of retention and the variance that

is attributable to each level: respondent, interviewer and survey agency. This model accounts for

correlation in probabilities of co-operation for respondents who were interviewed by the same

interviewer and interviewers working for the same survey agency. Given the limited number

of survey agencies at the third level, we also provide a simulation exercise to document how

estimates behave in ﬁnite samples similar to the sample that we use.
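The three-level structure can be made concrete by simulating the data-generating process, in the spirit of the finite-sample exercise just mentioned. All sample sizes, the baseline log-odds and the variance components below are illustrative assumptions, not the paper's estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and variance components (not the paper's estimates)
n_agencies, n_int_per_agency, n_resp_per_int = 11, 60, 25
sd_agency, sd_interviewer = 0.3, 0.6   # random-intercept standard deviations
beta0 = 1.7                            # baseline log-odds of co-operation

u_agency = rng.normal(0, sd_agency, n_agencies)
coop = []
for a in range(n_agencies):
    u_int = rng.normal(0, sd_interviewer, n_int_per_agency)
    for j in range(n_int_per_agency):
        eta = beta0 + u_agency[a] + u_int[j]      # linear predictor
        p = 1 / (1 + np.exp(-eta))                # inverse logit
        coop.append(rng.binomial(1, p, n_resp_per_int))
coop = np.concatenate(coop)

# Latent-scale intraclass correlations (logistic residual variance pi^2 / 3)
resid = np.pi**2 / 3
total = sd_agency**2 + sd_interviewer**2 + resid
icc_agency = sd_agency**2 / total
icc_interviewer = (sd_agency**2 + sd_interviewer**2) / total
print(coop.mean(), icc_agency, icc_interviewer)
```

The intraclass correlations show how much of the latent-scale variance is shared by respondents of the same agency, and of the same interviewer within an agency, which is exactly the correlation structure the three-level model accounts for.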

The multilevel model that we estimate uses survey data as well as additional paradata that

are obtained as a ‘by-product of the data collection process capturing information about that

process’ (Durrant and Kreuter, 2013). In SHARE, paradata are available on all three levels.

Although paradata at the individual or interviewer level have been used in this strand of

the literature, information at the survey agency level has not been taken into account to explain participation. One possible reason for this gap could be that, in cross-national research,

information at the survey agency level may not be available or harmonized across countries

(Blom et al., 2008) so comparability is limited. SHARE, which provides harmonized information on elderly individuals at the European level, collected such data in wave 4. This additional

source of information gives us the opportunity to investigate the nature of non-response also at

the survey agency level.

Our approach is theoretically based on the framework of survey participation by Groves

and Couper (1998), in which the factors that are expected to inﬂuence survey participation are

divided into two major areas: ‘out of researcher control’ and ‘under researcher control’. In this

paper, we are particularly interested in the factors that can be inﬂuenced by the researcher,

namely survey agency fieldwork strategies, the features of the interviewer and the respondent–interviewer interaction.

We find that variables at all three levels affect the probability of retention. Respondent and interviewer characteristics play an important role. Respondent co-operation decisions are affected by their previous interview experience: for instance, item non-response in a previous wave reduces the likelihood of co-operation in a later wave. As far as interviewer characteristics are concerned, we find that previous experience with working as a SHARE interviewer matters more than sociodemographic characteristics, such as age, gender or education. Further, interviewers who perform well on survey tasks that require diligence are more successful in gaining co-operation. Regarding survey-agency-related controls, we find that having contact with interviewers every day increases the chances of gaining respondents' co-operation. This result may highlight the importance of communication between survey agency co-ordinators and interviewers, but may also point to other factors at the survey agency level that affect respondents' co-operation (such as the relative importance that the survey agency attaches to SHARE).

The structure of the paper is as follows. Section 2 reviews the literature and Section 3 presents

the features of the available data with a special focus on paradata and the outcome variable.

Section 4 presents the empirical strategy, Section 5 comments on the empirical results and

Section 6 concludes.

The programs that were used to analyse the data can be obtained from

http://wileyonlinelibrary.com/journal/rss-datasets


2. Previous ﬁndings

Panel studies are affected by attrition of subjects, which can bias parameter estimates because

of potential differences between those who stay in the panel and those who drop out. It is by

now standard in the literature to conduct exploratory analyses to understand how to prevent

unit non-response during ﬁeldwork. Literature on the determinants of survey participation has

recently proposed the use of paradata to gain a better understanding of response behaviour

(e.g. Kreuter (2013) and Kreuter et al. (2010)). However, even though paradata represent a

rich source of new information, little attention has been paid, for instance, to indicators such as

keystroke data (Couper and Kreuter, 2013) as well as additional information at higher levels,

e.g. at the country or survey agency level.

High levels of heterogeneity might be explained by differences in survey characteristics, in

population characteristics or in data collection practices. This was highlighted by Blom (2012)

who examined country differences in contact rates in the European Social Survey—a survey that

is similar to SHARE in its attempt to achieve ex ante harmonization across several European

countries, but different from SHARE since it lacks the longitudinal dimension. By conducting

counterfactual analysis, Blom attributed the differences in contact rates to differential survey

characteristics (mostly related to interviewers’ contact strategies), population characteristics and

coefﬁcients. Like Blom (2012), we investigate the drivers of variability at the country level, but

we are interested in panel co-operation—rather than contact—and use multilevel analysis as our

empirical strategy. Most studies using cross-national data refrain from investigating the country

level because of a small number of countries or the unavailability of harmonized information at

this level. An exception is Lipps and Benson (2005), who analysed contact strategies in the ﬁrst

wave of SHARE by using a multilevel model also taking into account the country level but did

not ﬁnd signiﬁcant between-country differences. However, the response process in later waves of

a panel might differ from the response process in the baseline wave because of survey agencies’

accrued organizational experience or respondents’ self-selection into later waves (Lepkowski

and Couper, 2002). An advantage of using the fourth wave of SHARE, as we do, is that we can

exploit additional harmonized information collected at the survey agency level to understand

better whether different ﬁeldwork practices can explain heterogeneity in panel co-operation at

the survey agency level, given a common survey topic. In SHARE, the countries and survey

agencies mostly overlap; however, since in two countries (Belgium and France) more than one

survey agency collected the data, we shall use the term survey agency, instead of country, for

the third (highest) level.

Taking the role of the interviewer into account is vital for attrition analyses in face-to-face

surveys. In the literature, results regarding interviewer continuity across waves are mixed. For

example, Hill and Willis (2001) found a positive strong signiﬁcant association between response

rate and interviewer continuity, Lynn et al. (2014) found that continuity positively affects co-operation in some situations, whereas other studies (Campanelli and O'Muircheartaigh, 1999;

Nicoletti and Peracchi, 2005; Pickery et al., 2001) have found insigniﬁcant effects. These ﬁndings

have been questioned as not only respondents attrit, but interviewers might attrit non-randomly

from surveys as well (Campanelli and O’Muircheartaigh, 2002). In the multicountry setting of

SHARE, the selection and assignment of interviewers is subject to supervision by the survey agencies. Although survey guidelines recommend interviewer continuity, we cannot link

interviewers across waves. On the basis of Vassallo et al. (2015), we decided to focus on the

current (wave 4) interviewer. (Whereas Pickery et al. (2001) stated that the previous interviewer

is more relevant, a more recent study (Vassallo et al., 2015) showed that taking into account

both previous and current wave interviewer within a multiple-membership model does not


improve on the simpler two-level model that controls only for the current wave interviewer

random effect.)

The literature has highlighted that isolating interviewer effects from area effects might be

problematic when there is no fully interpenetrated design, i.e. random assignment of sample units

to interviewers (Campanelli and O’Muircheartaigh, 1999; Durrant et al., 2010; Vassallo et al.,

2015). The lack of interpenetration is likely in face-to-face surveys, such as SHARE, in which the

interviewer generally operates in limited geographical areas. Therefore, if there are geographical

patterns in co-operation, these could appear as interviewer effects. It should be noted that

Vassallo et al. (2015) did not ﬁnd signiﬁcant area effects, after controlling for interviewer and

household level effects in a cross-classiﬁed model, which is in line with ﬁndings by Campanelli

and O’Muircheartaigh (1999) and Durrant et al. (2010). Given the lack of interpenetrated

assignment in SHARE, following standard practice, we include among our controls some area

indicators (living in an urban or rural area) to capture area effects. Unfortunately, more detailed

information about the area where respondents live is not available in waves 3 and 4. (Additional

area characteristics have been collected in wave 5 for all respondents, but in wave 6 only for the

refreshment sample.)

In our analysis, we consider interviewer attributes such as age, gender and experience with the

survey that were collected by the agencies and interviewer work quality indicators (interviewer

average number of contacts and rounding indicators) that we compute. (We construct indicators

of rounding behaviour in measurements following Korbmacher and Schröder (2013).) In fact,

interviewer sociodemographic characteristics and experience (overall or within a speciﬁc survey)

are typically included when explaining interviewer level variance (West and Blom, 2017). The

literature has also documented that interviewers with higher contact rates achieve higher co-operation rates (O'Muircheartaigh and Campanelli, 1999; Pickery and Loosveldt, 2002; Blom

et al., 2011; Durrant and D'Arrigo, 2014). On the basis of the literature on 'satisficing behaviour' in surveys (Krosnick, 1991), the underlying hypothesis concerning interviewer effects is that those who are diligent in specific tasks during the interview are more engaged and more successful in gaining co-operation than are interviewers who show less diligent interviewing behaviour. Diligent interviewers are those who fulfil their tasks thoroughly to optimize the quality of their interviews, whereas less diligent interviewers use 'satisficing strategies', such as skipping introductions or rounding measurements, to minimize effort.

Lugtig (2014) highlighted four mechanisms of attrition at the respondent level, namely shocks

(e.g. moving or health decline), habit (consistent participation pattern), absence of commitment

and panel fatigue. Paradata can especially help in capturing commitment and panel fatigue

to single out respondents who are at risk of future attrition due to non-co-operation. This

can be based on interviewer assessments, e.g. willingness to answer or whether the respondent

asked for clariﬁcation, or directly derived from the interview data, e.g. item non-response. The

latter in particular is a good predictor of participation in later waves. According to the theory

of a latent co-operation continuum (Burton et al., 1999), in fact, item non-response—not providing valid answers to some questions—is a precursor of unit non-response—not providing any answers—in the following wave. This theory finds empirical support in Loosveldt et al. (2002).

The length of interview also contributes to shaping the past interview experience. In longitudinal surveys, the length of interview in an earlier wave might affect the decision to participate in later waves. On the one hand, a longer interview can be seen as a burden and affect co-operation negatively; on the other hand, the length might also measure the respondent's motivation and commitment to the survey and therefore can have a positive influence on co-operation. Findings in the literature concerning effects of interview length on panel attrition in interviewer-administered settings are mixed, with some showing a positive association with co-operation (Fricker et al., 2012; Hill and Willis, 2001) and some not finding any effect (Lynn, 2013; Sharp and Frankel, 1983). Branden et al. (1995) disentangled the wave-specific influence of interview length by taking the longitudinal perspective into account. They found that long interviews are positively correlated with co-operation during the first waves of a panel, but the association vanishes in later waves.

3. Data

3.1. Survey of Health, Ageing and Retirement in Europe and sample selection

SHARE is a multidisciplinary harmonized European survey, targeting individuals aged over 50 years and their partners, and represents the principal source of data to describe and investigate the causes and consequences of the aging process for the European population (see Börsch-Supan et al. (2013)). SHARE was conducted for the first time in 2004–2005 (wave 1) in 11 European countries (Austria, Belgium, Denmark, France, Germany, Greece, Italy, the Netherlands, Spain, Sweden and Switzerland) and Israel. In the second wave Poland, the Czech Republic and Ireland joined SHARE and additional refreshment samples were added to ensure representativeness of the targeted population. Wave 3, called 'SHARELIFE', which was conducted between 2008 and 2009, differed from the standard waves, since it collected the life histories of individuals who participated in wave 1 or wave 2. The fourth wave of SHARE, which started in 2011, is a regular wave (see Malter and Börsch-Supan (2013)).

The regular wave main questionnaire is composed of about 20 modules, each focusing on a specific topic, e.g. demographics, mental and physical health, cognitive functions, employment and pensions. The questionnaire of SHARELIFE differed from the standard waves, since it had very few questions on the current condition (the variables related to the current condition are household income, health status, economic status and current income from employment, self-employment and pensions) but focused on gathering information regarding the life histories of individuals who participated in wave 1 or wave 2 (Schröder, 2011). We exploit mainly the third and the fourth wave of SHARE by investigating co-operation in wave 4 given participation in SHARELIFE and given contact in wave 4. The two waves are not completely comparable given the rather special content of the third wave, but the choice was driven mainly by the availability of paradata. The particular sample definition that we refer to implies that we must be cautious when extending our results.

Both standard and retrospective SHARE interviews were conducted via face-to-face, computer-assisted personal interviews (CAPIs). Not every eligible household member was asked to answer every module of the standard CAPI questionnaire: selected household members served as family, financial or household respondents. These individuals answered questions about children and social support, financial issues or household features on behalf of the couple or the household. This means that the length of the questionnaire varied between respondents by design, which must be taken into account when analysing participation. An advantage of using SHARELIFE is that the differences between the types of respondents are limited since there is a distinction only between the first and second respondent on the basis of very few questions on the household's current economic situation (e.g. household income). In all SHARE waves there is also the possibility of conducting a shorter proxy interview for cognitively impaired respondents. A proxy can answer on behalf of the eligible individual for most of the modules.

We describe our sample deﬁnition more precisely in Table 1. The number of individuals who

were interviewed in SHARELIFE is 20106. We do not consider Greece and Ireland as these

countries did not participate in wave 4. We also excluded France as interviewer information was

unavailable and Poland because of a lack of survey agency practices information. We deleted


Table 1. Sample definition

Number of observations released in SHARELIFE†          20106
Sample restrictions
  Not part of assigned wave 4 sample                     144
  Household not contacted in wave 4                      522
  Deceased in wave 4                                     599
Linkage restrictions
  Non-linked with interviewer information                994
Incomplete-data restrictions
  Missing data at interviewer level                       15
  Missing data at respondent level                       887
Final number of respondents                            16945
Final number of interviewers                             643
Final number of survey agencies                           11

†Without Greece, Ireland, France and Poland.

144 cases that were not part of the assigned, longitudinal sample for ﬁeldwork wave 4, e.g.

because of legal restrictions or changes in eligibility. We do not consider individuals from the

longitudinal sample whose households were not contacted in wave 4 (522 cases) given that our

focus is on co-operation, and we excluded individuals who died between waves (599 cases).

When linking the various sources of data, interviewer information was not linkable for 5.3% of

the total sample (994 cases). The proportion of non-linked observations exceeds 10% in Austria

and Sweden, but some unresolvable cases remained in all countries. (The sample of non-linked

observations presents higher proportions of singles and women—and a higher average (but

identical median) respondent age. Given the high prevalence of such observations in Austria

and Sweden, we checked that dropping either country from the estimation sample does not affect

parameter estimates in a signiﬁcant way.) Furthermore, we do not have complete information

on interviewers in wave 4 for 15 cases; wave 3 missing data concern 887 individuals, distributed

among all the countries. (Missing information is especially related to questions of the interviewer

module regarding the area and type of building. In this module interviewers must answer a few

questions about the interview situations without the respondent.)

3.2. Collection and preparation of paradata in the Survey of Health, Ageing and

Retirement in Europe

The collection of paradata is greatly facilitated by computer-assisted sample management tools

and interview instruments. In the following section we describe the sources of data in SHARE

and the preparation of the variables that we derive from them.

For sample management SHARE uses a tailor-made sample management system. This program is installed on each interviewer's laptop and enables the interviewers to manage their assigned subsample. The success of a cross-national study such as SHARE depends heavily on the

way in which the data are collected in the various countries. Therefore, using a harmonized tool

for collecting interview data as well as contact data is crucial to ensure the comparability of the

results. The sample management system tool enables interviewers to register every contact with

a household or individual respondent and to enter result codes for every contact attempt (e.g. no

contact, contact—try again, or refusal). These data were also used by Lipps and Benson (2005) to


analyse contact strategies in the ﬁrst wave of SHARE. Among the information that was collected

through the sample management system tool, we use the average number of contacts that interviewers registered before obtaining household co-operation or the final refusal. Furthermore,

the sample deﬁnition is partly constructed on the basis of contact information (see Table 1).

While the interview is conducted, additional paradata are collected by tracking keystroke

data. Here, every time a key is pressed on the keyboard of the laptop, this is registered and

stored by the software in a text ﬁle. From these text ﬁles, time stamps at the item level can be

computed. Additionally, the keystrokes record the number of times that an item was accessed,

back-ups, whether a remark was made and the remark itself. We compute the interview length

of wave 3 based on those ﬁles. In contrast with commonly used time stamps at the beginning and

the end of the whole interview, this approach provides a precise and adequate length measure

that is net of longer interruptions of the interview. To control for the potential effect of the length

of the interview on co-operation propensity, we include it and its square term to account for

possible non-linear effects as well. Controlling for the length of the interview helps to take into

account the fact that SHARE interviews vary by design because of the complex structure of the

questionnaire. Additionally, we use keystroke information to construct a variable for interviewer

quality that is used in the robustness section. We ﬁrst compute the median reading time, by

interviewer, for section introductions that are relatively long, such as social network, activities,

ﬁnancial transfers and income from work and pensions. If this value is lower than the country

(and language) 25th percentile in at least one case, then we deﬁne a ‘short introduction’ dummy

variable. This variable should capture interviewers who are likely to skip section introductions.
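As a rough sketch, the 'short introduction' flag just described can be computed from keystroke-based reading times. The record layout, identifiers and numbers below are hypothetical; only the rule (per-interviewer median reading time compared with the country-level 25th percentile, flagged if below in at least one section) follows the description above:

```python
import statistics
from collections import defaultdict

# Hypothetical records: (interviewer_id, country, section, reading_time_seconds)
# derived from keystroke time stamps; names and numbers are illustrative.
records = [
    ("i1", "DE", "social_network", 4.0), ("i1", "DE", "social_network", 5.0),
    ("i2", "DE", "social_network", 20.0), ("i2", "DE", "social_network", 22.0),
    ("i3", "DE", "social_network", 18.0), ("i4", "DE", "social_network", 19.0),
]

# 1. Median reading time per interviewer and section introduction
times = defaultdict(list)
for iid, country, section, t in records:
    times[(iid, country, section)].append(t)
medians = {k: statistics.median(v) for k, v in times.items()}

# 2. Country-level 25th percentile of those medians, per section
by_country = defaultdict(list)
for (iid, country, section), m in medians.items():
    by_country[(country, section)].append(m)
p25 = {k: statistics.quantiles(v, n=4)[0] for k, v in by_country.items()}

# 3. 'Short introduction' dummy: 1 if the interviewer's median falls below
#    the 25th percentile for at least one section
short_intro = {}
for (iid, country, section), m in medians.items():
    short_intro[iid] = short_intro.get(iid, 0) or int(m < p25[(country, section)])
print(short_intro)  # -> {'i1': 1, 'i2': 0, 'i3': 0, 'i4': 0}
```

In the paper the grouping is by country and language; a real implementation would simply extend the grouping key accordingly.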

Furthermore, as paradata at the respondent level, we include information that is derived from the CAPI interviews in wave 3, in particular the percentage of item non-response to monetary items. The questions that were considered to construct this variable are household income (HH017), value of the property (AC019), first monthly wage for employed individuals (RE021) or first monthly work income (RE023) for self-employed individuals, current wage if the respondent is still in employment (RE027), current income if the respondent is still self-employed (RE029), pension benefit (RE036), wage at the end of the main job if retired (RE041) and income at the end of the main job if retired and worked as self-employed (RE043). Such questions on monetary values can be both sensitive and difficult (Loosveldt et al., 2002; Moore et al., 2000). The respondent might perceive them as burdensome or uncomfortable to answer. Previous empirical research showed that item non-response to income questions can predict participation (Loosveldt et al., 2002; Nicoletti and Peracchi, 2005). The public release of SHARE also contains a section in which interviewers are asked to evaluate the reluctance of respondents (interviewer module). Related to this, we include a dummy variable indicating whether the interviewer reported a high level of willingness to answer and whether the respondent asked for clarification. Furthermore, information on the area (urban versus rural) is derived from the interviewer module.
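The proportion of item non-response to monetary items can be sketched as follows. The item names come from the text above; the answer encoding (None marking a refusal or "don't know") and the handling of routing are our own assumptions:

```python
# Hypothetical wave 3 answers to the monetary items listed above; None marks
# a refusal or "don't know". Item names follow the paper, values are invented.
MONETARY_ITEMS = ["HH017", "AC019", "RE021", "RE023", "RE027",
                  "RE029", "RE036", "RE041", "RE043"]

def item_nonresponse_share(answers):
    """Proportion of item non-response among the monetary items the
    respondent was actually asked (routing means not everyone sees all)."""
    asked = [k for k in MONETARY_ITEMS if k in answers]
    if not asked:
        return None
    missing = sum(1 for k in asked if answers[k] is None)
    return missing / len(asked)

respondent = {"HH017": 25000, "AC019": None, "RE036": None, "RE041": 1800}
print(item_nonresponse_share(respondent))  # 2 of 4 asked items missing -> 0.5
```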

Additionally, interviewer information and survey agency fieldwork strategies were gathered and delivered by the survey agencies for wave 4. The interviewer information includes demographics (year of birth, education and gender) and previous experience in conducting SHARE interviews (a dummy that takes value 1 if the interviewer has already participated in at least one previous wave of SHARE). Interviewers' level of education is not available for all countries. For those survey agencies that provided this information, we apply the 1997 'International standard classification of education' (ISCED) to harmonize the country-specific answers. (We exploit this information to run robustness analysis with the subsample of agencies that provided the education information.)

Among interviewer controls, we also add a measure of work quality, following Korbmacher

and Schröder (2013). We try to capture interviewers' quality on the basis of the grip strength


[Fig. 1. Frequency of grip strength values: histogram of recorded values (0–70) against frequency (0–2500)]

test that SHARE proposes in every wave. The test consists of measuring respondents' grip strength twice for each hand by using a dynamometer. In the CAPI, interviewers are explicitly told to record a value between 0 and 100, without rounding numbers to multiples of 5 and 10. 'Previous waves showed that multiples of 5 and 10 were recorded more than statistically expected' (Korbmacher and Schröder, 2013); in Fig. 1 we report the wave 4 pattern of grip strength measurement. If an interviewer's percentage of multiples of 5 and 10 lies outside the 90% confidence interval centred on the statistically expected value of 20.8%, then the interviewer is not measuring grip strength properly. We identify interviewers who round too often by defining a dummy that takes value 1 if the percentage exceeds the upper bound of the confidence interval and 0 otherwise. We also generate another dummy variable for those interviewers who do not report enough multiples of 5 and 10 (the percentage falls short of the lower bound), as they may be strategically concealing inaccurate measurements.
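A minimal sketch of these rounding indicators, assuming the 90% interval is a normal approximation around the expected share of 20.8% (= 21/101, the share of multiples of 5 among the integers 0–100); the authors' exact interval construction may differ:

```python
import math

P0 = 21 / 101          # share of multiples of 5 among 0..100, approx. 20.8%
Z90 = 1.645            # two-sided 90% normal critical value

def rounding_flags(measurements):
    """Flag an interviewer whose share of grip strength values that are
    multiples of 5 falls outside a 90% CI around the expected 20.8%.
    Normal approximation; the paper's exact construction may differ."""
    n = len(measurements)
    share = sum(1 for m in measurements if m % 5 == 0) / n
    half_width = Z90 * math.sqrt(P0 * (1 - P0) / n)
    too_many = int(share > P0 + half_width)
    too_few = int(share < P0 - half_width)
    return too_many, too_few

# An interviewer recording only multiples of 5 across 100 measurements
heaped = [40, 45, 50, 35] * 25
print(rounding_flags(heaped))  # -> (1, 0)
```

Note that the width of the interval shrinks with the number of measurements an interviewer takes, so the flag is more sensitive for interviewers with larger workloads.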

Additional information is gathered at the survey agency level about ﬁeldwork strategies.

Topics that are covered are recruitment, training, contacting respondents, translation, technical

support, interview content, sampling process, management of interviewers and duration of

ﬁeldwork. (Unfortunately information on interviewers’ pay is not available in wave 4 of SHARE.)

Those data are collected mostly by means of open-ended questions, but some questions have a

drop-down list. Open questions are difﬁcult to handle within a multicountry framework. For this

reason we focus on questions with standard answering options that show some variability. We

consider especially the following questions: ‘Who decides which project is prioritized, assuming

that interviewers work on several projects simultaneously?’ with ‘interviewer, agency or both’ as

possible answers and ‘How often are you in contact with your interviewers about the SHARE

study?’ with the following answering options: ‘less than once a month, once a month, several

times a month, once a week, several times a week or every day’. We deﬁne two variables:


Table 2. Descriptive statistics of the variables at the respondent level (N = 16945)†

Variable                                   Mean    SD      Min    Max    Description
Co-operation                               0.84    0.37    0      1      Co-operation in wave 4 (outcome)
Female                                     0.56    0.50    0      1      Gender (reference: male)
Age                                        66.75   9.60    34     100    Age of respondent in years
Being in poor health                       0.38    0.48    0      1      Self-reported poor health
Any proxy                                  0.06    0.24    0      1      A proxy helped in answering the questionnaire
Single                                     0.23    0.42    0      1      Marital status
Years of education                         10.74   4.47    0      25
Household income—1st quartile              0.32    0.47    0      1      Household income, 1st quartile by country
Household income—2nd quartile              0.24    0.42    0      1      Household income, 2nd quartile by country
Household income—3rd quartile              0.27    0.44    0      1      Household income, 3rd quartile by country
Working                                    0.30    0.46    0      1      If R declares to be employed or self-employed
Living in an urban area                    0.23    0.42    0      1      Small town or rural area (reference: urban)
Living in a (semi-)detached house          0.70    0.46    0      1      Living in a (semi-)detached house (reference: flat)
Interrupted response pattern               0.06    0.24    0      1      Interviewed in wave 1 and wave 3 but not in wave 2
Item non-response to monetary questions    0.20    0.28    0      1      Proportion of item non-response to monetary items in wave 3
Length of interview                        0.90    0.37    0.26   2.54   Length of interview in wave 3 (in hours)
Willingness to answer                      0.93    0.26    0      1      Willingness to answer in wave 3
Did not ask for clarification              0.84    0.37    0      1      Did not ask for clarification in wave 3

†Data: SHARELIFE release 6.0.0, SHARE wave 4 release 6.0.0 and SHARE paradata waves 3 and 4.

priority agency, which takes value 1 if the survey agency decides the priority of projects (four out of 11 survey agencies do), and daily contact, which equals 1 if the survey agency has daily contact with the interviewers (two out of 11 survey agencies do). We cannot differentiate the direction of the communication between agency and interviewer: whether agencies check on interviewers frequently or interviewers contact the agency regularly (with questions or for reporting) cannot be distinguished. An overview of all the variables that were used for the

analysis with descriptive statistics can be found in Tables 2–4.

3.3. Attrition and co-operation in Survey of Health, Ageing and Retirement in Europe wave 4

After describing the features of SHARE and the paradata used, we present in greater detail the response behaviour patterns in wave 4 for those who participated in SHARELIFE: the sample in which we are interested. Our analysis sample differs slightly from the panel sample because we do not consider those who were interviewed in wave 1 or 2 but did not participate in wave 3 (SHARELIFE).

The standard distinction in the survey participation process is in terms of location, contact and co-operation (Lepkowski and Couper, 2002):

(a) location of the sample unit means finding geographically eligible individuals at a given address,
(b) contact means reaching an eligible sample unit by telephone or face-to-face visits and
(c) co-operation is the completion of the interview.

Given that step (a) is usually less problematic in a panel (Lepkowski and Couper, 2002) and we cannot test it, the final response rate will be the product of the contact and co-operation rates, at least in simplified terms.

Table 3. Descriptive statistics of the variables at the interviewer level (N = 643)†

Variable                                          Mean   SD     Min    Max    Description
Age                                               55.07  11.52  19     79     Age of interviewer in years
Female                                            0.63   0.48   0      1      Gender (reference: male)
Experience                                        0.68   0.47   0      1      Interviewer's experience with working on previous SHARE waves
Contacts                                          2.41   0.73   0.20   7.11   Interviewer-specific mean of contacts with a household until co-operation or refusal
Rounding to a multiple of 5 for grip strength     0.35   0.48   0      1      If the interviewer's percentage of rounding is below or above, respectively, the lower or upper cut-off
  measure (too many)                                                            of the 90% confidence interval centred on the statistically expected value of 20.8%
Rounding to a multiple of 5 for grip strength     0.03   0.16   0      1
  measure (too few)
Short introductions                               0.52   0.50   0      1      Interviewer has at least 1 short introduction (i.e. recorded time lower than a country-specific median)
Interviewer education (ISCED 5–6)                 0.37   0.48   0      1      Interviewer has tertiary education (restricted sample)

†Data: SHARE wave 4 release 6.0.0 and SHARE interviewer information wave 4.

Table 4. Descriptive statistics of the variables at the agency level (N = 11)†

Variable          Mean   SD     Min    Max    Description
Priority agency   0.36   0.48   0      1      Agency decides the priority of projects
Daily contact     0.18   0.38   0      1      Agency monitors and has contact with the interviewers daily

†Data: SHARE agency information wave 4.

Kneip (2013) reported household contact rates for the panel sample of SHARE wave 4 that are consistently above 90%, with an average of about 95% across all countries, whereas household co-operation, which varies between about 60% and about 90%, shows greater variation across countries. Hence the retention rates, which combine contact and co-operation, vary between 56% and about 90%. (All the rates that were calculated by Kneip (2013) are constructed according to American Association for Public Opinion Research standards.)

Fig. 2. Proportions of co-operation in wave 4 by survey agency (95% confidence intervals shown): AT, Austria; BE–Fr, Belgium–Wallonia; BE–Fl, Belgium–Flanders; CH, Switzerland; CZ, Czech Republic; DE, Germany; DK, Denmark; ES, Spain; IT, Italy; NL, the Netherlands; SE, Sweden
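In these simplified terms the implied retention range is just the product of the two rates; a two-line check, using the approximate average contact rate and co-operation range reported above (illustrative round numbers only, since the actual rates vary by country):

```python
# Simplified decomposition: retention rate = contact rate x co-operation rate
contact_rate = 0.95                 # approximate average household contact rate
coop_rates = (0.60, 0.90)           # approximate range of co-operation rates

retention = tuple(contact_rate * c for c in coop_rates)
# lower bound about 0.57, upper bound about 0.86
```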

This highlights that establishing contact was not an issue in the panel sample for most countries; non-contact seems to be a very limited phenomenon compared with other surveys, such as the European Community Household Panel, for which Nicoletti and Peracchi (2005) analysed participation, modelling contact and co-operation as sequential events. In our case, the very limited number of individuals in non-contacted households (2.6%) leads us to ignore the contact phase and to focus exclusively on co-operation instead.

Fig. 2 presents the percentage of contacted individuals who co-operated in wave 4. It highlights some heterogeneity among survey agencies, with rather high co-operation rates (85% or more) in Switzerland and Italy and lower co-operation rates, below 80%, in the Czech Republic, Germany and Sweden. (These numbers are our own calculations based on our sample restrictions. For the official rates, refer to Kneip (2013).)

4. Empirical strategy

We estimate a multilevel logit model to investigate correlates of subject co-operation while accounting for correlations in probabilities between respondents. (We estimate the multilevel logit model with the Stata command melogit, using mode curvature adaptive Gauss–Hermite quadrature; the estimation results are stable when the number of integration points is increased.) This estimation strategy specifies the hierarchical structure of the data and enables us to avoid underestimation of standard errors and therefore incorrect inference (Couper and Kreuter, 2013; Goldstein, 2011). Given that we are interested in understanding how different levels contribute to explaining co-operation, we start by estimating a random-intercept model (the null model). We then enrich this baseline specification by stepwise inclusion of covariates at the individual, interviewer and survey agency levels. This bottom-up procedure has the advantage of keeping the model simple (Hox, 2010). Our outcome of interest is co-operation, denoted by y_ijk, which takes the value 1 if respondent i, interviewed by interviewer j of survey agency k, participates in wave 4 conditionally on having participated in wave 3.

The null model can be specified as

    logit(p_ijk | β_0, u_jk, v_k) = β_0 + u_jk + v_k,    (4.1)

and the values of y, conditional on the random components, are independent draws from a Bernoulli random variable with probabilities p_ijk, i.e. y_ijk | u_jk, v_k ~ Bernoulli(p_ijk).

In equation (4.1) the two random terms u_jk and v_k are interviewer-specific and survey-agency-specific random effects, with u_jk ~ N(0, σ²_u) and v_k ~ N(0, σ²_v) respectively (Skrondal and Rabe-Hesketh, 2004). In a logit model the error variance at the first level, σ²_e, is fixed to π²/3 to fix the scale (Rabe-Hesketh and Skrondal, 2005). Thus, in the multilevel extension no level 1 variance will be estimated.
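As a rough illustration of the data-generating process behind equation (4.1), the following sketch simulates co-operation outcomes from the null model. The parameters are the model 0 estimates reported later (β_0 = 1.872, σ²_u = 1.174, σ²_v = 0.073); the cluster sizes only approximate the real design (about 58 interviewers per agency, 26 respondents per interviewer), and the function name is illustrative, not the authors' code:

```python
import math
import random

def simulate_null_model(n_agencies=11, interviewers_per_agency=58,
                        respondents_per_interviewer=26,
                        beta0=1.872, var_u=1.174, var_v=0.073, seed=42):
    """Draw outcomes y_ijk from the three-level null model (4.1):
    logit(p_ijk) = beta0 + u_jk + v_k, u_jk ~ N(0, var_u), v_k ~ N(0, var_v)."""
    rng = random.Random(seed)
    outcomes = []
    for _agency in range(n_agencies):
        v_k = rng.gauss(0.0, math.sqrt(var_v))        # agency random effect
        for _interviewer in range(interviewers_per_agency):
            u_jk = rng.gauss(0.0, math.sqrt(var_u))   # interviewer random effect
            p = 1.0 / (1.0 + math.exp(-(beta0 + u_jk + v_k)))
            for _respondent in range(respondents_per_interviewer):
                outcomes.append(1 if rng.random() < p else 0)
    return outcomes

y = simulate_null_model()
print(len(y), sum(y) / len(y))   # sample size and simulated co-operation rate
```

Because the linear predictor contains only the intercept and the two random effects, the success probability is constant within an interviewer; only the Bernoulli draw varies across respondents.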

We then compare models 1–4, in which the covariates on the three different levels are intro-

duced in a stepwise procedure to the null model to understand the role of each group of variables

in reducing heterogeneity at different levels.

The first model specification (model 1) includes a set of controls for individual level sociodemographic characteristics, x_ijk. Among these variables we include SHARELIFE information

on demographics, such as age, gender, years of education (and its square), marital status, em-

ployment status, health status (including a control for proxy interview), controls for household

income (dummy variables for the top three equivalent household income quartiles), a dummy

taking value 1 if the respondent lives in a detached or semi-detached house to control for the

type of residential building, and a binary indicator for living in an urban or rural area to capture

area effects. Although additional area-related controls would be desirable in the absence of in-

terpenetrated assignment, further information about the area where respondents live is available

only from wave 5 onwards.

In the second model specification (model 2) we add a set of paradata indicators, z_ijk, at the individual level. We include a dummy variable controlling for interrupted participation in

previous waves, in particular whether the individual was interviewed only in wave 1 (but not

in wave 2). To account for the inﬂuence of previous interview duration, the wave 3 interview

length in hours and its square are added. At this stage we also include the percentage of item

non-response to monetary questions, the willingness to answer and whether the respondent

asked for clariﬁcation.

In the third model specification (model 3) we include controls at the interviewer level, s_jk, specifically interviewer age and gender, interviewer experience and the average number of contacts per household registered by the interviewer (before the interview or the final refusal). We also include interviewer quality indicators: a dummy that identifies the interviewers who round least and another dummy for the interviewers who round most on grip strength measurement. Finally, model 4 controls for a survey agency level covariate, t_k, indicating daily communication between the interviewers and the survey agency.

The complete model, model 4, is specified as

    logit(p_ijk | β, u_jk, v_k) = β_0 + β_1′ x_ijk + β_2′ z_ijk + β_3′ s_jk + β_4 t_k + u_jk + v_k,    (4.2)

where the vectors x_ijk and z_ijk are individual level sociodemographic and paradata controls, s_jk is a vector of interviewers' covariates and t_k is a survey agency control.

As already pointed out, in the logistic model the variance of the lowest level residuals is fixed at a constant. The main consequence is that in each of the models the underlying scale is standardized to the same standard distribution, meaning that the residual variance cannot decrease when controls are added to the model. Moreover, the values of the regression coefficients associated with the included controls and the values of the higher level variances are rescaled. As a consequence, it is not possible to compare the null model parameters with those of the subsequent enriched model specifications, or to investigate how the variance components change.

Hox (2010) extended the rescaling procedure of Fielding (2004) to the multilevel setting and suggested the construction of scaling factors to be applied to the parameters of the fixed part and to the random effects, to make the changes in these quantities directly interpretable. In the case of a multilevel logistic regression model, the scale correction factor is √(σ²_0/σ²_m) for the parameters of the fixed part and σ²_0/σ²_m for the variance components. The numerator is the total variance of the null model, σ²_0 = σ²_e + σ²_u + σ²_v, and the denominator is the total variance of model m (m = 1, ..., 4) including the first-level predictor variables, σ²_m = σ²_F + σ²_e + σ²_u + σ²_v = σ²_F + σ²_0, with σ²_F the variance of the linear predictor of model m obtained by using the coefficients of the predictors of the fixed part of the equation.
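The rescaling can be expressed as a small helper function (a sketch, not the authors' code; `scale_correction` is a hypothetical name, and in practice σ²_F would be computed from the fitted model's linear predictor):

```python
import math

def scale_correction(var_u, var_v, var_F):
    """Hox (2010) / Fielding (2004) rescaling factors for a multilevel logit.

    var_u, var_v : null-model variances at the interviewer and agency levels
    var_F        : variance of the linear predictor (fixed part) of model m
    Returns (factor for fixed-part coefficients, factor for variance components).
    """
    var_e = math.pi ** 2 / 3        # level 1 variance, fixed in a logit model
    var_0 = var_e + var_u + var_v   # total variance of the null model
    var_m = var_F + var_0           # total variance of model m
    return math.sqrt(var_0 / var_m), var_0 / var_m

# With no predictors (var_F = 0) nothing is rescaled:
fixed_factor, variance_factor = scale_correction(1.174, 0.073, 0.0)
```

Adding predictors (σ²_F > 0) shrinks both factors below 1, which is why the scaled variance components in Table 7 can decrease across models even though the raw level 1 variance is fixed.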

One important issue when dealing with multilevel models is to assess the accuracy of model

parameter estimates, which is inﬂuenced both by the number of observations within groups and

by the number of groups. Given our model formulation, the former is not a relevant issue at the

third level but it could be at the second level: for some interviewers the number of interviews

is particularly low. We address this issue in Section 5.3 by restricting our analysis only to in-

terviewers with at least six interviews. Regarding the number of groups, the second level has a

sufﬁciently high number of interviewers to ensure accuracy of parameter estimates. However, we

might have inaccurate results due to the low number of survey agencies (our third-level units).

We address this problem with a simulation study to understand the ﬁnite sample behaviour of

estimates from a three-level logit model when the hierarchical structure of the data is similar to

the structure of our sample of analysis. Results and discussion are presented in Appendix A.

5. Results

5.1. Predictors from multilevel analysis

We report in Table 5 the estimated coefﬁcients for the stepwise model speciﬁcations, in which we

add respondent, interviewer and survey agency controls. The effects for each set of variables are

described in the following subsection. We comment mainly on our preferred model speciﬁcation,

i.e. the complete-model speciﬁcation reported in the last column.

As in Durrant and Steele (2009), we comment on our results while referring to some socio-

psychological concepts and theories that have been proposed in the literature, bearing in mind

that there is an imperfect match between theoretical constructs and variables used.

Table 5 shows that the respondent characteristics are highly predictive of co-operation in wave

4. Both gender and age inﬂuence co-operation in wave 4. According to our estimates, age has

a non-linear effect on the probability of co-operation. Both regressors age and age squared are

statistically significant: up to about 68 years of age there is a positive association, after which it becomes negative (controlling for health conditions). Previous research found lower rates

of participation among the elderly and interpreted this result as support of the social isolation

theory (Krause, 1993). Individuals might decide to underuse their social support network because they are embarrassed or stigmatized, or they may reject aid from others because they feel uncomfortable when assistance is provided. Isolation might also translate into a lack of survey participation and may explain the negative age effect that we find for older respondents.

Table 5. Estimated multilevel models including respondent, interviewer and agency characteristics (dependent variable: co-operation)†

Variable                                   Model 0,     Model 1,     Model 2,     Model 3,     Model 4,
                                           intercept    respondent   respondent   interviewer  agency
                                           only                      paradata

Respondent characteristics
Female                                                  0.088*       0.099**      0.099**      0.100**
                                                        (0.048)      (0.049)      (0.049)      (0.049)
Age                                                     0.183***     0.170***     0.171***     0.171***
                                                        (0.029)      (0.029)      (0.029)      (0.029)
Age squared                                            −0.001***    −0.001***    −0.001***    −0.001***
                                                        (0.000)      (0.000)      (0.000)      (0.000)
Being in poor health                                   −0.218***    −0.195***    −0.195***    −0.194***
                                                        (0.050)      (0.051)      (0.051)      (0.051)
Single                                                  0.182***     0.212***     0.212***     0.211***
                                                        (0.068)      (0.069)      (0.069)      (0.069)
Any proxy                                              −0.316***    −0.193**     −0.196**     −0.194**
                                                        (0.095)      (0.097)      (0.097)      (0.097)
Years of education                                      0.047**      0.039*       0.040*       0.045**
                                                        (0.021)      (0.021)      (0.021)      (0.021)
Years of education squared                             −0.002***    −0.002***    −0.002***    −0.003***
                                                        (0.001)      (0.001)      (0.001)      (0.001)
Household income, 1st quartile                          0.164**      0.236***     0.236***     0.234***
                                                        (0.077)      (0.078)      (0.078)      (0.078)
Household income, 2nd quartile                          0.403***     0.383***     0.378***     0.376***
                                                        (0.077)      (0.078)      (0.078)      (0.078)
Household income, 3rd quartile                          0.146*       0.157**      0.158**      0.153**
                                                        (0.077)      (0.078)      (0.078)      (0.078)
Living in a (semi-)detached house                       0.283***     0.300***     0.296***     0.301***
                                                        (0.057)      (0.058)      (0.058)      (0.058)
Working                                                −0.105       −0.066       −0.065       −0.065
                                                        (0.067)      (0.067)      (0.067)      (0.067)
Living in an urban area                                 0.021        0.020        0.047        0.042
                                                        (0.073)      (0.074)      (0.073)      (0.073)

Paradata at the respondent level
Interrupted response pattern (interviewed                           −0.991***    −0.973***    −0.977***
  in wave 1 but not in wave 2)                                       (0.083)      (0.083)      (0.083)
Item non-response to monetary questions                             −0.521***    −0.527***    −0.527***
                                                                     (0.089)      (0.089)      (0.088)
Length of interview (h)                                              1.170***     1.149***     1.147***
                                                                     (0.274)      (0.273)      (0.272)
Length of interview squared (h)                                     −0.368***    −0.361***    −0.362***
                                                                     (0.114)      (0.114)      (0.114)
Willingness to answer                                                0.444***     0.441***     0.451***
                                                                     (0.090)      (0.090)      (0.090)
Did not ask for clarification                                        0.257***     0.261***     0.264***
                                                                     (0.068)      (0.068)      (0.068)

Interviewer characteristics (wave 4)
Age                                                                              −0.007       −0.004
                                                                                  (0.005)      (0.005)
Female                                                                            0.072        0.060
                                                                                  (0.105)      (0.104)
Experience (previous SHARE waves)                                                 0.627***     0.642***
                                                                                  (0.113)      (0.109)
Contacts                                                                         −0.161**     −0.134*
                                                                                  (0.068)      (0.069)
Rounding to a multiple of 5 for grip                                             −0.216**     −0.238**
  strength measure (too many)                                                     (0.105)      (0.105)
Rounding to a multiple of 5 for grip                                             −0.788***    −0.768***
  strength measure (too few)                                                      (0.230)      (0.228)

Agency control variables
Daily contact                                                                                  0.714***
                                                                                              (0.153)

Constant                                   1.872***    −4.651***    −5.445***    −5.009***    −5.388***
                                           (0.098)     (1.023)      (1.043)      (1.079)      (1.075)
σ²_u (interviewer level)                   1.174       1.184        1.198        1.000        1.006
σ²_v (agency level)                        0.073       0.093        0.086        0.089        0.007
N                                          16945       16945        16945        16945        16945

†Standard errors are in parentheses; p-values for fixed effect covariate significance refer to Wald-type tests. *p < 0.05; **p < 0.01; ***p < 0.001.

If the respondent reported being in poor health in wave 3, this has a negative and statistically significant effect on the probability of co-operation. This is not surprising but at the same

time is inconvenient for a survey on health and aging. In the case of very bad health conditions, SHARE allows proxy interviews: the indicator any proxy shows a negative association with co-operation, suggesting again that health is an important determinant of attrition. We shall

investigate later whether the health effect changes with interviewer attributes.

The literature finds that single-person households are less likely to co-operate and explains this result by reference to social isolation theory (Goyder, 1987; Groves and Couper, 1998). According

to this theory alienation or isolation from society are predictors of non-response. We ﬁnd the

opposite in our analysis of retention: compared with couples, singles who have already co-

operated in past waves are more likely to participate in the next wave.

In the survey research literature, according to the theory of social exchange (Goyder, 1987;

Groves et al., 1992), socio-economic status has a non-linear effect on co-operation: low and

high socio-economic status groups are less likely to co-operate than average. We include four

indicators of socio-economic status: years of education (and its square), household income

quartile dummies, living in a (semi-)detached house as a proxy for wealth, and employment

status. Education might be positively correlated with retention as those with higher education

might appreciate the value of research more (Groves and Couper, 1998). Years of education is

statistically signiﬁcant and has a non-linear effect on retention. Income quartiles are signiﬁcant

as well. Compared with individuals having high household income (fourth quartile), wave 3

respondents with lower household income are more likely to participate in wave 4. Also in this

case we ﬁnd a non-linear effect (the second quartile dummy has the largest estimated coefﬁcient).

Living in a detached or semi-detached house increases the chances of co-operation in wave 4. (As

missing information is especially related to questions of the IV module regarding the area and


type of building, we run our analysis including those observations by adding binary indicators

for missing information. The results, which are available on request, do not change.) This is in

line with previous research that found lower co-operation among people living in flats (Goyder, 1987; Groves and Couper, 1998) and may suggest the presence of a wealth effect on retention.

Socio-economic conditions seem to be relevant for co-operation in a later panel wave.

Compared with individuals in a non-working condition (retired, unemployed, sick or disabled

and homemakers), workers do not have a statistically different probability of co-operating in

the next wave. (We obtained similar results when including additional non-working condition

dummies (retired, unemployed and disabled).) It seems that work-related time constraints do not

matter once individuals have enrolled in the panel, which is different from what has been found

by Durrant and Steele (2009) using cross-sectional data. Time constraints theory considers the

fact that a rather long and detailed questionnaire—that has the advantage of collecting a rich

set of information—requires quite some time to answer all the questions. This might create

problems when respondents are still in employment and must be kept in mind when examining

statistics such as employment rates later in life, for which survey participation or even attrition

could be an issue. Other factors, such as the characteristics of the area where the respondent

lives, might play a role in predicting (continued) co-operation. Living in an urban area in our

case is not signiﬁcant.

In addition to this standard set of respondent characteristics, we use respondent level para-

data. Compared with continuous participation, individuals with interrupted response patterns

are less likely to participate again. As interrupted participation might signal a subgroup of

respondents who are difﬁcult to retain, we report in Section 5.3 how the effect of such an in-

dicator changes when interacted with interviewer attributes (such as experience with SHARE

ﬁeldwork). We can also observe that a very good or good level of willingness to answer and not

having asked for clariﬁcation during the interview in wave 3 are highly signiﬁcant predictors

of a higher probability of co-operation in wave 4. As explained earlier, we show that the percentage of missing information in monetary amount questions is a significant predictor of

co-operation failure in wave 4. This result is consistent with the theory of a latent co-operation

continuum (Burton et al., 1999).

As paradata at the respondent level, we also use the length of the whole interview in wave 3.

Both the length of the interview in hours and its square are highly statistically signiﬁcant,

showing an inverse-u-shaped effect; therefore, interview length has a positive association with

co-operation up to a certain point, roughly 1.6 h, when the probability of co-operating starts

to decrease. Pace is an alternative way of capturing the potential burden that is experienced by

respondents in the previous wave. Here, we define pace as the ratio of interview length to the number of items asked, which thus accounts for differences in instrument length by respondent type (for applications see Korbmacher and Schröder (2013) and Loosveldt and Beullens (2013)). In the

case of SHARELIFE the number of items asked is similar across respondent types, which may explain why the results do not substantially change when we replace length with pace (estimates are available on request).
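For concreteness, pace as defined here is simply interview length divided by the number of items asked (the function name and the item count below are illustrative, not SHARELIFE figures):

```python
def interview_pace(length_hours, items_asked):
    """Pace as defined in the text: interview length per item asked."""
    return length_hours / items_asked

# e.g. a 0.90 h interview with a hypothetical 300 items:
pace = interview_pace(0.90, 300)   # 0.003 h per item
```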

This is in line with previous ﬁndings and supports the argument that longer interviews are—at

least up to a certain point—a proxy for pleasant talkative interviews instead of a respondent

burden. We should note that interview length measures the combined interviewer–respondent

interaction and is therefore not exogenous to the interview process (Watson and Wooden, 2009).

To identify the causal effect of interview length one would probably require an experimental

setting, which is out of scope for this paper.

To understand the variation at the interviewer level, we add some sociodemographic controls (age and gender), a variable indicating experience in previous SHARE waves, the average number of contacts per interviewer and two dummies capturing interviewers' quality based on

grip strength rounding behaviour. Age and gender do not signiﬁcantly affect co-operation in

wave 4, whereas experience does play a role; more precisely, having experience with previous

SHARE waves increases the likelihood of retaining respondents in the survey. Results con-

cerning interviewer experience are consistent over different studies, leading to the conclusion

that experience is positively associated with gaining co-operation (West and Blom, 2017). (See Groves and Couper (1998), Hox and de Leeuw (2002), Jäckle et al. (2013) and Lipps and Pollien (2011).) However, it is still unclear what drives the effect, i.e. whether this is a selection effect (bad interviewers quit; Jäckle et al. (2013)) or a learning effect (interviewers improve their skills in approaching resistance over time; Lemay and Durand (2002)). Durrant et al. (2010) showed that experience in terms of the skill level acquired matters more than the time spent on the job. Our results are partly in line with the previous findings of Jäckle et al. (2013) on the effect of experience, measured in years working for the survey agency. Regarding the average number of

contacts, we see that an interviewer who on average registers many contacts is less likely to gain

co-operation. A high average number of contacts can be an indicator of interviewer quality, i.e.

such interviewers are less persuasive, or it can be seen as a measure of workload complexity, as interviewers with difficult caseloads end up trying more times.

It can also be noted that the two variables measuring interviewer quality in terms of diligent interviewing behaviour are significant, with signs as predicted previously. If interviewers rounded grip strength scores more or less often than average in wave 3, then gaining co-operation in wave 4 is less likely than in cases in which the rounding percentage is as expected. This finding is in accordance with Korbmacher and Schröder (2013) on consent to record linkage. Whereas rounding too often is a clear indication of poor compliance with quality standards, rounding too little is probably due to interviewers strategically avoiding multiples of 5 to prevent being accused of cheating.

The ﬁnal set of covariates in Table 5 is related to harmonized information that is collected

at the survey agency level to gain knowledge on the correlation between survey agency strate-

gies and co-operation. In this model speciﬁcation we consider the variable daily contact that

captures the frequency of communication between the survey agencies and their interviewers.

We ﬁnd that having daily contact with interviewers increases the chances of obtaining the co-

operation of respondents. This result hints at the importance of communication between survey

agency co-ordinators and interviewers to conduct surveys successfully. We report in Table 6 a

model speciﬁcation (the last column) in which both three-level variables (priority agency and

daily contact) are included among controls together with the two model speciﬁcations (the sec-

ond and third columns) in which the three-level predictors are instead included one at a time. (We

consider whether the priority of the projects is decided by the survey agency compared with situ-

ations in which interviewers can totally or partly choose how to organize their work. This can be

seen as a variable capturing the extent to which interviewers are autonomous and free to choose

between several projects on which they are currently working (e.g. working on SHARE or work-

ing on another survey on a speciﬁc day).) priority agency is never signiﬁcant. Degrees-of-freedom

considerations lead us to be parsimonious in level 3 speciﬁcation and therefore we decided not to

include priority agency in the main speciﬁcation. (Our simulations show that parsimony is a key

issue to reduce bias in the estimation of level 3 variance; see Appendix A for additional details.)

Table 6. Estimated multilevel models including alternative sets of agency characteristics (dependent variable: co-operation)†

                                           Results including at the third level
Variable                                   daily contact   priority agency   Both controls

Respondent characteristics
Female                                      0.100**         0.099**           0.100**
                                           (0.049)         (0.049)           (0.049)
Age                                         0.171***        0.171***          0.171***
                                           (0.029)         (0.029)           (0.029)
Age squared                                −0.001***       −0.001***         −0.001***
                                           (0.000)         (0.000)           (0.000)
Being in poor health                       −0.194***       −0.195***         −0.192***
                                           (0.051)         (0.051)           (0.051)
Single                                      0.211***        0.211***          0.210***
                                           (0.069)         (0.069)           (0.069)
Any proxy                                  −0.194**        −0.195**          −0.192**
                                           (0.097)         (0.097)           (0.097)
Years of education                          0.045**         0.041*            0.047**
                                           (0.021)         (0.021)           (0.021)
Years of education squared                 −0.003***       −0.002***         −0.003***
                                           (0.001)         (0.001)           (0.001)
Household income, 1st quartile              0.234***        0.236***          0.233***
                                           (0.078)         (0.078)           (0.078)
Household income, 2nd quartile              0.376***        0.378***          0.375***
                                           (0.078)         (0.078)           (0.078)
Household income, 3rd quartile              0.153**         0.157**           0.150*
                                           (0.078)         (0.078)           (0.078)
Living in a (semi-)detached house           0.301***        0.297***          0.303***
                                           (0.058)         (0.058)           (0.058)
Working                                    −0.066          −0.065            −0.065
                                           (0.067)         (0.067)           (0.067)
Living in an urban area                     0.042           0.048             0.044
                                           (0.073)         (0.073)           (0.073)

Paradata at the respondent level
Interrupted response pattern (interviewed  −0.977***       −0.972***         −0.975**
  in wave 1 but not in wave 2)             (0.083)         (0.083)           (0.083)
Item non-response to monetary questions    −0.527***       −0.526***         −0.527***
                                           (0.088)         (0.089)           (0.088)
Length of interview (h)                     1.147***        1.152***          1.151***
                                           (0.272)         (0.273)           (0.272)
Length of interview squared (h)            −0.362***       −0.362***         −0.363***
                                           (0.114)         (0.114)           (0.114)
Willingness to answer                       0.451***        0.441***          0.450***
                                           (0.090)         (0.090)           (0.090)
Did not ask for clarification               0.264***        0.260***          0.264***
                                           (0.068)         (0.068)           (0.068)

Interviewer characteristics (wave 4)
Age                                        −0.004          −0.007            −0.004
                                           (0.005)         (0.005)           (0.005)
Female                                      0.060           0.075             0.064
                                           (0.104)         (0.105)           (0.104)
Experience (previous SHARE waves)           0.642***        0.611***          0.608***
                                           (0.109)         (0.114)           (0.113)
Contacts                                   −0.133*         −0.161**          −0.128*
                                           (0.069)         (0.068)           (0.070)
Rounding to a multiple of 5 for grip       −0.239**        −0.219**          −0.247**
  strength measure (too many)              (0.105)         (0.105)           (0.105)
Rounding to a multiple of 5 for grip       −0.769***       −0.785***         −0.750***
  strength measure (too few)               (0.228)         (0.230)           (0.229)

Agency control variables
Daily contact                               0.714***                          0.689***
                                           (0.153)                           (0.141)
Priority decided by survey agency                           0.180             0.136
                                                           (0.211)           (0.116)

Constant                                   −5.389***       −5.062***         −5.433***
                                           (1.075)         (1.081)           (1.074)
σ²_u (interviewer level)                    1.006           1.000             1.008
σ²_v (agency level)                         0.007           0.081             0.001
N                                           16945           16945             16945

†Standard errors are in parentheses; p-values for fixed effect covariate significance refer to Wald-type tests. *p < 0.05; **p < 0.01; ***p < 0.001.

5.2. Variance component analysis
Table 7 reports the results of various specifications of random-intercept models, without and with covariates, in terms of estimated variance components, intraclass correlations and model fit statistics. (A similar approach can be found in Blom et al. (2011) on interviewer effects on non-response in the European Social Survey. Although the approach is similar, we refrain from

comparing the findings across SHARE and the European Social Survey here. Non-response processes can differ substantially between cross-sectional co-operation and co-operation in a later wave of a panel.) The definitions of the level 2 (ICC_j) and level 3 (ICC_k) intraclass correlations in a three-level logit model are provided in Appendix A.

Looking at the intraclass correlations, we note that survey agencies contribute about 1.6% of the variation, whereas interviewers account for about 25.9% (model 0 in Table 7). On the basis of the adjusted likelihood ratio test, we reject the null that the third-level variance component is 0: the test statistic takes the value 10.90 and is asymptotically distributed as a mixture of χ² distributions with 0 and 1 degrees of freedom (Self and Liang, 1987). The intraclass correlations in Table 7 suggest that most (72.5%, i.e. 100(1 − ICC_j − ICC_k) in model 0) of the variation in co-operation is at the individual level.
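These shares can be reproduced from the reported model 0 variance components, and the p-value of the adjusted likelihood ratio test follows by halving the χ²(1) tail probability (a sketch using the Table 7 values; `math.erfc` yields the χ²(1) survival function because P(χ²₁ > x) = erfc(√(x/2))):

```python
import math

var_e = math.pi ** 2 / 3    # level 1 variance, fixed at pi^2/3 in a logit model
var_u = 1.174               # interviewer-level variance, model 0 (Table 7)
var_v = 0.073               # agency-level variance, model 0 (Table 7)
total = var_e + var_u + var_v

icc_interviewer = var_u / total                        # about 0.259
icc_agency = var_v / total                             # about 0.016
individual_share = 1 - icc_interviewer - icc_agency    # about 0.725

# Adjusted LR test: p = 0.5 * P(chi2_1 > 10.90), well below 0.001
p_value = 0.5 * math.erfc(math.sqrt(10.90 / 2))
```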

Table 7 also reports the Akaike information criterion (AIC) as a measure of goodness of fit for each successive model specification. Reductions in AIC show improvements in the model fit. An examination of the log-likelihoods yields similar conclusions, whereby the full model is to be preferred. The likelihood ratio test shows that adding respondent level paradata improves the model significantly and reduces the scaled variance at the respondent level by 3%. (The percentage change in the scaled variance at the respondent level is computed as (3.226 − 3.125)/3.226, following the approach of Couper and Kreuter (2013). Percentage changes in the other scaled variance components are computed accordingly.) If we compare model 2 and model 3, in which

22 J. Bristle, M. Celidoni, C. Dal Bianco and G.Weber

Table 7. Estimated variance components, intraclass correlations and model fit statistics
for various model specifications of the multilevel models of co-operation†

Variance component             Model 0,    Model 1,    Model 2,    Model 3,     Model 4,
                               intercept   respondent  respondent  interviewer  agency
                               only                    paradata

Not scaled
σ²_e (individual level)        3.29
σ²_u (interviewer level)       1.174       1.184       1.198       1.000        1.006
σ²_v (agency level)            0.073       0.093       0.086       0.089        0.007

Scaled
σ²_e (individual level)        3.29        3.226       3.125       3.048        3.031
σ²_u (interviewer level)       1.174       1.161       1.138       0.926        0.927
σ²_v (agency level)            0.073       0.091       0.082       0.082        0.006

Intraclass correlation (scaled variances)
ICC_j (interviewer level)      0.259       0.260       0.262       0.228        0.234
ICC_k (agency level)           0.016       0.020       0.019       0.020        0.002

Log-likelihood                 −6917.251   −6835.011   −6694.445   −6667.654    −6661.432
Likelihood ratio test against              164.48      281.13      53.58        12.44
  previous column model                    (14; 0.000) (6; 0.000)  (6; 0.000)   (1; 0.000)
  (degrees of freedom;
  p-value of test)
Model fit statistic AIC        13840.5     13704.02    13434.89    13393.31     13382.86

†Observations: 16945 respondents, 643 interviewers, 11 agencies; ICC, intraclass correlation; AIC, Akaike information criterion.

we introduce interviewer characteristics, it can be seen that this set of interviewer level fixed effects accounts for a modest proportion of the variation at that level. Comparing the variance σ²_u between model 2 and model 3, we see that about 19% of the variation is captured by interviewer age, gender, experience, average number of contacts and rounding behaviour. The likelihood ratio test reveals that adding interviewer characteristics as predictors of co-operation results in a statistically significant improvement in model fit (p < 0.0001).

Finally, in model 4 we add survey agency fieldwork strategies. The inclusion of survey-agency-related variables captures a large part of the variation at the third level; comparing σ²_v between model 3 and model 4, we note that we can explain about 90% of the variation. However, we need to take into account the fact that the variation at the survey agency level is rather small in comparison with the variance at the interviewer level: we recall that the survey agencies contribute about 1.6% of the variation, whereas interviewers account for about 25.9%. According to the likelihood ratio test, adding the survey agency characteristic as a predictor of co-operation improves the model fit (p < 0.0001).
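The two percentage reductions discussed above follow directly from the scaled variance components in Table 7; a minimal Python check (the inputs are the figures reported in the table):

```python
# Scaled variance components from Table 7.
u_model2, u_model3 = 1.138, 0.926  # interviewer level, before/after adding interviewer covariates
v_model3, v_model4 = 0.082, 0.006  # agency level, before/after adding agency covariates

drop_u = 100 * (u_model2 - u_model3) / u_model2  # share of interviewer-level variation captured
drop_v = 100 * (v_model3 - v_model4) / v_model3  # share of agency-level variation captured

print(round(drop_u, 1), round(drop_v, 1))  # 18.6 92.7
```

The first figure rounds to the "about 19%" quoted for the interviewer level, the second to the "about 90%" quoted for the agency level.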

Our results should be interpreted cautiously because the accuracy of higher level parameter estimates might be problematic in the context of multilevel models, particularly when the number of groups is small. In Appendix A we present simulation analyses along this line.

5.3. Cross-level interactions and robustness analysis

In this subsection we show that our results are robust to the inclusion of cross-level interactions

and to various changes in model speciﬁcation.

Contributions to Panel Co-operation 23

Considering cross-level interactions allows us to investigate non-co-operation for certain subgroups of respondents who are difficult to interview for several reasons—e.g. individuals in bad health, employed, living alone and with ‘unpleasant’ previous interview experience. We focus on how the effect of individual characteristics differs according to interviewer attributes.

As Groves and Couper (1998) suggested, interviewers with more experience are more able

to gain co-operation in problematic situations (e.g. resistance). Therefore, we ﬁrst investigate

whether interviewers’ experience can mitigate the negative association of respondent bad health,

marital status and previous interview indicators (item non-response and interrupted response

patterns) with co-operation. We find statistically significant interaction effects only for interrupted response patterns. To clarify: in the last column of Table 5, experience has a positive coefficient of 0.642 and interrupted response pattern a negative coefficient of 0.977. In the second column of Table 8, experience has a positive coefficient of 0.716, interrupted response pattern a negative coefficient of 0.601 and their interaction a negative coefficient of 0.675.

Thus, experience per se is predictive of retention, but experienced interviewers are less likely to gain co-operation when the respondent has an interrupted history of participation than inexperienced interviewers. A possible explanation is that experienced interviewers put more effort where they expect higher rewards—and do not work as hard at regaining co-operation where they know that respondents are more difficult to keep in the sample.
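To make the size of this interaction concrete, the coefficients just cited can be combined into odds ratios; a back-of-the-envelope Python calculation (our illustration, holding all other covariates fixed):

```python
import math

# Coefficients from the second column of Table 8 (cross-level interaction model).
b_experience = 0.716    # interviewer experience with previous SHARE waves
b_interaction = -0.675  # interrupted response pattern x interviewer experience

# Log-odds effect of assigning an experienced interviewer:
effect_continuous = b_experience                   # respondent with continuous participation
effect_interrupted = b_experience + b_interaction  # respondent with an interrupted pattern

print(round(math.exp(effect_continuous), 2))   # 2.05: the odds of co-operation roughly double
print(round(math.exp(effect_interrupted), 2))  # 1.04: the experience advantage almost vanishes
```

For respondents with continuous participation an experienced interviewer roughly doubles the odds of co-operation, whereas for respondents with an interrupted pattern the net effect is close to nil.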

Although the gender of the interviewer is generally not significant in explaining co-operation (West and Blom, 2017), we investigate whether for at least some respondents it plays a role. We include in the model cross-level interactions between interviewer gender (female) and respondent characteristics (such as bad health, marital status and previous interview indicators) to see whether being interviewed by a female changes the propensity to participate. We find statistically significant effects for marital status: the positive correlation between being single and co-operation in the baseline model specification (Table 5, model 4) seems to be mainly driven by singles interviewed by female interviewers (see the third column of Table 8).

Finally, on the basis of the evidence that sociodemographic similarities between respondent

and interviewer increase the propensity to co-operate (West and Blom, 2017), we test whether

matching based on age and gender affects co-operation. We ﬁnd that the nearness of age between

interviewer and respondent, measured as the distance between interviewer and respondent age,

has an insigniﬁcant effect on co-operation. We ﬁnd a similar insigniﬁcant result for gender

concordance.

We further ran robustness analyses by redeﬁning the estimation sample, the list of covariates

and the number of levels considered.

We redefine our estimation sample along three dimensions. First, we look at the effect of carrying out the analysis at the household, rather than the individual, level (the fourth column of Table 8). Although Durrant and Steele (2009) highlighted that co-operation is a complex social phenomenon that is explained by individual rather than household characteristics, we show that household level estimates are in line with individual level estimates. Next, in

the fifth column of Table 8, we drop interviewers with fewer than six interviews. This second model specification addresses the potential inaccuracy of the estimates when the group sizes are small (Hox, 2010). The results do not change. We use this model specification to perform the goodness-of-fit test that was proposed by Perera et al. (2016) and fail to reject the null hypothesis that the specified model fits the data well. (Perera et al. (2016) developed the goodness-of-fit test for a two-level model; therefore, in performing the test we treat our model specification as if it were a two-level model. The computer code to perform the test is available from http://wileyonlinelibrary.com/journal/rss-datasets.) Lastly, in


Table 8. Robustness analysis—multilevel model estimates (dependent variable: co-operation)†

Columns: cross-level interactions — (IRP × exp) interrupted response pattern × interviewer
experience; (S × F) single × female interviewer; additional robustness analysis —
(HH) household level; (>5 int) number of interviews > 5; (no proxy) no proxy interviews;
(educ) interviewer education; (2-level) two-level model, grouped countries.

                                IRP × exp  S × F      HH         >5 int    no proxy   educ      2-level
Respondent characteristics
Female                          0.102§     0.098§     0.167§§    0.108§    0.120§     0.069     0.100§
                                (0.049)    (0.049)    (0.059)    (0.049)   (0.051)    (0.054)   (0.049)
Age                             0.173§§    0.171§§    0.207§§    0.172§§   0.166§§    0.171§§   0.171§§
                                (0.029)    (0.029)    (0.036)    (0.029)   (0.031)    (0.033)   (0.029)
Age squared                     −0.001§§   −0.001§§   −0.001§§   −0.001§§  −0.001§§   −0.001§§  −0.001§§
                                (0.000)    (0.000)    (0.000)    (0.000)   (0.000)    (0.000)   (0.000)
Being in poor health            −0.195§§   −0.194§§   −0.184§§   −0.200§§  −0.181§§   −0.182§§  −0.195§§
                                (0.051)    (0.051)    (0.060)    (0.051)   (0.053)    (0.056)   (0.051)
Single                          0.206§§    0.066      0.341§§    0.200§§   0.229§§    0.211§§   0.212§§
                                (0.069)    (0.098)    (0.079)    (0.070)   (0.072)    (0.077)   (0.069)
Proxy                           −0.192§    −0.193§    −0.355§§   −0.183‡              −0.157    −0.194§
                                (0.097)    (0.097)    (0.123)    (0.098)              (0.104)   (0.097)
Years of education              0.045§     0.045§     0.004      0.046§    0.042‡     0.046§    0.047§
                                (0.021)    (0.021)    (0.026)    (0.021)   (0.022)    (0.023)   (0.021)
Years of education squared      −0.003§§   −0.003§§   −0.000     −0.003§§  −0.002§§   −0.003§§  −0.003§§
                                (0.001)    (0.001)    (0.001)    (0.001)   (0.001)    (0.001)   (0.001)
Household income—1st quartile   0.236§§    0.235§§    0.114      0.238§§   0.213§§    0.268§§   0.235§§
                                (0.078)    (0.078)    (0.096)    (0.078)   (0.081)    (0.085)   (0.078)
Household income—2nd quartile   0.384§§    0.378§§    0.268§§    0.393§§   0.380§§    0.400§§   0.376§§
                                (0.078)    (0.078)    (0.084)    (0.078)   (0.080)    (0.084)   (0.078)
Household income—3rd quartile   0.154§     0.154§     0.130      0.162§    0.153‡     0.195§    0.154§
                                (0.078)    (0.078)    (0.084)    (0.078)   (0.080)    (0.086)   (0.078)
Living in a (semi-)detached     0.299§§    0.301§§    0.274§§    0.313§§   0.303§§    0.327§§   0.303§§
  house                         (0.058)    (0.058)    (0.067)    (0.058)   (0.060)    (0.063)   (0.058)
Working                         −0.065     −0.065     0.001      −0.050    −0.055     −0.146§   −0.063
                                (0.067)    (0.067)    (0.082)    (0.068)   (0.069)    (0.075)   (0.067)
Living in an urban area         0.046      0.040      0.024      0.033     0.051      0.068     0.041
                                (0.073)    (0.073)    (0.082)    (0.074)   (0.076)    (0.081)   (0.073)

Paradata at the respondent level
Interrupted response pattern    −0.601§§   −0.977§§   −0.924§§   −0.974§§  −0.946§§   −0.979§§  −0.978§§
  (interviewed in wave 1 but    (0.124)    (0.083)    (0.090)    (0.083)   (0.087)    (0.089)   (0.083)
  not in wave 2)
Item non-response to monetary   −0.523§§   −0.529§§   −0.570§§   −0.537§§  −0.498§§   −0.546§§  −0.526§§
  questions                     (0.088)    (0.088)    (0.107)    (0.089)   (0.094)    (0.097)   (0.088)
Length of interview (h)         1.122§§    1.152§§    0.651§§    1.088§§   1.179§§    1.267§§   1.164§§
                                (0.272)    (0.272)    (0.154)    (0.275)   (0.286)    (0.297)   (0.274)
Length of interview             −0.350§§   −0.364§§   −0.110§§   −0.336§§  −0.356§§   −0.435§§  −0.367§§
  squared (h)                   (0.114)    (0.114)    (0.039)    (0.115)   (0.120)    (0.125)   (0.114)
Willingness to answer           0.451§§    0.452§§    0.383§§    0.457§§   0.506§§    0.370§§   0.454§§
                                (0.090)    (0.090)    (0.111)    (0.091)   (0.097)    (0.096)   (0.090)
Did not ask for clarification   0.264§§    0.262§§    0.285§§    0.260§§   0.279§§    0.302§§   0.265§§
                                (0.068)    (0.068)    (0.080)    (0.069)   (0.072)    (0.074)   (0.068)

Interviewers’ characteristics (wave 4)
Age                             −0.004     −0.004     −0.002     −0.004    −0.004     −0.005    −0.003
                                (0.005)    (0.005)    (0.004)    (0.005)   (0.005)    (0.005)   (0.005)
Female                          0.055      0.005      0.100      0.048     0.052      0.050     0.056
                                (0.104)    (0.107)    (0.100)    (0.107)   (0.105)    (0.120)   (0.104)
Interviewer education                                                                 0.037
  (ISCED 5–6)                                                                         (0.127)
Experience with working on      0.716§§    0.644§§    0.688§§    0.690§§   0.623§§    0.663§§   0.634§§
  previous SHARE waves          (0.110)    (0.109)    (0.104)    (0.112)   (0.111)    (0.125)   (0.109)
Contacts                        −0.133‡    −0.134‡    −0.101     −0.142‡   −0.130‡    −0.145‡   −0.123‡
                                (0.069)    (0.069)    (0.063)    (0.074)   (0.071)    (0.075)   (0.065)
Rounding to a multiple of 5     −0.241§    −0.237§    −0.255§    −0.277§§  −0.239§    −0.199‡   −0.249§
  for grip strength measure     (0.105)    (0.105)    (0.101)    (0.107)   (0.107)    (0.119)   (0.105)
  (too many)
Rounding to a multiple of 5     −0.785§§   −0.772§§   −0.763§§   −0.901§§  −0.809§§   −0.779§§  −0.763§§
  for grip strength measure     (0.228)    (0.228)    (0.224)    (0.259)   (0.232)    (0.248)   (0.228)
  (too few)

Interactions
Single × female interviewer                0.236§
                                           (0.115)
Interrupted response pattern    −0.675§§
  × interviewer experience      (0.165)

Agency control variables
Daily contact                   0.712§§    0.711§§    0.664§§    0.707§§   0.695§§    0.591§§   0.665§§
                                (0.151)    (0.152)    (0.134)    (0.149)   (0.154)    (0.216)   (0.150)
Southern countries                                                                              0.116
                                                                                                (0.172)
Central countries                                                                               0.053
                                                                                                (0.122)
Constant                        −5.477§§   −5.344§§   −6.787§§   −5.484§§  −5.333§§   −5.343§§  −5.543§§
                                (1.076)    (1.076)    (1.312)    (1.082)   (1.131)    (1.221)   (1.085)
σ²_u (interviewer level)        1.006      1.005      0.763      1.003     1.012      1.072     1.011
σ²_v (agency level)             0.006      0.007      <0.001     0.004     0.007      0.012
N                               16945      16945      11890      16713     15913      13574     16945

†Standard errors are in parentheses; p-values for fixed effect covariates significance refer to Wald-type tests. Household level model specification: the interview length is defined as the sum of the single-interview lengths; participation in the previous waves is defined at the household level.
‡p < 0.05.
§p < 0.01.
§§p < 0.001.


the sixth column of Table 8, we drop proxy interviews to check whether this rather particular

subsample of SHARE respondents drives our baseline results, but we ﬁnd that this is not so.

Our results are quite robust to the inclusion of further interviewer level controls. For a subgroup of interviewers, we have information on education (interviewer education (ISCED 5–6) is a dummy that takes the value 1 if the interviewer has tertiary education) and in the seventh column of Table 8 we show that adding this variable does not change our results. Here we do not report estimation results for a model specification that includes a ‘short introduction’ variable (results are available on request). This variable should capture interviewers who are likely to skip section introductions and is an additional quality indicator. To ensure harmonization, interviewers are instructed to read the whole CAPI question carefully. However, some interviewers do not follow this instruction: when we compare keystroke data about section introductions, we find that there are interviewers who read them quickly. This variable is insignificant and its inclusion leaves other parameter estimates unchanged.

The results regarding the effects of survey agency practices on the conditional mean of the dependent variable are also robust to the way that we treat level 3 variability. Although the simulation exercise confirms the robustness of our baseline result regarding the positive effect of the third-level control, we present in the final column of Table 8 a two-level model with controls for groups of countries. (We group countries as follows: the dummy Southern countries takes the value 1 for Italy and Spain and 0 otherwise; the dummy Central countries takes the value 1 for Belgium, Switzerland, Germany, the Czech Republic and Austria; and Northern countries, the reference group, equals 1 for Denmark, Sweden and the Netherlands.) The third-level variable daily contact remains highly significant.

6. Conclusions

Panel co-operation has been a long-standing issue in survey research, with several studies seeking to identify the factors that affect subject attrition in panel surveys. Our analysis, based on observational data, focuses especially on the role of paradata in providing additional information to predict co-operation in a later wave of a panel. We are especially interested in the factors affecting co-operation propensity that are ‘under the researcher’s control’: survey agency fieldwork strategies, the features of interviewers and the respondent–interviewer interaction. We investigate which paradata from SHARE waves 3 and 4 help to predict co-operation in wave 4 regarding

(a) the way that the previous interview was conducted,
(b) the characteristics of the wave 4 interviewer and
(c) agency level fieldwork indicators.

Using multilevel models, we find that factors at all three levels (respondent, interviewer and survey agency) influence co-operation.

Panel respondents may base their co-operation decision on the way that their previous interview was conducted. We find corroborating evidence for this: for instance, item non-response to monetary questions predicts co-operation in the next wave—respondents who answered most of the monetary items are more likely to participate in wave 4 than those who refused to answer a considerable number of questions. The length of the interview is another factor that is associated with co-operation in wave 4. We find that very long interviews are associated with lower participation in later waves. However, as long as the total length of the interview is less than 1.6 h, which holds for the vast majority of our cases, longer interviews are associated positively with future co-operation, possibly reflecting the respondent’s interest in the survey or the quality of the interviewer–respondent interaction. This finding shows the difficulty with deriving implications for questionnaire development from interview length.
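The 1.6 h threshold is simply the turning point of the estimated quadratic in interview length; as an illustration (our recomputation, using the interview-length coefficients from the final, two-level column of Table 8 — any column gives a similar vertex):

```python
# Quadratic in interview length (hours): b1 * length + b2 * length^2,
# with coefficients taken from the two-level specification in Table 8.
b1, b2 = 1.164, -0.367
turning_point = -b1 / (2 * b2)  # vertex of the parabola

print(round(turning_point, 2))  # 1.59 h: beyond this, longer interviews predict lower co-operation
```

Below the vertex the estimated association between length and co-operation is positive; above it, negative.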

As far as interviewer characteristics are concerned, we find that previous experience with working as a SHARE interviewer matters more than sociodemographic characteristics, such as age, gender or education. This is in line with the literature: interviewers’ gender and age have been generally found to be weak or insignificant determinants of co-operation, whereas experience does play a role, although the mechanisms behind it are still not well understood (West and Blom, 2017). Interviewers who perform well on survey tasks that require diligence are also more successful in gaining co-operation. This again reflects the importance of high quality training and of selecting diligent individuals as interviewers. Although the interviewer work quality indicators that we have are statistically significant, they account, together with sociodemographic characteristics, for only a modest percentage of the variance at the interviewer level. Important determinants, which were not considered here because of lack of information, are for instance interviewer continuity (Watson and Wooden, 2009; Lynn et al., 2014), socio-economic status, general attitudes, own behaviour, expectations and more comprehensive measures of job experience.

Finally, regarding survey-agency-related controls, we find that having contact with interviewers every day increases the chances of gaining respondents’ co-operation. This result may highlight the importance of communication between survey agency co-ordinators and interviewers to conduct surveys successfully, but it may also point to other factors at the survey agency level that affect respondents’ co-operation (such as the relative importance that the survey agency attaches to SHARE compared with other surveys that they are managing at the same time). The limited number of survey agencies in our sample and the paucity of agency indicators prevent us from using more agency level covariates and limit our ability to ascertain which is the correct explanation. To investigate further the role of survey agency controls one should probably use the most recent SHARE waves (7 and 8), which cover a much larger number of countries. Ideally, more detailed quantitative paradata at the agency level should also be collected.

We have also investigated cross-level interactions: the most interesting finding is that the interviewer’s experience is generally predictive of retention, except when the respondent has an interrupted history of participation. A possible explanation is that experienced interviewers put in less effort when they expect lower chances of success. We also find significant interaction effects between the interviewer’s gender (female) and respondent marital status (being single), and this may be used to devise a profitable assignment strategy.

Our analysis provides a description of response behaviour in SHARE for a specific, relatively early wave. Even in this setting, we have shown that an interrupted participation pattern makes retention less likely. The response process in later waves might depend on previous participation in more complex ways. To investigate this one should consider the whole longitudinal gross sample, i.e. all the individuals who have been interviewed at least once, as this would allow the separation of retention and recovery. The underlying mechanisms for subsequent participation on the one hand (retention) and interrupted participation on the other hand (recovery) might differ. We leave this to future research.

Acknowledgements

We are grateful for comments and suggestions made by participants at the conference of the European Survey Research Association, the Panel Survey Methods Workshop and the seminar of the Munich Center for the Economics of Aging, as well as by referees and the Joint Editor. We gratefully acknowledge discussions with Thorsten Kneip, Julie Korbmacher, Omar Paccagnella and Annette Scherpenzeel. This paper uses data from SHARE wave 1, wave 2, wave 3 (SHARELIFE) and wave 4 release 6.0.0, as at March 31st, 2017 (digital object identifier (DOI) 10.6103/SHARE.w1.600; DOI 10.6103/SHARE.w2.600; DOI 10.6103/SHARE.w3.600; DOI 10.6103/SHARE.w4.600). The SHARE data collection has been primarily funded by the European Commission through the fifth framework programme (project QLK6-CT-2001-00360 in the thematic programme ‘Quality of life’), through the sixth framework programme (projects SHARE-I3, RII-CT-2006-062193, COMPARE, CIT5-CT-2005-028857 and SHARELIFE, CIT4-CT-2006-028812) and through the seventh framework programme (SHARE-PREP 211909, SHARE-LEAP 227822 and SHARE M4 261982). Additional funding from the US National Institute on Aging (U01 AG09740-13S2, P01 AG005842, P01 AG08291, P30 AG12815, R21 AG025169, Y1-AG-4553-01, IAG BSR06-11 and OGHA 04-064) and the German Ministry of Education and Research, as well as from various national sources, is gratefully acknowledged (see www.share-project.org for a full list of funding institutions).

Appendix A: Simulation study

Multilevel model estimation is generally based on a maximum likelihood approach and standard errors are derived under the assumption of an asymptotic normal distribution of the estimator. There are several simulation studies which assess the finite sample performance of multilevel models when the outcome is continuous (see Maas and Hox (2005) for a review), but fewer analyses exist for discrete response multilevel models. Moreover, these results are mainly for two-level binary models (see Paccagnella (2011) for a literature review), with the exception of a recent study by Kim et al. (2013).

The main conclusions of the simulation analyses for binary multilevel models are that parameter estimates are downward biased whenever there are few observations per group (Rodríguez and Goldman, 1995) and when the number of groups is small, in particular when considering higher level covariates and variance components (Bryan and Jenkins, 2016; Paccagnella, 2011). (Fewer than 30 groups lead to unacceptable downward biases in the parameter estimates of a two-level logit model according to Bryan and Jenkins (2016); similar results are obtained in Paccagnella (2011).) Results for standard error bias exhibit the same pattern. Paccagnella (2011) investigated the accuracy of model estimates in the case of a two-level logit model and concluded that the bias in the fixed part of the model is negligible even with 10 clusters, but the number of clusters should increase significantly to ensure accuracy of the variance component estimates. Moreover, his simulation results show that the bias in the variance estimate is higher when the second-level ICC is lower. Kim et al. (2013) focused on the comparison of estimation performance in both two- and three-level models when using different methods and statistical packages but did not investigate the role of group size and number on estimation accuracy. (Simulation results for the three-level specification are based on data sets in which there are 50 level 1 units, 10 level 2 units and 30 level 3 units throughout.)

If the two-level logit model results extend to a three-level framework, two features of our model specification are likely to imply inaccurate estimates: on the one hand, the small number of level 3 groups, i.e. the number of survey agencies; on the other hand, the small ICC at the third level. Given the lack of simulation results on binary response three-level models, we study the finite sample properties of a multilevel logit model with a simulation exercise in which the hierarchical structure of the generated data sets replicates the structure of our survey data set.

Following Goldstein and Rasbash (1996), we specify our baseline model as follows:

logit{p_ijk | (Z_ijk; β, u_jk, v_k)} = β0 + β1 X1_ijk + β2 D1_ijk + β3 X2_jk + β4 D2_jk + β5 D3_k + u_jk + v_k,
Y_ijk | u_jk, v_k ~ Bernoulli(p_ijk),    (A.1)

where the controls Z_ijk are the continuous and binary variables X and D respectively, and the random effects are independent and normally distributed, u_jk ~ N(0, σ²_u) and v_k ~ N(0, σ²_v). We consider two other model specifications: a null model and a model without level 3 controls.

The baseline model specification presented in equation (A.1) replicates the full model specification estimated in the last column of Table 5. However, for simplicity we include only two controls at level 1 and at level 2, a continuous and a binary control ((X1_ijk, D1_ijk) and (X2_jk, D2_jk) respectively), and only one binary control at the third level, having the same distribution as the daily contact binary variable in our model.


According to Davis and Scott (1995), the intraclass correlations at level 2 and level 3 in a multilevel logit model are defined as

ICC_j = σ²_u / (σ²_e + σ²_u + σ²_v),
ICC_k = σ²_v / (σ²_e + σ²_u + σ²_v),

where σ²_e = π²/3. Varying the level 2 and level 3 variances in ranges consistent with those estimated in Tables 5 and 7, we explore the finite sample behaviour of the estimates for a range of values of the intraclass correlations: ICC_j ∈ [0.19, 0.26] and ICC_k ∈ [0.01, 0.035].

The true values of the parameters are reported in Table 9. Parameters are kept constant across model specifications apart from the level 2 and level 3 variances.

In particular, we investigate the finite sample behaviour of the estimates when the number of groups, N_k, and the variance of the random effect at the third level, σ²_v, are small. In the simulations, the following conditions vary:

(a) N_k assumes values in {5, 10, 15, 20, 25};
(b) σ²_v assumes values in {0.05, 0.1, 0.15} and the level 2 variance σ²_u takes values in {0.8, 1, 1.2}, in line with the model estimates in Table 7.

To replicate the variability in the group size that is observed in the data, we allow for heterogeneity in the number of observations within each of the three levels. More precisely, the number of level 2 units within each level 3 group can take five values (S_jk = {30, 45, 60, 75, 90}), and this reproduces the variability in the number of interviewers per survey agency in the data. The number of level 1 units within each level 2 group can take five values (S_ijk = {10, 20, 30, 40, 70}), which replicate the distribution of the number of respondents per interviewer in the data. (These sets of five values are replicated according to the number of level 3 and level 2 groups.)

Following Paccagnella (2011), for each combination of the level 2 and level 3 variances we generate R = 1000 simulated data sets. To generate the covariates we simulate from five independent standard normal distributions. The binary variables at level 1 and level 2 take the value 1 if the underlying continuous variable is positive and 0 otherwise. The binary variable at level 3 is obtained from the underlying standard normal distribution by imposing that the mean of the binary variable is 0.17, as for daily contact.

The random components u_jk and v_k are obtained with R random draws from two independent normal distributions with mean 0 and variances σ²_u and σ²_v respectively.

Using the regression coefficients of Table 9, the generated regressors and the random components, we compute π_ijk = logit(p_ijk) and derive p_ijk by applying the inverse logit function. Finally, each value of the dependent variable Y_ijk is a random draw from a Bernoulli distribution with probability p_ijk.
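The data-generating process described above can be sketched as follows (an illustrative Python/NumPy translation of the recipe; the paper itself generates the data and fits the models in Stata 15, and the dichotomization threshold 0.954 — the 83rd percentile of the standard normal — is our assumption for producing a level 3 dummy with mean approximately 0.17):

```python
import numpy as np

rng = np.random.default_rng(20190222)

# True parameters (Table 9) and one variance scenario from the simulation grid.
b0, b1, b2, b3, b4, b5 = 1.00, 0.8, -0.3, -0.7, 0.4, -0.2
sigma2_u, sigma2_v = 1.0, 0.05
Nk = 10                       # number of level 3 groups (survey agencies)
S_jk = [30, 45, 60, 75, 90]   # level 2 units per level 3 group (cycled)
S_ijk = [10, 20, 30, 40, 70]  # level 1 units per level 2 group (cycled)

y = []
for k in range(Nk):
    v_k = rng.normal(0.0, np.sqrt(sigma2_v))
    # Level 3 dummy: latent standard normal dichotomized so that P(D3 = 1) ~ 0.17.
    D3 = float(rng.standard_normal() > 0.954)
    for j in range(S_jk[k % 5]):
        u_jk = rng.normal(0.0, np.sqrt(sigma2_u))
        X2 = rng.standard_normal()               # continuous level 2 control
        D2 = float(rng.standard_normal() > 0.0)  # binary level 2 control
        n = S_ijk[j % 5]
        X1 = rng.standard_normal(n)              # continuous level 1 control
        D1 = (rng.standard_normal(n) > 0.0).astype(float)
        eta = b0 + b1 * X1 + b2 * D1 + b3 * X2 + b4 * D2 + b5 * D3 + u_jk + v_k
        p = 1.0 / (1.0 + np.exp(-eta))           # inverse logit
        y.append(rng.binomial(1, p))             # Bernoulli draws

y = np.concatenate(y)
print(y.size)  # 20400 level 1 observations in this configuration
```

Each such data set would then be fitted with a three-level logit estimator (Stata's melogit in the paper); the fitting step is not reproduced here.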

To perform our simulation exercise we use Stata 15. The multilevel logit models are estimated with the

melogit command. The integration method that was used to integrate the approximated likelihood over

the random effects is mode curvature adaptive Gauss–Hermite quadrature with seven integration points.

To assess the accuracy of the estimates of the model parameters and of their standard errors, we report three summary measures: relative parameter bias, non-coverage rate and relative standard error

Table 9. True parameters’ values used in the simulation analysis

Parameter    True value
β0            1.00
β1            0.8
β2           −0.3
β3           −0.7
β4            0.4
β5           −0.2


Table 10. Results from baseline model simulations when the level 2 variance σ²_u is set to 1
and the level 3 variance σ²_v to 0.05

Parameter        Nk = 5     Nk = 10    Nk = 15    Nk = 20    Nk = 25

Relative parameter bias (%)
β0               −0.39      0.30       −0.37      −0.13      0.64
β1               0.26       −0.01      −0.11      −0.03      0.01
β2               0.33       −0.27      0.09       0.10       0.04
β3               −0.28      0.16       −0.13      0.26       0.14
β4               0.11       −0.31      0.27       −0.08      −1.30
β5               −0.26      0.00       −2.36      −4.50      2.21
σ²_u             −0.75      −0.58      −0.56      −0.30      −0.14
σ²_v             −44.29     −29.31     −17.29     −12.81     −10.70

Non-coverage rate
β0               0.13†      0.10†      0.10†      0.07†      0.08†
β1               0.05       0.06       0.05       0.05       0.04
β2               0.04       0.05       0.06       0.05       0.06
β3               0.05       0.05       0.04       0.05       0.04
β4               0.05       0.06       0.05       0.05       0.04
β5               0.21†      0.12†      0.08†      0.09†      0.07†
σ²_u             0.09†      0.05       0.06       0.05       0.06
σ²_v             0.43†      0.29†      0.21†      0.17†      0.15†

Relative standard error bias (%)
β0               −17.60     −11.28     −10.10     −5.90      −2.19
β1               −0.85      −2.99      −0.51      2.02       3.34
β2               4.15       0.32       −1.73      0.22       −3.11
β3               1.84       −1.07      0.14       −2.10      1.34
β4               2.76       −3.93      1.57       0.56       −0.63
β5               −13.38     −13.92     −10.42     −12.06     −6.85
σ²_u             −5.65      0.80       0.69       1.60       −2.12
σ²_v             −17.30     −14.25     −8.50      −6.66      −6.47

†Significantly different from 0.05 at the 5% level of significance.

bias (Paccagnella, 2011; Bryan and Jenkins, 2016; Vassallo et al., 2017). The relative parameter bias is computed as the percentage difference between the estimated and true parameters. The non-coverage rate (Maas and Hox, 2005) is used to assess the accuracy of the standard errors. It is the average over model replications of a binary indicator that takes value 1 if the true parameter value lies outside the 95% estimated confidence interval. The estimates are accurate if the relative parameter bias is close to 0 and the non-coverage rate is close to 5%. Given that the non-coverage rate might reflect both parameter bias and standard error bias, following Bryan and Jenkins (2016) and Rodríguez and Goldman (1995), we also compute the standard error bias by comparing the 'analytical' standard error (the average of the estimated standard errors over the replications) with the 'empirical' standard error (the standard deviation of the estimated parameters based on the R replications) (Greene, 2004).
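The three diagnostics just described can be computed directly from the replication output. The sketch below (a minimal illustration in Python with NumPy; the arrays of per-replication estimates and standard errors are assumed inputs, not part of the original analysis) is one way to implement them:

```python
import numpy as np

def simulation_diagnostics(estimates, std_errors, true_value, z=1.96):
    """Diagnostics for one parameter over R simulation replications.

    estimates, std_errors: length-R arrays of per-replication point
    estimates and their estimated standard errors.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)

    # Relative parameter bias (%): percentage difference between the
    # average estimate and the true parameter value.
    rel_bias = 100 * (estimates.mean() - true_value) / true_value

    # Non-coverage rate: share of replications whose 95% Wald interval
    # does not contain the true value (the target rate is 0.05).
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    non_coverage = np.mean((true_value < lower) | (true_value > upper))

    # Relative standard error bias (%): 'analytical' SE (average of the
    # estimated SEs) against the 'empirical' SE (SD of the estimates).
    analytical_se = std_errors.mean()
    empirical_se = estimates.std(ddof=1)
    rel_se_bias = 100 * (analytical_se - empirical_se) / empirical_se

    return rel_bias, non_coverage, rel_se_bias
```

Applied to the full set of replications for each parameter, this yields one row of each panel in Tables 10 and 11.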

In Table 10 we report simulation results for the case in which σ²_u = 1 and σ²_v = 0.05, the scenario closest to our full model specification (the last column of Table 5). Focusing on the scenario with 10 groups at the third level, the relative bias is close to 0 for most parameters, with the exception of the level 3 variance, σ²_v, which is biased downwards by 29.31%. The non-coverage rate is significantly different from 0.05 for both β5 (the coefficient of the level 3 dummy control) and σ²_v. This is the result of both parameter bias and standard error bias: according to the bottom panel of Table 10, the standard error of β5 is underestimated by 13.92% and the standard error of σ²_v by 14.25%.

Generally, both parameter and standard error biases decrease as the number of level 3 groups increases


Table 11. Results from null model simulations when the level 2 variance σ²_u is set to 1.2 and the level 3 variance σ²_v to 0.15

Parameter        Results for the following values of Nk:
                 5         10        15        20        25

Relative parameter bias (%)
β0              −0.69      0.51     −0.40      0.37      0.58
σ²_u             0.13      0.18     −0.08     −0.05      0.14
σ²_v           −24.31    −13.86     −6.27     −4.56     −4.06

Non-coverage rate
β0               0.14†     0.10†     0.09†     0.08†     0.07
σ²_u             0.06      0.05      0.06      0.05      0.06
σ²_v             0.31†     0.22†     0.14†     0.13†     0.11†

Relative standard error bias (%)
β0             −14.93     −9.29     −8.60     −6.20     −1.29
σ²_u            −3.66     −0.23     −1.33      2.79     −2.62
σ²_v            −6.73     −7.64     −5.61     −2.41     −2.65

†Significantly different from 0.05 at the 5% level of significance.

(with the sole exception of the relative parameter bias of β5), but they remain far from the target value even with 25 groups. In the case of σ²_v the non-coverage rate is as high as 0.15 even with 25 groups. By varying σ²_u and σ²_v, and thus the ICC, the downward bias of σ²_v ranges from 20% to 29%, and it is lower when the ICC at the third level is higher. Results of these further simulations are available on request. Given the simulation results, in our application in Table 5 we are likely to underestimate the third-level variance of the full model by about 29%.
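A back-of-the-envelope correction follows from this bias figure: if an estimator is biased downwards by a fraction b, the estimate can be inflated by dividing by (1 − b). The numbers below are hypothetical, chosen only to match the magnitudes discussed in the text:

```python
# If the level 3 variance estimate is biased downwards by about 29%,
# then approximately  sigma2_v_hat ~= sigma2_v_true * (1 - 0.29),
# so an approximate correction is  sigma2_v_true ~= sigma2_v_hat / 0.71.
sigma2_v_hat = 0.05      # hypothetical full-model estimate of sigma^2_v
downward_bias = 0.29     # simulation-based relative bias for this scenario
sigma2_v_corrected = sigma2_v_hat / (1 - downward_bias)
print(round(sigma2_v_corrected, 3))  # -> 0.07
```

This is only a rule of thumb: the simulated bias itself varies with the ICC and the model specification, as the surrounding text notes.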

The simulation results reveal that the distribution of the estimated level 3 control, β5, shows large variability for all values of Nk. It is worth stressing, however, that the coefficient of daily contact would remain statistically significant in our application even if its relative bias were equal to the 10th or the 90th percentile of the relative bias distribution, and after accounting for the 14% underestimation of the standard error.

In addition to the baseline model specification in equation (A.1), we replicate the simulation exercise (varying the level 2 and 3 variances as in the baseline scenario) for two alternative model specifications: a null model (as in model 0 of Table 5) and a model without the binary level 3 control (as in model 3 of Table 5). The rationale is to understand whether estimation accuracy changes with the number and level of controls included, and to provide some evidence on how the parameter bias should be expected to change across the model specifications of Table 5.

In Table 11 we report the simulation results for the null model with σ²_u = 1.2 and σ²_v = 0.15 (values close to those estimated in the second column of Table 5). The results are very similar when the model without the level 3 control is considered instead. The downward bias in the estimation of the level 3 variance is reduced by about 50%, and the same is true for the standard error bias. In particular, with 10 level 3 units the negative parameter bias of σ²_v is between 11% and 18% and the standard error bias is between 6% and 9% when we let the ICC vary within the specified range. This provides a rule of thumb for measuring the downward bias in higher level variances for the various model specifications reported in Table 7. This result is somewhat intuitive if we think that we need a 'large' sample size to ensure consistency and efficiency of regression model parameter estimates. This extends also to level 3 parameters: we need a large number of groups, i.e. more information to exploit, to estimate additional level 3 effects reliably (Bryan and Jenkins, 2016).

We should point out that the 95% confidence interval that we use to derive the non-coverage rate is obtained from the inversion of the Wald test (as is normally done in Stata). Berkhof and Snijders (2001) showed that the Wald test has low power in the context of variance component tests and should not be used to test for variance component significance. In fact, the Wald test relies on the assumption of asymptotic normality of


the maximum likelihood estimator and this is problematic when the random-effect variance is considered,

in particular if its value is close to 0, as 0 lies on the boundary of the parameter space (Maas and Hox, 2005).
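The boundary problem is easy to see numerically: for a small variance estimate, a symmetric Wald interval spills below 0, the edge of the parameter space. The numbers here are illustrative only, not taken from the paper:

```python
# Hypothetical variance-component estimate and standard error.
sigma2_v_hat = 0.05
se = 0.03

# Symmetric 95% Wald interval around the estimate.
lower = sigma2_v_hat - 1.96 * se
upper = sigma2_v_hat + 1.96 * se

# The lower limit is negative, i.e. outside the admissible range for a
# variance, which is why Wald-based inference is unreliable near 0.
print(lower < 0)  # -> True
```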

Bottai (2003) examined the asymptotic behaviour of confidence intervals in the case in which information is zero at a critical point of the parameter space. He compared several ways of deriving confidence intervals (inversion of the log-likelihood ratio test, of the Wald test and of the score test) and found that score-test-based confidence intervals, which use expected information (instead of observed information), perform best. As stressed in Bottai and Orsini (2004), the problem of inference about the variance of the random effect can be accommodated in this more general framework because, when the variance component is 0, the score function is identically 0, and information is zero.

Bottai and Orsini (2004) developed the Stata routine xtvc, which allows one to test the null hypothesis that the random-effect variance is equal to a specific value (including 0) and computes 'corrected' confidence intervals based on the inversion of the score test. This routine works for random-effects linear regression models and can be used after the xtreg command in Stata. The simulation results presented in that paper show that the observed rejection rate is close to the nominal 5% level, regardless of the number of groups considered. The confidence interval obtained by inverting the score test is 'slightly shifted to include greater values' (Bottai and Orsini (2004), page 432) with respect to the Wald confidence interval. In our simulations we use Wald-based confidence intervals, as is normally done in the literature (see for example the recent contribution by Vassallo et al. (2017)), even though they may be inaccurate as the variance at level 3 is set to relatively small values. (Note that in our main model specification we test variance component significance by using the adjusted likelihood ratio test (Section 5.2).) Possible strategies to assess the level of inaccuracy would be extending Bottai and Orsini's routine to multilevel logit models, adopting a parametric bootstrap strategy, or relying on alternative estimation procedures such as a Bayesian Markov chain Monte Carlo algorithm.

Generalizing Bottai and Orsini's (2004) routine to multilevel logit models requires working with a marginal likelihood (in which the random effects are integrated out) that in this case has no closed form, which makes such a procedure computationally expensive. A parametric bootstrap method could be used to construct 95% confidence intervals for the variance components (Kuk, 1995; Goldstein, 1996), but this would be even more computationally expensive: for each iteration of the simulation process, samples would be drawn from the model evaluated at the current parameter estimates, the model would then be re-estimated for each sample, and the confidence intervals would be constructed from the distribution of the parameter estimates.
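The parametric bootstrap just described can be sketched as follows. This is a simplified illustration, not the paper's procedure: a toy one-way Gaussian random-effects model stands in for the three-level logit, and a method-of-moments fit stands in for maximum likelihood; the group structure and all numbers are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_variance_components(y):
    """Method-of-moments fit of a balanced one-way random-effects model
    y[j, i] = mu + u_j + e_ij with J groups of n observations each."""
    J, n = y.shape
    group_means = y.mean(axis=1)
    # Pooled within-group variance estimates sigma2_e.
    sigma2_e = y.var(axis=1, ddof=1).mean()
    # Var(group mean) = sigma2_u + sigma2_e / n, so back out sigma2_u.
    sigma2_u = max(group_means.var(ddof=1) - sigma2_e / n, 0.0)
    return y.mean(), sigma2_u, sigma2_e

def parametric_bootstrap_ci(y, B=500, alpha=0.05):
    """Percentile CI for the group-level variance sigma2_u: draw each
    bootstrap sample from the model evaluated at the estimates, refit,
    and take quantiles of the refitted estimates."""
    J, n = y.shape
    mu, sigma2_u, sigma2_e = fit_variance_components(y)
    boot = np.empty(B)
    for b in range(B):
        u = rng.normal(0.0, np.sqrt(sigma2_u), size=(J, 1))
        e = rng.normal(0.0, np.sqrt(sigma2_e), size=(J, n))
        boot[b] = fit_variance_components(mu + u + e)[1]
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

# Toy data: 10 groups of 50, true sigma2_u = 1 and sigma2_e = 1.
u = rng.normal(0.0, 1.0, size=(10, 1))
y = 0.5 + u + rng.normal(0.0, 1.0, size=(10, 50))
lo, hi = parametric_bootstrap_ci(y)
```

In the multilevel logit setting each refit would itself require numerical integration of the random effects, which is what makes the full procedure so costly.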

Alternatively, a Bayesian Markov chain Monte Carlo algorithm (with non-informative priors to ease comparability with maximum likelihood) could be used to perform the entire analysis. Such an algorithm, which would also entail an extra computational burden to achieve convergence, would directly provide confidence intervals for parameter estimates based on the posterior distributions. We know from Rodríguez and Goldman (2001) that in three-level logistic models parameter estimates obtained by full maximum likelihood and by Bayesian estimation are similar when the random-effects variances are large. To the best of our knowledge, a Bayesian Markov chain Monte Carlo procedure for the three-level logit model in the case where at least one variance is small has not been implemented in the literature; we therefore leave further investigation of this issue to future research.

To the extent that we can draw on the existing literature, we can expect the confidence intervals for random-effects variances based on the Wald test to be smaller and shifted towards 0 (see for example Turner et al. (2001) and Browne and Draper (2006)).

References

Berkhof, J. and Snijders, T. (2001) Variance component testing in multilevel models. J. Educ. Behav. Statist., 26, 133–152.

Blom, A. G. (2012) Explaining cross-country differences in survey contact rates: application of decomposition methods. J. R. Statist. Soc. A, 175, 217–242.

Blom, A. G., de Leeuw, E. D. and Hox, J. J. (2011) Interviewer effects on nonresponse in the European Social Survey. J. Off. Statist., 27, 359–377.

Blom, A. G., Lynn, P. and Jäckle, A. (2008) Understanding cross-national differences in unit non-response: the role of contact data. Working Paper 2008-01. Institute for Social and Economic Research, University of Essex, Colchester.

Börsch-Supan, A., Brandt, M., Hunkler, C., Kneip, T., Korbmacher, J., Malter, F., Schaan, B., Stuck, S. and Zuber, S. (2013) Data resource profile: the Survey of Health, Ageing and Retirement in Europe (SHARE). Int. J. Epidem., 42, 992–1001.


Bottai, M. (2003) Confidence regions when the Fisher information is zero. Biometrika, 90, 73–84.

Bottai, M. and Orsini, N. (2004) Confidence intervals for the variance component of random-effects linear models. Stata J., 4, 429–435.

Branden, L., Gritz, R. M. and Pergamit, M. R. (1995) The effect of interview length on attrition in the National Longitudinal Survey of Youth. Report NLS 95-28. Bureau of Labor Statistics, Washington DC.

Browne, W. J. and Draper, D. (2006) A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Baysn Anal., 1, 473–514.

Bryan, M. L. and Jenkins, S. P. (2016) Multilevel modelling of country effects: a cautionary tale. Eur. Sociol. Rev., 32, 3–22.

Burton, J., Laurie, H. and Moon, N. (1999) Don't ask me nothin' about nothin', I just might tell you the truth—the interaction between unit nonresponse and item nonresponse. Int. Conf. Survey Nonresponse, Portland.

Campanelli, P. and O'Muircheartaigh, C. (1999) Interviewers, interviewer continuity, and panel survey response. Qual. Quant., 33, 59–76.

Campanelli, P. and O'Muircheartaigh, C. (2002) The importance of experimental control in testing the impact of interviewer continuity on panel survey nonresponse. Qual. Quant., 36, 129–144.

Couper, M. P. and Kreuter, F. (2013) Using paradata to explore item level response times in surveys. J. R. Statist. Soc. A, 176, 271–286.

Davis, P. and Scott, A. (1995) The effect of interviewer variance on domain comparisons. Surv. Methodol., 21, 99–106.

Durrant, G. B. and D'Arrigo, J. (2014) Doorstep interactions and interviewer effects on the process leading to cooperation or refusal. Sociol. Meth. Res., 43, 490–518.

Durrant, G. B., Groves, R. M., Staetsky, L. and Steele, F. (2010) Effects of interviewer attitudes and behaviors on refusal in household surveys. Publ. Opin. Q., 74, 1–36.

Durrant, G. and Kreuter, F. (2013) The use of paradata in social survey research. J. R. Statist. Soc. A, 176, 1–3.

Durrant, G. B. and Steele, F. (2009) Multilevel modelling of refusal and non-contact in household surveys: evidence from six UK Government surveys. J. R. Statist. Soc. A, 172, 361–381.

Fielding, A. (2004) Scaling for residual variance components of ordered category responses in generalised linear mixed multilevel models: quality and quantity. Eur. J. Methodol., 38, 425–433.

Fricker, S., Creech, B., Davis, J., Gonzalez, J., Tan, L. and To, N. (2012) Does length really matter?: Exploring the effects of a shorter interview on data quality, nonresponse, and respondent burden. Federal Committee on Statistical Methodology Research Conf., Washington DC.

Goldstein, H. (1996) Consistent estimators for multilevel generalised linear models using an iterated bootstrap. Multilev. Modllng Newslett., 8, 3–6.

Goldstein, H. (2011) Multilevel Statistical Models, 4th edn. Chichester: Wiley.

Goldstein, H. and Rasbash, J. (1996) Improved approximations for multilevel models with binary responses. J. R. Statist. Soc. A, 159, 505–513.

Goyder, J. (1987) The Silent Minority: Nonrespondents on Sample Surveys. Boulder: Westview.

Greene, W. (2004) The behaviour of the maximum likelihood estimator of limited dependent variable models in the presence of fixed effects. Econmetr. J., 7, 98–119.

Groves, R. M., Cialdini, R. B. and Couper, M. (1992) Understanding the decision to participate in a survey. Publ. Opin. Q., 56, 475–495.

Groves, R. M. and Couper, M. P. (1998) Nonresponse in Household Interview Surveys. New York: Wiley.

Hill, D. H. and Willis, R. J. (2001) Reducing panel attrition: a search for effective policy instruments. J. Hum. Resour., 36, 416–438.

Hox, J. J. (2010) Multilevel Analysis: Techniques and Applications, 2nd edn. New York: Routledge.

Hox, J. J. and de Leeuw, E. (2002) The influence of interviewers' attitude and behavior on household survey nonresponse: an international comparison. In Survey Nonresponse (eds R. M. Groves, D. A. Dillman, J. L. Eltinge and R. J. A. Little). New York: Wiley.

Jäckle, A., Lynn, P., Sinibaldi, J. and Tipping, S. (2013) The effect of interviewer experience, attitudes, personality and skills on respondent co-operation with face-to-face surveys. Surv. Res. Meth., 7, 1–15.

Kim, Y., Choi, Y.-K. and Emery, S. (2013) Logistic regression with multiple random effects: a simulation study of estimation methods and statistical packages. Am. Statistn, 67, 171–182.

Kneip, T. (2013) Survey participation in the fourth wave of SHARE. In SHARE Wave 4: Innovations and Methodology (eds F. Malter and A. Börsch-Supan), pp. 140–155. Munich: Munich Center for the Economics of Aging.

Korbmacher, J. M. and Schröder, M. (2013) Consent when linking survey data with administrative records: the role of the interviewer. Surv. Res. Meth., 7, 115–131.

Krause, N. (1993) Neighbourhood deterioration and social isolation in later life. Int. J. Agng Hum. Devlpmnt, 36, 9–38.

Kreuter, F. (2013) Improving Surveys with Paradata: Analytic Uses of Process Information. Hoboken: Wiley.

Kreuter, F., Couper, M. P. and Lyberg, L. E. (2010) The use of paradata to monitor and manage survey data collection. Proc. Surv. Res. Meth. Sect. Am. Statist. Ass., 282–296.

Krosnick, J. A. (1991) Response strategies for coping with the cognitive demands of attitude measures in surveys. Appl. Cogn. Psychol., 5, 213–236.


Kuk, A. Y. C. (1995) Asymptotically unbiased estimation in generalized linear models with random effects. J. R. Statist. Soc. B, 57, 395–407.

Lemay, M. and Durand, C. (2002) The effect of interviewer attitude on survey cooperation. Bull. Methodol. Sociol., 76, 27–44.

Lepkowski, J. M. and Couper, M. P. (2002) Nonresponse in the second wave of longitudinal household surveys. In Survey Nonresponse (eds R. M. Groves, D. A. Dillman, J. L. Eltinge and R. J. A. Little). New York: Wiley.

Lipps, O. and Benson, G. (2005) Cross national contact strategies. Proc. Surv. Res. Meth. Sect. Am. Statist. Ass.

Lipps, O. and Pollien, A. (2011) Effects of interviewer experience on components of nonresponse in the European Social Survey. Fld Meth., 23, 156–172.

Loosveldt, G. and Beullens, K. (2013) The impact of respondents and interviewers on interview speed in face-to-face interviews. Socl Sci. Res., 42, 1422–1430.

Loosveldt, G., Pickery, J. and Billiet, J. (2002) Item nonresponse as a predictor of unit nonresponse in a panel survey. J. Off. Statist., 18, 545–557.

Lugtig, P. (2014) Panel attrition: separating stayers, fast attriters, gradual attriters, and lurkers. Sociol. Meth. Res., 43, 699–723.

Lynn, P. (2013) Longer interviews may not affect subsequent survey participation propensity. Understanding Society Working Paper Series 2013-07. Institute for Social and Economic Research, University of Essex, Colchester.

Lynn, P., Kaminska, O. and Goldstein, H. (2014) Panel attrition: how important is it to keep the same interviewer? J. Off. Statist., 30, 434–457.

Maas, C. and Hox, J. (2004) Robustness issues in multilevel regression analysis. Statist. Neerland., 58, 127–137.

Maas, C. J. M. and Hox, J. J. (2005) Sufficient sample sizes for multilevel modeling. Methodology, 1, 86–92.

Malter, F. and Börsch-Supan, A. (eds) (2013) SHARE Wave 4: Innovations & Methodology. Munich: Munich Center for the Economics of Aging.

Moore, J., Stinson, L. and Welniak, E. (2000) Income measurement error in surveys: a review. J. Off. Statist., 16, 331–361.

Nicoletti, C. and Peracchi, F. (2005) Survey response and survey characteristics: microlevel evidence from the European Community Household Panel. J. R. Statist. Soc. A, 168, 763–781.

O'Muircheartaigh, C. and Campanelli, P. (1999) A multilevel exploration of the role of interviewers in survey non-response. J. R. Statist. Soc. A, 162, 437–446.

Paccagnella, O. (2011) Sample size and accuracy of estimates in multilevel models: new simulation results. Methodology, 7, no. 3, 111–120.

Perera, A. A. P. N. M., Sooriyarachchi, M. R. and Wickramsuriya, S. L. (2016) A goodness of fit test for the multilevel logistic model. Communs Statist. Simuln Computn, 45, 643–659.

Pickery, J. and Loosveldt, G. (2002) A multilevel multinomial analysis of interviewer effects on various components of unit nonresponse. Qual. Quant., 36, 427–437.

Pickery, J., Loosveldt, G. and Carton, A. (2001) The effects of interviewer and respondent characteristics on response behavior in panel surveys: a multilevel approach. Sociol. Meth. Res., 29, 509–523.

Rabe-Hesketh, S. and Skrondal, A. (2005) Multilevel and Longitudinal Modeling using Stata, 2nd edn. College Station: Stata Press.

Rodríguez, G. and Goldman, N. (1995) An assessment of estimation procedures for multilevel models with binary responses. J. R. Statist. Soc. A, 158, 73–89.

Rodríguez, G. and Goldman, N. (2001) Improved estimation procedures for multilevel models with binary responses: a case-study. J. R. Statist. Soc. A, 164, 339–355.

Self, S. G. and Liang, K. Y. (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Am. Statist. Ass., 82, 605–610.

Sharp, L. M. and Frankel, J. (1983) Respondent burden: a test of some common assumptions. Publ. Opin. Q., 47, 36–53.

Schröder, M. (2011) Retrospective data collection in the Survey of Health, Ageing and Retirement in Europe: SHARELIFE methodology. Munich Center for the Economics of Aging, Munich.

Skrondal, A. and Rabe-Hesketh, S. (2004) Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Boca Raton: Chapman and Hall–CRC.

Turner, R. M., Omar, R. Z. and Thompson, S. G. (2001) Bayesian methods of analysis for cluster randomized trials with binary outcome data. Statist. Med., 20, 453–472.

Vassallo, R., Durrant, G. and Smith, P. W. F. (2017) Separating interviewer and area effects by using a cross-classified multilevel logistic model: simulation findings and implications for survey designs. J. R. Statist. Soc. A, 180, 531–550.

Vassallo, R., Durrant, G. B., Smith, P. W. F. and Goldstein, H. (2015) Interviewer effects on non-response propensity in longitudinal surveys: a multilevel modelling approach. J. R. Statist. Soc. A, 178, 83–99.

Watson, N. and Wooden, M. (2009) Identifying factors affecting longitudinal survey response. In Methodology of Longitudinal Surveys (ed. P. Lynn). Chichester: Wiley.

West, B. T. and Blom, A. G. (2017) Explaining interviewer effects: a research synthesis. J. Surv. Statist. Methodol., 5, 175–211.