
Quality & Quantity (2019) 53:1051–1074

https://doi.org/10.1007/s11135-018-0802-x


Fixed and random effects models: making an informed choice

Andrew Bell1 · Malcolm Fairbrother2 · Kelvyn Jones3

Published online: 7 August 2018

© The Author(s) 2018

Abstract
This paper assesses the options available to researchers analysing multilevel (including longitudinal) data, with the aim of supporting good methodological decision-making. Given the confusion in the literature about the key properties of fixed and random effects (FE and RE) models, we present these models' capabilities and limitations. We also discuss the within-between RE model, sometimes misleadingly labelled a 'hybrid' model, showing that it is the most general of the three, with all the strengths of the other two. As such, and because it allows for important extensions (notably random slopes) we argue it should be used (as a starting point at least) in all multilevel analyses. We develop the argument through simulations, evaluating how these models cope with some likely mis-specifications. These simulations reveal that (1) failing to include random slopes can generate anti-conservative standard errors, and (2) assuming random intercepts are Normally distributed, when they are not, introduces only modest biases. These results strengthen the case for the use of, and need for, these models.

Keywords Multilevel models · Fixed effects · Random effects · Mundlak · Hybrid models · Within and between effects

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11135-018-0802-x) contains supplementary material, which is available to authorized users.

* Andrew Bell
andrew.j.d.bell@sheffield.ac.uk

Malcolm Fairbrother
Malcolm.fairbrother@umu.se

Kelvyn Jones
kelvyn.jones@bristol.ac.uk

1 Sheffield Methods Institute, University of Sheffield, ICOSS, 219 Portobello, Sheffield S1 4DP, UK
2 Sociology Department, Umeå University, Hus Y, Beteendevetarhuset, Mediagränd 14, 901 87 Umeå, Sweden
3 School of Geographical Sciences and Centre for Multilevel Modelling, University of Bristol, University Road, Bristol BS8 1SS, UK

Content courtesy of Springer Nature, terms of use apply. Rights reserved.


1 Introduction

Analyses of data with multiple levels, including longitudinal data, can employ a variety of different methods. However, in our view there is significant confusion regarding these methods. This paper therefore presents and clarifies the differences between two key approaches: fixed effects (FE) and random effects (RE) models. We argue that in most research scenarios, a well-specified RE model provides everything that FE provides and more, making it the superior method for most practitioners (see also Shor et al. 2007; Western 1998). However, this view is at odds with the common suggestion that FE is often preferable (e.g. Vaisey and Miles 2017), if not the "gold standard" (e.g. Schurer and Yong 2012). We thus address widespread misunderstandings about FE and RE models, such as those arising from the literature's use of confusing terminology (including the phrase 'random effects' itself; see for example Gelman 2005) and/or from different disciplines' contradictory approaches to the same important methodological questions.

In addition to this synthesis of the inter-disciplinary methodological literature on FE and RE models (information that, whilst often misunderstood, is not new), we present an original simulation study showing how various forms of these models respond in the presence of some plausible model mis-specifications. The simulations show that estimated standard errors are anti-conservative when random-slope variation exists but a model does not allow for it. They also show the robustness of estimation results to mis-specification of random effects as Normally distributed, when they are not; substantial biases are confined to variance and random effect estimates in models with a non-continuous response variable.

The paper begins by outlining what both FE and RE aim to account for: clustering or dependence in a dataset, and differing relationships within and between clusters. We then present our favoured model: a RE model that allows for distinct within and between effects,1 which we abbreviate "REWB", with heterogeneity modelled at both the cluster (level 2) and observation (level 1) levels. Focussing first on the fixed part of the model, we show how the more commonly used FE, RE and pooled OLS models can be understood as constrained and more limited versions of this model; indeed, REWB is our favoured model because of its encompassing nature. Section 3 of this paper focuses on the different treatment of level-2 entities in FE and RE models, and some of the advantages of the RE approach. In Sect. 4, we consider some important extensions to the REWB model that cannot be as effectively implemented under a FE or Ordinary Least Squares framework: 'random slopes' allowing the associations between variables to vary across higher-level entities, further spatial and temporal levels of analysis, and explicit modelling of complex level 1 heteroscedasticity. We show that implementing these extensions can often be of paramount importance and can make results more nuanced, accurate, and informative. Section 5 then considers models with a non-continuous response variable, and some of the distinct challenges that such data present, before considering the assumptions made by the RE model and the extent to which it matters when those assumptions are violated. The article concludes with some practical advice for researchers deciding what model they should use and how.

1 The use of the term 'effect' in the phrases 'within effect', 'between effect' and 'contextual effect' should not imply that these should necessarily be interpreted as causal. This caution applies to the phrases 'random effects' and 'fixed effects' as well.


2 Within, between and contextual effects: conceptualising the fixed part of the model

Social science datasets often have complex structures, and these structures can be highly relevant to the research question at hand, not merely a convenience in the research design that has become a nuisance in the analysis. Often, observations (at level 1) are clustered into groups of some kind (at level 2). Such two-level data structures are the main focus of this paper, though data are sometimes grouped at further levels, yielding three (or more) levels. Some of the most common multilevel structures are outlined in Table 1. In broad terms, these can be categorised into two types: cross-sectional data, where individuals are nested within a geographical or social context (e.g. individuals at level 1, within schools or countries at level 2), and longitudinal data, where individuals or social units are measured on a number of occasions. In the latter context, this means occasions (at level 1) are nested within the individual or entity (now at level 2). In all cases these structures represent real and interesting societal configurations; they are not simply a technicality or consequence of survey methodology, as the population may itself be structured by social processes, distinctions, and inequalities.

Structures are important in part because variables can be related at more than one level in a hierarchy, and the relationships at different levels are not necessarily equivalent. Cross-sectionally, for example, some social attitude (Y) may be related to an individual's income X (at level 1) very differently than to the average income in their neighbourhood, country, or region (level 2). A classic example of this comes from American politics: American states with higher incomes tend to elect more Democratic than Republican politicians, but within states richer voters tend to support Republican rather than Democratic candidates (Gelman 2008).

Longitudinally, people might be affected by earning what is, for them, an unusually high annual income (level 1) in a different way than they are affected by being high earners generally across all years (level 2). The same can hold for whole societies: Europeans, for example, demand more income redistribution from their governments in times of greater inequality, relative to the average for their country, even though people in consistently more unequal countries do not generally demand more redistribution (Schmidt-Catran 2016). Thus, we can have "within" effects that occur at level 1, and "between" or "contextual" effects that occur at level 2 (Howard 2015), and these three different effects should not be assumed to be the same.

Sometimes it is the case that within effects are of the greatest interest, especially when policy interventions are evaluated. With panel data, for example, within effects can capture the effect of an independent variable changing over time. Many studies have argued for focusing on these longitudinal relationships because unobserved, time-invariant differences between the level 2 entities are then controlled for (Allison 1994; Halaby 2004; see Sect. 2.3). Christmann (2018), for example, shows that people are more satisfied with the functioning of democracy in their country during times of good economic performance, a within-country effect that shows the value of improving economic performance.

Yet between effects in longitudinal studies are often equally illuminating, despite being by definition non-changing, as evidenced by the many published studies that rely exclusively on cross-sectional data. Similarly, in cross-sectional studies, the effects of wider social contexts on individuals can be extremely relevant. Social science is concerned with understanding the world as it exists, not just dynamic changes within it. Thus with a panel dataset, for example, it will often be worth modelling associations at the higher level,


Table 1 Some hierarchical structures of data common in social science

| Broad category | Data type | Level 1 | Level 2 | Level 3 |
| Cross-sectional | Clustered survey data (Maimon and Kuhl 2008) | Individuals | Neighbourhoods | – |
| Cross-sectional | Cross-national survey data (Ruiter and van Tubergen 2009) | Individuals | Countries | – |
| Cross-sectional | Surveys with multiple items (Deeming and Jones 2015; Sampson et al. 1997) | Items | Individuals | – |
| Panel | Country time-series cross-sectional data (Beck and Katz 1995; Western 1998) | Occasions | Countries | – |
| Panel | Individual panel data (Lauen and Gaddis 2013) | Occasions | Individuals | – |
| Panel at level 1, cross-sectional at level 2 | Panel data on individuals who are clustered (Kloosterman et al. 2010) | Occasions | Individuals | Schools |
| Cross-sectional at level 1, panel at level 2 | Comparative longitudinal survey data (Fairbrother 2014; Schmidt-Catran and Spies 2016), or repeated cross-sectional data (Duncan et al. 1996) | Individuals | Country-years/region-years | Countries/regions |

For more elaboration of hierarchical and non-hierarchical structures, see Rasbash (2008)


in order to understand the ways in which individuals differ, not just the ways in which they change over time (see, for example, Subramanian et al. 2009). We take it as axiomatic that we need both micro and macro associations to understand the whole of 'what is going on'.

2.1 The most general: within-between RE and Mundlak models

We now outline some statistical models that aim to represent these processes. Taking a panel data example, where individuals i (level 2) are measured on multiple occasions t (level 1), we can conceive of the following model, the most general of the models that we consider in this paper. This specification is able to model both within- and between-individual effects concurrently, and also explicitly models heterogeneity in the effect of predictor variables at the individual level:

y_{it} = \mu + \beta_{1W}(x_{it} - \bar{x}_i) + \beta_{2B}\bar{x}_i + \beta_3 z_i + \upsilon_{i0} + \upsilon_{i1}(x_{it} - \bar{x}_i) + \epsilon_{it0}    (1)

Here y_{it} is the dependent variable, x_{it} is a time-varying (level 1) independent variable, and z_i is a time-invariant (level 2) independent variable. The variable x_{it} is divided into two parts, each with a separate effect: \beta_{1W} represents the average within effect of x_{it}, whilst \beta_{2B} represents the average between effect of x_{it}.2 The \beta_3 parameter represents the effect of the time-invariant variable z_i, and is therefore in itself a between effect (level 2 variables cannot have within effects, since there is no variation within higher-level entities). Further variables could be added as required.

The random part of the model includes two terms at level 2, a random effect (\upsilon_{i0}) attached to the intercept and a random effect (\upsilon_{i1}) attached to the within slope, that between them allow heterogeneity in the within effect of x_{it} across individuals. Each of these is usually assumed to be Normally distributed (as discussed later in this paper). We will demonstrate in Sect. 4 that specifying heterogeneity at level 2 (with the \upsilon_{i1} term in Eq. 1) can be important for avoiding biases, in particular in standard errors, and this is a key problem with FE and 'standard' RE models. However, to clarify the initial arguments of the first part of this paper, we consider a simplified version of this model that assumes homogeneous effects across level 2 entities:

y_{it} = \beta_0 + \beta_{1W}(x_{it} - \bar{x}_i) + \beta_{2B}\bar{x}_i + \beta_3 z_i + (\upsilon_i + \epsilon_{it})    (2)

Here \upsilon_i are the model's (homogeneous) random effects for individuals i, which are assumed to be Normally distributed. The \epsilon_{it} are the model's (homoscedastic) level 1 residuals, which are also assumed to be Normally distributed (we will discuss models for non-Gaussian outcomes, with different distributional assumptions, later).

An alternative parameterisation to Eq. 2 (with the same distributional assumptions) is the 'Mundlak' formulation (Mundlak 1978):

y_{it} = \beta_0 + \beta_{1W}x_{it} + \beta_{2C}\bar{x}_i + \beta_4 z_i + (\upsilon_i + \epsilon_{it})    (3)

2 Note that the variable \bar{x}_i associated with \beta_2 could be calculated using only observations for which there is a full data record, though if more data exist these could be included in the calculation of \bar{x}_i, to improve the estimate of \beta_2. Alternatively, calculating (x_{it} - \bar{x}_i) with only observations included in the model ensures \beta_1 is estimated using only within-unit variation. In practice, the difference between these modelling choices is usually negligible.


Here x_{it} is included in its raw form rather than in de-meaned form (x_{it} - \bar{x}_i). Instead of the between effect \beta_{2B}, the Mundlak model estimates the "contextual effect" \beta_{2C}. The key difference between the two, as spelled out both graphically and algebraically by Raudenbush and Bryk (2002:140), is that the raw value of the time-varying predictor (x_{it}) is controlled for in the estimate of the contextual effect in Eq. 3, but not in the estimate of the between effect in Eq. 2. Thus if the research question at hand is "what is the effect of a (level 1) individual moving from one level-2 entity to another?", the contextual effect (\beta_{2C}) is of more interest, since it holds the level 1 individual characteristics constant. In contrast, if we simply want to know "what is the effect of changing the level of \bar{x}_i, without keeping the level of x_{it} constant?", the between effect (\beta_{2B}) will provide the answer. With longitudinal data, the contextual effect is fairly meaningless: it does not make sense for an observation (level 1) to move from one (level 2) individual to another, because observations by definition belong to a specific individual. It therefore makes little sense to control for those observations in estimating the level 2 effect. As such, the between effect, and thus the REWB model, is generally more informative. When using cross-sectional data, the contextual effect is of interest (since we can imagine level 1 individuals moving between level 2 entities without altering their own characteristics). It can thus measure the additional effect of the level 2 entity, once the individual-level characteristic has been accounted for. The between effect can also be interpreted, but a significant effect could be produced as a result of the composition of level 1 entities, without a country-level construct driving the effect. Note, however, that these models are equivalent, since \beta_{1W} + \beta_{2C} = \beta_{2B}; each model conveys the same information, will fit the data equally well, and we can obtain one from the other with some simple arithmetic.3

In a rare recent example using cross-sectional international survey data, Fairbrother (2016) studied public attitudes towards environmental protection, allowing for separate but simultaneous tests, both among and within countries, of the associations between key attitudinal variables. This permitted the identification of political trust as an especially critical correlate of greater support for environmental protection at both the individual and national level, an important discovery in the substantive literature.

Both the Mundlak model and the within-between random effects (REWB) model (Eqs. 3 and 2, respectively) are easy to fit in all major software packages (e.g. R, Stata and SAS, as well as more specialist software like HLM and MLwiN). They are simply random effects models with the mean of x_{it} included as an additional explanatory variable (Howard 2015).
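For instance, fitting Eqs. 2 and 3 amounts to adding the group mean of x_{it} (and, for REWB, the de-meaned x_{it}) to an ordinary random-intercept model. The following is a minimal sketch using Python's statsmodels MixedLM on simulated data; the variable names and the simulated effect sizes (a within effect of 0.5, a between effect of 2.0) are our own illustration, not from the paper:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated balanced panel: 100 individuals (level 2) x 5 occasions (level 1)
n_id, n_t = 100, 5
ids = np.repeat(np.arange(n_id), n_t)
x_bar = rng.normal(size=n_id)                 # individual-level component of x
x = x_bar[ids] + rng.normal(size=n_id * n_t)  # occasion-level x_it
u = rng.normal(scale=0.8, size=n_id)          # random intercepts v_i
y = 0.5 * (x - x_bar[ids]) + 2.0 * x_bar[ids] + u[ids] + rng.normal(size=n_id * n_t)

df = pd.DataFrame({"id": ids, "x": x, "y": y})
df["x_mean"] = df.groupby("id")["x"].transform("mean")  # \bar{x}_i
df["x_dev"] = df["x"] - df["x_mean"]                    # x_it - \bar{x}_i

# REWB (Eq. 2): separate within and between terms, random intercept by id
rewb = smf.mixedlm("y ~ x_dev + x_mean", df, groups=df["id"]).fit()

# Mundlak (Eq. 3): raw x plus its group mean; x_mean is now the contextual effect
mundlak = smf.mixedlm("y ~ x + x_mean", df, groups=df["id"]).fit()

# The two parameterisations are equivalent: beta_1W + beta_2C = beta_2B
print(rewb.params["x_dev"], rewb.params["x_mean"])    # within, between
print(mundlak.params["x"], mundlak.params["x_mean"])  # within, contextual

# A test on the contextual effect doubles as the within-vs-between equality test
print(mundlak.pvalues["x_mean"])
```

With the simulated within and between effects deliberately unequal, the contextual-effect p-value comes out very small; a fixed effects estimator (OLS on de-meaned data) would return the same within coefficient as the REWB fit here.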

2.2 Constraining the within-between RE model: fixed effects, random effects and OLS

Having established our 'encompassing' model in its two alternative forms (Mundlak, and within-between), we now present three models that are currently more often used. Showing how each of these is a constrained version of Eqs. 2 or 3 above, we demonstrate the disadvantages of choosing any of them instead of the more general and informative REWB specification.

3 One potential advantage of the within-between model over the Mundlak specification is that there will be zero correlation between \bar{x}_i and (x_{it} - \bar{x}_i), which can facilitate model convergence. Furthermore, if there is problematic collinearity between multiple \bar{x}_i's, some or all of these can be omitted without affecting the estimates of \beta_1.


2.2.1 Random effects without within and between separation

One commonly used model uses the random effects framework, but does not estimate separate relationships at each of the two levels:

y_{it} = \beta_0 + \beta^{RE}_{1}x_{it} + \beta^{RE}_{3}z_i + (\upsilon_i + \epsilon_{it})    (4)

This approach effectively assumes that \beta_{1W} = \beta_{2B}, or equivalently that \beta_{2C} = 0, in Eqs. 2 and 3 (Bell et al. 2018). Where this assumption is valid, this model is a good choice, and has benefits over the more general model. Specifically, the estimate of \beta^{RE}_{1} will be more efficient than the estimates of \beta_1 or \beta_{2B} in Eq. 2, because it can utilise variation at both the higher and lower levels (e.g. Fairbrother 2014; Halaby 2004). However, when \beta_1 \ne \beta_{2B}, the model will produce a weighted average of the two,4 which will have little substantive meaning (Raudenbush and Bryk 2002:138). Fortunately, it is easy to test whether the assumption of equal within and between effects is true, by testing the equality of the coefficients in the REWB model, or the significance of the contextual effect in the Mundlak model (for example via a Wald test). If there is a significant difference (and not just a between effect that is significantly different from zero), the terms should not be combined, and the encompassing within-between or Mundlak model should be used. This was done by Hanchane and Mostafa (2012), considering bias with this model for school (level 2) and student (level 1) performance. They found that in less selective school systems (Finland) there was little bias and a model like Eq. 4 was appropriate, whilst in more selective systems (the UK and Germany) the more encompassing model of Eq. 3 was necessary to take account of schools' contexts and estimate student effects accurately.

This is, in fact, what is effectively done by the oft-used 'Hausman test' (Hausman 1978). Although often (mis)used as a test of whether FE or RE models "should" be used (see Fielding 2004), it is really a test of whether there is a contextual effect, that is, of whether the between and within effects are different. This equates in the panel case to whether the changing within effect (e.g. for an effect of income: the effect of being unusually well paid, such as after receiving a non-regular bonus or a pay rise) is different from the cross-sectional effect (being well paid on average, over the course of the period of observation). Even when within and between effects are slightly different, it may be that the bias in the estimated effect is a price worth paying for the gains in efficiency, depending on the research question at hand (Clark and Linzer 2015). Either way, it is important to test whether the multilevel model in its commonly applied form of Eq. 4 is an uninterpretable blend of two different processes.

2.2.2 Fixed effects model

Depending on the field, perhaps the most commonly used and recommended method of dealing with differing within and between effects as outlined above is 'fixed effects'

4 Specifically, the estimate will be weighted as \beta_{ML} = (w_W\beta_W + w_B\beta_B)/(w_W + w_B), where \beta_W and \beta_B are the within and between effects, respectively (estimated as \beta_1 and \beta_{2B} in Eq. 2), w_W is the precision of the within estimate, that is w_W = 1/SE(\beta_W)^2, and w_B is the precision of the between estimate, w_B = 1/SE(\beta_B)^2. Given the larger sample size (and therefore higher precision) of the within estimate, the model will often tend towards the within estimate, although this would depend on the extent of the unexplained level 1 and 2 variation in the model.


modelling. This approach is equivalent to that represented in Eqs. 2 and 3, except that the \upsilon_i are specified as fixed effects: i.e. dummy variables are included for each higher-level entity (less a reference category) and the \upsilon_i are not treated as draws from any kind of distribution. The result is that between effects (associations at the higher level) cannot be estimated, and the model can be reduced to:

y_{it} = \beta_1(x_{it} - \bar{x}_i) + (\upsilon_i + \epsilon_{it})    (5)

Or reduced even further to:

(y_{it} - \bar{y}_i) = \beta_1(x_{it} - \bar{x}_i) + \epsilon_{it}    (6)

This is the model that most software packages actually estimate, such that they do not estimate the magnitudes of the fixed effects themselves. Thus, the model provides an estimate of the within effect \beta_1, which is not biased by between effects that are different from it.5 This is of course what is achieved by the REWB and Mundlak models: the REWB model employs precisely the same mean-centring as FE models. However, unlike the REWB and Mundlak specifications, the de-meaned FE specification reveals almost nothing about the level-2 entities in the model. This means that many research questions cannot be answered by FE, and it can only ever present a partial picture of the substantive phenomenon represented by the model. With panel data, for example, FE models can say nothing about relationships with independent variables that do not change over time, only about deviations from the mean over time. FE models therefore "throw away important and useful information about the relation between the explanatory and the explained variables in a panel" (Nerlove 2005, p. 20).

If a researcher has no substantive interest in the between effects, their exclusion is perhaps unimportant, though even in such a case, for reasons discussed below, we think there are still reasons to disfavour the FE approach as the one and only valid approach. To be clear, the REWB and Mundlak models will give exactly the same results for the within effect (coefficient and standard error) as the FE model (see Bell and Jones 2015 for simulations; Goetgeluk and Vansteelandt 2008 for a proof of consistency), but retain the between effect, which can be informative and cannot be obtained from a FE model.

2.2.3 Single level OLS regression

An even simpler option is to ignore the structure of the data entirely:

y_{it} = \beta_0 + \beta^{OLS}_{1}x_{it} + \beta^{OLS}_{4}z_i + \epsilon_{it}    (7)

Thus, we assume that all observations in the dataset are conditionally independent. This has two problems. First, as with the standard RE model, the estimate of \beta^{OLS}_{1} will be a potentially uninterpretable weighted average6 of the within and between effects (if they are not equal). Furthermore, if there are differences between level 2 entities (that is, if there are effects of unmeasured higher-level variables), standard errors will be estimated as if all observations are independent, and so will generally be underestimated, especially for parameters associated with higher-level variables, including between and contextual effects.7 Fortunately, the necessity of modelling the nested structure can readily be evaluated, by running the model both with and without the higher-level random effects and testing which is the better fitting model by a likelihood ratio test (Snijders and Bosker 2012:97), AIC, or BIC.

5 Note though that, in the longitudinal setting, between effects will only be fully controlled if those effects do not change over time (this is the case with the REWB/Mundlak models as well, unless such heterogeneity is explicitly modelled).

6 This will actually be a different weighted average to that produced by RE: it is weighted by the proportion of the variance in x_{it} that exists at each level, so where the within-unit variance of x_{it} is negligible, the estimate will be close to that of the between effect, and vice versa. More formally, \beta_{SL} = (1 - \rho_x)\beta_W + \rho_x\beta_B, where \rho_x is the proportion of the variance in x_{it} occurring at the higher level.
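The likelihood-ratio comparison of a random-intercept model against single-level OLS can be sketched as follows (our own toy illustration with simulated data, using Python's statsmodels). Note the ML rather than REML fit, so that the two log-likelihoods are comparable, and the halved p-value, since under the null the level-2 variance sits on the boundary of the parameter space:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(1)

# Simulated clustered data with a genuine level-2 variance component
n_id, n_t = 100, 5
ids = np.repeat(np.arange(n_id), n_t)
x = rng.normal(size=n_id * n_t)
y = 1.0 + 0.5 * x + rng.normal(size=n_id)[ids] + rng.normal(size=n_id * n_t)
df = pd.DataFrame({"id": ids, "x": x, "y": y})

ols = smf.ols("y ~ x", df).fit()
ri = smf.mixedlm("y ~ x", df, groups=df["id"]).fit(reml=False)  # ML for comparability

# LR statistic for the random intercept; halve the chi2(1) tail probability
# because sigma^2_u = 0 lies on the boundary under the null
lr = 2 * (ri.llf - ols.llf)
p = 0.5 * stats.chi2.sf(lr, df=1)
print(lr, p)  # a large LR and tiny p favour the random-intercept model
```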

2.3 Omitted variable bias in the within-between RE model

We hope the discussion above has convinced readers of the superiority of the REWB model, except perhaps when the within and between effects are approximately equal, in which case the standard RE model (without separated within and between effects) might be preferable for reasons of efficiency.8 Even then, the REWB model should be considered first, or as an alternative, since the equality of the within and between coefficients should not be assumed. As for FE, except for simplicity there is nothing that such models offer that a REWB model does not.

All of the models we consider here are subject to a variety of biases, for example if there is selection bias (Delgado-Rodríguez and Llorca 2004) or the direction of causality assumed by the model is wrong (e.g. see Bell, Johnston, and Jones 2015). Most significant for our present purposes is the possibility of omitted variable bias.

As with fixed effects models, the REWB specification prevents any bias on level 1 coefficients due to omitted variables at level 2. To put it another way, there can be no correlation between level 1 variables included in the model and the level 2 random effects; such biases are absorbed into the between effect, as confirmed by simulation studies (Bell and Jones 2015; Fairbrother 2014). When using panel data with repeated measures on individuals, unchanging and/or unmeasured characteristics of an individual (such as intelligence, ability, etc.) will be controlled out of the estimate of the within effect. However, unobserved time-varying characteristics can still cause biases at level 1 in either an FE or a REWB/Mundlak model. Similarly, in REWB/Mundlak models, unmeasured level 2 characteristics can cause bias in the estimates of between effects and effects of other level 2 variables.

This is a problem if we wish to know the direct causal effect of a level 2 variable: that is, what happens to Y when a level 2 variable increases or decreases, such as because of an intervention (Blakely and Woodward 2000). However, this does not mean that those estimated relationships are worthless. Indeed, often we are not looking for the direct, causal effect of a level 2 variable, but see these variables as proxies for a range of unmeasured social processes, which might include those omitted variables themselves. As an example, in a panel data structure, when considering the relationship between ethnicity (an unchanging, level 2 variable) and a dependent variable, we would not interpret any association found to be the direct causal effect of any particular genes or skin pigmentation; rather we

7 One could add a group mean variable to this equation, as in Eqs. 2 or 3. Whilst this would solve the issue of bias in the point estimates, standard errors would still be underestimated.

8 This is not necessarily the case, however: if there are substantive reasons for suspecting that the processes driving the two effects are different, then it makes sense to use SEs that treat the processes as separate. Moreover, it may be that subsequent elaboration of the model (addition of variables, etc.) would lead to within and between effects diverging; researchers are best served by being cautious about combining the two.


are interested in the effects of the myriad of unmeasured social and cultural factors that are related to ethnicity. If a direct genetic effect is what we are looking for, then our estimates are likely to be 'biased', but we hope most reasonable researchers would not interpret such coefficients in this way. As long as we interpret any coefficient estimates with these unmeasured variables in mind, and are aware that such reasoning is as much conceptual and theoretical as it is empirical, such coefficients can be of great value in helping us to understand patterns in the world through a model-based approach. Note that if we are, in fact, interested in a direct causal effect and are concerned by possible omitted variables, then instrumental variable techniques can sometimes be employed within the RE framework (for example, see Chatelain and Ralf 2018; Steele et al. 2007).

The logic above also applies to estimates of between and contextual effects. These aggregated variables are proxies for group-level characteristics that are to some extent unmeasured. As such, it is not a problem in our view that, in the case of panel data, future data are being used to form this variable and predict past values of the dependent variable; these values are being used to get the best possible estimate of the unchanging group-level characteristic. If researchers want these variables to be more accurately measured, they could be precision-weighted, to shrink them back towards the mean value for small groups (Grilli and Rampichini 2011; Shin and Raudenbush 2010).
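The precision-weighting idea can be sketched as follows (our own minimal illustration, not the cited estimators): a group's mean is pulled towards the grand mean in proportion to its unreliability, which grows as group size shrinks. The function name and the toy variance values are hypothetical.

```python
import numpy as np

def shrink_group_means(x, groups, var_within, var_between):
    """Precision-weight group means: reliability = var_b / (var_b + var_w / n_j)."""
    out = {}
    grand = x.mean()
    for g in np.unique(groups):
        xg = x[groups == g]
        reliability = var_between / (var_between + var_within / len(xg))
        out[g] = grand + reliability * (xg.mean() - grand)  # shrink towards grand mean
    return out

x = np.array([5.0, 6.0, 1.0])   # two observations in group 0, one in group 1
groups = np.array([0, 0, 1])
means = shrink_group_means(x, groups, var_within=1.0, var_between=1.0)
print(means)  # group 1 (n=1) is shrunk more strongly towards the grand mean of 4.0
```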

3 Fixed and random effects: conceptualising the random part of the model

This section aims to clarify further the statistical and conceptual diﬀerences between RE

(including REWB) and FE modelling frameworks. The obvious diﬀerence between the two

models is in the way that the level-2 entities are treated: that is, the 𝜐i in Eqs. 2 and 5.

In a RE model (whether standard, REWB or Mundlak), level-2 random effects are treated as random draws from a Normal distribution, the variance of which is estimated:

𝜐i ∼ N(0, σ²𝜐).   (8)

In contrast, a FE model treats level-2 entities as unconnected: the 𝜐i in Eq. 5 are dummy variables for higher-level entity i, each with a separately estimated coefficient (less a reference category, or with the intercept suppressed). Because these dummy variables account for all the higher-level variance, no other variables measured at the higher level can be identified.

In both specifications, the level-1 residuals are typically assumed to follow a Normal distribution:

𝜖it ∼ N(0, σ²𝜖).   (9)

To us, this is what the 'random' and 'fixed' in RE and FE mean. In contrast, others argue that the defining feature of the RE model is an assumption that that model makes. Vaisey and Miles (2017:47), for example, state:

The only difference between RE and FE lies in the assumption they make about the relationship between υ [the unobserved time-constant fixed/random effects] and the observed predictors: RE models assume that the observed predictors in the model are not correlated with υ while FE models allow them to be correlated.

Such views are also characteristic of mainstream econometrics:


In modern econometric parlance, "random effect" is synonymous with zero correlation between the observed explanatory variables and the unobserved effect … the term "fixed effect" does not usually mean that ci [𝜐i in our notation] is being treated as nonrandom; rather, it means that one is allowing for arbitrary correlation between the unobserved effect ci and the observed explanatory variables xit. So, if ci is called an "individual fixed effect" or a "firm fixed effect," then, for practical purposes, this terminology means that ci is allowed to be correlated with xit. (Wooldridge 2002:252)

No doubt this assumption is important (see Sect.2.3). But regardless of how well estab-

lished this deﬁnition is, it is misleading. This assumption is not the only diﬀerence between

RE and FE models, and is far from being either model’s deﬁning feature.

The diﬀerent distributional assumptions aﬀect the extent to which information is con-

sidered exchangeable between higher-level entities: are they unrelated, or is the value of

one level-2 entity related to the values of the others? In the FE framework, nothing can be

known about each level-2 entity from any or all of the others—they are unrelated and each exists completely independently. At the other extreme, a single-level model assumes there

are no diﬀerences between the higher-level entities, in a sense knowing one is suﬃcient to

know them all. RE models strike a balance between these two extremes, treating higher-

level entities as distinct but not completely unlike each other. In practice, the random intercepts in RE models will correlate strongly with the fixed effects in a 'dummy variable' FE model, but RE estimates will be drawn in or 'shrunk' towards their mean—with unreliably estimated and more extreme values shrunk the most.

Why does it matter that the random eﬀects are drawn from a common distribution? We

have already stated that FE models estimate coeﬃcients on higher-level dummy variables

(the ﬁxed eﬀects), and cannot estimate coeﬃcients on other higher-level variables (between

eﬀects). RE models can yield estimates for coeﬃcients on higher-level variables because

the random eﬀects are parameterised as a distribution instead of dummy variables. More-

over, RE automatically provides an estimate of the level 2 variance, allowing an overall

measure of the extent to which level-2 entities diﬀer in comparison to the level 1 variance.

Further, this variance can be used to produce ‘shrunken’ (or ‘Empirical Bayes’) higher-

level residuals which, unlike FE dummy-variable parameter estimates, take account of the

unreliability of those estimates; for an application, see Ard and Fairbrother (2017). The

degree of "shrinkage" (or exchangeability across level-2 entities) in a RE model is determined from the data, with more shrinkage if there are few observations and/or the estimated variance of the level-2 entities, σ²𝜐, is small (see Jones and Bullen 1994; Spiegelhalter 2004).
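For intuition, the shrinkage can be written out directly. In the simplest random-intercept case with known variance components, the raw group-mean residual is multiplied by a reliability weight λi = σ²𝜐/(σ²𝜐 + σ²𝜖/ni). The sketch below (illustrative Python, with the variances assumed known rather than estimated) shows how small groups are pulled furthest towards zero:

```python
def shrinkage_factor(sigma2_u, sigma2_e, n_i):
    """Reliability multiplier applied to a group's raw mean residual:
    close to 1 for large, well-observed groups; close to 0 for small ones."""
    return sigma2_u / (sigma2_u + sigma2_e / n_i)

def eb_residual(raw_group_residual, sigma2_u, sigma2_e, n_i):
    """Shrunken (empirical Bayes) level-2 residual."""
    return shrinkage_factor(sigma2_u, sigma2_e, n_i) * raw_group_residual

# Same raw residual, different group sizes
big = eb_residual(2.0, sigma2_u=1.0, sigma2_e=4.0, n_i=100)  # barely shrunk
small = eb_residual(2.0, sigma2_u=1.0, sigma2_e=4.0, n_i=2)  # heavily shrunk
assert abs(big) > abs(small)
```

A FE dummy-variable estimate corresponds to using the raw residual unshrunk, however few observations it rests on.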

If we are interested in whether individuals’ responses are related to their speciﬁc con-

texts (neighbourhoods, schools, countries, etc.) a ﬁxed eﬀects model can help answer this

question if dummy variables for level-2 entities are estimated, but this is done unreliably

for small level-2 entities. A RE model can give us more reliable, appropriately conservative estimates of this (Bell et al. 2018), as well as telling us whether that context matters

in general, based on the signiﬁcance of the estimated variance of the random eﬀects.9 It

can tell us both diﬀerences in higher-level eﬀects (termed ‘type A’ eﬀects in the education

9 This could also be done on the basis of a Wald test of the joint signiﬁcance of FE dummy variables, but

this is not possible with non-linear outcomes where dummy coeﬃcients are not estimated.


literature, Raudenbush and Willms 1995) and the eﬀects of variables at the higher level

(‘type B’ eﬀects). FE estimators cannot estimate the latter.

The view of FE and RE being deﬁned by their assumptions has led many to character-

ise the REWB model as a ‘hybrid’ between FE and RE, or even a ‘hybrid FE’ model (e.g.

Schempf etal. 2011). We hope the discussion above will convince readers that this model

is a RE model. Indeed, Paul Allison, who (we believe) introduced the terminology of the

Hybrid model (Allison 2005, 2009) now prefers the terminology of ‘within-between RE’

(Allison 2014).

The label matters, because FE models (and indeed ‘hybrid’ models) are often presented

as a technical solution, following and responding to a Hausman test taken to mean that a

RE model cannot be used.10 As such, researchers rarely consider what problem FE actually

solves, and why the RE parameter estimates were wrong. This bias is often described as

‘endogeneity’, a term that covers a wide and disparate range of diﬀerent model misspeci-

ﬁcations (Bell and Jones 2015:138). In fact, the Hausman test simply investigates whether

the between and within eﬀects are diﬀerent—a possibility that the REWB speciﬁcation

allows for. REWB (a) recognises the possibility of diﬀerences between the within and

between eﬀects of a predictor, and (b) explicitly models those separate within and between

eﬀects. The REWB model is a direct, substantive solution to a mis-speciﬁed RE model in

allowing for the possibility of diﬀerent relations at each level; it models between eﬀects,

which may be causing the problem, and are often themselves substantively interesting.

When treated as a FE model, this substance is often lost.

Further, using the REWB model as if it were a FE model leads researchers to use it

without taking full advantage of the beneﬁts that RE models can oﬀer. The RE framework

allows a wider range of research questions to be investigated: involving time-invariant vari-

ables, shrunken random eﬀects, additional hierarchical (e.g. geographical) levels and, as

we discuss in the next section, random slopes estimates that allow relationships to vary

across individuals, or allow variances at any level to vary with variables. As well as yield-

ing new, substantively interesting results, such actions can alter the average associations

found. Describing the REWB, or Hybrid, model as falling under a FE framework therefore

undersells and misrepresents its value and capabilities.

4 Modelling more complexity: random slopes models and three-level models

4.1 Random slopes models

So far, all models have assumed homogeneity in the within effect associated with xit.

This is often a problematic assumption. First, such models hide important and interest-

ing heterogeneity. And second, models that incorrectly assume homogeneity will suffer from biased estimates, as we show below. The RE/REWB model as

previously described also suﬀers from this shortcoming, but can more easily avoid it by

explicitly modelling such heterogeneity, with the inclusion of random slopes (Western

10 Many (e.g. Greene 2012:421) even argue that the Mundlak or REWB model can be used as a form of the

Hausman test, which could itself be used to justify the use of FE, even though the REWB model makes

that choice unnecessary.


1998). These allow the coefficients on lower-level covariates to vary across level-2 entities. Equation 2 then becomes:

yit = β0 + β1W(xit − x̄i) + β2B x̄i + β3 zi + 𝜐i0 + 𝜐i1(xit − x̄i) + 𝜖it   (10)

Here β1W is a weighted average (Raudenbush and Bloom 2015) of the within effects in each level-2 entity; 𝜐i1 measures the extent to which these within effects vary between level-2 entities (such that each level-2 entity i has a within effect estimated as β1W + 𝜐i1). The two random terms 𝜐i1 and 𝜐i0 are assumed to be draws from a bivariate Normal distribution, meaning Eq. 8 is extended to:

[𝜐i0, 𝜐i1]′ ∼ N(0, [σ²𝜐0, σ𝜐01; σ𝜐01, σ²𝜐1])   (11)

The meaning of individual coefficients can vary depending on how variables are scaled and centred. However, the covariance term indicates the extent of 'fanning in' (with negative σ𝜐01) or 'fanning out' (positive σ𝜐01) from a covariate value of zero (Bullen et al. 1997). In many cases, there is substantive heterogeneity in the size of associations among level-2 entities. Table 2 shows two examples of reanalyses where including random coefficients makes a real difference to the results. Both are analyses of countries, rather than individuals, but the methodological issues are similar. The first is a reanalysis of an influential study in political science (Milner and Kubota 2005) which claims that political democracy leads to economic globalisation (measured by countries' tariff rates). When including random coefficients in the model, not only does the overall within effect disappear, but a single outlying country, Bangladesh, turns out to be driving the relationship (Bell and Jones 2015, Appendix). The second example is the now infamous study in economics by Reinhart and Rogoff (2010), which claimed that higher levels of public debt cause lower national economic growth (a conclusion that remained even after the Herndon et al. (2014) corrections). In this case, although the coefficient does not change with the introduction of random slopes, the standard error triples in size, and the within effect is no longer statistically significant when, in addition, time is appropriately controlled (Bell et al. 2015).

Table 2 Results from reanalyses of Milner and Kubota (2005) and Reinhart and Rogoff (2010)

Original study/studies:            Milner and Kubota (2005)         | Reinhart and Rogoff (2010) and Herndon et al. (2014)
Reanalysis:                        Bell and Jones (2015) (appendix) | Bell et al. (2015)
Dependent variable:                Tariff rates                     | Economic growth (ΔGDP)
Independent variable of interest:  Democracy (polity score)         | National debt (%GDP)
REWB/FE within estimate (SE):      − 0.227 (0.086)**                | − 0.021 (0.003)***
Random slopes estimate (SE):       − 0.143 (0.187) (NS)             | − 0.021 (0.009)*
Notes:                             Effect further reduced by the removal of a single outlying country, Bangladesh. | Effect becomes insignificant when time is appropriately controlled.

Standard errors are in parentheses. For full details of the models used, see the reanalysis papers themselves. NS not significant. P values *** < 0.001; ** < 0.01; * < 0.05


In both cases, not only is substantively interesting heterogeneity missed in models

assuming homogeneous associations, but also within effects are anticonservative (that is,

SEs are underestimated). Leaving aside the substantive interest that can be gained from

seeing how diﬀerent contexts can lead to diﬀerent relationships, failing to consider how

associations diﬀer across level-2 entities can produce misleading results if such diﬀerences

exist. Although such heterogeneity can be modelled in a FE framework with the addition

of multiple interaction terms, it rarely is in practice, and that heterogeneity does not ben-

eﬁt from shrinkage as in the RE framework. Thus, a FE model can lead an analyst to miss

problematic assumptions of homogeneity that the model is making. A RE model—includ-

ing the REWB model—allows for the modelling of important complexities, such as hetero-

geneity across level-2 entities.

We further demonstrate this using a simulation study. We simulated data sets with:

either 60 groups of 10, or 30 groups of 20; random intercepts distributed Normally, Chi

square, Normally but with a single large outlier, or with unbalanced groups; with only ran-

dom intercepts, or both random intercepts and random slopes; and with y either Normal or

binary (logit). This produced 32 data-generating processes (DGPs) in total. We then ﬁtted

three diﬀerent models to each simulated dataset: FE, random intercept, and random slope.

For the FE models, we calculated both naive and robust SEs.

Figure 1 shows the 'optimism'—the ratio of the true sampling variability to the sampling variability estimated by the standard error (see Shor et al. 2007)—for a single covariate, in a variety of scenarios.11

Fig. 1 Optimism of the standard errors in various models. Note Triangles are for logistic models, circles for Normal models; blue means 60 groups of ten, red 30 groups of 20. (Color figure online)

11 See the "Appendix" of the present paper for the full explanation, and the ESM for R code to replicate these simulations.

In the scenarios presented in the top row, the DGP included only random intercepts, not random slopes; the lower row represents DGPs with both random intercepts and random slopes. FE models are in the first two columns (with naïve

and robust standard errors), random-intercepts models the third column, and random slopes

models in the right-hand column.

Figure1 shows that where random slopes are not included in the analysis model (all but

the right-most column), but exist in the data in reality (bottom row), the standard errors

are overoptimistic—they are too small relative to the true sampling variability. When there

is variation in the slopes across level-2 entities, there is more uncertainty in the beta esti-

mates, but this is not reﬂected in the standard error estimates unless those random slopes

are explicitly speciﬁed. In the top row, in contrast, all four columns look the same: here

there is no mismatch between the invariant relationships assumed by the analysis models

and present in the data. In the presence of heterogeneity, note that while FE models with

naive SEs are the most anticonservative, neither FE models with “robust” standard errors

nor RE models with only random intercepts are much better.
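The mechanism behind this anticonservatism can be reproduced in miniature. The sketch below (illustrative Python, far simpler than the simulations reported here, and omitting random intercepts for brevity) generates data whose slopes vary across groups, estimates a single pooled slope, and compares the naive standard error with the actual sampling variability across replications:

```python
import random
from statistics import mean, stdev

random.seed(1)

def one_replication(n_groups=30, n_per=20, slope_sd=1.0):
    xs, ys = [], []
    for _ in range(n_groups):
        u1 = random.gauss(0, slope_sd)  # group-specific slope deviation
        for _ in range(n_per):
            x = random.gauss(0, 1)
            y = 1.0 + (1.0 + u1) * x + random.gauss(0, 1)
            xs.append(x)
            ys.append(y)
    # Pooled OLS slope and its naive standard error
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    beta = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    resid_var = sum((y - ybar - beta * (x - xbar)) ** 2
                    for x, y in zip(xs, ys)) / (len(xs) - 2)
    return beta, (resid_var / sxx) ** 0.5

results = [one_replication() for _ in range(200)]
true_sd = stdev(b for b, _ in results)    # actual sampling variability
naive_se = mean(se for _, se in results)  # what the naive model reports
assert true_sd > naive_se                 # the reported SEs are too small
```

With slope heterogeneity in the DGP, the true sampling variability of the pooled estimate is several times larger than the naive standard error suggests.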

These results support the strong critique by Barr et al. (2013) that failing to include random slopes is anticonservative. On the other hand, Matuschek et al. (2017) counter that analytical models should also be parsimonious, and fitting models with many random effects

quickly multiplies the number of parameters to be estimated, particularly since random

slopes are generally given covariances as well as variances. Sometimes the data available

will not be suﬃcient to estimate such a model. Still, it will make sense in much applied

work to test whether a statistically signiﬁcant coeﬃcient remains so when allowed to vary

randomly. We discuss this further in the conclusions.

4.2 Three (and more) levels, and cross-classifications

Datasets often have structures that span more than two levels. A further advantage of the

multilevel/random effects framework over fixed effects is that it allows for complex data

structures of this kind. Fixed eﬀects models are not problematic when additional higher

levels exist (insofar as they can still estimate a within eﬀect), but they are unable to include

a third level (if the levels are hierarchically structured), because the dummy variables at the

second level will automatically use up all degrees of freedom for any levels further up the

hierarchy. Multilevel models allow competing explanations to be considered, speciﬁcally at

which level in a hierarchy matters the most, with a highly parsimonious speciﬁcation (esti-

mating a variance parameter at each level).12

For example, cross-national surveys are increasingly being ﬁelded multiple times in the

same set of countries, yielding survey data that are both comparative and longitudinal. This

presents a three-level hierarchical structure, with observations nested within country-years,

which are in turn nested in countries (Fairbrother 2014).13

12 The capability of analysing at multiple scales net of other scales can be exploited in a model- based

approach to segregation where the variance at a scale conveys the degree of segregation (Jones etal. 2015).

13 See Schmidt-Catran and Fairbrother (2015) for a further extension that includes a cross-classiﬁed year or

wave level.
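As an illustration of what such a decomposition delivers, the sketch below (our own illustrative Python, using balanced data and simple method-of-moments estimators rather than the likelihood-based estimates a multilevel package would give) partitions the total variance of a simulated comparative-longitudinal outcome into country, country-year, and individual components:

```python
import random
from statistics import mean, variance

random.seed(42)

# Simulate a balanced three-level structure: individuals within
# country-years within countries (variances chosen arbitrarily here).
C, T, N = 200, 5, 10  # countries, years per country, people per cell
s2_country, s2_cyear, s2_e = 4.0, 2.0, 1.0

data = {}  # (country, year) -> list of responses
for c in range(C):
    u = random.gauss(0, s2_country ** 0.5)
    for t in range(T):
        v = random.gauss(0, s2_cyear ** 0.5)
        data[(c, t)] = [u + v + random.gauss(0, s2_e ** 0.5) for _ in range(N)]

# Method-of-moments decomposition (balanced case only)
cell_means = {k: mean(vals) for k, vals in data.items()}
country_means = [mean(cell_means[(c, t)] for t in range(T)) for c in range(C)]

est_e = mean(variance(vals) for vals in data.values())
est_cyear = mean(variance([cell_means[(c, t)] for t in range(T)])
                 for c in range(C)) - est_e / N
est_country = variance(country_means) - est_cyear / T - est_e / (N * T)
```

The three estimates recover (approximately) the variances used in the simulation, showing which level 'matters most'; in practice one would fit this with a multilevel package (e.g. lme4 or MLwiN), which also handles unbalanced data.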


4.3 Complex level 1 heterogeneity

A final way in which the random part of the model can be expanded is by allowing the variance at level 1 to be structured by one or more covariates at any level. Thus, Eq. 10 is extended to:

yit = μ + β1(xit − x̄i) + β2 x̄i + β3 zi + 𝜐i0 + 𝜐i1(xit − x̄i) + 𝜖it0 + 𝜖it1(xit − x̄i),   (12)

where the level 1 variance has two parts, one independent of and the other related to (xit − x̄i). Equation 9 is extended to:

[𝜖it0, 𝜖it1]′ ∼ N(0, [σ²𝜖0, σ𝜖01; σ𝜖01, σ²𝜖1])   (13)

Often this is important to do, because what appears to be higher-level variance14 between level-2 entities can in fact be complex variance at level 1. It is only by specifying both, as in Eq. 12, that we can be sure how variance, and varying variance, can be attributed between levels (Vallejo et al. 2015).15

5 Generalising the RE model: binary and count dependent variables

So far, this paper has considered only models with continuous dependent variables, using an identity link function. Do the claims of this paper apply to Generalised Linear models? These include other dependent variables and link functions (Neuhaus and McCulloch 2006), such as logit and probit models (for binary/proportion dependent variables) and Poisson models (for count dependent variables). Although this question has not been considered to a great extent in the social and political sciences, the biostatistics literature does provide some answers (for an accessible discussion, see Allison 2014). Here we briefly outline some of the issues.

Unlike models using the identity link function, the REWB model with other link functions does not produce results that are identical to FE (or the equivalent conditional likelihood model). In other words, the inclusion of the group mean in the model does not reliably partition any higher-level processes from the within effect, meaning both within and between estimates of cluster-specific effects16 can be biased. This is the case when the relationship between the between component of X (x̄i) and the higher-level residual (𝜐i) is non-linear. How big a problem is this? Brumback et al. (2010:1651) found that, in running simulations, "it was difficult to find an example in which the problem is severe"

14 Note the random slopes described in 4.1 can also be conceived as varying variance. Variance could vary

by both level 1 and level 2 variables. The approach used here is standard in the multilevel literature (Gold-

stein 2010), but other approaches are possible (for example modelling the log of the variance as a function

of covariates - e.g. see Hedeker and Mermelstein 2007).

15 Although diﬃcult to implement in some standard software packages (it cannot be implemented in the

mixed package in Stata, or lme4 in R), it can be implemented in MLwiN, which can in turn be accessed

from Stata/R using the packages runmlwin/R2MLwiN (Leckie and Charlton 2013; Zhang etal. 2016).

16 Note: we do not consider the diﬀerences between population average and cluster speciﬁc estimates in

this paper—all models considered in this section of the paper produce the latter. This debate is beyond

the scope of the paper (but see Jones and Subramanian 2013; Subramanian and O’Malley 2010 for more

on this). Both cluster speciﬁc and population average estimates may be needed depending on the research

question; this is not a debate that can or should be technically resolved.


(see also Goetgeluk and Vansteelandt 2008). In a later paper, however, Brumback et al. (2013) did identify one such example, but only with properties unlikely to be found in real-life data (Allison 2014): x̄i and 𝜐i very highly correlated, and few observations per level-2 entity.

Whether the REWB model should be used, or a conditional likelihood (FE) model

should be used instead, depends on three factors: (1) the link function, (2) the nature of the

research question, and (3) the researcher's willingness to accept low levels of bias. Regarding (1), many model families, including negative binomial, ordered logit, and probit models, do not have a conditional likelihood estimator associated with them. If

such models are to be used, the REWB model may be the best method available to produce

within eﬀects that are (relatively) unbiased by omitted higher-level variables. Regarding

(2), conditional likelihood methods have all the disadvantages of FE mentioned above; they

are unable to provide level-2 eﬀects, random slopes cannot be ﬁtted, and so on, meaning

there is a risk of producing misleading and anti-conservative results. These capabilities will often be important to the research question at hand, and provide a realistic level of complexity in the modelling. The level of bias is easily ascertained by comparing

the estimate of the REWB model to that of the conditional likelihood model (where avail-

able). If the results are deemed similar enough, the researcher can be relatively sure that the

results produced by the REWB model are likely to be reasonable.

6 Assumptions of random effects models: how much do they matter?

A key assumption of RE models is that the random eﬀects representing the level-2 enti-

ties are drawn from a Normal distribution. However, “the Normality of [the random coeﬃ-

cients] is clearly an assumption driven more by mathematical convenience than by empiri-

cal reality” (Beck and Katz 2007:90). Indeed, it is often an unrealistic assumption, and it is

important to know the extent to which diﬀerent estimates are biased when that assumption

is broken.

The evidence from prior simulation studies is somewhat mixed, and depends on what

speciﬁcally in the RE model is of interest. For linear models with a continuous response

variable, and on the positive side, Beck and Katz (2007) ﬁnd that both average parameter

estimates and random effects are well estimated, both when the random effects are assumed to be Normally distributed but in fact have a Chi square distribution, and when there are a number of outliers in the dataset.17 Others concur that beta estimates are generally unbiased by

non-Normal random eﬀects, as are estimates of the random eﬀects variances (Maas and

Hox 2004; McCulloch and Neuhaus 2011a). Random eﬀects are only biased to a signiﬁcant

degree in extreme scenarios (McCulloch and Neuhaus 2011b), and even then (for example, for random effects with a Chi square(1) distribution), the ranked order of estimated random effects remains highly correlated (correlation > 0.8) with the rankings of the true random effects (Arpino and Varriale 2010), meaning substantive interpretation is likely to be

dom eﬀects (Arpino and Varriale 2010), meaning substantive interpretation is likely to be

aﬀected only minimally. This is the case whether or not the DGP includes random slopes.

In other words, a badly speciﬁed random distribution may result in some biases, but these

are usually small enough not to worry the applied researcher. If there is a concern about

17 In the latter case, outlying random eﬀects can easily be identiﬁed and ‘dummied out’, allowing the distri-

bution of the rest of the random eﬀects to be estimated.


bias, it may be wise to check the ﬁndings are robust to other speciﬁcations, and potentially

use models that allow for non-Normal random eﬀects, such as Non-Parametric Maximum

Likelihood techniques (Aitkin 1999; Fotouhi 2003).

With non-linear models, the evidence is somewhat less positive. Where the Normality

assumption of the higher-level variance is violated, there can be significant biases, particularly when the true level-2 variance is large (as is often the case with panel data, but not in cross-sectional data) (Heagerty and Kurland 2001). For a review of these simulation studies, see Grilli and Rampichini (2015).

Our simulations, for the most part, back up these ﬁndings and this is illustrated in

Fig. 2, which presents the consequences for various parameters if the random intercepts

have a Chi square(2) distribution, or have a single substantial outlier, and if the groups

are unbalanced. First, beta estimates are unbiased (upper-left panel), as are their standard

errors (upper-right), regardless of the true distribution of the random eﬀects and the type

of model.

Fig. 2 Biases and RMSE under various (mis-)specifications. Note Triangles are for logistic models, circles for Normal models; blue means 60 groups of ten, red 30 groups of 20. Clockwise from the upper-left, the parameters are beta (bias), optimism of the standard errors (bias), random intercepts (RMSE), and level-2 variance (bias). (Color figure online)

Non-Normality does, however, have consequences for the estimate of the level-2 variance (lower-left panel). When the true distribution is skewed (a Chi square(2) distribution), for logistic models there is notable downward bias in the estimate of the level-

two variance, and a slight increase in the error associated with the random eﬀects them-

selves (lower-right). We found no evidence of any similar bias in models with a continuous

response. In contrast, when the non-Normality of the random eﬀects is due to an outlying

level-2 entity, there is an impact on the estimated variance for models with a continuous

response, and the estimated random intercepts for both logistic and Normal models. How-

ever, as noted above, the latter does not need to be problematic, because outliers can be

easily identiﬁed and ‘dummied out’, eﬀectively removing that speciﬁc random eﬀect from

the estimated distribution. Note that the high RMSE associated with unbalanced datasets

(lower-right) is related to the smaller sample size in some level 2 groups, rather than being

evidence of any bias.

In sum, even substantial violations of the Normality assumption of the higher-level ran-

dom eﬀects do not have much impact on estimates in the ﬁxed part of the model, nor the

standard errors. Such violations can however aﬀect the random eﬀects estimates, particu-

larly in models with a non-continuous response.
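For reference, the skewed condition in these simulations is easy to construct: a Chi square(2) draw, centred at zero, has variance 4 (matching a N(0, 4) intercept distribution) but strong right skew. An illustrative Python version:

```python
import random
from statistics import mean, pvariance

random.seed(7)

def chisq2_centred():
    """A Chi square(2) draw, centred to mean zero: variance 2k = 4, skewed."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return z1 * z1 + z2 * z2 - 2.0

draws = [chisq2_centred() for _ in range(50_000)]
m, v = mean(draws), pvariance(draws)
skew = mean(d ** 3 for d in draws) / v ** 1.5  # true skewness of Chi sq(2) is 2
```

Because the mean and variance match the Normal case, any change in model performance is attributable to the shape of the distribution alone.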

7 Conclusion: what should researchers do?

We hope that this article has presented a clear picture of the key properties, capabilities,

and limitations of FE and RE models, including REWB models. We have considered what

each of these models are, what they do, what they assume, and how much those assump-

tions matter in diﬀerent real-life scenarios.

There are a number of practical points that researchers should take away from this

paper. First, and perhaps most obviously, the REWB model is a more general and

encompassing option than either FE or conventional RE, which do not distinguish between

within and between eﬀects. Even when using non-identity link functions, or when the Nor-

mality assumption of the random eﬀects is violated, the small biases that can arise in such

models will often be a price worth paying for the added ﬂexibility that the REWB model

provides. This is especially the case since FE is unable to provide any estimates at all of the

parameters that are most biased by violations of Normality (speciﬁcally random eﬀects and

variance estimates). The only reason to choose FE is if (1) higher-level variables are of no

interest whatsoever, (2) there are no random slopes in the true DGP, or (3) there are so few

level-2 entities that random slopes are unlikely to be estimable. Regarding (1) we would

argue this is rarely the case in social science, where a full understanding of the world and

how it operates is often the end goal. Regarding (2), testing this requires ﬁtting a RE model

in any case, so the beneﬁts of reverting to FE are moot. Regarding (3), the REWB model

will still be robust for fixed-part parameter estimates (although maximum likelihood estimation may be biased—McNeish 2017; Stegmueller 2013), though its efficacy relative to

FE would be very limited, since higher level parameters would be estimated with a lot of

uncertainty.

Second, the question of whether to include random slopes is important and requires

careful consideration. On the one hand, in a world of limited computing power and limited

data, it is often not feasible to allow the eﬀects of all variables to vary between level-2

entities. On the other hand, we have shown that results can change in substantive ways

when slopes are allowed to vary randomly. We would argue that, at the least, where there

is a single substantive predictor variable of interest, it would make sense to check that the


conclusions hold when the eﬀect of that variable is allowed to vary across clusters. One

option in this regard is to use robust standard errors, not as a correction per se, but as

a diagnostic procedure—a ‘canary down the mine’—following King and Roberts (2015).

Any diﬀerence between conventional and robust standard errors suggests there is some

kind of misspeciﬁcation in the model, and that misspeciﬁcation might well include the

failure to model random slopes. The two leftmost panels in the lower row in Fig.1 show

precisely how robust standard errors will diﬀer when a model is mis-speciﬁed in omitting

relevant random eﬀects.

Third, and in contrast to much of the applied literature, we argue that researchers should

not use a Hausman test to decide between ﬁxed and random eﬀects models. Rather, they

can use this test, or models equivalent to it, to verify the equivalence of the within and

between relationships. A lack of equality should be in itself of interest and worthy of fur-

ther investigation through the REWB model.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,

and reproduction in any medium, provided you give appropriate credit to the original author(s) and the

source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix: The simulations

We generated datasets according to the formula

$$y_{it} = \beta_0 + \beta_1 x_{it} + \upsilon_{i0} + \epsilon_{it},$$

or in other words with random intercepts only, and also according to

$$y_{it} = \beta_0 + \beta_1 x_{it} + \upsilon_{i0} + \upsilon_{i1} x_{it} + \epsilon_{it}.$$

In this latter case, the data-generating process (DGP) included both random intercepts and random slopes, and these random effects were distributed according to

$$\begin{bmatrix} \upsilon_{i0} \\ \upsilon_{i1} \end{bmatrix} \sim N\left(\mathbf{0},\, \begin{bmatrix} \sigma^2_{\upsilon 0} & 0 \\ 0 & \sigma^2_{\upsilon 1} \end{bmatrix}\right).$$

That is, the random effects were in all cases uncorrelated. We also generated binary data based on similar models (both random intercept-only and random intercept, random slope models), using a logit link. In all cases, $\sigma^2_{\upsilon 0}$ and $\sigma^2_{\upsilon 1}$ were set to 4, and (for the Normally distributed data) the variance of $\epsilon_{it}$ to 1. The overall intercept $\beta_0$ and the overall slope $\beta_1$ were also set to 1. The $x_{it}$ data were drawn from a Normal distribution with a mean of 0 and a variance of $0.25^2$.

We fitted models to simulated datasets with either 60 groups of 10 or 30 groups of 20, yielding a total N of 600 either way.¹⁸ The 30 × 20 condition reflected that time-series cross-sectional datasets often possess roughly those N's at each level, and that many cross-national survey datasets include about 30 countries. The 60 × 10 condition allowed for a useful contrast testing the implications of varying the N at either level. We did not conduct simulations with groups larger than 20 because of the high time costs of doing so, and because previous simulation studies have not revealed anything particularly notable about studies conducted with large rather than small groups (Bryan and Jenkins 2016; Schmidt-Catran and Fairbrother 2015).

¹⁸ The N's at each level are not typical of published studies using multilevel models. But most studies use large N's that would have made the simulation studies much more time-consuming to run, with no benefit in terms of insights.
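As an illustration, the random intercept, random slope DGP described above might be simulated as follows. This is a minimal Python sketch under the stated parameter values (the authors' own simulations were run in R); all names are illustrative.

```python
# Sketch of the random intercept, random slope DGP with the paper's values:
# beta0 = beta1 = 1, var(u0) = var(u1) = 4 (uncorrelated), var(e) = 1,
# x_it ~ N(0, 0.25^2), using the 30-groups-of-20 condition (N = 600).
import random

random.seed(3)
J, n = 30, 20
beta0, beta1 = 1.0, 1.0

data = []
for i in range(J):
    u0 = random.gauss(0, 2)        # random intercept, sd 2 (variance 4)
    u1 = random.gauss(0, 2)        # random slope, sd 2, drawn independently of u0
    for t in range(n):
        x = random.gauss(0, 0.25)  # x_it ~ N(0, 0.25^2)
        e = random.gauss(0, 1)     # level-1 residual, variance 1
        y = beta0 + beta1 * x + u0 + u1 * x + e
        data.append((i, t, x, y))

print(len(data))  # total N per simulated dataset
```

Dropping the `u1 * x` term recovers the random intercepts-only DGP; replacing the Normal residual with a logit link gives the binary version.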

In some cases, instead of drawing the $\upsilon_{i0}$'s from a Normal distribution, we drew them from a Chi squared distribution, or from a Normal distribution but with a single large outlier. Where they were drawn from a Chi squared distribution, the distribution's degrees of freedom was set at 2, and we also subtracted 2 from each randomly drawn value, yielding a final population mean of 0 and variance of 4, the same as in scenarios where the $\upsilon_{i0}$'s were drawn from a Normal distribution. For the scenarios with the outlier, we tripled the value of the element of $\upsilon_{i0}$ with the largest absolute value.
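The shifted Chi squared draw can be sketched as follows (hypothetical Python, not the authors' script). A Chi squared variate with 2 degrees of freedom is equivalent to a gamma variate with shape 1 and scale 2, which has mean 2 (= df) and variance 4 (= 2 × df), so subtracting 2 gives mean 0 and variance 4 as stated.

```python
# Sketch of the non-Normal random-intercept condition: chi-squared(2) - 2,
# matching the Normal condition's mean (0) and variance (4).
import random

random.seed(4)
# chi-squared with df = 2 is a gamma distribution with shape 1 and scale 2
u0 = [random.gammavariate(1, 2) - 2 for _ in range(100_000)]

mean = sum(u0) / len(u0)
var = sum((v - mean) ** 2 for v in u0) / len(u0)
print(round(mean, 3), round(var, 3))  # should sit near 0 and 4
```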

As a fourth possibility, we made the simulated dataset unbalanced, by resampling with replacement a dataset of the same total size from the values of the original, with equal probability of selection. This yielded groups of randomly varying sizes.
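The unbalanced condition amounts to a bootstrap-style resample of rows, which leaves the total N fixed but lets cluster sizes vary at random. A hypothetical Python sketch (illustrative names, not the authors' code):

```python
# Sketch of the unbalanced-data condition: resample N rows with replacement,
# with equal probability, from a balanced 30 x 20 dataset.
import random
from collections import Counter

random.seed(5)
J, n = 30, 20
rows = [(j, t) for j in range(J) for t in range(n)]   # balanced: 20 rows per cluster
resampled = random.choices(rows, k=len(rows))         # equal-probability draw

sizes = Counter(j for j, _ in resampled)              # resulting cluster sizes
print(len(resampled), min(sizes.values()), max(sizes.values()))
```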

In sum, under each of these four conditions (Normal, Chi squared, outlier, unbalanced), we simulated datasets using only random intercepts or both random intercepts and random slopes, with y either Normal or binary, and with one combination of N's or the other—yielding 32 distinct DGPs (4 × 2 × 2 × 2). We conducted 1000 simulations with each DGP.

We then fitted three different models to each simulated dataset: a fixed effects model (with naïve and clustered standard errors), a random intercepts-only model, and a random intercepts-random slopes model.

We conducted the simulations in R. For fitting multilevel models we used the package lme4 (Bates et al. 2015). For deriving clustered standard errors from the fixed effects models, we used the plm package (Croissant and Millo 2008). We caught false or questionable convergences and simply removed them, simulating a new dataset instead (this should not bias the results, although an advantage of FE worth noting is that it is unlikely to show convergence problems, since it is estimated by OLS). We tried multiple runs of simulations, and found stable results beyond about 200 simulations per DGP.

References

Aitkin, M.: A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55(1), 117–128 (1999)

Allison, P.D.: Using panel data to estimate the effects of events. Sociol. Methods Res. 23(2), 174–199 (1994)

Allison, P.D.: Fixed Effects Regression Methods for Longitudinal Data using SAS. SAS Press, Cary, NC (2005)

Allison, P.D.: Fixed Effects Regression Models. Sage, London (2009)

Allison, P.D.: Problems with the hybrid method. Stat. Horiz. http://www.statisticalhorizons.com/problems-with-the-hybrid-method (2014). Accessed 16 July 2015

Ard, K., Fairbrother, M.: Pollution prophylaxis? Social capital and environmental inequality. Soc. Sci. Q. 98(2), 584–607 (2017)

Arpino, B., Varriale, R.: Assessing the quality of institutions' rankings obtained through multilevel linear regression models. J. Appl. Econ. Sci. 5(1), 7–22 (2010)

Barr, D.J., Levy, R., Scheepers, C., Tily, H.J.: Random effects structure for confirmatory hypothesis testing: keep it maximal. J. Mem. Lang. 68(3), 255–278 (2013)

Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67(1), 1–48 (2015)

Beck, N., Katz, J.N.: What to do (and not to do) with time-series cross-section data. Am. Polit. Sci. Rev. 89(3), 634–647 (1995)

Beck, N., Katz, J.N.: Random coefficient models for time-series-cross-section data: Monte Carlo experiments. Polit. Anal. 15(2), 182–195 (2007)

Bell, A., Jones, K.: Explaining fixed effects: random effects modelling of time-series cross-sectional and panel data. Polit. Sci. Res. Methods 3(1), 133–153 (2015)

Bell, A., Johnston, R., Jones, K.: Stylised fact or situated messiness? The diverse effects of increasing debt on national economic growth. J. Econ. Geogr. 15(2), 449–472 (2015)

Bell, A., Jones, K., Fairbrother, M.: Understanding and misunderstanding group mean centering: a commentary on Kelley et al.'s dangerous practice. Qual. Quant. 52(5), 2031–2036 (2018)

Bell, A., Holman, D., Jones, K.: Using shrinkage in multilevel models to understand intersectionality: a simulation study and a guide for best practice (2018) (in review)

Blakely, T.A., Woodward, A.J.: Ecological effects in multi-level studies. J. Epidemiol. Community Health 54(5), 367–374 (2000)

Brumback, B.A., Dailey, A.B., Brumback, L.C., Livingston, M.D., He, Z.: Adjusting for confounding by cluster using generalized linear mixed models. Stat. Probab. Lett. 80(21–22), 1650–1654 (2010)

Brumback, B.A., Zheng, H.W., Dailey, A.B.: Adjusting for confounding by neighborhood using generalized linear mixed models and complex survey data. Stat. Med. 32(8), 1313–1324 (2013)

Bryan, M.L., Jenkins, S.P.: Multilevel modelling of country effects: a cautionary tale. Eur. Sociol. Rev. 32(1), 3–22 (2016)

Bullen, N., Jones, K., Duncan, C.: Modelling complexity: analysing between-individual and between-place variation—a multilevel tutorial. Environ. Plann. A 29(4), 585–609 (1997)

Chatelain, J.-B., Ralf, K.: Inference on time-invariant variables using panel data: a pre-test estimator with an application to the returns to schooling. PSE Working Paper. https://ideas.repec.org/p/hal/wpaper/halshs-01719835.html (2018). Accessed 24 Apr 2018

Christmann, P.: Economic performance, quality of democracy and satisfaction with democracy. Electoral Stud. 53, 79–89 (2018). https://doi.org/10.1016/J.ELECTSTUD.2018.04.004

Clark, T.S., Linzer, D.A.: Should I use fixed or random effects? Polit. Sci. Res. Methods 3(2), 399–408 (2015)

Croissant, Y., Millo, G.: Panel data econometrics in R: the plm package. J. Stat. Softw. 27(2), 1–43 (2008)

Deeming, C., Jones, K.: Investigating the macro determinants of self-rated health and well-being using the European social survey: methodological innovations across countries and time. Int. J. Sociol. 45(4), 256–285 (2015)

Delgado-Rodríguez, M., Llorca, J.: Bias. J. Epidemiol. Community Health 58(8), 635–641 (2004)

Duncan, C., Jones, K., Moon, G.: Health-related behaviour in context: a multilevel modelling approach. Soc. Sci. Med. 42(6), 817–830 (1996)

Fairbrother, M.: Two multilevel modeling techniques for analyzing comparative longitudinal survey datasets. Polit. Sci. Res. Methods 2(1), 119–140 (2014)

Fairbrother, M.: Trust and public support for environmental protection in diverse national contexts. Sociol. Sci. 3, 359–382 (2016). https://doi.org/10.15195/v3.a17

Fielding, A.: The role of the Hausman test and whether higher level effects should be treated as random or fixed. Multilevel Model. Newsl. 16(2), 3–9 (2004)

Fotouhi, A.R.: Comparisons of estimation procedures for nonlinear multilevel models. J. Stat. Softw. 8(9), 1–39 (2003)

Gelman, A.: Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do. Princeton University Press, Princeton (2008)

Gelman, A.: Why I don't use the term "fixed and random effects". Stat. Model. Causal Inference Soc. Sci. http://andrewgelman.com/2005/01/25/why_i_dont_use/ (2005). Accessed 19 Nov 2015

Goetgeluk, S., Vansteelandt, S.: Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772–780 (2008)

Goldstein, H.: Multilevel Statistical Models, 4th edn. Wiley, Chichester (2010)

Greene, W.H.: Econometric Analysis, 7th edn. Pearson, Harlow (2012)

Grilli, L., Rampichini, C.: The role of sample cluster means in multilevel models: a view on endogeneity and measurement error issues. Methodology 7(4), 121–133 (2011)

Grilli, L., Rampichini, C.: Specification of random effects in multilevel models: a review. Qual. Quant. 49(3), 967–976 (2015)

Halaby, C.N.: Panel models in sociological research: theory into practice. Ann. Rev. Sociol. 30(1), 507–544 (2004)

Hanchane, S., Mostafa, T.: Solving endogeneity problems in multilevel estimation: an example using education production functions. J. Appl. Stat. 39(5), 1101–1114 (2012)

Hausman, J.A.: Specification tests in econometrics. Econometrica 46(6), 1251–1271 (1978)

Heagerty, P.J., Kurland, B.F.: Misspecified maximum likelihood estimates and generalised linear mixed models. Biometrika 88(4), 973–985 (2001)

Hedeker, D., Mermelstein, R.J.: Mixed-effects regression models with heterogeneous variance: analyzing ecological momentary assessment (EMA) data of smoking. In: Little, T.D., Bovaird, J.A., Card, N.A. (eds.) Modeling Contextual Effects in Longitudinal Studies. Erlbaum, Mahwah, NJ (2007)

Herndon, T., Ash, M., Pollin, R.: Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Camb. J. Econ. 38(2), 257–279 (2014)

Howard, A.L.: Leveraging time-varying covariates to test within- and between-person effects and interactions in the multilevel linear model. Emerg. Adulthood 3(6), 400–412 (2015)

Jones, K., Bullen, N.: Contextual models of urban house prices—a comparison of fixed-coefficient and random-coefficient models developed by expansion. Econ. Geogr. 70(3), 252–272 (1994)

Jones, K., Subramanian, S.V.: Developing Multilevel Models for Analysing Contextuality, Heterogeneity and Change, vol. 2. University of Bristol, Bristol (2013)

Jones, K., Johnston, R., Manley, D., Owen, D., Charlton, C.: Ethnic residential segregation: a multilevel, multigroup, multiscale approach exemplified by London in 2011. Demography 52(6), 1995–2019 (2015)

King, G., Roberts, M.: How robust standard errors expose methodological problems they do not fix. Polit. Anal. 23(2), 159–179 (2015)

Kloosterman, R., Notten, N., Tolsma, J., Kraaykamp, G.: The effects of parental reading socialization and early school involvement on children's academic performance: a panel study of primary school pupils in the Netherlands. Eur. Sociol. Rev. 27(3), 291–306 (2010)

Lauen, D.L., Gaddis, S.M.: Exposure to classroom poverty and test score achievement: contextual effects or selection? Am. J. Sociol. 118(4), 943–979 (2013)

Leckie, G., Charlton, C.: runmlwin: a program to run the MLwiN multilevel modelling software from within Stata. J. Stat. Softw. 52(11), 1–40 (2013). https://doi.org/10.18637/jss.v052.i11

Maas, C.J.M., Hox, J.J.: Robustness issues in multilevel regression analysis. Stat. Neerl. 58(2), 127–137 (2004)

Maimon, D., Kuhl, D.C.: Social control and youth suicidality: situating Durkheim's ideas in a multilevel framework. Am. Sociol. Rev. 73(6), 921–943 (2008)

Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., Bates, D.M.: Balancing type I error and power in linear mixed models. J. Mem. Lang. 94, 305–315 (2017)

McCulloch, C.E., Neuhaus, J.M.: Misspecifying the shape of a random effects distribution: why getting it wrong may not matter. Stat. Sci. 26(3), 388–402 (2011a)

McCulloch, C.E., Neuhaus, J.M.: Prediction of random effects in linear and generalized linear models under model misspecification. Biometrics 67(1), 270–279 (2011b)

McNeish, D.: Small sample methods for multilevel modeling: a colloquial elucidation of REML and the Kenward–Roger correction. Multivar. Behav. Res. 52(5), 661–670 (2017)

Milner, H.V., Kubota, K.: Why the move to free trade? Democracy and trade policy in the developing countries. Int. Org. 59(1), 107–143 (2005)

Mundlak, Y.: Pooling of time-series and cross-section data. Econometrica 46(1), 69–85 (1978)

Nerlove, M.: Essays in Panel Data Econometrics. Cambridge University Press, Cambridge (2005)

Neuhaus, J.M., McCulloch, C.E.: Separating between- and within-cluster covariate effects by using conditional and partitioning methods. J. R. Stat. Soc. Ser. B Stat. Methodol. 68, 859–872 (2006)

Rasbash, J.: Module 4: multilevel structures and classifications. LEMMA VLE. http://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/4-concepts-sample.pdf (2008). Accessed 19 Nov 2015

Raudenbush, S.W., Bloom, H.S.: Learning about and from a distribution of program impacts using multisite trials. Am. J. Eval. 36(4), 475–499 (2015)

Raudenbush, S.W., Bryk, A.: Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd edn. Sage, London (2002)

Raudenbush, S.W., Willms, J.: The estimation of school effects. J. Educ. Behav. Stat. 20(4), 307–335 (1995)

Reinhart, C.M., Rogoff, K.S.: Growth in a time of debt. Am. Econ. Rev. 100(2), 573–578 (2010)

Ruiter, S., van Tubergen, F.: Religious attendance in cross-national perspective: a multilevel analysis of 60 countries. Am. J. Sociol. 115(3), 863–895 (2009)

Sampson, R.J., Raudenbush, S.W., Earls, F.: Neighborhoods and violent crime: a multilevel study of collective efficacy. Science 277(5328), 918–924 (1997)

Schempf, A.H., Kaufman, J.S., Messer, L., Mendola, P.: The neighborhood contribution to black-white perinatal disparities: an example from two North Carolina counties, 1999–2001. Am. J. Epidemiol. 174(6), 744–752 (2011)

Schmidt-Catran, A.W.: Economic inequality and public demand for redistribution: combining cross-sectional and longitudinal evidence. Socio Econ. Rev. 14(1), 119–140 (2016)

Schmidt-Catran, A.W., Fairbrother, M.: The random effects in multilevel models: getting them wrong and getting them right. Eur. Sociol. Rev. 32(1), 23–38 (2015)

Schmidt-Catran, A.W., Spies, D.C.: Immigration and welfare support in Germany. Am. Sociol. Rev. (2016). https://doi.org/10.1177/0003122416633140

Schurer, S., Yong, J.: Personality, well-being and the marginal utility of income: what can we learn from random coefficient models? Working Paper. https://ideas.repec.org/p/yor/hectdg/12-01.html (2012). Accessed 28 Apr 2018

Shin, Y., Raudenbush, S.W.: A latent cluster-mean approach to the contextual effects model with missing data. J. Educ. Behav. Stat. 35(1), 26–53 (2010)

Shor, B., Bafumi, J., Keele, L., Park, D.: A Bayesian multilevel modeling approach to time-series cross-sectional data. Polit. Anal. 15(2), 165–181 (2007)

Snijders, T.A.B., Bosker, R.J.: Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modelling, 2nd edn. Sage, London (2012)

Spiegelhalter, D.J.: Incorporating Bayesian ideas into health-care evaluation. Stat. Sci. 19(1), 156–174 (2004)

Steele, F., Vignoles, A., Jenkins, A.: The effect of school resources on pupil attainment: a multilevel simultaneous equation modelling approach. J. R. Stat. Soc. Ser. A Stat. Soc. 170, 801–824 (2007)

Stegmueller, D.: How many countries do you need for multilevel modeling? A comparison of frequentist and Bayesian approaches. Am. J. Polit. Sci. 57(3), 748–761 (2013)

Subramanian, S.V., O'Malley, A.J.: Modeling neighborhood effects: the futility of comparing mixed and marginal approaches. Epidemiology 21(4), 475–478 (2010)

Subramanian, S.V., Jones, K., Kaddour, A., Krieger, N.: Revisiting Robinson: the perils of individualistic and ecologic fallacy. Int. J. Epidemiol. 38(2), 342–360 (2009)

Vaisey, S., Miles, A.: What you can—and can't—do with three-wave panel data. Sociol. Methods Res. 46(1), 44–67 (2017)

Vallejo, G., Fernández, P., Cuesta, M., Livacic-Rojas, P.E.: Effects of modeling the heterogeneity on inferences drawn from multilevel designs. Multivar. Behav. Res. 50(1), 75–90 (2015)

Western, B.: Causal heterogeneity in comparative research: a Bayesian hierarchical modelling approach. Am. J. Polit. Sci. 42(4), 1233–1259 (1998)

Wooldridge, J.M.: Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA (2002)

Zhang, Z., Parker, R.M.A., Charlton, C.M.J., Leckie, G., Browne, W.J.: R2MLwiN: a package to run MLwiN from within R. J. Stat. Softw. 72(10), 1–43 (2016)
