
Missing Not at Random Models for Latent Growth Curve Analyses

Craig K. Enders

Arizona State University

The past decade has seen a noticeable shift in missing data handling techniques that assume a missing at

random (MAR) mechanism, where the propensity for missing data on an outcome is related to other analysis

variables. Although MAR is often reasonable, there are situations where this assumption is unlikely to hold,

leading to biased parameter estimates. One such example is a longitudinal study of substance use where

participants with the highest frequency of use also have the highest likelihood of attrition, even after

controlling for other correlates of missingness. There is a large body of literature on missing not at random

(MNAR) analysis models for longitudinal data, particularly in the field of biostatistics. Because these methods

allow for a relationship between the outcome variable and the propensity for missing data, they require a

weaker assumption about the missing data mechanism. This article describes 2 classic MNAR modeling

approaches for longitudinal data: the selection model and the pattern mixture model. To date, these models

have been slow to migrate to the social sciences, in part because they required complicated custom computer

programs. These models are now quite easy to estimate in popular structural equation modeling programs,

particularly Mplus. The purpose of this article is to describe these MNAR modeling frameworks and to

illustrate their application on a real data set. Despite their potential advantages, MNAR-based analyses are not

without problems and also rely on untestable assumptions. This article offers practical advice for implementing and choosing among different longitudinal models.

Keywords: missing data, pattern mixture model, selection model, attrition, missing not at random

Supplemental materials: http://dx.doi.org/10.1037/a0022640.supp

Missing data handling techniques have received considerable

attention in the methodological literature during the past 40 years.

This literature has largely discredited most of the simple proce-

dures that have enjoyed widespread use for decades, including

methods that discard incomplete cases (e.g., listwise deletion,

pairwise deletion) and approaches that impute the data with a

single set of replacement values (e.g., mean imputation, regression

imputation, last observation carried forward). The past decade has

seen a noticeable shift to analytic techniques that assume a missing

at random (MAR) mechanism, whereby an individual’s propensity

for missing data on a variable Y is potentially related to other

variables in the analysis (or in the imputation model) but not to the

unobserved values of Y itself (Little & Rubin, 2002; Rubin, 1976).

Maximum likelihood estimation and multiple imputation are ar-

guably the predominant MAR-based approaches, although inverse

probability weighting methods have gained traction in the statistics

literature (e.g., Carpenter, Kenward, & Vansteelandt, 2006; Robins

& Rotnitzky, 1995; Scharfstein, Rotnitzky, & Robins, 1999). A

number of resources are available to readers who are interested in

additional details on these methods (e.g., Carpenter et al., 2006;

Enders, 2010; Little & Rubin, 2002; Rotnitzky, 2009; Schafer,

1997; Schafer & Graham, 2002).

Although the MAR mechanism is often reasonable, there are situ-

ations where this assumption is unlikely to hold. For example, in a

longitudinal study of substance use, it is reasonable to expect partic-

ipants with the highest frequency of use to have the highest likelihood

of attrition, even after controlling for other correlates of missingness.

Similarly, in a study that examines quality of life changes throughout

the course of a clinical trial for a new cancer medication, it is likely

that patients with rapidly decreasing quality of life scores are more

likely to leave the study because they die or become too ill to

participate. The previous scenarios are characterized by a relationship

between the outcome variable (i.e., substance use, quality of life) and

the propensity for missing data. This so-called missing not at random

(MNAR) mechanism is problematic because MAR-based analyses

are likely to produce biased parameter estimates. Unfortunately, there

is no empirical test of the MAR mechanism, so it is generally

impossible to fully rule out MNAR missingness. This underscores the

need for MNAR analysis methods.

There is a rather large body of literature on MNAR analysis models

for longitudinal data, particularly in the field of biostatistics (e.g.,

Albert & Follmann, 2000, 2009; Diggle & Kenward, 1994; Follmann

& Wu, 1995; Little, 1995, 2009; Molenberghs & Kenward, 2007;

Verbeke, Molenberghs, & Kenward, 2000; Wu & Bailey, 1989; Wu

& Carroll, 1988). This literature addresses a wide variety of substan-

tive applications and includes models for categorical outcomes, count

data, and continuous variables, to name a few. Although researchers

are sometimes quick to discount MAR-based analyses, MNAR mod-

els are not without their own problems. In particular, MNAR analyses

rely heavily on untestable assumptions (e.g., normally distributed

latent variables), and even relatively minor violations of these as-

sumptions can introduce substantial bias. This fact has led some

methodologists to caution against the routine use of these models

(Demirtas & Schafer, 2003; Schafer, 2003). A common viewpoint is

that MNAR models are most appropriate for exploring the sensitivity

Correspondence concerning this article should be addressed to Craig K.

Enders, Box 871104, Department of Psychology, Arizona State University,

Tempe, AZ 85287–1104. E-mail: craig.enders@asu.edu

Psychological Methods

2011, Vol. 16, No. 1, 1–16

© 2011 American Psychological Association

1082-989X/11/$12.00  DOI: 10.1037/a0022640



of one’s results to a variety of different assumptions and conditions.

Despite their potential problems, MNAR models are important op-

tions to consider, particularly when outcome-related attrition seems

plausible. At the very least, these procedures can augment the results

from an MAR-based analysis.

Although MNAR analysis models have been in the literature for

many years, they have been slow to migrate to the social and the

behavioral sciences. To date, most substantive applications have

appeared in the medical literature (e.g., Hogan, Roy, & Korkont-

zelou, 2004; Kenward, 1998; Michiels, Molenberghs, Bijnens,

Vangeneugden, & Thijs, 2002). The adoption of any novel statis-

tical procedure is partially a function of awareness but is also

driven by software availability. MNAR analyses were traditionally

difficult to implement because they required complicated custom

programming. These models are now quite easy to estimate in

popular structural equation modeling programs, particularly Mplus

(L. K. Muthén & Muthén, 1998–2010). Consequently, the purpose

of this article is to describe two classic MNAR modeling families

for longitudinal data—selection models and pattern mixture mod-

els—and illustrate their use on a real data set. Methodologists

continue to develop MNAR analysis methods, most of which

extend the models that I describe in this article (e.g., Beunckens,

Molenberghs, Verbeke, & Mallinckrodt, 2008; Dantan, Proust-

Lima, Letenneur, & Jacqmin-Gadda, 2008; Lin, McCulloch, &

Rosenheck, 2004; B. Muthén, Asparouhov, Hunter, & Leuchter,

2011; Roy, 2003; Roy & Daniels, 2008; Yuan & Little, 2009). By

limiting the scope of this article to classic techniques, I hope to

provide readers with the necessary background information for

accessing these newer approaches. B. Muthén et al. (2011) have provided an excellent overview of these recent innovations.

The organization of this article is as follows. I begin with an

overview of Rubin’s (1976) missing data theory, including a

discussion of how selection models and pattern mixture models fit

into Rubin’s definition of an MNAR mechanism. After a brief

review of growth curve models, I then describe classic selection

models and pattern mixture models for longitudinal data. Next, I

use a series of data analysis examples to illustrate the estimation

and interpretation of the models. I then conclude with a discussion

of model selection and sensitivity analyses.

Theoretical Background

Some background information on Rubin’s (1976) missing data

theory is useful for understanding the rationale behind MNAR

analysis models. According to Rubin, the propensity for missing

data is a random variable that has a distribution. In practical terms,

this implies that each variable potentially yields a pair of scores: an

underlying Y value that may or may not be observed and a

corresponding R value that denotes whether Y is observed or is

missing (e.g., R = 0 if Y is observed and R = 1 if Y is missing).

Under an MNAR mechanism, the data and the probability of

missingness have a joint distribution:

p(Yi, Ri | θ, φ), (1)

where p denotes a probability distribution, Yi is the outcome variable for case i, Ri is the corresponding missing data indicator, θ is a set of parameters that describes the distribution of Y (e.g., growth model parameters), and φ contains parameters that describe the propensity for missing data on Y (e.g., a set of logistic regression coefficients that predict R). Collectively, the parameters of the joint distribution dictate the mutual occurrence of different Y values and missing data.
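In code, the pairing of Y and R described above is simply an indicator matrix computed from the data. A minimal numpy sketch with made-up values (not data from the article):

```python
import numpy as np

# Hypothetical 4-wave outcome matrix (rows = cases); np.nan marks the
# unobserved Y values.
Y = np.array([[3.1, 2.8, 2.5, 2.2],
              [4.0, 3.7, np.nan, np.nan],
              [2.5, np.nan, 2.1, 1.9]])

# Rubin's indicator convention used in the text:
# R = 0 if Y is observed and R = 1 if Y is missing.
R = np.isnan(Y).astype(int)
print(R)
```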

Under an MAR mechanism, Equation 1 simplifies, and it is

unnecessary to estimate the parameters that dictate missingness

(i.e., φ). For this reason, an MAR mechanism is often referred to

as ignorable missingness. In contrast, an MNAR mechanism re-

quires an analysis model that includes all parameters of the joint

distribution, not just those that are of substantive interest. In

practical terms, this means that the statistical analysis must incor-

porate a submodel that describes the propensity for missing data

(e.g., a logistic regression that predicts R). Both the selection

model and the pattern mixture model incorporate a model for R

into the analysis, but they do so in different ways.

The selection model and the pattern mixture model factor the

joint distribution of Y and R into the product of two separate

distributions. In the selection modeling framework, the joint dis-

tribution is as follows:

p(Yi, Ri | θ, φ) = p(Yi | θ) p(Ri | Yi, φ), (2)

where p(Yi | θ) is the marginal distribution of Y, and p(Ri | Yi, φ) is the

conditional distribution of missing data, given Y. The preceding

factorization implies a two-part model where the marginal distri-

bution corresponds to the substantive analysis (e.g., a growth

model) and where the conditional distribution corresponds to a

regression model that uses Y to predict the probability of missing

data. The regression of R on Y is inherently inestimable because Y

is always missing whenever R equals one. The selection model

achieves identification by imposing strict distributional assump-

tions, typically multivariate normality. The model tends to be

highly sensitive to this assumption, and even slight departures

from normality can produce substantial bias.

In the pattern mixture modeling framework, the factorization

reverses the role of Y and R as follows:

p(Yi, Ri | θ, φ) = p(Yi | Ri, θ) p(Ri | φ), (3)

where p(Yi | Ri, θ) is the conditional distribution of Y, given a particular value of R, and p(Ri | φ) is the marginal distribution of

R. The preceding factorization implies a two-part model where

the conditional distribution of Y represents the substantive

model parameters for a group of cases that shares the same

missing data pattern and where the marginal distribution of R

describes the incidence of different missing data patterns. This

factorization implies the following strategy: Stratify the sample

into subgroups that share a common missing data pattern, and

estimate the substantive model separately within each pattern.

Although it is not immediately obvious, the pattern mixture

model is also inestimable without invoking additional assump-

tions. For example, a growth model is underidentified in a group

of cases with only two observed data points. Therefore, these

assumptions would take the form of assumed values for the

inestimable parameters. I discuss these assumptions in detail

later in the article, but suffice it to say that the model is prone

to bias when its assumptions are incorrect.

The selection model and pattern mixture model are equivalent

in the sense that they describe the same joint distribution. However,



because the two frameworks require different assumptions, they

can (and often do) produce very different estimates of the substan-

tive model parameters. There is usually no way to judge the

relative accuracy of the two models because both rely heavily on

untestable assumptions. For this reason, methodologists generally

recommend sensitivity analyses that apply different models (and

thus different assumptions) to the same data. I illustrate the appli-

cation of these models to longitudinal data later in the article.

Brief Overview of Growth Curve Models

Much of the methodological work on MNAR models has cen-

tered on longitudinal data analyses, particularly growth curve

models (also known as mixed effects models, random coefficient

models, and multilevel models). Because this article focuses solely

on longitudinal data analyses, a brief overview of the growth curve

model is warranted before proceeding. A growth model expresses

the outcome variable as a function of a temporal predictor variable

that captures the passage of time. For example, the unconditional

linear growth curve model is as follows:

Yti = β0 + β1(TIMEti) + b0i + b1i(TIMEti) + εti, (4)

where Yti is the outcome score for case i at time t, TIMEti is the value of the temporal predictor for case i at time t (e.g., the elapsed time since the onset of the study), β0 is the mean intercept, β1 is the mean growth rate, b0i and b1i are residuals (i.e., random effects) that allow the intercepts and the change rates, respectively, to vary across individuals, and εti is a time-specific residual that captures the difference between an individual’s fitted linear trajectory and his or her observed data. The model can readily incorporate nonlinear change by means of polynomial terms. For example, the unconditional quadratic growth model is as follows:

unconditional quadratic growth model is as follows:

Yti? ?0? ?1?TIMEti? ? ?2?TIMEti

2? ? b0i? b1i?TIMEti?

? b2i?TIMEti

2? ? εti,(5)

where β0 is the mean intercept, β1 is the average instantaneous linear change when TIME equals zero, and β2 is the mean curvature. As before, the model uses a set of random effects to incorporate individual heterogeneity into the developmental trajectories (i.e., b0i, b1i, and b2i), and εti is a time-specific residual.
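Equation 4 is easy to make concrete by simulation. The sketch below generates data from an unconditional linear growth model; every numeric value (the means, the random-effect covariance, the residual variance) is an illustrative choice, not a value from the article:

```python
import numpy as np

rng = np.random.default_rng(1)
n, waves = 500, 4
time = np.arange(waves)                      # TIME = 0, 1, 2, 3

beta0, beta1 = 10.0, -1.5                    # mean intercept, mean growth rate
re_cov = [[4.0, 0.5], [0.5, 1.0]]            # covariance of (b0i, b1i)
b = rng.multivariate_normal([0.0, 0.0], re_cov, size=n)

# Equation 4: Y_ti = beta0 + beta1*TIME + b0i + b1i*TIME + eps_ti
eps = rng.normal(0.0, 1.0, size=(n, waves))  # time-specific residuals
Y = beta0 + beta1 * time + b[:, [0]] + b[:, [1]] * time + eps
print(Y.shape)                               # (500, 4)
```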

The previous models are estimable from the multilevel, mixed

model or from the structural equation modeling frameworks.

Structural equation modeling—and the Mplus software package, in

particular—provides a convenient platform for estimating MNAR

models. Cast as a structural equation model, the individual growth

components (i.e., b0i, b1i, and b2i) are latent variables, the means of

which (i.e., β0, β1, and β2) define the average growth trajectory.

To illustrate, Figure 1 shows a path diagram of a linear growth

model from a longitudinal study with four equally spaced assess-

ments. The unit factor loadings for the intercept latent variable

reflect the fact that the intercept is a constant component of each

individual’s idealized growth trajectory, and the loadings for the

linear latent variable capture the timing of the assessments (i.e., the

TIME scores in Equation 4). A quadratic growth model incorporates

an additional latent factor with loadings equal to the square of the

linear factor loadings. A number of resources are available to readers

who want additional details on growth curve models (Bollen &

Curran, 2006; Hancock & Lawrence, 2006; Hedeker & Gibbons,

2006; Singer & Willett, 2003). As an aside, mixed modeling software

programs (e.g., PROC MIXED in SAS) can also estimate some of the

MNAR models that I describe in this article (e.g., the selection

models). Although different modeling frameworks often yield iden-

tical parameter estimates, the latent growth curve approach is argu-

ably more convenient for implementing MNAR models.

Selection Models for Longitudinal Data

Heckman (1976, 1979) originally proposed the selection model

as a bias correction method for regression analyses with MNAR

data on the outcome variable. Like their classic predecessor, se-

lection models for longitudinal data combine a substantive model

(i.e., a growth curve model) with a set of regression equations that

predict missingness. The two parts of the model correspond to the

factorization on the right side of Equation 2. The literature de-

scribes two classes of longitudinal models that posit different

linkages between the repeated measures variables and the missing

data indicators. Wu and Carroll’s (1988) model indirectly links the

repeated measures variables to the response probabilities through

the individual intercepts and slopes (i.e., the b0i and b1i terms in

Equation 4). This approach is commonly referred to as the random

coefficient selection model or the shared parameter model.1 In

contrast, Diggle and Kenward’s (1994) selection model directly

relates the probability of missing data at time t to the outcome

variable at time t. Although these models have commonalities,

1 Authors often treat the shared parameter model as a distinct MNAR

approach. Because the structural features of Wu and Carroll’s (1988)

model are similar to those of Diggle and Kenward’s (1994) model (i.e., one

or more variables from the substantive model predict missingness), I treat

both as selection models.

Figure 1. Path diagram of a linear growth model. β0 = mean intercept; β1 = mean growth rate; b0i and b1i = residuals that allow the intercepts and the change rates, respectively, to vary across individuals; Y1–Y4 = outcome variables; ε1–ε4 = time-specific residuals.



they require somewhat different assumptions and may produce

different estimates. This section provides a brief description of the

two models, and a number of resources are available to readers

who are interested in additional technical details (Albert & Foll-

mann, 2009; Diggle & Kenward, 1994; Little, 2009; Molenberghs

& Kenward, 2007; Verbeke, Molenberghs, & Kenward, 2000).

Wu and Carroll’s (1988) Model

Wu and Carroll’s (1988) model uses the individual growth

trajectories to predict the probability of missing data at time t. To

illustrate, Figure 2 shows a path diagram of a linear growth curve

model of the type developed by Wu and Carroll. The rectangles

labeled R2, R3, and R4 are missing data indicators that denote whether the outcome variable is observed at a particular assessment (e.g., Rt = 0 if Yt is observed, and Rt = 1 if Yt is missing). Note that the model does not require an R1 indicator when the baseline assessment is complete, as is the case in the figure. The

baseline assessment is complete, as is the case in the figure. The

dashed arrows that link the latent variables (i.e., the individual

intercepts and slopes) to the missing data indicators represent

logistic regression equations.2 Regressing the indicator variables

on the intercepts and slopes effectively allows the probability of

missing data to depend on the entire set of repeated measures

variables, including the unobserved scores from later assessments.

Although this proposition may seem awkward, linking the re-

sponse probabilities to the intercepts and slopes is useful when

missingness is potentially dependent on an individual’s overall

developmental trajectory rather than a single error-prone realiza-

tion of the outcome variable (Albert & Follmann, 2009; Little,

1995).
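Wu and Carroll's linkage can be made concrete with a small simulation in which the log-odds of dropping out at each post-baseline wave depend on the individual slope b1i rather than on any single outcome score. All numeric values, including the gamma coefficients in the dropout equation, are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n, waves = 1000, 4
time = np.arange(waves)

# Growth part (Equation 4) with illustrative parameter values.
b = rng.multivariate_normal([0.0, 0.0], [[4.0, 0.5], [0.5, 1.0]], size=n)
Y = 10.0 - 1.5 * time + b[:, [0]] + b[:, [1]] * time + rng.normal(size=(n, waves))

# Shared-parameter dropout: the logit of the per-wave dropout hazard is a
# function of the individual slope b1i (hypothetical gamma coefficients).
gamma0, gamma1 = -2.0, -1.0
p_drop = 1.0 / (1.0 + np.exp(-(gamma0 + gamma1 * b[:, 1])))

R = np.zeros((n, waves), dtype=int)              # 0 = observed, 1 = missing
for t in range(1, waves):                        # baseline is complete
    drop_now = rng.random(n) < p_drop
    R[:, t] = np.maximum(R[:, t - 1], drop_now)  # attrition is permanent
Y[R == 1] = np.nan
```

Because gamma1 is negative here, cases with steeply declining trajectories (more negative b1i) have higher dropout hazards, so an analysis of completers alone would overstate the average trajectory.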

Diggle and Kenward’s (1994) Model

Diggle and Kenward’s (1994) model also combines a growth

curve model with a set of regression equations that predict miss-

ingness. However, unlike Wu and Carroll’s (1988) model, the

probability of missing data at wave t depends directly on the

repeated measures variables. To illustrate, Figure 3 shows a path

diagram of a linear Diggle and Kenward growth curve model. As

before, the rectangles labeled R2, R3, and R4 are missing data

indicators that denote whether the outcome variable is observed or

missing, and the dashed arrows represent logistic regression equa-

tions. Notice that the probability of missing data at time t now

depends directly on the outcome variable at time t as well as on the

outcome variable from the preceding assessment (e.g., Y1 and Y2 predict R2, Y2 and Y3 predict R3, and so on).
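A parallel simulation sketch for Diggle and Kenward's linkage: the dropout logit at wave t now involves the lagged outcome and the concurrent, possibly unobserved outcome. The psi coefficients are again purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, waves = 1000, 4
time = np.arange(waves)

b = rng.multivariate_normal([0.0, 0.0], [[4.0, 0.5], [0.5, 1.0]], size=n)
Y_full = 10.0 - 1.5 * time + b[:, [0]] + b[:, [1]] * time + rng.normal(size=(n, waves))

# Dropout at wave t depends on Y_{t-1} (an MAR-type path) and on the
# possibly unobserved Y_t itself (the MNAR path); hypothetical psi values.
psi0, psi_prev, psi_curr = 1.0, 0.1, -0.5

R = np.zeros((n, waves), dtype=int)
for t in range(1, waves):
    logit = psi0 + psi_prev * Y_full[:, t - 1] + psi_curr * Y_full[:, t]
    drop_now = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
    R[:, t] = np.maximum(R[:, t - 1], drop_now)

Y_obs = np.where(R == 0, Y_full, np.nan)        # what the analyst sees
```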

As an aside, the logistic regression equations in the previous

models potentially carry information about the missing data mech-

anism. For example, in Diggle and Kenward’s (1994) model, a

significant path between Rt and Yt implies an MNAR mechanism

because dropout at wave t is concurrently related to the outcome.

Similarly, a significant association between Rt and Yt−1 provides

evidence for an MAR mechanism because dropout at time t is

related to the outcome at the previous assessment. Finally, the

absence of any relationship between the outcomes and the missing

data indicators is consistent with a missing completely at random (MCAR) mechanism because

dropout is unrelated to the variables in the model. Although it is

tempting to use the logistic regressions to make inferences about

the missing data mechanism, it is important to reiterate that these

associations are estimable only because of strict distributional

assumptions. Consequently, using the logistic regressions to eval-

uate the missing data mechanism is tenuous, at best.

Selection Model Assumptions

Although it is not immediately obvious, longitudinal selection

models rely on distributional assumptions to achieve identification,

and these distributional assumptions dictate the accuracy of the

resulting parameter estimates. For Wu and Carroll’s (1988) model,

identification is driven by distributional assumptions for the ran-

dom effects (i.e., the individual intercepts and slopes), whereas

Diggle and Kenward’s (1994) model requires distributional as-

sumptions for the repeated measures variables. Without these

assumptions, the models are inestimable (e.g., in Diggle & Kenward’s, 1994, model, the regression of Rt on Yt is inestimable

because Y is always missing whenever R equals one). With con-

tinuous outcomes, the typical practice is to assume a multivariate

normal distribution for the individual intercepts and slopes or for

the repeated measures variables. Wu and Carroll’s model addition-

ally assumes that the repeated measures variables and the missing

data indicators are conditionally independent, given the random

effects (i.e., after controlling for the individual growth trajectories,

there is no residual correlation between Yt and Rt). Collectively,

these requirements are difficult to assess with missing data, so the

accuracy of the resulting parameter estimates ultimately relies on

one or more untestable assumptions.

2 A logistic model is not the only possibility for the missing data

indicators. Probit models are also common.

Figure 2. Path diagram of a linear Wu and Carroll (1988) growth model. R2–R4 = missing data indicators; β0 = mean intercept; β1 = mean growth rate; b0i and b1i = residuals that allow the intercepts and the change rates, respectively, to vary across individuals; Y1–Y4 = outcome variables; ε1–ε4 = time-specific residuals.



Coding the Missing Data Indicators

Thus far, I have been purposefully vague about the missing data

indicators because the appropriate coding scheme depends on the

exact configuration of missing values. The models of Wu and

Carroll (1988) and Diggle and Kenward (1994) were originally

developed for studies with permanent attrition (i.e., a monotone

missing data pattern). In this scenario, it makes sense to utilize

discrete-time survival indicators, such that Rt takes on a value of

zero prior to dropout, a value of one at the assessment where

dropout occurs, and a missing value code at all subsequent assessments (e.g., B. Muthén & Masyn, 2005; Singer & Willett, 2003).

In contrast, when a study has only intermittent missing values, it is

reasonable to represent the indicators as a series of independent

Bernoulli trials, such that Rt takes on a value of zero at any assessment where Yt is observed and takes on a value of one at any assessment where Yt is missing.

Most longitudinal studies have a mixture of sporadic missing-

ness and permanent attrition. One option for dealing with this

configuration of missingness is to use discrete-time survival indi-

cators to represent the dropout patterns and code intermittent

missing values as though they were observed (i.e., for intermittently missing values, Rt takes on a value of zero). Because

intermittent missingness is not treated as a target event, this coding

strategy effectively assumes that these values are consistent with

an MAR mechanism. A second option for dealing with intermittent

missingness and permanent attrition is to create indicators that are

consistent with a multinomial logistic regression (Albert & Foll-

mann, 2009; Albert, Follmann, Wang, & Suh, 2002), such that the

two types of missingness have distinct numeric codes. I illustrate

these various coding strategies in the subsequent data analysis

examples.
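The discrete-time survival coding scheme can be illustrated with a small worked example. The sketch below uses hypothetical data that mix permanent attrition with one intermittent gap, which is coded 0 (i.e., treated as MAR rather than as a target event), as described above:

```python
import numpy as np

# Hypothetical four-wave data: row 0 is complete, row 1 drops out after
# wave 2, and row 2 has a single intermittent gap at wave 2.
Y = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.0, 2.0, np.nan, np.nan],
              [1.0, np.nan, 3.0, 4.0]])
miss = np.isnan(Y)
n, waves = Y.shape

# Discrete-time survival coding: R is 0 before dropout, 1 at the wave
# where dropout occurs, and missing afterward; intermittent gaps stay 0.
R = np.zeros((n, waves))
for i in range(n):
    last_obs = np.max(np.nonzero(~miss[i])[0])  # last observed wave
    if last_obs + 1 < waves:                    # case drops out for good
        R[i, last_obs + 1] = 1
        R[i, last_obs + 2:] = np.nan
print(R[:, 1:])  # indicators for waves 2-4; a complete baseline needs no R1
```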

Pattern Mixture Models for Longitudinal Data

Like the selection model, the pattern mixture approach inte-

grates a model for the missing data into the analysis, but it does

so in a very different way. Specifically, a pattern mixture

analysis stratifies the sample into subgroups that share the same

missing data pattern and estimates a growth model separately

within each pattern. For example, in a four-wave study with a

monotone missing data pattern, the complete cases would form

one pattern, the cases that drop out following the baseline

assessment would constitute a second pattern, the cases that

leave the study after the second wave would form a third

pattern, and the cases with missing data at the final assessment

only would form the fourth pattern. Assuming a sufficient

sample size within each pattern, the four missing data groups

would yield unique estimates of the growth model parameters.

Returning to Equation 3, these pattern-specific estimates correspond to the conditional distribution p(Yi | Ri, θ), and the group proportions correspond to p(Ri | φ).

Although the pattern-specific estimates are often informative,

the usual substantive goal is to estimate the population growth

trajectory. Computing the weighted average of the pattern-specific

estimates yields a marginal estimate that averages over the distri-

bution of missingness. For example, the average intercept from the

hypothetical four-wave study is as follows:

β̂0 = π̂(1)β̂0(1) + π̂(2)β̂0(2) + π̂(3)β̂0(3) + π̂(4)β̂0(4), (6)

where the numeric superscript denotes the missing data pattern, π̂(p) is the proportion of cases in missing data pattern p, and β̂0(p) is the pattern-specific intercept estimate. Of importance, a pattern

mixture analysis does not automatically produce standard errors

for the average estimates because these quantities are a function of

the model parameters. Consequently, it is necessary to use the

multivariate delta method to derive an approximate standard error

(Hedeker & Gibbons, 1997; Hogan & Laird, 1997). Fortunately,

performing these additional computations is unnecessary because

Mplus can readily compute the average estimates and their stan-

dard errors.
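Equation 6 is just a proportion-weighted average, which the following sketch computes for hypothetical pattern-specific estimates (the delta-method standard error that Mplus supplies is omitted here):

```python
import numpy as np

# Hypothetical pattern-specific intercepts and pattern proportions from a
# four-pattern analysis (values are illustrative, not from the article).
beta0_by_pattern = np.array([10.2, 9.4, 8.8, 10.0])
pi_hat = np.array([0.55, 0.15, 0.10, 0.20])    # proportions sum to 1

# Equation 6: the marginal intercept is the weighted average.
beta0_avg = float(np.dot(pi_hat, beta0_by_pattern))
print(round(beta0_avg, 3))                      # → 9.9
```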

As an aside, stratifying cases by missing data pattern is also an

old MAR-based strategy that predates current maximum likelihood

missing data handling techniques (B. Muthén, Kaplan, & Hollis,

1987). This so-called multiple group approach used between-

pattern equality constraints on the model parameters to trick ex-

isting structural equation modeling programs into producing a

single set of MAR-based estimates. Although this procedure

closely resembles a pattern mixture model, forcing the missing

data patterns to have the same parameter estimates effectively

ignores the pattern-specific conditioning that is central to the

MNAR factorization in Equation 3.


Model Identification

Although its resemblance to a multiple group analysis makes the

pattern mixture model conceptually straightforward, implementing

Figure 3. Path diagram of Diggle and Kenward’s (1994) linear growth model. β0 = mean intercept; β1 = mean growth rate; b0i and b1i = residuals that allow the intercepts and the change rates, respectively, to vary across individuals; Y1–Y4 = outcome variables; ε1–ε4 = time-specific residuals; R2–R4 = missing data indicators.
