
Missing Not at Random Models for Latent Growth Curve Analyses

Craig K. Enders

Arizona State University

The past decade has seen a noticeable shift in missing data handling techniques that assume a missing at

random (MAR) mechanism, where the propensity for missing data on an outcome is related to other analysis

variables. Although MAR is often reasonable, there are situations where this assumption is unlikely to hold,

leading to biased parameter estimates. One such example is a longitudinal study of substance use where

participants with the highest frequency of use also have the highest likelihood of attrition, even after

controlling for other correlates of missingness. There is a large body of literature on missing not at random

(MNAR) analysis models for longitudinal data, particularly in the field of biostatistics. Because these methods

allow for a relationship between the outcome variable and the propensity for missing data, they require a

weaker assumption about the missing data mechanism. This article describes 2 classic MNAR modeling

approaches for longitudinal data: the selection model and the pattern mixture model. To date, these models

have been slow to migrate to the social sciences, in part because they required complicated custom computer

programs. These models are now quite easy to estimate in popular structural equation modeling programs,

particularly Mplus. The purpose of this article is to describe these MNAR modeling frameworks and to

illustrate their application on a real data set. Despite their potential advantages, MNAR-based analyses are not

without problems and also rely on untestable assumptions. This article offers practical advice for implementing and choosing among different longitudinal models.

Keywords: missing data, pattern mixture model, selection model, attrition, missing not at random

Supplemental materials: http://dx.doi.org/10.1037/a0022640.supp

Missing data handling techniques have received considerable

attention in the methodological literature during the past 40 years.

This literature has largely discredited most of the simple procedures that have enjoyed widespread use for decades, including

methods that discard incomplete cases (e.g., listwise deletion,

pairwise deletion) and approaches that impute the data with a

single set of replacement values (e.g., mean imputation, regression

imputation, last observation carried forward). The past decade has

seen a noticeable shift to analytic techniques that assume a missing

at random (MAR) mechanism, whereby an individual’s propensity

for missing data on a variable Y is potentially related to other

variables in the analysis (or in the imputation model) but not to the

unobserved values of Y itself (Little & Rubin, 2002; Rubin, 1976).

Maximum likelihood estimation and multiple imputation are arguably the predominant MAR-based approaches, although inverse

probability weighting methods have gained traction in the statistics

literature (e.g., Carpenter, Kenward, & Vansteelandt, 2006; Robins

& Rotnitzky, 1995; Scharfstein, Rotnitzky, & Robins, 1999). A

number of resources are available to readers who are interested in

additional details on these methods (e.g., Carpenter et al., 2006;

Enders, 2010; Little & Rubin, 2002; Rotnitzky, 2009; Schafer,

1997; Schafer & Graham, 2002).

Although the MAR mechanism is often reasonable, there are situations where this assumption is unlikely to hold. For example, in a

longitudinal study of substance use, it is reasonable to expect participants with the highest frequency of use to have the highest likelihood

of attrition, even after controlling for other correlates of missingness.

Similarly, in a study that examines quality of life changes throughout

the course of a clinical trial for a new cancer medication, it is likely

that patients with rapidly decreasing quality of life scores are more

likely to leave the study because they die or become too ill to

participate. The previous scenarios are characterized by a relationship

between the outcome variable (i.e., substance use, quality of life) and

the propensity for missing data. This so-called missing not at random

(MNAR) mechanism is problematic because MAR-based analyses

are likely to produce biased parameter estimates. Unfortunately, there

is no empirical test of the MAR mechanism, so it is generally

impossible to fully rule out MNAR missingness. This underscores the

need for MNAR analysis methods.
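The practical consequence of this distinction is easy to demonstrate by simulation. In the sketch below (all coefficients are hypothetical, not taken from the article), a regression-based MAR adjustment recovers the population mean when missingness depends only on an observed covariate, but cannot do so when missingness depends on the outcome itself:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
x = rng.normal(0, 1, n)              # an observed correlate of missingness
y = 0.5 * x + rng.normal(0, 1, n)    # outcome; population mean is 0

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

r_mar = rng.random(n) < logistic(-1 + 1.5 * x)   # missingness depends on X only
r_mnar = rng.random(n) < logistic(-1 + 1.5 * y)  # missingness depends on Y itself

# An MAR-based correction: regress y on x among the observed cases and
# average the predictions over everyone.  This works under MAR but cannot
# remove the bias under MNAR, because selection distorts y given x itself.
adjusted = {}
for r, label in [(r_mar, "MAR"), (r_mnar, "MNAR")]:
    slope, intercept = np.polyfit(x[~r], y[~r], 1)
    adjusted[label] = intercept + slope * x.mean()
print(adjusted)   # MAR estimate near 0; MNAR estimate clearly below 0
```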

There is a rather large body of literature on MNAR analysis models

for longitudinal data, particularly in the field of biostatistics (e.g.,

Albert & Follmann, 2000, 2009; Diggle & Kenward, 1994; Follmann

& Wu, 1995; Little, 1995, 2009; Molenberghs & Kenward, 2007;

Verbeke, Molenberghs, & Kenward, 2000; Wu & Bailey, 1989; Wu

& Carroll, 1988). This literature addresses a wide variety of substantive applications and includes models for categorical outcomes, count

data, and continuous variables, to name a few. Although researchers

are sometimes quick to discount MAR-based analyses, MNAR models are not without their own problems. In particular, MNAR analyses

rely heavily on untestable assumptions (e.g., normally distributed

latent variables), and even relatively minor violations of these assumptions can introduce substantial bias. This fact has led some

methodologists to caution against the routine use of these models

(Demirtas & Schafer, 2003; Schafer, 2003). A common viewpoint is

that MNAR models are most appropriate for exploring the sensitivity

Correspondence concerning this article should be addressed to Craig K.

Enders, Box 871104, Department of Psychology, Arizona State University,

Tempe, AZ 85287–1104. E-mail: craig.enders@asu.edu

Psychological Methods

2011, Vol. 16, No. 1, 1–16

© 2011 American Psychological Association

1082-989X/11/$12.00 DOI: 10.1037/a0022640



of one’s results to a variety of different assumptions and conditions.

Despite their potential problems, MNAR models are important options to consider, particularly when outcome-related attrition seems

plausible. At the very least, these procedures can augment the results

from an MAR-based analysis.

Although MNAR analysis models have been in the literature for

many years, they have been slow to migrate to the social and the

behavioral sciences. To date, most substantive applications have

appeared in the medical literature (e.g., Hogan, Roy, & Korkontzelou, 2004; Kenward, 1998; Michiels, Molenberghs, Bijnens,

Vangeneugden, & Thijs, 2002). The adoption of any novel statistical procedure is partially a function of awareness but is also

driven by software availability. MNAR analyses were traditionally

difficult to implement because they required complicated custom

programming. These models are now quite easy to estimate in

popular structural equation modeling programs, particularly Mplus

(L. K. Muthén & Muthén, 1998–2010). Consequently, the purpose

of this article is to describe two classic MNAR modeling families

for longitudinal data—selection models and pattern mixture mod-

els—and illustrate their use on a real data set. Methodologists

continue to develop MNAR analysis methods, most of which

extend the models that I describe in this article (e.g., Beunckens,

Molenberghs, Verbeke, & Mallinckrodt, 2008; Dantan, Proust-Lima, Letenneur, & Jacqmin-Gadda, 2008; Lin, McCulloch, &

Rosenheck, 2004; B. Muthén, Asparouhov, Hunter, & Leuchter,

2011; Roy, 2003; Roy & Daniels, 2008; Yuan & Little, 2009). By

limiting the scope of this article to classic techniques, I hope to

provide readers with the necessary background information for

accessing these newer approaches. B. Muthén et al. (2011) have provided an excellent overview of these recent innovations.

The organization of this article is as follows. I begin with an

overview of Rubin’s (1976) missing data theory, including a

discussion of how selection models and pattern mixture models fit

into Rubin’s definition of an MNAR mechanism. After a brief

review of growth curve models, I then describe classic selection

models and pattern mixture models for longitudinal data. Next, I

use a series of data analysis examples to illustrate the estimation

and interpretation of the models. I then conclude with a discussion

of model selection and sensitivity analyses.

Theoretical Background

Some background information on Rubin’s (1976) missing data

theory is useful for understanding the rationale behind MNAR

analysis models. According to Rubin, the propensity for missing

data is a random variable that has a distribution. In practical terms,

this implies that each variable potentially yields a pair of scores: an

underlying Y value that may or may not be observed and a

corresponding R value that denotes whether Y is observed or is

missing (e.g., R = 0 if Y is observed and R = 1 if Y is missing).

Under an MNAR mechanism, the data and the probability of

missingness have a joint distribution:

p(Yi, Ri | θ, φ), (1)

where p denotes a probability distribution, Yi is the outcome variable for case i, Ri is the corresponding missing data indicator, θ is a set of parameters that describes the distribution of Y (e.g., growth model parameters), and φ contains parameters that describe the propensity for missing data on Y (e.g., a set of logistic regression coefficients that predict R). Collectively, the parameters of the joint distribution dictate the mutual occurrence of different Y values and missing data.

Under an MAR mechanism, Equation 1 simplifies, and it is unnecessary to estimate the parameters that dictate missingness (i.e., φ). For this reason, an MAR mechanism is often referred to as ignorable missingness. In contrast, an MNAR mechanism requires an analysis model that includes all parameters of the joint distribution, not just those that are of substantive interest. In practical terms, this means that the statistical analysis must incorporate a submodel that describes the propensity for missing data (e.g., a logistic regression that predicts R). Both the selection model and the pattern mixture model incorporate a model for R into the analysis, but they do so in different ways.

The selection model and the pattern mixture model factor the joint distribution of Y and R into the product of two separate distributions. In the selection modeling framework, the joint distribution is as follows:

p(Yi, Ri | θ, φ) = p(Yi | θ) p(Ri | Yi, φ), (2)

where p(Yi | θ) is the marginal distribution of Y, and p(Ri | Yi, φ) is the conditional distribution of missing data, given Y. The preceding factorization implies a two-part model where the marginal distribution corresponds to the substantive analysis (e.g., a growth model) and where the conditional distribution corresponds to a regression model that uses Y to predict the probability of missing data. The regression of R on Y is inherently inestimable because Y is always missing whenever R equals one. The selection model achieves identification by imposing strict distributional assumptions, typically multivariate normality. The model tends to be highly sensitive to this assumption, and even slight departures from normality can produce substantial bias.

In the pattern mixture modeling framework, the factorization reverses the role of Y and R as follows:

p(Yi, Ri | θ, φ) = p(Yi | Ri, θ) p(Ri | φ), (3)

where p(Yi | Ri, θ) is the conditional distribution of Y, given a particular value of R, and p(Ri | φ) is the marginal distribution of R. The preceding factorization implies a two-part model where the conditional distribution of Y represents the substantive model parameters for a group of cases that shares the same missing data pattern and where the marginal distribution of R describes the incidence of different missing data patterns. This factorization implies the following strategy: Stratify the sample into subgroups that share a common missing data pattern, and estimate the substantive model separately within each pattern. Although it is not immediately obvious, the pattern mixture model is also inestimable without invoking additional assumptions. For example, a growth model is underidentified in a group of cases with only two observed data points. Therefore, these assumptions would take the form of assumed values for the inestimable parameters. I discuss these assumptions in detail later in the article, but suffice it to say that the model is prone to bias when its assumptions are incorrect.

The selection model and pattern mixture model are equivalent

in the sense that they describe the same joint distribution. However,


because the two frameworks require different assumptions, they

can (and often do) produce very different estimates of the substantive model parameters. There is usually no way to judge the

relative accuracy of the two models because both rely heavily on

untestable assumptions. For this reason, methodologists generally

recommend sensitivity analyses that apply different models (and

thus different assumptions) to the same data. I illustrate the application of these models to longitudinal data later in the article.

Brief Overview of Growth Curve Models

Much of the methodological work on MNAR models has centered on longitudinal data analyses, particularly growth curve

models (also known as mixed effects models, random coefficient

models, and multilevel models). Because this article focuses solely

on longitudinal data analyses, a brief overview of the growth curve

model is warranted before proceeding. A growth model expresses

the outcome variable as a function of a temporal predictor variable

that captures the passage of time. For example, the unconditional

linear growth curve model is as follows:

Yti = β0 + β1(TIMEti) + b0i + b1i(TIMEti) + εti, (4)

where Yti is the outcome score for case i at time t, TIMEti is the value of the temporal predictor for case i at time t (e.g., the elapsed time since the onset of the study), β0 is the mean intercept, β1 is the mean growth rate, b0i and b1i are residuals (i.e., random effects) that allow the intercepts and the change rates, respectively, to vary across individuals, and εti is a time-specific residual that captures the difference between an individual's fitted linear trajectory and his or her observed data. The model can readily incorporate nonlinear change by means of polynomial terms. For example, the unconditional quadratic growth model is as follows:

Yti = β0 + β1(TIMEti) + β2(TIMEti²) + b0i + b1i(TIMEti) + b2i(TIMEti²) + εti, (5)

where β0 is the mean intercept, β1 is the average instantaneous linear change when TIME equals zero, and β2 is the mean curvature. As before, the model uses a set of random effects to incorporate individual heterogeneity into the developmental trajectories (i.e., b0i, b1i, and b2i), and εti is a time-specific residual.
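Equation 4 is easy to make concrete by simulation. The sketch below draws data from the linear growth model using hypothetical parameter values (all numbers are illustrative, not estimates from the article's data):

```python
import numpy as np

rng = np.random.default_rng(1)

n, waves = 500, 4
time = np.arange(waves)                      # TIME = 0, 1, 2, 3

# Hypothetical population values for the growth parameters
beta0, beta1 = 10.0, -1.5                    # mean intercept and slope
sd_b0, sd_b1, sd_eps = 2.0, 0.5, 1.0         # random-effect and residual SDs

b0 = rng.normal(0, sd_b0, n)                 # individual intercept deviations
b1 = rng.normal(0, sd_b1, n)                 # individual slope deviations
eps = rng.normal(0, sd_eps, (n, waves))      # time-specific residuals

# Equation 4: Y_ti = beta0 + beta1*TIME + b0_i + b1_i*TIME + eps_ti
Y = beta0 + beta1 * time + b0[:, None] + b1[:, None] * time + eps

print(Y.mean(axis=0))   # population wave means are 10, 8.5, 7, 5.5
```

Adding a squared TIME column and a third random effect turns the same sketch into the quadratic model of Equation 5.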

The previous models are estimable from the multilevel mixed modeling framework or from the structural equation modeling framework. Structural equation modeling—and the Mplus software package, in particular—provides a convenient platform for estimating MNAR models. Cast as a structural equation model, the individual growth components (i.e., b0i, b1i, and b2i) are latent variables, the means of which (i.e., β0, β1, and β2) define the average growth trajectory. To illustrate, Figure 1 shows a path diagram of a linear growth model from a longitudinal study with four equally spaced assessments. The unit factor loadings for the intercept latent variable reflect the fact that the intercept is a constant component of each individual's idealized growth trajectory, and the loadings for the linear latent variable capture the timing of the assessments (i.e., the TIME scores in Equation 4). A quadratic growth model incorporates an additional latent factor with loadings equal to the square of the linear factor loadings. A number of resources are available to readers who want additional details on growth curve models (Bollen & Curran, 2006; Hancock & Lawrence, 2006; Hedeker & Gibbons, 2006; Singer & Willett, 2003). As an aside, mixed modeling software programs (e.g., PROC MIXED in SAS) can also estimate some of the MNAR models that I describe in this article (e.g., the selection models). Although different modeling frameworks often yield identical parameter estimates, the latent growth curve approach is arguably more convenient for implementing MNAR models.

Selection Models for Longitudinal Data

Heckman (1976, 1979) originally proposed the selection model as a bias correction method for regression analyses with MNAR data on the outcome variable. Like their classic predecessor, selection models for longitudinal data combine a substantive model (i.e., a growth curve model) with a set of regression equations that predict missingness. The two parts of the model correspond to the factorization on the right side of Equation 2. The literature describes two classes of longitudinal models that posit different linkages between the repeated measures variables and the missing data indicators. Wu and Carroll's (1988) model indirectly links the repeated measures variables to the response probabilities through the individual intercepts and slopes (i.e., the b0i and b1i terms in Equation 4). This approach is commonly referred to as the random coefficient selection model or the shared parameter model.1 In contrast, Diggle and Kenward's (1994) selection model directly relates the probability of missing data at time t to the outcome variable at time t. Although these models have commonalities,

1 Authors often treat the shared parameter model as a distinct MNAR approach. Because the structural features of Wu and Carroll's (1988) model are similar to those of Diggle and Kenward's (1994) model (i.e., one or more variables from the substantive model predict missingness), I treat both as selection models.

Figure 1. Path diagram of a linear growth model. β0 = mean intercept; β1 = mean growth rate; b0i and b1i = residuals that allow the intercepts and the change rates, respectively, to vary across individuals; Y1–Y4 = outcome variables; ε1–ε4 = time-specific residuals.


they require somewhat different assumptions and may produce

different estimates. This section provides a brief description of the

two models, and a number of resources are available to readers

who are interested in additional technical details (Albert & Follmann, 2009; Diggle & Kenward, 1994; Little, 2009; Molenberghs

& Kenward, 2007; Verbeke, Molenberghs, & Kenward, 2000).

Wu and Carroll’s (1988) Model

Wu and Carroll's (1988) model uses the individual growth trajectories to predict the probability of missing data at time t. To illustrate, Figure 2 shows a path diagram of a linear growth curve model of the type developed by Wu and Carroll. The rectangles labeled R2, R3, and R4 are missing data indicators that denote whether the outcome variable is observed at a particular assessment (e.g., Rt = 0 if Yt is observed, and Rt = 1 if Yt is missing). Note that the model does not require an R1 indicator when the baseline assessment is complete, as is the case in the figure. The dashed arrows that link the latent variables (i.e., the individual intercepts and slopes) to the missing data indicators represent logistic regression equations.2 Regressing the indicator variables on the intercepts and slopes effectively allows the probability of missing data to depend on the entire set of repeated measures variables, including the unobserved scores from later assessments. Although this proposition may seem awkward, linking the response probabilities to the intercepts and slopes is useful when missingness is potentially dependent on an individual's overall developmental trajectory rather than a single error-prone realization of the outcome variable (Albert & Follmann, 2009; Little, 1995).

Diggle and Kenward’s (1994) Model

Diggle and Kenward's (1994) model also combines a growth curve model with a set of regression equations that predict missingness. However, unlike Wu and Carroll's (1988) model, the probability of missing data at wave t depends directly on the repeated measures variables. To illustrate, Figure 3 shows a path diagram of a linear Diggle and Kenward growth curve model. As before, the rectangles labeled R2, R3, and R4 are missing data indicators that denote whether the outcome variable is observed or missing, and the dashed arrows represent logistic regression equations. Notice that the probability of missing data at time t now depends directly on the outcome variable at time t as well as on the outcome variable from the preceding assessment (e.g., Y1 and Y2 predict R2, Y2 and Y3 predict R3, and so on).
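A small simulation clarifies the missingness mechanism that Diggle and Kenward posit. In this sketch (the trajectory values and logistic coefficients ψ0, ψ1, ψ2 are all made up), permanent dropout at wave t depends on the current and the previous outcome, mimicking the quality-of-life example where low scorers leave the study:

```python
import numpy as np

rng = np.random.default_rng(2)

n, waves = 1000, 4
# Complete-data trajectories: declining scores (hypothetical values)
Y = 10.0 - 1.5 * np.arange(waves) + rng.normal(0, 2, (n, waves))

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Dropout at wave t depends on Y_t (the MNAR component) and on
# Y_{t-1} (an MAR component); negative slopes mean low scorers leave.
psi0, psi1, psi2 = 4.4, -0.6, -0.1

R = np.zeros((n, waves), dtype=int)          # 1 = missing at wave t
dropped = np.zeros(n, dtype=bool)
for t in range(1, waves):                    # baseline assumed complete
    p_drop = logistic(psi0 + psi1 * Y[:, t] + psi2 * Y[:, t - 1])
    dropped |= rng.random(n) < p_drop        # attrition is permanent
    R[dropped, t] = 1

Yobs = np.where(R == 1, np.nan, Y)
# Low scorers were selectively removed, so the observed final-wave mean
# overstates the full-sample mean -- the bias an MAR analysis inherits.
print(np.nanmean(Yobs[:, 3]), Y[:, 3].mean())
```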

As an aside, the logistic regression equations in the previous models potentially carry information about the missing data mechanism. For example, in Diggle and Kenward's (1994) model, a significant path between Rt and Yt implies an MNAR mechanism because dropout at wave t is concurrently related to the outcome. Similarly, a significant association between Rt and Yt−1 provides evidence for an MAR mechanism because dropout at time t is related to the outcome at the previous assessment. Finally, the absence of any relationship between the outcomes and the missing data indicators is consistent with a missing completely at random (MCAR) mechanism because dropout is unrelated to the variables in the model. Although it is tempting to use the logistic regressions to make inferences about the missing data mechanism, it is important to reiterate that these associations are estimable only because of strict distributional assumptions. Consequently, using the logistic regressions to evaluate the missing data mechanism is tenuous, at best.

Selection Model Assumptions

Although it is not immediately obvious, longitudinal selection models rely on distributional assumptions to achieve identification, and these distributional assumptions dictate the accuracy of the resulting parameter estimates. For Wu and Carroll's (1988) model, identification is driven by distributional assumptions for the random effects (i.e., the individual intercepts and slopes), whereas Diggle and Kenward's (1994) model requires distributional assumptions for the repeated measures variables. Without these assumptions, the models are inestimable (e.g., in Diggle & Kenward's, 1994, model, the regression of Rt on Yt is inestimable because Y is always missing whenever R equals one). With continuous outcomes, the typical practice is to assume a multivariate normal distribution for the individual intercepts and slopes or for the repeated measures variables. Wu and Carroll's model additionally assumes that the repeated measures variables and the missing data indicators are conditionally independent, given the random effects (i.e., after controlling for the individual growth trajectories, there is no residual correlation between Yt and Rt). Collectively, these requirements are difficult to assess with missing data, so the accuracy of the resulting parameter estimates ultimately relies on one or more untestable assumptions.

2 A logistic model is not the only possibility for the missing data indicators. Probit models are also common.

Figure 2. Path diagram of a linear Wu and Carroll (1988) growth model. R2–R4 = missing data indicators; β0 = mean intercept; β1 = mean growth rate; b0i and b1i = residuals that allow the intercepts and the change rates, respectively, to vary across individuals; Y1–Y4 = outcome variables; ε1–ε4 = time-specific residuals.


Coding the Missing Data Indicators

Thus far, I have been purposefully vague about the missing data indicators because the appropriate coding scheme depends on the exact configuration of missing values. The models of Wu and Carroll (1988) and Diggle and Kenward (1994) were originally developed for studies with permanent attrition (i.e., a monotone missing data pattern). In this scenario, it makes sense to utilize discrete-time survival indicators, such that Rt takes on a value of zero prior to dropout, a value of one at the assessment where dropout occurs, and a missing value code at all subsequent assessments (e.g., B. Muthén & Masyn, 2005; Singer & Willett, 2003). In contrast, when a study has only intermittent missing values, it is reasonable to represent the indicators as a series of independent Bernoulli trials, such that Rt takes on a value of zero at any assessment where Yt is observed and takes on a value of one at any assessment where Yt is missing.

Most longitudinal studies have a mixture of sporadic missingness and permanent attrition. One option for dealing with this configuration of missingness is to use discrete-time survival indicators to represent the dropout patterns and code intermittent missing values as though they were observed (i.e., for intermittently missing values, Rt takes on a value of zero). Because intermittent missingness is not treated as a target event, this coding strategy effectively assumes that these values are consistent with an MAR mechanism. A second option for dealing with intermittent missingness and permanent attrition is to create indicators that are consistent with a multinomial logistic regression (Albert & Follmann, 2009; Albert, Follmann, Wang, & Suh, 2002), such that the two types of missingness have distinct numeric codes. I illustrate these various coding strategies in the subsequent data analysis examples.
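As a concrete illustration of the first coding option, the sketch below builds discrete-time survival indicators from a hypothetical four-wave data matrix; the function name and data are my own, and intermittent missing values are coded zero (i.e., treated as MAR):

```python
import numpy as np

# Hypothetical 4-wave data for three cases: a completer, a case with
# intermittent missingness only, and a case that drops out after wave 2.
Y = np.array([[7.0, 6.0, 5.0, 4.0],
              [7.0, np.nan, 5.0, 4.0],
              [7.0, 6.0, np.nan, np.nan]])

def survival_indicators(Y):
    """Discrete-time survival coding: R_t = 0 before dropout, 1 at the
    wave where dropout occurs, NaN (a missing value code) afterward.
    Intermittent missing values, with later waves observed, are coded 0."""
    n, waves = Y.shape
    R = np.zeros((n, waves - 1))              # indicators for waves 2..T
    for i in range(n):
        obs = ~np.isnan(Y[i])
        if obs[-1]:                           # final wave observed: no dropout
            continue
        last = np.max(np.where(obs)[0])       # last observed wave (0-based)
        R[i, last] = 1                        # dropout occurs at wave last + 1
        R[i, last + 1:] = np.nan              # undefined after dropout
    return R

# Rows: completer -> all 0; intermittent-only -> all 0; dropout -> [0, 1, NaN]
print(survival_indicators(Y))
```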

Pattern Mixture Models for Longitudinal Data

Like the selection model, the pattern mixture approach integrates a model for the missing data into the analysis, but it does so in a very different way. Specifically, a pattern mixture analysis stratifies the sample into subgroups that share the same missing data pattern and estimates a growth model separately within each pattern. For example, in a four-wave study with a monotone missing data pattern, the complete cases would form one pattern, the cases that drop out following the baseline assessment would constitute a second pattern, the cases that leave the study after the second wave would form a third pattern, and the cases with missing data at the final assessment only would form the fourth pattern. Assuming a sufficient sample size within each pattern, the four missing data groups would yield unique estimates of the growth model parameters. Returning to Equation 3, these pattern-specific estimates correspond to the conditional distribution p(Yi | Ri, θ), and the group proportions correspond to p(Ri | φ).

Although the pattern-specific estimates are often informative, the usual substantive goal is to estimate the population growth trajectory. Computing the weighted average of the pattern-specific estimates yields a marginal estimate that averages over the distribution of missingness. For example, the average intercept from the hypothetical four-wave study is as follows:

β̂0 = π̂(1)β̂0(1) + π̂(2)β̂0(2) + π̂(3)β̂0(3) + π̂(4)β̂0(4), (6)

where the numeric superscript denotes the missing data pattern, π̂(p) is the proportion of cases in missing data pattern p, and β̂0(p) is the pattern-specific intercept estimate. Of importance, a pattern mixture analysis does not automatically produce standard errors for the average estimates because these quantities are a function of the model parameters. Consequently, it is necessary to use the multivariate delta method to derive an approximate standard error (Hedeker & Gibbons, 1997; Hogan & Laird, 1997). Fortunately, performing these additional computations is unnecessary because Mplus can readily compute the average estimates and their standard errors.
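Equation 6 is simply a proportion-weighted average, as the following sketch shows with hypothetical pattern-specific intercepts and pattern proportions (the delta-method standard error would additionally require the covariance matrix of these estimates):

```python
import numpy as np

# Hypothetical pattern-specific intercept estimates from a four-pattern
# analysis, and the proportion of cases in each missing data pattern.
beta0_by_pattern = np.array([11.2, 10.4, 9.8, 9.1])
pattern_props = np.array([0.55, 0.20, 0.15, 0.10])

# Equation 6: the marginal intercept is the proportion-weighted
# average of the pattern-specific estimates.
beta0_marginal = np.sum(pattern_props * beta0_by_pattern)
print(beta0_marginal)   # approximately 10.62
```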

As an aside, stratifying cases by missing data pattern is also an old MAR-based strategy that predates current maximum likelihood missing data handling techniques (B. Muthén, Kaplan, & Hollis, 1987). This so-called multiple group approach used between-pattern equality constraints on the model parameters to trick existing structural equation modeling programs into producing a single set of MAR-based estimates. Although this procedure closely resembles a pattern mixture model, forcing the missing data patterns to have the same parameter estimates effectively ignores the pattern-specific conditioning that is central to the MNAR factorization in Equation 3.


Model Identification

Although its resemblance to a multiple group analysis makes the

pattern mixture model conceptually straightforward, implementing

Figure 3. Path diagram of Diggle and Kenward's (1994) linear growth model. β0 = mean intercept; β1 = mean growth rate; b0i and b1i = residuals that allow the intercepts and the change rates, respectively, to vary across individuals; Y1–Y4 = outcome variables; ε1–ε4 = time-specific residuals; R2–R4 = missing data indicators.


the procedure is made difficult by the fact that one or more of the

pattern-specific parameters are usually inestimable. To illustrate,

consider a four-wave study that uses a quadratic growth model.

The model is identified only for the subgroup of participants with

complete data. For cases with two complete observations, the

linear trend is estimable, but the quadratic coefficient and certain

variance components are not. The identification issue is most

evident in the subgroup that drops out following the baseline

assessment, where neither the linear nor the quadratic coefficients

are estimable.

Estimating a pattern mixture model requires the user to specify values for the inestimable parameters, either explicitly or implicitly. Using code variables as predictors in a growth model is one way to accomplish this (Hedeker & Gibbons, 1997, 2006). For example, Hedeker and Gibbons (1997) classified participants from a psychiatric drug trial as completers (cases with data at every wave) or dropouts (cases that left the study at some point after the baseline assessment), and they subsequently included the binary missing data indicator as a predictor of the intercepts and slopes in a linear growth model. A linear model with the missing data indicator as the only predictor would be as follows:

Yti = β0 + β1(TIMEti) + β2(DROPOUTi) + β3(DROPOUTi)(TIMEti) + b0i + b1i(TIMEti) + εti, (7)

where DROPOUT denotes the missing data pattern (0 = completers, 1 = dropouts), β0 and β1 are the mean intercept and slope, respectively, for the complete cases, β2 is the intercept difference for the dropouts, and β3 is the slope difference for the dropouts. Hedeker and Gibbons's (1997, 2006) approach achieves identification by sharing information across patterns. For example, the model in Equation 7 implicitly assumes that early dropouts have the same developmental trajectory as the cases that drop out later in the study. The model also assumes that all missing data patterns share the same covariance structure.
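The fixed-effect part of Equation 7 can be sketched with simulated data. The least-squares fit below is only an illustration of the design matrix (it ignores the random effects, which does not bias the point estimates in this balanced, hypothetical setup, though a mixed-model program would be used in practice):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: 200 completers (DROPOUT = 0) and 100 dropouts
# (DROPOUT = 1) whose trajectory differs by beta2 = -0.5, beta3 = -1.0.
n0, n1, waves = 200, 100, 4
time = np.tile(np.arange(waves), n0 + n1).astype(float)
drop = np.repeat([0.0, 1.0], [n0 * waves, n1 * waves])
b0 = np.repeat(rng.normal(0, 1.5, n0 + n1), waves)   # person-level intercepts
y = (10.0 - 1.0 * time - 0.5 * drop - 1.0 * drop * time
     + b0 + rng.normal(0, 1.0, (n0 + n1) * waves))

# Equation 7 design matrix: [1, TIME, DROPOUT, DROPOUT x TIME]
X = np.column_stack([np.ones_like(time), time, drop, drop * time])
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(betas)   # close to [10, -1, -0.5, -1]
```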

A second estimation strategy is to implement so-called iden-

tifying restrictions that explicitly equate the inestimable param-

eters from one pattern to the estimable parameters from one or

more of the other patterns. Later in the article, I illustrate three

such restrictions: the complete case missing variable restriction,

the neighboring case missing variable restriction, and the avail-

able case missing variable restriction. As its name implies, the

complete case missing variable restriction equates the inesti-

mable parameters to the estimates from the complete cases. The

neighboring case missing variable restriction replaces inestima-

ble parameters with estimates from a group of cases that share

a comparable missing data pattern. For example, in a four-wave

study, the cases that drop out after the third wave can serve as

a donor pattern for the cases that drop out after the second

wave, such that the two patterns share the same quadratic

coefficient. Finally, the available case missing variable restric-

tion replaces inestimable growth parameters with the weighted

average of the estimates from other patterns. Still considering a

group of cases with two observations, this identifying restric-

tion would replace the inestimable quadratic term with the

average coefficient from the complete cases and the cases that

drop out following the third wave. Additional details and ex-

amples of various identification strategies are available else-

where in the literature (Demirtas & Schafer, 2003; Enders,

2010; Molenberghs, Michiels, Kenward, & Diggle, 1998; Thijs,

Molenberghs, Michiels, & Curran, 2002; Verbeke et al., 2000).
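The three restrictions amount to simple substitution rules. The sketch below uses hypothetical quadratic coefficients and pattern sizes; it shows how each rule would fill in an inestimable quadratic term for a pattern with only two observed waves.

```python
# Hypothetical quadratic coefficients and sample sizes for the patterns
# that can estimate the term: completers and wave-3 dropouts.
quad = {"complete": 0.040, "drop_after_3": 0.025}
n = {"complete": 300, "drop_after_3": 50}

# Complete case missing variable restriction: borrow from the completers.
ccmv = quad["complete"]

# Neighboring case missing variable restriction: borrow from the nearest
# donor pattern, here the cases that drop out after the third wave.
ncmv = quad["drop_after_3"]

# Available case missing variable restriction: n-weighted average of the
# patterns that can estimate the coefficient.
total = sum(n.values())
acmv = sum(quad[p] * n[p] / total for p in quad)

print(ccmv, ncmv, round(acmv, 6))
```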

Pattern Mixture Model Assumptions

The assumed values for the inestimable parameters dictate

the accuracy of the pattern mixture model. To the extent that the

values are correct, the model can reduce or eliminate the bias

from an MNAR mechanism. However, like the selection model,

there is ultimately no way to gauge the accuracy of the resulting

estimates, and implementing different identification constraints

can (and often does) produce disparate sets of results. At first

glance, the need to specify values for inestimable parameters

may appear to be a serious weakness of the pattern mixture

model. However, some methodologists argue that this require-

ment is advantageous because it forces researchers to make

their assumptions explicit. This is in contrast to the selection

model, which relies on implicit distributional assumptions that

are not obvious. This aspect of the pattern mixture model also

provides flexibility because it allows researchers to explore the

sensitivity of the substantive model parameters to a number of

different identification constraints (i.e., assumed parameter val-

ues). In truth, the previous identifying restrictions are simply

arbitrary rules of thumb for generating parameter values. Any

number of other restrictions is possible (e.g., a restriction that

specifies a flat trajectory shape after the last observed data

point; Little, 2009), and performing a sensitivity analysis that

applies a variety of identification strategies to the same data is

usually a good idea.

Data Analysis Examples

To date, applications of longitudinal models for MNAR data are

relatively rare in the social and the behavioral sciences, perhaps

because these analyses have traditionally required complex custom

programming. Software availability is no longer a limiting factor

because the Mplus package provides a straightforward platform for

estimating a variety of selection models and pattern mixture mod-

els. This section describes a series of data analyses that apply the

MNAR models from earlier in the article. The Mplus 6 syntax files

for the analyses are available at www.appliedmissingdata.com/

papers.

The analysis examples use the psychiatric trial data from

Hedeker and Gibbons (1997, 2006).³ Briefly, the data were

collected as part of the National Institute of Mental Health

Schizophrenia Collaborative Study and consist of repeated mea-

surements from 437 individuals. In the original study, partici-

pants were assigned to one of four experimental conditions (a

placebo condition and three drug regimens), but the subsequent

analyses collapsed these categories into a dichotomous treat-

ment indicator (0 = placebo, 1 = drug). The primary substantive
goal was to assess treatment-related changes in illness

severity over time. The outcome was measured on a 7-point

scale, such that higher scores reflect greater severity (e.g., 1 =
normal, not at all ill; 7 = among the most extremely ill). Most
of the measurements were collected at baseline, Week 1, Week
3, and Week 6, but a small number of participants also had
measurements at Week 2, Week 4, or Week 5. To simplify the
presentation, I excluded these irregular observations from all
analyses. Finally, note that the discrete measurement scale
violates multivariate normality, by definition. Although these
data are still useful for illustration purposes, the normality
violation is likely problematic for the selection model analyses.

³ The data are used here with Hedeker's permission and are available at
his website: http://tigger.uic.edu/?he

The data set contains nine distinct missing data patterns that

represent a mixture of permanent attrition and intermittent miss-

ingness. The left column of Table 1 summarizes these patterns. To

provide some sense about the developmental trends, Figure 4

shows the observed means for each pattern by treatment condition.

The fitted trajectories in the figure suggest nonlinear growth. In

their analyses of the same data, Hedeker and Gibbons (1997, 2006)

linearize the trajectories by modeling illness severity as a function

of the square root of weeks. Although this decision is very sensi-

ble, I used a quadratic growth model for the subsequent analyses

because it provides an opportunity to illustrate the complexities

that arise with MNAR models, particularly pattern mixture models

with identifying restrictions. The analysis model is as follows:

Yti = γ0 + γ1(TIMEti) + γ2(TIMEti²) + γ3(DRUGi) + γ4(DRUGi)(TIMEti)
      + γ5(DRUGi)(TIMEti²) + b0i + b1i(TIMEti) + b2i(TIMEti²) + εti, (8)

where γ0, γ1, and γ2 define the average growth trajectory for the
placebo cases (i.e., DRUG = 0), and γ3, γ4, and γ5 capture the
mean differences between the treatment conditions.

In an intervention study, the usual goal is to assess treatment

group differences at the end of the study. Centering the temporal

predictor at the final assessment (e.g., by fixing the final slope

factor loading to a value of 0) addresses this question because the

regression of the intercept on treatment group membership quan-

tifies the mean difference. However, implementing identifying

restrictions in a pattern mixture model is made easier by centering

the intercept at the baseline assessment, particularly when perma-

nent attrition is the primary source of missingness. Consequently,

I fixed the linear slope factor loadings to values of 0, 1, 3, and 6

for all subsequent analyses (the quadratic factor loadings are the

squares of these values). Despite this parameterization, it is

straightforward to construct a test of the endpoint mean difference.

Algebraically manipulating the growth model parameters gives the

model-implied mean difference at the final assessment as follows:

μ̂Drug − μ̂Placebo = γ̂3 + 6(γ̂4) + 36(γ̂5), (9)

where the γ̂ terms are the regression coefficients that link the
growth factors to the treatment indicator (i.e., the latent mean
differences), 6 is the value of the linear factor loading at the final
assessment (i.e., the time score, weeks since baseline), and 36 is
the corresponding quadratic factor loading. Among other things,

the MODEL CONSTRAINT command in Mplus allows users to

define new parameters that are functions of the estimated param-

eters. In the subsequent analyses, I used this command to estimate

the mean difference in Equation 9 and its standard error.
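Equation 9 is easy to verify by hand from the reported coefficients, and the delta method (which MODEL CONSTRAINT applies internally) gives an approximate standard error for the linear combination. The sketch below uses the MAR-based coefficients reported in Table 2; the coefficient covariance matrix is hypothetical because the article does not report it, and the small discrepancy with the reported −1.424 reflects rounding in the published coefficients.

```python
import numpy as np

# Treatment-related coefficients from the MAR-based analysis (Table 2):
# intercept, linear, and quadratic differences.
g = np.array([-0.023, -0.481, 0.041])

# Equation 9: weights are 1, the final linear loading (6), and the final
# quadratic loading (36).
a = np.array([1.0, 6.0, 36.0])
diff = a @ g
print(round(diff, 3))  # -1.433; the article reports -1.424 from unrounded values

# Delta method for a linear combination: SE = sqrt(a' V a), where V is the
# covariance matrix of the three coefficients. The diagonal V below is
# hypothetical and ignores covariances, so it is illustration only.
V = np.diag([0.096, 0.095, 0.014]) ** 2
se = float(np.sqrt(a @ V @ a))
```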

MAR-Based Growth Curve Model

As a starting point, I used MAR-based maximum likelihood

missing data handling to estimate the quadratic growth curve

model. Figure 5 shows the path diagram for the analysis. Table 2

lists the estimates and the standard errors for selected parameters,

and Figure 6 displays the corresponding model-implied trajecto-

ries. The figure clearly suggests that participants in the drug

condition experienced greater reductions in illness severity relative

to the placebo group. However, it is important to emphasize that

these estimates assume that an individual’s propensity for missing

data at week t is completely determined by treatment group mem-

bership or by his or her severity score at earlier assessments (i.e.,

the missing values conform to an MAR mechanism). Substituting

the appropriate quantities from the maximum likelihood analysis

into Equation 9 gives a mean difference of −1.424 (SE = 0.182,
p < .001). Expressed relative to the model-implied estimate of the

baseline standard deviation (i.e., the square root of the sum of the

intercept variance and the residual variance), the standardized

mean difference is d = 1.563.
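The standardized effect follows directly: d is the endpoint difference divided by the model-implied baseline standard deviation (the square root of the sum of the intercept variance and the residual variance). The variance components below are hypothetical, since the article reports only the resulting d; they are chosen so the implied baseline SD of about 0.91 reproduces the reported ratio.

```python
import math

# Hypothetical variance components; the article reports only d = 1.563.
intercept_var, residual_var = 0.70, 0.13
baseline_sd = math.sqrt(intercept_var + residual_var)  # ~0.911

week6_diff = -1.424  # reported Week 6 mean difference
d = abs(week6_diff) / baseline_sd
print(round(d, 3))
```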

Diggle and Kenward’s (1994) Selection Models

Mplus is ideally suited for estimating selection models because

it can accommodate normally distributed (e.g., the repeated mea-

sures) and categorical outcomes (e.g., the missing data indicators)

in the same model. To illustrate, I fit two selection models of the

type developed by Diggle and Kenward (1994) to the psychiatric

trial data. The first analysis treated permanent attrition (Patterns

2–4) as MNAR and treated intermittent missingness (Patterns 5–9)

as MAR. As noted previously, missing data indicators that are

consistent with a discrete-time survival model are appropriate

when modeling permanent dropout. Normally, a set of three miss-

ing data indicators could represent the dropout patterns in Table 1,

but the small number of cases in Pattern 4 made it impossible to

model attrition at the second assessment. Consequently, the model

incorporated indicator variables at the final two waves with the

following coding scheme:

Table 1
Missing Data Patterns and Indicator Codes for Data
Analysis Examples

                 Repeated measure    Dropout code    Multinomial code
Pattern    n     Y1  Y2  Y3  Y4       R3   R4         R2   R3   R4
1        312     O   O   O   O         0    0          2    2    2
2         53     O   O   O   M         0    1          2    2    1
3         45     O   O   M   M         1   99          2    1   99
4          3     O   M   M   M         1   99          1   99   99
5          1     O   M   O   M         0    0          0    2    0
6         13     O   O   M   O         0    0          2    0    2
7          2     O   M   M   O         0    0          0    0    2
8          5     O   M   O   O         0    0          0    2    2
9          3     M   O   O   O         0    0          2    2    2

Note. O = observed; M = missing. For dropout codes, 0 = observed,
1 = dropout, and 99 = a missing value code. For multinomial coding, 0 =
intermittent missingness, 1 = dropout, and 2 = observed.


Rt =  0   observed or intermittent missingness
      1   dropout at time t
      99  dropout at previous time,

where 99 represents a missing value code. Of importance, assign-
ing a code of 0 to the intermittently missing values effectively
defines sporadic missingness (Patterns 5–9) as MAR. Finally, note
that the Pattern 4 cases had indicator codes of R3 = 1 and R4 = 99.
This treats the missing Y2 values as MAR and the missing Y3
values as MNAR dropout. The middle columns of Table 1
summarize the indicator codes for each missing data pattern.
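This discrete-time coding scheme is easy to automate. The function below is a sketch that maps a four-wave observed/missing pattern to the (R3, R4) codes in Table 1; following the article's handling of Pattern 4, a wave-2 dropout is absorbed into the wave-3 indicator.

```python
def dropout_codes(pattern):
    """Map a pattern like 'OOMM' to the (R3, R4) dropout indicators:
    0 = observed/intermittent, 1 = dropout at t, 99 = dropped out earlier."""
    if "M" not in pattern:
        return (0, 0)
    first_m = pattern.index("M")
    if "O" in pattern[first_m:]:      # a later observation -> intermittent (MAR)
        return (0, 0)
    d = max(first_m + 1, 3)           # dropout wave; wave-2 dropout coded at wave 3
    return tuple(0 if t < d else (1 if t == d else 99) for t in (3, 4))

# The patterns from Table 1:
print(dropout_codes("OOOO"))  # (0, 0)   Pattern 1
print(dropout_codes("OOOM"))  # (0, 1)   Pattern 2
print(dropout_codes("OOMM"))  # (1, 99)  Pattern 3
print(dropout_codes("OMMM"))  # (1, 99)  Pattern 4
print(dropout_codes("OMOM"))  # (0, 0)   intermittent, treated as MAR
```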

Figure 7 shows a path diagram of Diggle and Kenward’s (1994)

selection model. The different types of dashed arrows represent

equality constraints on the regression coefficients in the logistic

part of the model (e.g., the regression of R4 on Y4 is set as equal
to the regression of R3 on Y3). Describing the specification of a
discrete-time survival model is beyond the scope of this article, but
readers who are interested in the rationale behind these constraints
can consult Singer and Willett (2003) and B. Muthén and Masyn

(2005), among others.

Table 3 gives selected parameter estimates and standard errors

from the analysis. The model-implied growth trajectories were

quite similar to those in Figure 6, although the selection model

produced a larger mean difference between the treatment condi-

tions at Week 6. Specifically, substituting the appropriate estimates

into Equation 9 yields a model-implied mean difference of −1.665
(SE = 0.198, p < .001) at the final assessment. Expressed relative

to the model-implied estimate of the baseline standard deviation,

this mean difference corresponds to a standardized effect size of

d = 1.810. Notice that the selection model produced the same

substantive conclusion as the maximum likelihood analysis (i.e.,

the drug condition experienced greater reductions in illness sever-

ity), albeit with a larger effect size. Again, the normality violation

should cast doubt on the validity of the selection model estimates.

Turning to the logistic portion of the model, the regression

coefficients quantify the influence of treatment group membership

and the repeated measures variables on the hazard probabilities

(i.e., the conditional probability of dropout, given participation at

the previous assessment). For example, the significant positive

association between Rt and Yt suggests that participants with
higher illness severity scores at wave t were more likely to drop

out, even after controlling for treatment group membership and

scores from the previous assessment. Although the accuracy of this

coefficient depends on untenable distributional assumptions, it

does provide some evidence for an MNAR mechanism. It is

important to note that estimating the model from 100 random

starting values produced two sets of solutions with different logis-

tic regression coefficients (the log likelihood values were
−2,565.814 and −2,573.115). In the second solution, the
association between Rt and Yt switched signs, such that cases with lower

illness severity scores were more likely to drop out. It is unclear

Figure 4. Observed means and fitted trajectories for each of the nine
missing data patterns in the psychiatric trial data. The shaded circles denote
the drug condition means, and the clear circles represent the placebo group
means.


whether this sensitivity to different starting values is a symptom of

model misspecification (e.g., the logistic portion of the model

omits an important predictor of missingness) or normality viola-

tion. Because these models are weakly identified to begin with, a

quadratic model may be too complex, although a linear model

showed similar instability. Regardless of the underlying cause, this

finding underscores the importance of using random starting val-

ues when estimating these models.
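A small sketch of the logistic part helps make the hazard interpretation concrete. The slopes below are the Table 3 estimates; the intercept is hypothetical because the article does not report one.

```python
import math

def hazard(y_t, y_prev, drug, intercept=-3.0):
    """Conditional dropout probability at wave t, given participation at
    the previous wave. Slopes are the Table 3 estimates; the intercept
    is a hypothetical value for illustration."""
    eta = intercept + 2.266 * y_t - 1.749 * y_prev + 0.347 * drug
    return 1.0 / (1.0 + math.exp(-eta))

# Higher current severity -> higher dropout hazard, holding the rest fixed.
low = hazard(y_t=3.0, y_prev=4.0, drug=1)
high = hazard(y_t=6.0, y_prev=4.0, drug=1)
print(low < high)  # True
```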

The previous analysis treated intermittent missing values as

MAR. As an alternative, creating indicators that are consistent

with a multinomial logistic regression can distinguish between

intermittent and permanent missing values (Albert & Follmann,

2009; Albert, Follmann, Wang, & Suh, 2002). The following

coding scheme is one such example:

Rt =  0   intermittent missingness at time t
      1   dropout at time t
      2   observed at time t
      99  dropout at an earlier time,

where 99 is a missing value code. By default, Mplus treats the

highest nonmissing category (e.g., 2) in a multinomial logistic

regression as the reference group. Therefore, assigning the highest

code to the observed values yields logistic regression coefficients

that quantify the probability of each type of missingness relative to

complete data. After minor alterations to accommodate sparse

missing data patterns, I estimated Diggle and Kenward’s (1994)

model under this alternate coding scheme. The right columns of

Table 1 summarize the indicator coding for the analysis. The

model with multinomial indicators produced mean difference and

Figure 5. Quadratic growth model for the psychiatric data. Note that the
figure omits the latent variable intercepts and the residual covariances
among the latent variables to reduce visual clutter. b0i, b1i, and b2i =
individual growth components; Y1–Y4 = outcome variables; ε1–ε4 =
time-specific residuals.

Figure 6. Model-implied growth trajectories from the MAR-based
maximum likelihood analysis. MAR = missing at random.

Figure 7. Diggle and Kenward's (1994) quadratic growth model for the
psychiatric data. Note that the figure omits the latent variable intercepts and
the residual covariances among the latent variables to reduce visual clutter.
b0i, b1i, and b2i = individual growth components; Y1–Y4 = outcome
variables; ε1–ε4 = time-specific residuals; R3 and R4 = missing data
indicators.

Table 2
MAR-Based Maximum Likelihood Estimates

Parameter               Estimate     SE        p
Placebo intercept          5.293    0.083    <.001
Placebo linear            −0.226    0.083     .001
Placebo quadratic          0.013    0.013     .210
Intercept difference      −0.023    0.096     .811
Linear difference         −0.481    0.095    <.001
Quadratic difference       0.041    0.014     .001
Week 6 difference         −1.424    0.181    <.001

Note. MAR = missing at random.


effect size estimates that were quite similar to those of the previous

Diggle and Kenward model. The logistic portion of the model was

also comparable. The similarity of the two coding schemes sug-

gests that treating intermittent missing values as MAR had very

little impact on the final estimates, perhaps because permanent

attrition accounts for the vast majority of the missing data.
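The multinomial scheme can be sketched the same way. The function below is a hypothetical implementation that maps a four-wave pattern to the (R2, R3, R4) codes: permanent dropout gets a 1 at the first missed wave and 99 thereafter, intermittent gaps get 0, and observed waves get 2. The minor alterations the article made for sparse patterns are not reproduced here.

```python
def multinomial_codes(pattern):
    """Map a pattern like 'OOMM' to (R2, R3, R4): 2 = observed,
    1 = dropout at t, 0 = intermittent missingness, 99 = dropped out earlier."""
    first_m = pattern.find("M")
    monotone = first_m != -1 and "O" not in pattern[first_m:]
    codes = []
    for i in range(1, 4):              # waves 2-4 (0-based index i)
        if pattern[i] == "O":
            codes.append(2)
        elif not monotone:             # a later observation exists somewhere
            codes.append(0)
        elif i == first_m:             # first wave of permanent dropout
            codes.append(1)
        else:
            codes.append(99)
    return tuple(codes)

print(multinomial_codes("OOOO"))  # (2, 2, 2)
print(multinomial_codes("OOOM"))  # (2, 2, 1)
print(multinomial_codes("OOMM"))  # (2, 1, 99)
print(multinomial_codes("OMOM"))  # (0, 2, 0)
```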

Wu and Carroll’s (1988) Selection Model

In Diggle and Kenward’s (1994) models, the probability of

missing data was directly related to the repeated measures vari-

ables. In contrast, Wu and Carroll’s (1988) selection model uses

individual intercepts and slopes as predictors of missingness. Al-

though it is possible to apply the previous missing data indicator

codes to Wu and Carroll’s models, only the model with discrete-

time survival indicators converged to a proper solution. Conse-

quently, I limit the subsequent discussion to an analysis that treated

permanent attrition (Patterns 2–4) as MNAR and treated intermit-

tent missingness (Patterns 5–9) as MAR. The missing data indi-

cators were identical to the discrete-time coding scheme in the

middle columns of Table 1 (i.e., 0 = observed or intermittent
missingness, 1 = dropout at time t, 99 = dropout at a previous

time). An initial analysis failed to converge because the latent

variable covariance matrix was not positive definite. Constraining

the quadratic factor variance to zero eliminated this problem and

produced plausible parameter estimates. Because of this modifi-

cation, the final model used treatment group membership and the

individual intercepts and linear slopes to predict attrition. Figure 8

shows a path diagram of the final model. As before, different types

of dashed arrows represent equality constraints on the regression

coefficients in the logistic part of the model.

It is important to note that estimating the model from 100

random starting values produced 85 convergence failures, even

after eliminating the quadratic variance from the model. The 15

sets of starts that successfully converged produced comparable log

likelihood values but slightly different parameter estimates. Sim-

plifying the model by examining change as a linear function of the

square root of time reduced this problem and produced sets of

solutions with identical estimates and identical log likelihood

values. This finding suggests that a quadratic model is too complex

for these data, but it could also be the case that model misspeci-

fication or normality violations contributed to the convergence

failures. For illustration purposes, I report the quadratic model

estimates from the solution with the highest log likelihood, but

these results should be viewed with caution.

Table 4 gives selected parameter estimates and standard

errors from Wu and Carroll’s (1988) selection model. Wu and

Carroll’s model produced a smaller effect size than Diggle and

Kenward’s (1994) selection model. Specifically, substituting

the appropriate estimates into Equation 9 yields a mean
difference of −1.363 (SE = 0.183, p < .001) at the final assessment
and a standardized effect size of d = 1.576. Turning to the

logistic portion of the model, the regression coefficients quan-

tify the influence of the individual intercepts and linear slopes

on the hazard probability. Because the time scores are centered

at the baseline assessment, the linear slope represents instanta-

Table 3
Diggle and Kenward's (1994) Selection Model Estimates With
Missing Not at Random Dropout

Parameter               Estimate     SE        p
Placebo intercept          5.259    0.081    <.001
Placebo linear            −0.137    0.071     .052
Placebo quadratic          0.014    0.011     .199
Intercept difference      −0.011    0.094     .905
Linear difference         −0.509    0.087    <.001
Quadratic difference       0.039    0.014     .004
Week 6 difference         −1.665    0.198    <.001
Yt → Rt                    2.266    0.531    <.001
Yt−1 → Rt                 −1.749    0.388    <.001
Treatment → Rt             0.347    0.358     .333

Figure 8. Wu and Carroll's (1988) quadratic growth model for the
psychiatric data. Note that the figure omits the latent variable intercepts and
the residual covariances among the latent variables to reduce visual clutter.
R3 and R4 = missing data indicators; b0i, b1i, and b2i = individual growth
components; Y1–Y4 = outcome variables; ε1–ε4 = time-specific residuals.

Table 4
Wu and Carroll's (1988) Selection Model Estimates With
Missing Not at Random Dropout

Parameter               Estimate     SE        p
Placebo intercept          5.274    0.080    <.001
Placebo linear            −0.199    0.051    <.001
Placebo quadratic          0.002    0.009     .791
Intercept difference      −0.05     0.094     .599
Linear difference         −0.435    0.068    <.001
Quadratic difference       0.036    0.011     .001
Week 6 difference         −1.363    0.183    <.001
Intercepts → Rt            0.482    0.437     .271
Linear slopes → Rt        −5.825    2.681     .030
Treatment → Rt            −3.458    1.622     .033


neous change at the beginning of the study. Consequently, the

negative coefficient for the regression of Rton the linear growth

factor suggests that participants who experienced immediate

reductions in illness severity were most likely to drop out, even

after controlling for initial severity level (i.e., the intercept) and

treatment group membership.

Overview of the Pattern Mixture Models

Hedeker and Gibbons (1997, 2006) illustrated a pattern mix-

ture modeling approach that uses the missing data pattern

(represented by one or more dummy variables) as a predictor in

the growth model. This method is advantageous because stan-

dard mixed modeling procedures (e.g., the MIXED procedures

in SPSS and SAS) can estimate the model. Mplus offers finite

mixture modeling options (B. Muthén & Shedden, 1999) that

are ideally suited for implementing a variety of other pattern

mixture models that are difficult or impossible to estimate with

standard software (e.g., pattern mixture models with identifying

restrictions). Because Hedeker and Gibbons thoroughly de-

scribed the use of pattern indicators as predictors of growth, I

limit the subsequent examples to pattern mixture models with

identifying restrictions. Interested readers can consult B.
Muthén et al. (2011) for other interesting variations of the

pattern mixture model.

Within the Mplus finite mixture modeling framework, each

missing data pattern functions as a pseudolatent class. In the

conventional pattern mixture model, these classes simply reflect

a manifest grouping variable that is derived from the observed

missing data patterns. For example, in a simple model, the

complete cases could form one class, and the cases with one or

more missing values could form a second class. The KNOWN-

CLASS subcommand in Mplus uses a grouping variable from

the input data set to assign cases to classes with a probability of

zero or one. Although the pattern mixture models in this section

are effectively multiple group growth models, the finite mixture

modeling framework provides a convenient mechanism for

implementing various identifying restrictions (a multiple group

model does not allow the user to specify equality constraints for

inestimable parameters). Roy (2003) and B. Muthén et al.
(2011) described modeling variations that treat class membership
as a true latent variable.

Returning to the psychiatric trial data, Figure 4 shows the

observed means and the fitted trajectories for each of the nine

missing data patterns. With a small number of patterns and a

sufficiently large sample size, it would be possible to define

each pattern as a distinct class, but the number of cases in

Patterns 4 through 9 precludes this option. To simplify the

models, I reduced the number of classes by aggregating patterns

with comparable trajectory shapes. Considering the first three

patterns, there appears to be a relationship between dropout

time and the rate of initial decline, such that rapid improvement

is associated with earlier dropout, at least in the drug condition.

Consequently, it is reasonable to treat the first three patterns as

distinct classes. Although the decision was somewhat arbitrary,

I combined Patterns 3 and 4 because these groups were com-

parable with respect to the timing of dropout. Next, consider the

cases with intermittent missingness (Patterns 5–9). Although it

is reasonable to treat these patterns as a distinct group, the

trajectory shapes roughly resemble the growth curves for the

complete cases. Because the Bayesian information criterion

values from a series of preliminary analyses clearly favored a

model that combined Patterns 5 through 9 with Pattern 1, the

final set of pattern mixture models used three classes: (a) cases

with complete data and intermittent missing values, (b) cases

that dropped out after the third assessment, and (c) cases that

dropped out after the first or the second assessment.

Recall that pattern mixture models are inherently underiden-

tified because they typically involve one or more inestimable

parameters. With respect to the mean structure, Classes 1 and 2

have sufficient data to estimate a quadratic trend, but the

quadratic intercept and the regression of the quadratic growth

factor on the treatment group indicator are inestimable for Class

3. The subsequent models used one of three identifying restric-

tions to achieve identification. The complete case missing vari-

able restriction equated the inestimable quadratic parameters to

those of Class 1 (complete data and intermittent missingness).

The second model implemented the neighboring case missing

variable restriction by replacing the inestimable parameters

with those of Class 2 (dropout after the third assessment). The

final model used the available case missing variable restriction

and equated the quadratic parameters for Class 3 to the

weighted average of the estimates from Classes 1 and 2. In

Mplus, specifying between-class equality constraints (e.g., us-

ing the MODEL CONSTRAINT command) implements these

restrictions. Although the same identification strategies are ap-

plicable to the covariance structure, the subsequent models

assumed a common covariance matrix for the three classes.

Complete Case Missing Variable Restriction

Recall that the pattern mixture model produces unique param-

eter estimates for each class (i.e., estimates that are conditional on

the missing data pattern). Although the substantive goal is to

generate a single set of estimates that averages across the distri-

bution of missing data, it is important to inspect the class-specific

results. To better illustrate the estimates, Figure 9A shows the

model-implied growth curves for each class. Notice that the fitted

trajectories for the Class 3 drug condition and the Class 2 placebo

condition fall outside the plausible score range. For Class 3, the

identifying restriction clearly underestimated the degree of curva-

ture. For Class 2, the mean structure was identified, but attrition at

the final assessment produced an inaccurate extrapolation. After

some experimentation, changing the constrained value of the Class

3 regression coefficient from .021 to .080 produced a reasonable

trajectory that stayed within bounds. Similarly, constraining the

Class 2 intercept to a value of .070 or lower returned plausible

estimates.

At first glance, it may seem unreasonable to arbitrarily change

parameter values. However, it is important to remember that the

identifying constraints essentially represent assumptions about tra-

jectory shapes that would have been observed if the data had been

complete. Because the growth curves in Figure 9A clearly repre-

sent incorrect predictions about the unobserved data points, it is

difficult to defend a set of marginal estimates that average across

the missing data patterns. Consequently, model modification

seems necessary in this case. In truth, the identifying restrictions

are nothing more than arbitrary rules of thumb for generating


plausible parameter values, so viewing the restrictions as tentative

starting points for estimation and altering them as needed is a

sensible strategy.

After implementing new parameter constraints, the model pro-

duced plausible class-specific estimates. The top section of Table

5 gives the updated estimates, and Figure 9B displays the corre-

sponding model-implied trajectories. Computing the weighted

mean of the class-specific values yields an estimate of the popu-

lation growth trajectory that averages over the distribution of

missingness. For these analyses, the population estimate is as

follows:

θ̄ = π̂(1)θ̂(1) + π̂(2)θ̂(2) + π̂(3)θ̂(3), (10)

where the numeric superscript denotes the missing data pattern,
π̂(p) is the proportion of cases in missing data class p, and θ̂(p) is
the class-specific estimate. Because the averaging process is
identical for all estimates, θ̂ generically denotes a model parameter.
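Equation 10 is a probability-weighted average. The sketch below uses the class sizes implied by Table 1 (336, 53, and 48 of the 437 cases) together with hypothetical class-specific estimates; the article reports only the averaged values.

```python
n = {"class1": 336, "class2": 53, "class3": 48}
total = sum(n.values())             # 437
pi = {c: n[c] / total for c in n}   # estimated class proportions

# Hypothetical class-specific estimates of a parameter (e.g., the Week 6
# mean difference for each missing data class).
theta = {"class1": -1.90, "class2": -1.70, "class3": -1.45}

# Equation 10: weight each class estimate by its proportion and sum.
marginal = sum(pi[c] * theta[c] for c in n)
print(round(marginal, 3))  # -1.826
```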

Table 6 gives the average estimates and the standard errors

for selected parameters. The trajectory shapes from the pattern

mixture model resemble those from the previous analyses, but

the mean difference at the final assessment is somewhat larger.

Specifically, using the class-specific coefficients to construct a

mean difference for each missing data pattern and computing

the weighted average of these estimates gives a difference of
−1.827 (SE = 0.374, p < .001) at the final assessment. Expressed
relative to the model-implied estimate of the baseline

standard deviation, this mean difference equates to a standard-

ized effect size of d = 2.019. Because the marginal estimates

(i.e., the result of Equation 10) are a function of the model

parameters, a pattern mixture analysis does not automatically

produce standard errors. Consequently, it is necessary to use the

multivariate delta method to derive an approximate standard

error (Hedeker & Gibbons, 1997; Hogan & Laird, 1997). For-

tunately, the Mplus MODEL CONSTRAINT command can

generate the average estimates and their standard errors, so

further computations are unnecessary. Descriptions of the mul-

tivariate delta method are available elsewhere in the literature

(MacKinnon, 2008; Raykov & Marcoulides, 2004), and Enders

Figure 9. Class-specific model-implied growth trajectories. The complete
case missing variable restriction generated panels A and B, the neighboring
case missing variable restriction produced panels C and D, and the
available case missing variable restriction produced panels E and F. Panels A,
C, and E are implausible trajectories from initial analyses.


(2010) sketches the computational details for various identify-

ing restrictions.
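As a rough sketch of the delta-method logic: treating the class proportions as fixed and the class-specific estimates as independent (two simplifying assumptions that the full multivariate delta method avoids), the variance of the weighted average in Equation 10 is the proportion-squared weighted sum of the class-specific sampling variances. The class SEs below are hypothetical.

```python
import math

pi = [336 / 437, 53 / 437, 48 / 437]   # class proportions from Table 1
se = [0.30, 0.55, 0.90]                # hypothetical class-specific SEs

# Var(sum of pi_k * theta_k) = sum of pi_k^2 * SE_k^2 under the simplifying
# assumptions above; the full delta method also propagates uncertainty in
# the estimated proportions and any covariances among the estimates.
var = sum(p * p * s * s for p, s in zip(pi, se))
print(round(math.sqrt(var), 3))  # about 0.26 with these values
```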

Neighboring Case Missing Variable Restriction

The second pattern mixture model analysis used the neighboring case missing variable restriction to equate the inestimable quadratic parameters for Class 3 (dropout after the second assessment) to the estimates from Class 2 (dropout after the third assessment). Consistent with the complete case restriction, the initial estimates produced fitted trajectories that fell outside the plausible score range. Figure 9C shows the model-implied growth trajectories from the initial analysis. Some experimentation revealed that constraining the quadratic intercept for Pattern 2 (and, by extension, the quadratic intercept for Pattern 3) to a value of .070 or lower produced growth curves that stayed within bounds. The middle portion of Table 5 lists the class-specific estimates from the revised model, and Figure 9D displays the corresponding trajectories.

Table 6 gives the average estimates and the standard errors from the neighboring case missing variable restriction. The model-implied mean difference is −1.957 (SE = 0.522, p < .001), and the corresponding effect size is d = 2.163. The effect size difference is largely due to the elevated growth trajectory for the placebo condition in Class 3. Again, it is important to reiterate that the differences between the two models result from applying different sets of assumptions about the unobserved data. There is no way to empirically assess the accuracy of competing estimates.

Available Case Missing Variable Restriction

The final analysis implemented the available case missing variable restriction. Recall that this approach achieves identification by equating an inestimable parameter to the weighted average of the estimates from other patterns. Applied to the current example, the available case restriction replaced the quadratic intercept for Class 3 (dropout after the second assessment) with the weighted average of the intercept estimates from the first two classes. The weight for Class 1 was 336/389 = .864, and the weight for Class 2 was 53/389 = .136. Consistent with the previous analyses, the initial model produced trajectories that fell outside the plausible score range (see Figure 9E). Because the available case restriction applied the largest weight to the estimates from the complete cases, the growth curves in Figure 9E closely resemble those from the complete case missing variable restriction in Figure 9A. Changing Class 1's contribution to the inestimable regression coefficient from .021 to .080 and constraining the quadratic intercept for Class 2 to a value of .070 or lower produced plausible growth curves. Notice that these modifications are the same as those from the previous analysis. The bottom section of Table 5 gives the class-specific estimates from the revised model, and Figure 9F displays the corresponding trajectories.
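
As a numerical check on this identifying restriction, a short sketch reproduces the Class 3 donor values reported in the bottom panel of Table 5. The use of the modified .080 contribution for the quadratic difference (rather than Class 1's own .021 estimate) is inferred from the tabled values and the prose above, so treat the mapping as an assumption.

```python
# Available case missing variable (ACMV) restriction: Class 3's inestimable
# quadratic coefficients are replaced by the weighted average of the Class 1
# and Class 2 estimates, with weights proportional to class size.

n1, n2 = 336, 53
w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)      # .864 and .136

# Donor values from Table 5; Class 1 contributes its modified .080 value
# to the quadratic difference, per the revised model described in the text.
quad_intercept = w1 * 0.021 + w2 * 0.070
quad_difference = w1 * 0.080 + w2 * 0.040

print(round(quad_intercept, 3), round(quad_difference, 3))   # 0.028 0.075
```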

Table 6 displays the population estimates and their standard errors. It is, perhaps, not surprising that the available case restriction produced estimates that were virtually identical to those of the complete case missing variable restriction. The similarity is owed to the fact that complete cases primarily determined the values of the inestimable parameters (the weight for this group was .864, compared with .136 for Class 2). The mean difference and standardized effect size values from the analysis (−1.845 and 2.038, respectively) were also virtually identical to those of the complete case restriction (−1.827 and 2.019, respectively).

Analysis Summary

The preceding analysis examples applied seven different models—and thus seven sets of assumptions—to the psychiatric trial data. Although the analyses produced the same substantive conclusion (i.e., the drug group exhibited dramatic improvement relative to the placebo group), the standardized effect size estimates had a range of nearly seven tenths of a standard deviation unit. Because the models applied different assumptions, this variation might not come as a surprise. Nevertheless, the fluctuation in the effect size estimates is disconcerting. Had the intervention effect not been so dramatic, it could have easily been the case that the models produced conflicting evidence about the efficacy of the drug condition. Unfortunately, it is relatively common for sensitivity analyses to produce discrepant estimates (Demirtas & Schafer, 2003; Foster & Fang, 2004). The next section offers some practical advice on model selection.

Table 6
Pattern Mixture Model Estimates Averaged Across Missing Data Patterns

                             CCMV              NCMV              ACMV
Parameter              Estimate    SE    Estimate    SE    Estimate    SE
Placebo intercept         5.253  0.079      5.253  0.079      5.253  0.079
Placebo linear           −0.271  0.069     −0.276  0.068     −0.271  0.069
Placebo quadratic         0.027  0.010      0.033  0.008      0.028  0.009
Intercept difference      0.027  0.092      0.027  0.092      0.026  0.092
Linear difference        −0.488  0.093     −0.484  0.096     −0.487  0.093
Quadratic difference      0.030  0.014      0.026  0.020      0.029  0.014
Week 6 difference        −1.827  0.374     −1.957  0.522     −1.845  0.388

Note. CCMV = complete case missing variable restriction; NCMV = neighboring case missing variable restriction; ACMV = available case missing variable restriction.

Table 5
Class-Specific Estimates From Pattern Mixture Models

                               Class 1      Class 2      Class 3
Parameter                      (n = 336)    (n = 53)     (n = 48)

Complete case identifying restriction
  Placebo intercept               5.154       5.456        5.722
  Placebo linear                 −0.278      −0.276       −0.216
  Placebo quadratic               0.021       0.070        0.021
  Intercept difference            0.125      −0.250       −0.356
  Linear difference              −0.342      −0.878       −1.076
  Quadratic difference            0.021       0.040        0.080

Neighboring case identifying restriction
  Placebo intercept               5.154       5.456        5.722
  Placebo linear                 −0.278      −0.276       −0.264
  Placebo quadratic               0.021       0.070        0.070
  Intercept difference            0.125      −0.250       −0.356
  Linear difference              −0.342      −0.878       −1.037
  Quadratic difference            0.021       0.040        0.040

Available case identifying restriction
  Placebo intercept               5.154       5.456        5.722
  Placebo linear                 −0.278      −0.276       −0.222
  Placebo quadratic               0.021       0.070        0.028
  Intercept difference            0.125      −0.250       −0.356
  Linear difference              −0.342      −0.878       −1.071
  Quadratic difference            0.021       0.040        0.075

Note. Italic typeface denotes donor estimates for Class 3. Bold typeface denotes constrained parameters.

Choosing Among Competing Models

MNAR modeling is an active area of methodological research, and the procedures in this article represent just a few possible options. Given the wide array of analytic choices, model selection becomes an important practical consideration; this is particularly true when different models produce disparate estimates, as they do in the preceding examples. Although this may be disconcerting, it is impossible to provide general recommendations about model selection, because every analytic option—MAR or MNAR—relies on one or more untestable assumptions. Although an MAR and an MNAR model may produce identical fit to the observed data, they make fundamentally different predictions about the unobserved score values (Molenberghs & Kenward, 2007). Because there is no way to empirically assess the validity of these predictions, model selection is not about choosing a single correct model. Rather, researchers must choose the model with the most defensible set of assumptions and construct a logical argument that defends that choice. In some situations, it is possible to discount certain models a priori (e.g., the preceding selection model analyses are suspect because of the normality violations). In other situations, substantive considerations may lead researchers to prefer one model over the other. This section outlines a few such considerations.

To begin, consider the selection modeling framework. Although Wu and Carroll's (1988) and Diggle and Kenward's (1994) models have commonalities, study-specific features may influence model selection. To illustrate, consider two hypothetical research scenarios. First, suppose that a psychologist is studying quality of life in a clinical trial for a new cancer medication and finds that a number of patients become so ill (i.e., their quality of life becomes so poor) that they can no longer participate in the study. In this situation, it is reasonable to believe that attrition is related to one's developmental trajectory, such that patients with rapidly decreasing quality of life scores are most likely to leave the study because they die or become too ill to participate. To the extent that this assumption is correct, Wu and Carroll's model may be preferred because the developmental trajectories—as opposed to single realizations of the quality of life measure—are probable determinants of missingness. Methodologists have also suggested that the random coefficient model is well suited for situations where the outcome measure is highly variable over time (Albert & Follmann, 2009) or is an unreliable indicator of an underlying latent construct (Little, 1995).

As a second example, consider a drug treatment study that tracks substance use in the weeks following an intervention. In this situation, it seems plausible that attrition is related to the actual outcome at time t, such that participants who use drugs prior to an assessment fail to show up because they will screen positive for substance use. Diggle and Kenward's (1994) model may be most appropriate for this scenario because the outcome scores at a particular time point—as opposed to the developmental trends—are likely to determine missingness. Although the substantive research problem may favor one selection model over the other, it is important to reiterate that the data provide no basis for empirically comparing the two models. Consequently, conducting a sensitivity analysis that fits both models to the same data is usually a good strategy.
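
A minimal simulation, with purely hypothetical values, illustrates the kind of outcome-dependent missingness that Diggle and Kenward's model targets: when the probability of dropout at time t is a logistic function of the time t score itself, the means of the observed scores are systematically biased relative to the complete data.

```python
# Outcome-dependent (MNAR) dropout at a single wave: higher scores are
# more likely to be missing, so the observed mean understates the truth.
# All parameter values here are hypothetical, chosen only to make the
# selection effect visible.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
y = rng.normal(0.0, 1.0, size=n)            # outcome at some wave t

# Logistic missingness model: P(missing) increases with the wave t score
p_drop = 1.0 / (1.0 + np.exp(-2.0 * y))
observed = rng.random(n) > p_drop           # True = score is observed

full_mean = y.mean()                        # what complete data would show
observed_mean = y[observed].mean()          # what the analyst actually sees
print(round(full_mean, 2), round(observed_mean, 2))
```

Because the retained cases are a selected subset, the observed mean falls well below the complete-data mean, which is the bias an MAR-based analysis cannot correct when no other variable carries the missingness information.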

Substantive and practical considerations also come into play with pattern mixture models. The idea of estimating developmental trajectories separately for each missing data pattern is intuitively appealing, particularly for researchers who are familiar with multiple group structural equation models. In some situations, the class-specific estimates can provide additional insight into one's substantive hypotheses. For example, in an intervention study, it may be interesting to examine the response to treatment within each dropout class in addition to estimating a marginal treatment effect that averages over missing data patterns. Although the previous analysis examples did not illustrate this possibility, the pattern mixture model can incorporate predictors of dropout class membership. This too can provide useful substantive information (e.g., by identifying factors that are related to dropout or to a particular developmental trajectory). One of the pattern mixture model's often-cited advantages is that it forces researchers to explicitly state their assumptions in the form of values for the inestimable parameters. The identifying restrictions that I implemented in the earlier analysis examples are just a few possibilities, and experimenting with different options is quite easy in Mplus. The ability to identify the members of each missing data pattern is potentially useful in this regard. For example, if the members of a particular dropout group share a common set of characteristics (e.g., in a school-based study, the early dropout class has a high proportion of learning disabled children), it might be possible to use previous research or substantive knowledge to formulate reasonable predictions for the inestimable parameters. The flexibility of the pattern mixture model makes it a highly useful tool for conducting sensitivity analyses.

Sensitivity Analyses

In the missing data literature, a common viewpoint is that researchers should explore the stability of their substantive conclusions by fitting alternate models to the same data. I previously illustrated this procedure by fitting seven different models to the psychiatric trial data. Exploring alternate models is just one form of sensitivity analysis, and methodologists have outlined many other procedures. Although it is impossible to briefly summarize the broad range of viewpoints and analytic approaches from the sensitivity analysis literature, it is nevertheless important to raise awareness of this topic. Molenberghs and colleagues (Molenberghs & Kenward, 2007; Molenberghs, Verbeke, & Kenward, 2009) provided a detailed discussion of these procedures, and this section summarizes a few of their key points.

Within a given modeling framework, it is useful to explore the sensitivity of key parameter estimates to various model modifications. As an example, consider the selection modeling framework. Both Diggle and Kenward's (1994) and Wu and Carroll's (1988) models are sensitive to minor violations of distributional assumptions; the former assumes that the repeated measures variables are multivariate normal, and the latter assumes that the random effects (i.e., the individual intercepts and slopes) are normal. Examining the change in key parameter estimates after modifying distributional assumptions is an important type of sensitivity analysis. Although it is not the only method for doing so, finite mixture modeling (e.g., growth mixtures) is a useful tool for representing nonnormal manifest variables as well as nonnormal random effects (McLachlan & Peel, 2000; B. Muthén & Asparouhov, 2009). In the context of MNAR analyses, methodologists have outlined latent class versions of selection models of the types developed by Diggle and Kenward and Wu and Carroll (Beunckens et al., 2008; B. Muthén et al., 2011) that are readily estimable with Mplus. Similar strategies are available for pattern mixture models (B. Muthén et al., 2011; Roy, 2003; Roy & Daniels, 2008).

Modifying the growth model's covariance structure is a second option for exploring sensitivity within a given modeling framework. Conventional wisdom suggests that modifying the covariance structure has little to no impact on average growth rate estimates (Singer & Willett, 2003). In large part, this is because the mean and the covariance structure are independent in a complete-data maximum likelihood analysis (i.e., the off-diagonal elements in the parameter covariance matrix equal zero). Because this independence is lost with missing data, modifying the covariance structure (e.g., estimating class-specific variance components; estimating residual covariances; introducing an alternate covariance structure) can potentially alter the latent variable means; Molenberghs et al. (2009) gave an example that illustrates this point. Although it is unclear whether these modifications materially affect the performance of MNAR models, they are nevertheless easy to implement.
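
The complete-data independence claim can be illustrated with a small simulation: across repeated complete normal samples, the maximum likelihood estimates of the mean and the variance are uncorrelated. The sample size, replication count, and population values below are arbitrary choices for illustration.

```python
# Complete-data ML: mean and variance estimates are independent for normal
# data, so their estimates should be uncorrelated across replications.
import numpy as np

rng = np.random.default_rng(0)
reps, n = 2000, 50
samples = rng.normal(loc=5.0, scale=2.0, size=(reps, n))

means = samples.mean(axis=1)        # ML mean estimate, one per replication
variances = samples.var(axis=1)     # ML variance estimate (ddof = 0)

r = np.corrcoef(means, variances)[0, 1]
print(round(r, 3))                  # near zero: the estimates are unrelated
```

With incomplete data this orthogonality disappears, which is why covariance-structure modifications can shift the latent variable means in the analyses described above.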

Finally, methodologists have developed local influence statistics that attempt to identify cases that unduly impact the parameters of the missingness model (e.g., the logistic regressions from Diggle & Kenward's, 1994, model) or the substantive model. These statistics are conceptually similar to familiar measures from the ordinary least squares regression literature (e.g., Cook's D). Although these influence statistics do not necessarily identify respondents with an MNAR missingness mechanism, they can provide important insight into the behavior of a model. For example, there is evidence to suggest that a complete case with an anomalous score profile can influence estimates in a way that gives credence to an MNAR mechanism (Jansen et al., 2006; Kenward, 1998). Interested readers can consult various work by Molenberghs and colleagues for a detailed overview of local influence measures for missing data analyses (Jansen et al., 2006; Molenberghs & Kenward, 2007; Molenberghs et al., 2009).
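
For readers unfamiliar with the regression analogue, the following sketch computes Cook's D on a simulated data set with one planted influential case; the local influence statistics extend this same deletion-based logic to the missingness and substantive models. The data and the planted outlier are hypothetical.

```python
# Cook's D: a case-deletion influence diagnostic for OLS regression.
# D_i measures how much the fitted values shift when case i is removed.
import numpy as np

rng = np.random.default_rng(42)
n = 50
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)
x[0], y[0] = 4.0, -10.0                   # planted influential case

X = np.column_stack([np.ones(n), x])      # design matrix with intercept
p = X.shape[1]

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages (hat values)
s2 = resid @ resid / (n - p)                    # residual variance

# Cook's D combines the squared residual with the case's leverage
D = (resid**2 / (p * s2)) * h / (1.0 - h)**2

print(int(np.argmax(D)))                  # 0: the planted case dominates
```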

Discussion

Methodologists have long advocated for the use of MAR-based missing data handling procedures. The MAR assumption is often very reasonable, but there are many situations where missingness is related to the outcome variable itself. This so-called MNAR mechanism is problematic because MAR-based procedures produce biased estimates. MNAR analysis models have received considerable attention in the biostatistics literature, particularly in the context of longitudinal data. Although some of these models have been in the literature for many years, they have been slow to migrate to the social and the behavioral sciences. The purpose of this article is to describe two classic MNAR modeling frameworks: the selection model and the pattern mixture model. The commonality among MNAR models is that they integrate a submodel that describes the propensity for missing data into the analysis. The selection model augments the growth curve analysis with a set of logistic regressions that describe the probability of missing data at each occasion. The pattern mixture approach estimates the growth model separately within each missing data pattern and subsequently averages over the missing data patterns.

The fundamental problem with missing data analyses is that it is generally impossible to fully rule out MNAR missingness; by the same token, it is impossible to disprove the MAR assumption. Despite their intuitive appeal, MNAR analyses rely on untestable assumptions (e.g., normally distributed latent variables, accurate values for inestimable parameters), and relatively minor violations of these assumptions can introduce substantial bias. The fact that MNAR models produce accurate estimates under a relatively narrow range of conditions has led some methodologists to caution against their routine use. A common opinion is that these models are most appropriate for sensitivity analyses that apply different models (and thus different assumptions) to the same data.

MNAR analysis techniques continue to receive a great deal of attention in the methodological literature, and they are likely to gain in popularity. Despite their limitations, these models are important options to consider, particularly when outcome-related attrition seems plausible. At the very least, MNAR models can augment the results from an MAR-based analysis. Although sensitivity analyses are useful for exploring the impact of modeling choices on key parameter estimates, the observed data provide no basis for model selection. Ultimately, choosing a missing data handling technique—be it MAR or MNAR—is really a matter of choosing among a set of competing assumptions. Consequently, researchers should choose a model with the most defensible set of assumptions, and they should provide a logical argument that supports this choice.

References

Albert, P. S., & Follmann, D. A. (2000). Modeling repeated count data subject to informative dropout. Biometrics, 56, 667–677.
Albert, P. S., & Follmann, D. A. (2009). Shared-parameter models. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 433–452). Boca Raton, FL: Chapman & Hall.
Albert, P. S., Follmann, D. A., Wang, S. A., & Suh, E. B. (2002). A latent autoregressive model for longitudinal binary data subject to informative missingness. Biometrics, 58, 631–642.
Beunckens, C., Molenberghs, G., Verbeke, G., & Mallinckrodt, C. (2008). A latent-class mixture model for incomplete longitudinal Gaussian data. Biometrics, 64, 96–105.
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation approach. Hoboken, NJ: Wiley.
Carpenter, J. R., Kenward, M. G., & Vansteelandt, S. (2006). A comparison of multiple imputation and doubly robust estimation for analyses with missing data. Journal of the Royal Statistical Society, Series A, 169, 571–584.
Dantan, E., Proust-Lima, C., Letenneur, L., & Jacqmin-Gadda, H. (2008). Pattern mixture models and latent class models for the analysis of multivariate longitudinal data with informative dropouts. International Journal of Biostatistics, 4, 1–26.
Demirtas, H., & Schafer, J. L. (2003). On the performance of random-coefficient pattern-mixture models for non-ignorable drop-out. Statistics in Medicine, 22, 2553–2575.
Diggle, P., & Kenward, M. G. (1994). Informative dropout in longitudinal data analysis. Applied Statistics, 43, 49–94.
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford Press.
Follmann, D., & Wu, M. (1995). An approximate generalized model with random effects for informative missing data. Biometrics, 51, 151–168.
Foster, E. M., & Fang, G. Y. (2004). Alternative methods for handling attrition: An illustration using data from the Fast Track evaluation. Evaluation Review, 28, 434–464.
Hancock, G. R., & Lawrence, F. R. (2006). Using latent growth models to evaluate longitudinal change. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 171–196). Greenwood, CT: Information Age.
Heckman, J. T. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475–492.
Heckman, J. T. (1979). Sample selection bias as a specification error. Econometrica, 47, 153–161.
Hedeker, D., & Gibbons, R. D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2, 64–78.
Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. Hoboken, NJ: Wiley.
Hogan, J. W., & Laird, N. M. (1997). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16, 239–257.
Hogan, J. W., Roy, J., & Korkontzelou, C. (2004). Handling drop-out in longitudinal studies. Statistics in Medicine, 23, 1455–1497.
Jansen, I., Hens, N., Molenberghs, G., Aerts, M., Verbeke, G., & Kenward, M. G. (2006). The nature of sensitivity in missing not at random models. Computational Statistics and Data Analysis, 50, 830–858.
Kenward, M. G. (1998). Selection models for repeated measurements with non-random dropout: An illustration of sensitivity. Statistics in Medicine, 17, 2723–2732.
Lin, H., McCulloch, C. E., & Rosenheck, R. A. (2004). Latent pattern mixture models for informative intermittent missing data in longitudinal studies. Biometrics, 60, 295–305.
Little, R. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112–1121.
Little, R. (2009). Selection and pattern mixture models. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 409–431). Boca Raton, FL: Chapman & Hall.
Little, R., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: Wiley.
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. Mahwah, NJ: Erlbaum.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York, NY: Wiley.
Michiels, B., Molenberghs, G., Bijnens, L., Vangeneugden, T., & Thijs, H. (2002). Selection models and pattern-mixture models to analyse longitudinal quality of life data subject to drop-out. Statistics in Medicine, 21, 1023–1041.
Molenberghs, G., & Kenward, M. G. (2007). Missing data in clinical studies. West Sussex, England: Wiley.
Molenberghs, G., Michiels, B., Kenward, M. G., & Diggle, P. J. (1998). Monotone missing data and pattern-mixture models. Statistica Neerlandica, 52, 153–161.
Molenberghs, G., Verbeke, G., & Kenward, M. G. (2009). Sensitivity analysis for incomplete data. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 501–551). Boca Raton, FL: Chapman & Hall.
Muthén, B., & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 144–165). Boca Raton, FL: Chapman & Hall.
Muthén, B., Asparouhov, T., Hunter, A., & Leuchter, A. (2011). Growth modeling with non-ignorable dropout: Alternative analyses of the STAR*D antidepressant trial. Psychological Methods, 16.
Muthén, B., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431–462.
Muthén, B., & Masyn, K. (2005). Discrete-time survival mixture analysis. Journal of Educational and Behavioral Statistics, 30, 27–58.
Muthén, B., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55, 463–469.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Raykov, T., & Marcoulides, G. A. (2004). Using the delta method for approximate interval estimation of parameter functions in SEM. Structural Equation Modeling: A Multidisciplinary Journal, 11, 621–637.
Robins, J. M., & Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90, 122–129.
Rotnitzky, A. (2009). Inverse probability weighted methods. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 453–476). Boca Raton, FL: Chapman & Hall.
Roy, J. (2003). Modeling longitudinal data with nonignorable dropout using a latent dropout class model. Biometrics, 59, 829–836.
Roy, J., & Daniels, M. J. (2008). A general class of pattern mixture models for nonignorable dropout with many possible dropout times. Biometrics, 64, 538–545.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London, England: Chapman & Hall.
Schafer, J. L. (2003). Multiple imputation in multivariate problems when the imputation and analysis models differ. Statistica Neerlandica, 57, 19–35.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
Scharfstein, D. O., Rotnitzky, A., & Robins, J. M. (1999). Adjusting for nonignorable drop-out using semi-parametric nonresponse models. Journal of the American Statistical Association, 94, 1096–1146.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York, NY: Oxford University Press.
Thijs, H., Molenberghs, G., Michiels, B., & Curran, D. (2002). Strategies to fit pattern-mixture models. Biostatistics, 3, 245–265.
Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York, NY: Springer-Verlag.
Wu, M. C., & Bailey, K. R. (1989). Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model. Biometrics, 45, 939–955.
Wu, M. C., & Carroll, R. J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44, 175–188.
Yuan, Y., & Little, R. J. A. (2009). Mixed-effect hybrid models for longitudinal data with nonignorable dropout. Biometrics, 65, 478–486.

Received January 4, 2010
Revision received August 3, 2010
Accepted November 11, 2010
