Page 1

A Bayesian Two-Part Latent Class Model for Longitudinal Medical

Expenditure Data: Assessing the Impact of Mental Health and Substance

Abuse Parity

Brian Neelon1,∗, A. James O’Malley2, and Sharon-Lise T. Normand2,3

1Nicholas School of the Environment, Duke University, Durham, North Carolina, U.S.A.

2Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts, U.S.A.

3Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, U.S.A.

email: brian.neelon@duke.edu

Summary: In 2001, the U.S. Office of Personnel Management required all health plans participating

in the Federal Employees Health Benefits Program to offer mental health and substance abuse

benefits on par with general medical benefits. The initial evaluation found that, on average, parity did

not result in either large spending increases or increased service use over the four-year observational

period. However, some groups of enrollees may have benefited from parity more than others. To

address this question, we propose a Bayesian two-part latent class model to characterize the effect

of parity on mental health use and expenditures. Within each class, we fit a two-part random

effects model to separately model the probability of mental health or substance abuse use and mean

spending trajectories among those having used services. The regression coefficients and random effect

covariances vary across classes, thus permitting class-varying correlation structures between the two

components of the model. Our analysis identified three classes of subjects: a group of low spenders

that tended to be male, had relatively rare use of services, and decreased their spending pattern

over time; a group of moderate spenders, primarily female, that had an increase in both use and

mean spending after the introduction of parity; and a group of high spenders that tended to have

chronic service use and constant spending patterns. By examining the joint 95% highest probability

density regions of expected changes in use and spending for each class, we confirmed that parity had

an impact only on the moderate spender class.

Page 2

Biometrics xx, 0–21DOI: xxx

xx 20yy

Key words: Bayesian analysis; Growth mixture model; Latent class model; Mental health parity;

Semi-continuous data; Two-part model.

1

Page 3

Bayesian Two-Part Latent Class Model1

1. Introduction

The Federal Employees Health Benefits (FEHB) Program sponsors health insurance benefits

for more than 8.5 million federal employees and retirees, plus their spouses and dependents.

Over 250 health plans currently participate in the FEHB program. At the beginning of

2001, the U.S. Office of Personnel Management implemented a parity policy that required

all health plans participating in the FEHB Program to offer mental health and substance

abuse benefits on par with general medical benefits (U.S. OPM, 2000). An early evaluation

of the policy examined changes in total mental health expenditures, including out-of-pocket

and plan spending, from 1999 to 2002, and found that, on average, parity did not result

in either the large increases in spending predicted by opponents of parity or the increased

service use anticipated by mental health advocates (Goldman et al., 2006). Because most

of the literature on the impact of parity has focused on the average effect of the policy on

costs and access to mental health and substance abuse care, little is known about its impact

on specific enrollee subpopulations—for example, the sickest patients or those carrying the

greatest financial burden of illness.

To answer this question, there are three key features of longitudinal medical expendi-

ture data that must be addressed. The data are semi-continuous, assuming non-negative

values with a spike at zero for those who use no services, followed by a continuous, right-

skewed distribution for those who have used services. Table 1 provides a description of the

total spending data for a sample of 1581 FEHB enrollees from one state, each with four

years of data, yielding a total of 6324 observations. Over 80% of enrollees had no annual

mental health expenditures, while a small fraction had large expenditures. The percentage of

spenders increased steadily over time, while median spending increased immediately following

introduction of the parity directive and then returned to baseline levels by 2002.

[Table 1 about here.]

Page 4

2 Biometrics, xx 20yy

Another important feature of the data concerns repeated measurements. In the FEHB

data, each enrollee contributes an observation for each of the four study years, introducing

within-subject correlation. Moreover, in each year, there are two outcomes per enrollee:

use of mental health/substance abuse services, and if use, the level of use as measured by

expenditures. Further, it may be reasonable to assume that the probability of some use

is correlated with the expected level of spending. An appropriate statistical model should

address these multiple sources of correlation.

One modeling strategy is to apply a longitudinal two-part model (Olsen and Schafer, 2001;

Tooze, Grunwald, and Jones, 2002; Ghosh and Albert, 2009). Two-part models are mixtures

of a point mass at zero followed by a right-skewed distribution (e.g., lognormal) for the

nonzero values. The two mixture components are modeled in stages. First, the probability

of service use is modeled via mixed effects probit or logistic regression. Next, conditional on

some usage, the expected spending level is modeled through (most commonly) a lognormal

mixed effects model. The random effects for the two components are typically assumed to

be correlated; ignoring this potential correlation can yield biased inferences (Su, Tom, and

Farewell, 2009).

Finally, because enrollees tend to share characteristics related to spending, it is reasonable

to assume that FEHB enrollees’ trajectories fall into a small number of classes. One natural

mechanism to handle this feature is to use latent class models, in particular latent class

“heterogeneity” or “growth mixture” models (Verbeke and Lesaffre, 1996; Muth´ en and

Shedden, 1999; Muth´ en et al., 2002). Growth mixture models (GMMs) assume that subjects

first fall into one of a finite number of latent classes characterized by a class-specific mean

trajectory; then, about these class means, subjects have their own unique longitudinal

trajectories defined by a set of random effects with class-specific variance parameters. As

such, GMMs can be viewed as finite mixtures of random effects models.

Page 5

Bayesian Two-Part Latent Class Model3

Growth mixtures have become increasingly popular as a way of decomposing complex

heterogeneity in longitudinal models. Lin et al. (2000) developed a GMM to estimate class-

specific PSA trajectories among men at risk for prostate cancer. Proust-Lima, Letenneur,

and Jacqmin-Gadda (2007) proposed a GMM to jointly model a set of correlated longitudinal

biomarkers and a binary event. Lin et al. (2002) and Proust-Lima et al. (2009) developed

related models to analyze longitudinal biomarkers and a time to event. Beunckens et al.

(2008) proposed a GMM for incomplete longitudinal data. In the Bayesian setting, Lenk and

DeSarbo (2000) describe computational strategies for fitting GMMs; Elliott et al. (2005)

developed a Bayesian GMM to jointly analyze daily affect and negative event occurrences

during a 35-day study period; and recently, Leiby et al. (2009) fitted a Bayesian latent class

factor-analytic model to analyze multiple outcomes from a clinical trial evaluating a new

treatment for interstitial cytitis.

We build on this previous work to develop a Bayesian two-part growth mixture model for

characterizing the effect of parity on mental health use and expenditures. The advantages

of Bayesian inference are well-known and include elicitation of prior beliefs, avoidance of

asymptotic approximations, and, as we demonstrate below, practical estimation of parameter

contrasts and multidimensional credible regions. Within each class, we fit a probit-lognormal

model with class-specific regression coefficients and random effects. An attractive feature

of the model is that it permits the random effect covariance to vary across the classes.

For example, one class might comprise enrollees with frequent high expenditures (positive

correlation between the probability of spending and the actual amount spent), whereas

another class might comprise enrollees with frequent but modest expenditure (negative

correlation between probability of spending and amount spent).

The remainder of this paper is organized as follows: Section 2 outlines the proposed

model; Section 3 describes prior elicitation, posterior computation, model comparison, and

Page 6

4Biometrics, xx 20yy

evaluation of model fit; Section 4 describes a small simulation study; Section 5 applies the

method to the FEHB study; and the final section provides a discussion and directions for

future work.

2. The Two-Part Growth Mixture Model

The two-part model for semi-continuous data is a mixture of a degenerate distribution at

zero and a positive continuous distribution, such as a lognormal (LN), for the nonzero values.

The probability distribution is expressed as

f(yi) = (1 − φ)1−di?φ × LN(yi;µ,τ2)?di, i = 1,...,n; 0 ? p ? 1,

where yiis the observed response of the random variable Yi; diis an indicator that yi> 0;

φ = Pr(Yi > 0); and LN(yi;µ,τ2) denotes the lognormal density evaluated at yi, with µ

and τ2representing the mean and variance of log(Yi|Yi> 0). When φ = 0, the distribution

is degenerate at 0, and when φ = 1, there are no zeros and the distribution reduces to a

lognormal density.

The model can be extended to allow for repeated measures and latent classes by introducing

a latent categorical variable Cithat takes the value k (k = 1,...,K) if subject i belongs to

class k. In its most general form, the model is given by

f(yij|Ci= k,bi) = (1 − φijk)1−dij?φijk× LN(yij;µijk,τ2

g(φijk) = x?

k)?dij

1ijαk+ z?

1ijb1i (binomial component)

log(µijk) = x?

2ijβk+ z?

2ijb2i (lognormal component), (1)

where yijis the j-th observed response for subject i (j = 1,...,ni); g denotes a link function,

such as the probit or logit link; xlijand zlijare pl×1 and ql×1 vectors of fixed and random

effect covariates for component l(l = 1,2), including appropriate time-related variables (e.g.,

polynomials of time or binary indicators representing measurement occasions); αkand βk

are fixed effect coefficients specific to class k; and bi|Ci= (b?

1i,b?

2i)?|Ci∼ Nq1+q2(0,Σk) is

Page 7

Bayesian Two-Part Latent Class Model5

a stacked vector of random effects for subject i, with class-specific covariance Σk. When

q1= q2= 1, the model reduces to the widely used random-intercept two-part model. For

the standard random-slope model, q1 = q2 = 2 and Σk is a 4 × 4 matrix that includes

cross-covariances between the random intercepts and slopes of the two components. While

this model captures additional heterogeneity over time, restrictions on Σkmay be needed to

aid identifiability.

To complete the model, we assume that the class indicator Cihas a “categorical distribu-

tion” taking the value k with probability πik, where πikis linked to a r-dimensional covariate

vector, wi, via a generalized logit model; that is,

Ci ∼ Cat(πi1,...,πiK),

ew?

?K

πik

=

iγk

h=1ew?

iγh, with γ1= 0 for identifiability.(2)

(We use the term “categorical distribution” in lieu of “multinomial distribution” since Ck

is an integer-valued variable ranging from 1 to K.) And finally, throughout the paper, we

assume that the number of classes, K, is known. In Section 3.3, we discuss Bayesian model-

selection strategies for determining the optimal value of K, and in the Discussion section,

we note alternatives to fixing K.

The latent class two-part model is quite general in that it allows the fixed effects and

random effect covariances to differ across classes. For the special case when K = 1, the model

reduces to a Bayesian version of the standard two-part model for semi-continuous data (c.f.,

Tooze et al., 2002). For K > 2 classes, the model introduces two levels of between-subject

heterogeneity: one induced by the latent classes, and a second represented by the within-class

covariances Σk. Our model can therefore be viewed as a two-part growth mixture model.

Note that Σkmay vary across classes. For example, for some classes, b1iand b2imay be

positively correlated, while for others they may be negatively correlated or even uncorrelated.

In fact, the structure of Σkcan itself vary across classes. For instance, in one class, there

Page 8

6 Biometrics, xx 20yy

may be no particular structure (unstructured covariance), while in another, an exchangeable

or an AR1 structure may be more suitable.

3. Priors Specification, Posterior Computation, and Model Selection

3.1 Prior Specification

Under a fully Bayesian approach, prior distributions are assumed for all model parameters.

To ensure a well-identified model with proper posteriors determined almost entirely by

the data, we assign weakly informative proper distributions to all class-specific parameters

{αk,βk,τ2

Np1(µα,Σα) and βk∼ Np2(µβ,Σβ). We assume that the prior hyperparameters are identical

across classes, but this is not necessary. Each Σkis assumed to have a conjugate inverse-

k,Σk,γk}. For the fixed effects, we assume exchangeable normal priors: αk ∼

Wishart IW(ν0,D0) (ν0? k) distribution. Our experience suggests that the conjugate IW

prior performs well in zero-inflated models with low-dimensional random effect covariance

matrices (c.f., Neelon, O’Malley, and Normand, 2010).

For, the lognormal precisions, τ−2

k, we assume conjugate Ga(λ,δ) priors. Following Garrett

and Zeger (2000) and Elliott et al. (2005), we recommend γk∼ Nr[0,(9/4)Ir], which

induces a prior for πik centered at 1/K and bounded away from 0 and 1. If there are no

class-membership covariates (i.e., r = 1), a conjugate Dirichlet(e1,...,eK) prior can be

placed directly on the class-membership probabilities π = (π1,...,πK)?, which can lead to

convenient closed-form full conditionals. Fr¨ uhwirth-Schnatter (2006) recommends choosing

ek> 1 to bound πkaway from zero.

Page 9

Bayesian Two-Part Latent Class Model7

3.2 Posterior Computation

Let θk = (α?

k,β?

k,τ2

k)?. Assuming prior independence, the corresponding joint posterior is

given by

π(θ1,...,θK;C;b;Σ1,...,ΣK;γ2,...,γK|y) ∝

K

?

where f(yij|θk,bi) is given in equation (1) and b = (b?

For posterior computation, we propose an MCMC algorithm that combines draws from full

k=1

n

?

i=1

?ni

j=1

?

f(yij|θk,bi)πikf(bi|Σk)

?I(Ci=k)

π(αk)π(βk)π(τ2

k)π(Σk)

K

?

h=2

π(γh),

1,...,b?

n)?.

conditionals and Metropolis steps. After assigning initial values to the model parameters,

the algorithm iterates between the following steps:

(1) For k = 2,...,K, update the vector γkusing a random-walk Metropolis step;

(2) Sample the class indicators Ci(i = 1,...,n) from a categorical distribution with posterior

probability vector pi= (pi1,...,piK);

(3) For k = 1,...,K, sample αk, βk, τ−2

k, and Σkfrom their full conditionals;

(4) Update biusing a random-walk Metropolis step.

Details of the algorithm are provided in the Web Appendix. Convergence is monitored by

running multiple chains from dispersed initial values and then applying standard Bayesian

diagnostics, such as trace plots; autocorrelation statistics; Geweke’s (1992) Z-diagnostic,

which evaluates the mean and variance of parameters at various points in the chain; and the

Brooks-Gelman-Rubin scale-reduction statistic?R, which compares the within-chain variation

to the between-chain variation (Gelman et al., 2004). As a practical rule of thumb, a 0.975

quantile for?R ? 1.2 is indicative of convergence. In the application below, convergence

diagnostics were performed using the R package boa (Smith, 2007).

A well-known computational issue for Bayesian finite mixture models is “label switching”

in which draws of class-specific parameters may be associated with different class labels

Page 10

8 Biometrics, xx 20yy

during the course of the MCMC run. Consequently, class-specific posterior summaries that

average across the draws will be invalid. In some cases, label switching can be avoided by

placing constraints on the class probabilities (Lenk and DeSarbo, 2000) or on the model

parameters themselves (Congdon, 2005). However, as Fr¨ uhwirth-Schnatter (2006) notes,

these constraints must be carefully chosen to ensure a unique labeling. She describes several

exploratory procedures useful for identifying appropriate constraints. As an alternative,

Stephens (2000) proposed a post-hoc relabeling algorithm that minimizes the Kullback-

Leibler distance between the posterior probability pijthat individual i is assigned to class j

under the current labeling, and the posterior probability under the “true” labeling, estimated

as the posterior mean of pij. We apply Stephens’ approach in the case study below.

3.3 Determining the Number of Classes

To determine the number of latent classes, we adopt a model selection approach and use

the deviance information criterion (DIC) proposed by Spiegelhalter et al. (2002) to compare

models under various fixed values of K (K = 1,...,Kmax). This approach has been applied

in several previous studies involving latent class models (Elliott et al., 2005; White et al.,

2008; Leiby et al., 2009).

Like Akaike’s information criterion (AIC), the DIC provides an assessment of model fit as

well as a penalty for model complexity. The DIC is defined as D(θ) + pD, where D(θ) =

E[D(θ)|y] is the posterior mean of the deviance, D(θ), and pD= D(θ)−?D(θ) = E[D(θ)|y]−

D(E[θ|y]) is the difference in the posterior mean of the deviance and the deviance evaluated

at the posterior mean of the parameters. The deviance, typically taken as negative twice

the log-likelihood, is a measure of the model’s relative fit, whereas pDis a penalty for the

model’s complexity. For fixed effect models, the complexity—as measured by the number

of model parameters—is easily determined. For random effect models, the dimension of the

parameter space is less clear and depends on the degree of heterogeneity between subjects

Page 11

Bayesian Two-Part Latent Class Model9

(more heterogeneity implies more “effective” parameters). DIC was proposed to estimate the

number of effective parameters in a Bayesian hierarchical model.

As a rule of thumb, if two models differ in DIC by more than three, the one with the

smaller DIC is considered the best fitting (Spiegelhalter et al., 2002). For finite mixture

models, Celeux et al. (2006) recommend a modified DIC, termed DIC3, which estimates

?D(θ) using the posterior mean of the marginal likelihood averaged across the classes, a

measure invariant to label switching. Specifically,

DIC3= 2D(θ) + 2log

?

n

?

i=1

ˆf(yi)

?

,

whereˆf(yi) is the posterior mean of the marginal likelihood contribution for subject i

averaged across the classes. As Celeux et al. (2006) point out, DIC3 is closely related to

the measure proposed by Richardson (2002) to avoid overfitting the number of components.

In the application below, we use a hybrid DIC that combines DIC3with the original DIC

measure: for the fixed effects, we average across classes, as in DIC3; for the within-class

random effects, we condition on the posterior draws, as in the original DIC. This approach

preserves the conditional nature of the model, and provides a penalty for the effective

number of random effect parameters. The approach can also be viewed as a natural extension

of the one-class DIC measure provided in standard Bayesian software, such as WinBUGS

(Spiegelhalter et al., 2003).

3.4 Assessment of the Final Model Fit

To assess the adequacy of the selected model, we use posterior predictive checking (Gelman,

Meng, and Stern, 1996), whereby the observed data are compared to data replicated from

the posterior predictive distribution. If the model fits well, the replicated data, yrep, should

resemble the observed data, y. To quantify the similarity, we can choose a discrepancy

measure, T = T(y,θ), that takes an extreme value if the model is in conflict with the

Page 12

10Biometrics, xx 20yy

observed data. Popular choices for T include sample moments and quantiles, and residual-

based measures.

The Bayesian predictive p-value (PB) denotes the probability that the discrepancy measure

based on the predictive sample, Trep= T(yrep,θ), is more extreme than the observed measure

T. A Monte Carlo estimate of PBcan be computed by evaluating the proportion of draws

in which T∗> T. A p-value close to 0.50 represents adequate model fit, while p-values near

0 or 1 indicate lack of fit. The cut-off for determining lack of fit is subjective, although by

analogy to the classical p-value, a Bayesian p-value between 0.05 and 0.95 suggests adequate

fit. In some cases, a stricter range, such as (0.20, 0.80), might be more appropriate.

For the latent class two-part model, we recommend two test statistics to assess the fit of

both the binomial and lognormal components. For the binomial component, we recommend

T1 = the proportion of observations greater than zero. For the nonzero observations, we

suggest a modification of the omnibus chi-square measure proposed by Gelman et al. (2004)

?[log(yij) − µijk]

where, for the random intercept model, µijk = x?

T2=

1

M

?

k

?

i,j:yij>0

τk

?2

× I(Ci=k),

ij(βk) + b2i and M denotes the number

of nonzero observations. To generate replicate data, we first draw replicate class indicators

Crep

i

(i = 1,...,n) using expression (2); then, conditional on Crep

i

, we generate brep

i

from

N2(0,Σk); finally, we draw yrep

ij

from (1). An alternative approach is to use the actual

posterior draws of Ciand bi; however, this approach does not mimic the data-generating

process as accurately as the former approach. That said, for the analysis presented in Section

5, the two approaches yield similar results.

4. Simulation Study

To examine the properties of the proposed model, we conducted a small simulation study.

First, we simulated 100 datasets from a three-class model according to equation (1). The

Page 13

Bayesian Two-Part Latent Class Model 11

datasets contained 500 subjects, each with five observations, for a total of 2,500 observations

per dataset. The binomial and lognormal components contained class-specific fixed-effects

intercepts, fixed-effect linear time trend, and random intercepts. That is, α = (αk1,αk2)?,β =

(βk1,βk2)?, and, given Ci= k, bi= (b1i,b2i)?∼ N2(0,Σk), with

Σk=

σ2

k1

ρkσk1σk2

ρkσk1σk2

σ2

k2

, k = 1,2,3.

We also allowed the class membership probabilities to include an intercept and a single

covariate wi; hence, γ2= (g21,g22)?, γ3= (g31,g32)?, and γ1= (0,0) for identifiability.

Next, we fitted one to four class models to each of the 100 datasets and compiled the results.

Web Table 1 presents the DIC statistics for each of the four fitted models. As expected, the

average DIC across the 100 simulations was lowest for the three-class fitted model (i.e., the

true model). Moreover, the three-class model had the lowest DIC values for each of the 100

datasets, followed in general by the two-class model, which had the second-lowest DIC in 97

of the 100 simulations. For the most part, the four-class model had the highest DIC score,

alleviating concerns that the hybrid DIC measure proposed in Section 3.3 overestimates the

number of classes.

Web Table 3 provides summary statistics for the three-class model parameters. Column

1 presents the estimated class precentages, averaged across the 100 simulations. These were

identical (up to two decimal places) to the true class percentages of 31%, 26%, and 43% for

classes 1, 2, and 3, respectively. Column 5 presents the average posterior estimates across

the 100 simulations. The bias was extremely low for all parameters, including the random

effect variance components. The coverage rates ranged from 0.91 to 0.99, but for the most

part, were close to the nominal value of 0.95. Variability in coverage rates was likely due to

the size of the simulation.

Page 14

12 Biometrics, xx 20yy

5. Assessing the Impact of Mental Health and Substance Abuse Parity

To analyze the FEHB data described in the introduction, we fitted a series of two-part

growth mixture models, allowing the number of classes, K, to range from one to four. For

each class, we fit a fixed effects model, a model with uncorrelated random intercepts, a

model with correlated random intercepts, and a model with random intercepts for each

component and a random slope for the lognormal component. We also fitted a model with

an additional random slope for the binomial component (i.e., four random effects), but the

model was poorly identified and failed to converge according to standard MCMC diagnostics.

We consider identifiability issues related to this model further in the Discussion section.

Within each class, we assumed a probit-lognormal two-part model as in equations (1) and

(2). For both components, the fixed-effect covariate vector xijcomprised an intercept term

and three dummy indicators representing years 2000–2002. Because our study included only

four measurement occasions, we chose to model time categorically to allow for maximum

flexibility in capturing the time trend. Alternative parameterizations of the time trend—

such as polynomials or splines—may be appealing in other settings, particularly if there are

a large number of time points. For K > 2 classes, we allowed gender and employee status

to serve as class-membership covariates; specifically, wiin equation (2) represented a 3 × 1

vector consisting of an intercept and indicator variables for female gender and employee vs.

dependant status. To investigate the impact of between-subject heterogeneity, we compared

fixed effects models to models with correlated and uncorrelated random intercepts.

The models were fitted in R version 2.8 (R Development Core Team, 2008) using a MCMC

code developed by the authors. For each model, we ran three, initially dispersed MCMC

chains for 200,000 iterations each, discarding the first 50,000 as a burn-in to ensure that a

steady-state distribution had been reached. We retained every 50th draw to reduce autocor-

relation. Run times ranged from six to 12 hours depending on the number of classes. MCMC

Page 15

Bayesian Two-Part Latent Class Model 13

diagnostics, such as trace plots, Geweke Z-statistics (Geweke, 1992), and Brooks-Gelman-

Rubin scale reduction statistics (Gelman et al., 2004), were used to assess convergence

of the chains. There was little evidence of label switching within individual chains, and

Stephens’ (2000) relabeling algorithm tended to converge rapidly. In some cases, the class

labels required reordering across chains, but the proper order was easily identified in each

case.

For model comparison, we used the hybrid DIC measure proposed in Section 3.3. The

results are presented in Table 2. For each class, the correlated random intercept model had

the lowest DIC value. Overall, the three-class model with correlated random intercepts was

preferred, followed by the two- and four-class correlated models.

[Table 2 about here.]

Web Figure 1 presents post-burn-in trace plots for four representative parameters from

the 3-class random intercepts model: α22 (change in log odds use at year 2 compared to

year 1, class 2); β22(increase in log-spending at year 2 for class 2); γ22(log odds of class-

two membership, female vs. male); and ρ2 (class-2 random effect correlation). For clarity

of presentation, we have graphed only two of the three MCMC chains. The overlapping

trajectory lines suggest convergence and efficient mixing of the chains. The Geweke Z-

diagostic p-values ranged from 0.35 (β22) to 0.64 (α22), indicating no significant difference in

posterior means across regions of the chains; the 0.975 quantiles of the Brooks-Gelman-Rubin

statistic were each less than 1.04, again indicating convergence of the chains. However, we

did observe modest autocorrelation in the chains: the lag-10 autocorrelations ranged from

0.01 for α22to 0.16 for ρ2.

Table 3 presents the posterior means and 95% posterior intervals for the three-class model.

[Table 3 about here.]

A few general trends are worth noting:

Page 16

14Biometrics, xx 20yy

1) Class 1 comprised an estimated two-thirds of the population and was characterized by

low initial probability of spending (α11) as well as low baseline median spending (β11).

These subjects, termed the “low spenders,” were relatively rare users of mental health

services who exhibited a decreasing spending pattern over time;

2) Class 2, comprising an estimated 23% of subjects, included “moderate spenders” who

had an increasing spending pattern over time. For these individuals, spending and use

were closely linked, as demonstrated by the large increases in use and spending in 2001

and 2002, after enactment of the parity mandate. It appears that parity had the most

impact on these individuals. We provide more formal assessment of the parity effect

below;

3) Class 3 comprised the fewest subjects (10%) and was characterized by a high probability

of spending and high median spending. Individuals in this “high spending” group tended

to be chronic service users whose trends remained relatively stable over time.

The variance component estimates also reveal interesting trends. Classes 1 and 2 showed

moderate to high positive correlation among the random intercepts (ρ1= 0.35 and ρ2= 0.85),

suggesting a strong association between the probability of spending and the amount spent.

Class 3, on the other hand, exhibited low correlation (ρ3= 0.17), with a credible interval well-

overlapping zero. Thus, there was little evidence of an association between the probability of

spending and the amount spent in this class. Note that these are conditional correlations (i.e.,

assuming class membership is known); they represent the additional correlation between the

model components beyond that captured by class membership. In particular, for class 3, the

marginal correlation (prior to knowing class membership) is quite high, since the probability

of spending and amount spent both take on large values on their respective scales. However,

once class 3 membership is known, there is little residual correlation between the components,

Page 17

Bayesian Two-Part Latent Class Model 15

as evidenced by the low ρ3value. Thus, given class 3 membership, knowing how likely one

is to spend provides little additional information about the actual amount spent.

The class membership probabilities (γ’s) indicate that females were much more likely than

males to be in higher spending classes. There was minimal impact for employee status.

Table 4 presents these results on the probability scale. For example, an estimated 18% of

male dependants were in class 2, compared to 31% of female dependants.

[Table 4 about here.]

Figure 1 presents the posterior mean spending patterns for the three classes. (Web Figure

2 provides an enlarged view of classes 1 and 2.)

[Figure 1 about here.]

Again, three distinct groups emerge: “low spenders,” whose spending decreased over time;

“moderate spenders,” who showed a substantial increase in spending, particular in 2001 when

the parity directive went into effect; and “high spenders,” who spent large amounts on mental

health services throughout the study period. Thus, it appears that parity had the largest

impact on the moderate spenders. This policy may have encouraged class 2 individuals to use

services more, thereby increasing their median annual spending. The findings also illustrate

the relevance of uniquely modeling the probability of use and median spending given use:

doing so revealed the direct impact of increased use on increased spending for subjects in

class 2.

To formally assess the impact of the parity policy, we constructed parameter contrasts to

compare the effect of parity on both the log-odds of spending and mean log-spending. In

particular, for each class, we formed two contrasts:

change in log-odds(any spending) = (αk3+ αk4− αk2)/2, and

change in mean log-spending = (βk3+ βk4− βk2)/2.

Page 18

16 Biometrics, xx 20yy

Positive contrasts estimates indicate an increase in the likelihood of spending and amount

spent following parity. Because these contrasts are correlated, we plotted their joint posterior

95% highest probability density (HPD) regions, and examined whether these regions included

the origin. As shown in Figure 2, the HPD regions for classes 1 and 3 contain the origin,

indicating no significant parity effect for these classes. The circular plot for class 3 suggests

low correlation between the contrasts, which is consistent with the small random effect

correlation (ρ3) estimate observed in Table 3. On the other hand, the HPD region for class 2

is situated away from the origin in the upper-right quadrant, suggesting a significant increase

in both the probability of spending and amount spent following enactment of parity. This

supports the earlier observation that parity has the greatest impact on class 2 subjects.

[Figure 2 about here.]

As a final assessment of model fit, we conducted posterior predictive checks using the

two discrepancy measures described in Section 3.4: the sample proportion of subjects with

nonzero spending (T1), and the omnibus chi-square statistic applied to subjects with nonzero

spending (T2). Web Figure 3(a) shows the posterior predictive distribution of T1; the shaded

region corresponds to the Bayesian predictive p-value of 0.29. For T2, the observed and

posterior predicted values vary across MCMC samples (both being functions of the posterior

parameter draws). Therefore, Figure 3(b) provides a two-dimensional scatterplot of the

posterior predicted values based on yrepversus the observed values based on y. The Bayesian

predictive p-value of 0.37, corresponding to the proportion of samples above the diagonal,

again indicates adequate fit of the three-class, correlated random effects model.

6. Discussion

We have described a Bayesian approach to fitting a two-part growth mixture model for

longitudinal medical expenditure data. The model includes correlated random effects with

class-specific covariance structures, thus permitting the variance and correlation between the

Page 19

Bayesian Two-Part Latent Class Model 17

two model components to vary across classes. Advantages of the approach include distinct

modeling of zero and nonzero values; flexible modeling of time trends, class membership prob-

abilities, and between-subject heterogeneity (both within and across classes); full posterior

inference, including estimation of complex parameter contrasts and their corresponding HPD

regions; incorporation of prior information; and, in our experience, improved computation

over non-MCMC estimation techniques.

In our application, we were able to identify three distinct types of individuals: a group of

low spenders, who generally made little use of mental health services, thereby minimizing

their spending; a group of chronic spenders who, perhaps due to medical conditions, tended

to use mental health services often and in more costly ways; and a group of moderate users,

primarily female dependants, who after introduction of the parity mandate, tended to use

services more often, subsequently increasing their expenditures. Future parity efforts might

target these moderate spenders who are most likely to benefit from increased mental health

coverage.

We have focused primarily on the probit-lognormal two-part model commonly used to an-

alyze medical expenditures. This model has the particular advantage of yielding closed-form

full conditionals for many of the model parameters, resulting in efficient MCMC computation.

Although the model can be fit in standard Bayesian software, such as WinBUGS, sampling

“traps” can occur for complex models with many classes. To overcome this problem, we

implemented the MCMC algorithm in R using the full-conditionals described in the Web

Appendix.

The model can be easily adapted to allow for a logit link for the binomial component,

as well as multivariate random effects (e.g., random slopes as well as random intercepts).

The standard random-slope model implies a 4 × 4 covariance matrix that induces cross-

correlations between the intercepts and slopes for the two components. In some cases,

Page 20

18 Biometrics, xx 20yy

particularly when the ratio of observation to parameters is low, it may be necessary to impose

a structure to this matrix to ensure the model is well identified. Note that adding a single

random slope to the model yields n additional parameters, along with several class-specific

variance components. It is therefore not surprising that in our application the full random

effects model with random intercepts and slopes for both components failed to converge to

a stationary distribution. With only four observations per subject, attempting to include

random slopes for both components yielded more parameters than observations, thereby

impeding identifiability. As a result, we focused our analysis on models with at most three

random effects (two random intercepts and a random slope for the lognormal component).

The model can also be extended to accommodate multiple outcomes by assuming, for

example, conditional independence between outcomes given class membership and subject-

specific random effects. Alternatively, the factor-analytic approach recently proposed by

Leiby et al. (2009) can be used to avoid the conditional independence assumption. Future

work might also allow the covariance structures themselves to vary across classes, permitting,

say, an AR1 structure in one class and a compound symmetric structure in another.

The two-part semi-continuous model is closely related to zero-inflated models for count

data, such as the Poisson hurdle model and the zero-inflated Poisson (ZIP) model. In fact,

hurdle models for count data are structurally very similar to two-part semi-continuous mod-

els, leading to many analogous interpretations of model parameters. See Neelon, O’Malley,

and Normand (2010) for a discussion of Bayesian approaches to fitting zero-inflated count

models.

To estimate the number of latent classes, we adopted a commonly used model-comparison

approach based on DIC. A limitation of this approach is that the analysis must be conducted

several times in order to compare models with varying numbers of classes. This approach

also fails to incorporate the uncertainty associated with the unknown number of classes K.

Page 21

Bayesian Two-Part Latent Class Model 19

One way to accommodate random K is to employ reversible jump MCMC (Green, 1995),

which shifts between models of different dimensions K ∈ 1,...,Kmax. The drawback here is

potentially slow mixing of the MCMC chain. An alternative strategy is to specify an infinitely

large mixture via a Dirichlet process (DP) prior on the number of mixture components

(Ferguson, 1973).

The models proposed here have wide applicability to a variety of settings, including health

economics, health services research, psychometrics, substance-abuse research—essentially,

any of a number of fields in which semi-continuous data, typified by a large spike at zero,

arise.

Acknowledgements

This work was supported by Grants R01-MH61434 and R01-MH80797 from the National

Institute of Mental Health, Bethesda, MD. The authors thank Haiden Huskamp (Harvard

Medical School) for discussion and use of the FEHB data, Hocine Azeni (Harvard Medical

School) for assistance in preparing the data, and the referee and associated editor for their

helpful comments.

Supplementary Materials

The Web Appendix referenced in Sections 3 and 5, Web Tables 1 and 2 referenced in Section 4,

and Web Figures 1, 2, and 3 referenced in Section 5, are available under the Paper Information

link at the Biometrics website http://www.biometrics.tibs.org.

References

Beunckens C., Molenberghs G., Verbeke, G., and Mallinckrodt C. (2008). A latent-class

mixture model for incomplete longitudinal Gaussian data. Biometrics 64, 96–105.

Celeux, G., Forbes, F., Robert, C.P., and Titterington, D.M. (2006). Deviance information

criteria for missing data models. Bayesian Analysis 1, 651–674.

Congdon, P. (2005) Bayesian Models for Categorical Data. Chichester: John Wiley and Sons.

Elliott, M.R., Gallo, J.J., Ten Have, T.R., Bogner, H.R., and Katz, I.R. (2005). Using a

Bayesian latent growth curve model to identify trajectories of positive affect and negative

events following myocardial infarction. Biostatistics 6, 119–143.

Page 22

20 Biometrics, xx 20yy

Ferguson, T.S (1973). A Bayesian analysis of some nonparametric problems. Annals of

Statistics 1, 209–30.

Fr¨ uhwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. New York:

Springer.

Garrett, E.S. and Zeger, S.L. (2000). Latent class model diagnosis. Biometrics 56, 1055–1067.

Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2004). Bayesian Data Analysis, 2nd

edition. Boca Raton: Chapman and Hall.

Gelman, A., Meng, X.L., and Stern, H.S. (1996). Posterior predictive assessment of model

fitness via realized discrepancies (with discussion). Statistica Sinica 6, 733–807.

Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating

posterior moments. In Bayesian Statistics, Volume 4, J.M. Bernardo, J. Berger, A.P.

Dawid and A.F.M. Smith (eds), 169–193. Cambridge: Oxford University Press.

Ghosh, P. and Albert, P.S. (2009). A Bayesian analysis for longitudinal semi-continuous

data with an application to an acupuncture clinical trial. Computational Statistics &

Data Analysis 53, 699–706.

Goldman, H.H., Frank, R.G., Burnam, M.A., Huskamp, H.A., Ridgely, S., Normand, S-L.T.,

Young, A.S., Berry, C.L., Azzone, V., Busch, A.B., Azrin, S.T., Moran, G., Lichtenstein,

C., and Blasinksy, M. (2006). Behavioral health insurance parity for federal employees.

The New England Journal of Medicine 354, 1378–1386.

Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian

model determination. Biometrika 82, 711–732.

Leiby, B.E., Sammel, M.D., Ten Have, T.R., and Lynch, K.G. (2009). Identification of

multivariate responders and non-responders by using Bayesian growth curve latent class

models. Journal of the Royal Statistical Society, Series C 58, 505–524.

Lenk, P.J. and DeSarbo, W.S. (2000). Bayesian inference for finite mixtures of generalized

linear models with random effects. Psychometrika 65, 93–119.

Lin, H., McCulloch, C.E., Turnbull, B.W., Slate, E.H., and Clark, L.C. (2000). A latent class

mixed model for analyzing biomarker trajectories with irregularly scheduled observations.

Statistics in Medicine 19, 1303–1318.

Lin, H., McCulloch, C.E., Turnbull, B.W., Slate, E.H. (2002). Latent class nodels for joint

analysis of longitudinal biomarker and event process data: Application to longitudinal

prostate-specific antigen readings and prostate cancer. Journal of the American Statis-

tical Association 97, 53–65.

Muth´ en, B. and Shedden K. (2000). Finite mixture modeling with mixture outcomes using

the EM algorithm. Biometrics 55, 463–469.

Muth´ en, B., Brown, C.H., Booil Jo, K.M., Khoo, S-T., Yang, C-C., Wang C-P., Kellam,

S.G., Carlin, J.B., and Liao, J. (2002). General growth mixture modeling for randomized

preventive interventions. Biostatistics 3, 459-475.

Neelon, B.H., O’Malley, A.J., and Normand, S-L.T. (2010). A Bayesian model for repeated

measures zero-inflated count data with application to outpatient psychiatric service use.

Statistical Modelling, in press.

Olsen, M.K. and Schafer, J.L. (2001). A two-part random-effects model for semi-continuous

longitudinal data. Journal of the American Statistical Association 96, 730–745.

Proust-Lima, C., Letenneur L., and Jacqmin-Gaddam H. (2007). A nonlinear latent class

model for joint analysis of multivariate longitudinal data and a binary outcome. Statistics

Page 23

Bayesian Two-Part Latent Class Model 21

in Medicine 26, 2229–2245.

Proust-Lima, C., Joly, P., Dartigues, J-F., and Jacqmin-Gaddam H. (2009). Joint modelling

of multivariate longitudinal outcomes and a time-to-event: A nonlinear latent class

approach. Computational Statistics & Data Analysis 53, 1142–1154.

R Development Core Team (2008). R: A language and environment for statistical com-

puting. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

http://www.R-project.org.

Richardson, S. (2002). Discussion of Spiegelhalter et al. Journal of the Royal Statistical

Society, Series B 64, 626–627.

Smith, B.J. (2007). boa: An R package for MCMC output convergence assessment and

posterior inference. Journal of Statistical Software 21, 1–37.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A. (2002). Bayesian measures

of model complexity and fit (with discussion). Journal of the Royal Statistical Society,

Series B 64, 583–639.

Spiegelhalter, D.J., Thomas, A., Best, N., and Lunn, D. (2003). WinBugs Version 1.4:

User Manual. Cambridge: Medical Research Council Biostatistics Unit. http://www.mrc-

bsu.cam.ac.uk/bugs.

Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal

Statistical Society, Series B 62, 795–809.

Su, L., Tom B.D.M, Vernon, F.T. (2009). Bias in 2-part mixed models for longitudinal

semicontinuous data. Biostatistics 10, 374-389.

Tooze, J.A., Grunwald, G.K., and Jones, R.H. (2002). Analysis of repeated measures data

with clumping at zero. Statistical Methods in Medical Research 11, 341–355.

U.S. Office of Personnel Management (2000). Call letter for contract year 2001: policy

guidance. FEHB Program carrier letter no. 2000-17. Washington, D.C.: OPM.

Verbeke, G. and Lesaffre, E. (1996). A linear mixed-effects model with heterogeneity in the

random-effects population. Journal of the American Statistical Association 91, 217–221.

White, J.W., Standish, J.D., Thorrold, S.R., and Warner, R.R. (2008). Markov chain Monte

Carlo methods for assigning larvae natal sites using natural geochemical tags. Ecological

Applications 18, 1901–1913.

Page 24

22Biometrics, xx 20yy

(a)

Year

Probability

1999 200020012002

0.00

0.50

1.00

Class 1 (67%)

Class 2 (23%)

Class 3 (10%)

(b)

Year

Median Spending ($)

1999200020012002

0

1000

2500

Class 1 (67%)

Class 2 (23%)

Class 3 (10%)

Figure 1. Posterior class trajectories for (a) probability of spending > $0 and (b) median

spending (given some spending). See Web Figure 2 for an enlarged view of classes 1 and 2.

Page 25

Bayesian Two-Part Latent Class Model 23

−2.0 −1.5−1.0 −0.50.0 0.5

−2.0

−1.5

−1.0

−0.5

0.0

0.5

Class 1

Change in mean log−spending

Change in log−odds spending

G

G

G

G

GG

G

G

G

G

GG

GG

G

G

G

G

G

G

G

G

GG

G

G

G

G

G G

G

G

G

G

G

G

GG

GG

G

GG

G

G

G

G

G

G

G

G

G

GG

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

GG GG

GG

G

G

G

G

G

GG

G

GG

G

G

G

G

G

G

G

G

G

GGGG

GG

GG

G

G

G

G

G

G

GG

G

G

GG

G

G

GG

GGGG

GG

GG

G

G

G

G

G

G

G G

G

G

G

G

G

G

G

GG

G

G

G

G

GG

GG

G

G

G

GG

G

G

G

G

G

G

G

G

G

G

G

G

G

G

o

01234

−0.5

0.5

1.5

2.5

Class 2

Change in mean log−spending

Change in log−odds spending

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

GG

G

GG

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

GG

G

G G

G

G

G G

G

GG

GG

GG

G

G

G

G

G

G

G

G

GG

G

GG

GG

G

G

G

G

G

G

GG

G

G

G

G

GG

GG

G

GG

G

GG

G

G

G

G

G

G

G

GG

G

GG

G

G

G

G

GG

G

G

G

G

G

G

G

G

G

G

GG

G

G

G

GG

G

GG

G

G

GG

GG

GG

G

G

G

G

GG

GG

GG

GG

GG

G

G

G

G

GG

G

G

G

GG

GG

GG

GG

GG

GG

G

G

G

G

G

G

G

G

G

G

G

o

−0.2 −0.1 0.0 0.10.2

−4

−2

0

2

4

Class 3

Change in mean log−spending

Change in log−odds spending

G

G

G

G

G

G

G

G

G G

G

GG

G

G

G

G

GG

GG

GG

G

GG

G G

GG

G G

G

GG

GG

GG

G

G

G

G

G

G

GG

GG

G G

G

G

G G

G

GG

GG

G

GG

GG

G

GG

GG

GG

G

GG

GG

G

G

G

G

G

G

G

GG

GG

GG

GG

GG

GG

G

G

G

G

G

GG

GG

GG

GG

GG

G

G

G

G

G

GG

G

G

G

GG

G

G

G

GG

G

G

GG

GG

GG

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

G

GG

GGGGGG

GG

GG

G

G

G

G

GG

GG

G

G

G

G

GG

GG

GG

GG

G

G

G

G

G G

G

GG

G

G

G

G

G

o

Figure 2. Posterior 95% highest probability density (HPD) regions for contrasts assessing

the effect of the parity law. Y-axes represent the change in log-odds of any spending before

and after introduction of parity; x-axes represent the change in the average amount spent

(on the log scale) before and after parity. Shaded regions correspond to the bivariate 95%

HPD regions, open circles denote posterior modes, and dots denote observations outside the

95% contours. For classes 1 and 3, the 95% HPD regions intersect the origin, suggesting no

significant change due to parity. Conversely, class 2 showed a significant increase in both

the log odds of any spending and the mean amount spent following parity.

Page 26

24 Biometrics, xx 20yy

Table 1

Descriptive statistics for FEHB study.

Median Mental

Health Spending∗(IQR) Yearn % Nonspenders

1999

2000

2001†

2002

∗Among subjects with nonzero expenditures.

†Parity mandate introduced.

1581

1581

84.69

83.36

362.30 (61.97, 1150.28)

367.95 (75.09, 845.41)

1581

1581

82.16

19.48

403.16 (118.96, 941.13)

80.52 (69.14, 864.95)

Page 27

Bayesian Two-Part Latent Class Model 25

Table 2

Model comparison statistics for FEHB study.

Number of Classes Model DescriptionDIC

Fixed Effects

Uncorrelated Random Intercepts∗

Correlated Random Intercepts

Correlated Random Intercepts and Slope†

22397

19891

18819

19588

1

Fixed Effects

Uncorrelated Random Intercepts

Correlated Random Intercepts

Correlated Random Intercepts and Slope

20687

19241

18755

19315

2

Fixed Effects

Uncorrelated Random Intercepts

Correlated Random Intercepts

Correlated Random Intercepts and Slope

20341

19230

18736

19441

3

Fixed Effects

Uncorrelated Random Intercepts

Correlated Random Intercepts

Correlated Random Intercepts and Slope

20414

19480

18772

20270

4

∗Random intercepts for both components.

†Random intercepts with a random slope for the lognormal component.

Page 28

26 Biometrics, xx 20yy

Table 3

Posterior means and 95% credible intervals (CrIs) for the three-class correlated model.

Class (%)Model Component Parameter (Variable) Posterior Mean (95% CrI)

1 (67%)Binomialα11(Intercept)

α12(Year2)

α13(Year3)

α14(Year4)

β11(Intercept)

β12(Year2)

β13(Year3)

β14(Year4)

τ2

σ2

σ2

ρ1(corr[b1i,b2i])

−2.64 (−3.78, −2.06)

−0.13 (−0.61,

−0.50 (−1.35, −0.01)

−0.29 (−1.38,

4.22 ( 2.98,

−0.36 (−0.78,

−0.32 (−0.82,

−1.02 (−1.92, −0.38)

1.11 ( 0.98,

3.65 ( 2.03,

2.64 ( 1.74,

0.35 (−0.05,

−2.02 (−3.27, −0.90)

0.41 ( 0.01,

1.25 ( 0.77,

1.38 ( 0.84,

2.02 ( 0.89,

0.99 ( 0.30,

2.26 ( 1.50,

2.54 ( 1.70,

1.14 ( 0.97,

2.07 ( 0.83,

2.26 ( 1.04,

0.85 ( 0.65,

0.17)

0.26)

5.30)

0.06)

0.14)

Lognormal

Variance Components

1(Lognormal Variance)

11(Var[b1i])

12(Var[b2i])

1.27)

8.22)

3.98)

0.69)

2 (23%) Binomialα21(Intercept)

α22(Year2)

α23(Year3)

α24(Year4)

β21(Intercept)

β22(Year2)

β23(Year3)

β24(Year4)

τ2

σ2

σ2

ρ2(corr[b1i,b2i])

0.84)

1.80)

1.98)

3.03)

1.75)

3.09)

3.38)

1.30)

4.22)

3.96)

0.96)

Lognormal

Variance Components

2(Lognormal Variance)

21(Var[b1i])

22(Var[b2i])

3 (10%)Binomialα31(Intercept)

α32(Year2)

α33(Year3)

α34(Year4)

β31(Intercept)

β32(Year2)

β33(Year3)

β34(Year4)

τ2

σ2

σ2

ρ3(corr[b1i,b2i])

0.86 (−4.23,

1.73 ( 0.04,

1.06 (−0.55,

0.17 (−0.67,

6.55 ( 5.13,

0.04 (−0.09,

0.13 (−0.01,

−0.01 (−0.14,

0.37 ( 0.32,

6.80 ( 0.15, 30.74)

0.87 ( 0.55,

0.17 (−0.51,

−1.55 (−2.92, −0.22)

0.91 ( 0.22,

−0.18 (−0.88,

−2.70 (−3.70, −0.78)

1.05 ( 0.35,

−0.003 (−0.56, 0.52)

2.66)

6.50)

5.13)

1.39)

6.95)

0.16)

0.24)

0.13)

0.42)

Lognormal

Variance Components

3(Lognormal Variance)

31(Var[b1i])

32(Var[b2i]) 1.81)

0.82)

Class Parametersγ21(Class 2 Intercept)

γ22(Class 2 Female)

γ23(Class 2 Employee)

γ31(Class 3 Intercept)

γ32(Class 3 Female)

γ33(Class 3 Employee)

1.71)

0.44)

1.68)

Page 29

Bayesian Two-Part Latent Class Model 27

Table 4

Predicted class-membership probabilities by covariate profile.

Gender Employee Status Class-Membership Probabilities

Class 1 Class 2Class 3

Male

Male

Female

Female

Non-Employee

Employee

Non-Employee

Employee

0.75

0.77

0.57

0.60

0.18

0.16

0.31

0.28

0.07

0.07

0.12

0.12