Department of Statistics, UCLA

UC Los Angeles

Title:

A Random-effects Markov Transition Model for Poisson-distributed Repeated Measures with

Nonignorable Missing Values

Author:

Li, Jinhui, Department of Statistics, UCLA

Yang, Xiaowei, BayesSoft, Inc.

Wu, Ying N, UCLA Department of Statistics

Shoptaw, Steven, Integrated Substance Abuse Programs, UCLA

Publication Date:

10-28-2005

Series:

Department of Statistics Papers

Publication Info:

Department of Statistics Papers, Department of Statistics, UCLA, UC Los Angeles

Permalink:

http://escholarship.org/uc/item/0d77p92p

Keywords:

Repeated Measures, Markov Transition Models, Nonignorable Missing Values, Poisson

Regression Model, Shared-Parameter Missingness

Abstract:

In biomedical research with longitudinal designs, missing values due to intermittent nonresponse or premature withdrawal are usually 'nonignorable' in the sense that unobserved values are related to the patterns of missingness. When missing values are simply ignored, analyses based on the observed-data likelihood may yield biased estimates or invalid inferences. By drawing on the framework of a shared-parameter mechanism, the process yielding the repeated count measures and the process yielding missing values can be modelled separately, conditionally on a group of shared parameters. For chronic diseases, Markov transition models can be used to study the transitional features of the pathologic processes. In this paper, Markov chain Monte Carlo (MCMC) algorithms are developed to fit a random-effects Markov transition model (REMTM) for incomplete count repeated measures, within which random effects are shared by the counting process and the missing-data mechanism. Assuming a Poisson distribution for the count measures, the transition probabilities are estimated using a Poisson linear regression model. The missingness mechanism is modeled with a multinomial-logit regression to calculate the transition probabilities of the missingness indicators. The method is demonstrated using both simulated data sets and a practical data set from a smoking cessation clinical trial.

Title:

A Random-effects Markov Transition Model for

Poisson-Distributed Repeated Measures with Nonignorable

Missing Values

Short Running Title

Markov Transition Model for Incomplete Count Measures

Authors:

Jinhui Li1,2, Xiaowei Yang1,3, Yingnian Wu2, and Steven Shoptaw3

1. BayesSoft, Inc.,

3641 Midvale Avenue, #207, Los Angeles, CA 90034.

2. UCLA-Department of Statistics

PO Box 951554, Los Angeles, CA 90095-1554.

3. UCLA-Integrated Substance Abuse Programs

11075 Santa Monica Blvd, Suite 200, Los Angeles, CA 90025.

Contact Author:

Xiaowei Yang, Ph.D., BayesSoft, Inc.,

3641 Midvale Avenue, #207, Los Angeles, CA 90034, U.S.A.

Tel 310-600-4368, Fax 310-836-5851

E-mail: XYang@BayesSoft.com

1 Introduction

1.1 Background

Longitudinal designs are commonly used in biomedical research, especially in clinical trials. The defining feature of a longitudinal study is that large numbers of repeated measures are collected on study participants. Special statistical methods are required for the data analysis because observations collected on the same participant are correlated with each other [5]. For complete data sets or for those with ignorable missing values, three longitudinal strategies are most popular in biomedical research: (i) generalized linear mixed models (GLMM; [2]), where intra-subject correlations are introduced via random effects; (ii) marginal models using generalized estimating equations (GEEs) [10], where parameters for group means are estimated by quasi-likelihood methods assuming a working correlation structure; and (iii) Markov transition models (MTM; [23]), where current measures are modelled by conditioning on previous observations and covariates. In the social and economic sciences, there are other forms of longitudinal modelling, e.g., hierarchical models, latent variable models, and structural equation models [19, 3, 9]. Compared to time-naive analyses using aggregate markers of outcomes (e.g., t-tests, ANOVA, and GLMs), longitudinal models are capable of modelling within-subject correlation structures and drawing inferences on time trends, thus more closely reflecting the nature of the longitudinal design. Longitudinal models are also able to handle certain types of missing data problems encountered in practical data sets. Many statistical packages (e.g., SAS, SPSS, Stata, S-Plus, and R) have implemented the above models for longitudinal data analysis.

In longitudinal studies of chronic health problems, such as drug dependence, respiratory diseases, and cancers, there are usually large vectors of Poisson-distributed repeated measures that count the numbers of adverse events. For a participant in such a study, the current state usually depends on the previous observations in addition to the explanatory variables of interest, e.g., dummy variables indicating the treatment assignment in a clinical trial. For such a data set, a Markov transition model can be applied for statistical analysis, since it models the dynamic features of the transition patterns of the counting process. Markov chains can describe phenomena that evolve through time, with applications ranging from biomedical research to many other scientific fields such as physics, engineering, sociology, and economics [6].

A notable problem in longitudinal data analysis is introduced by missing values. Within certain areas of biomedical research, e.g., drug dependence, HIV, and cancer, the statistical analysis is plagued by large numbers of missing items. This incompleteness is related to the chaotic nature of the clinical disorders. For example, drug abusers in a study frequently missed their scheduled clinic visits or dropped out of studies prematurely, leading to proportions of missing values as large as 70% toward the end of the study period [7]. High levels of incompleteness usually invalidate the assumption that missing data may be ignored [22]. Even in a randomized controlled clinical trial, the presence of missing values after randomization can complicate standard complete-data analysis approaches; missing responses can occur at different rates and for different reasons under different conditions. Data analyses that ignore missing data are apt to introduce biases into significance testing.

In Albert and Follmann [1], an extended version of the above Markov transition model was proposed to handle nonignorable missing values in a binary longitudinal data set. This model introduced shared random effects in order to link the propensity of transition between measurement states and the probability of being observed, intermittently missing, or dropped out. By jointly modelling the transitional features of the observed binary repeated measures and the 3-category missingness indicators, random-effects Markov transition models (REMTMs) provide meaningful clinical interpretations of the dynamic change of cocaine dependence and useful inferences on the patterns and mechanisms of missing data. Recently, we conducted simulation studies that supported the superior performance of REMTM over traditional, yet inadequate, Markov transition models in analyzing binary longitudinal data with nonignorable missing values. For certain practical binary data sets with large amounts of missing data, REMTM seems to be the only applicable choice with acceptable performance. Considering that Poisson-distributed repeated measures are also frequently encountered in biomedical research, it is natural to extend REMTMs to incomplete count repeated measures in the presence of nonignorable missing values. In drug dependence studies, for example, such count measures indicate the number of drug-use episodes within a certain period.

1.2 A Motivating Study

Before describing the fitting algorithms for REMTM, we briefly describe a motivating example. In Shoptaw et al. [18], a smoking cessation clinical trial was conducted to study the relative efficacy of contingency management and relapse prevention, two types of behavioral therapy, in optimizing outcomes of nicotine replacement therapy. In this 2 (contingency management or not) × 2 (relapse prevention or not) 12-week study, 175 methadone-maintained tobacco smokers were randomized to one of the four resulting conditions; all received nicotine replacement therapy. The number of smoking episodes during the previous week was evaluated through self-report. Thus, there are at most 12 weekly reported counts for each participant. Main analyses using carbon monoxide levels from breath samples established that there was a significant treatment effect for contingency management but no effect for relapse prevention and no interaction during the study period [22]. The four conditions were then collapsed into two: the contingency management group (n = 90), with smokers who received contingency management, and the control group (n = 85), with smokers who did not receive contingency management. As a secondary statistical analysis, in this article we compare the numbers of smoking episodes between the treatment and control groups. As can be seen from the data matrix, there is a moderate amount of missing data due to dropout: 43 subjects (24.6%) dropped out during the study period, causing 296 missing values. The proportion of intermittent missingness is very low: 20 intermittent missing values in total (about 1%), observed on six subjects (3.4%).

<INSERT FIGURE 1 HERE>

In Figure 1, the numbers of smoking episodes in the treatment (i.e., contingency management) and control groups are plotted separately. The two groups have similar distributions of the response variable at the beginning of the study; however, the average number of smoking episodes in the treatment group decreased more quickly and to a lower level than in the control group. This typical clinical trial data set suggests that the treatment assignment is an important predictor variable. Repeated count measures and incomplete observations are both common in clinical trials that test treatment effects. The REMTM will be applied to this 2-group data set.

2 Method and Model

2.1 Analysis of Incomplete Longitudinal Data

Given a longitudinal data set, the repeated measures are denoted by a matrix $Y = [y_{it}]$, where $y_{it}$ is the $t$th measure ($t = 1,\ldots,T$) collected on the $i$th subject ($i = 1,\ldots,n$). For this discussion, we restrict the longitudinal data to a balanced design with time-independent covariates (i.e., those measured at baseline). The matrix of covariates can thus be denoted by $X = [x_{ij}]$, where $x_{i1},\ldots,x_{ip}$ are the $p$ predictors collected at baseline for the $i$th subject. In the presence of missing values, missingness patterns are denoted by $R = [r_{it}]$, a matrix with elements

\[
r_{it} =
\begin{cases}
0 & \text{if } y_{it} \text{ is observed,} \\
1 & \text{if } y_{it} \text{ is intermittently missing,} \\
2 & \text{if } y_{it} \text{ is missing after dropout.}
\end{cases}
\]
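As an illustration of this coding, the following minimal Python sketch derives $R$ from a data matrix. It assumes, purely for illustration, that missing entries are stored as NaN and that every missing value after a subject's last observed measure counts as dropout; the function name `missingness_indicators` is hypothetical and not part of the original analysis.

```python
import numpy as np

def missingness_indicators(y):
    """Code each entry of an n-by-T array as 0 (observed),
    1 (intermittently missing) or 2 (missing after dropout).
    Assumption: missing values are stored as NaN."""
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    n, T = y.shape
    r = np.zeros((n, T), dtype=int)
    for i in range(n):
        observed_times = np.flatnonzero(~missing[i])
        # Index of the last observed measure; -1 if nothing was observed.
        last_obs = observed_times[-1] if observed_times.size else -1
        r[i, missing[i]] = 1       # provisionally treat every gap as intermittent
        r[i, last_obs + 1:] = 2    # everything after the last observation is dropout
    return r

# Subject 1 has one intermittent gap and then drops out; subject 2 is complete.
y = np.array([[3, np.nan, 2, np.nan, np.nan],
              [1, 0, 4, 2, 5]])
print(missingness_indicators(y))
```

Under this convention a subject with no observed measures at all is coded as dropout from the first time point.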

Further, $\theta$ denotes the parameters of the model for the repeated measures and $\phi$ denotes the parameters of the model for the missingness mechanism. For each subject, the full likelihood function is the joint distribution of the repeated measures (i.e., $y_i = (y_{i1},\ldots,y_{iT})^T$) and the vector of missingness indicators (i.e., $r_i = (r_{i1},\ldots,r_{iT})^T$), given the covariates $X_i$ of subject $i$ (the $i$th row of $X$), i.e.,

\[
L(\theta, \phi \mid y_i, r_i, X_i) \propto P(y_i, r_i \mid X_i, \theta, \phi).
\]

When determining the influence of missing data, a primary interest is to identify missingness patterns and missingness mechanisms, and their potential relationships with treatment conditions or other baseline factors. While missingness patterns indicate which data points are missing, missingness mechanisms explain why they are missing. In practice, missingness mechanisms refer to the underlying processes yielding missing values. Such mechanisms are usually only partially known or completely hidden to investigators. By partitioning $Y$ into $(Y_{obs}, Y_{mis})$, which respectively represent the observed values and the values that would have been observed had they not been missing, missingness mechanisms reflect the association between $(Y_{obs}, Y_{mis})$ and $R$. When the missingness pattern ($R$) is not associated with the values of the underlying potentially missing data (i.e., $Y_{mis}$), a condition that we call ignorability, it is possible to obtain correct inferences without modelling the missingness mechanism.

Within the framework of outcome-dependent missingness (see Figure 2(A)), the joint distribution of $(y_i, r_i)$ is factored into the marginal distribution of $y_i$ and the conditional distribution of $r_i$ given $y_i$, i.e., $P(y_i, r_i \mid X_i, \theta, \phi) = P(y_i \mid X_i, \theta)\,P(r_i \mid y_i, X_i, \phi)$. Within this framework, the definition of ignorability has been discussed extensively in the statistical community; see [16, 11, 17]. More specifically, when missing data are "missing at random" (MAR; i.e., $P[R \mid Y] = P[R \mid Y_{obs}]$) and the parameters of the data model (i.e., $\theta$) are distinct from those of the missingness mechanism (i.e., $\phi$), the missingness mechanism can be ignored for likelihood-based inferences about $\theta$. This is because the joint likelihood function $L(\theta, \phi)$ can then be factored as the product of the observed-data likelihood function for $\theta$ and the likelihood function for $\phi$, i.e., $L(\theta, \phi \mid Y_{obs}, R) = L_1(\theta \mid Y_{obs})\,L_2(\phi \mid R, Y_{obs})$, where $L_2(\phi \mid R, Y_{obs}) = P(R \mid Y_{obs}, \phi)$. As mentioned earlier, most longitudinal models are based on the observed-data likelihood (i.e., $L_1(\theta \mid Y_{obs})$), thus requiring the condition of ignorability. More specifically, generalized linear mixed models require the assumption of MAR, Markov transition models require a special case of MAR where $r_{it}$ depends on $(y_{i,t-1},\ldots,y_{i1})$, and marginal models with GEE assume covariate-dependent MAR (i.e., $P[R \mid Y, X] = P[R \mid X]$) [22]. Time-naive methods usually assume that missing data are missing completely at random (MCAR; i.e., $P[R \mid Y] = P[R]$), which is usually too restrictive for practical data sets.

<INSERT FIGURE 2 HERE>

As seen in Figure 2, there are two other ways of defining the missingness mechanism: shared-parameter missingness (Figure 2(B)) and pattern-mixture missingness (Figure 2(C)). Contrary to outcome-dependent missingness, pattern-mixture models assume that the joint distribution of $(y_i, r_i)$ is factored into the marginal distribution of $r_i$ and the conditional distribution of $y_i$ given $r_i$, i.e., $P(y_i, r_i \mid X_i, \theta, \phi) = P(r_i \mid X_i, \phi)\,P(y_i \mid r_i, X_i, \theta)$. In other words, different distributions are assumed for repeated measures on subjects with different missingness patterns. For example, in cancer studies, individuals who have died during the study should be treated differently from those who are still alive at the end of the study. By sharing a common vector of parameters (i.e., $\xi$), a shared-parameter model assumes that the data $y_i$ and the missingness indicators $r_i$ are conditionally independent of each other given $\xi_i$, i.e., $P(y_i, r_i \mid X_i, \theta, \phi) = \int P(r_i \mid \xi_i, X_i, \phi)\,P(y_i \mid \xi_i, X_i, \theta)\,d\xi_i$, where the integral is taken with respect to the distribution of $\xi_i$. In the case of shared-parameter missingness, the shared parameters can be either observed covariates or unobserved latent variables. For example, in a cancer study, we may observe that $y_i$ and $r_i$ are independent of each other within each age category, but dependent on each other across all the age groups. In this case, age can be viewed as a confounder in determining the relationship between $y_i$ and $r_i$. When $\xi$ corresponds to a latent variable, such as a random effect, the missingness mechanism is also called informative [22], which is a special case of nonignorability. For informative missingness, structural equation models [9] potentially provide a tool for analysis.

Intuitively, in each of the three missingness modeling settings, ignorability holds as long as there is no association between $Y_{mis}$ and $R$ conditional on $Y_{obs}$. Among the three cases, outcome-dependent models and pattern-mixture models have been studied intensively. In this article, we use the shared-parameter model, implemented as a Markov transition model with shared random effects, to analyze the effects of contingency management on reducing cigarette smoking.

2.2 Random-Effects Markov Transition Model for Repeated Count Measures

For longitudinal data with Poisson-distributed count measures and informative missing values, REMTM offers a strategy for implementing shared-parameter models in which random effects are the shared parameters. This model can be viewed as a natural extension of the REMTM for binary longitudinal data [21]. REMTM first assumes that the complete data, $(y_i, r_i)$, are independent and identically distributed across subjects ($i = 1,\ldots,n$), and that for each subject $i$, the repeated count measures $y_i$ are conditionally independent of the missingness indicators $r_i$ given the random effects $\xi_i$. Therefore, we can separately model the counting process $p(y_i \mid \theta, \xi_i)$ and the missingness mechanism $p(r_i \mid \phi, \xi_i)$.
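To make the conditional-independence structure concrete, the sketch below integrates a scalar random effect out of the product of the two conditional likelihoods, assuming $\xi_i \sim N(0, \sigma^2)$. It uses Gauss-Hermite quadrature purely as an illustrative stand-in for the integration; it is not the fitting method of this paper. The function name `marginal_likelihood` and its arguments are hypothetical.

```python
import numpy as np

def marginal_likelihood(lik_y, lik_r, sigma, n_nodes=20):
    """Approximate  integral of lik_y(xi) * lik_r(xi) * N(xi; 0, sigma^2) d(xi)
    by Gauss-Hermite quadrature.

    lik_y, lik_r: callables giving p(y_i | xi) and p(r_i | xi) for a scalar xi
    (hypothetical placeholders for the two conditional models)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables xi = sqrt(2) * sigma * node maps the N(0, sigma^2)
    # density onto the Gauss-Hermite weight function exp(-node^2).
    xi = np.sqrt(2.0) * sigma * nodes
    values = np.array([lik_y(v) * lik_r(v) for v in xi])
    return float(np.sum(weights * values) / np.sqrt(np.pi))

# Sanity check: constant likelihoods integrate to 1 (prints a value close to 1).
print(marginal_likelihood(lambda xi: 1.0, lambda xi: 1.0, sigma=2.0))
```

Because both factors depend on the same $\xi_i$, the marginal joint distribution of $(y_i, r_i)$ does not factor into a product of marginals, which is what makes the missingness informative.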

2.2.1 Modelling the Counting Process

To model the counting process, a first-order Markov chain is assumed for $y_i = (y_{i1},\ldots,y_{iT})^T$, where at any time point, $y_{it}$ is independent of $(y_{i1},\ldots,y_{i,t-2})^T$ given the previous observation $y_{i,t-1}$. A random intercept is used to capture the baseline heterogeneity across subjects. In this random-intercept Markov transition model, we are interested in the transition probability of the Poisson-distributed count measures. This probability depends on the covariates under investigation and a random intercept, i.e.,

\[
P(y_{it} \mid x_{it}, y_{i,t-1}, \xi_i) = \frac{\lambda_{it}^{y_{it}}\, e^{-\lambda_{it}}}{y_{it}!},
\]

with $\lambda_{it} = E(y_{it} \mid x_{it}, y_{i,t-1}, \xi_i)$ connected to the covariates $x_{it}$ and the random effect $\xi_i \sim N(0, \sigma^2)$ through a regression model with the log link function $\log(\cdot)$,

\[
\log(\lambda_{it}) = x_{it}\beta + \bigl(\log(\max(1, y_{i,t-1})) - x_{i,t-1}\beta\bigr)\alpha + \xi_i.
\]

This article deals solely with baseline covariates that do not change with time, i.e., $x_{it} = (x_{i1},\ldots,x_{ip})$ for all $t$. Time-varying covariates can readily be incorporated into the above Poisson regression model. Here, $\beta$ contains the fixed-effect parameters, which are of most interest in making inferences on covariate effects (e.g., treatment efficacy). The parameter $\alpha$ indicates the influence of the previous count through the residual on the log scale, $\log(\max(1, y_{i,t-1})) - x_{i,t-1}\beta$, where $\max(1, y_{i,t-1})$ ensures that the logarithm is well defined when the previous count is zero.
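The transition model can be sketched in a few lines of Python. The sketch assumes the reading of the regression equation in which the previous count enters as $\alpha\,[\log(\max(1, y_{i,t-1})) - x_{i,t-1}\beta]$; the function names and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
import math

def transition_log_rate(x_t, x_prev, y_prev, beta, alpha, xi):
    """log(lambda_it) = x_t.beta + alpha * (log(max(1, y_prev)) - x_prev.beta) + xi.
    Assumption: the previous count enters through its log-scale residual."""
    lin_t = sum(b * x for b, x in zip(beta, x_t))
    lin_prev = sum(b * x for b, x in zip(beta, x_prev))
    return lin_t + alpha * (math.log(max(1, y_prev)) - lin_prev) + xi

def transition_log_pmf(y_t, log_rate):
    """Poisson log-probability of the count y_t given log(lambda_it)."""
    return y_t * log_rate - math.exp(log_rate) - math.lgamma(y_t + 1)

# Illustrative values only: intercept plus a treatment dummy as baseline covariates.
x = [1.0, 1.0]
log_rate = transition_log_rate(x, x, y_prev=4, beta=[1.2, -0.5], alpha=0.4, xi=0.1)
print(math.exp(transition_log_pmf(3, log_rate)))
```

With baseline covariates the same vector is passed for `x_t` and `x_prev`; with $\alpha = 0$ the model reduces to an ordinary random-intercept Poisson regression.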

2.2.2 Modelling the Missingness Mechanism

By viewing the missingness indicator matrix $R$ as a special form of categorical response, we can model the missingness mechanism with a multinomial-logit Markov transition model. We again adopt the first-order Markov chain assumption to model the transition probabilities $P_{kj} = \Pr(r_{it} = j \mid r_{i,t-1} = k)$ between any two consecutive 3-category missingness indicators, $r_{i,t-1}$ and $r_{it}$ ($k = 0, 1, 2$; $j = 0, 1, 2$). Because dropout is an absorbing state and a missing value is classified as intermittent only if a later measure is observed, the following transition probabilities are always equal to zero: $P_{12}$, $P_{20}$, $P_{21}$. For the other combinations of $r_{i,t-1}$ and $r_{it}$, the transition probabilities are calculated in the following manner. First, if the previous count measure is observed (i.e., $r_{i,t-1} = 0$), then the "current" one could be observed, intermittently missing, or missing due to dropout. In this case, a 3-category multinomial-logit regression model can be used to calculate the transition probabilities, i.e.,

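\[
P(r_{it} = j \mid \xi_i, x_{it}, r_{i,t-1} = 0) =
\begin{cases}
\dfrac{1}{1 + \sum_{l=1}^{2} \exp(x_{it}\eta_l + \xi_i\gamma_l)} & \text{if } j = 0, \\[2ex]
\dfrac{\exp(x_{it}\eta_j + \xi_i\gamma_j)}{1 + \sum_{l=1}^{2} \exp(x_{it}\eta_l + \xi_i\gamma_l)} & \text{if } j = 1 \text{ or } 2.
\end{cases}
\]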

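Second, if the previous count measure is intermittently missing (i.e., $r_{i,t-1} = 1$), then the current one may only be either observed or intermittently missing. Correspondingly, a logistic regression model can be used for calculating $P_{10}$ and $P_{11}$, i.e.,

\[
P(r_{it} = j \mid \xi_i, x_{it}, r_{i,t-1} = 1) =
\begin{cases}
\dfrac{1}{1 + \exp(x_{it}\eta_1 + \xi_i\gamma_1)} & \text{if } j = 0, \\[2ex]
\dfrac{\exp(x_{it}\eta_1 + \xi_i\gamma_1)}{1 + \exp(x_{it}\eta_1 + \xi_i\gamma_1)} & \text{if } j = 1.
\end{cases}
\]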

Third, for the absorbing state 2, we always have $P(r_{it} = 2 \mid \xi_i, x_{it}, r_{i,t-1} = 2) = 1$. Denoting by $T_i$ the time of the last observed measurement for subject $i$, special consideration should be given to the last observed measure $y_{iT_i}$, for which we always have $P(r_{iT_i} = 0 \mid r_{i,T_i-1} = 1) = 1$. This is because, for any individual, if the measure at time $T_i - 1$ is intermittently missing, the one at time $T_i$ must be observed. In the above multinomial-logit and logistic models, the regression coefficients $\eta_1$ and $\eta_2$ respectively indicate whether intermittent missingness and dropout depend on covariates, while the coefficients $\gamma_1$ and $\gamma_2$ respectively indicate whether intermittent missingness and dropout are informative (i.e., nonignorable).
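The three cases can be collected into a single routine that returns one row of the transition matrix of the missingness indicators. This is a minimal sketch under the model as stated above, with a scalar random intercept; the function name and argument layout are hypothetical, and the special handling of the last observed measure at time $T_i$ is deliberately left out.

```python
import math

def missingness_transition_probs(prev_state, x, eta, gamma, xi):
    """Return [P(r_t = 0), P(r_t = 1), P(r_t = 2)] given r_{t-1} = prev_state.

    eta = (eta_1, eta_2): coefficient vectors for intermittent missingness and dropout.
    gamma = (gamma_1, gamma_2): loadings of the shared random intercept xi."""
    def score(l):
        # exp(x . eta_l + xi * gamma_l), with l = 0 for eta_1 and l = 1 for eta_2
        return math.exp(sum(e * v for e, v in zip(eta[l], x)) + xi * gamma[l])

    if prev_state == 0:      # observed: 3-category multinomial logit
        s1, s2 = score(0), score(1)
        denom = 1.0 + s1 + s2
        return [1.0 / denom, s1 / denom, s2 / denom]
    if prev_state == 1:      # intermittently missing: logistic model, P_12 = 0
        s1 = score(0)
        return [1.0 / (1.0 + s1), s1 / (1.0 + s1), 0.0]
    if prev_state == 2:      # dropout is absorbing
        return [0.0, 0.0, 1.0]
    raise ValueError("prev_state must be 0, 1 or 2")

# Illustrative values only.
print(missingness_transition_probs(0, [1.0, 1.0],
                                   eta=([-3.0, 0.2], [-2.5, -0.4]),
                                   gamma=(0.5, 1.0), xi=0.3))
```

Setting both entries of `gamma` to zero removes the dependence on the random intercept, which corresponds to missingness that depends on covariates only.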

2.3 Bayesian Inference using MCMC

After setting up models for the counting process and the missingness mechanism, we

need to estimate the parameters in the above models. In [22], a maximum likelihood

method was adopted to estimate the parameters of REMTM for binary longitudinal

data. Similarly, we can optimize the REMTM likelihood function for the count data,
