
Subject-specific odds ratios in binomial GLMMs with

continuous response

Rapporti di quote specifici per soggetto nei GLMM binomiali a risposta

continua

Mariangela Sciandra1, Vito M.R. Muggeo1, Gianfranco Lovison1

Dipartimento di Scienze Statistiche e Matematiche ‘S. Vianelli’, Università di Palermo

e-mail: vmuggeo@dssm.unipa.it

Abstract: In a regression setting, a continuous response variable is usually dichotomized
in order to obtain an odds ratio that quantifies its association with one or more risk
factors. Taking a recently published paper as a starting point, this article explores the
possibility of obtaining odds ratio estimates from a linear regression model without
discretizing the response variable, for clustered data analysed through a random-effects
model.

Keywords: odds ratio, random-effects, logistic regression, dichotomizing, efficiency.

1. Background

Given units $i = 1,\dots,n$, let $Y_i$ be the continuous outcome variable for the i-th subject,
assumed to depend on the p-dimensional vector of risk factors $x_i$; regression models aim to
investigate the relationship between the response $Y$ and the covariates $X_1,\dots,X_p$. For
historical reasons, and because of its presumed superiority in terms of interpretability, the
odds ratio (OR) is often the preferred measure of association for communicating results in
medical and biological studies. Therefore, even with continuous outcome variables, it is
common practice to analyze data through a logistic regression model for the dichotomized
response variable $Y^* = I(Y > c)$ for some threshold value $c$. Such a cut-off is often selected
on biological or medical grounds, or to ensure a pre-specified number of ‘events’ (for
instance, $c$ may be the median or any other sample quantile of $Y$); whatever the value of $c$,
the resulting fitted logistic model provides estimates of the odds ratios, which greatly
facilitate communication in the medical community. Recently, Moser and Coombs (2004)
discussed how, with a purely continuous response variable $Y$, the OR may be estimated by
suitably transforming the regression parameters of a continuous regression model of $Y$ on
$X_1,\dots,X_p$, leading to an estimator of the OR with greater efficiency than that obtained from
the dichotomized approach. The paper by Moser and Coombs (2004) deals with independent
observations, while in this paper we study to what extent the results can be generalized to
account for non-independent observations. We assume that the mixed model framework is
employed, where the random coefficients are used to model the subject-specific effects while
accounting for the serial correlation among the observations. Section 2 is concerned with
statistical methods, where the linear and the binomial generalized linear mixed models are
contrasted; Section 3 illustrates results from a small simulation study and Section 4
includes a discussion and some final remarks.


2. The logistic random-effects model for a dichotomized response vs. linear mixed models for a continuous response

Random-effects models are one way to handle subject-specific parameters in a model for
repeated measurements in which the main interest is in describing the evolution of each
subject separately. This approach is based on the assumption that, for every subject, the
response can be modelled by a linear (or generalized linear) regression model, but with
subject-specific regression coefficients.

The Generalized Linear Mixed Model (GLMM) is the most frequently used random-effects model
for discrete outcomes. A general framework for a GLMM is the following. Let $Y_{ij}$ be the
j-th measurement ($j = 1,\dots,n_i$) for the i-th cluster ($i = 1,\dots,m$). Denoting by $b_i$ a
q-dimensional vector of random effects, assumed to be drawn independently from
$MVN(0, D)$, we assume that, conditionally on the random effects $b_i$, the elements $Y_{ij}$ are
independent and follow a generalized linear model whose linear predictor is extended to
include the subject-specific regression parameters $b_i$. More specifically, it is assumed
that all $Y_{ij}$ have densities from the exponential family and that the mean $\mu_{ij}$ is
modelled through a linear predictor containing the fixed regression parameters $\beta$ as well
as the subject-specific parameters $b_i$, i.e.,

$$g(E[Y_{ij} \mid x_{ij}, z_{ij}, b_i]) = g(\mu_{ij}) = x_{ij}'\beta + z_{ij}'b_i$$

for a known link function $g(\cdot)$, with $x_{ij}$ and $z_{ij}$ two vectors containing known
covariate values. In order to obtain OR estimates, the logistic random-effects regression
model for the dichotomized response values $Y^*_{ij} = I(Y_{ij} > c)$ is a common choice for the
analysis of multilevel dichotomous data. It assumes that
$Y^*_{ij} \mid b_i \sim \mathrm{Bernoulli}(\pi_{ij})$, with
$P(Y_{ij} > c \mid b_i, x_{ij}) = P(Y^*_{ij} = 1 \mid b_i, x_{ij}) = E[Y^*_{ij} \mid b_i] = \pi_{ij}$ and

$$\pi_{ij} = \left(1 + \exp\{-(x_{ij}'\beta + z_{ij}'b_i)\}\right)^{-1} \qquad (1)$$

The marginal likelihood for this model is obtained by integrating over the random effects:

$$\prod_{i=1}^{m} \int \prod_{j=1}^{n_i} \pi_{ij}^{y^*_{ij}} (1 - \pi_{ij})^{1 - y^*_{ij}} f(b_i)\, db_i$$

In general, the above integral has no closed-form solution, because it involves m integrals
over the q-dimensional random effects $b_i$; therefore, numerical integration methods are
needed (Crouch and Spiegelman, 1990).
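To make the need for numerical integration concrete, the following is a minimal sketch of the marginal likelihood contribution of a single cluster in a random-intercept logistic model (q = 1). It uses a plain trapezoidal rule on a truncated grid, which is a simplification chosen here for illustration and not the method of Crouch and Spiegelman (1990). The function names, the grid size and the truncation width are assumptions.

```python
import math

def expit(eta: float) -> float:
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def normal_pdf(b: float, sd: float) -> float:
    """Density of N(0, sd^2) evaluated at b."""
    return math.exp(-0.5 * (b / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def cluster_likelihood(y, eta_fixed, sd_b, n_grid=2001, width=8.0):
    """Marginal likelihood contribution of one cluster (random intercept only).

    y         : list of 0/1 dichotomized responses y*_ij for the cluster
    eta_fixed : list of fixed-effect linear predictors x_ij' beta
    sd_b      : standard deviation of the random intercept b_i
    The integral over b is approximated on [-width*sd_b, width*sd_b]
    with the trapezoidal rule (an illustrative choice, not the paper's).
    """
    lo, hi = -width * sd_b, width * sd_b
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * h
        cond = 1.0  # conditional Bernoulli likelihood given b
        for y_ij, eta in zip(y, eta_fixed):
            p = expit(eta + b)
            cond *= p if y_ij == 1 else 1.0 - p
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * cond * normal_pdf(b, sd_b)
    return total * h

# Hypothetical cluster with three dichotomized observations.
lik = cluster_likelihood(y=[1, 0, 1], eta_fixed=[0.3, -0.2, 0.5], sd_b=1.0)
print(lik)
```

The full marginal likelihood is the product of such terms over the m clusters. With q > 1 random effects the same integral becomes multidimensional, which is where the computational cost grows.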

It is well known that in a binomial GLMM the parameters have a different interpretation
from those in the corresponding marginal model, where the main interest is in the
population-averaged evolutions. Fixed effects, in fact, do not have a marginal
interpretation, since in general

$$\int \pi_{ij} f(b_i)\, db_i \neq g^{-1}(x_{ij}'\beta) = \left(1 + \exp\{-x_{ij}'\beta\}\right)^{-1}$$
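The inequality can be checked numerically. The sketch below, assuming a random-intercept model with $b_i \sim N(0, \sigma_b^2)$, averages the conditional probability over the random intercept with a trapezoidal rule and compares it with the probability evaluated at $b_i = 0$. The values of the linear predictor and of $\sigma_b$ are arbitrary illustrations, not taken from the paper.

```python
import math

def expit(eta: float) -> float:
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def marginal_probability(eta_fixed, sd_b, n_grid=4001, width=8.0):
    """Approximate E_b[expit(eta_fixed + b)] for b ~ N(0, sd_b^2).

    Trapezoidal rule on [-width*sd_b, width*sd_b]; grid settings are
    illustrative assumptions.
    """
    lo, hi = -width * sd_b, width * sd_b
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lo + k * h
        dens = math.exp(-0.5 * (b / sd_b) ** 2) / (sd_b * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * expit(eta_fixed + b) * dens
    return total * h

eta = 1.0                                        # hypothetical value of x' beta
conditional = expit(eta)                         # g^{-1}(x' beta), i.e. b = 0
marginal = marginal_probability(eta, sd_b=2.0)   # population-averaged probability
print(conditional, marginal)
```

For a positive linear predictor the averaged probability is pulled towards 0.5, so the fixed effects describe subject-specific rather than population-averaged changes.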

So, in a logistic random-effects regression model the interpretation of the OR differs from
that in the ordinary logistic regression model, because the OR is now a random variable
rather than a fixed parameter, and this should lead to a different interpretation of the
model. However, Larsen et al. (2000) discuss how the fixed-effects estimates have a useful
interpretation in terms of the median of the random effects distribution. Therefore,
focusing on the fixed-effects estimate


of the OR is meaningful even in a mixed model framework. Rather than obtaining OR
estimates from the logistic random-effects model for $Y^*_{ij}$, extending the idea of Moser
and Coombs (2004) leads us to consider the LMM for the original continuous response,

$$E[Y_{ij} \mid a_i] = x_{ij}'\alpha + z_{ij}'a_i \qquad (2)$$

where now the error terms are independent and identically distributed logistic random
variables with mean 0 and variance $\sigma^2$. Model (2) can be used to obtain OR estimates for
model (1); simple algebra shows that the OR estimates can be obtained as
$\exp\{\pi\hat\alpha_j/(\sqrt{3}\hat\sigma)\}$, where $\pi$ here denotes the mathematical constant and
$\hat\alpha_j$ and $\hat\sigma$ are estimates from the fitted linear mixed model (2). Preserving the
continuous nature of $Y$ is expected to lead to a substantial improvement in the OR
estimation in terms of bias and mean square error of the estimator, and consequently in
the width of the estimated confidence intervals.
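The algebra behind this transformation can be sketched as follows. The error term is written here as $\varepsilon_{ij}$ and the logistic scale as $s$; both symbols are introduced only for this derivation. A logistic variable with scale $s$ has variance $s^2\pi^2/3$, so variance $\sigma^2$ corresponds to $s = \sqrt{3}\sigma/\pi$.

```latex
\begin{align*}
Y_{ij} &= x_{ij}'\alpha + z_{ij}'a_i + \varepsilon_{ij}, \qquad
  \varepsilon_{ij} \sim \text{Logistic}(0, s), \quad s = \frac{\sqrt{3}\,\sigma}{\pi}, \\
P(Y_{ij} > c \mid a_i, x_{ij})
  &= P(\varepsilon_{ij} > c - x_{ij}'\alpha - z_{ij}'a_i)
   = \left(1 + \exp\left\{-\frac{x_{ij}'\alpha + z_{ij}'a_i - c}{s}\right\}\right)^{-1}, \\
\operatorname{logit} P(Y_{ij} > c \mid a_i, x_{ij})
  &= \frac{x_{ij}'\alpha + z_{ij}'a_i - c}{s}, \\
\beta_j &= \frac{\alpha_j}{s} = \frac{\pi\,\alpha_j}{\sqrt{3}\,\sigma}, \qquad
\mathrm{OR}_j = \exp\{\beta_j\} = \exp\left\{\frac{\pi\,\alpha_j}{\sqrt{3}\,\sigma}\right\}.
\end{align*}
```

Comparing the third line with model (1) shows that the threshold $c$ only shifts the intercept, while every other coefficient of the logistic model equals the corresponding linear-model coefficient divided by $s$. With $\alpha_j = 0.7$ and $\sigma = 1$, the values used in Section 3, this gives a log odds ratio of $0.7\pi/\sqrt{3} \approx 1.270$.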

3. Simulations

To assess the performance of the aforementioned approach relative to the dichotomizing
procedure, we carried out a few simulations. Table 1 shows results from the simulated
model $y_i = b_{0i} + 0.7x_i + \epsilon_i$, where $x_i \sim U(-2, 2)$, $b_{0i} \sim N(0, \sigma_{b0})$ and
$\epsilon_i$ comes from a logistic distribution with zero mean and scale parameter equal to
$\sqrt{3}\sigma/\pi$, where $\sigma^2$ is the variance, set to one. Three values of $\sigma_{b0}$ were
explored. For each of the one thousand replicates, four models were estimated: i) a
random-intercept linear mixed model for the response $y$ and explanatory variable $x$, and
ii) three random-intercept binomial GLMMs for the binary response $I(y > c)$, where $c$ was
selected each time to obtain a proportion of events equal to 0.50, 0.30 and 0.10. For the
binomial GLMMs a penalized quasi-likelihood approach was employed, while for the LMM we
assumed a Gaussian distribution for the response, for which the log-likelihood has a
closed-form expression.
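The data-generating step and the choice of the threshold can be sketched as below. The number of clusters, the cluster size and the seed are not reported in the text and are assumptions here, as is reading $\sigma_{b0}$ as a standard deviation. Fitting the LMM and the GLMMs is not shown.

```python
import math
import random

def simulate_dataset(m=50, n_per_cluster=5, sd_b0=1.0, sigma=1.0, slope=0.7, seed=1):
    """Generate (cluster, x, y) rows from y = b0_i + slope * x + eps.

    m, n_per_cluster and seed are hypothetical; sd_b0 is treated as the
    standard deviation of the random intercept (an assumption).
    """
    rng = random.Random(seed)
    scale = math.sqrt(3.0) * sigma / math.pi  # logistic scale giving variance sigma^2
    rows = []
    for i in range(m):
        b0 = rng.gauss(0.0, sd_b0)
        for _ in range(n_per_cluster):
            x = rng.uniform(-2.0, 2.0)
            u = rng.random()
            while u == 0.0:  # avoid log(0) in the inverse-CDF draw
                u = rng.random()
            eps = scale * math.log(u / (1.0 - u))  # logistic error via inverse CDF
            rows.append((i, x, b0 + slope * x + eps))
    return rows

def dichotomize(rows, event_fraction):
    """Pick c as a sample quantile so that about event_fraction of y exceed it."""
    ys = sorted(r[2] for r in rows)
    n_events = round(event_fraction * len(ys))
    if not 0 < n_events < len(ys):
        raise ValueError("event_fraction must leave both events and non-events")
    c = ys[len(ys) - n_events - 1]  # exactly n_events values exceed c if no ties
    return c, [1 if r[2] > c else 0 for r in rows]

rows = simulate_dataset()
true_log_or = 0.7 * math.pi / (math.sqrt(3.0) * 1.0)  # target of both estimators
for frac in (0.5, 0.3, 0.1):
    c, y_star = dichotomize(rows, frac)
    print(frac, round(c, 3), sum(y_star) / len(y_star))
```

Each replicate would then be passed to a linear mixed model routine for `y` and to a binomial GLMM routine for each `y_star`, and the resulting log odds ratio estimates compared with `true_log_or`.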

Table 1: Monte Carlo estimates of mean and mean square error of the logOR estimator under

different models (true logOR ≈ 1.270): LMM and binomial GLMM for the dichotomized response

with different percentage π of successes.

σb0           LMM     GLMM(0.5)   GLMM(0.3)   GLMM(0.1)
0.3   mean    1.273   1.236       1.244       1.295
      mse     0.078   0.148       0.166       0.516
1.0   mean    1.275   1.201       1.218       1.314
      mse     0.085   0.238       0.274       0.769
2.0   mean    1.273   1.222       1.243       1.308
      mse     0.081   0.354       0.399       1.368

The results highlight how the LMM approach yields estimators with negligible bias and mean
square errors (mse) that are essentially independent of the variance of the random
intercept, $\sigma_{b0}$; on the other hand, the logOR estimators from the dichotomizing
approach have larger mse, which are inflated when the heterogeneity of the intercepts
increases and the number of events decreases. Figure 1 displays simulation results for
$\sigma_{b0} = 0.3$.


Figure 1: Smoothed Monte Carlo estimates (based on 1000 replications) of the sampling
distributions of the logOR for different models: LMM (continuous line); GLMM(0.5)
(dashed line); GLMM(0.3) (dotted line); GLMM(0.1) (dashed-dotted line).
[Plot not reproduced. Horizontal axis: log(Odds Ratio); vertical axis: Density.]

4. Final remarks

When the response variable is continuous and a mixed model framework is employed, odds
ratio estimates for the risk factors in the model may be obtained directly from a linear
model for the original continuous variable. The loss of information due to dichotomizing
appears to be substantial, and it is more pronounced when random effects are included in
the model, in particular when the relevant variance $\sigma^2_{b0}$ increases. An LMM approach
yields ‘better’ estimates of the OR and, furthermore, avoids the well-known computational
problems which usually complicate the estimation of logistic random-effects models.
Indeed, we met convergence problems in fitting binomial GLMMs for the dichotomized
variable in some simulation scenarios; such estimation problems are expected to increase
when additional random effects (e.g. random slopes) are included in the model, making
model estimation really demanding. On the other hand, the linear mixed model does not
suffer from such a computational burden, thanks to the closed-form expression of the
log-likelihood. These results can be relevant in practice, since both the dichotomization
of continuous outcomes and mixed modelling for longitudinal studies are becoming an
integral part of medical and biological research.

References

Larsen, K., Petersen, J. H., Budtz-Jørgensen, E., and Endahl, L. (2000) Interpreting
parameters in the logistic regression model with random effects. Biometrics, 56, 909–914.

Moser, B. K. and Coombs, L. P. (2004) Odds ratios for a continuous outcome variable
without dichotomizing. Statistics in Medicine, 23, 1843–1860.

Stiratelli, R., Laird, N., and Ware, J. H. (1984) Random effects models for serial
observations with binary response. Biometrics, 40, 961–971.

Crouch, E. A. C. and Spiegelman, D. (1990) The evaluation of integrals of the form
$\int f(t)\exp(-t^2)\,dt$: application to logistic-normal models. Journal of the American
Statistical Association, 85, 464–469.
