# Subject-specific odds ratios in binomial GLMMs with continuous response

**Abstract**

In a regression context, the dichotomization of a continuous outcome variable is often motivated by the need to express results
in terms of the odds ratio, as a measure of association between the response and one or more risk factors. Starting from the
recent work of Moser and Coombs (Stat Med 23:1843–1860, 2004), in this article we explore, in a mixed model framework, the possibility
of obtaining odds ratio estimates from a linear regression model without dichotomizing the response variable.
It is shown that the odds ratio estimators derived from a linear mixed model outperform those from a binomial generalized
linear mixed model, especially when the data exhibit high levels of heterogeneity.


Mariangela Sciandra, Vito M.R. Muggeo, Gianfranco Lovison

Dipartimento di Scienze Statistiche e Matematiche ‘S. Vianelli’, Università di Palermo

e-mail: vmuggeo@dssm.unipa.it

Summary: In a regression context, dichotomizing a continuous response variable is usually driven by the need to obtain an odds ratio that quantifies its association with one or more risk factors. Taking a recently published paper as a starting point, this article explores the possibility of obtaining odds ratio estimates from a linear regression model without discretizing the response variable, for clustered data analysed with a random-effects model.

Keywords: odds ratio, random-effects, logistic regression, dichotomizing, efficiency.

1. Background

Given units $i = 1, \ldots, n$, let $Y_i$ be the continuous outcome variable for the $i$-th subject, assumed to depend on the $p$-dimensional vector of risk factors $x_i$; regression models aim to investigate the relationship between the response $Y$ and the covariates $X_1, \ldots, X_p$. For historical reasons, and because of a presumed superiority in terms of interpretability, the odds ratio (OR) is often the preferred association measure for communicating results in medical and biological studies. Therefore, even with continuous outcome variables, it is common practice to analyze the data through a logistic regression model for the dichotomized response variable $Y^* = I(Y > c)$ for some threshold value $c$. Such a cut-off is often selected on biological or medical grounds, or to ensure a pre-specified number of ‘events’ (for instance, $c$ may be the median or any other sample quantile of $Y$); whatever the value of $c$, the fitted logistic model provides estimates of the odds ratios, which greatly facilitates communication in the medical community. Recently, Moser and Coombs (2004) showed that, with a purely continuous response variable $Y$, the OR may be estimated by suitably transforming the regression parameters of a continuous regression model of $Y$ on $X_1, \ldots, X_p$, leading to an OR estimator with greater efficiency than the one obtained from the dichotomized approach.

Moser and Coombs (2004) deal with independent observations; in this paper we study to what extent their results can be generalized to non-independent observations. We assume that a mixed model framework is employed, where the random coefficients model the subject-specific effects while accounting for the serial correlation among the observations. Section 2 is concerned with statistical methods and contrasts the linear and the binomial generalized linear mixed models; Section 3 illustrates results from a small simulation study; and Section 4 contains a discussion and some final remarks.


2. The logistic random-effects model for dichotomized response vs. linear mixed models for a continuous response

Random-effects models are one way to handle subject-specific parameters in a model for repeated measurements when the main interest is in describing the evolution of each subject separately. The approach assumes that, for every subject, the response can be modelled by a linear (or generalized linear) regression model, but with subject-specific regression coefficients. The generalized linear mixed model (GLMM) is the most frequently used random-effects model for discrete outcomes. A general framework for a GLMM is the following. Let $Y_{ij}$ be the $j$-th measurement ($j = 1, \ldots, n_i$) for the $i$-th cluster ($i = 1, \ldots, m$). Denoting by $b_i$ a $q$-dimensional vector of random effects, assumed to be drawn independently from $\mathrm{MVN}(0, D)$, we assume that, conditionally on the random effects $b_i$, the elements $Y_{ij}$ are independent and follow a generalized linear model whose linear predictor is extended to include the subject-specific regression parameters $b_i$. More specifically, all $Y_{ij}$ are assumed to have densities from the exponential family, and the mean $\mu_{ij}$ is modelled through a linear predictor containing the fixed regression parameters $\beta$ as well as the subject-specific parameters $b_i$, i.e.,

$$g(E[Y_{ij} \mid x_{ij}, z_{ij}, b_i]) = g(\mu_{ij}) = x_{ij}\beta + z_{ij} b_i$$

for a known link function $g(\cdot)$, where $x_{ij}$ and $z_{ij}$ are two vectors of known covariate values. To obtain OR estimates, the logistic random-effects regression model for the dichotomized responses $Y^*_{ij} = I(Y_{ij} > c)$ is a common choice for the analysis of multilevel dichotomous data. It assumes that $Y^*_{ij} \mid b_i \sim \mathrm{Bernoulli}(\pi_{ij})$, with $P(Y_{ij} > c \mid b_i, x_{ij}) = P(Y^*_{ij} = 1 \mid b_i, x_{ij}) = E[Y^*_{ij} \mid b_i, x_{ij}] = \pi_{ij}$ and

$$\pi_{ij} = \left(1 + \exp\{-(x_{ij}\beta + z_{ij} b_i)\}\right)^{-1} \qquad (1)$$

The marginal likelihood for this model is obtained by integrating over the random effects:

$$\prod_{i=1}^{m} \int \prod_{j=1}^{n_i} \pi_{ij}^{\,y^*_{ij}} (1 - \pi_{ij})^{1 - y^*_{ij}} f(b_i)\, db_i$$

In general, this expression has no closed form, because it involves $m$ integrals over the $q$-dimensional random effects $b_i$; numerical integration methods are therefore needed (Crouch and Spiegelman, 1990).
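As a minimal sketch of what such a numerical integration can look like, the Python snippet below approximates the marginal log-likelihood contribution of a single cluster by Gauss-Hermite quadrature. It assumes the simplest case of a scalar random intercept, $b_i \sim N(0, \sigma_b^2)$ with $z_{ij} = 1$; the function name, its arguments and the example values are hypothetical and are not taken from the paper, which does not state which integration rule it has in mind.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def cluster_marginal_loglik(y_star, eta_fixed, sigma_b, n_nodes=30):
    """Log marginal likelihood of one cluster, random-intercept logistic model.

    y_star    : 0/1 responses of the cluster (the dichotomized values)
    eta_fixed : fixed part of the linear predictor, x_ij * beta, per observation
    sigma_b   : standard deviation of the random intercept (assumption: q = 1)
    """
    y_star = np.asarray(y_star, dtype=float)
    eta_fixed = np.asarray(eta_fixed, dtype=float)
    # Nodes t_k and weights w_k for integrals of the form  int f(t) exp(-t^2) dt
    t, w = hermgauss(n_nodes)
    # Change of variable b = sqrt(2) * sigma_b * t maps the N(0, sigma_b^2) density
    # onto the exp(-t^2) weight, up to the constant 1 / sqrt(pi)
    b = np.sqrt(2.0) * sigma_b * t
    eta = eta_fixed[:, None] + b[None, :]          # shape (n_i, n_nodes)
    pi = 1.0 / (1.0 + np.exp(-eta))                # equation (1) at each node
    # Conditional likelihood of the whole cluster at each quadrature node
    cond_lik = np.prod(pi ** y_star[:, None] * (1.0 - pi) ** (1.0 - y_star[:, None]), axis=0)
    return float(np.log(np.sum(w * cond_lik) / np.sqrt(np.pi)))


# Hypothetical cluster with three observations
loglik = cluster_marginal_loglik(y_star=[1, 0, 1], eta_fixed=[0.2, -0.5, 1.0], sigma_b=1.0)
```

The full log-likelihood would be the sum of such terms over the $m$ clusters; with $q > 1$ random effects the one-dimensional rule would have to be replaced by a multivariate one, which is where the computational burden grows.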

It is well known that the parameters of a binomial GLMM have a different interpretation from those of the corresponding marginal model, where the main interest is in the population-averaged evolution. In fact, the fixed effects do not have a marginal interpretation, since in general

$$\int \pi_{ij} f(b_i)\, db_i \neq g^{-1}(x_{ij}\beta) = \left(1 + \exp\{-x_{ij}\beta\}\right)^{-1}$$
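The gap between the two sides can be checked numerically. The sketch below, again assuming a scalar random intercept $b_i \sim N(0, \sigma_b^2)$, averages the conditional probability over the random-effect distribution by Gauss-Hermite quadrature and compares it with the inverse logit of the fixed part alone. The value of the linear predictor and of $\sigma_b$ are hypothetical choices made only for illustration.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def inverse_logit(eta):
    """g^{-1}(eta) for the logit link."""
    return 1.0 / (1.0 + np.exp(-eta))


def marginal_probability(eta_fixed, sigma_b, n_nodes=40):
    """Average of inverse_logit(eta_fixed + b) over b ~ N(0, sigma_b^2)."""
    t, w = hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * t                 # map nodes to the N(0, sigma_b^2) scale
    return float(np.sum(w * inverse_logit(eta_fixed + b)) / np.sqrt(np.pi))


eta = 1.0                                          # hypothetical value of x_ij * beta
conditional = float(inverse_logit(eta))            # right-hand side: subject with b_i = 0
marginal = marginal_probability(eta, sigma_b=2.0)  # left-hand side: population average
print(f"conditional = {conditional:.3f}, marginal = {marginal:.3f}")
```

For a positive linear predictor the population-averaged probability is pulled towards 0.5, which is why coefficients from the subject-specific model cannot be read as marginal effects.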

So, in a logistic random-effects regression model the OR has a different interpretation from that in the ordinary logistic regression model: the OR is now a random variable rather than a fixed parameter, and this should lead to a different interpretation of the model. However, Larsen et al. (2000) argue that the fixed-effects estimates have a nice interpretation in terms of the median of the random-effects distribution. Therefore, focusing on the fixed-effects estimate of the OR is meaningful even in a mixed model framework. Rather than obtaining OR estimates from the logistic random-effects model for $Y^*_{ij}$, extending the idea of Moser and Coombs (2004) leads us to consider the linear mixed model (LMM) for the original continuous response,

$$E[Y_{ij} \mid a_i] = x_{ij}\alpha + z_{ij} a_i \qquad (2)$$

where the error terms are now independent and identically distributed logistic random variables with mean 0 and variance $\sigma^2$. Model (2) can be used to obtain OR estimates for model (1): simple algebra shows that the OR estimates are given by $\exp\{\pi \hat{\alpha}_j / (\sqrt{3}\,\hat{\sigma})\}$, where $\pi$ here denotes the mathematical constant and $\hat{\alpha}_j$ and $\hat{\sigma}$ are estimates from the fitted linear mixed model (2). Preserving the continuous nature of $Y$ is expected to lead to a substantial improvement in OR estimation in terms of bias and mean square error of the estimator, and consequently in the width of the estimated confidence intervals.
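The algebra behind this transformation can be sketched as follows, for a fixed threshold $c$. The symbols $\varepsilon_{ij}$ (the logistic error term) and $s$ (its scale parameter) are introduced here for illustration; the derivation uses only the fact that a logistic variable with scale $s$ has variance $s^2\pi^2/3$.

```latex
\begin{align*}
Y_{ij} &= x_{ij}\alpha + z_{ij}a_i + \varepsilon_{ij}, \qquad
P(\varepsilon_{ij} \le t) = \bigl(1 + e^{-t/s}\bigr)^{-1}, \qquad
s = \frac{\sqrt{3}\,\sigma}{\pi}, \\[4pt]
P(Y_{ij} > c \mid a_i, x_{ij})
  &= P(\varepsilon_{ij} > c - x_{ij}\alpha - z_{ij}a_i)
   = \Bigl(1 + \exp\bigl\{-(x_{ij}\alpha + z_{ij}a_i - c)/s\bigr\}\Bigr)^{-1}.
\end{align*}
```

Comparing the last expression with equation (1), the slope of the $j$-th covariate on the logit scale is $\alpha_j / s = \pi\alpha_j / (\sqrt{3}\,\sigma)$, while the threshold $c$ only shifts the intercept by $-c/s$. Exponentiating the slope gives the odds ratio $\exp\{\pi\alpha_j / (\sqrt{3}\,\sigma)\}$, which does not depend on $c$.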

3. Simulations

To assess the performance of this approach relative to the dichotomizing procedure, we carried out a small simulation study. Table 1 shows results for the simulated model $y_i = b_{0i} + 0.7 x_i + \varepsilon_i$, where $x_i \sim U(-2, 2)$, $b_{0i} \sim N(0, \sigma_{b_0})$, and $\varepsilon_i$ comes from a logistic distribution with zero mean and scale parameter $\sqrt{3}\sigma/\pi$, where the variance $\sigma^2$ is set to one. Three values of $\sigma_{b_0}$ were explored. For each of the one thousand replicates, four models were estimated: i) a random-intercept linear mixed model for the response $y$ with explanatory variable $x$, and ii) three random-intercept binomial GLMMs for the binary response $I(y > c)$, where $c$ was chosen each time to obtain a proportion of events equal to 0.50, 0.30 and 0.10. The binomial GLMMs were fitted by penalized quasi-likelihood, while for the LMM we assumed a Gaussian distribution for the response, for which the log-likelihood has a closed-form expression.
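A single replicate of this design can be sketched in Python as below. Several choices are assumptions made only for illustration, because the text does not report them: the number of clusters and the cluster size, the reading of $\sigma_{b_0}$ as a standard deviation, and the estimator. In particular, the snippet does not fit a linear mixed model; as a simple stand-in it uses the within-cluster (cluster-mean-centred) least squares fit, which removes the random intercepts and yields estimates of the slope and of the residual standard deviation that are then plugged into the OR transformation.

```python
import numpy as np


def simulate_clusters(m, n_i, sigma_b0, rng, slope=0.7, sigma=1.0):
    """Simulate y = b0_i + slope * x + eps with logistic errors of variance sigma^2."""
    scale = np.sqrt(3.0) * sigma / np.pi           # logistic scale giving variance sigma^2
    cluster = np.repeat(np.arange(m), n_i)         # cluster label of each observation
    b0 = rng.normal(0.0, sigma_b0, size=m)         # random intercepts (sigma_b0 taken as an SD)
    x = rng.uniform(-2.0, 2.0, size=m * n_i)
    eps = rng.logistic(loc=0.0, scale=scale, size=m * n_i)
    return cluster, x, b0[cluster] + slope * x + eps


def log_or_within(cluster, x, y):
    """log OR = pi * alpha_hat / (sqrt(3) * sigma_hat) from a within-cluster fit.

    Stand-in for the LMM fit: centring on cluster means eliminates the intercepts.
    """
    m = int(cluster.max()) + 1
    counts = np.bincount(cluster, minlength=m)
    xc = x - (np.bincount(cluster, weights=x, minlength=m) / counts)[cluster]
    yc = y - (np.bincount(cluster, weights=y, minlength=m) / counts)[cluster]
    alpha_hat = np.sum(xc * yc) / np.sum(xc ** 2)
    resid = yc - alpha_hat * xc
    # Degrees of freedom: m cluster means and one slope have been estimated
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (len(y) - m - 1))
    return float(np.pi * alpha_hat / (np.sqrt(3.0) * sigma_hat))


rng = np.random.default_rng(0)
cluster, x, y = simulate_clusters(m=200, n_i=10, sigma_b0=1.0, rng=rng)  # hypothetical sizes
true_log_or = np.pi * 0.7 / np.sqrt(3.0)           # the value quoted as about 1.270
est_log_or = log_or_within(cluster, x, y)

# Cut-offs c giving proportions 0.50, 0.30 and 0.10 of events in I(y > c)
cutoffs = np.quantile(y, [0.5, 0.7, 0.9])
y_star = (y[:, None] > cutoffs[None, :]).astype(int)
event_rates = y_star.mean(axis=0)
```

The columns of `y_star` are the three dichotomized responses to which the binomial GLMMs would be fitted; that step is omitted here because it requires a mixed-model library.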

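Table 1: Monte Carlo estimates of the mean and mean square error (mse) of the logOR estimator under different models (true logOR ≈ 1.270): LMM and binomial GLMM for the dichotomized response with different proportions π of successes.

| $\sigma_{b_0}$ |      | LMM   | GLMM(π = 0.5) | GLMM(π = 0.3) | GLMM(π = 0.1) |
|----------------|------|-------|---------------|---------------|---------------|
| 0.3            | mean | 1.273 | 1.236         | 1.244         | 1.295         |
|                | mse  | 0.078 | 0.148         | 0.166         | 0.516         |
| 1.0            | mean | 1.275 | 1.201         | 1.218         | 1.314         |
|                | mse  | 0.085 | 0.238         | 0.274         | 0.769         |
| 2.0            | mean | 1.273 | 1.222         | 1.243         | 1.308         |
|                | mse  | 0.081 | 0.354         | 0.399         | 1.368         |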

The results show that the LMM approach yields estimators with negligible bias and with mean square errors (mse) that are essentially unaffected by the variability of the random intercept, $\sigma_{b_0}$. The logOR estimators from the dichotomizing approach, on the other hand, have larger mse, which are inflated as the heterogeneity of the intercepts increases and as the number of events decreases. Figure 1 displays the simulation results for $\sigma_{b_0} = 0.3$.


Figure 1: Smoothed Monte Carlo estimates (based on 1000 replications) of the sampling distributions of the logOR for different models: LMM (continuous line), GLMM(0.5) (dashed line), GLMM(0.3) (dotted line), GLMM(0.1) (dashed-dotted line). Horizontal axis: log(Odds Ratio); vertical axis: Density.
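One way to read the mse rows of Table 1 is through the ratio of each GLMM mse to the LMM mse at the same value of $\sigma_{b_0}$. The short Python sketch below only re-expresses the tabulated values; the ratio is an illustrative summary introduced here, not a quantity reported in the paper.

```python
# Mean square errors copied from Table 1, keyed by sigma_b0
mse = {
    0.3: {"LMM": 0.078, "GLMM(0.5)": 0.148, "GLMM(0.3)": 0.166, "GLMM(0.1)": 0.516},
    1.0: {"LMM": 0.085, "GLMM(0.5)": 0.238, "GLMM(0.3)": 0.274, "GLMM(0.1)": 0.769},
    2.0: {"LMM": 0.081, "GLMM(0.5)": 0.354, "GLMM(0.3)": 0.399, "GLMM(0.1)": 1.368},
}

# Ratio of each GLMM mse to the LMM mse in the same row of the table
ratios = {
    sigma_b0: {model: value / row["LMM"] for model, value in row.items() if model != "LMM"}
    for sigma_b0, row in mse.items()
}

for sigma_b0, row in ratios.items():
    summary = ", ".join(f"{model}: {ratio:.1f}" for model, ratio in row.items())
    print(f"sigma_b0 = {sigma_b0}: {summary}")
```

In every column the ratio grows with $\sigma_{b_0}$, and within each row it is largest for the rarest event proportion, which is the pattern described in the text.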

4. Final remarks

When the response variable is continuous and a mixed model framework is employed, odds ratio estimates for the risk factors in the model may be obtained directly from a linear model for the original continuous variable. The loss of information caused by dichotomizing appears to be substantial, and it becomes more pronounced when random effects are included in the model, in particular as the relevant variance $\sigma^2_{b_0}$ increases. An LMM approach yields ‘better’ estimates of the OR and also avoids the well-known computational problems that usually complicate the estimation of logistic random-effects models. Indeed, we met convergence problems when fitting binomial GLMMs for the dichotomized variable in some simulation scenarios; such estimation problems are expected to increase when additional random effects (e.g. random slopes) are included in the model, making model estimation very demanding. The linear mixed model, on the other hand, does not suffer from such computational burdens, because its log-likelihood has a closed-form expression. These results can be relevant in practice, since both the dichotomization of continuous outcomes and mixed modelling for longitudinal studies are becoming an integral part of medical and biological research.

References

Larsen, K., Petersen, J. H., Budtz-Jørgensen, E., and Endahl, L. (2000) Interpreting parameters in the logistic regression model with random effects. Biometrics, 56, 909–914.

Moser, B. K. and Coombs, L. P. (2004) Odds ratios for a continuous outcome variable without dichotomizing. Statistics in Medicine, 23, 1843–1860.

Stiratelli, R., Laird, N., and Ware, J. H. (1984) Random-effects models for serial observations with binary response. Biometrics, 40, 961–971.

Crouch, E. A. C. and Spiegelman, D. (1990) The evaluation of integrals of the form $\int f(t)\exp(-t^2)\,dt$: application to logistic-normal models. Journal of the American Statistical Association, 85, 464–469.
