
One Hammer, Different Nails – A Note on the Confusing Sociologists’ Debate on Comparing Coefficients in Logistic Regression

Jan Skopek, European University Institute, jan.skopek@eui.eu

Working Paper, December 2015

1 – Introduction1

It has been argued that, in contrast to linear regression, coefficients in logistic regression models (and in non-linear models in general) cannot be directly compared across different samples, groups, and models (Allison, 1999; Mood, 2010; Winship & Mare, 1984). In a seminal paper, Mood elaborates that the issue of comparability arises when one wants to interpret these coefficients as estimates of ‘substantive’ effects. Specifically, Mood (2010, p. 67f) asserts that it is problematic (1) to interpret odds ratios as substantive effects, since they also reflect unobserved heterogeneity, (2) to compare odds ratios across nested models, because unobserved heterogeneity is likely to vary across models, and (3) to compare odds ratios from the same model across samples, groups, or over time, because unobserved heterogeneity can vary across samples, groups, or time. Using a fictitious data example, the author recommends average marginal effects as more robust effect estimates. More recently, the long-standing critique that odds ratios are hard to interpret was reinforced by the argument that odds ratios as effect measures are even more problematic than previously thought because they are directly derived from logit coefficients (Norton, 2012).

Looking at practice in the quantitative social sciences, particularly in stratification research, one notices immediately that Mood’s paper (more than 700 times cited according to Google Scholar) has been eminently influential in inclining applied quantitative scholars to abandon the concept of odds ratios and to turn towards average marginal effects as a robust workaround for comparing effect sizes. Moreover, previous studies have been criticized for using odds ratios instead of average marginal effects (for instance, see the critique of Hadjar & Berger, 2010, by Auspurg & Hinz, 2011). With reference to Mood’s work, several scholars even decided to abandon logistic regression and the concept of odds ratios altogether and to fall back on linear probability models for studying binary outcomes (e.g., Jacob, Klein, & Iannelli, 2015, p. 465). Concurrently, social science methodologists have suggested various approaches that account for differences in heterogeneity across groups (Williams, 2009), samples (Breen, Holm, & Karlson, 2014), and nested models (Karlson, Holm, & Breen, 2012) when estimating effects in non-linear models.

1 While working on this paper, it came to my attention only late that Maarten L. Buis published on his website in summer 2015 a very similar paper, “Logistic regression: Why we often can do what we think we can do” (Buis, 2015). While there are some differences in the argumentation, both papers share the same idea and consequently have much in common. Hence, I would like to direct readers to Buis (2015), who came first and who deserves full credit for raising the issue. My paper differs in that it emphasizes the different research agendas (model purposes) for which logistic regression may be applied, and in that it highlights some problematic issues with relying solely on average marginal effects in recent stratification studies. Yet, I decided to stop working on this paper and to leave it in its current form for any researcher who might be interested.

I argue, in turn, that this debate has so far been unbalanced and confusing because it lacks a more differentiated discussion of the various purposes of modeling. Importantly, this is not a critique of the work of Allison, Mood, and others per se. The core arguments of the methodological debate are inherently valid and have been laid out impressively. However, they do not apply to all research agendas concerned with binary or categorical outcomes, and this is the simple but crucial point I am referring to. The consequence is an emerging practice in social science that is questionable, to say the least: a remarkable number of studies misread the problem as implying that logistic regression is generally prone to misleading conclusions. Recalling what logistic regression achieved when it was introduced, it seems ironic that many scholars nowadays tend to discard it, even where it is an appropriate tool for tackling their research questions.

As an analytical framework, I propose a conceptual distinction between (a) descriptive models dealing with associations among variables and (b) models with causal primacy concerned with effects (dependence) among variables. Association describes how strong the predictive relation between two variables is, whereas an effect (dependence) describes how a change in one variable is related to a change in another.2 Bearing this distinction in mind, I will develop my core argument that although comparability issues validly apply to causal modeling purposes, they do not apply to descriptive purposes. Beyond this, techniques proposed for remedying comparability issues in a causal approach are not applicable (and may even be misleading) in a descriptive agenda. My contribution to the literature is particularly relevant given that a large part of social stratification research effectively pursues descriptive rather than causal modeling.

2 No doubt, scholars should be clear about whether they are interested in studying associations or effects. Nevertheless, applied research shows that these two fundamental analytical and statistical concepts are sometimes conflated.

In the following, I will first briefly recapitulate the key points of the debate on the comparability of log-odds ratios. Second, I will discuss the point that logistic regression is a statistical tool that can serve different ends, i.e., different purposes of modeling. Afterwards, I will establish some important links to the concepts of association and functional dependence in the world of linear regression models. Finally, I will discuss some caveats regarding the increasing reliance on average marginal effects instead of odds ratios in stratification research.

2 – The Problem of Comparing Logit Coefficients

Mood motivates her paper by arguing that applied social science is not aware of the shortcomings of logistic regression when it comes to estimating what she calls ‘substantive’ effects. She demonstrates the problem of comparing log-odds ratios starting from a latent variable model. The latent variable can be understood as a continuous random variable ($y^*$) that is unobserved but operates in the background, generating observable binary outcomes. This factor may represent a propensity of being in a certain qualitative state (e.g., participation in collective action; Mood, 2010, p. 68). Once the factor value exceeds a certain threshold, we observe a one ($y^* > 0$), otherwise a zero ($y^* \le 0$). Hence, the latent factor, even if not observed, analytically represents a substantive mechanism explaining why a certain state has been observed (in the example, this could be ‘a taste of participation’, Mood, 2010, p. 68). Starting with a simple example that assumes only one explanatory variable, the latent model is

$$ y_i^* = \alpha + x_{1i}\beta_1 + \varepsilon_i \qquad (1) $$

with $y_i^*$ the individual propensity, $x_{1i}$ the value of the explanatory variable observed for individual i, and $\varepsilon_i$ an error term assumed to be independent of x.

The constant term $\alpha$ represents the location parameter (origin), while $\beta_1$ represents the slope, i.e., the change in $y^*$ per unit change in $x_1$. If only a binary response is observed, the model can be estimated via logistic regression under the assumption that the error term is logistically distributed with a fixed variance of $\pi^2/3 \approx 3.29$:

$$ \ln\left(\frac{P}{1-P}\right) = a + x_{1i}b_1 \qquad (2) $$

with P the probability of observing Y = 1, $b_1$ the log-odds ratio, and $\exp(b_1)$ the odds ratio. Since the variance of y* is not identified, the underlying parameter $\beta_1$ cannot be identified either. By fixing the residual variance, the logit model does not estimate $\beta_1$ but instead the log-odds ratio $b_1$, which reflects both the residual heterogeneity of the propensity y* and the effect of x on that propensity. More precisely, $b_1$ is $\beta_1$ standardized by the square root of the ratio between the true residual variance of y* and the fixed variance 3.29 (rescaling effect). Therefore, what the logit model estimates is:

$$ \frac{y_i^*}{s} = \frac{\alpha}{s} + x_{1i}\frac{\beta_1}{s} + \frac{\varepsilon_i}{s} \qquad (3) $$

with $s = \sqrt{3\,\mathrm{Var}(\varepsilon)/\pi^2}$ the scaling factor, $\alpha/s = a$, and $\beta_1/s = b_1$. This has several important implications. First, one cannot compare logit coefficients between nested models as what Mood calls ‘substantive’ effects on the latent variable. Adding predictors that are correlated with y* but not with x to the logit equation will necessarily increase $b_1$ (in absolute value) because of the rescaling effect, since the residual variance $\mathrm{Var}(\varepsilon)$ decreases as more relevant variables are added. This must not be misinterpreted as a suppressor effect (i.e., the additional predictor being positively correlated with y* but negatively correlated with x). Conversely, if $b_1$ does not change compared to the restricted model, that does not indicate the absence of confounding by third variables (i.e., the additional predictor being positively related to both y* and x): confounding brings the coefficient down and rescaling brings it up again, so the two effects may offset each other.
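The rescaling logic of Equation (3) can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not Mood’s own computation: it assumes a hypothetical latent model y* = x + z + u with x and z uncorrelated, Var(z) = 1, and u standard logistic, and it approximates the logit coefficient by b = β/s. In the reduced model the composite residual z + u is not exactly logistic, so that value is only an approximation.

```python
import math

# Variance of the standard logistic distribution, pi^2 / 3 (about 3.29).
LOGISTIC_VAR = math.pi ** 2 / 3


def scaling_factor(residual_var):
    """s = sqrt(3 * Var(eps) / pi^2), as in Equation (3)."""
    return math.sqrt(residual_var / LOGISTIC_VAR)


def implied_logit_coef(beta, residual_var):
    """Approximate logit coefficient b = beta / s for a latent slope beta."""
    return beta / scaling_factor(residual_var)


# Hypothetical latent model: y* = 1.0 * x + 1.0 * z + u,
# with x and z uncorrelated, Var(z) = 1 and u standard logistic.
beta_x = 1.0
var_z_part = 1.0 ** 2 * 1.0   # beta_z^2 * Var(z)
var_u = LOGISTIC_VAR

# Full model (x and z included): only u remains in the residual.
b_full = implied_logit_coef(beta_x, var_u)
# Reduced model (z omitted): z is absorbed into the residual.
# Approximation: z + u is treated as if it were logistically distributed.
b_reduced = implied_logit_coef(beta_x, var_z_part + var_u)

print(f"b with z: {b_full:.3f}, b without z: {b_reduced:.3f}")
```

The latent slope of x is 1 in both specifications; only the residual variance differs, and with it the logit coefficient of x.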

Second, one cannot directly compare effects between groups (e.g., between men and women) by inspecting group-specific log-odds ratios. Even if the effect of x on the underlying factor ($\beta_1$) is the same in group A and group B, the effect on the log-odds scale ($b_1$) will differ if one group is more heterogeneous than the other.3 Unbalanced heterogeneity across groups is very likely to be present in observational data and will thus distort direct effect comparisons.4 For instance, if the log-odds ratio is smaller in group A than in group B, this can be due to a smaller effect $\beta_1$ in group A, to larger residual heterogeneity in group A, or to a combination of both (Figure 1 provides an example using simulated data). Alternatively, group differences in unobserved heterogeneity and in effects may cancel each other out, so that the group-specific log-odds ratios are equal.
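The gap between the two groups in Figure 1 can be traced to population quantities directly. This sketch assumes that the error in the Figure 1 data is logistic with variance s²·π²/3 (s = 1 for Group A, s = 4 for Group B), a reading of the figure note that is consistent with the reported estimates; the figure shows sample estimates, so they differ slightly from these population values.

```python
import math

BETA = 3.0                        # latent slope b* shared by both groups
VAR_X = 1.0                       # x ~ N(0, 1) in both groups
LOGISTIC_VAR = math.pi ** 2 / 3   # variance of a standard logistic error


def group_summary(s):
    """Population R^2 of y* on x, implied logit coefficient and odds ratio.

    Assumes a logistic error with variance s^2 * pi^2 / 3.
    """
    resid_var = s ** 2 * LOGISTIC_VAR
    r2 = BETA ** 2 * VAR_X / (BETA ** 2 * VAR_X + resid_var)
    b = BETA / s                  # logit coefficient = latent slope / s
    return r2, b, math.exp(b)


summaries = {name: group_summary(s) for name, s in [("Group A", 1.0), ("Group B", 4.0)]}
for name, (r2, b, odds_ratio) in summaries.items():
    print(f"{name}: R^2 = {r2:.2f}, logit b = {b:.2f}, OR = {odds_ratio:.2f}")
```

Both groups share BETA and only s differs, yet the implied logit coefficient falls from 3 to 0.75 (odds ratios of roughly 20 and 2.1), in line with the sample estimates of 2.98 and .77 shown in the figure.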

Third, one cannot easily use log-odds ratios to compare effects across points in time, as heterogeneity might change over time. This holds for comparing effects across groups of individuals defined by time (e.g., birth cohorts) but also for comparing effects over time for the same set of individuals (e.g., in panel studies).5

Mood suggests several alternatives that are more or less capable of accounting for unobserved heterogeneity or of diminishing the problem. Alongside log-odds-based alternatives,6 she particularly proposes calculating average marginal effects (averaging the marginal effects over the observations in the sample) or their simpler approximation via linear probability models for comparing effects between groups and models.

At first glance, the dramatic conclusions presented above appear devastating for many social science studies, particularly in social stratification research. For decades, scholars have relied on logistic regression, log-linear modeling, and odds ratios for scrutinizing relationships among sociologically relevant categories like social class, education, gender, or cohort. Is it possible that all of that is flawed? As I will argue in the following, the answer is “it depends”.

3 Note that in this regard it makes no difference whether one runs a joint model including interaction terms or separate models for groups.

4 Exceptions are situations where we can reasonably assume that unobserved heterogeneity is balanced, as in randomized trials (balanced by design).

5 Regarding the latter, although conditional logit models can account for time-constant heterogeneity, unobserved time-varying heterogeneity might still affect comparisons.

6 Log-odds-based approaches include Allison’s (1999) method or heterogeneous choice models (Williams, 2009) for group comparisons and y*-standardization (Winship & Mare, 1984) for model comparisons.


Figure 1 Data example: Latent variable and binary outcomes for two groups.

[Figure: upper panels (Model 1) plot the latent variable y* against x (Group A: R² = .74, b* = 3; Group B: R² = .16, b* = 3); lower panels (Model 2) plot the binary outcome Y against x (Group A: b = 2.98, OR = 19.69; Group B: b = .77, OR = 2.17).]

Note: The underlying model is y* = 3x + e (Model 1) for both groups, with a logistically distributed error e, $\mathrm{Var}(e) = s^2\pi^2/3$, and scaling factor s = 1 for Group A and s = 4 for Group B. Variable x is normally distributed with mean 0 and variance 1, identically for both groups. The groups share the same effect of x (b* = 3) on the latent variable y* but differ in residual heterogeneity, as expressed by the R-squared (Model 1). The larger heterogeneity of Group B translates into a smaller logit coefficient and, thus, a smaller odds ratio in the logistic regression model (Model 2).

3 – One Hammer, Different Nails

It depends on the purpose of the model and the related research agenda. Let us reflect again on Models (1) and (2). Mood’s discussion rests on the premise that (2) is used as a technique for estimating effects, or effect ratios, on the outcome y* in (1). The aim of this approach is to assess effects on the propensity y*, which is conceived as an individual-level but unobserved variable. In fact, if one is substantively concerned with (1), the best solution would be to find ways of gathering more precise data, i.e., ordinal or metric instead of binary variables. That requires, first, a rigorous theoretical and conceptual assessment of the underlying propensity under study and, second, a measurement strategy. If neither is given (for instance, in the case of truly qualitative variables), any reference to an underlying latent variable obviously lacks substantive meaning.7

However, a particular research agenda may be substantively concerned not with (1), the latent variable effects model, but rather with (2), the log-odds model, which is particularly suited for studying associations between a binary dependent variable and other variables. Explicitly, this model is not concerned with an individual propensity but with a rate: the fraction of units that have a certain state (e.g., are married) relative to the fraction of units that do not (e.g., are unmarried), expressed by the odds of having that state. Model (2) has also been labeled in textbooks as the transformational approach, in contrast to the latent variable approach (Powers & Xie, 1999); it represents a non-linear transformation of a probability into the unbounded space of log-odds. To highlight the conceptual difference, in the following I will refer to the latter as an association model, synonymous with a log-odds model. Prominent applications of such models abound, for instance, in sociological research on the association between origin and destination with respect to categories like educational attainment or social class.

To think about an association model, it is helpful to remember that logistic regression, when used in an association context, can be understood as a generalization of specific log-linear models (Agresti, 2007; Powers & Xie, 1999). Log-linear modeling is a statistical tool to explore and test relationships between categorical variables in a population of units cross-classified in contingency tables. If we deal entirely with binary or categorical variables (like gender, cohort, country, education, class), a logistic regression model can be employed as a reduced and usually more accessible version of a particular log-linear model to study patterns of association in a multidimensional contingency table.
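For the simplest case, the link between a logit model and a contingency table can be checked directly. The sketch below assumes a hypothetical 2×2 table and fits a logit of Y on a binary X by maximum likelihood, with Newton-Raphson written out in plain Python rather than calling a statistics package; the slope should equal the log cross-product ratio of the table and the intercept the log-odds in the X = 0 row.

```python
import math


def fit_logit_binary_x(table, max_iter=50, tol=1e-12):
    """Maximum-likelihood logit of Y on a binary X from grouped counts.

    table maps x (0 or 1) to (count with Y=1, count with Y=0).
    Returns (a, b) for logit P(Y=1 | X=x) = a + b*x, via Newton-Raphson.
    """
    a = b = 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, (n1, n0) in table.items():
            n = n1 + n0
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = n1 - n * p            # score contribution
            w = n * p * (1.0 - p)         # information weight
            g_a += resid
            g_b += resid * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve (information matrix) * step = score.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


def log_cross_product_ratio(table):
    """ln of (n11 * n00) / (n10 * n01) for the 2x2 table."""
    (n11, n10), (n01, n00) = table[1], table[0]
    return math.log((n11 * n00) / (n10 * n01))


# Hypothetical counts: x = group membership, Y = 1 if married.
table = {1: (60, 40), 0: (30, 70)}
a, b = fit_logit_binary_x(table)
print(f"logit slope b = {b:.6f}, log cross-product ratio = {log_cross_product_ratio(table):.6f}")
print(f"intercept a = {a:.6f}, log-odds in x = 0 row = {math.log(30 / 70):.6f}")
```

Because a logit with a single binary predictor is saturated, its slope reproduces the log odds ratio of the table exactly.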

Being aware of which model fits the substantive problem at hand is crucial for researchers. The model pursued should be clearly stated, as it makes a fundamental difference in the interpretation of results. When are we interested in a latent variable effects model (1), and when in an association model (2)? Although associations and effects are often used interchangeably in practice, a clear distinction is helpful and necessary, as the term association refers to a long tradition of categorical data analysis (Goodman & Kruskal, 1954). If the concern is how certain variables affect an individual propensity driving a choice, an attitude, or a change of state, then we are basically interested in estimating the β-coefficients of the latent variable model (1), or a scaled version of them, because under usual conditions estimating them directly is not feasible. Individuals differ in that underlying propensity (there is variance), and individuals within certain groups might be more similar to each other than individuals across groups as a result of unobserved heterogeneity. Since standard logit models do not estimate the parameters of interest (β), logit coefficients cannot be compared as if they were estimates of parameters on the latent scale. In other words, if we have a theoretical idea about the individual propensity (e.g., a choice function) as a causal mechanism generating observable outcomes, and we want to assess how certain variables affect that propensity (e.g., how age drives a certain choice), we adhere to the latent variable effects model (1). If we have observed only a binary variable (i.e., the outcome of a choice), then approximating Model 1 by Model 2 is basically the result of constrained measurement.

7 One may construct cases where an underlying variable could be postulated on theoretical grounds but without any means of measuring it. In a stylized fashion, that could be seen as a social science equivalent of the dark matter discussion in cosmology.

Conversely, if we are in fact concerned with the binary outcomes themselves (as truly qualitative variables), with how they are related to other variables (like gender, social class, or income), and with how these relationships vary across time or groups, we may be more inclined towards applying an association model (2). This model estimates log-odds and multiplicative effects on the odds of having an observable state or event (odds ratios). Importantly, the odds are not necessarily an individual property (even if one could construe them as such), but a ratio between the number of units having a certain state (or event), $n_1$, and the number of those who do not, $n_0$:

$$ \mathrm{Odds} = \frac{n_1}{n_0} = \frac{\Pr(Y=1)}{1-\Pr(Y=1)}, \qquad \Pr(Y=1) = \frac{n_1}{n_0+n_1} $$

For instance, if the odds of being married are 2 for a particular social group, the probability of being married is twice as high as the probability of not being married in that group. If we compare this group with another group having marriage odds of 4, we obtain an odds ratio of 2 (or .5, depending on the reference group). Hence, the ratio of occurrence to non-occurrence of marriage is twice as large in the second group. In other words, in the second group we observe twice as many married persons per non-married person as in the first group. Note that the odds ratio provides a measure of how group membership and marriage are related that is independent of the marginal distributions of both variables, because it standardizes the ratio of event occurrences (relative frequencies) in groups a and b by the ratio of non-occurrences:

$$ \mathrm{OR} = \frac{n_{1a}}{n_{0a}}\cdot\frac{n_{0b}}{n_{1b}} = \frac{n_{1a}}{n_{1b}}\cdot\frac{n_{0b}}{n_{0a}} $$

Hence, the odds ratio, the factor difference in odds, provides a measure of association between categorical variables in a population.
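The marriage example and the margin-free property of the odds ratio can be checked with exact arithmetic. The counts below are hypothetical; the sketch uses Python’s fractions module so that equalities hold exactly.

```python
from fractions import Fraction


def odds(n_with, n_without):
    """Odds of a state: units with the state per unit without it."""
    return Fraction(n_with, n_without)


def odds_ratio(n1a, n0a, n1b, n0b):
    """Odds ratio of group a relative to group b (cross-product ratio)."""
    return odds(n1a, n0a) / odds(n1b, n0b)


def prob_difference(n1a, n0a, n1b, n0b):
    """Difference in the share with the state: Pr_a(Y=1) - Pr_b(Y=1)."""
    return Fraction(n1a, n1a + n0a) - Fraction(n1b, n1b + n0b)


# Hypothetical counts: group a has marriage odds 4, group b has marriage odds 2.
base = dict(n1a=400, n0a=100, n1b=200, n0b=100)
# Group a three times as large (row margin changed).
rows_scaled = dict(n1a=1200, n0a=300, n1b=200, n0b=100)
# Four times as many unmarried units in both groups (column margin changed).
cols_scaled = dict(n1a=400, n0a=400, n1b=200, n0b=400)

for label, t in [("base", base), ("rows scaled", rows_scaled), ("cols scaled", cols_scaled)]:
    print(f"{label}: OR = {odds_ratio(**t)}, probability difference = {float(prob_difference(**t)):.3f}")
```

The odds ratio stays at 2 under both changes of the margins, whereas the difference in probabilities moves from 2/15 to 1/6 once the marginal distribution of marriage changes.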

The association model based on log-odds ratios is conceptually entirely different from the latent variable effects model, as the former need not refer to any hypothetical variable underlying binary or multinomial observations. Moreover, although changes in the odds point in the same direction as changes in probability, effects in the log-odds model do not translate directly into effects on the probability scale. Furthermore, it is a property of association that the association between X and Y increases after accounting for additional sources of variation in Y (e.g., by controlling for an additional variable Z that is related to Y but not to X); hence, the log-odds ratio of X will increase in magnitude if we account for an additional factor that is related to Y, even if it is not related to X.8 Consequently, disentangling direct and indirect effects of an explanatory variable after including a potential confounding variable in a log-odds model is not as straightforward as in linear regression (Karlson et al., 2012).
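This property, often called the non-collapsibility of the odds ratio, can be checked with exact population arithmetic. The sketch assumes a hypothetical population in which a binary Z is independent of a binary X; the parameter values a = −1, b = 1 and the grid of c values are arbitrary choices for illustration.

```python
import math


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def logit(p):
    return math.log(p / (1.0 - p))


def conditional_log_or(a, b, c, z):
    """Log odds ratio of X within the stratum Z = z (equals b by construction)."""
    return logit(expit(a + b + c * z)) - logit(expit(a + c * z))


def marginal_log_or(a, b, c, p_z=0.5):
    """Log odds ratio of X when Z is ignored (averaged over).

    Hypothetical population: P(Y=1 | X, Z) = expit(a + b*X + c*Z),
    binary Z independent of X with P(Z=1) = p_z.
    """
    def p_y(x):
        return (1 - p_z) * expit(a + b * x) + p_z * expit(a + b * x + c)
    return logit(p_y(1)) - logit(p_y(0))


a, b = -1.0, 1.0
marginal = {c: marginal_log_or(a, b, c) for c in (0.0, 1.0, 2.0, 4.0)}
for c, value in marginal.items():
    print(f"c = {c}: log OR of X ignoring Z = {value:.3f}, "
          f"within Z strata = {conditional_log_or(a, b, c, 0):.3f}")
```

Z is unrelated to X throughout, yet the stronger Z predicts Y (larger c), the further the log odds ratio of X falls when Z is ignored; read the other way round, controlling for Z raises it.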

4 – Association and Effects: An Analogy to Quantitative Variables

Associations and effects are closely related concepts capturing the relationship between statistical variables. To make the argument clearer, it might be useful to draw an analogy to relationships among non-categorical variables. An association between two quantitative variables Y and X (like income and well-being, or age and subjective health) measures the extent to which both variables co-vary in a sample or a population.9 For quantitative variables, association is usually summarized by Pearson’s product-moment correlation:

$$ r_{XY} = \frac{E[(X - E[X])(Y - E[Y])]}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}_{XY}}{\sigma_X \sigma_Y} $$

8 The argument can be illustrated by reasoning backwards from a limiting case. Imagine a (hypothetical) deterministic situation with only one reason (trait A) for having an event: all persons having trait A have the event, while persons without A do not. As a consequence, the odds ratio for A on having the event approaches infinity; trait A is an ultimate cause of the event and provides a perfect prediction. Now let us introduce an additional condition, say a trait B that must be present for A to cause the event. Without controlling for B, the odds ratio of A takes on a finite value, as having A no longer necessarily predicts the event. One could go on and introduce additional conditions, decreasing the predictive power of trait A and consequently the odds ratio.

9 One could say that association relates to distributional dependence between variables.

Importantly, association is a symmetric concept, i.e., the association between Y and X is the same as that between X and Y, and it does not imply a causal direction. Associations can be measured conditional on third variables via a partial correlation coefficient, which measures the correlation between X and Y while holding a third variable Z constant.10 The partial correlation is given by

$$ r_{XY\cdot Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}} $$

Notably, the formula shows that the partial correlation between X and Y differs from the total correlation even if X and Z are not correlated at all. If $r_{XZ} = 0$ but $r_{YZ} \neq 0$, then

$$ r_{XY\cdot Z} = \frac{r_{XY}}{\sqrt{1 - r_{YZ}^2}} $$

Thus, even if X and Z are uncorrelated, the conditional association between X and Y ($r_{XY\cdot Z}$) is larger in absolute value than the unconditional association ($r_{XY}$) as long as Z and Y are correlated (and $r_{XY} \neq 0$). Due to the symmetry of association, this also holds in reverse if Y and Z are uncorrelated but X and Z are correlated. This ‘inflation’ of the partial correlation makes intuitive sense from an association perspective: if one particular source of variation in Y (e.g., age as a source of variation in well-being) is canceled out, the remaining sources (e.g., income) gain in relative relevance, and predictions based on them improve. The conditional association between X and Y equals the unconditional association only in special cases: either X and Z as well as Y and Z are fully uncorrelated, or, if all variables are correlated, the confounding relationship of Z with X and Y exactly offsets the inflation effect.
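The inflation argument can be checked numerically with the partial correlation formula. The correlations below are hypothetical values chosen for illustration.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_{XY.Z}."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# X and Z uncorrelated, Z related to Y: 0.3 / sqrt(1 - 0.6**2), i.e. 0.3 / 0.8.
inflated = partial_corr(0.3, 0.0, 0.6)
# Mirror case: Y and Z uncorrelated, X and Z correlated.
symmetric = partial_corr(0.3, 0.6, 0.0)
# Z unrelated to both: conditional equals unconditional association.
unchanged = partial_corr(0.3, 0.0, 0.0)
# Z related to both X and Y: confounding can offset, here even remove, the association.
confounded = partial_corr(0.3, 0.5, 0.6)
print(inflated, symmetric, unchanged, confounded)
```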

In contrast to association, the effect of X on Y (or the conditional expectation E[Y|X]), expressed by slope coefficients in a regression of Y on X, shows how a unit change in X is related to a change in Y, measured in units of Y. In this regard, Y is expressed as a function of X, as in linear regression.11 A regression model like the following can express the functional dependence of Y on X:

$$ Y = \alpha + \beta X + \varepsilon $$

The effect on Y of a unit increase in X is defined as

$$ E[Y \mid X = x + 1] - E[Y \mid X = x] = \beta $$

10 There are also semi-partial correlations, which hold the third variable constant for X or Y but not for both.

11 One could say that an effect relationship relates to functional dependence between variables.

Hence, the effect measures how the conditional expectation of Y changes with a change in X. In simple linear regression, the effect is

$$ \beta = \frac{\mathrm{Cov}_{XY}}{\sigma_X^2} $$

which equals the correlation only if X and Y have the same variance. Note that, in contrast to associations, effects are not symmetric (the effect of X on Y is not necessarily the same as the effect of Y on X).12 Moreover, estimating effects using regression presupposes a classification of variables into dependent and independent (‘explanatory’) variables. This requires at least the identification of a sensible causal arrow;13 a regression model estimating the effect of an individual’s education on the same individual’s birth weight, or a model regressing the probability of treatment on the outcome, would make no sense on substantive grounds. Particularly with regard to estimating a causal effect, it is fundamental to account for third variables that potentially confound the functional relationship between X and Y. However, contrary to association, if we add to the equation a third variable Z that is not correlated with X, the effect of X on Y conditional on Z equals the unconditional effect of X on Y. Nonetheless, the predictive power of the model will increase if Z is correlated with Y, and this might be desirable in some applications.
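That an added covariate uncorrelated with X leaves the slope unchanged while raising predictive power can be verified from population moments alone. The sketch assumes a hypothetical population Y = 2X + 1.5Z + e with uncorrelated X and Z; the helper functions simply solve the OLS normal equations.

```python
def ols_one_regressor(var_x, cov_xy, var_y):
    """Population slope and R^2 of Y on X alone."""
    return cov_xy / var_x, cov_xy ** 2 / (var_x * var_y)


def ols_two_regressors(var_x, var_z, cov_xz, cov_xy, cov_zy, var_y):
    """Population slopes of Y on X and Z and the R^2, from second moments."""
    det = var_x * var_z - cov_xz ** 2
    b_x = (cov_xy * var_z - cov_xz * cov_zy) / det
    b_z = (cov_zy * var_x - cov_xz * cov_xy) / det
    r2 = (b_x * cov_xy + b_z * cov_zy) / var_y
    return b_x, b_z, r2


# Hypothetical population: Y = 2*X + 1.5*Z + e, Var(X) = Var(Z) = 1,
# Cov(X, Z) = 0, Var(e) = 4, so Cov(X, Y) = 2 and Cov(Z, Y) = 1.5.
var_y = 2 ** 2 + 1.5 ** 2 + 4
slope_alone, r2_alone = ols_one_regressor(1.0, 2.0, var_y)
slope_with_z, slope_z, r2_with_z = ols_two_regressors(1.0, 1.0, 0.0, 2.0, 1.5, var_y)
print(f"slope of X: {slope_alone} without Z, {slope_with_z} with Z")
print(f"R^2: {r2_alone:.3f} without Z, {r2_with_z:.3f} with Z")
```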

Association and effect share the direction of the relationship between X and Y.

A positive (negative) correlation between X and Y implies a positive (negative)

functional dependence (‘effect’) between X and Y. In practice the notions of

association and effect may occasionally be used interchangeably, yet a clear analytical

distinction between both concepts remains crucial. An assessment of an association

practically makes sense only for variables that have a random distribution (random

variables). For instance, one might be interested in the association between class

background and educational attainment in an observational study, but usually one

would not be interested in the association between treatment (the distribution is fixed

by design) and outcome in a randomized trial; it is the treatment’s effect on the

outcome that is under study.

12 Nevertheless, in simple linear regression, fully standardized coefficients equal the correlation coefficient.

13 Of course, in practical applications relying on observational data, the estimation of causal effects is usually limited by problems of endogeneity and the failure to account for confounding variables.

Importantly, while an association between X and Y might be weak (as expressed by a small correlation or a small R² in a simple linear regression), the linear effect of X on Y could be large, and vice versa. This sometimes overlooked implication is illustrated by the two simple scenarios depicted in the upper panels of Figure 1. While the effect of X on Y is the same for both groups, the association is clearly stronger in Group A. The reason is simply that the association between X and Y depends on the co-variability of X and Y relative to the variability of both X and Y, while the effect of X on Y depends only on the co-variability of Y and X relative to the variability of X. Consequently, given a certain effect of X on Y, the association between X and Y is lower (higher) if the residual (potentially unobserved) heterogeneity of Y is higher (lower). For instance, the effect of income on subjective well-being could be sizable while the association between the two variables is weak, if many other factors are linked to well-being. Furthermore, associations might be preferable in comparative research settings if effects, expressed in units of Y, are not comparable across groups or countries.
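The contrast between slope and correlation can be reproduced for a linear outcome with a short simulation. The sketch assumes normally distributed x and errors, a common slope of 3, and two hypothetical residual standard deviations; it uses only the Python standard library.

```python
import math
import random


def slope_and_corr(xs, ys):
    """Sample OLS slope of y on x and Pearson correlation of x and y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def simulate_group(beta, resid_sd, n, rng):
    """Draw x ~ N(0, 1) and y = beta * x + e with e ~ N(0, resid_sd^2)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, resid_sd) for x in xs]
    return xs, ys


rng = random.Random(2015)  # fixed seed for reproducibility
results = {}
for name, resid_sd in [("low heterogeneity", 1.0), ("high heterogeneity", 6.0)]:
    results[name] = slope_and_corr(*simulate_group(3.0, resid_sd, 20_000, rng))
    slope, corr = results[name]
    print(f"{name}: slope = {slope:.2f}, correlation = {corr:.2f}")
```

In this setup the population correlations are 3/√10 ≈ .95 and 3/√45 ≈ .45, while the slope is 3 in both groups.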

Summing up, association (predictive power) and effects (functional dependence) can be treated separately in the case of linear regression. Ceteris paribus, larger residual heterogeneity affects only the correlation between X and Y but not the effect of X on Y. Similarly, the regression coefficient of X in a linear regression model will not change when covariates are added that are related to Y but not to X. Things are different in the context of binary outcomes and logistic regression. Unlike in linear regression, the concepts of association and effect are interwoven in logistic regression when one refers to the latent variable effects model (Model 1). This problem does not arise when referring to a log-odds model (Model 2) that models associations between independent variables and a binary outcome. For instance, if Y and X are qualitative or categorical variables (like social background and entrance into college, or husband’s and wife’s educational level), association relates to the co-occurrence of states or categories. Associations between purely qualitative variables can be studied via a contingency table that cross-classifies observations by combinations of categories. In this regard, a common metric of association is the odds ratio, which can be derived from a contingency table. A major part of the sociological stratification literature is concerned with this concept of association, particularly in the context of log-linear modeling for assessing associations within multidimensional tables.

5 – Average Marginal Effects versus Odds Ratios

In response to Mood’s discussion, Auspurg & Hinz (2011), among others, proposed the use of average marginal effects (AMEs) instead of log-odds ratios as a more refined effect measure in the context of group or cohort comparisons. Replicating previous work by Hadjar & Berger (2010) on cohort trends in social inequalities in educational attainment, Auspurg & Hinz (2011) argue that relying on odds ratios as “effect measures” is fallible because they suffer from unobserved heterogeneity between cohorts. In the course of their critique, they go on to argue that average marginal effects should be reported in studies of social stratification because they are much more immune to unobserved heterogeneity.

First, the claim that AMEs are immune to unobserved heterogeneity is not generally true. The example given above in Figure 1 provides a counterexample: the AME of x on the probability of Y = 1 is .34 in Group A compared to .17 in Group B, although the effect on the underlying latent variable is the same by design. Consequently, if we were interested in effects on the propensity y*, a comparison based on AMEs would yield the misleading conclusion that the effect of x on y* is half as large in Group B.
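The counterexample can be reproduced without simulation by computing the population analogue of the AME for the Figure 1 design. The sketch assumes, as a reading of the figure note, an intercept of zero (threshold at y* = 0), x ~ N(0, 1), and logit coefficients β/s of 3 and 0.75; under these assumptions it should return values close to the .34 and .17 reported above, which come from simulated samples.

```python
import math


def logistic_density(v):
    """Derivative of the logistic CDF, expit(v) * (1 - expit(v)); symmetric in v."""
    e = math.exp(-abs(v))   # using |v| avoids overflow for large negative v
    return e / (1.0 + e) ** 2


def normal_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def population_ame(b, lo=-8.0, hi=8.0, n=4000):
    """Population AME of x on P(Y=1) when P(Y=1 | x) = expit(b * x), x ~ N(0, 1).

    The marginal effect at x is b * expit'(b * x); it is averaged over the
    standard normal density of x with the trapezoidal rule on [lo, hi].
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * b * logistic_density(b * x) * normal_pdf(x)
    return total * h


# Logit coefficients implied by the Figure 1 design: beta / s with beta = 3.
ame_a = population_ame(3.0 / 1.0)   # Group A, s = 1
ame_b = population_ame(3.0 / 4.0)   # Group B, s = 4
print(f"AME Group A: {ame_a:.3f}, AME Group B: {ame_b:.3f}")
```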

Second, Auspurg & Hinz (2011) discuss the latent variable model but do not specify conceptually how this model applies to the problem at hand: assessing trends in educational inequalities. Unobserved heterogeneity, which they treat as a nuisance factor potentially distorting effects, may itself be part of the phenomenon under study: if other factors become more relevant for educational attainment, the relative influence of social origin on educational attainment diminishes. Social origin then becomes less predictive of attainment, and log-odds ratios decline across cohorts. As a measure of association, the odds ratio captures not only the effect of social origin but also the effects of all other unobserved factors (Buis, 2015).

Thinking of educational attainment as the outcome of a latent variable, one could legitimately ask whether a decline in inequality is driven by increasing heterogeneity of educational pathways, by a declining causal effect of social background, or by a mix of changing heterogeneity and changing effects. Yet posing this question is only sensible with a theoretical reference to the (unmeasured) underlying variable. Alternatively, if one wants to study on a more descriptive level how the association between social origin and educational attainment is changing, there is no particular problem in relying on odds ratios.
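To illustrate how the odds ratio, as a measure of association, also reflects other factors, the following Python sketch assumes a hypothetical latent model in which the effect of a binary social origin on the latent propensity is held fixed (β = 1), while an unobserved binary factor z, independent of origin, becomes more relevant across hypothetical cohorts (γ increases). After averaging over z, the log-odds ratio between origin and attainment declines as γ grows, although the latent effect of origin is unchanged. The model, names, and parameter values are illustrative assumptions, not estimates from Hadjar & Berger (2010) or Auspurg & Hinz (2011).

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def marginal_log_odds_ratio(beta, gamma):
    """Log-odds ratio of attainment by binary origin, averaged over an unobserved factor.

    Hypothetical latent model: y* = alpha + beta * origin + gamma * z + e, with e
    standard logistic, Y = 1 if y* > 0, and z a binary factor independent of origin
    with P(z = 1) = 0.5. alpha = -gamma / 2 keeps P(Y = 1 | origin = 0) at .5.
    """
    alpha = -gamma / 2.0

    def p_attain(origin):
        # Average P(Y = 1 | origin, z) over the two equally likely values of z
        return (0.5 * logistic_cdf(alpha + beta * origin)
                + 0.5 * logistic_cdf(alpha + beta * origin + gamma))

    p1, p0 = p_attain(1), p_attain(0)
    return math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))

# Same latent effect of origin (beta = 1) in every hypothetical cohort
for gamma in (0.0, 2.0, 4.0):
    print(gamma, round(marginal_log_odds_ratio(beta=1.0, gamma=gamma), 3))
```

Whether such a decline should be read as "less inequality" or as "more heterogeneity" is exactly the conceptual question raised above; the arithmetic alone cannot settle it.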

Third, a strong reliance on average marginal effects is by no means unproblematic in comparative settings. For instance, Auspurg & Hinz (2011) assess how the average marginal effect of social class on the probability of attaining higher education changes over cohorts. An AME, however, measures an absolute distance on the probability scale, which is bounded between zero and one. A given distance may therefore have different meanings (e.g., in terms of relative risks) depending on where it is located on the scale. Comparing an average probability difference of .10 across contexts with different baseline probabilities is thus not straightforward, given the bounded, non-linear nature of the probability scale.
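The dependence on the baseline can be made concrete with a short Python sketch. The function name `contrasts` and the baseline values are hypothetical; the sketch holds an absolute probability difference of .10 fixed and reports the risk ratio and the odds ratio it implies at several baseline probabilities.

```python
def contrasts(p0, diff=0.10):
    """Relative contrasts implied by a fixed absolute difference in probabilities.

    p0: baseline probability in the reference context (hypothetical values below).
    Returns (risk ratio, odds ratio) for p1 = p0 + diff.
    """
    p1 = p0 + diff
    risk_ratio = p1 / p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return risk_ratio, odds_ratio

for p0 in (0.05, 0.45, 0.80):
    rr, odds_r = contrasts(p0)
    print(f"baseline {p0:.2f}: risk ratio {rr:.2f}, odds ratio {odds_r:.2f}")
```

At a baseline of .05 the same difference of .10 implies a threefold risk, whereas at a baseline of .80 it implies a risk ratio of only 1.125.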

Fourth, while log-odds ratios are parameters of a probability function, average marginal effects are not. An AME depends not only on the parameters of the probability function but also on the joint distribution of covariates in a sample. AMEs could rather be seen as implications of the parameters once they are put in a context (e.g., a sample of individuals in a particular country or cohort).14 Hence, while AMEs are illustrative for a particular set of data, their analytical value for finding more general parameters that can be abstracted from the data and used for prediction is strongly limited. This limitation becomes even more serious when the variance of the marginal effects around their average is large (that is, if the model captures a lot of heterogeneity). Furthermore, one should keep in mind that statistical tests of logit coefficients and of AMEs differ, which may lead to contradictory inferential conclusions.
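The dependence of the AME on the covariate distribution can be sketched in Python under simple assumptions: a logit model P(Y = 1 | x) = Λ(α + βx) with a continuous x and identical hypothetical parameters (α = 0, β = 1) in two hypothetical samples that differ only in their values of x. The individual marginal effect is β·p·(1 − p); the AME is its sample mean, and the minimum and maximum show the variation that the average hides. Function and variable names are illustrative.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_effect(alpha, beta, xs):
    """AME of a continuous x in the logit model P(Y=1|x) = Lambda(alpha + beta * x).

    Each individual marginal effect is dP/dx = beta * p * (1 - p); the AME is its
    mean over the covariate values xs. Also returns the smallest and largest
    individual effect.
    """
    effects = []
    for x in xs:
        p = logistic_cdf(alpha + beta * x)
        effects.append(beta * p * (1.0 - p))
    return sum(effects) / len(effects), min(effects), max(effects)

# Identical parameters (log-odds ratio of 1 per unit of x) in two hypothetical samples
alpha, beta = 0.0, 1.0
sample_a = [-1.0, 0.0, 1.0]  # covariate values where p is near .5
sample_b = [1.0, 2.0, 3.0]   # covariate values where p is close to 1
for name, xs in (("A", sample_a), ("B", sample_b)):
    ame, low, high = average_marginal_effect(alpha, beta, xs)
    print(f"sample {name}: AME {ame:.3f} (individual effects {low:.3f} to {high:.3f})")
```

The log-odds ratio is 1 in both samples, yet the AME in sample B is roughly half that in sample A, solely because the covariate values differ.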

Finally, in the field of stratification research, the distinction between absolute and relative inequality among groups should be kept in mind. For instance, a log-odds ratio of 2 between Group A and Group B in attaining higher rather than lower education (that is, odds about 7.4 times as high in Group A) translates into a probability difference of about .38 if the probability in the reference group is .50. If instead the probability in the reference group were .80, the same log-odds ratio of 2 would translate into a probability difference of only about .17. While the relative inequality expressed by the odds ratio is the same in both settings, the absolute inequality expressed by the probability difference is smaller in the second setting. By focusing only on probability differences, one misses an important feature of the data. For that reason, research agendas should be clear about which concept of inequality they are pursuing.

14 In addition, the AME reflects only the central tendency of the effects in the sample and not their variation.
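The probability differences of about .38 and .17 in this example are consistent with a difference of 2 on the log-odds scale, that is, an odds ratio of e² ≈ 7.39. A minimal Python sketch of the conversion from a log-odds ratio and a reference-group probability to an absolute probability difference follows; the function name is hypothetical.

```python
import math

def probability_difference(log_odds_ratio, p_ref):
    """Absolute probability difference implied by a log-odds ratio.

    p_ref: probability of the outcome in the reference group.
    The other group's odds are the reference odds times exp(log_odds_ratio).
    """
    odds_ref = p_ref / (1.0 - p_ref)
    odds_other = odds_ref * math.exp(log_odds_ratio)
    p_other = odds_other / (1.0 + odds_other)
    return p_other - p_ref

for p_ref in (0.50, 0.80):
    print(p_ref, round(probability_difference(2.0, p_ref), 2))  # 0.38, then 0.17
```

The relative contrast (the log-odds ratio of 2) is held fixed in both calls; only the reference probability changes, and with it the absolute difference.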

6 – Conclusions

The aim of my paper was to shed some light on a confusing debate on the comparability of logit coefficients that has emerged in recent years. As I have argued, the issues raised by Mood (2010) and others do not apply to all research agendas. Importantly, logistic regression can serve different ends. It can be used to fit log-odds models that analyze rates of event occurrence in populations. It may also be used to estimate effects on a latent propensity, which is unobserved but assumed to generate the binary observations. These are two very different methodological approaches that are easily conflated because they rely on similar statistical techniques. I would therefore conclude that researchers should think more carefully about their dependent variable. For instance, educational mobility studies analyzing the link between origin and destination should first clarify whether the investigation centers on the different rates of attainment by origin or rather on the causal effect of origin on the process of attainment itself. A log-odds model serves the first purpose, whereas a latent-variable model serves the second. In addition, I discussed several problems that may result from the increasing reliance on average marginal effects in stratification research.

References

Agresti, A. (2007). An Introduction to Categorical Data Analysis (2nd ed.). Hoboken, NJ: Wiley.

Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups.

Sociological Methods & Research, 28, 186–208.

http://doi.org/10.1177/0049124199028002003

Auspurg, K., & Hinz, T. (2011). Gruppenvergleiche bei Regressionen mit binären

abhängigen Variablen – Probleme und Fehleinschätzungen am Beispiel von

Bildungschancen im Kohortenverlauf. Zeitschrift Für Soziologie, 40, 62–73.

Retrieved from http://zfs-online.org/index.php/zfs/article/view/3058

Breen, R., Holm, A., & Karlson, K. B. (2014). Correlations and Non-Linear

Probability Models. Sociological Methods & Research, 43(4), 571–605.


http://doi.org/10.1177/0049124114544224

Buis, M. L. (2015). Logistic regression: Why we often can do what we think we.

Goodman, L. A., & Kruskal, W. H. (1954). Measures of Association for Cross

Classifications. Journal of the American Statistical Association, 49(268), 732–

764.

Hadjar, A., & Berger, J. (2010). Dauerhafte Bildungsungleichheiten in

Westdeutschland, Ostdeutschland und der Schweiz: Eine Kohortenbetrachtung

der Ungleichheitsdimensionen soziale Herkunft und Geschlecht. Zeitschrift Für

Soziologie, 39(3), 182–201.

Jacob, M., Klein, M., & Iannelli, C. (2015). The Impact of Social Origin on

Graduates’ Early Occupational Destinations - An Anglo-German Comparison.

European Sociological Review, 31(4), 460–476.

http://doi.org/10.1093/esr/jcv006

Karlson, K. B., Holm, A., & Breen, R. (2012). Comparing Regression Coefficients

Between Same-sample Nested Models Using Logit and Probit: A New Method.

Sociological Methodology, 42, 286–313.

http://doi.org/10.1177/0081175012444861

Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do,

and what we can do about it. European Sociological Review, 26(1), 67–82.

http://doi.org/10.1093/esr/jcp006

Norton, E. C. (2012). Log Odds and Ends. NBER Working Paper Series. Retrieved from http://papers.nber.org/papers/W18252?utm_campaign=ntw&utm_medium=email&utm_source=ntw

Powers, D. A., & Xie, Y. (1999). Statistical Methods for Categorical Data Analysis.

Academic Press.

Williams, R. (2009). Using Heterogeneous Choice Models to Compare Logit and

Probit Coefficients Across Groups. Sociological Methods & Research, 37, 531–

559. http://doi.org/10.1177/0049124109335735

Winship, C., & Mare, R. D. (1984). Regression Models with Ordinal Variables.

American Sociological Review, 49, 512–525. http://doi.org/10.2307/2095465
