
One Hammer, Different Nails – A Note on the Confusing Sociologists’ Debate on Comparing Coefficients in Logistic Regression

Jan Skopek, European University Institute,
Working Paper, December 2015
1 – Introduction¹
It has been argued that, in contrast to linear regression coefficients, coefficients in logistic
regression models (and in non-linear models in general) cannot be directly compared
across different samples, groups and models (Allison, 1999; Mood, 2010; Winship &
Mare, 1984). In a seminal paper, Mood elaborates that the issue of comparability
arises when one wants to interpret these coefficients as estimates of ‘substantive’
effects. Specifically, Mood (2010, p. 67f) asserts that it is problematic (1) to interpret
odds ratios as substantive effects, since they also reflect unobserved heterogeneity, (2)
to compare odds ratios across nested models, because unobserved heterogeneity is
likely to vary across models, and (3) to compare odds ratios from the same model
across samples, groups, or over time, because unobserved heterogeneity can vary
across samples, groups, or time. Using a fictitious data example, the author recommends
average marginal effects as more robust effect estimates. Subsequently, the recurring
critique of odds ratios as being hard to interpret was reinforced by the
argument that odds ratios are even more problematic as effect measures than previously
thought because they are directly derived from logit coefficients (Norton, 2012).
1 While working on this paper, it came to my attention only late that Maarten L. Buis
published a very similar paper on his website in summer 2015, “Logistic regression: Why we often can
do what we think we can do” (Buis, 2015). While there are some differences in the argumentation, both
papers share the same idea and consequently have much in common. Hence, I would like to direct
readers to Buis (2015), who came first and who deserves all the credit for raising the issue. My
paper differs in that I try to emphasize the different research agendas (model
purposes) for which logistic regression may be applied and to highlight some problematic issues
with relying solely on average marginal effects in recent stratification studies. Yet, I decided
to cease working on this paper and to leave it in its current form for any researcher who might be
interested.
Looking at practice in the quantitative social sciences, particularly
stratification research, one notices immediately that Mood’s paper (more than 700
times cited according to Google Scholar) was eminently influential in persuading
applied quantitative scholars to abandon the concept of odds ratios and to turn
towards average marginal effects as a robust workaround for comparing effect sizes.
Moreover, previous studies have been criticized for using odds ratios instead of average
marginal effects (for instance, see the critique of Hadjar & Berger, 2010, by Auspurg &
Hinz, 2011). With reference to Mood’s work, several scholars even decided to
abandon logistic regression and the concept of odds ratios and to fall back on linear
probability models for studying binary outcomes (e.g., Jacob, Klein, & Iannelli, 2015,
p. 465). Concurrently, scholars of social science methodology have suggested various
methodological approaches that account for differences in heterogeneity across groups
(Williams, 2009), samples (Breen, Holm, & Karlson, 2014), and nested models
(Karlson, Holm, & Breen, 2012) when estimating effects in non-linear models.
In turn, I argue that the debate has so far been unbalanced and confusing because it
lacks a more differentiated discussion of the various modeling purposes. Importantly, this
is not a critique of the work of Allison, Mood and others per se. The core arguments
of the methodological debate are inherently valid and have been laid out
impressively. However, they do not apply to all research agendas concerned
with binary or categorical outcomes, and this is the simple but crucial
point I am making. The likely consequence is an at least questionable practice in social
science. A remarkable number of studies misinterpret the
problem with logistic regression and are therefore prone to drawing misleading conclusions.
Given what logistic regression achieved when it was introduced, it is ironic that nowadays many
scholars tend to discard it although it is actually an appropriate tool for tackling
their research questions.
As an analytical framework, I propose a conceptual distinction between (a)
descriptive models dealing with associations among variables and (b) models with
causal primacy that are concerned with effects/dependence of variables. Association
among variables describes how strong the predictive relation between two variables
is, while effects/dependence of variables describes how a change in one variable is
related to a change in another.² Bearing this distinction in mind, in the following I will
develop my core argument that although comparability issues validly apply to causal
modeling purposes, they do not apply to descriptive purposes. Beyond this, techniques
provided for remedying comparability issues in a causal approach are not applicable
(and may even be misleading) in a descriptive agenda. My contribution to the
literature is of particular relevance given that a large part of social stratification
research effectively pursues descriptive and not causal modeling.
2 No doubt, scholars should be clear about whether they are interested in
studying associations or effects. Nevertheless, applied research shows that these two fundamental
analytical and statistical concepts are sometimes conflated.
In the following, I will first briefly recapitulate the key points of the debate regarding
the comparability of log-odds ratios. In a second step, I will discuss the point
that logistic regression is a statistical tool that can serve different ends, i.e. different
purposes of modelling. Afterwards, I will establish some important conceptual links
to the concepts of association and functional dependence in the world of linear
regression models. Finally, I will discuss some caveats with regard to the increasing
reliance on average marginal effects instead of odds ratios in stratification research.
2 – The Problem of Comparing Logit Coefficients
Mood motivates her paper by arguing that applied social science is not aware of the
shortcomings of logistic regression when it comes to the estimation of
what she calls ‘substantive’ effects. She demonstrates the problem of comparability of
log-odds ratios starting from a latent variable model. A latent
variable can be understood as a continuous random variable ($y^*$) that is unobserved
but operates in the background, generating observable binary outcomes. This factor
may resemble a propensity of being in a certain qualitative state (e.g. the participation
in a collective action, Mood, 2010, p. 68). Once the factor value exceeds a certain
threshold ($y^* > \tau$) we observe a one, otherwise a zero ($y^* \le \tau$). Hence, the latent
factor, even if not observed, analytically represents a substantive mechanism explaining
why a certain state has been observed (referring to the example, this could be ‘a taste
of participation’, Mood, 2010, p. 68). Starting with a simple example assuming only
one explanatory variable in the latent model:
$$ y_i^* = \alpha + x_{1i}\beta_1 + \varepsilon_i \tag{1} $$
with $y_i^*$ the individual propensity, $x_{1i}$ the value of the explanatory variable observed
for individual $i$, and an error term $\varepsilon_i$, assuming that errors are independent of $x$.
Constant term $\alpha$ represents the location parameter (origin) while $\beta_1$ represents the
scale parameter (unit change). If only a binary response was observed, the model can
be estimated via logistic regression under the assumption that the error term is
logistically distributed with a fixed variance of $\pi^2/3 \approx 3.29$:
$$ \ln\frac{P}{1 - P} = a + x_{1i} b_1 \tag{2} $$
with $P$ as the probability of the observed binary variable $Y = 1$, $b_1$ the log-odds ratio,
and $\exp(b_1)$ the odds ratio. Since the variance of $y^*$ is not identified, the underlying
parameter $\beta_1$ cannot be identified either. By fixing the residual variance, the logit
model fails to estimate $\beta_1$ and instead estimates the log-odds ratio $b_1$, which reflects both
the residual heterogeneity of the propensity $y^*$ and the effect of $x$ on that propensity.
More precisely, $b_1$ is standardized by (the square root of) the ratio of the true residual variance in $y^*$ to
the fixed variance 3.29 (rescaling effect). Therefore, what the logit model estimates is:
$$ \frac{y_i^*}{s} = \frac{\alpha}{s} + x_{1i}\frac{\beta_1}{s} + \frac{\varepsilon_i}{s} $$
with $s = \sqrt{\mathrm{Var}(\varepsilon_i)/3.29}$ the scaling factor, $\alpha/s = a$, and $\beta_1/s = b_1$. This has several
important implications. First, one cannot compare logit coefficients as what she
calls ‘substantive effects’ on the latent variable between nested models. Adding more
predictors (that are correlated with $y^*$ but not with $x$) to the logit equation will
necessarily lead to an increase of $b_1$ due to the rescaling effect, since the residual variance
$\mathrm{Var}(\varepsilon)$ decreases when more relevant variables are added. This must not be
misinterpreted as a suppressor effect (i.e. the additional predictor is positively
correlated with $y^*$ but negatively correlated with $x$). Conversely, if $b_1$ does not
change compared to the restricted model, that does not indicate the absence of a
confounding effect of third variables (i.e. the additional predictor is positively related
to both $y^*$ and $x$); confounding brings the coefficient down and rescaling brings it up
again, so that both effects might offset each other.
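
The rescaling effect in nested models can be made concrete with a small worked example. It is only a sketch under assumptions added here for illustration and not taken from Mood's exposition: the additional predictor $z$ has variance $\sigma_z^2$ and is independent of $x_1$, the error of the full model is exactly logistic, and the composite error of the reduced model is treated as approximately logistic (which is why the reduced-model results are approximations). The numerical values are hypothetical.

```latex
\begin{aligned}
\text{Full model:}\quad
  & y_i^* = \alpha + x_{1i}\beta_1 + z_i\beta_2 + \varepsilon_i,
    \quad \mathrm{Var}(\varepsilon_i) = 3.29
    \;\Rightarrow\; s = 1,\quad b_1 = \beta_1 \\
\text{Reduced model ($z$ omitted):}\quad
  & \mathrm{Var}(z_i\beta_2 + \varepsilon_i) = \beta_2^2\sigma_z^2 + 3.29
    \;\Rightarrow\; s \approx \sqrt{\frac{\beta_2^2\sigma_z^2 + 3.29}{3.29}},
    \quad b_1 \approx \frac{\beta_1}{s} \\
\text{Example } (\beta_1 = 1,\ \beta_2 = 2,\ \sigma_z^2 = 1):\quad
  & s \approx \sqrt{7.29/3.29} \approx 1.49,
    \quad b_1 \approx 1/1.49 \approx 0.67
\end{aligned}
```

Under these assumptions, adding $z$ to the logit equation moves the coefficient of $x_1$ from roughly 0.67 to 1 although $z$ is unrelated to $x_1$ and $\beta_1$ has not changed.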
Second, one cannot directly assess effects between groups (e.g. between men
and women) by inspecting group-specific log-odds ratios. Even if the effect of $x$ on
the underlying factor ($\beta_1$) is the same across group A and group B, the effect on the
log-odds scale ($b_1$) will not be the same if one group is more heterogeneous than the
other.³ Unbalanced heterogeneity across groups is very likely to be present in
observational data and will thus distort direct effect comparisons.⁴ For instance, if
the log-odds ratio is smaller in group A than in group B, that can be due to a smaller effect
$\beta_1$ in group A, to a larger residual heterogeneity in group A, or to a combination
of both (Figure 1 provides an example using simulated data). Conversely, group
differences in unobserved heterogeneity and in effects may also cancel each other out,
so that group-specific log-odds ratios are equal.
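
The group comparison problem can be reproduced with a small simulation. The following Python sketch is an illustration under stated assumptions, not the code behind Figure 1: the threshold is set to zero, $\alpha = 0$, the sample size of 20,000 per group and the seed are arbitrary choices, and the helper names `simulate_group` and `fit_logit` are hypothetical. Both groups share $\beta_1 = 3$; the error is a standard logistic draw multiplied by $s = 1$ (Group A) or $s = 4$ (Group B).

```python
import math
import random


def simulate_group(n, beta, s, rng):
    """Draw (x, y) with y* = beta * x + s * e, e standard logistic, and y = 1 if y* > 0."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log(u / (1 - u)) finite
        e = math.log(u / (1.0 - u))  # inverse-CDF draw from the standard logistic
        data.append((x, 1 if beta * x + s * e > 0.0 else 0))
    return data


def fit_logit(data, iterations=15):
    """Maximum-likelihood fit of ln(P / (1 - P)) = a + b * x by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
_, b_a = fit_logit(simulate_group(20000, beta=3.0, s=1.0, rng=rng))
_, b_b = fit_logit(simulate_group(20000, beta=3.0, s=4.0, rng=rng))
print(f"Group A (s=1): b = {b_a:.2f}")
print(f"Group B (s=4): b = {b_b:.2f}")
```

Because the rescaled error is exactly logistic in this setup, the fitted slopes should lie close to $\beta_1/s$, that is, near 3 in Group A and near 0.75 in Group B, although the effect on the latent propensity is identical in both groups.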
Third, one cannot easily use log-odds ratios to compare effects across points in
time, as heterogeneity might change over time. This holds true for comparing effects
across groups of individuals defined by time (e.g. birth cohorts) but also for
comparing effects across time for the same set of individuals (e.g. in panel studies).⁵
Mood suggests several alternatives that are more or less capable of
accounting for unobserved heterogeneity or of diminishing the problem. Alongside log-odds
based alternatives,⁶ she particularly proposes the calculation of average marginal
effects (averaging the marginal effects over the data points in the sample) or their
simpler approximation via linear probability models for comparisons of effects
between groups and models.
At first glance, the dramatic conclusions presented above appear to be
devastating for many social science and particularly social stratification studies. For
decades, scholars relied on logistic regression, log-linear modeling and odds ratios for
scrutinizing relationships among sociologically relevant categories like social class,
education, gender, or cohort. Is it possible that all that is flawed? As I will argue in
the following, the answer is “it depends”.
3 Note that in this regard it makes no difference whether one runs a joint model including
interaction terms or separate models for groups.
4 Exceptions are situations where we can reasonably assume that unobserved heterogeneity is
balanced, as in randomized trials (balanced by design).
5 Regarding the latter, although conditional logit models might account for time-constant
heterogeneity, unobserved time-varying heterogeneity might still affect comparisons.
6 Log-odds based approaches like Allison’s (1999) method or heterogeneous choice models
(Williams, 2009) for group comparisons and y*-standardization (Winship & Mare, 1984) for model
Figure 1 Data example: Latent variable and binary outcomes for two groups.
Note: Underlying model is y*=3x + e (Model 1) for both groups with [...] and scaling factor s=1 for
Group A and s=4 for Group B. Variable x is normally distributed with mean 0 and variance 1, identically for both
groups. Groups share the same effect of x (b*=3) on the latent variable y* but differ in the residual heterogeneity
as expressed by the R-squared (Model 1). The larger heterogeneity of Group B translates into a smaller logit
coefficient and, thus, a smaller odds ratio in the logistic regression model (Model 2).
3 – One Hammer, Different Nails
It depends on the purpose of the model and the related research agenda. Let us reflect
again on Models (1) and (2). Mood’s discussion includes the premise that (2) is used
as a technique for estimating effects or effect ratios on outcome Y* in (1). The aim
behind this approach is to assess effects on the propensity Y*, which is
conceived as an individual-level but unobserved variable. In fact, if one is
substantively concerned with (1), the best solution would be to find ways to gather
more precise data, ordinal or metric instead of binary variables. That requires, first, a
rigorous theoretical and conceptual assessment of the underlying propensity under
study and, second, a measurement strategy. If neither is given (for instance in the case
of truly qualitative variables), any reference to an underlying latent variable is
obviously without substantive meaning.⁷
However, it could be the case that a particular research agenda is substantively
not concerned with (1), the latent variable effects model, but rather with (2), the
log-odds model, which is particularly suited for studying associations between a
binary dependent variable and other variables. Explicitly, this model is not concerned
with an individual propensity, but with a rate: the fraction of units that have a
certain state (e.g., are married) versus the fraction of units that do not (e.g., those
who are unmarried), represented by the odds of having that state. Model (2) has also
been labeled in textbooks as the transformational approach in contrast to the latent
variable approach (Powers & Xie, 1999); it represents a non-linear transformation of a
probability into the unbounded space of log-odds. To highlight the
conceptual difference, I will in the following refer to Model (2) as an association
model, synonymously with a log-odds model. Prominent applications of such models
abound, for instance, in the sociological study of the association
between origin and destination with respect to sociological categories like educational
attainment or social class.
To think about an association model it is helpful to remember that logistic
regression when used in an association context resembles a generalization of specific
log-linear models (Agresti, 2007; Powers & Xie, 1999). Log-linear modeling is a
statistical tool to explore and test relationships between categorical variables in a
population of units cross-classified by contingency tables. If we deal entirely with
binary or categorical variables (like gender, cohort, country, education, class) a
logistic regression model can be employed as a reduced and usually more accessible
version of a particular log-linear model to study patterns of associations in a multi-
dimensional contingency table.
7 One may construct cases where an underlying variable could be postulated on theoretical grounds but
without any means of measuring it. In a stylized fashion, that could be seen as a social science
equivalent to the dark matter discussion in cosmology.
Being aware of which model fits the substantive problem at hand is crucial
for researchers. The model they pursue should be clearly stated, as it makes a
fundamental difference to the interpretation of results. When are we interested in a
latent variable effects model (1) and when in an association model (2)? Although
associations and effects are often used interchangeably in practice, a clear distinction
is helpful and necessary, as the term association refers to a long tradition of
categorical data analysis (Goodman & Kruskal, 1954). If the concern relates to how
certain variables affect an individual propensity driving a choice, an attitude, or a
change of state, then we are basically interested in estimating the $\beta$-coefficients of the
latent variable model (1), or a scaled version of them, because under usual conditions
estimating them directly is not feasible. Individuals differ in that underlying
propensity (there is a variance), and individuals within certain groups might be more
similar than individuals in different groups as a result of unobserved heterogeneity.
Since standard logit models do not estimate the parameters of interest ($\beta$), logit
coefficients cannot be compared as if they were estimates of parameters on the latent
scale. In other words, if we have a theoretical idea about the individual propensity
(e.g., a choice function) as a causal mechanism generating observable outcomes and
we want to assess how certain variables affect that propensity (e.g., how age drives a
certain choice), we adhere to the latent variable effects model (1). If we have
observed only a binary variable (i.e., the outcome of a choice), then approximating
Model 1 by Model 2 is basically the result of a constrained measurement.
Conversely, if we are in fact concerned with the binary outcomes themselves (as truly
qualitative variables) and with the question of how they are related to other variables (like
gender, social class or income) and how these relationships vary across time or
groups, we may be more inclined towards applying an association model (2). This
model estimates log-odds and multiplicative effects on the odds of having an
observable state or event (odds ratios). Importantly, the odds are not necessarily an
individual property (even if one could construct them as such), but a ratio between those having a
certain state (or event) and those who do not ($\mathrm{odds} = P_1/(1 - P_1) = P_1/P_0$, with
$P_1 = \Pr(Y = 1)$ and $P_0 = 1 - P_1$). For instance, if for a particular social group the odds of being married are
2, the probability of being married is twice as high as the probability of not being
married in that group. If we compare with another group having marriage odds of 4,
we calculate an odds ratio of 2 (or .5 depending on the reference group). Hence, the
ratio of occurrence versus non-occurrence of marriage in the second group is twice as
large. In other words, for the second group we observe twice as many married persons
per non-married person as in the first group. Note that the odds ratio
provides a measure of how group membership and marriage are related that is
independent of the marginal distributions of both variables, because it standardizes the
ratio of the probabilities of event occurrence (or relative frequencies) by the ratio of the
probabilities of non-occurrence ($\mathrm{OR} = (P_{1A}/P_{1B})/(P_{0A}/P_{0B})$, with A and B indexing the two groups). Hence, the odds ratio,
the factor difference in odds, provides a measure of association of categorical
variables in a population.
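
The marriage example and the invariance of the odds ratio to the marginal distributions can be checked in a few lines. The Python sketch below is illustrative only: the counts are hypothetical, chosen to reproduce the odds of 2 and 4 used in the text, and the function names are assumptions of this example.

```python
def odds(p):
    """Odds of a state given its probability p."""
    return p / (1.0 - p)


def odds_ratio(table):
    """Odds ratio of a 2x2 table; rows are groups, columns are (state, no state)."""
    (a, b), (c, d) = table
    return (a / b) / (c / d)


def prop_diff(table):
    """Difference between the two groups in the proportion having the state."""
    (a, b), (c, d) = table
    return a / (a + b) - c / (c + d)


# Hypothetical counts of (married, not married): odds of 4 and odds of 2.
base = [(400, 100), (200, 100)]
# Ten times as many cases in the first group (a different group-size margin).
scaled_rows = [(4000, 1000), (200, 100)]
# Three times as many married persons in both groups (a different marriage margin).
scaled_cols = [(1200, 100), (600, 100)]

for name, table in [("base", base), ("rows", scaled_rows), ("cols", scaled_cols)]:
    print(name, round(odds_ratio(table), 3), round(prop_diff(table), 3))
```

The odds ratio stays at 2 in all three tables, whereas the difference in proportions changes once the marriage margin is altered. This is the sense in which the odds ratio is independent of the marginal distributions.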
The association model based on log-odds ratios is an entirely different
conceptual model from the latent variable effects model, as the former need
not make any reference to a hypothetical variable underlying binary or multinomial
observations. Moreover, although changes in odds point in the same direction as changes
in probability, effects in the log-odds model do not translate directly into effects on
the probability scale. Furthermore, it is a property of an association that the
association between X and Y will increase after accounting for additional sources of
variation in Y (e.g., by controlling for an additional variable Z that is related to Y but
not to X); hence, the log-odds ratio of X is going to rise if we account for an
additional factor that is related to Y, even if it is not related to X.⁸ Consequently,
disentangling direct and indirect effects of an explanatory variable after including a
potential confounding variable in a log-odds model is not as straightforward as in
linear regression (Karlson et al., 2012).
4 – Association and Effects: An Analogy to Quantitative Variables
Associations and effects are closely related concepts capturing the relationship
between statistical variables. To make the argument clearer, it is useful to draw
an analogy to relationships among non-categorical variables. An association between
two quantitative variables Y and X (like income and well-being or age and subjective
health) measures to what extent both variables co-vary in a sample or a population.⁹
For quantitative variables, association is usually summarized by Pearson’s
product-moment correlation coefficient:
$$ r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - E[X])(Y - E[Y])]}{\sigma_X \sigma_Y} $$
8 That argument can be illustrated by reverse induction. Imagine a (hypothetical) deterministic
situation where there is only one reason (trait A) for having an event. That is, all persons having trait A
have an event while persons not having A do not have an event. As a consequence, the odds ratio for A
on having the event approaches infinity: trait A is the ultimate cause of the event and provides a
perfect prediction. Now let’s introduce an additional condition, say having a trait B that must be
present for A to cause the event. If B is not controlled for, the odds ratio of A takes on a finite value, as
having A does not necessarily predict the event. One could go on and introduce additional conditions,
decreasing the predictive power of trait A and consequently the odds ratio.
9 One could say that association relates to distributional dependence between variables.
Importantly, association is a symmetric concept, i.e. the association between Y
and X is the same as that between X and Y, and does not imply a causal direction. Associations
can be measured conditional on third variables via a partial correlation coefficient,
which measures the correlation between X and Y while canceling out associations
with a third variable Z.¹⁰ The partial correlation is given by
$$ r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}} $$
Notably, the formula shows that the partial correlation between X and Y is
different from the total correlation even if X and Z are not correlated at all. If $r_{XZ} = 0$
but $r_{YZ} \neq 0$ then
$$ r_{XY \cdot Z} = \frac{r_{XY}}{\sqrt{1 - r_{YZ}^2}} $$
Thus, even if X and Z are uncorrelated, the conditional association between X
and Y ($r_{XY \cdot Z}$) is always larger in absolute value than the unconditional association ($r_{XY}$) as long as Z
and Y are correlated. Due to the symmetry property of association, this also holds true
in reverse, if Y and Z are uncorrelated but X and Z are correlated. This ‘inflation’ of
the partial correlation makes sense intuitively from an association perspective: if one
particular reason (e.g. age) for variation in Y (e.g. well-being) is canceled out, the
remaining reasons (e.g. income) gain in relative relevance and our predictions based
on that additional information will improve. The conditional association between X
and Y equals the unconditional association only in two special cases: either X and Z
as well as Y and Z are fully uncorrelated or, if all variables are correlated, the
confounding relationship of Z with X and Y exactly offsets the inflation effect.
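
The inflation of the partial correlation follows directly from the standard formula for a first-order partial correlation and can be verified numerically. The short Python sketch below uses hypothetical correlation values chosen only for illustration; the function name `partial_corr` is an assumption of this example.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


r_xy = 0.4
# Z unrelated to X but related to Y: the conditional association is inflated.
print(round(partial_corr(r_xy, 0.0, 0.6), 3))
# Z unrelated to both X and Y: nothing changes.
print(round(partial_corr(r_xy, 0.0, 0.0), 3))
# Z related to both X and Y: confounding works against the inflation.
print(round(partial_corr(r_xy, 0.5, 0.6), 3))
```

With these values the first case gives 0.4 / 0.8 = 0.5, which exceeds the unconditional correlation of 0.4 although X and Z are uncorrelated.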
10 There are also semi-partial correlations, which hold the third variable constant for X or Y but
not for both.
11 One could say that an effect relationship relates to functional dependence between variables.
In contrast to the association, the effect of X on Y (or on the conditional
expectation E[Y|X]), expressed by the slope coefficient in a regression of Y on X, describes
how a unit change in X is related to a change in Y. In this regard, a variable Y is
expressed as a function of X, as in linear regression.¹¹ A regression model like the
following can express the functional dependence of Y on X:
$$ y = \alpha + \beta x + \varepsilon $$
The effect on Y of a unit increase in X is defined as
$$ E[Y \mid x + 1] - E[Y \mid x] = \beta $$
Hence, the effect measures how the conditional expectation of Y changes
with a change in X. In simple linear regression, the effect is defined as
$$ \beta = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} $$
which equals the correlation only if X and Y have the same variance. Note that,
in contrast to associations, effects are not symmetric (the effect of X on Y is not
necessarily the same as the effect of Y on X).¹² Moreover, estimating effects using
regression presupposes a classification of variables into dependent and independent
(‘explanatory’) variables. This requires at least the identification of a sensible causal
arrow;¹³ a regression model estimating the effect of an individual’s education on the
same individual’s birth weight, or a model regressing the probability of treatment on
the outcome, would make no sense on substantive grounds. Particularly with regard to
the estimation of a causal effect, it is fundamental to account for third variables that
potentially confound the functional relationship between X and Y. However,
in contrast to the association, if we add to the equation a third variable Z that is not correlated with X,
the effect of X on Y conditional on Z equals the unconditional effect of
X on Y. Nonetheless, the predictive power of the model will increase if Z is correlated
with Y, and this might be desirable in some applications.
Association and effect share the direction of the relationship between X and Y.
A positive (negative) correlation between X and Y implies a positive (negative)
functional dependence (‘effect’) between X and Y. In practice the notions of
association and effect may occasionally be used interchangeably, yet a clear analytical
distinction between both concepts remains crucial. An assessment of an association
practically makes sense only for variables that have a random distribution (random
variables). For instance, one might be interested in the association between class
background and educational attainment in an observational study, but usually one
would not be interested in the association between treatment (the distribution is fixed
by design) and outcome in a randomized trial; it is the treatment’s effect on the
outcome that is under study.
12 Nevertheless, in simple linear regression fully standardized coefficients equal the correlation
13 Of course, in practical applications relying on observational data estimation of causal effects
is usually limited by problems of endogeneity and the failure to account for confounding variables.
Importantly, while an association between X and Y might be weak (as
expressed by a small correlation or a small R² in a simple linear regression), the linear
effect of X on Y could be large, and the other way around. This sometimes overlooked
implication is illustrated by the two simple scenarios depicted in the upper panel of
Figure 1. While the effect of X on Y is the same for both groups, the association is
clearly stronger in Group A. The reason is simply that the association between X
and Y depends on the co-variability between X and Y relative to the variability
of both X and Y, while the effect of X on Y depends only on the co-variability of Y and X
relative to the variability of X. Consequently, given a certain effect of X on Y, the
association between X and Y is lower (higher) if the residual (potentially unobserved)
heterogeneity of Y is higher (lower). For instance, the effect of income on subjective
well-being could be sizable but the association between both variables could be weak
at the same time if there are many other factors that are linked to well-being.
Furthermore, the use of associations might be preferable in comparative research
settings if effects, expressed in units of Y, may not be comparable across groups or
Summing up, association (predictive power) and effects (functional
dependence) can be treated separately in the case of linear regression. Ceteris paribus,
larger residual heterogeneity will affect only the correlation between X and Y but not
the effect of X on Y. Similarly, the regression coefficient of X in a linear regression
model will not change when covariates are added that are related to Y but not to
X. Things are different in the context of binary outcomes and logistic regression.
Unlike in linear regression, the concepts of association and effect are interwoven in
logistic regression when one refers to the latent variable effects model (Model 1).
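
The separation of effect and association in the linear case can be illustrated with simulated data. The Python sketch below is an illustration under assumptions that are not taken from Figure 1: errors are normal rather than logistic, with standard deviations of 1 (Group A) and 4 (Group B), and the sample size and seed are arbitrary. Both groups share the slope of 3.

```python
import math
import random


def slope_and_corr(xs, ys):
    """OLS slope of y on x and the Pearson correlation of x and y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # arbitrary seed, for reproducibility only
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
y_a = [3.0 * x + rng.gauss(0.0, 1.0) for x in xs]  # little residual heterogeneity
y_b = [3.0 * x + rng.gauss(0.0, 4.0) for x in xs]  # much residual heterogeneity

slope_a, r_a = slope_and_corr(xs, y_a)
slope_b, r_b = slope_and_corr(xs, y_b)
print(f"Group A: slope = {slope_a:.2f}, r = {r_a:.2f}")
print(f"Group B: slope = {slope_b:.2f}, r = {r_b:.2f}")
```

Both slopes should be close to 3, while the correlation should be near 3/√10 ≈ .95 in Group A and near 3/√25 = .60 in Group B. The residual heterogeneity changes the association but leaves the effect untouched, which is exactly what fails to hold for logit coefficients read as effects on the latent variable.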
This discussion does not apply when referring to a log-odds model (Model 2),
modelling associations between independent variables and a binary outcome. For
instance, if Y and X are qualitative or categorical variables (like social background
and entrance into college or husband’s and wife’s educational level) association
relates to co-occurrence of states or categories. Associations between purely
qualitative variables can be studied via a contingency table which cross classifies
observations by combinations of categories. In this regard, a common metric of
association is the ratio of odds that can be derived from a contingency table. A major
part of sociological stratification literature is concerned with this concept of
association particularly in the context of log-linear modeling for assessing
associations within multidimensional tables.
5 – Average Marginal Effects versus Odds Ratios
In response to Mood’s discussion, Auspurg & Hinz (2011), among others, proposed the
use of average marginal effects instead of log-odds ratios as a more refined effect
measure in the context of group or cohort comparisons. Replicating previous work
by Hadjar & Berger (2010) on cohort trends in social inequalities in
educational attainment, Auspurg & Hinz (2011) argue that relying on odds ratios as
“effect measures” is fallible as they suffer from unobserved heterogeneity between
cohorts. In their critique, they go on to argue that reporting average marginal
effects should be favored in studies of social stratification as they are much more
immune to unobserved heterogeneity.
First, the claim that AMEs are immune to unobserved heterogeneity is not
generally true. The example given above in Figure 1 provides a counterexample: the
AME of x on the probability of Y=1 is .34 in Group A compared to .17 in Group B,
although the effect on the underlying propensity is the same by design. Consequently, if we were
interested in effects on the propensity y*, a comparison based on AMEs would yield the
misleading conclusion that the effect of x on y* is smaller by half in Group B.
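
The two AMEs can be approximated without simulation. Given the setup of Figure 1, the implied logit slopes are $\beta/s$, i.e. 3 in Group A and 0.75 in Group B, and the AME is the marginal effect $b \cdot P(1 - P)$ averaged over the standard normal distribution of x. The Python sketch below does this by numerical integration; it assumes a zero intercept and threshold, and the function name `ame_logit` and the integration settings are choices made for this illustration.

```python
import math


def ame_logit(b, a=0.0, steps=4000, width=8.0):
    """AME of x in P = logistic(a + b * x), averaged over x ~ N(0, 1) by the midpoint rule."""
    h = 2.0 * width / steps
    total = 0.0
    for k in range(steps):
        x = -width + (k + 0.5) * h
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += b * p * (1.0 - p) * density * h  # marginal effect weighted by density
    return total


ame_a = ame_logit(3.0)   # Group A: s = 1, logit slope 3 / 1
ame_b = ame_logit(0.75)  # Group B: s = 4, logit slope 3 / 4
print(f"AME Group A: {ame_a:.3f}")
print(f"AME Group B: {ame_b:.3f}")
```

The results should be close to the values of .34 and .17 reported above: the AME is roughly halved in Group B although the effect on y* is 3 in both groups.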
Second, Auspurg & Hinz (2011) discuss the latent variable effects model, but
miss a conceptual specification of how this model applies to the problem at hand:
assessing trends in educational inequalities. Unobserved heterogeneity which they
treat as a nuisance factor potentially distorting effects may be part of the
phenomenon under study; if other factors get more relevant for educational attainment
the relative influence of social origin on educational attainment will diminish. Social
origin becomes less predictive for attainment and log-odds ratios will decline over
cohort samples. As a measure of association the odds ratio not only captures the effect
of social origin but also the effects of all other unobserved factors (Buis, 2015).
Thinking about educational attainment as the outcome of a latent variable, one could
legitimately ask whether a decline in inequality is driven by increasing heterogeneity
of educational pathways, a declining causal effect of social background, or a mixed
result of both changing heterogeneity and effects. Yet, posing this question is only
sensible with a theoretical reference to the (unmeasured) underlying variable.
Alternatively, if one wants to study on a rather descriptive level how the association
between social origin and educational attainment is changing, there is no particular
problem with relying on odds ratios.
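The point that log-odds ratios can decline across cohorts without any change in the latent effect can be illustrated with a small sketch. It assumes a binary origin indicator, a logistic error term, an intercept of zero, and hypothetical residual scales that grow from cohort to cohort; none of these values come from the studies discussed here.

```python
import math


def logistic_cdf(z):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical setup: constant latent effect of origin, growing heterogeneity.
latent_effect = 1.5
error_scales = {"cohort 1": 1.0, "cohort 2": 1.5, "cohort 3": 2.0}

log_odds_ratios = {}
for cohort, scale in error_scales.items():
    p_high_origin = logistic_cdf(latent_effect / scale)  # x = 1
    p_low_origin = logistic_cdf(0.0 / scale)             # x = 0
    log_odds_ratios[cohort] = log_odds(p_high_origin) - log_odds(p_low_origin)

for cohort, value in log_odds_ratios.items():
    print(f"{cohort}: log-odds ratio = {value:.2f}")
```

In this sketch the observed log-odds ratio equals the latent effect divided by the residual scale, so it falls from cohort to cohort. Whether such a decline counts as a distortion or as a substantive finding depends on the research question, as argued above.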
Third, a strong reliance on average marginal effects is by no means without
problems in comparative settings. For instance, Auspurg & Hinz (2011) assess how
the average marginal effect of social class on the probability of attaining higher
education changes over cohorts. Yet, an AME measures an absolute distance on the
probability scale, which is bounded between zero and one. A given distance
might have different meanings (e.g., in terms of relative risks) depending on the
location on the scale. Comparing an average probability difference of .10 among
contexts with varying baseline probabilities is not straightforward given the
non-linear nature of a probability.
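A short calculation makes this concrete. The sketch below takes the probability difference of .10 mentioned above and evaluates it at three hypothetical baseline probabilities (the baselines and function names are illustrative assumptions), reporting the implied relative risk and odds ratio.

```python
def relative_risk(p_ref, diff):
    """Ratio of the two probabilities implied by a difference `diff`."""
    return (p_ref + diff) / p_ref


def odds_ratio(p_ref, diff):
    """Ratio of the two odds implied by a difference `diff`."""
    p_other = p_ref + diff
    return (p_other / (1.0 - p_other)) / (p_ref / (1.0 - p_ref))


diff = 0.10
baselines = (0.10, 0.50, 0.80)  # hypothetical reference-group probabilities

results = {}
for p_ref in baselines:
    results[p_ref] = (relative_risk(p_ref, diff), odds_ratio(p_ref, diff))
    rr, oratio = results[p_ref]
    print(f"baseline {p_ref:.2f}: relative risk = {rr:.3f}, odds ratio = {oratio:.3f}")
```

The same absolute difference doubles the probability at a baseline of .10 but raises it by only 12.5 percent at a baseline of .80.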
Fourth, while log-odds ratios are parameters of a probability function, average
marginal effects are not. An AME does not depend only on parameters of the
probability function but also on the joint distribution of covariates in a sample. AMEs
could rather be seen as implications of parameters being put in a context (e.g., a
sample of individuals in a particular country or cohort). 14 Hence, while being
illustrative for this particular set of data, their analytical value in terms of finding more
general parameters that can be abstracted from the data and used for prediction is
strongly limited. This limitation is rendered even more problematic when the variance
of marginal effects around the average is large (that is, if a lot of heterogeneity is
captured by the model). Furthermore, one should keep in mind that statistical tests are
different for logit coefficients and AMEs, which might yield contradictory inferential
conclusions.
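The dependence of an AME on the covariate distribution can be shown with identical logit parameters applied to two samples. The sketch below assumes a single continuous covariate, hypothetical parameter values, and two small made-up samples that differ only in their location; it is an illustration, not an analysis of any data discussed here.

```python
import math


def logistic_cdf(z):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(intercept, slope, xs):
    """Average over the sample of dP/dx = slope * p * (1 - p)."""
    effects = []
    for x in xs:
        p = logistic_cdf(intercept + slope * x)
        effects.append(slope * p * (1.0 - p))
    return sum(effects) / len(effects)


# Hypothetical parameters, identical in both contexts.
intercept, slope = 0.0, 1.0

# Two made-up samples with the same spread but different locations.
sample_centered = [-1.0, -0.5, 0.0, 0.5, 1.0]
sample_shifted = [x + 3.0 for x in sample_centered]

ame_centered = average_marginal_effect(intercept, slope, sample_centered)
ame_shifted = average_marginal_effect(intercept, slope, sample_shifted)

print(f"AME, centered sample: {ame_centered:.3f}")
print(f"AME, shifted sample:  {ame_shifted:.3f}")
```

Both samples share the same log-odds coefficient, yet the AME is markedly smaller in the shifted sample because most observations lie where the probability curve is flat.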
Finally, in the field of stratification research, the distinction between absolute
and relative inequality among groups should be kept in mind. For instance, a
difference of 2 in the log-odds (an odds ratio of about 7.4) of attaining high education
versus lower education between Group A and Group B translates into a probability
difference of about .38 if the probability in the reference group is .50. If instead that
probability were .80, the same log-odds difference translates into a probability
difference of about .17. While the relative
inequality expressed by the odds ratio is the same in both settings, the absolute
inequality expressed by the probability difference is lower in the second setting. By
focusing only on the probability differences one misses an important feature of the
data. For that reason, research agendas should be clear in the concept they are
14 In addition, the AME represents only the central tendency of the effects in the sample and not
their variation.
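The arithmetic of this example can be checked directly. The sketch below assumes that the stated difference of 2 is on the log-odds scale (an odds ratio of exp(2), roughly 7.4), which is the reading that reproduces the reported probability differences of about .38 and .17; function and variable names are illustrative.

```python
import math


def shifted_probability(p_ref, log_odds_diff):
    """Probability obtained by adding `log_odds_diff` to the log-odds of p_ref."""
    logit_ref = math.log(p_ref / (1.0 - p_ref))
    return 1.0 / (1.0 + math.exp(-(logit_ref + log_odds_diff)))


log_odds_diff = 2.0  # assumed to be a difference on the log-odds scale
prob_diffs = {}
for p_ref in (0.50, 0.80):
    prob_diffs[p_ref] = shifted_probability(p_ref, log_odds_diff) - p_ref
    print(f"reference probability {p_ref:.2f}: "
          f"probability difference = {prob_diffs[p_ref]:.2f}")

print(f"implied odds ratio: {math.exp(log_odds_diff):.2f}")
```

The relative measure is constant by construction, while the absolute difference drops from about .38 to about .17 as the reference probability rises from .50 to .80.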
6 – Conclusions
The aim of my paper was to shed some light on a confusing debate on the
comparability of logit coefficients that has been emerging in recent years. As I
have argued, the issues raised by Mood (2010) and others do not apply to all research
agendas. Importantly, logistic regression can serve different ends. It can be used to fit
log-odds models analyzing rates of event occurrence in populations. It may also be
used as a model to estimate effects on a latent propensity, which is unobserved but
assumed to generate binary observations. These are very different methodological
approaches that are likely to be conflated because they rely on similar statistical
techniques. I would like to conclude that researchers should think about their
dependent variable more carefully. For instance, educational mobility studies
analyzing the link between origin and destination may first clarify whether it is the
different rates of attainment by origin or rather the causal effect of origin on the
process of attainment itself that is at the center of the investigation. A log-odds model
serves the first purpose, whereas a latent-variable model serves the second. In
addition, I discussed several problems that may result from an increasing tendency to
rely on average marginal effects in stratification research.
References
Agresti, A. (2007). An Introduction to Categorical Data Analysis (2nd ed.).
Allison, P. D. (1999). Comparing Logit and Probit Coefficients Across Groups.
Sociological Methods & Research, 28, 186–208.
Auspurg, K., & Hinz, T. (2011). Gruppenvergleiche bei Regressionen mit binären
abhängigen Variablen – Probleme und Fehleinschätzungen am Beispiel von
Bildungschancen im Kohortenverlauf. Zeitschrift Für Soziologie, 40, 62–73.
Retrieved from
Breen, R., Holm, A., & Karlson, K. B. (2014). Correlations and Non-Linear
Probability Models. Sociological Methods & Research, 43(4), 571–605.
Buis, M. L. (2015). Logistic regression: Why we often can do what we think we can do.
Goodman, L. A., & Kruskal, W. H. (1954). Measures of Association for Cross
Classifications. Journal of the American Statistical Association, 49(268), 732–
Hadjar, A., & Berger, J. (2010). Dauerhafte Bildungsungleichheiten in
Westdeutschland, Ostdeutschland und der Schweiz: Eine Kohortenbetrachtung
der Ungleichheitsdimensionen soziale Herkunft und Geschlecht. Zeitschrift Für
Soziologie, 39(3), 182–201.
Jacob, M., Klein, M., & Iannelli, C. (2015). The Impact of Social Origin on
Graduates’ Early Occupational Destinations - An Anglo-German Comparison.
European Sociological Review, 31(4), 460–476.
Karlson, K. B., Holm, A., & Breen, R. (2012). Comparing Regression Coefficients
Between Same-sample Nested Models Using Logit and Probit: A New Method.
Sociological Methodology, 42, 286–313.
Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do,
and what we can do about it. European Sociological Review, 26(1), 67–82.
Norton, E. C. (2012). Log Odds and Ends. NBER Working Paper Series. Retrieved
Powers, D. A., & Xie, Y. (1999). Statistical Methods for Categorical Data Analysis.
Academic Press.
Williams, R. (2009). Using Heterogeneous Choice Models to Compare Logit and
Probit Coefficients Across Groups. Sociological Methods & Research, 37, 531–
Winship, C., & Mare, R. D. (1984). Regression Models with Ordinal Variables.
American Sociological Review, 49, 512.