Logistic Confusion - An extended treatment on cross-group comparability of findings obtained from logistic regression

Authors:
Hannes Kröger, German Institute for Economic Research (DIW), Berlin
hkroeger@diw.de
Jan Skopek, Trinity College Dublin
jan.skopek@tcd.ie
Working Paper; version April 13, 2017

Number of words: 12625
Number of figures: 4
Number of tables: 7

Table of contents

1 INTRODUCTION
1.1 TYPES OF COMPARABILITY
1.2 LATENT VARIABLE VERSUS NATURAL CATEGORICAL APPROACHES
2 COMPARABILITY UNDER THE LATENT VARIABLE AND NATURAL CATEGORICAL FRAMEWORK
2.1 LATENT VARIABLE APPROACH
2.1.1 Logit coefficients and odds-ratios
2.1.2 Average marginal effect
2.1.3 Standardized coefficients
2.1.4 A Monte-Carlo simulation study
2.2 NATURAL CATEGORICAL DEPENDENT VARIABLES
2.2.1 Average marginal effect
2.2.2 Odds-ratios – bivariate models
2.2.3 Odds-ratios – the multivariate case
2.2.4 Risk ratio
2.3 CONDITIONAL AND MARGINAL INTERPRETATIONS OF OR IN MULTIVARIATE MODELS WITHIN THE NATURAL CATEGORICAL FRAMEWORK
2.3.1 Conditional odds-ratios
2.3.2 Synthetic marginal odds-ratios using inverse probability weighting
2.3.3 Example of synthetic marginal comparison
3 AVERAGE MARGINAL EFFECTS, RISK RATIOS AND ODDS RATIOS – UNITED WE UNDERSTAND
3.1 THE COMPLEMENTARY NATURE OF AME, RR AND OR
3.2 EXAMPLE – EDUCATIONAL ATTAINMENT AND INTERGENERATIONAL MOBILITY
4 CONCLUSION
5 REFERENCES
6 APPENDIX
6.1 S1 - A FORMAL TREATMENT OF COMPARISON
6.2 S2 – ADDITIONAL TABLES AND GRAPHS

Abstract

Our paper discusses cross-group comparability of findings obtained from logistic
regression in a systematic way. Recent methodological literature in sociology has
pointed to serious pitfalls of logistic regression when it comes to comparability of
estimates between groups and samples. Whereas this critique is mainly driven by
statistical concerns, we argue that comparability of findings depends essentially on
the conceptual treatment of the outcome as either natural categorical or based on a
latent variable approach. We demonstrate that the prevailing methodological
skepticism about cross-group comparability of logistic regression is preoccupied with a
latent variable perspective. In addition, we show that under the latent variable
framework the use of average marginal effects for comparison across groups or
samples is as unreliable as the comparison of (log) odds-ratios that has been the
focus of criticism.

When we treat outcome variables as natural categorical, though, cross-group
comparisons work differently, and the generalized claim that odds-ratios cannot be
compared across groups does not hold. Furthermore, we argue that the crucial point
is the preference in many sociological applications for marginal instead of
conditional interpretations of effect estimates. Our paper proposes a procedure that
allows estimating marginal odds-ratios that are adjusted for control variables and
are comparable between groups. We also show that, in addition to odds ratios
(OR) and average marginal effects (AME), the relative risk (RR) is another useful but
largely underused metric for making comparisons. As they reflect different
perspectives that are not simply exchangeable, we conclude that researchers should
use AME, OR and RR jointly to evaluate findings obtained from logistic regression.

1 Introduction

The systematic comparison of observed regularities in populations is an
essential part of research in the social sciences. The question behind most of this
comparative work is whether associations between variables systematically differ
across groups, time and space. In most circumstances, descriptive analysis, i.e.
assessing whether patterns of associations are different (or not), precedes
identifying causal processes generating these patterns.

As comparability of concepts is a central part of sociological inquiry,
comparability of statistical quantities reflecting relationships between these
constructs is highly desirable. In sociological research, comparability of regression
coefficients from logistic regression has become the subject of strong debate. In our view,
the currently dominant position in sociology holds that, in contrast to linear
regression, coefficients in logistic regression models (and in other non-linear models)
cannot be directly compared across different samples, groups and models (Allison,
1999; Holm, Ejrnæs, & Karlson, 2014; Mood, 2010; Winship & Mare, 1984). In a
seminal paper, Mood elaborates that the issue of comparability arises when one
wants to interpret these coefficients as estimates of ‘substantial’ effects. Precisely,
Mood (2010) asserts that it is problematic (1) to interpret odds ratios as substantive
effects since they also reflect unobserved heterogeneity, (2) to compare odds ratios
across nested models because unobserved heterogeneity is certain to vary across
such models, and (3) to compare odds ratios from the same model across samples,
groups, or over time because unobserved heterogeneity can vary across samples,
groups, or time. We agree with point (2) and do not discuss the use of nested models,
as we think they reflect a different kind of research logic (mediation vs. moderation)
and are not covered by the argument we make here. Treatment of these models has
been discussed thoroughly (e.g. Karlson, Holm, & Breen, 2010; Tchetgen Tchetgen,
2013).

However, in our paper we first argue that research practice has been
generalizing issues (1) and (3) to settings in which the arguments elaborated by
Mood (2010) are no longer valid or are highly contingent on the researcher’s agenda. A
major problem underlying the discussion about comparability of logit coefficients is
that no substantive criterion for comparability across groups is explicitly defined,
although this is a logical prerequisite when claiming incomparability. To establish
a conceptual scheme for comparison, we propose to use an old distinction for the
classification of research problems that address a categorical dependent variable (a)
as a proxy measurement for an unmeasured latent construct or (b) as a natural
categorical outcome per se¹ (Winship & Mare, 1983). Note that this distinction is of a
genuinely theoretical nature and should not be guided by statistical reasoning alone.
It is part of the epistemological orientation and theoretical considerations
underpinning a particular empirical study. Important in this respect is that
comparability of the same statistical quantities depends on the theoretical
framework, as we will show in this paper.

Using average marginal effects on the probability outcome (AME) has been
emerging as a popular technical way of circumventing the issue of comparability of
logit coefficients, resulting in a practice of abandoning regression coefficients
entirely (see the recommendations by Mood (2010)). Yet, while this may be good
advice in some circumstances, it may be problematic in others. Hence,
secondly, our paper will systematically elaborate where the use of AME might be
appropriate and where not. Without preempting much of the later discussion, we
argue that for research questions falling in category (a) – latent variables – a reliance
on AME is not a remedy for problems in comparability across groups. The recent
methodological literature has been unclear on this issue.

Third, we argue that for research questions falling in category (b) – natural
categorical – several metrics like AME, odds-ratios (OR) or log-odds ratios (logit
coefficients), and also the rather seldom used risk ratio or relative risk (RR) are
comparable across groups. Yet, whereas in bivariate models the odds-ratio has a
straightforward marginal interpretation, in multivariate models the interpretation
of conditional odds ratios is often more difficult, especially when comparing their size
across groups. In our paper we propose a procedure that involves the estimation of
synthetic marginal odds-ratios, which enables meaningful cross-group comparisons
of odds ratios from multiple logistic regression. Our approach preserves both the
odds-ratio and marginal interpretation while allowing adjustments for covariates at
the same time.

¹ The argument made in this paper can be generalized to the analysis of latent categorical
outcomes measured by both categorical (latent classes) and continuous (latent profiles) indicators.

Fourth, we argue that under a natural categorical framework, a joint use of
AME, OR and RR might often be the best practice for both reporting and
interpretation of results. The quantities complement each other and yield insights
into cross-group comparisons that are not obtainable if we focus on only one of
them.

The rest of the paper is organized as follows. We first define two types of
comparability which we will consistently apply to various metrics related to logistic
regression as discussed in the paper: comparability of size and comparability of sign.
We then introduce the conceptual distinction between research questions focusing
on natural categorical (NC) dependent variables and those with a latent variable (LV)
framework in mind.

In the second section, we assess comparability under NC and LV frameworks of
the (log) OR, RR, AME, and y*-standardized coefficients. For the LV framework we
present results from a simulation study demonstrating that none of the commonly
estimated quantities is comparable in size across groups without making very strong
and usually non-testable assumptions. Furthermore, we provide examples for group
comparisons within both the LV and NC framework. In the NC framework we
introduce the concept of a synthetic marginal odds ratio (SMOR) as a quantity that
might be easier to interpret and compare across groups for certain types of research
questions.

In the third section we recommend that researchers do not limit themselves to
reporting and interpreting just one of AME, RR or OR, even if their theory seems to
suggest that one of them fits their research purpose better. We give an example for
interpretation and suggestions for reporting and presentation.

Our discussion does not aim at criticizing previous methodological studies per
se. The core arguments of the methodological debate are inherently valid and have
been laid down impressively. Aiming to reduce confusion in applied social research
dealing with categorical outcomes, this paper systematizes the debate on
comparability of logistic regression models across groups, points out weaknesses and
misconceptions in the literature and gives some practical suggestions on how to link
different research questions to various quantities estimated from logistic regression.
Therefore, we conclude in the fourth section with an appeal for a closer link
between theory and methodological application as well as more openness to
different types of research questions and their methodological implementations.

1.1 Types of comparability

The first step to a more systematic discussion of comparability is to reflect on
and define what comparability means. While this might seem obvious at first glance,
we will show that it is not obvious what comparability in logistic regression models
means and that there are different legitimate answers to this question. It is also
noteworthy that almost all studies referring to the argument in Mood (2010) or a
similar earlier version of the argument do not define comparability, and many
methodological articles do not give an explicit definition either (exceptions are
usually studies explicitly referring to a latent variable model, such as Holm et al., 2014;
Karlson et al., 2010). This includes work criticizing Mood or its reception (Buis 2016;
Skopek 2016).

At the most general level, one can distinguish comparability of quantities in
terms of sign and in terms of size. The first relates to our ability to assess and
compare the direction of statistical effects (as based on a particular quantity), which
can be positive, negative or zero. The second dimension relates to our
ability to assess and compare the size of statistical effects based on particular
statistical quantities; thus, it involves a quantification of differences across groups.
Note that these types of comparability are so general in nature that they apply not only
to applications of logistic regression, but also to other types of statistical
estimation techniques. Even if these distinctions are useful for the purpose of our
paper, we do not claim that they are exhaustive or the only way in which
comparability could be classified. Whereas comparability of sign usually does not
represent any conceptual problem in the context of logistic regression, the
comparability of size does. Appendix S1 provides a more detailed and formalized
elaboration on these two types of comparability.

1.2 Latent variable versus natural categorical approaches

When dealing with categorical dependent variables, empirical research should
define whether a variable is treated as a natural categorical (NC) variable or as a
variable representing manifestations of an underlying latent variable (LV,
continuous) that cannot be or is not directly observed. In the first case, one may be
exclusively interested in which category a unit or individual is sorted into. From that
vantage point, differences between the categories are manifest and/or have a substantial
meaning or consequence. For instance, an A-level equivalent degree is needed in
most countries to attend university and, consequently, having this degree or not –
independent of the true abilities an individual possesses – bears important
consequences for individuals' eligibility for admission to higher education. In
many cases, we would also think that these categories exist beyond our research in
the real world, although this does not need to be the case. Our theory and possible
(social/causal) mechanisms would refer to the categories of the observed variable
and how membership in these categories is determined, as well as what
consequences the membership in these categories might have for the subjects under
study. Research questions under a NC framework would rely on the odds and
probability scales, as the occurrence or non-occurrence of events and their respective
probabilities or odds are of interest per se. Historically this approach can be traced
back to George Udny Yule (Yule, 1900, 1903), who believed some variables (but not
all) can be seen as inherently discrete classes, or natural categorical in our terms:
“[…], any one object must be held either to possess the attribute or not.”
(Yule, 1911)
From this stance, the investigation of categorical dependent variables is
rendered as a problem of classification.

In contrast to Yule, Karl Pearson (Pearson & Heron, 1913) advocated the view
that associations of categorical variables are only reflections of underlying
continuous distributions (Agresti, 2013). Based on Pearson’s idea, we can
conceptualize categorical variables as manifestations of a latent variable operating in
the background. It is not the membership in an observed category that we are
ultimately interested in but rather what this membership implies for the score on
the underlying latent variable. Individuals’ membership in one or another category is
determined by their score on a latent (usually unobserved) variable which we
conceive as the substantive process under study generating (observable)
classificatory outcomes. For instance, individuals with good driving skills will be more
likely to pass a driving test. Thus, in the absence of a measurement of driving skills we
may use a categorical variable ‘having passed the driving test’ as a categorical proxy
for measuring those skills. In a LV framework the conclusions we want to draw would
refer to the underlying concept that is relevant to our research question, not to the
observed categories. Our theory would refer to the latent process underlying the
observed variable and how it is (causally) shaped by other variables, or how it
(causally) shapes other variables. Such a paradigm implies research questions that
concern quantities measured on the scale of the latent variable or
standardized forms of this scale, and not the probability of the occurrence of any
discrete event per se.

Note that while it is useful to make this distinction between a latent variable
and a categorical approach, any single dependent variable can often be linked in a
plausible way to both approaches (Winship & Mare, 1983, p. 56) depending on the
theoretical framework, the mode of data collection and the research question that is
chosen (see Table 1). For example, we might consider completing an A-level school
degree as being a naturally categorical variable, because we are motivated by the
consequences implied by crossing the threshold to attainment versus non-attainment
(e.g. for future career and life chances). Alternatively, we could also be
interested in the educational performance that corresponds to the attainment of A-level.
In this case, it is not of major importance if an individual is correctly classified
based on a model prediction in having or not having an A-level, as we want to draw
conclusions about the latent variable, general educational performance or ability.

Table 1: Examples of dependent variables and how they might be categorized into the NC and LV framework

Categorized … / conceptualized as…   During the process under study
                                     Indicator        Concept                  Indicator        Concept
Natural categorical                  Obesity          Obesity                  College degree   Educational attainment
Latent variable                      Obesity          Weight                   College degree   Academic performance

2 Comparability under the latent variable and natural categorical framework

2.1 Latent variable approach

Based on our definitions of comparability and the distinction between NC and
LV research questions, we can assess which of the commonly used quantities in
logistic regression reflect meaningful comparisons of concepts in the context of
research questions related to NC or LV thinking. We begin by considering research
questions adopting a LV approach. Similar to both early and recent studies (Allison,
1999; Breen, Holm, & Karlson, 2014; Mood, 2010; Winship & Mare, 1983) we
assume that there is an underlying variable $y^*$ that is determined by a set of
predictors and linked to the observed dichotomous $y$ in the following way:

$$y^* = \alpha + x_A \beta_A + \varepsilon \tag{1}$$
$$y = 1 \text{ if } y^* > 0$$
$$y = 0 \text{ if } y^* \le 0$$

An inherent problem of this approach is that the scale of $y^*$ is unknown and
consequently the variance of $\varepsilon$ cannot be estimated, but is set to a fixed value, 3.29
in the case of logistic regression. Therefore, we are not able to estimate the
structural coefficient $\beta_A$, i.e. the coefficient on the scale of the latent variable, but
only $\frac{\beta_A}{\sigma_A} = b_A$, the logit coefficient, a rescaled variant with
$\sigma = \sqrt{\mathrm{var}(\varepsilon)/3.29}$ the scaling factor.
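
The rescaling described above can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names `scaling_factor` and `logit_coefficient` are hypothetical, and the structural coefficient of 3 is an illustrative value. It shows where the fixed value 3.29 comes from and how the same structural coefficient maps to different logit coefficients as the residual variance grows.

```python
import math

# Variance of the standard logistic distribution, the value the residual
# variance is fixed to in logistic regression.
LOGISTIC_VAR = math.pi ** 2 / 3


def scaling_factor(var_eps):
    """sigma = sqrt(var(eps) / (pi^2 / 3)): how far the latent scale is from the logit scale."""
    return math.sqrt(var_eps / LOGISTIC_VAR)


def logit_coefficient(beta, var_eps):
    """Logit coefficient b = beta / sigma implied by a structural coefficient beta."""
    return beta / scaling_factor(var_eps)


print(round(LOGISTIC_VAR, 2))  # 3.29

# Same structural coefficient (illustrative value 3), increasing residual variance.
for multiple in (1, 4, 16):
    var_eps = multiple * LOGISTIC_VAR
    print(multiple, round(scaling_factor(var_eps), 3), round(logit_coefficient(3.0, var_eps), 3))
```

The structural coefficient never changes in this loop; only the unobserved residual variance does, which is exactly the quantity the logit model cannot separate from the coefficient.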
2.1.1 Logit coefficients and odds-ratios

Based on the latent variable model, logit coefficients are not generally
comparable across groups as estimates for the effect on the latent scale without
additional assumptions. If we take the expectation of the difference in the estimates
of logit coefficients, we will not get the true difference between the LV coefficients:

$$E(b_A - b_B) \neq \beta_A - \beta_B \tag{2}$$

If we assume $\sigma_A = \sigma_B$, logit coefficients are in fact comparable in size;
otherwise they only have comparability of sign. Assuming equal unobserved
heterogeneity, which implies $\mathrm{var}(\varepsilon)_A = \mathrm{var}(\varepsilon)_B$, is likely to be very unrealistic and
hard to defend in many empirical settings (Mood, 2010).
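
The reasoning behind inequality (2) can be written out from the definition $b = \beta/\sigma$ alone. The lines below are a sketch that treats each logit estimate as (approximately, in large samples) centered on $\beta/\sigma$ in its group; that approximation is an assumption made here for illustration.

```latex
\begin{align*}
E(b_A - b_B) &\approx \frac{\beta_A}{\sigma_A} - \frac{\beta_B}{\sigma_B} \\
\sigma_A = \sigma_B = \sigma \;&\Rightarrow\; E(b_A - b_B) \approx \frac{\beta_A - \beta_B}{\sigma} \\
\beta_A = \beta_B = \beta,\ \sigma_A \neq \sigma_B \;&\Rightarrow\; E(b_A - b_B) \approx \beta\left(\frac{1}{\sigma_A} - \frac{1}{\sigma_B}\right) \neq 0
\end{align*}
```

With a common scaling factor the difference in logit coefficients is proportional to the structural difference, so relative sizes carry over. With unequal scaling factors a difference appears even when the structural coefficients are identical. Because $\sigma > 0$, the sign of each $b$ still equals the sign of the corresponding $\beta$.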

2.1.2 Average marginal effect

In the literature that builds on Mood, and in her paper itself, it is claimed that
marginal effects are not affected (or less affected) by the problem of unobserved
heterogeneity. Therefore, the use of AME is recommended as one possible solution
to the comparability problem. However – based on the previous distinction of
approaches – we need to reconsider whether AME are indeed comparable under a
LV scenario.

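$$E\left[\frac{1}{n_A}\,\frac{\beta_A}{\sigma_A}\sum_{i=1}^{n_A} f\!\left(\frac{\alpha_A + x_{Ai}\,\beta_A}{\sigma_A}\right) - \frac{1}{n_B}\,\frac{\beta_B}{\sigma_B}\sum_{i=1}^{n_B} f\!\left(\frac{\alpha_B + x_{Bi}\,\beta_B}{\sigma_B}\right)\right] \neq \beta_A - \beta_B \tag{3}$$
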
The term above only equals the difference in structural coefficients of the
latent variable equation if 1) truncation of the latent variable ($\alpha$), 2) the degree of
unobserved heterogeneity ($\sigma$), and 3) the distribution of the independent variable
and all covariates are equal (different moments of the distribution:
$E(x)$, $E(x^2)$, $E(x^3)$). Usually, we cannot assert this in applied social science research
(Holm et al., 2014).

We can also get an intuitive understanding of why a comparison of AMEs cannot
be immune to unobserved heterogeneity if they are to represent structural
coefficients in the latent variable model. If there were very little unobserved
heterogeneity (UH) and a given structural effect on the LV, in the observed model we
would expect a rather strong AME. If in another group there are many other
factors influencing the outcome, we might have a huge amount of UH. As a result,
most variation in the observed dichotomous outcome is not due to the
predictor. In other words, discrimination among categorical outcomes based on the
predictor will become increasingly weak. If we approach an infinite amount of
UH, assignment to the outcome based on the predictor would effectively be random
and there would be no probability difference (AME) between different levels of the
predictor variable. Figure 1 illustrates the sensitivity of the AME to UH for a less
extreme case. The structural latent variable coefficient is the same in each of the
groups, but the UH is increased for each group from left to right. This leads to a
higher dispersion of the latent variable score, and the overlap of the observed
dichotomous indicator $y$ becomes larger, so that both OR and AME become smaller
in groups with higher UH. We can see that the $y^*$-standardized coefficient also
becomes smaller. This reflects the reduction in explanatory power of predictor x that
we see from left to right and that can be seen in the LV model using the $R^2$.

The conclusion that AME is not comparable across groups if the latent variable
coefficient is of interest corroborates the finding of Holm et al. (2014), who show
that the coefficient from a linear probability model is not comparable across groups
if the latent variable coefficient is the point of reference for the comparison.
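
The sensitivity of the AME to unobserved heterogeneity can be checked numerically without fitting any model. The following is a minimal sketch under stated assumptions, not the authors' procedure: x is standard normal, the threshold is 0, the structural coefficient is fixed at 1, and the population AME is obtained by integrating the logistic density over the distribution of x with a simple trapezoid rule. The function names are hypothetical.

```python
import math


def logistic_pdf(z):
    """Density of the standard logistic distribution."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def population_ame(beta, sigma, alpha=0.0, lo=-8.0, hi=8.0, steps=4000):
    """E[b * f(a + b*x)] for x ~ N(0, 1), with a = alpha/sigma and b = beta/sigma."""
    a, b = alpha / sigma, beta / sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * b * logistic_pdf(a + b * x) * normal_pdf(x)
    return total * h


# Identical structural coefficient (beta = 1), increasing unobserved heterogeneity.
ames = {sigma: population_ame(beta=1.0, sigma=sigma) for sigma in (1.0, 1.5, 2.0, 4.0)}
for sigma, ame in ames.items():
    print(f"sigma={sigma}: logit coefficient={1.0 / sigma:.3f}, AME={ame:.3f}")
```

Although the structural coefficient is held constant, both the logit coefficient and the AME shrink as the scaling factor grows, which is the pattern described in the text for Figure 1.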
The AME, too, has comparability of sign, but – in essence – is no more
helpful than the logit coefficient for cross-group comparisons within the LV
framework. On the contrary, the assumptions underpinning comparability are much
more complex for the AME than for the logit coefficient. Summing up, we can
identify differences in sign of association with any of these quantities, but not more.

Why did the misconception arise that AME might be immune to
unobserved heterogeneity? At a superficial glance, such a claim can also be found in
Wooldridge (2002, pp. 471–472). However, he argues that within one model, the
degree of unobserved heterogeneity does not affect the estimation of the marginal
effect. This refers to the inclusion or exclusion of (unrelated) variables. This implies
that nested models are compared which are all special cases of a more general
model. There is no claim that the AME is a useful approximation for comparing
latent variable coefficients across groups. On the contrary, he notes:
“The bottom line is that, except in cases where the magnitudes of the $\beta_j$ in
equation (15.34) have some meaning, omitted heterogeneity in probit models is not
a problem.” [emphasis added] (Wooldridge, 2002, p. 471)
Within a LV framework, the meaningfulness of the coefficients on the LV scale
is exactly the main assumption and the distinction from the NC framework.

In Mood’s (2010) analysis, simulations for demonstrating the robustness of
AME against unobserved heterogeneity are done in the context of nested models,
not in the context of comparisons across groups. Yet, eventually, these results were
generalized in the conclusion to hold true also for comparisons across groups (Mood,
2010, p. 80). Under the latent variable framework that Mood obviously adopts in the
first place, this claim is not accurate. In our view, this important detail got lost in the
reception of the argument, and the undifferentiated claim that AME are less affected
or unaffected by unobserved heterogeneity has resonated in subsequent research.

By contrast, our proposed distinction between natural categorical and latent
variable framework can contribute to the discussion as it makes it easier to
distinguish in which cases AME is comparable and in which it is not. It is readily
comparable as a quantity that estimates (conditional) absolute probability
differences, but it is only comparable in size for latent variable coefficients to the
degree that very strong and usually untestable assumptions hold.
Figure 1 Simulated data example: Latent variable and binary outcomes for four
groups.

[Figure 1. Upper panels: latent variable y* (vertical axis, -30 to 30) plotted against x (horizontal axis, -4 to 4) for Groups 00, 10, 01 and 11, with b* = 3 in every panel and R2 = .73, .17, .04 and .02. Lower panels: binary outcome Y (0/1) plotted against x for the same groups, with OR = 19.31, 2.22, 1.35 and 1.14; AME = 0.34, 0.17, 0.07 and 0.03; y*std = .85, .4, .16 and .07.]

Note: Underlying model is $y^* = 3x + \varepsilon$ (Model 1) for both groups with $\sigma_\varepsilon^2 = s^2\pi^2/3$, and scaling factor s=1 for Group A and
s=4 for Group B. Variable x is normally distributed with mean 0 and variance 1, identically for both groups. Groups share the
same effect of x ($\beta = 3$) on the latent variable $y^*$ but differ in the residual heterogeneity as expressed by the $R^2$ (Model 1). The
larger heterogeneity of Group B translates into a smaller logit coefficient and, thus, a smaller odds ratio in the logistic
regression model (Model 2).

2.1.3 Standardized coefficients

If we use the y*-standardized coefficients, we have a metric on the scale of the
SD of the y* variable. Given that the assumption about the distribution of the error
term (logistic distribution) is fulfilled, it can be shown that (Breen et al., 2014):
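
$$E\left[\frac{b_A}{\sqrt{b_A^2\,\mathrm{var}(x_A) + \pi^2/3}} - \frac{b_B}{\sqrt{b_B^2\,\mathrm{var}(x_B) + \pi^2/3}}\right] = \frac{\beta_A \cdot \mathrm{sd}(x_A)}{\mathrm{sd}(y_A^*)} - \frac{\beta_B \cdot \mathrm{sd}(x_B)}{\mathrm{sd}(y_B^*)} \tag{4}$$
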
However, for the absolute difference on the latent variable scale the
standardized coefficient is also not exactly comparable (Duncan, 1975):

$$E\left[\frac{b_A}{\sqrt{b_A^2\,\mathrm{var}(x_A) + \pi^2/3}} - \frac{b_B}{\sqrt{b_B^2\,\mathrm{var}(x_B) + \pi^2/3}}\right] \neq \beta_A - \beta_B \tag{5}$$

The inequality results from the fact that the model is under-identified if the
scale is not arbitrarily fixed. Put differently, there is a lack of information that can
only be resolved by changing assumptions (as in heterogeneous choice models) or by
gathering additional information, such as repeated measurements.
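
To see how the $y^*$-standardized coefficient responds to unobserved heterogeneity, the quantity on the left-hand side of the equations above can be evaluated directly. The following is a small sketch under stated assumptions: the predictor has variance 1 (as in the note to Figure 1), the structural coefficient is 3, and the scaling factors 1, 2 and 4 are illustrative values. The function name `ystar_standardized` is hypothetical.

```python
import math

LOGISTIC_VAR = math.pi ** 2 / 3  # residual variance on the logit scale


def ystar_standardized(b):
    """Logit coefficient divided by sd(y*), assuming var(x) = 1.

    On the logit scale var(y*) = b^2 * var(x) + pi^2 / 3.
    """
    return b / math.sqrt(b * b + LOGISTIC_VAR)


beta = 3.0  # structural coefficient, identical in every group
std_coefs = {}
for sigma in (1.0, 2.0, 4.0):
    b = beta / sigma  # logit coefficient implied by the scaling factor
    std_coefs[sigma] = ystar_standardized(b)
    print(f"sigma={sigma}: b={b:.2f}, y*-standardized={std_coefs[sigma]:.2f}")
```

With a scaling factor of 1 the result is about 0.86, close to the value reported in the first panel of Figure 1. The standardized coefficient falls as the scaling factor rises even though the structural coefficient is unchanged, so it cannot recover the difference in structural coefficients either.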

2.1.4 A Monte-Carlo simulation study

To demonstrate that our claims for the LV framework hold despite different perceptions in
applied research, we conducted a Monte-Carlo simulation comparing the
performance of logit coefficients, relative risk from a log-binomial model, AME from
logistic regression, standardized coefficients and estimates from a linear probability
model (LPM) in estimating the difference in the structural coefficients across two
groups. The simulation study is similar in design to (2014).

In the simulation study the structural coefficients in the latent variable model
are the same in both groups ($\beta_A$, $\beta_B$), set to be 1. The error term in the latent
variable is constructed to follow a logistic distribution in line with the assumption of
logistic regression. Group B is the reference group for which the threshold ($\alpha_B$) is set
to 0 and the scale parameter to 1 ($\sigma^2_{\varepsilon_B} = \frac{\pi^2}{3}\,\sigma_B^2$). We vary the threshold for group
A ($\alpha_A$) as well as the scale parameter ($\sigma_A$). The threshold represents differences in
response behavior or conversion of the underlying score into the observed variable.
It reflects how high the latent score has to be in a group so that individuals
get a positive result on the observed indicator. For example, it has been argued that
in certain countries (e.g. Germany) true health has to be markedly higher than in
other countries (e.g. Sweden) to achieve a subjective response of good or very good
on a five-point subjective health scale (Jürges, 2007). If the intercept is higher,
it takes more of the underlying score to get a positive observed outcome. The scale
parameter represents the degree of unobserved heterogeneity, the key issue
discussed in the literature so far. A larger scale factor implies higher degrees of
unobserved heterogeneity, meaning more or more influential factors that determine
the latent variable which are not modeled. We conduct 10,000 replications and
estimate the average degree of bias in the estimation of the difference between the
two coefficients.
The true models are:
Group A: $y_A^* = \alpha_A + \beta_A \cdot x_A + \varepsilon_A$
Group B: $y_B^* = \alpha_B + \beta_B \cdot x_B + \varepsilon_B$
The bias is calculated as the difference of the average estimates across $J$
simulations, in proportion to the average estimate of the reference group B. This
proportion is taken because the scale of the coefficients is arbitrary and does not reflect
the scale of the underlying latent variable, but their relative size can be captured:

$$\mathrm{bias} = \frac{\sum_{j=1}^{J} b_{Aj} - \sum_{j=1}^{J} b_{Bj}}{\sum_{j=1}^{J} b_{Bj}}$$

If the quantities under study are comparable in the sense that they represent the difference in the coefficients in the underlying latent variable model, the bias should be close to zero. A bias of 0.5 would indicate that the average difference between the two quantities is 50% of the size of the estimate in the reference group when it should be zero.
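The bias measure described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the simulation code used in the paper; the function name `relative_bias` and the six example estimates are hypothetical.

```python
from statistics import mean


def relative_bias(estimates_a, estimates_b):
    """Difference of the average estimates, relative to the reference group B.

    With the same number of replications in both groups, the ratio of sums
    in the formula equals this ratio of means.
    """
    mean_a = mean(estimates_a)
    mean_b = mean(estimates_b)
    return (mean_a - mean_b) / mean_b


# Hypothetical estimates from three replications per group (illustration only).
est_a = [1.48, 1.52, 1.50]
est_b = [0.99, 1.01, 1.00]

bias = relative_bias(est_a, est_b)
print(f"relative bias: {bias:.2f}")
```

Since the structural coefficients are identical in both groups, any value of `bias` that is clearly different from zero indicates that the estimated quantity does not recover the true (zero) difference.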
Table 2 and Table 3 show the results from the simulation study. We can see that differences in truncation lead to strong bias in log-binomial models and to a small degree of bias when using the LPM and AME based on logistic regression. Interestingly, logit coefficients seem to be quite unaffected by variation in truncation.
However, looking at bias due to unobserved heterogeneity, we can see that the bias increases with the difference in unobserved heterogeneity. A scale factor of 1.5 translates into a 2.25 times higher variance in group A than in group B. This is a substantial increase in unobserved factors influencing the outcome, but it might still reflect certain situations in applied research, for example when comparing labor market outcomes between men and women. The amount of bias is very similar for all quantities; the bias of the (log) OR is somewhat larger than the rest.
Based on this restricted set of scenarios we can conclude the following. First, contrary to common belief, AME and LPM are not immune to changes in unobserved heterogeneity between groups and do not even necessarily perform better than logit coefficients. Regarding differences in threshold, logit coefficients actually performed best. Second, RR perform poorly for both differences in threshold and unobserved heterogeneity.
Table 2: Degree of bias in estimation of difference in LV coefficient dependent on truncation

                        Observations
                        100       1000      5000
$\alpha_A = 0$
  log(RR)               0.01     -0.00      0.00
  log(OR)               0.00     -0.00      0.00
  AME                   0.00     -0.00      0.00
  LPM                   0.00     -0.00      0.00
$\alpha_A = 0.25$
  log(RR)              -0.06     -0.07     -0.07
  log(OR)               0.00      0.00     -0.00
  AME                   0.01      0.00      0.00
  LPM                   0.01      0.00      0.00
$\alpha_A = 1$
  log(RR)              -0.22     -0.22     -0.22
  log(OR)               0.01      0.00      0.00
  AME                   0.09      0.08      0.08
  LPM                   0.09      0.08      0.08

Table 3: Degree of bias in estimation of difference in LV coefficient dependent on the degree of unobserved heterogeneity

                        Observations
                        100       1000      5000
$\sigma_A = 1$
  log(RR)               0.01     -0.00      0.00
  log(OR)               0.00     -0.00      0.00
  AME                   0.00     -0.00      0.00
  LPM                   0.00     -0.00      0.00
$\sigma_A = 1.1$
  log(RR)               0.08      0.09      0.09
  log(OR)               0.10      0.10      0.10
  AME                   0.08      0.09      0.09
  LPM                   0.08      0.09      0.09
$\sigma_A = 1.5$
  log(RR)               0.43      0.43      0.43
  log(OR)               0.49      0.50      0.50
  AME                   0.43      0.43      0.43
  LPM                   0.43      0.43      0.43
Table 4: Comparability of quantities within the latent variable framework

Quantity   | Comparability (bivariate) | Comparability (multivariate) | Interpretation | Note
Odds-ratio | Sign | Sign | Rescaled LV coefficients | Log-odds more useful than OR, limited interpretation
AME/LPM    | Sign | Sign | Complexly rescaled LV coefficients | Not immune to heterogeneity, limited interpretation
RR         | Sign | Sign | Complexly rescaled LV coefficients | Especially sensitive to differences in truncation, limited interpretation
y*-std     | Sign (standardized size) | Sign (standardized size) | Differences std. by distribution of LV, rank or relative inequalities | Useful for many purposes; see comparisons of intra-class correlations in ML modelling, or standardized coefficients in SEM literature; does not identify absolute differences in structural LV coefficients
2.2 Natural categorical dependent variables
This section considers comparability if interest is in the categories of the dependent variable as such, instead of treating them as manifest values of an unobserved latent variable. Three different scales are commonly applied in such settings, although others are surely imaginable. We will consider the additive (or absolute) probability scale, the multiplicative (or relative) probability scale, and the odds scale, which is always multiplicative.
In general, instead of assuming the observed outcome to be an imperfect measurement of an underlying latent construct, we assume that outcomes are determined in such a way that the probability of event occurrence is a function of the predictors (Winship & Mare, 1983, p. 61):
$$P(y = 1) = G(\alpha + x\beta) \quad (6)$$
$G$ is the cumulative distribution function of a probability distribution, in our case the logistic function. $\alpha$ and $\beta$ are model parameters to be estimated.
2.2.1 Average marginal effect
Using the additive probability scale, we are interested in absolute probability differences and whether these are smaller or larger between the comparison groups. A possible research question which would require this scale could be: Are the absolute inequalities in tertiary education (as measured by the difference in absolute rates of tertiary education attainment by social origin) larger in countries with a higher proportion of tertiary graduates (Triventi, 2013)? Based on this research question we would like to compare the difference in the probability of educational attainment ($y$) by levels of the predictor (parental education) $x$. The AME does exactly that (see Wooldridge, 2002, p. 471).
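$$\Delta = \frac{1}{n_A}\sum_{i=1}^{n_A} \beta_A\, g(\alpha + x_i\beta_A) - \frac{1}{n_B}\sum_{i=1}^{n_B} \beta_B\, g(\alpha + x_i\beta_B) = \frac{\partial P(y_A)}{\partial x_A} - \frac{\partial P(y_B)}{\partial x_B} \quad (7)$$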
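As a rough sketch of how the two averages in equation (7) can be computed, the following block evaluates the AME for a logistic model in each group and takes their difference. It assumes that $g$ is the logistic density (the derivative of $G$); the parameter values and the predictor values in `xs_a` and `xs_b` are hypothetical.

```python
import math


def logistic_cdf(z):
    """G: cumulative distribution function of the standard logistic."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    """g: derivative of G, equal to G(z) * (1 - G(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


def average_marginal_effect(alpha, beta, xs):
    """Average over observations of beta * g(alpha + x * beta)."""
    return sum(beta * logistic_pdf(alpha + x * beta) for x in xs) / len(xs)


# Hypothetical predictor values and coefficients for the two groups.
xs_a = [-1.0, 0.0, 1.0, 2.0]
xs_b = [-2.0, -1.0, 0.0, 1.0]

ame_a = average_marginal_effect(alpha=0.0, beta=1.0, xs=xs_a)
ame_b = average_marginal_effect(alpha=0.0, beta=0.5, xs=xs_b)
delta = ame_a - ame_b  # difference in absolute probability changes
print(f"AME A: {ame_a:.3f}, AME B: {ame_b:.3f}, difference: {delta:.3f}")
```

For a dichotomous predictor one would usually take the difference of two predicted probabilities instead of the derivative; the derivative form is used here only because it mirrors the equation.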
2.2.2 Odds-ratios – bivariate models
We could move on and modify the research question by asking: Are relative inequalities in educational attainment (the difference in relative rates of tertiary education enrolment by social origin) larger in countries with a higher proportion of tertiary graduates? This research question requires the odds scale, and it holds that OR are comparable in size across groups. Note that the OR in a bivariate logistic regression model are often called marginal odds-ratios, as they represent the OR that might be calculated from a marginal table (Loux, Drake, & Smith-Gagen, 2014) if the predictor was categorical as well.
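$$\Delta = \frac{OR_A}{OR_B} = \frac{\partial \dfrac{P(y_A = 1)}{P(y_A = 0)} \Big/ \partial x_A}{\partial \dfrac{P(y_B = 1)}{P(y_B = 0)} \Big/ \partial x_B} \quad (8)$$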
The statement that marginal OR are comparable in size across groups might seem at odds with many statements found in the literature. However, we think that most scholars would agree that within the frame of the research question the statement holds true. For the bivariate case, logic dictates that if predicted probabilities and their differences (AME) are comparable, so are odds ratios (OR) derived from these probabilities. The multivariate case is more complex and will be discussed next.
2.2.3 Odds-ratios – the multivariate case
When stating that ORs cannot be compared across groups, one needs to be more specific and state that a comparison of the magnitude of OR across groups in a multivariate model does not necessarily reflect the difference in the marginal chance of event occurrence. This means that a higher conditional effect on odds in group A than in group B does not necessarily mean that the effect in the population (the marginal effect) will also be larger in group A than in group B.
However, the conditional effect can be compared in size. It is only because we are in practice most often interested in the marginal change (in odds, or probability) that we come to say that we cannot compare OR in size across groups. The non-equivalence of conditional and marginal odds-ratios in multivariate models is therefore translated into a generalized statement of non-comparability (which is sometimes correct, but not always).
2.2.4 Risk ratio
A third research question that might arise has a slightly different focus: Are the relative inequalities in tertiary education larger in countries with a higher proportion of tertiary graduates? In contrast to the first research question this addresses relative inequalities, but on the probability scale instead of the odds scale. One way to achieve the relative interpretation is to take the quotient of probabilities between two groups or levels of a predictor. If we then take the log of this quotient, we can see that a multiplicative prediction on the probability scale is like predicting additively on the log-probability scale. The quotient of the probabilities is known as the relative risk or risk ratio, commonly used in epidemiology. We can get a direct estimate of the risk ratio from a log-binomial or a Poisson model (Cummings, 2009; Gail, Wieand, & Piantadosi, 1984) and get comparability in size for this type of research question.² Using a logistic regression model, only indirect methods are available for estimating the risk ratio. First, with low baseline levels of the outcome (prevalence) the odds-ratio approximates the risk ratio (usually for cases with less than 0.1 baseline probability). Second, we can either estimate two predicted probabilities and take their ratio for dichotomous predictors, or estimate the marginal effect and divide it by the average probability of success.
$$\Delta = \frac{RR_A}{RR_B} = \frac{P(y_A = 1 \mid x_A = 1) \,/\, P(y_A = 1 \mid x_A = 0)}{P(y_B = 1 \mid x_B = 1) \,/\, P(y_B = 1 \mid x_B = 0)} \quad (9)$$
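The statement that the odds-ratio approximates the risk ratio only at low baseline probabilities can be checked with a short numerical sketch. The two pairs of probabilities below are hypothetical and chosen so that the risk ratio is 2 in both scenarios.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def risk_ratio(p1, p0):
    """Ratio of the probabilities at predictor level 1 versus level 0."""
    return p1 / p0


def odds_ratio(p1, p0):
    """Ratio of the odds at predictor level 1 versus level 0."""
    return odds(p1) / odds(p0)


# Hypothetical (p1, p0) pairs: rare outcome versus common outcome.
rare = (0.02, 0.01)
common = (0.60, 0.30)

for label, (p1, p0) in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(p1, p0):.2f}, OR = {odds_ratio(p1, p0):.2f}")
```

With the rare outcome the two measures nearly coincide, whereas with the common outcome the odds-ratio is considerably further from 1 than the risk ratio.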
!
2.3 Conditional and marginal interpretations of OR in multivariate models within the natural categorical framework
In many situations in applied research we might want to compare effects conditional on a set of covariates. In logistic regression this complicates the interpretation of OR, but not the interpretation of the average marginal effect or RR. The reason is that AME and RR retain a marginal interpretation when covariates are introduced into the equation, while OR does not. However, one should resist the temptation of choosing a research question based on the convenience of interpreting certain quantities. If we want to know the development of educational inequalities over cohorts after accounting for achievement as measured by test scores, it is unsound to change our whole research interest to absolute differences, reflecting absolute inequalities, only because it is easier for us to interpret and compare ME in multivariate models than it is to interpret OR. For OR, we take two different perspectives that have been seen as a problem of logistic regression, because we want to apply linear logic in non-linear (multiplicative) models. We suggest a practical approach that might be useful for research questions that aim at relative chances but want to adjust for other covariates and compare these estimates with a marginal interpretation across models.
2 The analytical proof of collapsibility of RR (Gail et al., 1984, p. 437; Neuhaus & Jewell, 1993, p. 812) corroborates the simulation results of Norton (2012), who claims that RR is unaffected by (uncorrelated) unobserved heterogeneity.
The important difference is the distinction between a conditional estimate and a marginal estimate, or whether a quantity is collapsible. A conditional effect estimate shows the association at a certain level of the covariates. A marginal effect estimate shows the estimate marginalized (summed up) over the set of covariates, meaning the actual values in the data set or population. If the weighted average of the conditional estimates of a quantity equals the marginal estimate for covariates that are not confounders, we say that this quantity is collapsible (Whittemore, 1978). We could collapse two conditional cross-tables to get the marginal cross-table. However, while AME and RR are collapsible, the OR is not. The discussion of collapsibility and its consequences is very advanced in the epidemiologic literature (Greenland, Robins, & Pearl, 1999; Pang, Kaufman, & Platt, 2013) and can be seen as the counterpart to the discussion of the consequences of unobserved heterogeneity in the social sciences within an NC framework.
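A small numerical sketch may help to see what non-collapsibility of the OR means. It assumes a logistic model with a binary predictor X and a binary covariate C that is independent of X (so C is not a confounder) but strongly predicts the outcome; all parameter values are hypothetical. The sketch only contrasts the probability difference with the OR.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


# Hypothetical model: logit P(y=1) = ALPHA + BETA_X * x + BETA_C * c
ALPHA = -1.0
BETA_X = math.log(4.0)  # conditional OR of X equals 4 in both strata of C
BETA_C = 2.0            # C strongly predicts the outcome
P_C = 0.5               # P(C=1), identical for X=0 and X=1 (no confounding)


def p_y(x, c):
    return expit(ALPHA + BETA_X * x + BETA_C * c)


# Conditional (stratum-specific) odds-ratios.
conditional_or = {c: odds(p_y(1, c)) / odds(p_y(0, c)) for c in (0, 1)}

# Marginal probabilities: average the strata with their population shares.
marginal_p = {x: (1 - P_C) * p_y(x, 0) + P_C * p_y(x, 1) for x in (0, 1)}
marginal_or = odds(marginal_p[1]) / odds(marginal_p[0])

# The probability difference is collapsible: the weighted average of the
# stratum-specific differences equals the marginal difference.
marginal_pd = marginal_p[1] - marginal_p[0]
average_conditional_pd = sum(
    share * (p_y(1, c) - p_y(0, c)) for c, share in ((0, 1 - P_C), (1, P_C))
)

print("conditional OR by stratum:", conditional_or)
print(f"marginal OR: {marginal_or:.2f}")
print(f"marginal PD: {marginal_pd:.4f}, averaged conditional PD: {average_conditional_pd:.4f}")
```

Although the conditional OR is 4 in both strata and C is unrelated to X, the marginal OR is closer to 1, while the probability difference is the same whether it is averaged over strata or computed from the collapsed table.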
When estimating the AME we marginalize the conditional probability differences (PD) over the set of covariates that we are adjusting for. Therefore, we speak of an average marginal effect. We want the same property, which allows adjusting for covariates while retaining a marginal interpretation, in our statistical quantity without changing to an absolute probability scale. We want a form of marginal odds-ratio for a multivariate model as we get it from a bivariate model, and we know that standard regression with covariate adjustment does not do the trick.
2.3.1 Conditional odds-ratios
At first, let us consider what we interpret and compare if we estimate conditional odds-ratios. For example, we might want to estimate the association of college attendance with (high versus low) parental education conditional on regional features and gender. We can interpret this as the OR that we get if we compare individuals from the same regions and of the same gender with each other. Conditional odds-ratios always describe differences "at the same level of covariates" (conditional interpretation). They do not carry a marginal interpretation like differences "in the population". The conditional OR are indeed comparable in size across groups if and only if this interpretation of the OR is used. Hence, if we get an OR estimate of 1.7 for high versus low parental education when controlling for gender and regional dummies, we could not say that the odds of attending college in the total population would be 1.7 times higher for those from high parental background if there was no confounding with region and gender. Rather, it is the odds-ratio of high versus low parental education if we compare individuals who are from the same region and of the same sex. Further, it is important to remember that this conditional estimate of the OR will always be further from 1 than the unconditional (marginal) OR, even if region, gender and parental education are not related at all, as long as region and gender predict the outcome (Neuhaus & Jewell, 1993, p. 812).
For pursuing this type of comparison, we propose an approach that combines the advantages of a marginal interpretation with covariate adjustment for OR. We call this the synthetic marginal odds-ratio (SMOR).
!
2.3.2 Synthetic marginal odds-ratios using inverse probability weighting
We define the SMOR as the ratio in chances of success between different levels of the predictor in the population if the predictor of interest were unrelated to a specified set of covariates. While this marginal OR can be interpreted as a causal effect when certain additional assumptions are fulfilled, it will be useful in many descriptive applications as well.
For comparability of OR, the distinction between studying an association or a causal effect is not decisive. However, drawing a distinction between conditional and marginal OR is important. A marginal OR represents the aggregate difference in event occurrence between groups, an attractive feature that makes a marginal OR ready for comparisons across groups. In contrast, a conditional OR, for instance estimated by a multiple logistic regression model, is defined only with respect to a set of covariates (Zhang, 2008), a fact that poses problems for between-group comparisons.
In the following, we propose an approach that aims at combining useful features of marginal and conditional OR while preserving comparability across groups. Our strategy involves applying inverse probability weighting (IPW) in the context of logistic regression, as it was previously applied to survival curves (Cole & Hernán, 2004). IPW is most commonly referred to in the context of causal analysis, where it is used to calculate inverse probabilities of treatment to account for selection into treatment (Morgan & Winship, 2007). However, IPW can also be applied in regression analysis when researchers aim for descriptive rather than causal inference.
In general, IPW works in three steps. First, a (logistic) regression model is estimated, taking the (dichotomous) predictor of interest ($X$) as the dependent variable and all other control variables ($C$) that are to be considered as independent variables.
$$\ln\frac{P(X = 1)}{1 - P(X = 1)} = \alpha + C'\gamma \quad (10)$$

Second, based on this model, we predict the probability of (a) having the trait for those who in fact have the trait ($P(X = 1 \mid C = c)$) and of (b) not having the trait for those who in fact do not have the trait ($1 - P(X = 1 \mid C = c)$). Then we take the inverse of these probabilities as weights ($w = \{w_0, w_1\}$) and standardize them in the numerator with the overall probability of having (or not having) the trait to reduce the variance of the weights.

$$w_1 = \frac{P(X = 1)}{P(X = 1 \mid C = c)}, \qquad w_0 = \frac{1 - P(X = 1)}{1 - P(X = 1 \mid C = c)} \quad (11)$$

Afterwards, in a third step, we run the substantive regression model including only the predictor of interest but using a weighted estimator. The likelihood function for estimating the coefficient in the logistic regression model is then modified to include the weights (as for example implemented in Stata 14, see StataCorp, 2015, p. 1291):
$$\ln L = \sum_{j \in S} w_j \ln G(\alpha + x_j \beta_{SM}) + \sum_{j \notin S} w_j \ln\bigl(1 - G(\alpha + x_j \beta_{SM})\bigr) \quad (12)$$

$L$ is the likelihood, $G$ the logistic function, $\beta_{SM}$ the coefficient of the predictor of interest, and $S$ includes all observations $j$ with a positive outcome ($y_j = 1$).
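The weighted log-likelihood in equation (12) can be written down directly. The sketch below is not the Stata implementation; it uses a handful of hypothetical weighted observations and exploits the fact that, with a single dichotomous predictor, the maximizing coefficients can be read off the weighted shares of successes.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def weighted_log_likelihood(alpha, beta, data):
    """Sum of w * ln G(.) for successes and w * ln(1 - G(.)) for failures."""
    total = 0.0
    for y, x, w in data:
        p = expit(alpha + beta * x)
        total += w * math.log(p if y == 1 else 1.0 - p)
    return total


# Hypothetical observations: (outcome y, predictor x, weight w).
data = [
    (1, 0, 1.5), (1, 0, 1.5), (0, 0, 2.0),
    (1, 1, 2.5), (1, 1, 1.5), (0, 1, 1.0),
]


def weighted_share(x_value):
    """Weighted share of successes among observations with x == x_value."""
    successes = sum(w for y, x, w in data if x == x_value and y == 1)
    total = sum(w for y, x, w in data if x == x_value)
    return successes / total


p0, p1 = weighted_share(0), weighted_share(1)
alpha_hat = math.log(p0 / (1.0 - p0))
beta_hat = math.log(p1 / (1.0 - p1)) - alpha_hat

print(f"exp(beta): {math.exp(beta_hat):.3f}")
print(f"log-likelihood at the maximum: {weighted_log_likelihood(alpha_hat, beta_hat, data):.4f}")
```

For continuous predictors or additional regressors the maximum has to be found numerically, which is what standard software does when weights are supplied.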
If the predictor of interest has more than two categories, multinomial logistic regression can be used in an analogous way (Imbens, 2000). If the variable of interest is continuous, several ways of estimating inverse density weights based on normal distributions or quantile binning are available (Naimi, Moodie, Auger, & Kaufman, 2014). The weights are formally defined independently of the distribution of X as:
$$w = \frac{f_X(X; \mu_1, \sigma_1^2)}{f_{X \mid C}(X \mid C = c; \mu_2, \sigma_2^2)} \quad (13)$$
$f_X$ is the functional form in which X is related to the other covariates (e.g. (multinomial) logistic, linear, log-binomial), $\mu$ is the threshold and $\sigma^2$ the variance estimate (fixed in logistic regression).
From the logistic regression model estimated in the third step, we obtain a synthetic marginal odds-ratio (SMOR), which in equation form is $\exp(\beta_{SM})$ and can be interpreted as follows. Taking the example from above, the SMOR measures the difference in odds of attending college between individuals with high and low educated parents that cannot be attributed to gender and region. It is the factor difference in the odds if parental education were unrelated to gender and region in the population under study. Note that this is not the same as the conditional OR obtained from an ordinary logistic regression model that just controls for gender and region, i.e. the average factor difference in odds at different levels of the controls.
For the case of categorical predictors, a very useful feature of the IPW approach is that we can actually resort to cross-tabulation based on the weighted data. That way we can construct a synthetic marginal table (for a similar way of presenting tables for causal analysis, see Yamaguchi, 2012), which, in the form of a cross-tabulation, provides information in an easily accessible way. From the synthetic marginal table we can recover OR, RR, or ME, which are numerically (approximately) the same as those estimated by logistic regression. However, a practical advantage of using the latter might be that most common statistical software packages enrich regression outputs with information on statistical inference (like standard errors or interval estimates), which might be more tedious to calculate from a table. Nonetheless, the important thing to remember is that we are looking at a counterfactual³ or synthetic cross-table that does not have a real-life equivalent as conditional cross-tables do.
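The three steps and the resulting synthetic marginal table can be illustrated with a minimal sketch. It assumes a single binary control C, so that $P(X = 1 \mid C = c)$ can be taken from cell proportions instead of a fitted logistic regression (with one binary covariate, a saturated logistic model yields the same fitted probabilities). The cell counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical cell counts: (c, x, y) -> number of individuals.
counts = {
    (0, 0, 0): 300, (0, 0, 1): 100,
    (0, 1, 0): 60,  (0, 1, 1): 40,
    (1, 0, 0): 40,  (1, 0, 1): 60,
    (1, 1, 0): 100, (1, 1, 1): 300,
}
n = sum(counts.values())

# Step 1: overall P(X=1) and P(X=1 | C=c) from cell proportions.
p_x1 = sum(v for (c, x, y), v in counts.items() if x == 1) / n
p_x1_given_c = {}
for level in (0, 1):
    n_c = sum(v for (c, x, y), v in counts.items() if c == level)
    n_c_x1 = sum(v for (c, x, y), v in counts.items() if c == level and x == 1)
    p_x1_given_c[level] = n_c_x1 / n_c


# Step 2: stabilized inverse probability weights.
def weight(c, x):
    if x == 1:
        return p_x1 / p_x1_given_c[c]
    return (1.0 - p_x1) / (1.0 - p_x1_given_c[c])


# Step 3: weighted (synthetic marginal) and unweighted (marginal) 2x2 tables.
synthetic = defaultdict(float)
marginal = defaultdict(float)
for (c, x, y), v in counts.items():
    synthetic[(x, y)] += v * weight(c, x)
    marginal[(x, y)] += v


def odds_ratio(table):
    return (table[(1, 1)] / table[(1, 0)]) / (table[(0, 1)] / table[(0, 0)])


smor = odds_ratio(synthetic)
marginal_or = odds_ratio(marginal)
print(f"marginal OR: {marginal_or:.2f}, synthetic marginal OR: {smor:.2f}")
```

In this constructed example the conditional OR within each level of C is 2, the unadjusted marginal OR is much larger because X and C are strongly related, and the SMOR lies somewhat below the conditional OR. The weighting leaves the marginal distribution of X unchanged.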
!
2.3.3 Example of synthetic marginal comparison
Using the IPW approach to construct a synthetic marginal table is best illustrated by an example from research on educational mobility. We would like to compare the direct association of parental education with college attendance (net of academic performance) across two subsequent cohorts to evaluate whether the 'secondary effect' of social background has been changing over time. For illustration, we use simulated data. The data set contains a variable that indicates whether an individual attended college or not, a dichotomous indicator for high vs. low parental education, and two possible control variables. Gender is a strong predictor of college attendance in this example, but unrelated to parental education. Academic performance in high school is a continuous variable (measured via test scores) and is correlated with both college attendance and parental education.
3 While this table and the OR calculated from it can be labelled counterfactual ("What if the control variables were equally distributed"), we refrain from using this term, as counterfactual is strongly associated with causal research designs. The synthetic marginal OR or table might be used for causal research designs, but often this might not be the goal.
Being interested in the 'secondary effect' of parental background on absolute or relative probability, we could simply estimate a logistic regression model of college attendance including parental education and academic performance as independent variables. Our research question is: What is the factor difference in odds between the student populations from high versus low educational background that cannot be attributed to academic performance? And how did this difference change across cohorts? Comparing the conditional odds-ratios of parental education based on an ordinary logistic regression model is problematic, because the estimate from a multivariate model has the interpretation of "at the same level of performance". However, our research question addresses the difference between the groups in the total population, adjusting for the correlation between parental education and performance.
We can apply the suggested IPW method to estimate the desired quantity. Based on this procedure we calculated a weighted and an unweighted OR for both cohorts.⁴ Inspecting results for cohort A first (Tables 5 and 6), we see that the difference in the probability of attending college between students with high and low educated parents is more than 19 percentage points. The relative risk is 1.34, which means the probability of attending college is 34 percent higher for those from a high educational background. The (marginal) OR is 2.41, meaning the odds of attending college are about 141 percent larger for students with higher educated parents as compared to those from lower educated backgrounds. If we weight these data by the inverse probability of (not) having high educated parents, predicted by performance only, we get a synthetic marginal table (right-hand side of Tables 5 and 6). The synthetic situation, the weighted data, is constructed in such a way that the marginal distribution of parental education remains the same while performance is equally distributed across groups of parental education. Weighting the data leads to a change in the absolute frequencies of college attendance (Table 5) and in the probabilities conditional on parents' education (Table 6).
4 Note that conceptually this approach is very similar to the decomposition of effects into primary (indirect) and secondary (direct) effects in logit models as proposed in previous studies (Buis, 2010; Erikson, Goldthorpe, Jackson, Yaish, & Cox, 2005). The first difference is that we do not propose to integrate over predicted probabilities, although this would lead to very similar results. The second difference is that the method has previously been used for the calculation of indirect effects within one model, not for the comparison across two groups.
The absolute probability difference in the synthetic table is about 9 percentage points, the relative risk is 1.15 and the synthetic marginal odds-ratio amounts to 1.51. As expected, part of the association of parental education and college attendance can be attributed to differential performance in school. Yet, a direct association between parents' education and college attendance remains, which would be the association found if performance was equally distributed between the groups. Quantified by the odds ratio, the odds of attending college would be 51 percent higher for those with high educated parents compared with those having lower educated parents.
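The three summary measures can be recomputed from the cell frequencies reported for cohort A in Table 5. The helper function `table_summaries` is only an illustrative sketch; because the synthetic frequencies in the table are rounded, the recomputed values can differ slightly in the last digit from the reported ones.

```python
def table_summaries(low, high):
    """ME, RR and OR from (no college, college) counts of two parental-education groups."""
    p_low = low[1] / (low[0] + low[1])
    p_high = high[1] / (high[0] + high[1])
    marginal_effect = p_high - p_low
    risk_ratio = p_high / p_low
    odds_ratio = (high[1] / high[0]) / (low[1] / low[0])
    return marginal_effect, risk_ratio, odds_ratio


# Cell frequencies for cohort A as reported in Table 5.
marginal = table_summaries(low=(1303, 1718), high=(474, 1505))
synthetic = table_summaries(low=(1181, 1840), high=(590, 1389))

for label, (me, rr, oratio) in (("marginal", marginal), ("synthetic", synthetic)):
    print(f"{label}: ME = {100 * me:.2f} pp, RR = {rr:.2f}, OR = {oratio:.2f}")
```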
A multivariate logistic regression adjusting for performance yields the same absolute difference in probability and the same relative risk, but a conditional odds-ratio of 1.67. Holding performance constant, we see that there is a difference of 67 percent in the odds of attending college, 16 percentage points higher than the synthetic marginal OR. Note that there is a performance difference between the groups in the marginal, but not in the synthetic marginal table, whereas the performance difference between college and no college is nearly the same for weighted and unweighted data. Hence, although the distribution of performance conditional on parental education is altered, the distribution conditional on college is not. One could think of constructing other scenarios, such as what the group difference would be if the low education group had the same performance as the high education group, or vice versa.
For the second cohort B, the overall level of inequality in college attendance is much higher. The marginal effect is over 40 percentage points, the risk ratio about 2 and the factor difference in odds more than 6. Accounting for performance differences via IPW, the differences are much reduced. The absolute inequality is a little lower than in cohort A (7.78 percentage points), the relative inequality about the same (RR 1.14) and the OR a little smaller at 1.38. Based on this we can conclude that overall inequalities are stronger in cohort B than in cohort A, both relatively and absolutely speaking. The secondary effect (group differences in college attendance that cannot be explained by group differences in performance) is about the same, which implies that the indirect effect is larger in cohort B than in cohort A.
Our conclusion is based on comparing the synthetic marginal OR across groups. If we now compared the conditional OR, we would see an OR of 1.67 in cohort A and a conditional OR of 2.68 in cohort B. So, if we took the conditional OR as a measure, we would conclude that the secondary effect is substantially larger in cohort B than in cohort A. The reason is that performance is much more predictive for college attendance in cohort B, and conditioning on it increases the predictive power of the model in cohort B. Therefore, knowing performance, the differences in odds of those at the same level are larger between high and low educated in cohort B than in cohort A (conditional interpretation). However, if we compare the high versus low education groups under the assumption that performance were equally distributed, the odds ratio for parental education would be roughly the same, even slightly higher in cohort A than in cohort B (marginal interpretation). Depending on the interpretation of odds ratios, conditional or marginal, we would draw different conclusions about the relative importance of the secondary effect of parental education within the two cohorts. Both the calculation and the interpretation of AME and RR are unaffected by the weighting approach. The IPW approach yields approximately the same results for AME and RR as the unweighted regression.
What happens in this example if we control for a predictor that is unrelated to parental education, but a strong predictor of college attendance? In our example that could be gender. The adjusted conditional OR estimate from multivariate regression increases to 2.11 in cohort A. If we do not want to compare odds-ratios for individuals of the same gender between cohorts, but for the whole population, while still adjusting for gender, we need to marginalize over gender. We can do this by including gender as an additional variable in the first step of the IPW approach, the prediction of parental education. Even though gender might be related to college attendance, we expect that accounting for gender should not alter the synthetic marginal odds-ratio, because there is no reason to believe that an individual's gender is related to their parents' education. In fact, accounting for gender does not change the inverse probabilities significantly and, thus, we get almost exactly the same synthetic marginal table (see Table 10 and Table 11 in the appendix). This demonstrates another valuable feature of the IPW approach: the robustness of the marginal interpretation of the SMOR when accounting for other variables that are predictive of the outcome under study, but not of the predictor of interest.
Table 5: Marginal and synthetic marginal table linking parental education and children's education – cell frequencies for Cohort A

                      Marginal table                                 Synthetic marginal table
Parental education    No college   College   Total   Performance    No college   College   Total   Performance
Low                        1,303     1,718   3,021        -0.008         1,181     1,840   3,021         0.412
High                         474     1,505   1,979         1.017           590     1,389   1,979         0.410
Total                      1,777     3,223   5,000         0.398         1,771     3,229   5,000         0.411
Performance               -0.827     1.073   0.398                      -0.805     1.050   0.411
                      ME: 19.18; RR: 1.34; OR: 2.41                  ME: 9.26; RR: 1.15; OR: 1.51

Note: Simulated data. The synthetic table was created using weights that balance performance level in high school to create a synthetic data set in which performance and parental education are unrelated. Performance level is equally distributed by parental background.

Table 6: Marginal and synthetic marginal table linking parental education and children's education – row percentages for Cohort A

                      Marginal table                                 Synthetic marginal table
Parental education    No college   College    Total   Performance   No college   College    Total   Performance
Low                        43.13     56.87   100.00        -0.008        39.08     60.92   100.00         0.412
High                       23.95     76.05   100.00         1.017        29.82     70.18   100.00         0.410
Total                      35.54     64.46   100.00         0.398        35.41     64.59   100.00         0.411
Performance               -0.827     1.073    0.398                     -0.805     1.050    0.411
                      ME: 19.18; RR: 1.34; OR: 2.41                  ME: 9.26; RR: 1.15; OR: 1.51

Note: Simulated data. The synthetic table was created using weights that balance performance level in high school to create a synthetic data set in which performance and parental education are unrelated. Performance level is equally distributed by parental background.
Table 7: Marginal and synthetic marginal table linking parental education and children's education – cell frequencies for Cohort B

| Parental education | Marginal: No college | Marginal: College | Marginal: Total | Marginal: Performance | Synthetic: No college | Synthetic: College | Synthetic: Total | Synthetic: Performance |
|---|---|---|---|---|---|---|---|---|
| Low | 1,822 | 1,199 | 3,021 | -0.004 | 1,388 | 1,633 | 3,021 | 0.435 |
| High | 370 | 1,609 | 1,979 | 1.009 | 755 | 1,224 | 1,979 | 0.423 |
| Total | 2,192 | 2,808 | 5,000 | 0.397 | 2,143 | 2,857 | 5,000 | 0.429 |
| Performance | -0.552 | 1.137 | 0.397 | ME: 41.61; RR: 2.04; OR: 6.61 | -0.549 | 1.140 | 0.429 | ME: 7.78; RR: 1.14; OR: 1.38 |

Note: Simulated data. The synthetic table was created using weights that balance performance level in high school to create a synthetic data set in which performance and parental education are unrelated. Performance level is equally distributed by parental background.

Table 8: Marginal and synthetic marginal table linking parental education and children's education – row percentages for Cohort B

| Parental education | Marginal: No college | Marginal: College | Marginal: Total | Marginal: Performance | Synthetic: No college | Synthetic: College | Synthetic: Total | Synthetic: Performance |
|---|---|---|---|---|---|---|---|---|
| Low | 60.31 | 39.69 | 100.00 | -0.004 | 45.94 | 54.06 | 100.00 | 0.435 |
| High | 18.70 | 81.30 | 100.00 | 1.009 | 38.16 | 61.84 | 100.00 | 0.423 |
| Total | 43.84 | 56.16 | 100.00 | 0.397 | 42.86 | 57.14 | 100.00 | 0.429 |
| Performance | -0.552 | 1.137 | 0.397 | ME: 41.61; RR: 2.04; OR: 6.61 | -0.549 | 1.140 | 0.429 | ME: 7.78; RR: 1.14; OR: 1.38 |

Note: Simulated data. The synthetic table was created using weights that balance performance level in high school to create a synthetic data set in which performance and parental education are unrelated. Performance level is equally distributed by parental background.
Table 9: Comparability of quantities under natural categorical framework

| Quantity | Comparability: Bivariate | Comparability: Multivariate | Interpretation | Note |
|---|---|---|---|---|
| Odds-ratio | Size | Sign (size) | Classificatory power; degree of stratification; change in odds | Given known covariates; multivariate only if interpreted as conditional (at same level of covariates) |
| SMOR | Size | Size | Degree of stratification; change of odds in population | Given correct IPW model |
| AME/LPM | Size | Size | Absolute probability difference | ME varies between individuals |
| RR | Size | Size | Relative probability difference | Marginal interpretation; not symmetric like OR, coding of event important |
| y*-std | Sign | Sign | Underlying propensity | Unclear meaning, counterintuitive |
| Standardized ratio | Size | Size | Relative probability difference | Has RR interpretation, but is based on probability predictions; captures relative aspect; in univariate case identical to RR; in multivariate case not identical due to Jensen's inequality |
3 Average Marginal Effects, Risk Ratios and Odds Ratios – United we understand
The discussion of comparability under a natural categorical framework is summarized in Table 9.
3.1 The complementary nature of AME, RR and OR
As we discussed, under the LV framework OR, RR and AME only have comparability of sign and are equally problematic for comparisons across groups. Within the natural categorical research framework, however, all three can be useful, so that we can either choose the one that best fits the quantity to be measured or, alternatively, use all three in a complementary way.
There are several arguments why an exclusive reliance on AME, which has become more common in recent years, limits interesting aspects of comparisons. First, while log-odds ratios are parameters of a statistical model, average marginal effects are not. An AME does not depend only on the parameters of the probability function but also on the joint distribution of covariates in a sample. Hence, while an AME is illustrative for a specific set of data, it is impossible to reproduce the original model parameters from it, which severely limits the ability to replicate results of studies. If study results cannot be reproduced, it is unclear whether this is due to a difference in the estimation of the model parameters or due to the subsequent calculation of the AME in the sample used.
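The dependence of the AME on the covariate distribution can be made concrete with a small numerical sketch. The parameter values, the two covariate samples and the helper `ame_binary` below are hypothetical; the point is only that one and the same logit model yields different AMEs when the distribution of a covariate `z` differs, while the conditional odds ratio exp(b) stays the same.

```python
import math


def sigmoid(v):
    """Logistic function mapping the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def ame_binary(a, b, c, z_values):
    """AME of a binary x in logit(p) = a + b*x + c*z, averaged over z_values."""
    diffs = [sigmoid(a + b + c * z) - sigmoid(a + c * z) for z in z_values]
    return sum(diffs) / len(diffs)


# Hypothetical model parameters, identical in both samples.
a, b, c = -1.0, 1.0, 1.5

# Two hypothetical samples that differ only in the distribution of z.
sample_1 = [-1.0, -0.5, 0.0, 0.5, 1.0]
sample_2 = [2.0, 2.5, 3.0, 3.5, 4.0]

ame_1 = ame_binary(a, b, c, sample_1)
ame_2 = ame_binary(a, b, c, sample_2)
conditional_or = math.exp(b)

print(f"AME sample 1: {ame_1:.3f}, AME sample 2: {ame_2:.3f}")
print(f"conditional OR in both samples: {conditional_or:.3f}")
```

Because the second sample sits in the upper tail of the logistic curve, its AME is much smaller, even though no model parameter has changed.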
Second, and substantively more important, in sociology in general and in the field of stratification research in particular, the distinction between absolute and relative inequality among groups should be kept in mind. An exclusive reliance on AME for comparative purposes would mean that we eliminate all kinds of research questions that address relative differences, e.g. relative differences in educational attainment. Absolute probability differences (AME) and relative rates (OR/RR) represent different concepts. A comparison might lead to the same conclusion based on absolute or relative perspectives. However, a difference in conclusion depending on whether the absolute probability scale, the relative odds scale or the relative probability scale is used is not only a theoretically valid result, but can happen with real-world data and might be of particular substantive interest. This argument is not novel, but builds on the conclusion drawn by Mood (2010), who also advocates a careful choice of quantities to report and advises against treating any single estimate as a panacea.
We will now illustrate how we can use OR, AME and RR jointly to compare groups and how this can, for example, enhance our understanding of changes over cohorts.

3.2 Example – Educational attainment and intergenerational mobility
We present a fictional example of the development of secondary school attainment (Figure 2). The log of the OR is visually represented as the size of the diamonds, with the first cohort being the category of reference. The comparison of AME shows that absolute inequalities in secondary school attainment have become smaller (from 19 to 6 percentage points) over cohorts. The reduction in relative inequalities is similar (RR reduced from 1.46 to 1.06). Therefore, we could conclude that inequalities in secondary school attainment are strongly reduced over cohorts, which is correct. However, we argue that the statement, based on these results, that secondary school attainment is no longer socially stratified is only half the truth and hides an important fact. We can see that the OR has not declined across cohorts; it has in fact increased. The reason for the increasing OR is that social stratification is no longer relevant for the question of who gets a secondary school degree (almost everybody does), but is still relevant for the question of who does not get a secondary school degree. While 'winners' are no longer socially stratified, 'losers' are. The high baseline probability in the later cohorts also indicates that sociological analyses should rather focus on describing and explaining patterns of drop-out instead of completion.
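The arithmetic behind this pattern can be sketched with a few hypothetical numbers. The snippet below does not reproduce the simulated data of Figure 2; it assumes an odds ratio that is held fixed (the value 2.22 is borrowed from the first cohort purely for illustration) and lets the attainment probability of the low-background group rise. The helper `p_high_from_or` is a hypothetical name.

```python
def p_high_from_or(p_low, odds_ratio):
    """Probability in the high group implied by p_low and a given odds ratio."""
    odds_high = odds_ratio * p_low / (1.0 - p_low)
    return odds_high / (1.0 + odds_high)


fixed_or = 2.22  # assumption: OR held constant across hypothetical cohorts

rows = []
for p_low in (0.40, 0.60, 0.80, 0.95):  # hypothetical rising baseline
    p_high = p_high_from_or(p_low, fixed_or)
    me = p_high - p_low  # absolute difference
    rr = p_high / p_low  # relative probability
    rows.append((p_low, p_high, me, rr))

for p_low, p_high, me, rr in rows:
    print(f"p_low = {p_low:.2f}, p_high = {p_high:.3f}, "
          f"ME = {me:.3f}, RR = {rr:.3f}, OR = {fixed_or:.2f}")
```

As the baseline approaches one, ME and RR shrink mechanically even though the odds ratio is unchanged, which is why a falling ME or RR alone does not show that stratification has disappeared.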
In Figure 3 we present an example based on real data on cohort change in inequality in secondary school attainment (min. ISCED level 3 and above) by parental education.[5] The pattern is slightly different than in the simulated example. We see a marked decrease in absolute (AME from 43 to 26 percentage points) and relative inequality (RR from 1.91 to 1.35) as well, but substantial inequality remains in the younger cohorts. At the same time, we see that there is no decrease in the OR. Quite to the contrary, the youngest cohort even displays a substantially higher OR. While inequality in secondary school attainment and non-attainment clearly existed in earlier cohorts, the decrease in inequality in attainment is mirrored by the observation that non-attainment in the youngest cohorts is almost exclusively an issue of individuals with lower educated parents, which is reflected in the OR, but neither in the RR nor in the AME. This trend becomes more apparent if we flip Figure 3 on its head and plot non-attainment of secondary education, as done in Figure 4.

[5] Analyses based on data from the National Educational Panel Study (NEPS).
Note that the scaling of the diamonds is exactly the same as in Figure 3, as is the absolute difference in probability. However, the relative risk of non-attainment is not simply the inverse of the relative risk calculated for attainment. While ME and OR are symmetric with respect to the coding of the dependent variable, the relative risk is not. Figure 4 shows more prominently that the risk of not finishing secondary education is more than 10 times higher in the youngest cohorts for those from a low educational background compared to those from a high educational background. This is despite the fact that the absolute difference has decreased by between 17 and 19 percentage points.
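The asymmetry of the relative risk can be checked with a short calculation. The two attainment probabilities below are hypothetical and not taken from the NEPS data; the function name `measures` is likewise an assumption made for this sketch.

```python
def measures(p_exposed, p_reference):
    """Return (ME, RR, OR) comparing p_exposed with p_reference."""
    me = p_exposed - p_reference
    rr = p_exposed / p_reference
    odds_ratio = (p_exposed / (1 - p_exposed)) / (p_reference / (1 - p_reference))
    return me, rr, odds_ratio


# Hypothetical attainment probabilities by parental education.
p_high, p_low = 0.98, 0.80

# Attainment: high versus low parental education.
attain = measures(p_high, p_low)
# Non-attainment: low versus high parental education.
non_attain = measures(1 - p_low, 1 - p_high)

print("attainment:     ME = %.2f, RR = %.3f, OR = %.2f" % attain)
print("non-attainment: ME = %.2f, RR = %.3f, OR = %.2f" % non_attain)
```

Recoding the outcome leaves ME and OR unchanged, whereas the relative risk moves from roughly 1.2 for attainment to 10 for non-attainment.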
In sum, using the OR in combination with ME and RR, we can conclude that while relative and absolute inequalities in secondary school attainment have diminished substantially, social stratification of drop-out and non-completion is still present and could be the focus of future research. While educational expansion has changed relevant degrees of absolute and relative inequality in secondary school attainment, there still seem to be mechanisms that link social background and educational attainment, in the sense that they determine failure instead of success, and this degree of determination has not diminished over cohorts.
This example tells us that it can be very helpful to report and interpret ME and RR alongside the OR. Our graph is just one of many ways to combine this information.
Figure 2: Simulated development of secondary school attainment

[Figure: attainment (vertical axis, 0 to 1) by birth cohort for high and low parental SEP, with 95% CI. Annotations by cohort: 1947/1952: ME 0.19, RR 1.46, OR 2.22; 1953/1957: ME 0.19, RR 1.37, OR 2.88; 1958/1962: ME 0.11, RR 1.16, OR 2.30; 1963/1967: ME 0.09, RR 1.12, OR 4.71; 1968/1972: ME 0.06, RR 1.07, OR 6.25.]

Note: Simulated data. Size of the diamonds is scaled to reflect differences in the OR with the oldest cohort set to be the point of reference.
Figure 3: Development of secondary school attainment in Germany

[Figure: "Inequality in secondary school attainment over cohorts - All". Attainment (vertical axis, 0 to 1) by birth cohort for low and high parental education, with 95% CI. Annotations by cohort: 1947/1952: ME 0.43, RR 1.91, OR 11.45; 1953/1957: ME 0.32, RR 1.50, OR 11.39; 1958/1962: ME 0.27, RR 1.39, OR 15.31; 1963/1967: ME 0.24, RR 1.32, OR 13.46; 1968/1972: ME 0.26, RR 1.36, OR 22.75.]

Note: Illustrative data from the National Educational Panel Study (NEPS) in Germany. Size of the diamonds is scaled to reflect differences in the OR with the oldest cohort set to be the point of reference.

Figure 4: Development of secondary school non-attainment in Germany

[Figure: "Inequality in secondary school NON-attainment over cohorts - All". Non-attainment (vertical axis, 0 to 1) by birth cohort for low and high parental education, with 95% CI. Annotations by cohort: 1947/1952: ME 0.43, RR 6.01, OR 11.45; 1953/1957: ME 0.32, RR 7.58, OR 11.39; 1958/1962: ME 0.27, RR 11.02, OR 15.31; 1963/1967: ME 0.24, RR 10.17, OR 13.46; 1968/1972: ME 0.26, RR 16.68, OR 22.75.]

Note: Illustrative data from the National Educational Panel Study (NEPS) in Germany. Size of the diamonds is scaled to reflect differences in the OR with the oldest cohort set to be the point of reference.
4 Conclusion
Our paper aimed to shed light on a confusing debate on the comparability of logit coefficients that has emerged in recent years. We started by arguing that the issues raised by Mood (2010) and others do not apply to all research agendas. Importantly, logistic regression can serve different ends. It can be used to analyze natural categorical dependent variables. It may also be used as a model to estimate effects on a latent variable (propensity), which is unobserved but assumed to generate binary observations. These are very different theoretical approaches that cannot be distinguished empirically, but have far-reaching consequences for the interpretation of the model results. We argued in detail that the comparability of model results depends on whether we have a natural categorical (NC) or a latent variable (LV) approach in mind.
Second, we pointed out that, contrary to common belief, AME are not immune to unobserved heterogeneity under the LV framework (for a similar argument, see Holm et al., 2014). In fact, none of the possible quantities estimated from logistic regression (with the partial exception of the standardized coefficients (Breen et al., 2014)) is helpful for comparisons of size across groups or samples.
Third, we showed that AME, OR and RR are all comparable in size across groups in bivariate models, and AME and RR also in multivariate models. Contrary to common belief, OR are comparable in size even in multivariate models if the conditional interpretation is used. If a marginal interpretation is desired while controlling for other covariates, we proposed an inverse probability weighting technique that combines these two properties to make OR comparable in size for marginal interpretations in multivariate models.
Fourth, we showed that for research questions in the natural categorical framework AME, OR and RR complement each other in interpretation, and we illustrated their joint use for cohort comparisons that yielded insights that would have been lost if only one of the quantities had been reported.
We have four main suggestions for future research. First, when cross-group comparisons of effects are made, researchers should be clear about which effect on what they are referring to: probability, relative probability, odds, or (standardized) latent variable? Further, researchers could think about hypotheses that combine comparisons on these different scales, provided that theory is detailed enough. In any case, it is advisable for any comparison to report effects on different scales.
Second, AME should not be used in comparisons if the interest is directed towards coefficients in the LV model unless convincing arguments are presented that the underlying assumptions are likely to hold. Furthermore, research should be clearer and more consistent in stating whether the dependent variable is treated in the LV or the NC framework.
Third, in many cases comparisons of AME, RR, and OR give a more coherent picture of differences between groups if we conceive our dependent variable as being naturally categorical. Further, we should take substantial differences in baseline between groups into account and discuss if the meaning of the variable remains the same or whether the absence of a condition might be more interesting than the condition itself.
Fourth, we suggest the use of inverse probability weighting to estimate synthetic marginal odds-ratios (SMOR) for comparisons across groups or samples. For many research contexts this might be favored over a comparison of conditional OR, which are more difficult to interpret. However, we want to stress that both kinds of comparison are possible within an NC framework, depending on the precise research question and interpretation of the results.
In sum, we believe that a stronger reliance on theory-grounded decisions is needed when deciding which quantities to report and interpret when using logistic regression for comparisons across groups and samples. There are few rules that hold for all perspectives and research questions, and generalizations have been shown to be faulty under certain circumstances (an example of a close link between theoretical discussion of inequality and methodological implications can be found in Bulle, 2016). Further, forcing ourselves to think again about which quantities to interpret also allows us to think more carefully about our theories, whether they might be able to guide analysis in absolute, relative or odds terms, and whether they might actually make predictions on different levels. For example, the idea of persistent inequality (Shavit & Blossfeld, 1993) proposes that absolute levels of inequality in education (as could be tested using AME) have declined over certain periods while relative inequalities have remained constant (as could be tested using RR or OR). Opposing claims could equally draw on different kinds of quantities to test their claims about relative or absolute inequalities (e.g. Breen, Luijkx, Müller, & Pollak, 2009). This way a methodological discussion would not only facilitate the statistical implementation of certain models, but also contribute to improving theory and its predictions.
5 References

Agresti, Alan. 2013. Categorical Data Analysis. 3rd ed. Wiley Series in Probability and Statistics 792. Hoboken, NJ: Wiley.
Allison, Paul D. 1999. Comparing Logit and Probit Coefficients Across Groups. Sociological Methods & Research 28: 186–208. doi:10.1177/0049124199028002003.
Bailis, Daniel S, Alexander Segall, and Judith G Chipperfield. 2003. Two Views of Self-Rated General Health Status. Social Science & Medicine 56 (2): 203–17. doi:10.1016/S0277-9536(02)00020-5.
Blane, D., G. Netuveli, and J. Stone. 2007. The Development of Life Course Epidemiology. Revue d'Épidémiologie et de Santé Publique 55 (1): 31–38. doi:10.1016/j.respe.2006.12.004.
Breen, Richard, Anders Holm, and Kristian Bernt Karlson. 2014. Correlations and Nonlinear Probability Models. Sociological Methods & Research 43 (4): 571–605. doi:10.1177/0049124114544224.
Buis, Maarten L. 2010. Direct and Indirect Effects in a Logit Model. The Stata Journal 10 (1): 11.
Cole, Stephen R., and Miguel A. Hernán. 2004. Adjusted Survival Curves with Inverse Probability Weights. Computer Methods and Programs in Biomedicine 75 (1): 45–49. doi:10.1016/j.cmpb.2003.10.004.
Cummings, Peter. 2009. Methods for Estimating Adjusted Risk Ratios. Stata Journal 9 (2): 175.
Dowd, Jennifer B., Amanda M. Simanek, and Allison E. Aiello. 2009. Socio-Economic Status, Cortisol and Allostatic Load: A Review of the Literature. International Journal of Epidemiology, August, dyp277. doi:10.1093/ije/dyp277.
Dowd, Jennifer Beam, and Anna Zajacova. 2010. Does Self-Rated Health Mean the Same Thing Across Socioeconomic Groups? Evidence From Biomarker Data. Annals of Epidemiology 20 (10): 743–49. doi:10.1016/j.annepidem.2010.06.007.
Duncan, Otis Dudley. 1975. Introduction to Structural Equation Models. Studies in Population. New York: Academic Press.
Erikson, Robert, John H. Goldthorpe, Michelle Jackson, Meir Yaish, and D. R. Cox. 2005. On Class Differentials in Educational Attainment. Proceedings of the National Academy of Sciences of the United States of America 102 (27): 9730–33. doi:10.1073/pnas.0502433102.
Gail, M. H., S. Wieand, and S. Piantadosi. 1984. Biased Estimates of Treatment Effect in Randomized Experiments with Nonlinear Regressions and Omitted Covariates. Biometrika 71 (3): 431–44. doi:10.1093/biomet/71.3.431.
Greenland, Sander, James M Robins, and Judea Pearl. 1999. Confounding and Collapsibility in Causal Inference. Statistical Science, 29–46.
Holm, Anders, Mette Ejrnæs, and Kristian Karlson. 2014. Comparing Linear Probability Model Coefficients across Groups. Quality & Quantity, 1–12. doi:10.1007/s11135-014-0057-0.
Imbens, G. W. 2000. The Role of the Propensity Score in Estimating Dose-Response Functions. Biometrika 87 (3): 706–10. doi:10.1093/biomet/87.3.706.
Jylhä, Marja. 2009. What Is Self-Rated Health and Why Does It Predict Mortality? Towards a Unified Conceptual Model. Social Science & Medicine 69 (3): 307–16. doi:10.1016/j.socscimed.2009.05.013.
Jylhä, Marja, Jack M. Guralnik, Luigi Ferrucci, Jukka Jokela, and Eino Heikkinen. 1998. Is Self-Rated Health Comparable across Cultures and Genders? The Journals of Gerontology Series B: Psychological Sciences and Social Sciences 53B (3): S144–52. doi:10.1093/geronb/53B.3.S144.
Karlson, Kristian Bernt, Anders Holm, and Richard Breen. 2012. Comparing Regression Coefficients Between Same-Sample Nested Models Using Logit and Probit: A New Method. Sociological Methodology 42 (1): 286–313. doi:10.1177/0081175012444861.
Leopold, Liliya. 2016. Cumulative Advantage in an Egalitarian Country? Socioeconomic Health Disparities over the Life Course in Sweden. Journal of Health and Social Behavior 57 (2): 257–73. doi:10.1177/0022146516645926.
Mood, Carina. 2010a. Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do about It. European Sociological Review 26 (ii): 67–82. doi:10.1093/esr/jcp006.
Mood, Carina. 2010b. Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review 26 (1): 67–82. doi:10.1093/esr/jcp006.
Mood, Carina. 2013. Life-Style and Self-Rated Global Health in Sweden: A Prospective Analysis Spanning Three Decades. Preventive Medicine 57 (6): 802–806.
Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal Inference: Methods and Principles for Social Research. Analytical Methods for Social Research. New York: Cambridge University Press.
Naimi, Ashley I., Erica E. M. Moodie, Nathalie Auger, and Jay S. Kaufman. 2014. Constructing Inverse Probability Weights for Continuous Exposures: A Comparison of Methods. Epidemiology (Cambridge, Mass.) 25 (2): 292–99. doi:10.1097/EDE.0000000000000053.
Norton, Edward C. 2012. Log Odds and Ends. Working Paper 18252. National Bureau of Economic Research. http://www.nber.org/papers/w18252.
Pang, Menglan, Jay S. Kaufman, and Robert W. Platt. 2013. Studying Noncollapsibility of the Odds Ratio with Marginal Structural and Logistic Regression Models. Statistical Methods in Medical Research, October, 0962280213505804. doi:10.1177/0962280213505804.
Pearson, K., and D. Heron. 1913. On Theories of Association. Biometrika 9 (1–2): 159–315. doi:10.1093/biomet/9.1-2.159.
Tchetgen Tchetgen, Eric J. 2013. Inverse Odds Ratio-Weighted Estimation for Causal Mediation Analysis. Statistics in Medicine 32 (26): 4567–4580.
Triventi, Moris. 2013. Stratification in Higher Education and Its Relationship with Social Inequality: A Comparative Study of 11 European Countries. European Sociological Review 29 (3): 489–502.
Whittemore, Alice S. 1978. Collapsibility of Multidimensional Contingency Tables. Journal of the Royal Statistical Society. Series B (Methodological), 328–340.
Willson, Andrea E., Kim M. Shuey, and Glen H. Elder Jr. 2007. Cumulative Advantage Processes as Mechanisms of Inequality in Life Course Health. American Journal of Sociology 112 (6): 1886–1924. doi:10.1086/509520.
Winship, Christopher, and Robert D. Mare. 1984. Regression Models with Ordinal Variables. American Sociological Review 49: 512. doi:10.2307/2095465.
Yule, G. Udny. 1900. On the Association of Attributes in Statistics: With Illustrations from the Material of the Childhood Society. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 194 (252–261): 257–319. doi:10.1098/rsta.1900.0019.
Yule, G. Udny. 1903. Notes on the Theory of Association of Attributes in Statistics. Biometrika 2 (2): 121. doi:10.2307/2331677.
Yule, George Udny. 1911. An Introduction to the Theory of Statistics. C. Griffin, limited.
Zhang, Zhiwei. 2008. Estimating a Marginal Causal Odds Ratio Subject to Confounding. Communications in Statistics - Theory and Methods 38 (3): 309–21. doi:10.1080/03610920802200076.
6 Appendix

6.1 S1 - A formal treatment of comparison
In our definition, we use $D$ as a placeholder for the construct we want to compare from a theoretical perspective and $\hat{D}$ for the quantity we actually estimate from our model that is to represent $D$. In the following we give a formal definition of when comparisons of $\hat{D}$ across groups represent a comparison of $D$ across groups. We use groups A and B as stand-ins for any kind of group comparison, e.g. between countries, men and women, cohorts, ethnic groups or periods.
We define comparability of size on additive scales as follows:

\[ E(\hat{D}_A - \hat{D}_B) = D_A - D_B \tag{1a} \]

This means that the difference between group A and group B in our construct equals the expectation of the difference of our estimated quantities, a straightforward definition. For multiplicative scales the analogous definition refers to the ratio instead of the difference:

\[ E\left(\frac{\hat{D}_A}{\hat{D}_B}\right) = \frac{D_A}{D_B} \tag{1b} \]

Our definition implies that the difference (ratio) of the quantities we estimate needs to be an unbiased estimator of the true difference (ratio).
In contrast, comparability of sign lets us only answer the simple question whether $D$ has the same sign in both groups. Comparability of sign is given if the following conditions hold (3):

\[ E(\hat{D}_A) > 0 \ \text{iff} \ D_A > 0 \]
\[ E(\hat{D}_A) < 0 \ \text{iff} \ D_A < 0 \]
\[ E(\hat{D}_B) > 0 \ \text{iff} \ D_B > 0 \]
\[ E(\hat{D}_B) < 0 \ \text{iff} \ D_B < 0 \]

The acceptance that comparability of estimates depends on the definition of $D$ is crucial to our argument. Whether we define $D$ as an absolute distance on an additive metric or as a ratio between two quantities would ideally be rooted in theoretical grounds. Thus, we deliberately omitted a definition of the scale of $D$. $D$ could be measured on different scales depending on the research context. For our purpose the probability scale ($\Pr(Y)$), either additive or multiplicative, the odds scale ($\Pr(Y) / (1 - \Pr(Y))$), and the (standardized) scale of the latent variable ($y^*$) will be the central scales under which most research questions can be subsumed. Which of these scales is relevant to determine comparability is mainly dependent on the conceptual framework that we apply to our dependent variable.
6.2 S2 – Additional tables and graphs
Table 10: Marginal and synthetic marginal table linking parental education and children's education (weighting additionally for gender) – frequencies for Cohort A

| Parental education | Marginal: No college | Marginal: College | Marginal: Total | Marginal: Mean level of skill | Synthetic: No college | Synthetic: College | Synthetic: Total | Synthetic: Mean level of skill |
|---|---|---|---|---|---|---|---|---|
| Low | 1,303 | 1,718 | 3,021 | -0.008 | 1,175 | 1,846 | 3,021 | 0.412 |
| High | 474 | 1,505 | 1,979 | 1.017 | 595 | 1,384 | 1,979 | 0.410 |
| Total | 1,777 | 3,223 | 5,000 | 0.398 | 1,770 | 3,230 | 5,000 | 0.411 |
| Mean level of skill | -0.827 | 1.073 | 0.398 | ME: 19.18; RR: 1.34; OR: 2.41 | -0.807 | 1.053 | 0.411 | ME: 8.8; RR: 1.14; OR: 1.48 |

Note: Simulated data. The synthetic table was created using weights that balance skill level in high school, so as to create a synthetic data set in which skill level and parental education are unrelated. Skill level is equally distributed across individuals from high and low parental background.

Table 11: Marginal and synthetic marginal table linking parental education and children's education (weighting additionally for gender) – row percentages for Cohort A

| Parental education | Marginal: No college | Marginal: College | Marginal: Total | Marginal: Mean level of skill | Synthetic: No college | Synthetic: College | Synthetic: Total | Synthetic: Mean level of skill |
|---|---|---|---|---|---|---|---|---|
| Low | 43.13 | 56.87 | 100.00 | -0.008 | 38.90 | 61.10 | 100.00 | 0.412 |
| High | 23.95 | 76.05 | 100.00 | 1.017 | 30.08 | 69.92 | 100.00 | 0.410 |
| Total | 35.54 | 64.46 | 100.00 | 0.398 | 35.41 | 64.59 | 100.00 | 0.411 |
| Mean level of skill | -0.827 | 1.073 | 0.398 | ME: 19.18; RR: 1.34; OR: 2.41 | -0.805 | 1.050 | 0.411 | ME: 8.8; RR: 1.14; OR: 1.48 |

Note: Simulated data. The synthetic table was created using weights that balance skill level in high school, so as to create a synthetic data set in which skill level and parental education are unrelated. Skill level is equally distributed across individuals from high and low parental background.
... These plots, which are derived from the same sequential logit model, are not affected by this identification problem and provide an easily interpretable picture of cohort trends in educational expansion as well as absolute educational inequality (Blossfeld, Blossfeld, and Blossfeld 2015). In combination, these two metrics provide a comprehensive picture of between-group changes in IEO across cohorts (Kröger and Skopek 2017). ...
... It is likely that the later cohort is more negatively selected on unobserved characteristics, such as geographic isolation. In other words, when a particular level of education nears universal attendance, the dropouts become an increasingly stratified group (Kröger and Skopek 2017) (this is the inverse of the 'survivor bias' in higher educational transitions described by Mare (1981) and others, discussed in more detail below). ...
Preprint
Full-text available
This study looks at educational inequality in China, a country that has greatly expanded access to education in recent decades. It uses a sequential logit model to study the changing impact of family background on educational transitions and educational attainment, comparing birth cohorts that completed their schooling during different stages of the market transition process. Data are derived from the China Family Panel Studies (CFPS), a large and nationally representative household survey that provides detailed retrospective information. The findings show that in reform-era China educational inequality has increased despite large-scale educational expansion. Since the onset of the market reforms the importance of social origin has continuously increased, particularly at the crucial transition to senior high school. I suggest that the resulting pattern of expanding inequality can be explained by a combination of market-based educational reforms, increasing returns to education and massive increases in wider social and economic inequality.
... In fact such a calculation would constitute on average a conservative estimate of the indirect effect (Breen et al. 2018). Therefore, we recalculated the indirect effects for the total sample based on an inverse probability weighting approach (IPW) that takes the difference in scale into account and allows a direct comparison (Cole and Hernán 2004;Kröger and Skopek 2017;Van der Weele 2009). This is done as a robustness check for our comparisons of total effects and net effects. ...
Article
Differences in mortality between groups with different socioeconomic positions (SEP) are well-established, but the relative contribution of different SEP measures is unclear. This study compares the correlation between three SEP dimensions and mortality, and investigates differences between gender and age groups (35–59 vs. 60–84). We use an 11% random sample with an 80% oversample of deaths from the Finnish population with information on education, occupational class, individual income, and mortality (n = 496,658; 274,316 deaths between 1995 and 2007). We estimate bivariate and multivariate Cox proportional hazard models and population attributable fractions. The total effects of education are substantially mediated by occupation and income, and the effect of occupation is mediated by income. All dimensions have their own net effect on mortality, but income shows the steepest mortality gradient (HR 1.78, lowest vs. highest quintile). Income is more important for men and occupational class more important among elderly women. Mortality inequalities are generally smaller in older ages, but the relative importance of income increases. In health inequality studies, the use of only one SEP indicator functions well as a broad marker of SEP. However, only analyses of multiple dimensions allow insights into social mechanisms and how they differ between population subgroups.
Article
This study looks at educational inequality in China, a country that has greatly expanded access to education in recent decades. It uses a sequential logit model to study the changing impact of family background on educational transitions, comparing birth cohorts that completed their schooling during different stages of the market transition process. Data are derived from the China Family Panel Studies (CFPS), a large and nationally representative household survey that provides detailed retrospective information. The findings show that educational inequality in reform-era China followed a pattern of maximally maintained inequality. Although educational expansion diminished disparities in obtaining basic education, inequality persisted or even increased in the more advanced levels, especially at the crucial transition to senior high school. Inequalities only started to decrease for the most recent cohorts, when higher-level transitions became almost universal among high-status groups. These findings can be explained by the nature of China’s economic and educational policies, which heavily favoured urban residents and other privileged groups.
Chapter
The chapter provides information on the data and methodology of the study. It starts with a brief justification of the study’s comparative perspective on countries, topics, and empirical material. This study focuses on four EU member states (at the time of investigation): Austria, Germany, Ireland, and the United Kingdom. Plenary debates on the EU Constitutional Treaty, the Treaty of Lisbon, and the first Eurozone bailout mechanism, the European Financial Stability Facility (EFSF) are analyzed as diverse, critical decision cases. Interviews with members of the European Affairs Committees and Budget Committees provide insights into comprehensive representative patterns. We describe the two methods of data collection, namely the Representative Claims Analysis (RCA) and the semi-structured qualitative interviews. It clarifies their functions within the overall research design and discusses sampling of debates and interview partners as well as the coding procedure and interview questionnaire. This chapter may be particularly useful for teaching purposes as it provides an excellent example of how to combine both quantitative and qualitative methods of data collection and analysis in a comparative case study design. Given its brevity and focus on the key methodological decisions, it is at the same time essential for readers to judge the scope and quality of the study’s empirical findings.
Article
According to the cumulative advantage hypothesis, health gaps between socioeconomic groups widen with age. In the United States, studies have supported this hypothesis. Outside this context, evidence remains scarce. The present study tests the cumulative advantage hypothesis in Sweden – a society that contrasts sharply with the United States in terms of policies designed to reduce social disparities in health-related resources. I draw on longitudinal data from the Swedish Level of Living Survey (N = 9,412 person-years), spanning the period between 1991 and 2010. The results show that gaps in self-rated health increase from early to middle adulthood. This applies to differences between educational groups and between occupational classes. In older age, health gaps remain constant. Cross-cohort analyses reveal a rising importance of cumulative advantage between educational groups, but not between occupational classes. I conclude that the forces of accumulation prevail even in one of the most egalitarian welfare states.
Article
The main goal is to explore institutional stratification within higher education in a comparative perspective, and its relationship with social inequality. In the first part, the theoretical framework is developed connecting theories on inequality in education and labour market, and adapting them to the higher education context. In the second part, data from the REFLEX survey on recent tertiary graduates in 11 European countries are used to assess whether social origin affects the type of tertiary education attained. Consistent with the hypotheses, parental education strongly affects the probability of graduation in a long programme, but not the transition to a PhD course. In most countries, parental education is positively related with graduation in a top institution and a prestigious field of study. The gross effect of parental education is reduced, but still significant, when controlling for previous school achievement. Finally, it is shown that vertical and horizontal gross inequalities are stronger in those countries with a higher proportion of tertiary graduates (a proxy for competition among graduates in the labour market) and where the institutional differentiation is more relevant for graduates' occupational outcomes.
Article
Most discussions of ordinal variables in the sociological literature debate the suitability of linear regression and structural equation methods when some variables are ordinal. Largely ignored in these discussions are methods for ordinal variables that are natural extensions of probit and logit models for dichotomous variables. If ordinal variables are discrete realizations of unmeasured continuous variables, these methods allow one to include ordinal dependent and independent variables into structural equation models in a way that (1) explicitly recognizes their ordinality, (2) avoids arbitrary assumptions about their scale, and (3) allows for analysis of continuous, dichotomous, and ordinal variables within a common statistical framework. These models rely on assumed probability distributions of the continuous variables that underlie the observed ordinal variables, but these assumptions are testable. The models can be estimated using a number of commonly used statistical programs. As is illustrated by an empirical example, ordered probit and logit models, like their dichotomous counterparts, take account of the ceiling and floor restrictions on models that include ordinal variables, whereas the linear regression model does not.
Article
The present paper reviews the development of life course epidemiology since its origins during the 1990s from biological programming, birth cohort research and the study of health inequalities. Methods of studying the life course are examined, including birth cohort studies, linked register datasets and epidemiological archaeology. Three models of life course epidemiology are described: critical periods, accumulation, and pathways. Their conceptual and empirical differentiation can be difficult, but it is argued that accumulation is the underlying social process driving life course trajectories, while the critical period and pathway models are distinguished by their concern with specific types of aetiological process. Among the advantages of the accumulation model are predictive power, aetiological insights, contributions to health inequality debates and social policy implications. It is emphasised that the life course approach is not opposed to, or an alternative to, a concern with cross-sectional and current effects; major social disruption can have a large and immediate impact on health. Other limitations of the life course approach include a spectrum of impact (life course effects can be strong in relation to physiology, but often are weaker in relation to behaviour and psychological reactions to everyday life) and, more speculatively, the possibility that life course effects are diluted in the older age groups where morbidity and mortality are highest. Three issues for the future of life course epidemiology are identified. First, many life course data are collected retrospectively. We need to know which items of information are recalled with what degree of accuracy over how many decades; and what methods of collecting these retrospective data maximise accuracy and duration. Second, the two partners in life course research need to take more seriously each other's disciplines. Social scientists need to be more critical of such measures as self-assessed health, which lacks an aetiology and hence biological plausibility. Natural scientists need to be more critical of such concepts as socio-economic status, which lacks social plausibility because it fails to distinguish between social location and social prestige. Finally, European comparative studies can play an important part in the future development of life course epidemiology if they build on the emerging infrastructure of European comparative research.
Article
Logistic regression estimates do not behave like linear regression estimates in one important respect: They are affected by omitted variables, even when these variables are unrelated to the independent variables in the model. This fact has important implications that have gone largely unnoticed by sociologists. Importantly, we cannot straightforwardly interpret log-odds ratios or odds ratios as effect measures, because they also reflect the degree of unobserved heterogeneity in the model. In addition, we cannot compare log-odds ratios or odds ratios for similar models across groups, samples, or time points, or across models with different independent variables in a sample. This article discusses these problems and possible ways of overcoming them.
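The claim that omitted variables shift logistic estimates even when they are unrelated to the included predictors can be illustrated numerically. The following is a minimal sketch, not taken from the article: it assumes a binary predictor x, an omitted standard-normal variable z that is independent of x, and a logit model with conditional log-odds ratio `beta` for x and coefficient `gamma` for z. The function name, the parameter names, and the use of Gauss-Hermite quadrature to average over z are all illustrative choices.

```python
import numpy as np

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))

def marginal_log_odds_ratio(beta, gamma, alpha=0.0, n_nodes=80):
    """Log-odds ratio for binary x after averaging over an omitted z ~ N(0, 1).

    Assumed model (hypothetical): P(y=1 | x, z) = expit(alpha + beta*x + gamma*z),
    with z independent of x.
    """
    # Gauss-Hermite nodes and weights integrate against exp(-t^2);
    # rescale them so that they integrate against the standard normal density.
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    z = np.sqrt(2.0) * nodes
    w = weights / np.sqrt(np.pi)
    p0 = np.sum(w * expit(alpha + gamma * z))         # P(y=1 | x=0)
    p1 = np.sum(w * expit(alpha + beta + gamma * z))  # P(y=1 | x=1)
    return np.log(p1 / (1.0 - p1)) - np.log(p0 / (1.0 - p0))

if __name__ == "__main__":
    # The conditional log-odds ratio is fixed at 1.0 in every row;
    # only the strength of the omitted variable changes.
    for gamma in (0.0, 1.0, 2.0, 3.0):
        print(gamma, round(float(marginal_log_odds_ratio(1.0, gamma)), 3))
```

Under these assumptions, the log-odds ratio recovered without z moves toward zero as `gamma` grows, although the conditional effect of x is held constant and z is uncorrelated with x. Two groups that differ only in the amount of unobserved heterogeneity would therefore show different odds ratios.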
Article
Although the parameters of logit and probit and other nonlinear probability models (NLPMs) are often explained and interpreted in relation to the regression coefficients of an underlying linear latent variable model, we argue that they may also be usefully interpreted in terms of the correlations between the dependent variable of the latent variable model and its predictor variables. We show how this correlation can be derived from the parameters of NLPMs, develop tests for the statistical significance of the derived correlation, and illustrate its usefulness in two applications. Under certain circumstances, which we explain, the derived correlation provides a way of overcoming the problems inherent in cross-sample comparisons of the parameters of NLPMs.
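For the simplest case, the derived correlation can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes a logit model with a single predictor x, a latent outcome y* = beta*x + e, and a standard logistic error e with variance π²/3. The function name and arguments are hypothetical.

```python
import math

def latent_correlation(beta, sd_x):
    """Correlation between x and the latent outcome y* in a one-predictor logit model.

    Assumes y* = beta * x + e, where e is standard logistic and independent of x.
    """
    resid_var = math.pi ** 2 / 3.0                  # variance of a standard logistic error
    latent_sd = math.sqrt(beta ** 2 * sd_x ** 2 + resid_var)
    return beta * sd_x / latent_sd

if __name__ == "__main__":
    # A larger coefficient paired with a proportionally smaller spread in x
    # implies the same latent correlation.
    print(latent_correlation(1.0, 1.0))
    print(latent_correlation(2.0, 0.5))
```

Because the correlation depends on the coefficient only through the ratio of explained to total latent variance, it does not change when the latent scale is multiplied by a constant, which is the property that makes it a candidate for cross-sample comparison in this simplified setting.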
Article
This article offers a formal identification analysis of the problem in comparing coefficients from linear probability models (LPM) between groups. We show that differences in coefficients from these models can result not only from genuine differences in effects, but also from differences in one or more of the following three components: outcome truncation, scale parameters and distributional shape of the predictor variable. These results point to limitations in using LPM coefficients for group comparisons. We also provide Monte Carlo simulations and real examples to illustrate these limitations, and we suggest a restricted approach to using LPM coefficients in group comparisons.
Article
Multidimensional contingency tables can be summed over factors, without affecting the log‐linear parameters describing interactions of other factors, under less restrictive conditions than generally recognized. Examples are given that contradict a theorem of Bishop, Fienberg and Holland (1975). Necessary and sufficient conditions for collapsibility are given, and an example applying the results to categorical data is presented.