Invited Commentary: Simple Models for a Complicated Reality

Boston University, Boston, Massachusetts, United States
American Journal of Epidemiology (Impact Factor: 5.23). 09/2006; 164(4):312-4; discussion 315-6. DOI: 10.1093/aje/kwj238


Enrique F. Schisterman and Sonia Hernández-Díaz
Epidemiology Branch, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD.
Department of Epidemiology, Harvard School of Public Health, Boston, MA.
Slone Epidemiology Center, Boston University, Boston, MA.
Received for publication February 27, 2006; accepted for publication March 6, 2006.
The renowned statistician George E. P. Box famously said
that all models are wrong, but some are useful. Far from an
indictment of statistical models, Box’s statement can be
taken to mean that even when complex realities are not
exactly represented by simple fitted models, much can be
learned. The paper by Basso et al. (1) in this issue of the
Journal provides an opportunity to consider the costs and
benefits that arise from the simplification necessary for gen-
erating statistical models of complex biologic processes.
Considering the relation among birth weight, mortality,
and third factors, Basso et al. postulate that birth weight is
not itself on the causal path to mortality; rather, the relation
between birth weight and mortality might be explained by
a confounding factor. The authors conclude that, to produce
the observed inverse J shape of the birth-weight-specific
mortality curve, the putative confounding factors (matrix
X = (X_1 and X_2)) must be very rare and have very large
effects.
In reducing complex situations to simple models, assump-
tions are made, especially when modeling a biologic process.
For example, parametric models make assumptions regarding
distributions. The flexibility of a model is limited by
that of the assumptions on which it depends. Basso et al.'s
model makes the following assumptions: 1) birth weight
follows a Gaussian distribution, 2) there is a uniform effect
of the confounding factors X_1 and X_2 on mortality, 3) birth
weight does not cause neonatal mortality, and 4) there is no
interaction between factors X_1 and X_2 and birth weight.
Notably, some of these assumptions are interrelated, and a
change in one might affect the others.
As previously stated, consideration of the potential limi-
tations of this model requires further attention to the assump-
tions on which it is based. Let us review each of these
assumptions regarding their substance, the possible impact
on the findings if they are violated, and whether they seem
Gaussian distribution for birth weight
This assumption requires that birth weight follow a Nor-
mal distribution at all strata of the confounding factor. The
hypothetical birth weights in the left tail of this distribution
may be regarded with some skepticism because of their
questionable compatibility with viability. If the left tail is
indeed truncated, the Gaussian birth-weight assumption will
be violated. Moreover, a confounding factor might increase
the proportion of low-birth-weight babies without shifting
the whole birth-weight distribution, resulting in a skewed
distribution within that stratum. However, given the rela-
tively low prevalence of fetal-growth-restricted babies, ma-
jor deviations from a Gaussian distribution are unusual in
real life. Additionally, a small violation of this assumption
will have little impact on the shape of the overall association
between birth weight and neonatal mortality.
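To give a rough sense of scale for this argument, the sketch below computes the skewness of a stratum in which a fraction p of babies is growth restricted and shifted downward by a fixed amount, while the rest follow the stratum's usual Gaussian distribution. The equal-variance two-component mixture, the 1,000-g shift, the 500-g standard deviation, and the function name are illustrative assumptions, not quantities taken from Basso et al.

```python
def mixture_skewness(prevalence, shift, sd):
    """Skewness of the mixture (1 - p) * N(mu, sd^2) + p * N(mu + shift, sd^2).

    With equal component variances, the mixture is a two-point variable
    (0 or `shift`) plus symmetric Gaussian noise, so only the two-point
    part contributes to the third central moment.
    """
    p = prevalence
    variance = sd ** 2 + p * (1.0 - p) * shift ** 2
    third_central_moment = p * (1.0 - p) * (1.0 - 2.0 * p) * shift ** 3
    return third_central_moment / variance ** 1.5


# Hypothetical values: growth-restricted babies weigh 1,000 g less on average.
for p in (0.01, 0.05, 0.20):
    print(f"fraction shifted = {p:.2f}, skewness = {mixture_skewness(p, -1000.0, 500.0):.3f}")
```

A skewness near zero indicates a nearly symmetric stratum. Under these assumptions, the departure from symmetry shrinks as the growth-restricted fraction becomes small.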
Uniform effect of risk factors
This assumption states that all babies exposed to the
putative confounding factor X have identical shifts in
birth weight and identical elevations of their mortality
risk. In their paper, Basso et al. acknowledge that this
assumption is unlikely to be true. However, modifying
this assumption seems to entail only minor changes in the
noncausal link between birth weight and neonatal mortality.
The authors refined the model by substituting distributions
for the constant effects and obtained similar results.
Correspondence to Dr. Enrique F. Schisterman, Division of Epidemiology, Statistics and Prevention Research, NICHD, NIH, 6100 Executive
Boulevard, 7B03, Rockville, MD 20852 (e-mail:
312 Am J Epidemiol 2006;164:312–314
American Journal of Epidemiology
© 2006 by the Johns Hopkins Bloomberg School of Public Health
All rights reserved; printed in U.S.A.
Vol. 164, No. 4
DOI: 10.1093/aje/kwj238
Advance Access publication July 17, 2006
Noncausal effect of birth weight on neonatal mortality
This assumption states that neonatal mortality is indepen-
dent of birth weight conditional on the confounding factor
that links them or, stated more simply, that birth weight in
no way has any effect on neonatal mortality. In probability
terms, this assumption implies conditional independence
and can be rewritten as
$$\Pr(NM = 0 \mid BW = bw, X = x) = \Pr(NM = 0 \mid X = x) = f(x), \qquad (1)$$
where NM = 0, 1 indicates neonatal mortality status, BW is
birth weight, X is the unmeasured confounder, and f(x) is
a function that depends only on a value x of factor X.
Departure from this assumption could lead to different
conclusions. As soon as the model allows for a causal effect of
birth weight on neonatal mortality, the risk factors (X_1 and
X_2, following the notation of Basso et al.) no longer need to
be rare or have large effects.
Considering equation 1, what are the conditions and char-
acteristics of the putative confounding factor, X, that would
give rise to the observed shape of the curve between birth
weight and neonatal mortality? Such a factor must comply
with the following equality that depicts the joint probability
of neonatal mortality and birth weight so that it results in the
inverse J-shaped curves, with form
$$\Pr(NM = 0, BW = bw) = \int f(x)\,\Pr(BW = bw \mid X = x)\,F_X(dx). \qquad (2)$$
It is worth noting that, from such conditions, the strength
of the effect of the factor X on birth weight plays a crucial
role in determining the shape of the crude association be-
tween birth weight and neonatal mortality under conditional
independence assumptions. Given these circumstances,
Basso et al. show that the confounding factor X cannot be
smoking, or any other factor with similar properties, be-
cause it neither is sufficiently rare nor conveys sufficient
risk to satisfy the conditions set forth by equation 2.
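For a discrete confounder, the integral in equation 2 becomes a sum over strata, and dividing by the birth-weight density gives the birth-weight-specific mortality curve. The following Python sketch does this under conditional independence. It works with the death risk 1 - f(x) rather than the survival probability f(x). All prevalences, means, standard deviations, and risks are hypothetical values chosen for illustration; they are not the values used by Basso et al.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mortality_by_birth_weight(bw, strata):
    """Pr(NM = 1 | BW = bw) when death risk depends only on the stratum of X.

    `strata` is a list of (prevalence, mean_bw, sd_bw, death_risk) tuples.
    Birth weight has no effect on death within a stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for prevalence, mean_bw, sd_bw, death_risk in strata:
        weight = prevalence * normal_pdf(bw, mean_bw, sd_bw)
        numerator += death_risk * weight
        denominator += weight
    return numerator / denominator


# Hypothetical scenario A: two rare factors with very large effects.
RARE_STRONG = [
    (0.98, 3500.0, 450.0, 0.002),  # unexposed
    (0.01, 2000.0, 450.0, 0.300),  # rare factor lowering birth weight
    (0.01, 5000.0, 450.0, 0.050),  # rare factor raising birth weight
]

# Hypothetical scenario B: one common factor with modest effects.
COMMON_WEAK = [
    (0.75, 3500.0, 450.0, 0.002),
    (0.25, 3300.0, 450.0, 0.003),
]

for bw in range(1500, 5501, 1000):
    rare = mortality_by_birth_weight(bw, RARE_STRONG)
    common = mortality_by_birth_weight(bw, COMMON_WEAK)
    print(f"{bw} g: rare/strong = {rare:.4f}, common/weak = {common:.4f}")
```

In scenario B the curve can never leave the narrow band between the two stratum-specific risks, which is the sense in which a common factor with a modest effect cannot generate a steep curve without a causal effect of birth weight.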
However, if one allows for a departure from the condi-
tional independence assumption, the added flexibility allows
for a very large range of circumstances and attendant con-
clusions. When a causal effect between birth weight and
neonatal mortality is assumed to be possible, the unmea-
sured confounder X is not constrained to be rare or have
large effects, and the following applies:
$$f(bw, x) = \Pr(NM = 0 \mid BW = bw, X = x) \neq \Pr(NM = 0 \mid X = x). \qquad (3)$$
As a result, the equality in equation 2 becomes
$$\Pr(NM = 0, BW = bw) = \int f(bw, x)\,\Pr(BW = bw \mid X = x)\,F_X(dx). \qquad (4)$$
Adding birth weight to function f creates a large number of
alternative routes to solving the equality represented by
equation 4. Specifically, nonlinear relations and/or interac-
tions could lead to the observed shape of the birth weight–
neonatal mortality curve. Actually, under the relaxed
assumption, an infinity of models is possible, including that
in which the J-shaped pattern is completely explained by the
effect of birth weight on mortality.
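The last case can be written out explicitly. As an illustrative special case, suppose that survival depends on birth weight alone, so that f(bw, x) = g(bw) for some function g (the symbol g is introduced here for illustration and does not appear in Basso et al.). The function g then factors out of the integral:

```latex
\begin{align*}
\Pr(NM = 0, BW = bw)
  &= \int g(bw)\,\Pr(BW = bw \mid X = x)\,F_X(dx) \\
  &= g(bw) \int \Pr(BW = bw \mid X = x)\,F_X(dx) \\
  &= g(bw)\,\Pr(BW = bw), \\[4pt]
\Pr(NM = 0 \mid BW = bw) &= g(bw).
\end{align*}
```

Under this assumption, the birth-weight-specific curve is simply g, whatever the distribution of X, so any observed shape can be reproduced by choosing g accordingly.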
FIGURE 1. Neonatal mortality (log scale) per 10,000 births per year as a function of the interaction between birth weight and smoking. Left: birth-
weight density by smoking status; right: neonatal mortality and the interaction between birth weight and smoking.
Lack of interaction between factors X_1 or X_2 and birth weight
In reaching their conclusions, Basso et al. assume no in-
teractive effect between risk factors and fetal growth on
neonatal mortality. However, in seeking explanations for
the observed shape of the birth-weight curves, there are
alternatives that merit consideration as well. We simulated
an interactive effect between a common factor and fetal
growth. To do so, we had to relax another model assumption
by allowing a causal effect of fetal growth on neonatal mor-
tality. We were able to generate curves that were similar to
the empirically observed data, as shown in figure 1. This
provides an alternative to the proposed rare exposure with
extreme effects on mortality.
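A minimal sketch of how such an interactive scenario might be set up is given below. It is not the simulation behind figure 1: the logistic form, every coefficient, the 25 percent prevalence, and the 200-g shift in mean birth weight are hypothetical choices made only to show the mechanics of a common factor combined with a causal, nonlinear effect of birth weight.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def death_risk(bw, exposed):
    """Hypothetical Pr(NM = 1 | BW = bw, X = exposed), i.e., 1 - f(bw, x)."""
    z = (bw - 3500.0) / 500.0              # birth weight in standardized units
    logit = -6.5 + 0.2 * z * z - 0.4 * z   # causal, nonlinear effect of birth weight
    if exposed:
        logit += 0.4 + 0.15 * z            # main effect plus interaction with birth weight
    return 1.0 / (1.0 + math.exp(-logit))


PREVALENCE = 0.25                          # a common factor (hypothetical value)
MEAN_BW = {False: 3500.0, True: 3300.0}    # modest shift in mean birth weight
SD_BW = 500.0


def marginal_death_risk(bw):
    """Pr(NM = 1 | BW = bw), averaging over the exposure as in equation 4."""
    numerator = 0.0
    denominator = 0.0
    for exposed, share in ((False, 1.0 - PREVALENCE), (True, PREVALENCE)):
        weight = share * normal_pdf(bw, MEAN_BW[exposed], SD_BW)
        numerator += death_risk(bw, exposed) * weight
        denominator += weight
    return numerator / denominator


for bw in range(1500, 5501, 500):
    per_10000 = 10000.0 * marginal_death_risk(bw)
    print(f"{bw} g: {per_10000:.1f} deaths per 10,000 births")
```

With these particular coefficients, the marginal curve is highest at low birth weights, lowest somewhat above the mean, and rises again at high birth weights, even though the exposure is neither rare nor extreme.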
There is biologic plausibility to both scenarios: that of
no direct effect of birth weight on neonatal mortality and that of
at least a small direct effect with or without interactions. In
addition, both scenarios have similar practical implications.
Under the one presented by Basso et al., the putative "rare"
exposures will be difficult to uncover, although perhaps we
could identify them among the causes of death attributed to
low-birth-weight babies on death certificates. Unfortunately,
the same is true under the alternative scenario. If an
interaction between fetal growth and a common factor is
responsible for the observed association, it would be very difficult
to uncover the responsible interacting, and perhaps
nonlinear, factor.
After considering the assumptions of Basso et al., several
questions merit deliberation to determine the future course
of research. First, is the assumption of no causal link be-
tween birth weight and neonatal mortality correct? If so,
Basso et al. have shown us that the shape of the birth-
weight-specific mortality curve might result from the pres-
ence of very rare confounders with very large effects. This
finding suggests a search for the proverbial needle in the
haystack. Second, if a causal link does exist, what is the
nature of this link? If we relax the model to allow a causal
link, the inverse J pattern of birth-weight-specific mortality
could also be explained by a range of more common and
weaker confounders that might have nonlinear effects and
might interact with other risk factors. That is, speculations
under the relaxed assumption basically take us back to square
one, where, if there is a needle in the haystack, we would
have to accidentally sit on it to find it. Interestingly, perhaps
we have to conclude that unrestricted models are correct, but
often useless.
We have shown how the model used by Basso et al. may
fail because of violations of the assumptions on which it is
based. However, we should heed the lessons of their simple
model, which opens up the appealing possibility of no direct
causal effect between birth weight and neonatal mortality
and shows how, in this context, the putative confounder
would have to be rare and extremely strong. From a meth-
odological point of view, their findings should change the
way we consider birth-weight data when evaluating the ef-
fect of other risk factors on perinatal outcomes (2). More
importantly, if they are right and no causal link exists, our
research efforts to reduce neonatal mortality would be well
advised to shift away from birth weight and toward direct
causes of infant mortality and morbidity and toward broad-
ening our efforts to acknowledge this possibility.
Supported by the Intramural Research Program of the
National Institutes of Health, National Institute of Child
Health and Human Development.
Conflict of interest: none declared.
1. Basso O, Wilcox AJ, Weinberg CR. Birth weight and mortality: causality or confounding? Am J Epidemiol 2006;164:
2. Hernández-Díaz S, Schisterman EF, Hernán MA. The birth weight paradox uncovered? Am J Epidemiol (in press).