Cohort of birth modifies the association between FTO
genotype and BMI
James Niels Rosenquist^{a,1}, Steven F. Lehrer^{b,c}, A. James O’Malley^{d}, Alan M. Zaslavsky^{e}, Jordan W. Smoller^{f}, and Nicholas A. Christakis^{g,h,i,j}
^a Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114;
^b School of Policy Studies and Department of Economics, Queens University, Kingston, Ontario, Canada K7L 3N6;
^c National Bureau of Economic Research USA, Cambridge, MA 02138;
^d The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755;
^e Department of Health Care Policy, Harvard Medical School, Boston, MA 02115;
^f Psychiatric and Neurodevelopmental Genetics Unit, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114;
^g Department of Sociology, Yale University, New Haven, CT 06520;
^h Department of Medicine, Yale University, New Haven, CT 06520;
^i Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520; and
^j Yale Institute for Network Science, Yale University, New Haven, CT 06520
Edited by Kenneth W. Wachter, University of California, Berkeley, CA, and approved November 11, 2014 (received for review June 25, 2014)
A substantial body of research has explored the relative roles of
genetic and environmental factors on phenotype expression in
humans. Recent research has also sought to identify gene–envi-
ronment (or g-by-e) interactions, with mixed success. One poten-
tial reason for these mixed results may relate to the fact that
genetic effects might be modified by changes in the environment
over time. For example, the noted rise of obesity in the United
States in the latter part of the 20th century might reflect an in-
teraction between genetic variation and changing environmental
conditions that together affect the penetrance of genetic influences.
To evaluate this hypothesis, we use longitudinal data from the
Framingham Heart Study collected over 30 y from a geographically
relatively localized sample to test whether the well-documented
association between the rs9939609 variant of the FTO (fat mass and
obesity associated) gene and body mass index (BMI) varies across
birth cohorts, time period, and the lifecycle. Such cohort and pe-
riod effects integrate many potential environmental factors, and
this gene-by-environment analysis examines interactions with
both time-varying contemporaneous and historical environmental
influences. Using constrained linear age–period–cohort models
that include family controls, we find that there is a robust relation-
ship between birth cohort and the genotype–phenotype correla-
tion between the FTO risk allele and BMI, with an observed
inflection point for those born after 1942. These results suggest
genetic influences on complex traits like obesity can vary over
time, presumably because of global environmental changes that
modify allelic penetrance.
population genetics | obesity | birth cohort
The rise in obesity in the United States and other Western
countries is a major public health concern, and obesity is
known to have both genetic and environmental determinants (1–
3). Changes in the population distribution of body mass index
(BMI), a common measure of obesity, have attracted the at-
tention of researchers from disciplines across the health and
social sciences. Social scientists have attributed changes in obesity
to macroenvironmental developments, such as urban design, oc-
cupational shifts, dietary modifications, and social effects (4–10).
Many of these arguments are plausible and hold considerable in-
tuitive appeal. In parallel, research in the health sciences provides
significant evidence to suggest that genetic factors, notably the FTO
gene, play an important role in BMI over the lifespan (11–14).
Although these research studies were typically not designed to
assess interactions between genetic variants and environmental
factors, it is likely that environmental effects are modulated by
genetic pathways, causing some individuals or population groups to
be differentially affected by changes in the environment (7).
To date, gene–environment interaction studies have primarily
examined within-birth-cohort differences among individuals with
varying environmental exposures in a narrow time period (3).
The foregoing research design uses a cross-sectional approach to
sample environmental variation and focuses on whether the effects
of a single specific environmental variable (e.g., childhood mal-
treatment) with respect to some outcome (e.g., adult depression)
depend on a specific genetic polymorphism (15). This empirical
strategy has prompted some debate regarding its ability to detect
g-by-e effects (16, 17).
On the other hand, using between-birth-cohort differences is
different, allowing for the testing of hypotheses related to time-
varying changes in the whole of the environment affecting a
population. To our knowledge there have been no longitudinal pop-
ulation studies that seek to determine whether there are between-
birth-cohort differences in genotype–phenotype associations. Dis-
entangling the extent to which historical versus contemporaneous
environmental factors interact with genetic features, and how these
in turn differ from simple aging, can shed light on the mechanisms
underlying the rise in obesity (and similar phenomena).
Here, we extend the statistical approach used for decades by
epidemiologists and social scientists to understand temporal
trends in health outcomes. This approach, known as “age–
period–cohort analysis” (18), presumes that the patterns of obesity
rates across people of different ages at one point in time do not
solely reflect the physiological effects associated with aging but
also the accumulation of varied experiences over the lifecycle.
These experiences include external factors (such as technological
innovations or cultural changes) that influence multiple birth
cohorts simultaneously (albeit at different moments in their
lives)—known as “period effects”—but that also, in addition,
differentially affect specific groups of individuals born within the
same era, known as “cohort effects”. This distinction is important
because, for example, younger cohorts might be more likely
to either embrace new technologies and their corresponding
modes of work and leisure or be exposed to a sophisticated
marketing campaign at more impressionable ages.

Significance
Our finding of a significant gene-by-birth-cohort interaction
adds a previously unidentified dimension to gene-by-environment
interaction research, suggesting that global changes in
the environment over time can modify the penetrance of genetic
risk factors for diverse phenotypes. This result also suggests
that presence (or absence) of a genotype–phenotype
correlation may depend on the period of time study subjects
were born in, or the historical moment researchers conduct
their investigations.

Author contributions: J.N.R., S.F.L., and N.A.C. designed research; J.N.R., S.F.L., and A.J.O.
analyzed data; and J.N.R., S.F.L., A.J.O., A.M.Z., J.W.S., and N.A.C. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
^1 To whom correspondence should be addressed. Email: JRosenquist@partners.org.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1411893111/-/DCSupplemental.
354–359 | PNAS | January 13, 2015 | vol. 112 | no. 2 | www.pnas.org/cgi/doi/10.1073/pnas.1411893111

Our approach allows for differential responses to age, period,
and cohort factors depending on the genetic markers one carries,
thereby providing insights into the source of gene–environment
interactions. In addition, we use an estimation strategy that sta-
tistically determines the optimal breakpoint (if any) at which the
effects of the explanatory variables differ by genetic variant (13).
This allows us to directly examine the hypothesis that genetic
effects on a phenotype vary meaningfully according to the era of
birth of an individual (i.e., the specific cohort to which people
belong). Specifically, using a unique dataset, we test the hypoth-
esis that a particular genetic variant with an established associa-
tion with BMI may have differential influence on the phenotype
of BMI depending on when, exactly, an individual was born,
suggesting a gene-by-birth cohort (g-by-c) interaction.
To quantify the separate effects of age, period, and cohort
(APC) and their interactions with genetic variation, we analyze
longitudinal data from the Offspring Cohort of the Framingham
Heart Study (FHS) collected between 1971 and 2008
(www.framinghamheartstudy.org/participants/offspring.php). To evaluate
statistically which environmental or demographic factors
interact with rs9939609 to affect BMI, we estimate augmented
versions of age–period–cohort models. These models partition
the time-related variation in obesity into three distinct sources.
Intuitively, age effects represent the influence of a person’s cur-
rent age on obesity, thereby reflecting biological and social
processes of maturation and aging internal to individuals. Pe-
riod effects represent temporal variations in obesity rates over
time affecting all age groups simultaneously and subsume
a complex set of historical events and environmental factors. In
our case, period is quantified as the subintervals of time captured
by the eight waves of data collection from 1971 through 2008.
Cohort effects represent differences in obesity across groups of
individuals born in different eras, implying that members of
a given group encounter the same historical and social events at
the same ages. Thus, to argue for a g-by-c interaction (the idea
that the genotype–phenotype relationship varies by era of birth),
it becomes necessary to show, through results and reasoned
arguments, that one of the other interactions is not confounding
our results. In this case, we argue that g-by-p (gene-by-period)
effects are minimal using empirical and circumstantial evidence.
Our main analyses (described in Materials and Methods) begin
with a simple descriptive analysis and then postulate a linear model
for associations between BMI of person $i$ in family $f$ at time $t$ with
a particular age, period (i.e., wave), and cohort (YOB). That is,
$$\mathrm{BMI}_{ift} = \beta_0 + \beta_1\,\mathrm{age}_{ift} + \beta_2\,\mathrm{wave}_{it} + \beta_3\,\mathrm{YOB}_{i} + \beta_4\,\mathrm{gene}_{i} + \beta_5 X_{ift} + \mu_{ift}, \tag{1}$$
where age and wave are series of indicators for an individual’s
age in 5-y intervals and for the examination wave in which the
measurement occurred, respectively, and YOB is the year of birth.
We also include a genetic
main effect for each genetic variant being investigated (gene),
controls for relevant covariates including sex ($X$), and $\mu_{ift}$, which
is a random error term with a mean of zero. This model makes
an assumption of stationarity by assuming the parameters $\beta$ are
constant across APC. To address the research question posed
above, we first augment Eq. 1 by interacting each of the key
variables with indicators for genotype ($\mathrm{gene}_{i}$), thus allowing for
differential coefficients by genotype. A nonzero interaction of
age, period, or cohort with the genetic factors would indicate
differential effects for individuals at a given age, in a different
period, or in a different cohort group, identification of which is
described in detail in Supporting Information, though it is impor-
tant to note that our identification is inherently constrained as in
any APC model due to collinearity. We used a previously developed
estimator (19) to identify whether there is a change
point in the parameters that represents a discontinuity in the
genotype–phenotype relationship. Allowing the parameters
for YOB to undergo a structural shift in an unspecified year
lets us test for a structural break of unknown timing. Our
approach assumes that birth cohort effects, as well as any of their
interactions with genetic markers, are homogeneous before and
after the year of the identified structural break, but allows the
effects to vary between the pre- and postbreak periods.
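One way to write the augmented specification described above is sketched below. This is an illustration rather than the authors' exact equation, which is given in Supporting Information. The notation is introduced here only for this sketch: $G^{g}_{i}$ indicates that person $i$ carries genotype $g \in \{\mathrm{AT}, \mathrm{AA}\}$ (TT is the reference), $D_{i}(\tau) = \mathbf{1}\{\mathrm{YOB}_{i} > \tau\}$ marks birth after a candidate break year $\tau$, and, for simplicity, the cohort effect is written as a single shift at $\tau$.

```latex
\begin{aligned}
\mathrm{BMI}_{ift} ={}& \beta_0 + \beta_1\,\mathrm{age}_{ift} + \beta_2\,\mathrm{wave}_{it}
  + \beta_3\,D_{i}(\tau) + \beta_5 X_{ift} \\
 &+ \sum_{g \in \{\mathrm{AT},\,\mathrm{AA}\}} G^{g}_{i}
   \left[ \gamma^{g}_{0} + \gamma^{g}_{1}\,\mathrm{age}_{ift}
        + \gamma^{g}_{2}\,\mathrm{wave}_{it} + \gamma^{g}_{3}\,D_{i}(\tau) \right]
  + \mu_{ift}
\end{aligned}
```

In this notation the genetic main effect $\beta_4\,\mathrm{gene}_{i}$ of Eq. 1 corresponds to the terms $\gamma^{g}_{0}$, and a g-by-c interaction corresponds to $\gamma^{\mathrm{AT}}_{3} \neq 0$ or $\gamma^{\mathrm{AA}}_{3} \neq 0$, so a joint test of the interaction involves two restrictions.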
The main advantage of this approach is that we can conduct
specification tests to determine whether future research should
focus on genetic interactions with specific historical influences
(cohort effects) and/or contemporaneous influences (period
effects), and/or exposure accumulation (age effects). Our ap-
proach requires restrictions to be placed on two parameters of
the model because it is well known that no statistical model can
simultaneously estimate all of the linear APC effect parameters
in Eq. 1, given their collinearity (i.e., cohort = period − age). Thus,
we followed earlier research relating to identification of these
effects (detailed in Supporting Information) and used graphical data
describing the obesity trends by period, age, and cohort to establish
the choice of constraints for this model; and we investigated
whether the results were sensitive to the chosen constraints. Our
preferred estimates are obtained by selecting the first age and
period groups as the reference categories and also restricting any
linear birth cohort effect to be zero, allowing only for a nonlinear
effect of cohort. We argue that it is natural in our setting to set the
linear cohort effect to zero because, in a model with separable age
and time effects and only a linear cohort effect, we would only
observe parallel shifts of the cross-sectional age profiles over time.
This is unlikely to be the case for BMI, and we wish to observe how
these responses varied across genetic markers using the most
common genotype (TT) at rs9939609 as the reference category in
the underlying regression specifications.
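The identification problem can be made concrete with a small numerical check. The sketch below uses synthetic age and period values (not FHS data) and hypothetical coefficients. It shows that moving an arbitrary amount `k` of linear trend among the age, period, and cohort terms leaves every fitted value unchanged, which is why a constraint has to be imposed.

```python
# Illustrative only: synthetic person-wave records, not FHS data.
# Each record is (age, period); birth cohort is defined as period - age.
records = [(age, period)
           for period in range(1971, 2009, 4)
           for age in range(30, 64, 5)]


def linear_predictor(age, period, b_age, b_period, b_cohort):
    """Linear APC predictor with cohort = period - age."""
    cohort = period - age
    return b_age * age + b_period * period + b_cohort * cohort


def shifted(coefs, k):
    """Move an arbitrary amount k of linear trend between the three effects."""
    b_age, b_period, b_cohort = coefs
    return (b_age + k, b_period - k, b_cohort + k)


base = (0.05, 0.02, -0.01)    # hypothetical coefficients
alt = shifted(base, k=0.30)   # a different coefficient set with the same fit

# Largest absolute difference between the two sets of fitted values.
max_gap = max(
    abs(linear_predictor(a, p, *base) - linear_predictor(a, p, *alt))
    for a, p in records
)
print(f"largest difference in fitted values: {max_gap:.2e}")
```

Because the two coefficient sets fit the data equally well up to floating-point rounding, the data alone cannot choose between them. Setting the linear cohort effect to zero, as in the preferred specification, is one way of fixing `k`.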
By restricting the first age and period groups as well as the
most common genotype to be reference categories, we can
identify unique parameter estimates. The choice of which
two specific APC variables are constrained to serve
as reference categories does affect the estimated coefficient
values and SEs. Unfortunately, there is no empirical method of
differentiating between alternative variables whose effects are
constrained because, irrespective of the restrictions, all esti-
mated models yield identical fits of the data. Thus, to investigate
the sensitivity of our estimated g-by-c effects, we conducted
numerous robustness exercises including (i) fixing alternative age
or period effects to be zero allowing for only a nonlinear cohort
effect, (ii) treating birth year as a continuous variable so that the
function of the cohort variable does not have a perfect linear
relationship with the discrete age and period effects we condition
upon, and (iii) constraining a set of parameters (i.e., the effect of
two age effects) to be equal. In general, these alternative models
placed different constraints that were also chosen using external
information on obesity prevalence over time. However, these
alternative models placed restrictions that were more difficult to
justify in our setting based on a graphical examination of our data
that showed rising rates of obesity both across time and age. That
said, our analyses led to identical findings of a significant g-by-c
interaction irrespective of the constraints and restrictions imposed.
Results
We first undertook a primarily descriptive analysis by reviewing
the average BMI in cells of a two-way table presented in Table 1.
Each cell denotes the age–period combination where the rows
represent categories of subject age and the columns define cate-
gories of year when the measurement was taken. The diagonal of
Table 1 (going from upper left to lower right) defines the patterns
of mean BMI for successive cohorts of the FHS Offspring sample
who were born together and hence age together. Looking across
rows, columns, and the diagonal, we generally see increased values
for BMI. For example, moving down each column, we document
the well-established age profile that generally reflects rising BMI
over the lifecycle. The trajectories observed across waves and
lifecycle documented in Table 1 also justify setting the first age
and period categories as reference groups; and, looking across the
diagonal, there does not appear to be a linear relationship be-
tween BMI and cohort. This suggests that restricting the first age
and period groups to be reference categories is acceptable. Cau-
tion should be exercised in reaching any further conclusions from
this table, however, because it simply provides a general qualita-
tive impression about APC rate patterns and does not decompose
their separate effects. To more rigorously assess these effects we
use the methods described below.
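The layout of an age-by-period table such as Table 1 can be built from long-format records. The sketch below is a minimal illustration with hypothetical records. For simplicity it assumes exam waves spaced exactly 5 y apart, so that a birth cohort moves down one age band per wave; the FHS waves are not evenly spaced, so the diagonals of Table 1 track cohorts only approximately.

```python
from collections import defaultdict

# Hypothetical long-format records: (exam_year, age_at_exam, bmi). Not FHS data.
records = [
    (1971, 31, 24.1), (1971, 36, 25.0), (1971, 42, 25.9),
    (1976, 36, 24.9), (1976, 41, 25.6), (1976, 47, 26.4),
    (1981, 41, 25.8), (1981, 46, 26.5), (1981, 52, 27.0),
]


def age_band(age, width=5):
    """Lower bound of the 5-y age band containing `age` (e.g., 36 -> 35)."""
    return (age // width) * width


# Accumulate the sum and count of BMI per (age band, exam year) cell.
cells = defaultdict(lambda: [0.0, 0])
for year, age, bmi in records:
    cell = cells[(age_band(age), year)]
    cell[0] += bmi
    cell[1] += 1

mean_bmi = {key: total / n for key, (total, n) in cells.items()}

# Rows are age bands and columns are exam years; empty cells print as nan.
years = sorted({y for _, y in mean_bmi})
bands = sorted({b for b, _ in mean_bmi})
for band in bands:
    row = [f"{mean_bmi.get((band, y), float('nan')):6.2f}" for y in years]
    print(f"{band}-{band + 4}", *row)

# Follow the cohort observed at ages 30-34 in 1971 along the diagonal.
diagonal = [mean_bmi[(30 + 5 * k, 1971 + 5 * k)] for k in range(3)]
```

Reading down a column gives the cross-sectional age profile, reading across a row gives the period trend at a fixed age, and `diagonal` follows one birth cohort as it ages.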
Modeling birth year as a continuous variable, we find evidence
from estimates of the augmented version of Eq. 1 of a significant
change in the relationship between FTO genetic variants and BMI
in the early 1940s (Table S1). That is, we use an estimation ap-
proach (Supporting Information) that finds the point at which the
genotypes have the greatest overall difference in their effect on
BMI between subgroups of the population born before and after
this threshold. The change points supported by estimating various
models ranged from 1942 to 1945. We chose 1942 as the change
point in further models that treated the YOB as a discrete variable,
but results were insensitive to alternative values from 1942 to 1945.
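The idea of searching for a break year can be illustrated with a simple least-squares grid search. The sketch below is a simplified stand-in and not the estimator of ref. 19: it fits BMI by genotype-by-(born after the candidate year) cell means and keeps the candidate year with the smallest residual sum of squares. The data are synthetic, with a break built in at 1942, and all function names are hypothetical.

```python
from collections import defaultdict

GENOTYPES = ("TT", "AT", "AA")
TRUE_BREAK = 1942  # built into the synthetic data below


def make_synthetic():
    """Synthetic (yob, genotype, bmi) rows: risk-allele carriers born after
    TRUE_BREAK get +1 BMI unit; a small deterministic wiggle stands in for noise."""
    rows = []
    for yob in range(1930, 1956):
        for gi, genotype in enumerate(GENOTYPES):
            shift = 1.0 if (genotype != "TT" and yob > TRUE_BREAK) else 0.0
            wiggle = 0.025 * (((3 * yob + gi) % 5) - 2)
            rows.append((yob, genotype, 25.0 + shift + wiggle))
    return rows


def ssr_at(rows, tau):
    """Residual sum of squares when BMI is fit by
    genotype-by-(born after tau) cell means."""
    cells = defaultdict(list)
    for yob, genotype, bmi in rows:
        cells[(genotype, yob > tau)].append(bmi)
    total = 0.0
    for values in cells.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


def estimate_break(rows, candidates):
    """Return the candidate break year with the smallest residual sum of squares."""
    return min(candidates, key=lambda tau: ssr_at(rows, tau))


rows = make_synthetic()
best = estimate_break(rows, range(1931, 1955))
print("estimated break year:", best)
```

A full analysis would replace the cell means with the age–period–cohort regression and would need inference that accounts for the break year having been searched over; the sketch only conveys how a break of unknown timing can be located.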
As shown in Fig. 1, mean BMI evolves over the lifecycle for
individuals with the same genotype, comparing the pre- and post-
1942 birth cohorts in the full dataset. However, mean BMI differs
across the three genotypes in the later birth cohort compared with
the pre-1942 cohort. The between-birth-cohort differences in
mean BMI are statistically significant (P < 0.017) for individuals
with one or two copies of the FTO risk (“A”) allele, particularly during
early middle age. This difference (and the lack of difference be-
tween cohorts without the risk allele) suggests that differences
between BMI growth curves from different birth cohorts are
more pronounced among individuals carrying A alleles.
Table 2 presents estimates from our preferred specification of
the age–period–cohort regression models, allowing for differential
relationships between the genetic effects and BMI on the basis of
sex and APC variables (for details, see Materials and Methods).
Tests of the joint significance of regression parameter estimates
indicate a highly significant cohort–gene interaction [F statistic for
joint effects, F(2, 19,617) = 17.51, $P = 2.54 \times 10^{-8}$], controlling for
age–gene and period–gene interactions. This suggests that the
effect of FTO varies across cohorts or eras. More specifically, we
find a highly significant interaction between the post-1942 birth
cohort indicator and genotype, with the more efficient random
effects estimator (Supporting Information) showing interactions
with both AA and AT genotypes compared with the TT genotype.
The results indicate that, among individuals in the cohort born
after 1942, the AA and AT genotypes are associated with an additional
average gain in BMI of 1.04 units [95% confidence interval
(CI) 0.15–2.03, P = 0.023] and 1.14 units (95% CI 0.50–1.77,
P = 0.0005), respectively, relative to individuals with the same
genotype born before 1942 (Table S2). Our results provide evi-
dence that only AA homozygosity is associated with a statistically
significant BMI difference for both cohorts born before and after
1942. Further, our estimates indicate that the AT genotype is
characterized by different rates of increase in BMI between
cohorts; and, for homozygous TT subjects, there was little change
in BMI across cohorts. Several of the period–genetic variant in-
teractions are individually statistically significant at conventional
levels, but they are jointly insignificant (F = 0.59, P = 0.69), suggesting
that these effects are likely to be artifacts of multiple testing.
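Two of the reported quantities can be checked with elementary arithmetic. The sketch below assumes the stated F(2, 19,617) reference distribution and a normal-approximation (Wald) interval; the function names are hypothetical. With exactly two numerator degrees of freedom, the upper tail of the F distribution has a closed form, so no statistics library is needed.

```python
import math


def f_pvalue_two_numerator_df(f_stat, df_denominator):
    """Upper-tail P value of an F(2, df_denominator) statistic.

    With exactly two numerator degrees of freedom the survival function
    has the closed form (1 + 2F/d)^(-d/2).
    """
    d = df_denominator
    return math.exp(-0.5 * d * math.log1p(2.0 * f_stat / d))


def wald_interval(estimate, std_error, z=1.96):
    """Normal-approximation 95% confidence interval for a coefficient."""
    return (estimate - z * std_error, estimate + z * std_error)


# Joint test of the cohort-by-genotype interactions, as reported in the text.
p_joint = f_pvalue_two_numerator_df(17.51, 19617)

# "Born after 1942 by AT genotype" coefficient and SE from Table 2.
low, high = wald_interval(1.135, 0.326)

print(f"P for F(2, 19617) = 17.51: {p_joint:.2e}")
print(f"95% CI for the AT interaction: {low:.2f} to {high:.2f}")
```

The P value comes out on the order of 2.5 × 10^−8, in line with the reported value, and the interval for the AT interaction rounds to 0.50–1.77, matching the text.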
In Figs. 2–4, we demonstrate that the age gradient in BMI
does not significantly differ for individuals with the TT genotype
across birth cohorts (Fig. 4). In contrast, we not only observe
a significantly different FTO–BMI relationship across ages for those
with the AT genotype, but the age gradient documented in Fig. 3
becomes steeper in the post-1942 cohort. Last, whereas the estimates
in Table 2 showed that individuals with the AA polymorphism had
significantly higher BMI in both the pre- and post-1942 cohorts, we
did not find a significant difference in the BMI age gradient between
cohorts (Fig. 2), although this may be due to low power resulting
from the smaller sample size. Taken together, the set of Figs. 2–4
illustrate that there is an age gradient across all genotypes, but it does
not point to an overall steepening of the age gradient. The results
continue to point out differences in the estimated relationships be-
tween those born before and after 1942, and, given our sample size, it
would not be surprising if, with additional data, we would see the
observed difference in the BMI age gradient for the AA genotype
become statistically significant. Last, we note that the statistically
significant differences in BMI between and within birth cohorts on
the basis of genotype do not arise due to the specification of our
linear model and are also observed when simply comparing the
unconditional sample means of BMI across genetic variant, birth
cohorts, and 5-y age intervals (as reported in Table S3).
We conducted several robustness exercises that exploit the fa-
milial structure of the FHS data by estimating a further augmented
age–period–cohort model that incorporates family-specific un-
observed heterogeneity through random effects, as suggested in ref.
20 (Tables S1, S2, and S4). This allows us to control for family
effects shared by siblings, including childhood diet and other
aspects of physical and social environment as well as similarities
of genetic endowment other than the target gene.

Table 1. Average BMI by subject age and period measured in the full sample used in our estimation
Age, years   Wave 1        Wave 2        Wave 3        Wave 4        Wave 5        Wave 6        Wave 7        Wave 8
             30 Aug 1971   26 Jan 1975   20 Dec 1983   22 Apr 1987   23 Jan 1991   26 Jan 1995   11 Sep 1998   10 Mar 2005
27–29.99     24.36         24.37         24.57         25.12         30.38
30–34.99     24.74         24.26         25.08         26.05         26.53         26.80         22.41
35–39.99     25.44         25.07         25.19         25.41         26.64         28.13         28.99
40–44.99     25.83         25.68         25.86         26.31         26.39         27.80         28.79         29.97
45–49.99     26.09         26.05         26.55         26.90         27.13         27.40         27.81         29.19
50–54.99     26.27         26.50         26.52         27.48         27.71         27.98         27.72         28.65
55–59.99     26.38         26.28         26.82         27.16         27.79         28.55         28.59         28.41
60–63        28.13         26.45         26.67         27.15         27.77         28.00         28.73         28.60
Each cell contains the average BMI of individuals measured in the age and period denoted by the row and column and for the sample denoted by the panel.

Fig. 1. BMI over the ages of 35–60 by birth cohort for AA, TT, and AT/TA
genotypes by general birth cohort (born before or during/after 1942).
[Plot: BMI (y axis, 24–29) against age group (x axis, 35–40 through 55–60).]

In addition, in
Table S2, we consider alternative estimators for our preferred
model, and, in Table S4, we explore sex differences in the magni-
tude and statistical significance of the interaction of birth cohort
and genotype with BMI by testing sex differences in sample means
and in coefficients of sex-stratified regression models. Consistent
with previous studies, our longitudinal family fixed-effect model
(Table S1) finds a significant main effect for rs9939609 both for
AA and AT genotypes indicating an average increase of 0.88 (95%
CI 0.26–1.50, P = 0.006) and 0.49 (95% CI 0.075–0.93, P = 0.017)
units of BMI, respectively, relative to those with the TT genotype.
Discussion
Our results suggest that the well-documented rise in BMI in the
United States over the past 40 y may have been disproportion-
ately driven by individuals for whom genetic factors interacted
with environmental changes encountered in their development
due to their era of birth—in this case, being born later. Although
our approach, by its nature, cannot ever rule out a g-by-p interaction,
tests of joint significance of these interactions (F = 0.59,
P value = 0.69) are fairly suggestive of a minimal g-by-p
contribution, holding all else constant. Furthermore, the lack of
any g-by-p findings over the time period studied, and the fact that
our study focused on adults (who, according to previous research,
have already incorporated differential genetic contributions
to BMI) (1, 21–25), all provide strong suggestive evidence of
limited g-by-p influence on our results.
Our results also help to disentangle the impact(s) of FTO ge-
notype, age, and generational environment on BMI. As discussed
above, previous GWAS (genome-wide association studies) and
g-by-e work has generally examined interactions of genotype with
a specific environmental change or attributed all changes in phe-
notype to changes in environment, assuming that genotype effects
did not change in the period studied. However, such analyses do
not make it possible to distinguish effects of contemporaneous and
lifetime environmental shocks as well as maturation effects, a
limitation of single-birth-cohort and cross-sectional studies.
More generally, these findings raise the possibility that genetic
associations may differ across birth cohorts due to variation in
prevailing environmental contexts. If so, a genetic association
detected by a gene-by-environment (g-by-e) study performed to-
day might not be detectable in future generations. Conversely,
effects not seen at this time may appear as environmental changes
occur that affect entire populations. This general point could
certainly extend beyond the particular case of FTO and obe-
sity; and although the odds that a gene discovery effort would be
successful increase with larger sample sizes, the results of such
studies (and even their ability to detect a genotype–phenotype
relationship) may be influenced by the within-sample birth co-
hort distribution or the time when such research was undertaken
(26). The fact that allelic penetrance could vary over time
(e.g., across birth cohorts) may have implications for the interpretation
of genetic risk data. This idea, that genetic effects
could vary by geographic or temporal context, is somewhat self-evident,
yet has been relatively unexplored and raises the question
of whether some association results and genetic risk estimates may
be less stable than we might hope.
The concept of time-dependent genetic penetrance has been
raised in the past. The so-called thrifty-gene hypothesis suggested
that genetic variants selected for energy conservation have con-
tributed to increased obesity prevalence in modern environments
where food has become more plentiful, although recent empirical
tests of the hypothesis have not supported it (27, 28). This work
raises the question of whether broad environmental changes
might have differential impacts on the BMI of individuals based
on genotype. Many hypothesized environmental influences on the
rise in obesity did indeed occur after the early 1940s, including
technological advances reducing energy expenditure at work as
well as increases in the caloric content of processed foods (4),
whose effect may be experienced most strongly by individuals
whose tastes and habits would be influenced at a young age (1).
Although our work shows a general g-by-c effect, we do not
attempt to identify the particular environmental factor(s) whose
change(s) might be driving these results. Understanding which
specific historical influences alter the penetrance of genetic
variants across cohorts is beyond the scope here, but is an im-
portant avenue of research that is worth additional comment.
Because many of the environmental changes between birth
cohorts hypothesized to be responsible for the rise in obesity are
correlated over both time and geographic space, well-powered
studies will be required. Although other research designs, such as
natural experiments, can in principle help identify the particular
environmental factors that might interact with specific genotypes,
they require that the specific gene–environment interaction being
investigated not be confounded with other potential gene and
environment interactions (29–31).

Table 2. Random effect estimates of factors influencing BMI
from a specification using discrete variables to indicate birth
cohort differences and their interactions with genetic factors
Explanatory variables               Random effects estimates
Subject is male                     1.641*** (0.146)
Age 30–34.99                        0.477*** (0.174)
Age 35–39.99                        0.608*** (0.174)
Age 40–44.99                        1.011*** (0.188)
Age 45–49.99                        1.199*** (0.212)
Age 50–54.99                        1.231*** (0.238)
Age 55–59.99                        1.272*** (0.269)
Age 60–63                           1.229*** (0.300)
Subject was born after 1942         −1.360*** (0.280)
AA genotype                         0.708* (0.398)
AT genotype                         −0.412 (0.282)
Born after 1942 by AA genotype      1.041** (0.459)
Born after 1942 by AT genotype      1.135*** (0.326)
Constant                            24.01*** (0.250)
Observations                        19,617
R^2                                 0.106
No. of individuals                  3,720
Presented are the estimates of the age–period–cohort model where the cohort
variable is treated as discrete. Each entry refers to the effect of the variable
listed in the first column on BMI holding all other factors constant. SEs are
presented in parentheses. Specifications also include gene-by-age (g-by-a)
interactions, and the estimates of all other factors included in this model as well
as other estimators are presented in Table S2. See Table S6 for the calendar
time corresponding to examinations in each wave. Note that our main results
of birth cohort and genotype interactions are not sensitive to the method by
which the model was estimated. The following indicate statistical significance
of each explanatory variable: ***P < 0.01, **P < 0.05, and *P < 0.1.

Fig. 2. BMI over the ages of 35–60 by birth cohort for the AA–FTO genotype.
[Plot: BMI (y axis, 24–29) against age group (x axis, 35–40 through 55–60).]

Implementing such an approach
would be challenging: spatial variation in the price of calories may
be correlated with spatial variation in the rate of change in sedentary
lifestyles or other environmental changes that have been hypothesized
to be linked with obesity. In addition, the large number
of potential g-by-e hypotheses creates a large number of testable
hypotheses, thereby reducing the statistical power of the study and
increasing the multiple-testing burden.
To overcome these challenges, we propose that future research
into these effects could estimate age–period–cohort models with
samples defined on the basis of geographic regions. Regional
environmental changes that track with regional differences in the
timing of breakpoints would be candidate mediators of g-by-c
effects. This approach would be well suited for other large-scale
longitudinal databases that are now beginning to genotype
subjects (32).
There are some notable limitations to our study. First, given
the unique nature of the FHS, it is not yet possible to find an
appropriate replication sample for the time period of birth years
studied and our genetic variant of interest, both of which would
be required to test the specific FTO–variant–birth-cohort interaction
results (33–37). The special circumstances of the FHS,
with localized, longitudinal data over a large birth cohort range,
mean that it would be hard to perform a traditional replication
study (16, 17). However, with the advent of more studies that
include genetic data in longitudinal samples, the conceptual
approach we are proposing, if not this particular finding, will
likely be testable in additional settings soon (32).
A second limitation of our study is that all of the observations
in our analyses were of adults; hence, we cannot examine critical
periods of growth and development where many environmental
factors particular to given birth cohorts may have been influential.
Because most evidence suggests that the genetic influences on
BMI heterogeneity are first seen in childhood and may relate to
food intake levels in that developmental period (1, 38–41), studies
of younger subjects may elucidate which particular environmental
influences might be interacting with genetic factors. Third, our
observation that the 95% confidence bands for those with the AA
genotype overlap between the two cohorts in Fig. 2 may reflect
limited power to detect an effect and/or the stronger relative impact of birth-cohort-associated factors on heterozygotes. However,
in addition to sample size differences, nonlinearity in the effects of
the A allele on BMI is also a possibility (42). Fourth, there re-
mains the possibility of sample selection bias arising from subjects
in the older cohort dying before the time when they would have
been genotyped, particularly if those who died were dispropor-
tionately heavier or of a certain genotype, although we saw no
evidence of this in measured attributes.
In sum, we have outlined what we believe to be a useful ap-
plication of age–period–cohort modeling to improve population
genetic research. Our findings are suggestive of a previously
unidentified factor to consider when assessing time trends in
obesity, as well as the interpretation of genetic association find-
ings more broadly. The phenotypic expression of individual-level
genetic variation and our ability to detect it may depend on
historical contingencies.
Materials and Methods
The FHS was initiated in 1948 when 5,209 people were enrolled in the original
cohort; since then, the study has come to be composed of four separate but
related populations. The Framingham Offspring Study began in 1971, consist-
ing of 5,124 individuals who represented the children of the original cohort
population and their spouses. Participants in the offspring study were given
physical examinations and detailed questionnaires at regular intervals starting
in 1972, with a total of eight waves completed through 2008. BMI was calcu-
lated from measured height and weight. Notably, the offspring cohort was
born over a 40-y period, with participants ranging in age from their teens to
their late 50s at the time of study onset in 1971. In addition to providing survey
and examination data, a large fraction of participants (73.0%, 3,742 individuals) had their DNA genotyped using the 100K Affymetrix array (43). Genotypes at the rs9939609 SNP were extracted using PLINK (44) from data contained in the Framingham SHARe database accessed through the dbGaP system (www.framinghamheartstudy.org/researchers/description-data/genetic-data.php).
For simplicity, we elected to focus attention on the rs9939609 polymorphism, although a large number of variants have been associated with BMI across large-scale genome-wide studies (and/or been in strong linkage disequilibrium with other FTO variants) (6). For example, in the large GIANT (Genetic Investigation of Anthropometric Traits) consortium (n = 249,794), the less common A allele of rs1558902 (in strong linkage disequilibrium with rs9939609, $r^2 = 0.901$) on the FTO gene was strongly associated with BMI ($P = 4.8 \times 10^{-120}$), with a per-allele change associated with an increase in BMI of 0.39 (7).
To minimize the possibility that the g-by-c effects would be capturing differences in age ranges of the participants across cohorts, we focus our analyses on observations between the ages of 27 and 63. That is, by excluding observations collected during examinations when subjects were at younger and older ages, we ensure that individuals who are unique to the earliest and latest cohorts, respectively (and whom we cannot use as self-controls), are removed from the analyses, thereby mitigating potential bias from model misspecification (26). These restrictions ensured that age is balanced between cohorts and brought the sample size to 19,617 phenotypic observations on 3,720 individuals.
Summary statistics for the variables used in the regression analysis reported in the main text and SI Materials and Methods and Tables S1–S5, S7, and S8 are shown in Table S6. Although only 3,724 of 5,124 individuals in the FHS Offspring sample were genotyped and not every subject attended each medical examination, $\chi^2$ tests of differences in proportions indicate that neither specific genotypes nor birth cohort were associated with missing data from our sample, $\chi^2(2) = 2.91$ and $P[X > \chi^2(2)] = 0.23$, reducing concerns about nonresponse.
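As a quick arithmetic check on the reported test, the upper-tail probability of a chi-square variable with 2 degrees of freedom has the closed form exp(−x/2), so the P value can be recomputed from the statistic alone. The sketch below is illustrative only; the function name `chi2_sf_df2` is ours and is not part of the original analysis.

```python
import math

def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability P[X > x] for a chi-square variable with 2 df.

    With 2 degrees of freedom the chi-square distribution is exponential
    with mean 2, so the survival function is exp(-x / 2).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.exp(-x / 2.0)

# Statistic reported in the text: chi-square(2) = 2.91
p_value = chi2_sf_df2(2.91)
print(f"P[X > 2.91] = {p_value:.2f}")  # prints 0.23, matching the reported value
```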
We also compare the distribution of genetic variants for those born before
and after the identified structural breakpoint (of 1942) in the relationship
with BMI. Specifically, at the base of Table S5, we present evidence that
the differences in genetic variant association with BMI across cohorts
were not due to differences in sample characteristics before and after
1942 (26) (P=0.1550).
In motivating our specification of a modified age–period–cohort model, we initially hypothesized that the association between the FTO genetic variant and BMI may be significantly stronger for individuals born in later years due to environmental changes in the United States following World War II that influenced food availability, the overall levels of physical activity, and other factors that could affect bodily metabolism, all previously noted in a number of studies as potential modifiers of FTO expression (42, 45, 46).
Table S3 presents some descriptive evidence supporting a g-by-c effect. Each
entry corresponds to 5-y age-intervals of a person’s life and presents the
sample means of BMI across genetic variants and birth cohorts. Thus, participants born in 1940 would have belonged to the 30–34.99 age group in 1974 and the 40–44.99 age group in 1984. Within these g-by-a bins, we conducted simple hypothesis tests to assess whether there were differences in BMI between the pre- and post-World War II cohorts. Table S3 presents evidence that, unconditionally, there are statistically significant differences in BMI between and within birth cohorts on the basis of genotype, particularly for those with the risk allele.
Fig. 3. BMI over the ages of 35–60 by birth cohort for AT/TA–FTO genotype. Axes: BMI (24–29) versus age (35-40, 40-45, 45-50, 50-55, 55-60).
Fig. 4. BMI over the ages of 35–60 by birth cohort for the TT–FTO genotype. Axes: BMI (24–28) versus age (35-40, 40-45, 45-50, 50-55, 55-60).
358 | www.pnas.org/cgi/doi/10.1073/pnas.1411893111 Rosenquist et al.
Although tests of differences in means can be used to look at broad trends
over time, the participant’s age or commonly shared environmental changes
(such as the invention of television or a price shock in food) might also
trigger interactions if their impacts are modified by specific genetic variants.
The full specification of our modified age–period–cohort models, and the methods used to identify the separate effects where the cohort variable is treated as either continuous or discrete, are detailed in Supporting Information.
Our modified version of Eq. 1 includes a full set of interactions with genetic
variants where the TT genotype is the reference category; this full set
of interactions is not considered in earlier, distinct age, period, or cohort
analyses of the evolution of obesity prevalence, although we have made
similar assumptions as those in prior studies (47). To reduce additional
concerns that we were restricting the relationship between the explanatory
variables (including age and period) and BMI to be linear, we converted all
of our data, including age, period of examination, era of birth, and genetic
variants, to indicator variables, coding responses as “1” if the characteristic
of the individual observation fell in that category, and “0” otherwise. By
generating the indicator variables in this way, we are reducing functional
form assumptions. We also used YOB as a continuous cohort variable with
a single linear term in some specifications.
Finally, all CIs and significance tests reported here accounted for correlations over time due to repeated observations of the same individual or family group, using a standard clustered robust variance estimator (48); the errors are assumed to be independently distributed across clusters and correlated within clusters. Throughout, we did not impose any distributional assumptions on $\mu_{ift}$, and we note that whereas the weighted least-squares estimates of the random effects estimator were virtually identical to a maximum likelihood estimator that imposes more structure on the data, both the ordinary least squares and family fixed-effect estimates are identical to maximum likelihood estimates where $\mu_{ift}$ is assumed to be normally distributed.
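To make the clustered robust variance estimator concrete, the following is a minimal sketch of the basic sandwich form, in which residuals may be correlated within a cluster (here a family) but are treated as independent across clusters. It is not the code used for the reported estimates: the function name, the simulated data, and the absence of any finite-sample correction are all assumptions made for illustration.

```python
import numpy as np

def ols_cluster_robust(X, y, clusters):
    """OLS coefficients with a cluster-robust (sandwich) covariance matrix.

    Basic form only: no finite-sample correction is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        idx = clusters == g
        score = X[idx].T @ resid[idx]   # sum of x_i * u_i within cluster g
        meat += np.outer(score, score)
    return beta, bread @ meat @ bread

# Hypothetical toy data: 50 families, 4 repeated observations each.
rng = np.random.default_rng(0)
n_fam, n_obs = 50, 4
family = np.repeat(np.arange(n_fam), n_obs)
x = rng.normal(size=n_fam * n_obs)
family_shock = np.repeat(rng.normal(size=n_fam), n_obs)  # shared within family
y = 25.0 + 0.5 * x + family_shock + rng.normal(size=n_fam * n_obs)
X = np.column_stack([np.ones_like(x), x])

beta, cov = ols_cluster_robust(X, y, family)
se = np.sqrt(np.diag(cov))
print("coefficients:", beta, "clustered SEs:", se)
```

When every observation is placed in its own cluster, the same function reduces to the usual heteroskedasticity-robust estimator, which is a convenient sanity check.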
ACKNOWLEDGMENTS. We thank David Cutler, Eliana Hechter, Heidi Williams,
and two anonymous reviewers for helpful comments. We also thank Peter
Treut and Emily Hau for assistance with data visualizations. This work was
supported by Grant P01-AG031093 from the National Institute on Aging and
the Social Sciences and Humanities Research Council (to S.F.L.). Funding for
SHARe Affymetrix genotyping was provided by National Heart, Lung, and
Blood Institute (NHLBI) Contract N02-HL-64278. The Framingham Heart Study
is conducted and supported by the NHLBI in collaboration with Boston Uni-
versity (Contract N01-HC-25195). Data were downloaded from NIH dbGap,
Project 780, with accession phs000153.SocialNetwork.v6.p5.c1.GRU and gen-
eral research use phs000153.SocialNetwork.v6.p5.c2.NPU.
1. Haberstick BC, et al. (2010) Stable genes and changing environments: Body mass index
across adolescence and young adulthood. Behav Genet 40(4):495–504.
2. Walley AJ, Asher JE, Froguel P (2009) The genetic contribution to non-syndromic
human obesity. Nat Rev Genet 10(7):431–442.
3. Qi L, Cho YA (2008) Gene-environment interaction and obesity. Nutr Rev 66(12):684–694.
4. Ogden CL, Flegal KM, Carroll MD, Johnson CL (2002) Prevalence and trends in over-
weight among US children and adolescents, 1999-2000. JAMA 288(14):1728–1732.
5. Currie J, Della Vigna S, Moretti E, Pathania V (2010) The effect of fast food restaurants
on obesity and weight gain. Am Econ J-Econ Polic 2(3):32–63.
6. Christakis NA, Fowler JH (2007) The spread of obesity in a large social network over 32
years. N Engl J Med 357(4):370–379.
7. Chang VW, Christakis NA (2005) Income inequality and weight status in US metro-
politan areas. Soc Sci Med 61(1):83–96.
8. Block JP, Christakis NA, O’Malley AJ, Subramanian SV (2011) Proximity to food es-
tablishments and body mass index in the Framingham Heart Study offspring cohort
over 30 years. Am J Epidemiol 174(10):1108–1114.
9. Olsen LW, Baker JL, Holst C, Sørensen TIA (2006) Birth cohort effect on the obesity
epidemic in Denmark. Epidemiology 17(3):292–295.
10. Finkelstein EA, Ruhm CJ, Kosa KM (2005) Economic causes and consequences of
obesity. Annu Rev Public Health 26(1):239–257.
11. Frayling TM, et al. (2007) A common variant in the FTO gene is associated with body
mass index and predisposes to childhood and adult obesity. Science 316(5826):889–894.
12. Dina C, et al. (2007) Variation in FTO contributes to childhood obesity and severe
adult obesity. Nat Genet 39(6):724–726.
13. Fawcett KA, Barroso I (2010) The genetics of obesity: FTO leads the way. Trends Genet
26(6):266–274.
14. Speliotes EK, et al.; MAGIC; Procardis Consortium (2010) Association analyses of
249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet
42(11):937–948.
15. Caspi A, et al. (2005) Moderation of the effect of adolescent-onset cannabis use on adult
psychosis by a functional polymorphism in the catechol-O-methyltransferase gene: Longi-
tudinal evidence of a gene X environment interaction. Biol Psychiatry 57(10):1117–1127.
16. Duncan LE, Keller MC (2011) A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry. Am J Psychiatry 168(10):1041–1049.
17. Hewitt JK (2012) Editorial policy on candidate gene association and candidate gene-
by-environment interaction studies of complex traits. Behav Genet 42(1):1–2.
18. Yang Y, Land KC (2013) Age-Period-Cohort Analysis: New Models, Methods, and
Empirical Applications (CRC Press, Boca Raton, FL).
19. Hansen BE (1999) Threshold effects in non-dynamic panels: Estimation, testing, and
inference. J Econom 93(2):345–368.
20. Fletcher JM, Lehrer SF (2011) Genetic lotteries within families. J Health Econ 30(4):647–659.
21. Karra E, et al. (2013) A link between FTO, ghrelin, and impaired brain food-cue re-
sponsivity. J Clin Invest 123(8):3539–3551.
22. Speakman JR, Rance KA, Johnstone AM (2008) Polymorphisms of the FTO gene are
associated with variation in energy intake, but not energy expenditure. Obesity
(Silver Spring) 16(8):1961–1965.
23. Cecil JE, Tavendale R, Watt P, Hetherington MM, Palmer CNA (2008) An obesity-associated FTO gene variant and increased energy intake in children. N Engl J Med
359(24):2558–2566.
24. Wardle J, et al. (2008) Obesity associated genetic variation in FTO is associated with
diminished satiety. J Clin Endocrinol Metab 93(9):3640–3643.
25. Yang W, Kelly T, He J (2007) Genetic epidemiology of obesity. Epidemiol Rev 29(1):
49–61.
26. Lasky-Su J, et al. (2008) On the replication of genetic associations: Timing can be
everything! Am J Hum Genet 82(4):849–858.
27. Neel JV (1962) Diabetes mellitus: A “thrifty” genotype rendered detrimental by
“progress”? Am J Hum Genet 14:353–362.
28. Ayub Q, et al. (2014) Revisiting the thrifty gene hypothesis via 65 loci associated with
susceptibility to type 2 diabetes. Am J Hum Genet 94(2):176–185.
29. Ding W, Lehrer SF, Rosenquist JN, Audrain-McGovern J (2009) The impact of poor
health on academic performance: New evidence using genetic markers. J Health Econ
28(3):578–597.
30. Keller MC (2014) Gene × environment interaction studies have not properly controlled for potential confounders: The problem and the (simple) solution. Biol Psychiatry 75(1):18–24.
31. Meaney MJ (2010) Epigenetics and the biological definition of gene x environment
interactions. Child Dev 81(1):41–79.
32. Juster FT, Suzman R (1995) An overview of the health and retirement study. J Hum
Resour 30:S7–S56.
33. Benjamin DJ, et al. (2012) The genetic architecture of economic and political pref-
erences. Proc Natl Acad Sci USA 109(21):8026–8031.
34. Benjamin DJ, et al. (2012) The promises and pitfalls of genoeconomics. Annu Rev Econ
4(1):627–662.
35. Beauchamp JP, Cesarini D, Johannesson M, et al. (2011) Molecular genetics and economics.
J Econ Perspect 25(4):57–82.
36. Chabris CF, et al. (2012) Most reported genetic associations with general intelligence
are probably false positives. Psychol Sci 23(11):1314–1323.
37. Davies G, et al. (2011) Genome-wide association studies establish that human in-
telligence is highly heritable and polygenic. Mol Psychiatry 16(10):996–1005.
38. Sovio U, et al.; Early Growth Genetics Consortium (2011) Association between com-
mon variation at the FTO locus and changes in body mass index from infancy to late
childhood: The complex nature of genetic association through growth and de-
velopment. PLoS Genet 7(2):e1001307.
39. Segal NL, Feng R, McGuire SA, Allison DB, Miller S (2009) Genetic and environmental
contributions to body mass index: Comparative analysis of monozygotic twins, di-
zygotic twins and same-age unrelated siblings. Int J Obes (Lond) 33(1):37–41.
40. Golding J, Pembrey M, Jones R; ALSPAC Study Team (2001) ALSPAC—the Avon
Longitudinal Study of Parents and Children. I. Study methodology. Paediatr Perinat
Epidemiol 15(1):74–87.
41. Yeo GSH, O’Rahilly S (2012) Uncovering the biology of FTO. Mol Metab 1(1-2):32–36.
42. Moffitt TE, Caspi A, Rutter M (2005) Strategy for investigating interactions between
measured genes and measured environments. Arch Gen Psychiatry 62(5):473–481.
43. Cupples LA, et al. (2007) The Framingham Heart Study 100K SNP genome-wide as-
sociation study resource: Overview of 17 phenotype working group reports. BMC
Med Genet 8(Suppl 1):S1.
44. Purcell S, et al. (2007) PLINK: A tool set for whole-genome association and pop-
ulation-based linkage analyses. Am J Hum Genet 81(3):559–575.
45. Blanchflower DG, Oswald AJ, Van Landeghem B (2010) Imitative Obesity and Relative
Utility. J Eur Econ Assoc 7(2-3):528–538.
46. Boardman JD, Saint Onge JM, Haberstick BC, Timberlake DS, Hewitt JK (2008) Do
schools moderate the genetic determinants of smoking? Behav Genet 38(3):234–246.
47. Reither EN, Hauser RM, Yang Y (2009) Do birth cohorts matter? Age-period-cohort
analyses of the obesity epidemic in the United States. Soc Sci Med 69(10):1439–1448.
48. Kloek T (1981) OLS estimation in a model where a microvariable is explained by ag-
gregates and contemporaneous disturbances are equicorrelated. Econometrica 49(1):
205–207.
Supporting Information
Rosenquist et al. 10.1073/pnas.1411893111
SI Materials and Methods
Our main results are obtained by estimating age–period–cohort
models, one of the key models used by epidemiologists and social
scientists in the quantitative analysis of social change. A large
literature going back to the 1970s has examined the problem of
identification in these models (1–3) because it is well known that
age (years since birth), period (current year), and cohort (YOB)
are collinear with each other because age = period − cohort.
Intuitively, it would be impossible to observe two individuals at
the same point in time that have the same age but were born at
different dates. In our analysis, we treated the cohort variable as
both continuous and discrete, and we discuss how we achieve
identification in both of these specifications of the model below.
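The identification problem can be seen numerically. The sketch below uses hypothetical examination and birth years (not FHS data) to show that a design matrix containing linear terms for age, period, and cohort loses one rank because age = period − cohort, and that dropping any one of the three terms restores full column rank.

```python
import numpy as np

# Hypothetical exam years and birth years (illustrative values only).
period = np.array([1972, 1972, 1980, 1980, 1988, 1988, 1996, 1996], dtype=float)
cohort = np.array([1930, 1945, 1930, 1945, 1935, 1950, 1940, 1950], dtype=float)
age = period - cohort  # the identity age = period - cohort

# Intercept plus all three linear terms: 4 columns, but one is redundant.
X_all = np.column_stack([np.ones_like(age), age, period, cohort])
# An explicit tolerance is passed so that rounding in the SVD cannot
# disguise the exact linear dependence.
rank_all = np.linalg.matrix_rank(X_all, tol=1e-8)

# Dropping one of the three terms removes the dependence.
X_drop = np.column_stack([np.ones_like(age), age, period])
rank_drop = np.linalg.matrix_rank(X_drop, tol=1e-8)

print(f"columns: {X_all.shape[1]}, rank: {rank_all}")
print(f"columns: {X_drop.shape[1]}, rank: {rank_drop}")
```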
Models Treating Birth Year as Continuous. We begin by estimating
multivariate regression models using the estimator proposed in
ref. 4 that extends the threshold regression to a static panel data
structure. The threshold regression determines if there is a
unique breakpoint at which there is a permanent structural
change in the relationship between the specific genotypes of the
FTO gene (rs9939609) and BMI. That is, these models can be
used to determine the set of threshold YOBs at which there are
important changes in the relationships between BMI and FTO
genotypes. This threshold is chosen based on the minimization of
the concentrated sum of squared errors, and we impose the
constraint that there must be at least 5% of observations lying on
both sides of the breakpoint. Ignoring this constraint did not change our main results identifying the main breakpoint at 1942, but imposing it offers substantial computational advantages by reducing the search over all possible breakpoints. Intuitively, the threshold
regression model with a single breakpoint can be viewed as se-
lecting the regression that provided the best fit to the data from
the set of all regressions which only differ by the selection of
birth year as the breakpoint. That is, we define YOB_Threshold
as the birth year that is selected as the breakpoint and estimate
the following equation:
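\[
\begin{aligned}
\mathrm{BMI}_{ift} ={}& \beta_0 + \beta_1\,\mathrm{age}_{ift} + \beta_2\,\mathrm{wave}_t + \beta_3\,\mathrm{YOB}_i + \beta_4\,\mathrm{gene}_i \\
&+ \beta_5 X_{ift} + \beta_6(\mathrm{gene}_i \times \mathrm{sex}_i) + \beta_7(\mathrm{gene}_i \times \mathrm{age}_{ift}) \\
&+ \beta_8(\mathrm{gene}_i \times 1\{\mathrm{YOB}_i \geq \mathrm{YOB\_Threshold}\}) \\
&+ \beta_9(\mathrm{wave}_t \times \mathrm{gene}_i) + \mu_{ift},
\end{aligned}
\tag{S1}
\]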
where
• $\mathrm{BMI}_{ift}$ is the BMI of person i in family f at time t;
• YOB is the individual's year of birth, entered as a continuous variable; the year at which a structural break is determined is henceforth referred to as the threshold year;
• 1{YOB ≥ YOB_Threshold} is an indicator for whether the individual was born during or after the threshold year;
• wave is a series of indicators for when the measurement occurred (eight waves);
• age is a series of indicators for an individual’s age in 5-y intervals;
• gene can represent a vector of discrete indicators for polymorphisms of the gene being investigated (although in this case we are looking at only the FTO rs9939609 SNP);
• X is a vector of exogenous attributes including sex; and
• $\mu_{ift}$ is a random error term with a mean of zero.
This model is run repeatedly because each time the threshold
YOB changes, so does 1{YOB ≥YOB_Threshold}. Because the
birth year in the FHS data contains day and month, we use this
information for a subset of observations and do not treat birth
year as integer valued for all observations in the FHS. This strategy
of running a separate regression for each potential breakpoint would have been computationally challenging. The estimator developed in ref. 4 instead uses grid search techniques to choose the threshold year at which the relationship between the FTO genotype and BMI is significantly modified; in our data, this threshold separates individuals born before and after 1942. The threshold year is chosen as the value
that minimizes the sum of squared errors. Once the threshold
year is identified, OLS is run on Eq. S1 to obtain the estimates of
βs. Note that although conventional SEs on the coefficients in
Eq. S1, which treat YOB_Threshold as the true value of the
threshold, are asymptotically valid, one needs to be careful in
testing the statistical significance of whether there is a non-
linearity in the estimated relationship between cohorts. Standard
tests using the Wald statistic have poor finite sample behavior
since the asymptotic sampling distribution depends on an un-
known parameter (YOB_Threshold) that is not identified under
the null hypothesis. We thus adopt the bootstrap F test proposed
in ref. 4 when testing if there is a significant threshold effect.
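The grid search described above can be sketched as follows. This is a deliberately simplified illustration, not the estimator of ref. 4: it keeps only an intercept, a linear birth-year term, a single genotype indicator, and the genotype-by-threshold interaction from Eq. S1, omits the bootstrap F test, and runs on simulated data with a break built in at 1942. The function names, the simulated coefficients, and the 0/1 risk-allele coding are all assumptions.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals for design X and outcome y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def threshold_grid_search(yob, gene, y, trim=0.05):
    """Pick the birth-year breakpoint that minimizes the sum of squared errors.

    Candidates are observed birth years that leave at least `trim` of the
    observations on each side of the breakpoint.
    """
    yob = np.asarray(yob, dtype=float)
    gene = np.asarray(gene, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    yob_c = yob - yob.mean()  # centering only improves numerical conditioning
    best_year, best_ssr = None, np.inf
    for c in np.unique(yob):
        post = (yob >= c).astype(float)
        share = post.mean()
        if share < trim or share > 1.0 - trim:
            continue  # trimming constraint
        X = np.column_stack([np.ones(n), yob_c, gene, gene * post])
        s = ssr(X, y)
        if s < best_ssr:
            best_year, best_ssr = float(c), s
    return best_year, best_ssr

# Simulated data with a genotype effect that shifts for births in or after 1942.
rng = np.random.default_rng(42)
n = 2000
yob = rng.integers(1920, 1961, size=n)
gene = rng.integers(0, 2, size=n)  # hypothetical 0/1 risk-allele indicator
y = (25.0 + 0.02 * (yob - 1940) + 0.2 * gene
     + 1.0 * gene * (yob >= 1942) + rng.normal(scale=0.5, size=n))

year_hat, ssr_hat = threshold_grid_search(yob, gene, y)
print("estimated threshold year:", year_hat)
```

Restricting candidates to observed birth years keeps the loop short; with noninteger birth dates, the candidate set would be larger and the trimming rule matters more for run time.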
Although this estimator has the advantage of accurately
identifying the point at which there are significant changes in the
impact of the genotypes based on YOB, it imposes restrictions on
how the YOB affects BMI. Although we could add higher-order
terms to increase the flexibility, these terms make it more difficult
for the test statistics to exhibit dramatic changes as such tests will
have no power in many settings. Using different sets of control
variables in these models, we consistently identified breakpoints
between the years of 1942 and 1945 with decidedly nonlinear
changes in the magnitude of the parameter estimates after that
time. Estimates of the preferred specification from the breakpoint
model are depicted in Fig. 1, where we consider only a single
break at 1942, although various models after that time period
yield consistent results.
To identify age, period, and cohort (APC) effects in Eq. S1, we exploit the fact that we used a categorical age variable, irregular period (year of observation) dummies, and a mixed continuous–categorical cohort variable (year born + birth era) in these linear and additive APC models. This empirical strategy has been used widely in
the social sciences (5). An alternative approach to identifying the
separate effects of APC variables would be to consider nonlinear
relationships of a subset of these effects in the specification of the
model. To examine the robustness of our results, we followed this
strategy and first used small-order polynomials in the YOB to
identify and estimate cohort effects. Second, we conducted ro-
bustness exercises that estimated specifications allowing for poly-
nomials in period effects. Our main results were robust to these
alternative nonlinear treatments of cohort and period effects.
Models Treating Birth Cohort as Discrete. Our preferred method of
analysis does not include a continuous birth-year variable for the
reasons described above. Instead, we use the 1942 cutoff identified
as a breakpoint in our continuous model as a way to compare the pre- and post-1942 birth cohorts. By treating the APC variables as dummy
variables, identification can be easily achieved by dropping a small
number of these variables. Our preferred strategy was to restrict
the indicator for individuals under the age of 30 and the indicator
for the first medical visit to be equal to zero. Intuitively, we hy-
pothesized that BMI was increasing both over time and as indi-
viduals age. Thus, we anticipate that these restrictions would impose
the weakest assumption on the model because the reference groups
include the youngest individuals and the earliest time period. Be-
cause the selection of which age and period indicators to drop is ad
hoc and because prior research (6) demonstrated that the results
obtained from APC models can be quite sensitive to which pa-
rameter restrictions are made, we investigated the sensitivity of our
results to dropping nine different age or period indicators. In each
of these nine cases, our main results showed a significant inter-
action between FTO genotype and cohort.
By using indicator variables, we are relaxing the assumptions
made on the form and pattern of the relationship between BMI
and the explanatory variables, relative to the analysis where birth
cohort was modeled as a continuous variable. Estimates of the
preferred specification of this model, using discrete birth cohort
variables with the earliest age and time period effect restricted to
be zero, are presented in Table 1.
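The role of the zero restrictions can be illustrated with a small indicator-coding sketch. The age groups, wave numbers, and helper name `indicator_columns` below are hypothetical; the point is only that a full set of age and wave indicators alongside an intercept is rank deficient, whereas omitting one reference category from each set (here the youngest age group and the first wave) yields a design with full column rank.

```python
import numpy as np

def indicator_columns(values, categories, reference=None):
    """One 0/1 column per category, omitting the reference category if given."""
    kept = [c for c in categories if c != reference]
    cols = np.column_stack([(values == c).astype(float) for c in kept])
    return cols, kept

# Hypothetical observations: age group and examination wave.
age_group = np.array(["27-30", "30-35", "35-40", "30-35", "27-30", "35-40"])
wave = np.array([1, 1, 2, 2, 3, 3])
ages = ["27-30", "30-35", "35-40"]
waves = [1, 2, 3]
ones = np.ones(len(wave))

# Full sets of indicators: columns within each set sum to the intercept.
age_full, _ = indicator_columns(age_group, ages)
wave_full, _ = indicator_columns(wave, waves)
X_full = np.column_stack([ones, age_full, wave_full])
rank_full = np.linalg.matrix_rank(X_full)

# Restricted sets: youngest age group and first wave are the reference.
age_cols, age_names = indicator_columns(age_group, ages, reference="27-30")
wave_cols, wave_names = indicator_columns(wave, waves, reference=1)
X = np.column_stack([ones, age_cols, wave_cols])
rank_restricted = np.linalg.matrix_rank(X)

print(f"full: {X_full.shape[1]} columns, rank {rank_full}")
print(f"restricted: {X.shape[1]} columns, rank {rank_restricted}")
```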
In Table S3, we list sample means for BMI within subsamples
defined by their rs9939609 genotype and age at examination, with
age measured in 5-y intervals. In the bottom two rows of the table
for each genotype, we present results from t tests of differences in
means across cohorts. These results show that, without control-
ling for other factors, there are numerous significant differences
in BMI between those born before and after 1942. Although there
is no significant difference in BMI between those born pre-/post-
1942 for any age cell for the rs9939609 TT polymorphism, nearly
every age cell for the AT polymorphism indicates that BMI is
significantly greater for those born after 1942. Similarly, among
the sample for those born post-1942 and either aged 35–40 or 45–
50, we observe significantly higher BMI among the later cohort.
To more formally examine the importance of birth cohort
interactions with genotype, we initially estimated models that
allowed for other sources of heterogeneity, shown in Table S3.
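Specifically we decomposed the error term ($\mu_{ift}$) from Eq. S1 into two components and estimate
\[
\begin{aligned}
\mathrm{BMI}_{ift} ={}& \beta_0 + \beta_1\,\mathrm{age}_{ift} + \beta_2\,\mathrm{wave}_t + \beta_3\,\mathrm{post42}_i + \beta_4\,\mathrm{gene}_i \\
&+ \beta_5 X_{ift} + \beta_6(\mathrm{gene}_i \times \mathrm{sex}_i) + \beta_7(\mathrm{gene}_i \times \mathrm{age}_{ift}) \\
&+ \beta_8(\mathrm{post42}_i \times \mathrm{gene}_i) + \beta_9(\mathrm{wave}_t \times \mathrm{gene}_i) + v_f + e_{ift},
\end{aligned}
\tag{S2}
\]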
where
• post42 is an indicator variable equal to 1 if the individual was born during or after 1942;
• $v_f$ is a term that controls for family-specific unobserved heterogeneity; and
• $e_{ift}$ is a random error term with a mean of zero.
This model allows for contemporaneous impacts as measured
by period of interview, cohort effects, and age effects as well as
their interactions with genetic factors. Again, note that family
fixed-effect models implicitly include shared genotype as part of
shared familial environment. To identify all of these factors, in the
main text we imposed restrictions and removed indicators for the
first wave, first age interval (27–30), and the TT polymorphism
(and their interactions) to ensure there was no multicollinearity.
To evaluate the individual importance of including genetic
interactions with sex and APC indicators, we considered specifi-
cation tests that compared estimates of the unrestricted model in
Eq. S2 to a series of nested models in which only one of these sets
of interactions was restricted to be zero. These F tests assess the joint significance of the set of indicators and help us to identify the regression model that best fits the population from which the data were sampled. Tests of joint significance individually fail to reject the null hypothesis of zero coefficients for both the period interactions ($\beta_9 = 0$, F = 0.5891, P > F = 0.6912) and the sex interactions ($\beta_6 = 0$, F = 1.12, P > F = 0.3494), but reject it for the cohort interactions at significance levels below 0.01 ($\beta_8 \neq 0$, F = 17.51, P > F = $2.1 \times 10^{-4}$). Thus, our preferred specification excludes these two sets of interactions and we focus on the following model:
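\[
\begin{aligned}
\mathrm{BMI}_{ift} ={}& \alpha_0 + \alpha_1\,\mathrm{age}_{ift} + \alpha_2\,\mathrm{wave}_t + \alpha_3\,\mathrm{post42}_i + \alpha_4\,\mathrm{gene}_i + \alpha_5 X_{ift} \\
&+ \alpha_6(\mathrm{gene}_i \times \mathrm{sex}_i) + \alpha_7(\mathrm{post42}_i \times \mathrm{gene}_i) + v_f + e^{*}_{ift}.
\end{aligned}
\tag{S3}
\]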
Note we use different notation for both the coefficients and error term in Eqs. S2 and S3 because they may differ due to the omission of the genetic interactions with both sex and wave. We estimate Eq. S3 using three different estimators that each impose a different assumption regarding $v_f$. OLS estimates are obtained by assuming $v_f = 0$. The family fixed-effects estimator assumes that $v_f$ is sibling-invariant family-specific unobserved heterogeneity that may be correlated with the explanatory variables. A random-effects estimator assumes that $v_f$ is sibling-invariant family-specific unobserved heterogeneity that is uncorrelated with the explanatory variables. Because these fixed-effect and random-effect models account for family-specific unobserved heterogeneity, more reliable estimates are likely obtained because they adjust
for the effects of shared unobserved influences on BMI between
biological siblings. The random-effect model yields more precise
estimates when part of the effect of genetic factors operates at the
level of the family (e.g., there is an independent effect of the
extent to which a genotype is present within a family and the mean
BMI in the family). However, the family fixed-effects model
blocks both genetic factors and parental characteristics/behaviors
that are common to family members (e.g., siblings), including un-
measured factors; therefore, from the perspective of confounding,
the fixed-effect specification is preferred.
As first noted in ref. 7, estimates of the impacts of genetic
factors on outcomes that ignore family fixed effects may also
capture dynastic effects because both genetic markers and many
phenotypes are transmitted from one generation to the next.
OLS and random-effect estimates of Eq. S3 may not isolate the
unique contribution of one’s genotype from those arising from
intergenerational transmission of genetic and behavioral char-
acteristics. That is, the random-effects model (as with the tradi-
tional linear regression estimator) assumes that the family-
specific term is uncorrelated with the explanatory variables but
makes use of the structure of the error term ($\mu_{ift}$) to provide more
reliable and precise estimates. On the other hand, using a family
fixed-effects estimator that controls for these unobserved family-
specific effects assuming their effects are constant between sib-
lings, allows for correlations with explanatory variables thereby
removing a potential source of bias in the resulting estimates,
and can (more importantly) isolate the specific contribution of
one’s genotype.
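The contrast between pooled OLS and the family fixed-effects estimator can be illustrated with a small simulation. The sketch below is not the analysis reported here: it uses a single continuous regressor as a hypothetical stand-in for genotype, a family effect that is deliberately correlated with that regressor, and a true coefficient of 1.0. Under those assumptions, pooled OLS absorbs part of the family effect, whereas demeaning within families (the within transformation) removes it.

```python
import numpy as np

def demean_by_group(a, groups):
    """Subtract group means from each entry (1-D) or row (2-D) of a."""
    a = np.asarray(a, dtype=float)
    out = a.copy()
    for g in np.unique(groups):
        idx = groups == g
        out[idx] -= a[idx].mean(axis=0)
    return out

def fixed_effects_ols(X, y, groups):
    """Within (fixed-effects) estimator: OLS on group-demeaned data."""
    Xw = demean_by_group(X, groups)
    yw = demean_by_group(y, groups)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

# Simulated families of three siblings; all values are hypothetical.
rng = np.random.default_rng(1)
n_fam, n_sib = 500, 3
family = np.repeat(np.arange(n_fam), n_sib)
v_f = np.repeat(rng.normal(size=n_fam), n_sib)       # family-specific effect
x = 0.8 * v_f + rng.normal(size=n_fam * n_sib)       # correlated with v_f
y = 1.0 * x + v_f + rng.normal(size=n_fam * n_sib)   # true coefficient is 1.0

# Pooled OLS ignores v_f and picks up part of the family effect.
X_ols = np.column_stack([np.ones_like(x), x])
beta_ols = np.linalg.lstsq(X_ols, y, rcond=None)[0][1]

# The family fixed-effects estimator differences out v_f.
beta_fe = fixed_effects_ols(x[:, None], y, family)[0]

print(f"pooled OLS slope: {beta_ols:.2f}, family fixed-effects slope: {beta_fe:.2f}")
```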
More generally, we suggest that presenting estimation results
that are made with different estimators that each impose different
assumptions on how $v_f$ relates to the discrete cohort variables
serves as an additional robustness check on the main findings.
The results for these three estimators are presented in Table S2.
Notice that, irrespective of the estimation method, the inter-
action term of birth cohort and genotype is significant for AT
and AA in the random-effects specification. Because in many
age groups BMI was higher for those born before 1942 than after
1942 for those with the TT polymorphism, the negative sign on
post42 was expected. Finally, the last two columns of Table S2
indicate the robustness of the main results to different methods
of accounting for family unobserved heterogeneity, increasing
our confidence in the main findings. Repeated models run on
males and females separately further support our findings, as the
interactions between genetic polymorphism and being born after
1942 are positive for both sexes and statistically significant,
particularly in the random-effects specifications for which the
most efficient estimates are obtained.
A final point related to the identification of APC models is that
many of the explanatory variables will be highly correlated. For
example, in later waves, older individuals will be by definition
born in the later cohort. The correlation between the explanatory
variables will not bias our estimates but will lead to larger SEs,
assuming the model is specified correctly. As such, it is not a
surprise that many of the estimated coefficients in our models
have wide CIs. Intuitively, large SEs imply that the effects of
different variables are highly uncertain, and, when independent
variables are highly correlated, high uncertainty is what should be
reported. The only solution to reduce the width of CIs would be to
collect more data to gain more independent variation to identify
the separate effects. Chapter 23 of ref. 8 provides a more detailed
discussion of how highly correlated explanatory variables will
lead to unbiased estimates but may influence the interpretation
of results from linear regression models.
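The claim that correlated regressors widen SEs without biasing the estimates can be checked with a small Monte Carlo exercise. This is a sketch under stated assumptions rather than a reproduction of the APC model: two generic regressors stand in for correlated age, period, and cohort terms, and the function names, sample sizes, and correlation values are hypothetical.

```python
import random
import statistics

def fit_two_regressors(x1, x2, y):
    """OLS slopes (b1, b2) for y = a + b1*x1 + b2*x2 + e via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def sampling_distribution(rho, reps=300, n=200, seed=1):
    """Mean and SD of the estimated b1 across simulated samples (true b1 = 1)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # x2 has correlation rho with x1 by construction
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for a in x1]
        y = [1.0 * a + 1.0 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
        estimates.append(fit_two_regressors(x1, x2, y)[0])
    return statistics.mean(estimates), statistics.stdev(estimates)

mean_uncorrelated, sd_uncorrelated = sampling_distribution(rho=0.0)
mean_correlated, sd_correlated = sampling_distribution(rho=0.95)
print(f"rho=0.00: mean {mean_uncorrelated:.2f}, SD {sd_uncorrelated:.3f}")
print(f"rho=0.95: mean {mean_correlated:.2f}, SD {sd_correlated:.3f}")
```

In both settings the average estimate should stay close to the true value of 1, while the spread of the estimates should be several times larger when the regressors are highly correlated.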
Lastly, in Tables S7 and S8, respectively, we considered estimating
models that ignore both age and cohort effects (as well as their
interactions) and models that ignore only cohort
effects. Table S7 can be viewed as a model that allows for main
genetic effects and contemporaneous g-by-p relationships. Not
surprisingly, we find that interaction effects in later waves are
larger in magnitude. This is in part capturing the effect of having
a larger percentage of older individuals in later time periods and
having more people born in the second cohort being interviewed
in later time periods. In other words, the g-by-p variable is likely
positively correlated with g-by-a and g-by-c variables that cor-
respond to both older individuals and those born in later cohorts.
Thus, by omitting both age and cohort effects when estimating
a variant of Eq. S3, the estimate of the g-by-p effect is biased
upwards because it is also capturing part of the effects of these
omitted variables that, as described, are correlated with the g-by-p
variable. Table S8 shows that many of these biased estimates
become smaller once we also allow for age effects. That is, by
including age indicators, the coefficients on the g-by-p effect on
average become smaller in magnitude, though they continue to
exceed the estimates presented in Table 1. The decline in the
magnitude of many of the g-by-p effects illustrates the bias that
arises from simply omitting relevant information on how genetic
factors influence human development over the life cycle.
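The direction of this bias follows the standard omitted-variable formula. As a hedged sketch in illustrative notation that is not taken from Eq. S3, let x_p denote a g-by-p interaction, x_c an omitted g-by-c (or g-by-a) interaction, and δ the slope from an auxiliary regression of x_c on x_p, with all other included regressors suppressed for simplicity and the error terms assumed uncorrelated with x_p:

```latex
\begin{align}
\text{BMI} &= \beta_p x_p + \beta_c x_c + \varepsilon
  && \text{(model that includes the omitted interaction)} \\
x_c &= \delta x_p + u
  && \text{(auxiliary regression)} \\
\operatorname{plim} \hat{\beta}_p^{\text{short}} &= \beta_p + \beta_c \delta
  && \text{(estimate when } x_c \text{ is left out)}
\end{align}
```

When β_c and δ are both positive, the estimate from the restricted model exceeds β_p, which corresponds to the upward bias described above. Adding the omitted term to the model removes the β_c δ component.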
However, the estimates in Tables S7 and S8 also omit relevant
information on how genetic effects differ across eras in which an
individual grows up and, thus, it is not surprising that they differ
markedly from those presented in both Table 1 and Tables S1, S2,
and S4. In particular, omitting this relevant information allows
one to erroneously conclude that several of the g-by-p and g-by-a
interactions have a statistically significant impact. Many of these
effects become statistically insignificant once we allow for g-by-c
effects. Because the specifications presented in Tables S7 and S8
are restricted versions of our more general APC model presented
in Eq. S2, we conducted a series of model specification tests to
examine the validity of these restrictions. Irrespective of the
estimator used, the test results reject these restrictions,
reinforcing that researchers working with the FHS data should
allow for both main cohort effects and g-by-c interactions. This
finding has implications for the interpretation of estimates from
the many g-by-e studies that use only interactions between genes
and contemporaneous periods and that, primarily due to data
limitations, have collected data on individuals for shorter durations
and fewer cohorts. This also reinforces the utility of genotyping
large-scale longitudinal databases, thereby allowing researchers to
examine whether specific g-by-e effects are sensitive to APC effects.
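A specification test of nested restrictions of this kind is commonly carried out as an F test that compares the residual sum of squares of the restricted and unrestricted models. The source does not state which test was used, so the following is only a sketch of that common approach on simulated data, with hypothetical variable names and coefficients and with a single omitted regressor standing in for the omitted cohort terms.

```python
import random

def rss_one_regressor(x, y):
    """Residual sum of squares from regressing y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (v - my) for a, v in zip(x, y))
    b = sxy / sxx
    return sum(((v - my) - b * (a - mx)) ** 2 for a, v in zip(x, y))

def rss_two_regressors(x1, x2, y):
    """Residual sum of squares from regressing y on a constant, x1, and x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((c - m2) ** 2 for c in x2)
    s12 = sum((a - m1) * (c - m2) for a, c in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((c - m2) * (v - my) for c, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return sum(((v - my) - b1 * (a - m1) - b2 * (c - m2)) ** 2
               for a, c, v in zip(x1, x2, y))

def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q restrictions; k counts unrestricted parameters."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))

rng = random.Random(42)
n = 500
g_by_p = [rng.gauss(0.0, 1.0) for _ in range(n)]
g_by_c = [0.6 * a + 0.8 * rng.gauss(0.0, 1.0) for a in g_by_p]  # correlated terms
bmi = [0.5 * a + 0.5 * c + rng.gauss(0.0, 1.0) for a, c in zip(g_by_p, g_by_c)]

f_stat = f_statistic(
    rss_one_regressor(g_by_p, bmi),           # restricted: g-by-c omitted
    rss_two_regressors(g_by_p, g_by_c, bmi),  # unrestricted
    q=1, n=n, k=3,
)
print(f"F statistic for omitting g_by_c: {f_stat:.1f}")
```

With one restriction and a large sample, the 5% critical value is roughly 3.84, so a statistic far above that level would lead to rejecting the restricted model. Cluster-robust versions of the test, which the family structure of the FHS data would call for, are not shown here.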
1. Fienberg SE, Mason WM (1979) Identification and estimation of age–period–cohort
models in the analysis of discrete archival data. Sociol Methodol 10(1):1–67.
2. Glenn ND (1981) The utility and logic of cohort analysis. J Appl Behav Sci 17(2):
247–257.
3. Mason KO, Winsborough HH, Mason WM, Poole WK (1973) Some methodological is-
sues in cohort analysis of archival data. Am Sociol Rev 38(2):242–258.
4. Hansen BE (1999) Threshold effects in non-dynamic panels: Estimation, testing, and
inference. J Econom 93(2):345–368.
5. Card D, Lemieux T (2001) Can falling supply explain the rising return to college for
younger men? A cohort-based analysis. Q J Econ 116(2):705–746.
6. Glenn ND (1976) Cohort analysts’ futile quest: Statistical attempts to separate age,
period and cohort effects. Am Sociol Rev 41(5):900–904.
7. Ding W, Lehrer SF, Rosenquist JN, Audrain-McGovern J (2009) The impact of poor
health on academic performance: New evidence using genetic markers. J Health Econ
28(3):578–597.
8. Goldberger AS (1991) A Course in Econometrics (Harvard Univ Press, Cambridge, MA).
Table S1. Model estimates of factors influencing BMI, where birth year is treated as a continuous variable
Columns 1–3: models that exclude genetic interactions with both age and birth cohort variables
Columns 4–6: models that include genetic interactions with both age and birth cohort variables
Estimation approach, by column: (1) linear regression; (2) random effects; (3) linear regression with controls for family fixed effects; (4) linear regression; (5) random effects; (6) linear regression with controls for family fixed effects
Subject is male 1.805*** (0.152) 1.635*** (0.146) 1.855*** (0.169) 1.804*** (0.152) 1.633*** (0.146) 1.855*** (0.169)
Age 30–34.99 0.195 (0.204) 0.413*** (0.106) 0.0802 (0.165) 0.525* (0.315) 0.437** (0.175) 0.191 (0.263)
Age 35–39.99 0.504* (0.266) 0.648*** (0.126) 0.329* (0.193) 0.497 (0.310) 0.515*** (0.181) 0.247 (0.257)
Age 40–44.99 0.838** (0.375) 1.014*** (0.160) 0.568** (0.266) 0.968** (0.407) 0.860*** (0.204) 0.445 (0.316)
Age 45–49.99 1.023** (0.500) 1.175*** (0.201) 0.564 (0.345) 1.091** (0.525) 0.992*** (0.237) 0.433 (0.395)
Age 50–54.99 1.113* (0.620) 1.193*** (0.243) 0.522 (0.425) 1.105* (0.640) 0.968*** (0.274) 0.26 (0.468)
Age 55–59.99 1.122 (0.743) 1.138*** (0.289) 0.349 (0.505) 1.104 (0.756) 0.954*** (0.316) 0.162 (0.541)
Age 60–63 0.902 (0.858) 0.998*** (0.330) 0.142 (0.589) 0.876 (0.871) 0.859** (0.355) 0.00354 (0.623)
Birth year −0.039 (0.0265) −0.0439*** (0.0121) −0.0733*** (0.0219) −0.0835*** (0.0288) −0.0793*** (0.0161) −0.107*** (0.0266)
Wave 2 0.132 (0.205) 0.259*** (0.0893) 0.367** (0.145) 0.132 (0.205) 0.259*** (0.0893) 0.369** (0.145)
Wave 3 0.608* (0.315) 0.752*** (0.126) 0.974*** (0.221) 0.609* (0.315) 0.753*** (0.126) 0.979*** (0.220)
Wave 4 1.250*** (0.400) 1.338*** (0.156) 1.652*** (0.278) 1.250*** (0.400) 1.339*** (0.156) 1.657*** (0.277)
Wave 5 1.854*** (0.494) 2.007*** (0.190) 2.391*** (0.346) 1.855*** (0.493) 2.008*** (0.190) 2.396*** (0.345)
Wave 6 2.521*** (0.598) 2.667*** (0.228) 3.143*** (0.414) 2.520*** (0.597) 2.667*** (0.228) 3.149*** (0.413)
Wave 7 2.836*** (0.668) 3.054*** (0.256) 3.558*** (0.463) 2.834*** (0.667) 3.053*** (0.256) 3.563*** (0.462)
Wave 8 3.307*** (0.847) 3.713*** (0.318) 4.321*** (0.583) 3.313*** (0.846) 3.718*** (0.318) 4.337*** (0.582)
AA genotype 1.060*** (0.247) 1.035*** (0.226) 0.881*** (0.318) 0.767 (1.267) −0.953 (0.958) −0.312 (1.407)
AT genotype 0.421** (0.167) 0.379** (0.161) 0.490** (0.217) −2.546*** (0.841) −2.021*** (0.699) −2.027** (0.943)
Born after 1942 by AA Genotype 0.0632** (0.0286) 0.0538** (0.023) 0.0432 (0.0329)
Born after 1942 by AT Genotype 0.0699*** (0.0187) 0.0537*** (0.0168) 0.0519** (0.0221)
30–34.99 by AA −1.293** (0.644) −0.492 (0.311) −1.142** (0.485)
35–39.99 by AA 0.775 (0.513) −0.230 (0.290) −0.746* (0.389)
40–44.99 by AA −0.991* (0.524) −0.136 (0.279) −0.680* (0.404)
45–49.99 by AA −0.876 (0.544) −0.167 (0.280) −0.682 (0.429)
50–54.99 by AA −0.672 (0.549) 0.0483 (0.279) −0.368 (0.43)
55–59.99 by AA −0.6 (0.571) −0.0159 (0.283) −0.431 (0.457)
60–63 by AA −0.378 (0.564) 0.0663 (0.294) −0.251 (0.454)
30–34.99 by AT −0.271 (0.393) 0.0883 (0.220) 0.11 (0.338)
35–39.99 by AT 0.227 (0.309) 0.319 (0.204) 0.365 (0.278)
40–44.99 by AT 0.0216 (0.319) 0.334* (0.199) 0.43 (0.296)
45–49.99 by AT 0.114 (0.337) 0.400** (0.199) 0.443 (0.306)
50–54.99 by AT 0.218 (0.336) 0.422** (0.198) 0.612* (0.313)
55–59.99 by AT 0.202 (0.355) 0.356* (0.202) 0.473 (0.325)
60–63 by AT 0.163 (0.344) 0.245 (0.208) 0.326 (0.328)
Constant 24.98*** (1.227) 25.07*** (0.551) 26.42*** (0.987) 26.74*** (1.312) 26.59*** (0.696) 27.89*** (1.172)
Observations 19,617 19,617 19,617 19,617 19,617 19,617
No. of family fixed effects Not applicable Not applicable 1,414 Not applicable Not applicable 1,414
R² 0.095 0.098 0.479 0.098 0.103 0.48
Presented are estimates of the age–period–cohort model where the cohort variable is treated as continuous. Each entry refers to the effect of the variable
listed in the first column on BMI holding all other factors constant. Robust SEs are presented in parentheses. The columns in this table differ based on what
factors are accounted for and the method used to estimate the statistical model. See Table S6 for the calendar time corresponding to examinations in each
wave. Note that our main results of birth cohort and genotype interactions are not sensitive to the method by which the model was estimated. Estimates from
the fifth column were used to generate Fig. 1. The following indicate the statistical significance of an explanatory variable on BMI: ***P<0.01, **P<0.05, and *P<0.1.
Table S2. Model estimates of factors influencing BMI, where birth year is treated as a discrete variable (born pre-/post-1942)
Estimator, by column: (1) linear regression; (2) random effects; (3) linear regression with controls for family fixed effects
Subject is male 1.812*** (0.152) 1.641*** (0.146) 1.835*** (0.154)
Age 30–34.99 0.532* (0.306) 0.477*** (0.174) 0.542** (0.254)
Age 35–39.99 0.608** (0.255) 0.608*** (0.174) 0.710*** (0.243)
Age 40–44.99 1.245*** (0.291) 1.011*** (0.188) 1.285*** (0.298)
Age 45–49.99 1.498*** (0.340) 1.199*** (0.212) 1.498*** (0.362)
Age 50–54.99 1.640*** (0.383) 1.231*** (0.238) 1.540*** (0.425)
Age 55–59.99 1.765*** (0.453) 1.272*** (0.269) 1.688*** (0.497)
Age 60–63 1.658*** (0.503) 1.229*** (0.300) 1.635*** (0.567)
Subject was born after 1942 −1.086*** (0.326) −1.360*** (0.280) −1.020*** (0.353)
Wave 2 −0.0447 (0.113) 0.173** (0.0774) 0.0239 (0.127)
Wave 3 0.321* (0.166) 0.617*** (0.105) 0.389** (0.192)
Wave 4 0.874*** (0.206) 1.163*** (0.128) 0.888*** (0.241)
Wave 5 1.385*** (0.252) 1.791*** (0.155) 1.434*** (0.299)
Wave 6 1.958*** (0.310) 2.406*** (0.185) 1.984*** (0.365)
Wave 7 2.216*** (0.356) 2.760*** (0.207) 2.223*** (0.416)
Wave 8 2.576*** (0.453) 3.356*** (0.258) 2.703*** (0.526)
AA genotype 1.385** (0.599) 0.708* (0.398) 1.622*** (0.570)
AT genotype −0.359 (0.389) −0.412 (0.282) −0.412 (0.383)
Born after 1942 by AA genotype 0.956* (0.509) 1.041** (0.459) 0.689 (0.563)
Born after 1942 by AT genotype 1.255*** (0.348) 1.135*** (0.326) 1.129*** (0.386)
Age 30–34.99 by AA −1.236* (0.637) −0.488 (0.311) −1.596*** (0.527)
Age 35–39.99 by AA −0.723 (0.507) −0.227 (0.290) −1.241*** (0.404)
Age 40–44.99 by AA −0.999* (0.517) −0.135 (0.279) −1.246*** (0.442)
Age 45–49.99 by AA −0.921* (0.539) −0.168 (0.280) −1.141** (0.460)
Age 50–54.99 by AA −0.751 (0.543) 0.0459 (0.279) −0.968** (0.470)
Age 55–59.99 by AA −0.708 (0.582) −0.0173 (0.283) −0.911* (0.493)
Age 60–63 by AA −0.539 (0.568) 0.0613 (0.294) −0.710 (0.497)
Age 30–34.99 by AT −0.171 (0.392) 0.0928 (0.220) −0.199 (0.332)
Age 35–39.99 by AT 0.343 (0.308) 0.324 (0.204) 0.279 (0.273)
Age 40–44.99 by AT 0.0780 (0.321) 0.338* (0.199) 0.183 (0.293)
Age 45–49.99 by AT 0.155 (0.340) 0.402** (0.199) 0.237 (0.306)
Age 50–54.99 by AT 0.240 (0.338) 0.423** (0.198) 0.455 (0.310)
Age 55–59.99 by AT 0.208 (0.367) 0.357* (0.202) 0.418 (0.326)
Age 60–63 by AT 0.147 (0.360) 0.244 (0.208) 0.272 (0.333)
Constant 23.80*** (0.348) 24.01*** (0.250) 23.75*** (0.335)
Observations 19,617 19,617 19,617
R² 0.099 0.106 0.397
No. of Individuals 3,720 3,720 3,720
No. of family fixed effects 1,414 1,414 1,414
Presented are estimates of the age–period–cohort model where the cohort variable is treated as discrete as indicated in Eq. S3. Each entry refers to the effect
of the variable listed in the first column on BMI holding all other factors constant. Robust SEs are presented in parentheses. The columns in this table differ
based on what factors are accounted for and the method used to estimate the statistical model. See Table S6 for the calendar time corresponding to
examinations in each wave. Note that our main results of birth cohort and genotype interactions are not sensitive to the method by which the model was
estimated. The following indicate the statistical significance of each explanatory variable: ***P<0.01, **P<0.05, and *P<0.1.
Table S3. Descriptive statistics of BMI by genotype based on age at examination and between birth cohorts (1942 change point)
AA genotype
Age group 30–34.99 35–39.99 40–44.99 45–49.99 50–54.99 55–59.99
Pre-1942 24.679 (0.569) 25.605 (0.370) 26.234 (0.405) 26.526 (0.271) 27.747 (0.253) 28.194 (0.249)
95% CI 23.526–25.832 24.869–26.341 25.433–27.035 25.991–27.060 27.248–28.246 27.704–28.685
Post-1942 25.51027 (0.412) 26.368 (0.391) 27.047 (0.332) 28.155 (0.359) 28.480 (0.385) 29.089 (0.455)
95% CI 24.697–26.323 25.597–27.141 26.395–27.700 27.448–28.861 27.721–29.239 28.192–29.986
Observations, pre-1942 cohort 38 82 131 222 292 345
Observations, post-1942 cohort 154 201 279 267 227 189
t test of difference in means between cohorts −0.948 −1.162 −1.457 −3.504 −1.649 −1.879
P value of two-sided t test above, P(T<t) 0.172 0.123 0.073 0.0002 0.050 0.030
AT genotype
Age group 30–34.99 35–39.99 40–44.99 45–49.99 50–54.99 55–59.99
Pre-1942 25.134 (0.372) 25.413 (0.233) 25.803 (0.178) 26.269 (0.159) 26.692 (0.135) 27.116 (0.135)
95% CI 24.396–25.872 24.954–25.871 25.453–26.153 25.957–26.581 26.427–26.957 26.851–27.382
Post-1942 24.903 (0.189) 25.822 (0.177) 26.573 (0.180) 27.465 (0.182) 28.283 (0.190) 28.899 (0.244)
95% CI 24.531–25.275 25.474–26.170 26.220–26.925 27.109–27.822 27.910–28.655 28.420–29.377
Observations, pre-1942 cohort 115 290 477 748 1,003 1,116
Observations, post-1942 cohort 587 775 926 901 841 580
t test of difference in means between cohorts 0.504 −1.268 −2.739 −4.855 −6.981 −6.933
P value of two-sided t test above, P(T<t) 0.693 0.103 0.003 P<0.001 P<0.001 P<0.001
TT genotype
Age group 30–34.99 35–39.99 40–44.99 45–49.99 50–54.99 55–59.99
Pre-1942 24.824 (0.486) 25.392 (0.285) 25.807 (0.245) 26.277 (0.197) 26.878 (0.178) 27.346 (0.170)
95% CI 23.857–25.791 24.829–25.954 25.325–26.288 25.890–26.663 26.528–27.228 27.012–27.680
Post-1942 24.314 (0.233) 24.568 (0.181) 25.687 (0.194) 26.470 (0.191) 27.082 (0.211) 27.611 (0.257)
95% CI 23.855–24.773 24.211–24.924 25.305–26.069 26.096–26.845 26.667–27.497 27.106–28.117
Observations, pre-1942 cohort 86 212 373 565 731 876
Observations, post-1942 cohort 377 485 591 579 552 370
t test of difference in means between cohorts 0.943 2.475 0.383 −0.708 −0.741 −0.855
P value of two-sided t test above, P(T<t) 0.827 0.993 0.649 0.240 0.230 0.761
Means of BMI, with SDs in parentheses, are shown for individuals with a specific FTO allele type and age range at time of examination. The t tests evaluate the hypothesis that
there are no differences in average BMI, conditional on age and FTO allele type, across the birth cohorts defined by the 1942 breakpoint. ***P<0.01,
**P<0.05, and *P<0.1. Observation numbers are reported separately for the pre-1942 and post-1942 cohorts. The table indicates that there are statistically significant
differences by birth cohort for those with the AA and AT genotypes but that there are no age ranges for those with the TT genotype where a statistically significant
difference in BMI exists between cohorts.
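The t statistics in Table S3 can be approximately recomputed from the reported summary values. The sketch below rests on two assumptions that are not stated in the source: that the parenthesized values are standard errors of the mean (a reading that the reported 95% CIs appear consistent with) and that a pooled-variance two-sample t test was used. The function name is hypothetical, and the inputs are the AA genotype, age 45–49.99 entries.

```python
import math

def pooled_t_from_summary(mean1, se1, n1, mean2, se2, n2):
    """Two-sample pooled-variance t statistic from means, SEs of the mean, and ns."""
    var1 = se1 ** 2 * n1   # recover each group's sample variance from its SE
    var2 = se2 ** 2 * n2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff

# AA genotype, age 45–49.99: pre-1942 vs. post-1942 cohort (values from Table S3)
t_stat = pooled_t_from_summary(26.526, 0.271, 222, 28.155, 0.359, 267)
print(f"t = {t_stat:.3f}")
```

Under these assumptions the result should come out close to the −3.504 reported for that cell; small discrepancies in other cells would be expected from rounding of the published inputs.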
Table S4. Model estimates by sex of factors influencing BMI, where birth year is treated as a discrete variable (born pre-/post-1942)
Columns 1–3: females. Columns 4–6: males
Estimator, by column: (1) linear regression; (2) random effects; (3) linear regression with controls for family fixed effects; (4) linear regression; (5) random effects; (6) linear regression with controls for family fixed effects
Age 35–39.99 0.125 (0.290) 0.0874 (0.171) 0.205 (0.298) 0.364 (0.222) 0.438*** (0.150) 0.438* (0.254)
Age 40–44.99 0.872*** (0.336) 0.676*** (0.191) 0.903*** (0.294) 0.853*** (0.268) 0.617*** (0.166) 0.545** (0.252)
Age 45–49.99 1.117** (0.446) 0.752*** (0.232) 0.973*** (0.315) 1.097*** (0.317) 0.886*** (0.195) 0.714*** (0.266)
Age 50–54.99 1.179** (0.519) 0.780*** (0.279) 1.065*** (0.347) 1.335*** (0.396) 0.895*** (0.230) 0.693** (0.287)
Age 55–59.99 1.387** (0.630) 0.920*** (0.330) 1.264*** (0.387) 1.349*** (0.470) 0.783*** (0.272) 0.539* (0.320)
Age 60–63 1.193 (0.738) 0.793** (0.382) 1.202*** (0.437) 1.373** (0.548) 0.856*** (0.314) 0.566 (0.358)
Subject was born after 1942 −1.764*** (0.485) −2.067*** (0.427) −1.89*** (0.227) −0.313 (0.421) −0.612* (0.342) −0.229 (0.204)
Wave 2 0.148 (0.179) 0.427*** (0.116) 0.217 (0.163) −0.223 (0.136) −0.0561 (0.0952) 0.00904 (0.132)
Wave 3 0.525** (0.263) 0.926*** (0.157) 0.658*** (0.192) 0.126 (0.198) 0.346*** (0.128) 0.466*** (0.153)
Wave 4 1.147*** (0.329) 1.568*** (0.192) 1.264*** (0.217) 0.592** (0.246) 0.798*** (0.156) 0.980*** (0.172)
Wave 5 1.676*** (0.400) 2.266*** (0.232) 1.869*** (0.249) 1.079*** (0.304) 1.359*** (0.189) 1.577*** (0.198)
Wave 6 2.355*** (0.493) 3.013*** (0.277) 2.591*** (0.289) 1.532*** (0.371) 1.845*** (0.225) 2.140*** (0.230)
Wave 7 2.648*** (0.568) 3.421*** (0.310) 2.913*** (0.320) 1.746*** (0.423) 2.146*** (0.252) 2.478*** (0.255)
Wave 8 3.240*** (0.723) 4.164*** (0.384) 3.687*** (0.399) 1.805*** (0.529) 2.574*** (0.315) 2.931*** (0.322)
AA genotype 0.908 (0.838) 0.270 (0.584) 0.982 (0.677) 1.265* (0.744) 0.618 (0.468) 0.982* (0.593)
AT genotype −1.389*** (0.532) −1.501*** (0.397) −0.832* (0.437) −0.0426 (0.432) 0.0210 (0.310) 0.148 (0.354)
Born after 1942 by AA genotype 0.950 (0.782) 0.917 (0.717) −0.689* (0.361) 0.925 (0.630) 1.146** (0.552) 1.410*** (0.320)
Born after 1942 by AT genotype 2.043*** (0.524) 1.729*** (0.503) 1.505*** (0.248) 0.362 (0.441) 0.443 (0.398) −0.0421 (0.219)
Age 30–34.99 by AA −1.109 (0.741) −0.0415 (0.383) −0.823 (0.690) −0.413 (0.782) −0.0732 (0.339) −0.246 (0.599)
Age 35–39.99 by AA −0.409 (0.684) 0.305 (0.386) −0.316 (0.708) −0.418 (0.634) −0.230 (0.346) −0.421 (0.617)
Age 40–44.99 by AA −0.515 (0.707) 0.294 (0.367) −0.405 (0.676) −0.876 (0.633) −0.0224 (0.332) −0.213 (0.593)
Age 45–49.99 by AA −0.580 (0.723) 0.483 (0.367) −0.400 (0.672) −0.632 (0.685) −0.268 (0.331) −0.398 (0.589)
Age 50–54.99 by AA −0.118 (0.744) 0.744** (0.368) −0.0716 (0.674) −0.747 (0.685) −0.109 (0.331) −0.183 (0.589)
Age 55–59.99 by AA −0.185 (0.793) 0.474 (0.373) −0.0693 (0.681) −0.567 (0.734) 0.0982 (0.335) −0.0631 (0.594)
Age 60–63 by AA −0.0671 (0.788) 0.467 (0.391) −0.256 (0.718) −0.403 (0.724) 0.196 (0.352) 0.0746 (0.626)
Age 30–34.99 by AT 0.166 (0.370) 0.838*** (0.214) 0.437 (0.383) 0.614* (0.324) 0.349** (0.176) 0.327 (0.303)
Age 35–39.99 by AT 0.706* (0.397) 1.100*** (0.251) 0.791* (0.462) 0.726** (0.325) 0.230 (0.212) 0.384 (0.374)
Age 40–44.99 by AT 0.550 (0.393) 1.114*** (0.241) 0.767* (0.443) 0.366 (0.343) 0.235 (0.204) 0.315 (0.360)
Age 45–49.99 by AT 0.775* (0.452) 1.349*** (0.238) 1.039** (0.437) 0.296 (0.346) 0.127 (0.203) 0.162 (0.356)
Age 50–54.99 by AT 0.972** (0.437) 1.347*** (0.237) 1.187*** (0.436) 0.248 (0.374) 0.181 (0.201) 0.233 (0.354)
Age 55–59.99 by AT 1.077** (0.484) 1.288*** (0.243) 1.091** (0.444) 0.0747 (0.387) 0.123 (0.206) 0.0603 (0.362)
Age 60–63 by AT 1.130** (0.492) 1.195*** (0.256) 1.086** (0.470) −0.104 (0.417) −0.0281 (0.216) −0.106 (0.380)
Constant 24.29*** (0.420) 24.34*** (0.302) 24.16*** (0.269) 25.85*** (0.335) 26.04*** (0.244) 25.87*** (0.233)
Observations 10,404 10,404 10,404 9,213 9,213 9,213
R² 0.080 0.87 0.569 0.062 0.073 0.531
No. of individuals 1,957 1,957 1,957 1,763 1,763 1,763
No. of family fixed effects Not applicable Not applicable 983 Not applicable Not applicable 888
Presented are estimates of the age–period–cohort model where the cohort variable is treated as discrete as indicated in Eq. S3. Each entry refers to the effect of
the variable listed in the first column on BMI holding all other factors constant. Robust SEs are presented in parentheses. The columns in this table differ based on
the sex subsample as indicated in row 1 and the method used to estimate the statistical model indicated in row 2. See Table S6 for the calendar time corresponding to
examinations in each wave. The following indicate the statistical significance of each explanatory variable: ***P<0.01, **P<0.05, and *P<0.1.
Table S5. Descriptive statistics on genetic characteristics across birth cohorts
Columns: genotype at rs9939609; individuals born pre-1942; individuals born post-1942
TT 787 (36.52%) 517 (33.04%)
AT 1,049 (48.68%) 812 (51.77%)
AA 319 (14.85%) 236 (15.08%)
No. of people 2,155 1,565
Presented is the distribution of genetic risk alleles of individuals in the
Framingham Offspring Study born pre- and post-1942. A Pearson’s χ² test of the
hypothesis that the rows and columns in a two-way table are independent,
accounting for correlations within families, yields χ²(2) = 1.88, P = 0.1550.
This indicates that the distributions of genetic risk factors do not differ
between cohorts born pre- and post-1942.
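For readers who want to see the mechanics of the independence test, the unadjusted Pearson statistic can be computed directly from the counts in Table S5. This sketch deliberately ignores the within-family correlation that the reported test accounts for, so it is not expected to reproduce the family-adjusted statistic given in the note; the function and variable names are hypothetical.

```python
import math

# Counts from Table S5: genotype -> (born pre-1942, born post-1942)
counts = {"TT": (787, 517), "AT": (1049, 812), "AA": (319, 236)}

def pearson_chi2(table):
    """Unadjusted Pearson chi-square statistic and degrees of freedom."""
    rows = list(table.values())
    n_cols = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(n_cols)]
    n = sum(col_totals)
    chi2 = 0.0
    for r in rows:
        row_total = sum(r)
        for j, observed in enumerate(r):
            expected = row_total * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (n_cols - 1)
    return chi2, df

chi2_unadjusted, df = pearson_chi2(counts)

# For df == 2 the chi-square upper-tail probability has the closed form exp(-x/2).
p_unadjusted = math.exp(-chi2_unadjusted / 2) if df == 2 else float("nan")
print(f"unadjusted chi2({df}) = {chi2_unadjusted:.2f}, P = {p_unadjusted:.3f}")
```

Because siblings share alleles, treating the observations as independent overstates the effective sample size, which is one reason an adjustment for family clustering would be expected to yield a smaller statistic than the unadjusted one.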
Table S6. Descriptive statistics
No. of unique subjects 3,720
Total no. of observations in estimation sample 19,617
Subjects that are male, % 52.61
Mean age of subject at data collection (SD) 48.1763 (9.4095)
No. of individuals born before 1920 83
No. of Individuals born between 1920 and 1925 323
No. of Individuals born between 1925 and 1930 481
No. of Individuals born between 1930 and 1935 561
No. of Individuals born between 1935 and 1940 575
No. of Individuals born between 1940 and 1945 715
No. of Individuals born between 1945 and 1950 537
No. of individuals born after 1950 351
Observations collected in wave 1 beginning 30 Aug 1971 3,720
Observations collected in wave 2 beginning 26 Jan 1995 3,581
Observations collected in wave 3 beginning 20 Dec 1983 3,326
Observations collected in wave 4 beginning 22 Apr 1987 2,955
Observations collected in wave 5 beginning 23 Jan 1991 2,488
Observations collected in wave 6 beginning 26 Jan 1995 1,916
Observations collected in wave 7 beginning 11 Sep 1998 1,310
Observations collected in wave 8 beginning 10 Mar 2005 321
No. of Individuals with FTO–AA, % 555 (14.56)
No. of Individuals with FTO–AT, % 1,861 (50.03)
No. of Individuals with FTO–TT, % 1,304 (35.05)
BMI 26.869 (5.013)
Provided are the summary statistics for the measures used in the multivariate regression analysis. We only list
the date of the first interview for each wave in the description above because the examinations in each wave
were held over several years and the exact time could be inferred by taking the difference between age at
examination and YOB.
Table S7. Model estimates of factors influencing BMI, where we ignore cohort effects and interactions of genetic factors with age and birth cohort indicators
Estimator, by column: (1) linear regression; (2) random effects; (3) linear regression with controls for family fixed effects
Wave 2 −0.252** (0.101) 0.0764 (0.0943) −0.203 (0.169)
Wave 3 0.0669 (0.132) 0.491*** (0.109) 0.146 (0.176)
Wave 4 0.538*** (0.158) 1.023*** (0.122) 0.573*** (0.182)
Wave 5 0.928*** (0.184) 1.530*** (0.139) 0.961*** (0.194)
Wave 6 1.304*** (0.235) 2.146*** (0.161) 1.387*** (0.212)
Wave 7 1.338*** (0.277) 2.337*** (0.178) 1.458*** (0.227)
Wave 8 1.275*** (0.374) 2.831*** (0.220) 1.657*** (0.277)
AA genotype 0.574 (0.357) 0.854*** (0.328) 0.123 (0.254)
AT genotype 0.131 (0.237) 0.336 (0.231) 0.523*** (0.179)
Age 35–39.99 0.595*** (0.101) 0.489*** (0.0718) 0.681*** (0.125)
Age 40–44.99 1.111*** (0.116) 0.954*** (0.0859) 1.244*** (0.126)
Age 45–49.99 1.485*** (0.146) 1.214*** (0.107) 1.570*** (0.134)
Age 50–54.99 1.762*** (0.176) 1.337*** (0.131) 1.856*** (0.147)
Age 55–59.99 1.960*** (0.217) 1.381*** (0.157) 2.015*** (0.163)
Age 60–63 1.912*** (0.251) 1.341*** (0.183) 2.117*** (0.182)
Wave 2 by AA 0.279 (0.182) 0.0714 (0.156) 0.165 (0.305)
Wave 3 by AA 0.225 (0.222) 0.139 (0.157) 0.176 (0.307)
Wave 4 by AA 0.500** (0.235) 0.224 (0.159) 0.396 (0.309)
Wave 5 by AA 0.395 (0.273) 0.254 (0.165) 0.394 (0.319)
Wave 6 by AA 0.620* (0.338) 0.289* (0.175) 0.530 (0.338)
Wave 7 by AA 1.024*** (0.382) 0.553*** (0.182) 0.762** (0.350)
Wave 8 by AA 0.910 (0.590) 0.237 (0.229) 0.561 (0.436)
Wave 2 by AT 0.135 (0.120) 0.0521 (0.110) 0.0905 (0.215)
Wave 3 by AT 0.135 (0.141) 0.00837 (0.111) 0.0159 (0.216)
Wave 4 by AT 0.138 (0.155) −0.0530 (0.111) 0.00228 (0.217)
Wave 5 by AT 0.314* (0.182) 0.108 (0.116) 0.209 (0.225)
Wave 6 by AT 0.517** (0.230) 0.0139 (0.124) 0.281 (0.239)
Wave 7 by AT 0.724*** (0.267) 0.201 (0.130) 0.502** (0.250)
Wave 8 by AT 1.325*** (0.384) 0.371** (0.160) 0.831*** (0.304)
Constant 23.65*** (0.187) 23.60*** (0.181) 23.41*** (0.150)
Observations 19,617 19,617 19,617
R² 0.096 0.097 0.480
No. of individuals 3,720 3,720 3,720
No. of family fixed effects Not applicable Not applicable 1,414
Presented are estimates of the age–period model where the cohort variable is not included and the only genetic interactions included are those with period
effects, allowing solely for contemporaneous gene–environment interactions. The age and period variables are treated as discrete as indicated in Eq. S3. Each
entry refers to the effect of the variable listed in the first column on BMI holding all other factors constant. Robust SEs are presented in parentheses. The
columns in this table differ based on what factors are accounted for and the method used to estimate the statistical model. See Table S6 for the calendar time
corresponding to examinations in each wave. Note that our main results of birth cohort and genotype interactions are not sensitive to the method by which the
model was estimated. The following indicate the statistical significance of each explanatory variable: ***P<0.01, **P<0.05, and *P<0.1.
Table S8. Model estimates of factors influencing BMI, where we ignore cohort effects and interactions of genetic factors with birth cohort indicators
Estimator, by column: (1) linear regression; (2) random effects; (3) linear regression with controls for family fixed effects
Wave 2 −0.411*** (0.121) −0.0757 (0.112) −0.337* (0.174)
Wave 3 −0.197 (0.172) 0.244* (0.144) −0.0737 (0.188)
Wave 4 0.204 (0.215) 0.700*** (0.172) 0.288 (0.200)
Wave 5 0.528** (0.252) 1.124*** (0.204) 0.609*** (0.218)
Wave 6 0.835*** (0.319) 1.646*** (0.242) 0.960*** (0.243)
Wave 7 0.820** (0.378) 1.764*** (0.270) 0.973*** (0.262)
Wave 8 0.654 (0.494) 2.093*** (0.335) 1.032*** (0.319)
AA genotype 1.489** (0.607) 0.954** (0.401) 0.860* (0.465)
AT genotype 0.172 (0.309) −0.0794 (0.269) 0.213 (0.291)
Age 35–39.99 0.593*** (0.181) 0.498*** (0.123) 0.700*** (0.213)
Age 40–44.99 1.434*** (0.202) 1.068*** (0.146) 1.363*** (0.210)
Age 45–49.99 1.950*** (0.259) 1.438*** (0.183) 1.847*** (0.220)
Age 50–54.99 2.356*** (0.307) 1.657*** (0.224) 2.163*** (0.237)
Age 55–59.99 2.764*** (0.379) 1.886*** (0.269) 2.568*** (0.260)
Age 60–63 2.904*** (0.433) 2.026*** (0.312) 2.870*** (0.288)
Age 30–34.99 by AA −0.707 (0.570) 0.00615 (0.267) −0.759 (0.499)
Age 35–39.99 by AA −0.758 (0.523) −0.127 (0.304) −0.782 (0.527)
Age 40–44.99 by AA −1.255** (0.585) −0.216 (0.345) −0.942* (0.521)
Age 45–49.99 by AA −1.422** (0.648) −0.434 (0.408) −1.183** (0.533)
Age 50–54.99 by AA −1.511** (0.748) −0.412 (0.478) −1.091** (0.555)
Age 55–59.99 by AA −1.710* (0.880) −0.630 (0.557) −1.358** (0.585)
Age 60–63 by AA −1.760* (0.966) −0.737 (0.632) −1.378** (0.624)
Age 30–34.99 by AT 0.331 (0.242) 0.577*** (0.142) 0.505* (0.264)
Age 35–39.99 by AT 0.304 (0.277) 0.464** (0.193) 0.393 (0.331)
Age 40–44.99 by AT −0.186 (0.301) 0.325 (0.224) 0.253 (0.328)
Age 45–49.99 by AT −0.414 (0.375) 0.215 (0.271) 0.0164 (0.336)
Age 50–54.99 by AT −0.639 (0.427) 0.0510 (0.322) −0.0647 (0.351)
Age 55–59.99 by AT −1.009* (0.516) −0.208 (0.379) −0.473 (0.374)
Age 60–63 by AT −1.367** (0.577) −0.506 (0.434) −0.862** (0.401)
Wave 2 by AA 0.547** (0.253) 0.246 (0.209) 0.384 (0.321)
Wave 3 by AA 0.641* (0.354) 0.420 (0.268) 0.506 (0.343)
Wave 4 by AA 0.998** (0.414) 0.589* (0.316) 0.797** (0.361)
Wave 5 by AA 0.967* (0.502) 0.704* (0.377) 0.865** (0.390)
Wave 6 by AA 1.261** (0.629) 0.828* (0.445) 1.073** (0.430)
Wave 7 by AA 1.705** (0.724) 1.159** (0.497) 1.351*** (0.461)
Wave 8 by AA 1.674* (0.991) 1.003 (0.618) 1.276** (0.565)
Wave 2 by AT 0.357** (0.160) 0.221 (0.147) 0.269 (0.226)
Wave 3 by AT 0.522** (0.225) 0.300 (0.189) 0.328 (0.240)
Wave 4 by AT 0.640** (0.278) 0.342 (0.225) 0.419* (0.253)
Wave 5 by AT 0.925*** (0.332) 0.613** (0.268) 0.732*** (0.273)
Wave 6 by AT 1.247*** (0.417) 0.651** (0.318) 0.930*** (0.302)
Wave 7 by AT 1.539*** (0.486) 0.937*** (0.354) 1.249*** (0.323)
Wave 8 by AT 2.338*** (0.645) 1.340*** (0.439) 1.829*** (0.393)
Constant 23.45*** (0.204) 23.58*** (0.190) 23.36*** (0.175)
Observations 19,617 19,617 19,617
R² 0.098 0.097 0.481
No. of individuals 3,720 3,720 3,720
No. of family fixed effects Not applicable Not applicable 1,414
Presented are estimates of an age–period model where the cohort variable and all genetic interactions with it are not included in the specification. All age and period
variables are treated as discrete as indicated in Eq. S3. Each entry refers to the effect of the variable listed in the first column on BMI holding all other factors
constant. Robust SEs are presented in parentheses. The columns in this table differ based on what factors are accounted for and the method used to estimate
the statistical model. See Table S6 for the calendar time corresponding to examinations in each wave. Note that our main results of birth cohort and genotype
interactions are not sensitive to the method by which the model was estimated. The following indicate the statistical significance of each explanatory variable:
***P<0.01, **P<0.05, and *P<0.1.