Don’t birth cohorts matter? A
commentary and simulation exercise
on Reither, Hauser and Yang’s (2009)
age-period-cohort study of obesity
Andrew Bell a b and Kelvyn Jones a b
aSchool of Geographical Sciences
University of Bristol
bCentre for Multilevel Modelling
University of Bristol
2 Priory Road
Draft – please do not cite without permission
Last updated: 2nd September 2013
Acknowledgements: Thanks to Dewi Owen and Ron Johnston for their help; neither is responsible for
what we have written.
Reither et al. (2009) use a Hierarchical Age-Period-Cohort model (HAPC - Yang & Land, 2006) to
assess changes in obesity in the USA population. Their results suggest that there is only a minimal
effect of cohorts, and that it is periods which have driven the increase in obesity over time. We use
simulations to show that this result may be incorrect. Using simulated data in which it is cohorts,
rather than periods, that are responsible for the rise in obesity, we are able to replicate the period-
trending results of Reither et al. In this instance, the HAPC model misses the true cohort trend
entirely, erroneously finds a period trend, and underestimates the age trend. Reither et al.’s results
may be correct, but because age, period and cohort are confounded there is no way to tell. This is
typical of age-period-cohort models, and shows the importance of caution when any APC model is
used. We finish with a discussion of ways forward for researchers wishing to model age, period and
cohort in a robust and non-arbitrary manner.
Reither et al.’s HAPC study suggests that rises in obesity are caused by period effects
Simulations suggest that this result may be erroneous
Identical results were found with data simulated from an entirely different process
Results show the pitfalls of using APC models without critical forethought.
Keywords: Age-period-cohort models, Obesity, Collinearity, Model identification
1 Introduction
The desire to separate age, period and cohort (APC) effects has been a key feature of both the
medical and social sciences for a number of decades (Ryder, 1965). For at least the same period,
levels of obesity have been rising at a continuous rate, to the point that in 1997 it was classified by
the World Health Organisation as a global epidemic (Caballero, 2007). Reither et al. (2009)
used the recently developed Hierarchical Age-Period-Cohort (HAPC) model (Yang & Land, 2006) to
assess the relative importance of periods and cohorts in the development of the obesity epidemic.
Whilst they found some significant cohort effects, the implication of their results was “that period
effects were principally responsible for the obesity epidemic” (Reither et al., 2009:1445), and this
result was repeated by Yang and Land (2013:215-222).
However, the possibility of separating APC effects is beset by an ‘identification problem’ due to the
fact that age, period and cohort when taken together are perfectly collinear. In this paper we show
that the HAPC model does not solve this identification problem, and therefore that the results found
by Reither et al. should be treated with some scepticism.
The purpose of this paper is twofold. The first substantive contribution is to add to the growing
debate in epidemiology regarding the causes of, and therefore possible solutions to, the obesity
epidemic. Whether periods or cohorts are responsible for changes in obesity is of profound
importance because it should affect how policy interventions are targeted. The second,
methodological, contribution is to assess the capabilities of age-period-cohort models, and the
dangers of using these models without critical forethought regarding their limits. In this we are
building on previous work (Bell & Jones, 2013a, b, c; Glenn, 2005; Luo, 2013; Luo & Hodges, 2013)
questioning the capabilities of the HAPC model and other methodological innovations to disentangle
APC effects.
We first outline the identification problem and Yang and Land’s proffered solution to it. Second we
briefly review the literature on the development of the obesity epidemic. Third we outline our
simulation design which we use to show that the results found by Reither et al. could have been
created by a different data generating process (DGP). Finally, we discuss the implications of this
both within obesity research and beyond, considering ways forward for researchers wishing to use
techniques like the HAPC model to make robust conclusions regarding APC effects.
2 The APC identification problem and Yang and Land’s HAPC Model
The conceptual distinction between age, period and cohort is well known (Bell & Jones, 2013a;
Suzuki, 2012). Despite this, there remains the problem of statistically modelling the three
effects because of the mathematical dependency between them:

$$\text{Age} = \text{Period} - \text{Cohort}$$

As such, if we know the value of two of the terms, we will always know the value of the third. From
an ‘experimental’ standpoint, therefore, it is impossible to hold two of APC constant whilst varying
the third. Because of this, each of the following DGPs (and an infinite number more) would produce
identical values for a dependent variable Y:
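
As an illustration, assuming age, period and cohort are measured in the same units so that Age = Period − Cohort holds exactly, three such DGPs might be written as follows (the coefficients are chosen purely for the example):

$$
\begin{aligned}
Y &= \text{Age} + 2\,\text{Period} \\
Y &= 3\,\text{Period} - \text{Cohort} \\
Y &= 3\,\text{Age} + 2\,\text{Cohort}
\end{aligned}
$$

Substituting Age = Period − Cohort into the first and third lines gives the second line in each case, so all three imply the same value of Y for every individual.
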
These three DGPs have very different substantive meanings, yet given only the resulting data it would
not be possible to tell which of the three actually produced the data at hand1. It is for this reason that
many see a solution to the identification problem to be a logical impossibility:
“The continued search for a statistical technique that can be mechanically applied always to
correctly estimate the effects is one of the most bizarre instances in the history of science of
repeated attempts to do the logically impossible.”
(Glenn, 2005:6)
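
The dependency can also be checked numerically. The following is a minimal sketch, assuming a hypothetical sample of survey years and birth years (the ranges and coefficients are illustrative and are not taken from any of the studies discussed). It shows that DGPs with different substantive meanings give identical Y, and that a design matrix containing linear age, period and cohort terms is rank deficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: survey years (period) and birth years (cohort).
period = rng.integers(1976, 2003, size=500)
cohort = rng.integers(1900, 1959, size=500)
age = period - cohort  # the exact dependency Age = Period - Cohort

# Three illustrative DGPs with different substantive meanings.
y1 = age + 2 * period
y2 = 3 * period - cohort
y3 = 3 * age + 2 * cohort
identical = bool(np.array_equal(y1, y2) and np.array_equal(y2, y3))

# Design matrix with an intercept and all three linear terms.
X = np.column_stack([np.ones(age.size), age, period, cohort]).astype(float)
rank = int(np.linalg.matrix_rank(X))

print(identical)         # the three DGPs produce the same Y
print(rank, X.shape[1])  # the rank is lower than the number of columns
```

Because the rank is lower than the number of columns, no unique set of linear age, period and cohort coefficients can be recovered from such data without an additional constraint.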
Despite this, numerous supposed solutions to the identification problem have been proposed, each
of which imposes some kind of constraint on the model (Mason et al., 1973; Sasaki & Suzuki, 1987;
Tu et al., 2011; Yang et al., 2008). The problem arises when these constraints are not clearly stated,
are applied arbitrarily on the basis of statistical necessity, and are not grounded in any kind of
substantive theory. The models are generally very sensitive to such constraints and as such can
provide extremely misleading results when those constraints are not precisely justified and
explicitly stated.
Yang and Land’s proposed solution is to use a cross-classified multilevel model, which treats age as a
fixed effect and periods and cohort groups as random effects – contexts in which individuals reside.
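The model can thus be specified (in the continuous Y case) as:

$$y_{i(j_1 j_2)} = \beta_{0 j_1 j_2} + \beta_1 \text{Age}_{i(j_1 j_2)} + \beta_2 \text{Age}^2_{i(j_1 j_2)} + e_{i(j_1 j_2)}$$
$$\beta_{0 j_1 j_2} = \beta_0 + u_{1 j_1} + u_{2 j_2}$$
$$e_{i(j_1 j_2)} \sim N(0, \sigma^2_e), \quad u_{1 j_1} \sim N(0, \sigma^2_{u1}), \quad u_{2 j_2} \sim N(0, \sigma^2_{u2})$$

The dependent variable, $y_{i(j_1 j_2)}$, is measured for individuals i in period j1 and cohort j2. The ‘micro’
model has linear and quadratic age terms, with coefficients $\beta_1$ and $\beta_2$ respectively; a constant
($\beta_{0 j_1 j_2}$) that varies across both periods and cohorts; and a level 1 residual error term ($e_{i(j_1 j_2)}$). The
macro model defines the intercept in the micro model by a non-varying overall intercept $\beta_0$, and a
residual term for each of period ($u_{1 j_1}$) and cohort ($u_{2 j_2}$). The period, cohort and level-1 residuals are all assumed
to follow Normal distributions, each with variances that are estimated.

1 The technical consequence of this is that a regression with age, period and cohort as linear independent
variables will not be estimable (at least with OLS) because the matrix $X^{T}X$ formed from the design matrix X cannot be inverted.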
Putting age in the fixed part and period and cohort in the random part is conceptually attractive; but
also, it is argued by Yang and Land that this distinction solves the identification problem:
"An HAPC framework does not incur the identification problem because the three effects are
not assumed to be linear and additive at the same level of analysis"
(Yang & Land, 2013:191)
In addition to this, Yang and Land suggest that the inclusion of the quadratic term for age helps to
further resolve the identification problem:
"the underidentification problem of the classical APC accounting model has been resolved by
the specification of the quadratic function for the age effects."
(Yang & Land, 2006:84)
However, it has been shown elsewhere that this methodological advance in fact amounts to another
constraint (Luo & Hodges, 2013), and simulation studies have shown that the use of this model,
without critical forethought, can lead to misleading results (Bell & Jones, 2013a).
3 The Obesity Epidemic
Historically, obesity was a rare affliction, predominantly affecting those of high socio-economic
status (Caballero, 2007). However, levels of obesity increased throughout the twentieth century,
particularly amongst those of lower socio-economic status and education levels (Visscher et al.,
2010). A number of reasons for this have been proposed, including the more sedentary lifestyle
associated with the technological advances of the modern world (Rokholm et al., 2010), and the
greater availability, portion size and fat content of food (Hill & Peters, 1998). However, the question
remains as to whether it is via periods or cohorts that these changes occur. If the former, it would
suggest that changes in lifestyle have affected all age groups equally, resulting in bad diets and low
levels of exercise for all individuals. In contrast, the latter would suggest that these cultural changes
particularly affect people in their formative years, and these changes have affected their behaviour
and possibly their physiological resistance to obesity throughout their subsequent life-course. In the
same vein, interventions to the obesity epidemic should be similarly targeted to the groups most
affected. If cohorts are responsible for changes in obesity, then policy interventions should be
focused on children in their formative years because interventions targeted at adults are likely to be
less effective.
Reither et al. argue that the obesity epidemic is the result predominantly of periods, and their
results are shown graphically in the third column of figure 1. They argue that “the pattern of
predicted probabilities for U.S. adults shows a monotonic increase over time, with no sign of
abatement in recent periods of observation” (Reither et al., 2009:1443). Similarly, Allman-Farinelli
et al. (2008) find that period effects are the driving force of changes in their APC analysis of obesity
in Australia, whilst Rokholm et al. (2010:843) argue that the slight levelling off of the obesity
epidemic observed in recent years “occurred at approximately the same time for different age
groups”. However, other studies find evidence that cohorts have the greater influence on obesity:
for example Olsen et al. (2006) find that non-linearities in cohort trends match for different age
groups, but do not match for periods. We argue that all of the methods used above have
flaws, relying on untestable assumptions, which explains why the results that have been found are so
contradictory. The next section takes the results found by Reither et al. (2009) and shows that those
results could have been found with a very different data generating process (DGP).
4 Simulation exercise
Reither et al (2009) found a strong, approximately linear trend in periods, and very little in terms of a
trend in cohorts (see figure 1). However, we have argued that, because of the identification
problem, these results could have arisen from a very different DGP. In order to test this, we ran the
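model used by Reither et al. on data generated by the following DGP:

$$\operatorname{logit}\left[\Pr(Y_{i(j_1 j_2)} = 1)\right] = \beta_0 + \beta_1 \text{Age}_{i(j_1 j_2)} + \beta_2 \text{Age}^2_{i(j_1 j_2)} + 0.04\,\text{Cohort}_{j_2} + u_{1 j_1} + u_{2 j_2}$$
$$u_{2 j_2} \sim N(0, 0.01) \text{ for cohorts}, \quad u_{1 j_1} \sim N(0, 0.01) \text{ for periods}$$
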
Here, Y will equal 1 for an individual who is obese, and 0 otherwise. The period and cohort residuals
were generated to be Normally distributed, with a variance of 0.01. Crucially, in this DGP we include
a linear cohort effect2 of 0.04, and an age effect that is 0.04 larger than that found by Reither et al
(2009). We do not include the period trend found by Reither et al, so this part of the DGP is just
random fluctuations from year to year.
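
One way to see why this DGP is a natural counterpart to a period-trend explanation is to rewrite a linear period trend using the dependency between the three terms. Assuming Period = Age + Cohort holds exactly (that is, setting aside any grouping of cohorts), a period trend of 0.04 can be expressed as:

$$0.04\,\text{Period} = 0.04\,(\text{Age} + \text{Cohort}) = 0.04\,\text{Age} + 0.04\,\text{Cohort}$$

Under that assumption, adding 0.04 to the linear age coefficient together with a cohort trend of 0.04 gives the same linear predictor as a period trend of 0.04 with the original age coefficient, apart from the random residuals.
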
A logistic version of Yang and Land’s HAPC model was then fitted to these data, which were
generated with this known functional form:

$$\operatorname{logit}\left[\Pr(y_{i(j_1 j_2)} = 1)\right] = \beta_{0 j_1 j_2} + \beta_1 \text{Age}_{i(j_1 j_2)} + \beta_2 \text{Age}^2_{i(j_1 j_2)}$$
$$\beta_{0 j_1 j_2} = \beta_0 + u_{1 j_1} + u_{2 j_2}$$
$$u_{1 j_1} \sim N(0, \sigma^2_{u1}), \quad u_{2 j_2} \sim N(0, \sigma^2_{u2})$$

It is implicitly being assumed that any period or cohort trend is appropriately picked up by the period
or cohort residuals, since no fixed effect is specified for such trends.

2 It could be argued that it is unlikely that such a cohort (or period) trend would ever be generated in real life
in this way, because periods and cohorts are intrinsically random (in contrast to age which has a fixed range
and so should be treated as a fixed covariate). However, the model is unable to tell whether a trend is the
result of a ‘random’ and fleeting upward fluctuation or a consistent linear trend over all possible time
periods/cohorts, since the resulting data sample would be much the same. This is especially the case when the
trend found (by Reither et al.) is very much linear in appearance and interpreted as such (“a monotonic
increase over time”; Reither et al., 2009:1443). As such, a linear trend is an appropriate means of generating
the data for this situation.
Reither et al. (2009:1442) use 5-year groups to define their cohorts in the models that they fit. This
is “conventional in demography”, but Reither et al. argue that a further advantage of this grouping is
that they “function as equality constraints” which help to identify the model. In previous work (Bell
& Jones, 2013a) it has been shown that the HAPC model is able to correctly estimate trends when
groupings exactly match groupings in the data generating process, but not when those groupings are
chosen by arbitrary convention as they appear to have been here. It is therefore important to
evaluate the possible effect that this grouping has on the question at hand. We therefore examined
three grouping scenarios:
No grouping (i.e. 1 year birth cohorts) in either the DGP or the fitted HAPC model
No grouping in the DGP, but HAPC model fitted with 5-year birth cohorts
7-year birth cohorts in the DGP but HAPC model fitted with 5-year birth cohorts.
Since it is unlikely that we would ever know the exact cohort groups in the DGP (if they are present
at all), we do not test a model where the groupings in the DGP and the fitted models match.
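
The three grouping scenarios can be sketched in code. This is a minimal illustration under stated assumptions, not the Stata code from the online appendix: the intercept and age coefficients (`B0`, `B_AGE`, `B_AGE2`) are placeholders rather than Reither et al.'s estimates, the survey years and age range are hypothetical, age and cohort are centred for convenience, and the helper names `group_start` and `simulate` are introduced here. Only the cohort trend of 0.04 and the residual variance of 0.01 come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder coefficients: NOT Reither et al.'s estimates (assumption).
B0, B_AGE, B_AGE2 = -1.5, 0.04, -0.001
COHORT_TREND = 0.04  # linear cohort effect stated in the text
RESID_VAR = 0.01     # variance of period and cohort residuals stated in the text


def group_start(year, width, origin=1890):
    """Map each birth year to the first year of its `width`-year cohort group."""
    year = np.asarray(year)
    return origin + ((year - origin) // width) * width


def simulate(n=5000, dgp_width=1):
    """Simulate binary outcomes from a cohort-trend DGP with no period trend."""
    period = rng.integers(1976, 2003, size=n)    # hypothetical survey years
    age = rng.integers(18, 85, size=n)           # hypothetical adult ages
    cohort = period - age                        # birth year
    dgp_cohort = group_start(cohort, dgp_width)  # cohort unit used by the DGP

    # One Normal residual per period and per DGP cohort group.
    sd = np.sqrt(RESID_VAR)
    u_period = {p: rng.normal(0, sd) for p in np.unique(period)}
    u_cohort = {c: rng.normal(0, sd) for c in np.unique(dgp_cohort)}

    age_c = age - age.mean()
    cohort_c = dgp_cohort - dgp_cohort.mean()
    eta = (B0 + B_AGE * age_c + B_AGE2 * age_c**2 + COHORT_TREND * cohort_c
           + np.array([u_period[p] for p in period])
           + np.array([u_cohort[c] for c in dgp_cohort]))
    y = rng.binomial(1, 1 / (1 + np.exp(-eta)))  # 1 = obese, 0 = otherwise
    return {"y": y, "age": age, "period": period, "cohort": cohort}


# The three scenarios: (cohort width in the DGP, cohort width in the fitted model).
scenarios = {"ungrouped": (1, 1), "dgp1_fit5": (1, 5), "dgp7_fit5": (7, 5)}
data = {}
for name, (dgp_w, fit_w) in scenarios.items():
    d = simulate(dgp_width=dgp_w)
    d["fit_cohort"] = group_start(d["cohort"], fit_w)  # grouping seen by the model
    data[name] = d
```

Each entry of `data` could then be passed to whatever software is used to fit the cross-classified model, with `period` and `fit_cohort` as the two classifications.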
The simulations were conducted in a similar way to those in Bell and Jones (2013a), using Bayesian
MCMC methods (Browne, 2009) in MLwiN version 2.28 (Rasbash et al., 2013), through Stata using
the runmlwin command (Leckie & Charlton, 2013). True values from the DGP were used as starting
values, non-informative priors were specified, and the model was run for 20,000 iterations following a
1,000-iteration burn-in. In order to assess convergence of the chains, a sample of parameter trajectories
was visually inspected. In addition, a version of the Potential Scale Reduction Factor (Bell & Jones,
2013a; Brooks & Gelman, 1998) and Effective Sample Size were calculated for all parameters. The
Stata code for these simulations can be found in the online appendix.
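
For readers unfamiliar with these diagnostics, the following sketch shows textbook forms of the two quantities. It is an illustration only: the Potential Scale Reduction Factor here is the basic multi-chain Gelman-Rubin statistic, which may differ from the version used in the simulations, the effective sample size uses a simple truncated autocorrelation sum, and the chains are stand-in random draws rather than output from MLwiN.

```python
import numpy as np


def psrf(chains):
    """Basic Gelman-Rubin statistic for an (m, n) array of m chains of length n."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return float(np.sqrt(var_hat / W))


def effective_sample_size(chain, max_lag=200):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations up to the first non-positive one)."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    var = x @ x / n
    total = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = (x[:-lag] @ x[lag:]) / (n * var)
        if rho <= 0:
            break
        total += rho
    return float(n / (1 + 2 * total))


rng = np.random.default_rng(4)
good = rng.normal(0.0, 1.0, size=(4, 2000))          # four well-mixed stand-in chains
bad = good + np.array([[0.0], [0.0], [3.0], [3.0]])  # two chains centred elsewhere

# AR(1) chain with strong autocorrelation, as a poorly mixing example.
noise = rng.normal(size=5000)
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + noise[t]

print(psrf(good), psrf(bad))
print(effective_sample_size(good[0]), effective_sample_size(ar))
```

Values of the first statistic close to 1 are conventionally read as consistent with convergence, while an effective sample size far below the number of iterations indicates strong autocorrelation in the chain.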
For each grouping scenario 1000 separate simulations were conducted. We have reported the
results from the model with the median value for the coefficient associated with age for the scenario
where cohorts were grouped in 7-year intervals and modelled with 5-year intervals. We did this,
rather than averaging over all 1000 simulations, because the mean results could not have been
estimated from a single dataset generated by our DGP (for example, random variation in residuals
would be averaged out). The results found are, however, typical of all simulations in all grouping
scenarios3.
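
Selecting a single representative run in this way can be sketched as follows. The array of age coefficients is a stand-in generated at random, not the simulation output, and the helper name `median_run_index` is introduced here for illustration. With an even number of runs the median falls between two estimates, so the sketch picks the run closest to it.

```python
import numpy as np


def median_run_index(age_coefs):
    """Index of the simulation whose age coefficient is closest to the median."""
    age_coefs = np.asarray(age_coefs, dtype=float)
    return int(np.argmin(np.abs(age_coefs - np.median(age_coefs))))


rng = np.random.default_rng(2)
age_coefs = rng.normal(0.05, 0.01, size=1000)  # stand-in estimates, not real results
idx = median_run_index(age_coefs)
print(idx, age_coefs[idx])
```

All estimates reported for that run (age, period and cohort) then come from one fitted model, which keeps the run-specific random variation that averaging across runs would remove.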
The results are shown in the second column of figure 1, alongside the true DGP (column 1) and the
results found by Reither et al (column 3). As can be seen, the typical median result does not match
the DGP at all. No cohort trend is found, an erroneous period trend is found, and the age effect is
underestimated. In fact, the results found very closely resemble those found by Reither et al. The
implication of this, of course, is that the same DGP could have generated Reither et al.’s data, and
the results that they found could be as misleading as the results found in the simulation.
[Figure 1 about here]
To be clear: we are not saying that the results found by Reither et al. are necessarily incorrect.
However, Reither et al. (2009:1444) argue that their results are “unambiguous” and it is with this
that we take issue. There is no reason to think that Reither et al’s results are the true DGP, rather
than the DGP we used to generate our data here. This is important for policy makers considering
possible interventions to the obesity epidemic. Whilst Reither et al’s results suggest that
interventions should be targeted to all age groups, the alternative explanation offered by us would
suggest that interventions would be better targeted at children in their formative years.
3 A small minority (~5%) of results for the scenarios with mismatched groupings (between the DGP and the
fitted model) produced different results, including results that were correct according to the DGP. However
this did not occur when the cohorts were ungrouped. We have not reported these, given that a model that is
right less than 10% of the time is not particularly useful.
Reither et al are not alone in finding results that may be misleading using the HAPC model.
Dassonneville (2012) finds a period trend in voter turnout volatility, going against the literature
which tends to find cohort effects to be most significant. Much like the period trend found by
Reither et al, this could be an incorrect finding. Other studies (Piontek et al., 2012; Schwadel, 2010)
similarly use the HAPC model to find period and cohort trends which may be over-interpretations of
the data.
So what should the researcher of APC effects do? Where there are no trends in the periods or
cohorts, the HAPC model works well, meaning that it can be used to assess random variation in
periods and cohorts. This assumes that there are not equal and opposite linear period and cohort
trends (which would cancel each other out, with the model estimating a spurious age effect rather
than the true period and cohort trends), but this is an assumption that researchers are often willing
to make. Furthermore, there remains the possibility that cohort and/or period residuals remain
autocorrelated, even when there is no trend; the model can be extended to incorporate this
autocorrelation into the model (Stegmueller, 2013).
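
The cancellation of equal and opposite period and cohort trends follows directly from the dependency Age = Period − Cohort. For a hypothetical common slope β (a symbol introduced here for illustration):

$$\beta\,\text{Period} - \beta\,\text{Cohort} = \beta\,(\text{Period} - \text{Cohort}) = \beta\,\text{Age}$$

Such a pair of trends is therefore indistinguishable in the data from a linear age effect of size β.
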
Where trends do exist, one option would be to make a decision based on theory as to which of
periods or cohorts are most likely to have generated the data, and include that term in the HAPC
model as a linear fixed effect (Bell & Jones, 2013a). This decision cannot, however, be made on the
basis of the data, since a model with period and age fixed linear trends will fit the data as well as one
with age and cohort fixed linear trends. This is confirmed by simulations using the same DGP as
above (but with a period or cohort linear term included in the fixed part of the fitted model). The
results of these are displayed in Table 1.
[Table 1 about here]
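
The equivalence of fit can be illustrated with a much simpler linear analogue. The sketch below uses ordinary least squares on a continuous outcome rather than the logistic HAPC model and DIC, and its sample layout and coefficients are assumptions made for the example. It shows that a model with age and cohort trends and a model with age and period trends give the same residual sum of squares, with the coefficients simply re-expressed.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Hypothetical sample; years are centred on 1990 for numerical convenience.
period = rng.integers(1976, 2003, size=n).astype(float) - 1990.0
age = rng.integers(18, 85, size=n).astype(float)
cohort = period - age  # birth year, centred on 1990

# Continuous outcome with age and cohort trends only (illustrative values).
y = 0.05 * age + 0.04 * cohort + rng.normal(0, 1, size=n)


def fit(*columns):
    """OLS fit with an intercept; returns coefficients and residual sum of squares."""
    X = np.column_stack([np.ones(n), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return beta, rss


beta_ac, rss_ac = fit(age, cohort)  # (1) age and cohort as linear terms
beta_ap, rss_ap = fit(age, period)  # (2) age and period as linear terms

print(rss_ac, rss_ap)  # equal up to floating-point error
print(beta_ac[1:], beta_ap[1:])
```

In this analogue the period coefficient in the second model equals the cohort coefficient in the first, and the age coefficient falls by the same amount, so the data alone give no basis for preferring either specification.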
In the case of this study, where the purpose of the research is to examine which of period and
cohort are most likely to be the cause of the epidemic, researchers could assess the age trend that is
found and decide whether it seems likely. In the case of the modelled results here, we would argue
that the age effect that we generated is more plausible than that found by Reither et al (2009).
Whilst we would expect some decline in obesity at older ages – due to physiological reasons and
survival bias (Villareal et al., 2005) – we would not expect it to be as large or as early in life as that
found by Reither et al. (Villareal et al., 2005; Visscher et al., 2010). Of course it is also possible that
the rise of obesity is the result of a mixture of period and cohort effects, in which case the true DGP
would be somewhere in between those found in figure 1. Where there are very good theoretical
(e.g. physiological) reasons to believe the age trend is known, such belief could be incorporated into
the model (e.g. see Tilley & Evans, 2013) potentially in a Bayesian way using strong informative
priors (Browne, 2009; Jackman, 2009). However, that would involve a theoretical judgement which,
again, cannot be confirmed simply on the basis of the data.
Overall, we hope that this commentary will push researchers towards engaging in more critical
forethought regarding APC effects. With appropriate constraints, based on theoretical plausibility
rather than statistical necessity, techniques like the HAPC model can be useful in modelling possible
APC combinations, so long as those constraints are explicitly stated by the authors. When those
constraints are unclear, or not known about in the first place, misleading results may be produced.
References
Allman-Farinelli, M.A., Chey, T., Bauman, A.E., Gill, T., & James, W.P.T. (2008). Age, period and birth
cohort effects on prevalence of overweight and obesity in Australian adults from 1990 to
2000. European Journal of Clinical Nutrition, 62, 898-907.
Bell, A., & Jones, K. (2013a). Another 'futile quest'? A simulation study of Yang and Land's
Hierarchical Age-Period-Cohort model. Under review. Available at
a32cab9e4193776576e41c/dl.pdf [Accessed 19th April 2013].
Bell, A., & Jones, K. (2013b). Current practice in the modelling of Age, Period and Cohort effects with
panel data: a commentary on Tawfik et al (2012), Clarke et al (2009), and McCulloch (2012).
Quality and Quantity, in press.
Bell, A., & Jones, K. (2013c). The impossibility of separating age, period and cohort effects. Social
Science & Medicine, 93, 163-165.
Brooks, S.P., & Gelman, A. (1998). General methods for monitoring convergence of iterative
simulations. Journal of Computational and Graphical Statistics, 7, 434-455.
Browne, W.J. (2009). MCMC estimation in MLwiN, Version 2.25. University of Bristol: Centre for
Multilevel Modelling.
Caballero, B. (2007). The global epidemic of obesity: An overview. Epidemiologic Reviews, 29, 1-5.
Dassonneville, R. (2012). Questioning generational replacement: an age, period and cohort analysis
of electoral volatility in the Netherlands, 1971-2010. Electoral Studies, 32, 37-47.
Glenn, N.D. (2005). Cohort Analysis. London: Sage.
Hill, J.O., & Peters, J.C. (1998). Environmental contributions to the obesity epidemic. Science, 280,
Jackman, S. (2009). Bayesian Analysis for the Social Sciences. Chichester: Wiley.
Leckie, G., & Charlton, C. (2013). runmlwin: A program to run the MLwiN multilevel modelling
software from within Stata. Journal of Statistical Software, 52.
Luo, L. (2013). Assessing Validity and Application Scope of the Intrinsic Estimator Approach to the
Age-Period-Cohort Problem. Demography, in press.
Luo, L., & Hodges, J. (2013). The cross-classified age-period-cohort model as a constrained estimator.
Under review. Available at http://paa2013.princeton.edu/papers/132093 [Accessed 16th
Mason, K.O., Mason, W.M., Winsborough, H.H., & Poole, K. (1973). Some methodological issues in
cohort analysis of archival data. American Sociological Review, 38, 242-258.
Olsen, L.W., Baker, J.L., Holst, C., & Sorensen, T.I.A. (2006). Birth cohort effect on the obesity
epidemic in Denmark. Epidemiology, 17, 292-295.
Piontek, D., Kraus, L., Pabst, A., & Legleye, S. (2012). An age-period-cohort analysis of cannabis use
prevalence and frequency in Germany, 1990-2009. Journal of Epidemiology and Community
Health, 66, 908-913.
Rasbash, J., Charlton, C., Browne, W.J., Healy, M., & Cameron, B. (2013). MLwiN version 2.28.
University of Bristol: Centre for Multilevel Modelling.
Reither, E.N., Hauser, R.M., & Yang, Y. (2009). Do birth cohorts matter? Age-period-cohort analyses
of the obesity epidemic in the United States. Social Science & Medicine, 69, 1439-1448.
Rokholm, B., Baker, J.L., & Sorensen, T.I.A. (2010). The levelling off of the obesity epidemic since the
year 1999 - a review of evidence and perspectives. Obesity Reviews, 11, 835-846.
Ryder, N.B. (1965). The cohort as a concept in the study of social change. American Sociological
Review, 30, 843-861.
Sasaki, M., & Suzuki, T. (1987). Changes in Religious Commitment in the United-States, Holland, and
Japan. American Journal of Sociology, 92, 1055-1076.
Schwadel, P. (2010). Age, period, and cohort effects on US religious service attendance: The
declining impact of sex, southern residence, and Catholic affiliation. Sociology of Religion,
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., & van der Linde, A. (2002). Bayesian measures of model
complexity and fit. Journal of the Royal Statistical Society Series B-Statistical Methodology,
64, 583-616.
Stegmueller, D. (2013). Bayesian hierarchical age-period-cohort models with time-structured effects:
an application to religious voting in the US, 1972-2008. Electoral Studies, in press.
Suzuki, E. (2012). Time changes, so do people. Social Science & Medicine, 75, 452-456.
Tilley, J., & Evans, G. (2013). Ageing and generational effects on vote choice: combining cross-
sectional and panel data to estimate APC effects. Electoral Studies, in press.
Tu, Y.K., Smith, G.D., & Gilthorpe, M.S. (2011). A new approach to age-period-cohort analysis using
partial least squares regression: the trend in blood pressure in the Glasgow alumni cohort.
Plos One, 6.
Villareal, D.T., Apovian, C.M., Kushner, R.F., & Klein, S. (2005). Obesity in older adults: technical
review and position statement of the American Society for Nutrition and NAASO, The
Obesity Society. American Journal of Clinical Nutrition, 82, 923-934.
Visscher, T.L.S., Snijder, M.B., & Seidell, J.C. (2010). Epidemiology: definition and classification of
obesity. In P.G. Kopelman, I.D. Caterson, & W.H. Dietz (Eds.), Clinical Obesity in Adults and
Children (pp. 3-14). Oxford: Blackwell.
Yang, Y., & Land, K.C. (2006). A mixed models approach to the age-period-cohort analysis of
repeated cross-section surveys, with an application to data on trends in verbal test scores.
Sociological Methodology, 36, 75-97.
Yang, Y., & Land, K.C. (2013). Age-Period-Cohort Analysis: New models, methods, and empirical
applications. Boca Raton, FL: CRC Press.
Yang, Y., Schulhofer-Wohl, S., Fu, W.J.J., & Land, K.C. (2008). The intrinsic estimator for age-period-
cohort analysis: What it is and how to use it. American Journal of Sociology, 113, 1697-1736.
Figure 1: Age (row 1) cohort (row 2) and period (row 3) effects on obesity, according to the true self-
generated DGPa (column 1), the median simulation result (column 2) and the result found by Reither
et al., 2009 (column 3).
a Simulation here is from the scenario where cohorts were grouped in 7-year intervals in the DGP and 5-year
intervals in the fitted model. However the results of the other grouping scenarios were substantively similar.
Table 1: Mean fixed effects and model fit criterion (DIC - see Spiegelhalter et al., 2002) for simulation
results with a model using (1) age and cohort, and (2) age and period, as fixed linear effects. It can
be seen that, if anything, the DIC supports the incorrect specification.
1. Age and Cohort
2. Age and Period
a Cohorts here were grouped by 7-year intervals in the DGP and by 5-year intervals in the model (but the
results were substantively similar to the other grouping scenarios tested).