Comparing population distributions from
bin-aggregated sample data: An application to
historical height data from France
Jean-Yves Duclos∗, Josée Leblanc†, David Sahn‡
20th April 2009
This paper develops a methodology to estimate the entire population dis-
tributions from bin-aggregated sample data. We do this through the estima-
tion of the parameters of mixtures of distributions that allow for maximal
parametric flexibility. The statistical approach we develop enables compar-
isons of the full distributions of height data from potential army conscripts
across France’s 88 departments for most of the nineteenth century. These
comparisons are made by testing for differences in means and for stochastic
dominance. Corrections for possible measurement errors are also devised by
taking advantage of the richness of the data sets. Our methodology is of
interest to researchers working on historical as well as contemporary
bin-aggregated or histogram-type data, which remain widely used since
much of the information that is publicly available is in that form, often because
of restrictions arising from political sensitivity and/or confidentiality concerns.
Key words: Health, health inequality, aggregate data, 19th-century France.
JEL Classification: C14, C81, D3, D63, I1, I3, N3.
∗Institut d’Anàlisi Econòmica (CSIC), Barcelona, Spain, and Département d’économique and
CIRPÉE, Université Laval, Canada; email: firstname.lastname@example.org
†Department of Finance, Ottawa, Canada; email: Leblanc.Josee@fin.gc.ca
‡Cornell University, Cornell, USA; email: email@example.com
There are many reasons to consider dimensions of well-being other than in-
come or expenditure, both normative and practical. Following Sen (1985) and
others, for example, one may wish to consider well-being as multidimensional,
comprising characteristics such as good health, nutrition, literacy, and freedom
of association. Income may be instrumentally important to achieve these ends, but
it is the capabilities themselves that are intrinsically important and merit recogni-
tion and measurement in their own right. Poverty can thus be defined as depriva-
tion of basic capabilities or the failure of certain basic functionings, not just low
levels of income. Deprivation of capabilities can also in turn contribute to low
material standards of living.
This paper focuses on health, which is certainly important in a multidimen-
sional understanding of well-being. In fact, even in a purely unidimensional wel-
farist framework, it can be argued that health contributes to welfare at least as
much as income. Income may not even be a sound directional indicator of overall
welfare in some environments. As Floud (1984) writes, “[t]here is little point in
an improvement in real wages which is bought at the expense of a miserable life
and an early death”. The example of the United States, for instance, shows that
although health and economic growth have generally converged during the twenti-
eth century, they diverged during the nineteenth century (“the antebellum period”)
(Costa and Steckel 1997); strong economic growth during the nineteenth century
coincided inter alia with a decrease in body-heights. Whether income is more
indicative of welfare than health in such circumstances is then open to debate.1
Beyond the conceptual arguments, there are many practical reasons to mea-
sure well-being in non-income dimensions. First, measurement difficulties may
be less of a problem for some non-income variables. Collecting income (or expen-
diture) data is a complex procedure that contrasts, for instance, with the relatively
straightforward procedure of collecting anthropometric data — data that also suf-
fer typically less from misreporting. Measurement errors can of course still affect
such data, but anthropometric data (unlike other self-reported and subjective mea-
sures of health) are more likely to be uncorrelated with important variables of
interest, such as the welfare variable itself. Note also that health can be more
easily measured at the individual rather than at the household level, thus largely
avoiding the need to make difficult assumptions about the well-being of individuals,
assumptions that require an assessment of both the needs of individuals and how
resources are allocated among household members relative to needs.
1More generally, measures of health are often not highly correlated with incomes, either within
a given country or across countries (Haddad and Ahmed 2003, Behrman and Deolalikar 1988,
Behrman and Deolalikar 1990, Appleton and Song 1999), suggesting among other things that
health variables may provide significant information on welfare that is not captured by income.
Perhaps most importantly, the choice of a welfare indicator obviously depends
on the availability of data. Research on the distribution of well-being routinely
takes advantage of the availability of large-scale micro-data sets that provide de-
tailed information on money-metric and non-monetary indicators2. The choice of
welfare indicators is much more constrained for studies covering earlier historical
periods, where the available data are more limited and often of variable quality. It
may for instance be the case that only part of the initially gathered welfare infor-
mation was preserved, or that it comes from sources whose primary goal was not
to capture data representative of the population3. Furthermore, historical house-
hold level data on incomes and expenditures are particularly difficult to collect
since economies were less monetized, transactions were often in-kind, consumption
was largely from home production rather than market purchases, and tax
authorities did not have good records of gathering and verifying information on
Fortunately, quality historical data are widely available from many countries
on one of the clearest manifestations of health and nutritional status, stature. It
is also now well established that one of the best global indicators of living condi-
tions is height, standardized for age and gender (de Onis, Frongillo, and Blossner
2000). More specifically, stature is the outcome of a combination of inputs that
affect nutrition and disease, such as the local health environment, access to clean
water, nutrient intake, maternal health status, health technology, the organization
of work, and so forth. In short, stature captures “multiple dimensions of the
individual health and development and their socio-economic and environmental
determinants” (Beaton, Kelly, Kevany, Martorell, and Mason 1990)4. And in particular,
the height of young men entering adulthood is a cumulative indicator of their overall
health and nutritional status during their formative years, particularly the period
prior to the beginning of puberty.
2Some of the non-monetary indicators found in the literature include body-height (Fogel,
Engerman, Floud, Steckel, Trussell, Wachter, Sokoloff, Villaflor, Margo, and Friedman 1982, Steckel
and Floud 1997, Wagstaff 2002, Pradhan, Sahn, and Younger 2003, Sahn and Younger 2005),
body mass index (Costa and Steckel 1997), amount of abdominal fat (Costa and Steckel 1997),
birth weight (Costa 1999), life expectancy (Whitwell, Souza, and Nicholas 1997, Goesling and
Firebaugh 2004), the overall mortality rate or mortality before a certain age (Costa and Steckel
1997, Floud and Harris 1997, Weir 1997, Whitwell, Souza, and Nicholas 1997, Wagstaff 2002),
the prevalence of chronic or severe illness (Costa and Steckel 1997, Wagstaff 2002, Anson and
Sun 2004), the individual’s own assessment of his or her health (Deaton and Paxson 1998, Nolte
and McKee 2004), disabilities (Anson and Sun 2004), difficulties accomplishing tasks (Anson and
Sun 2004), and mental illness (Anson and Sun 2004).
3For example, the data of Costa (1999) on the birth weight of babies were obtained from hospital
archives. These archives are not necessarily complete, since hospitals will not have kept every
patient file since 1848; moreover, the hospital registries may represent a biased sample of the
population, since wealthier women were over-represented among those who gave birth in hospitals.
Economic historians have thus expended considerable effort to examine
changes in anthropometric outcomes of various populations over time (Fogel, En-
german, Floud, Steckel, Trussell, Wachter, Sokoloff, Villaflor, Margo, and Fried-
man 1982, Steckel and Floud 1997, Weir 1997, Deaton and Paxson 1998 and
Goesling and Firebaugh 2004).5 An important concern that arises in how historians
use anthropometric indicators is whether the summary health statistics they
employ, particularly measures of central tendency, can adequately capture the
distribution of health. Indeed, it is increasingly recognized that looking at entire
distributions of health, just as economists have long done with incomes, can
provide valuable information that would otherwise be hidden by summary health
statistics, such as means and the share of the population that falls below a normative
standard, or cut-off point, that may define poor health6.
This paper gives prominence to the distributional analysis of health by exam-
ining both the evolution and the distribution of heights throughout France in the
nineteenth century. More specifically, the paper uses a particularly rich data set
collected on men who were called up for possible conscription into the French
army during this period. The screening of all men at the age of 20 for mandatory
military service involved a physical examination, including measuring their
height. Thus, we have a virtually complete census of heights for each year,
disaggregated by administrative department. Data in such abundance for a whole
century are a rare find. Consequently, socio-economic improvements as well as
periods of adverse conditions in 19th-century France can be expected to have an
observable effect on the stature achieved at 20 years of age, which is
what we measure in our data.
4This is also supported by the existence of a significant correlation between body-heights and
various indicators of health (see for instance Wagstaff 2002), suggesting that the choice of
an indicator other than body-height would yield similar results. Moreover, adult stature is not only
a good indicator of prior episodes of infection and chronic disease, but it has also been shown to
be an important determinant of risk of morbidity and mortality.
5A number of other studies have also looked at health inequalities, using statistics such as
covariances (such as Deaton and Paxson 1998 and Anson and Sun 2004), differences between
various percentiles (Costa 1999), distances from the mean for different social classes (Anson and
Sun 2004), or inequality indices (e.g., Wagstaff 2002, Pradhan, Sahn, and Younger 2003, Goesling
and Firebaugh 2004, Sahn and Younger 2005). Studies of health inequality have also attempted to
explain whether inequality in some factors, such as income, can transmit itself into health
inequality (see for instance Weir (1997), Anson and Sun (2004), and Nolte and McKee (2004)). Other
studies have decomposed the evolution of health inequality across factors, such as Pradhan, Sahn,
and Younger (2003), Goesling and Firebaugh (2004), Sahn and Younger (2005), and Wagstaff
6For instance, see Pradhan, Sahn, and Younger (2003) and Deaton (2003).
While individual heights were recorded at the time of conscription, the data
we have available are limited to the number of men that fall into classes, or bins
of height intervals, for each year and department. This raises obvious challenges
given our objective to compare the entire distributions of heights. These difficulties
are not unique to our data, or the paper’s historical period of interest. Even
today, much of the information that is used and publicly available on distributions
of income comes from bin-aggregated, or histogram-type, data; important examples
of this are the popular World Income Inequality and POVCAL databases7. In
several countries, the unit data from early household surveys have not survived; in
other cases, access to distant and/or more recent microdata is restricted by political
sensitivity and confidentiality concerns. We therefore propose and implement
a methodology to estimate entire population distributions from bin-aggregated
sample data through the estimation of the parameters of mixtures of distributions
that allow for maximal parametric flexibility8. While we only apply our method
to the historical height data from France, this approach will be of general interest
to both historians and other researchers working on contemporary bin level data
on household incomes, heights, and other indicators of well-being.
Our data are abundant, consisting of more than 6000 different distributions
of heights for the period 1819 to 1900 over 90-some departments. Using the
sampling distribution of the estimators of the means and of the cumulative dis-
tributions of heights, we test for differences in means and also implement tests
for robust comparisons across time and regions. We are also able to correct our
inference procedures for the possible presence of measurement errors. This is
rarely possible to do with the usual data used for comparing welfare; it is made
feasible here by the richness of the data. We correct for measurement errors by
measuring and taking into account the noise that is (possibly) introduced by measurement
errors in the many year-to-year comparisons of the distributions that are
made possible by our data. This renders the broader comparisons in which we are
interested robust to both sampling and measurement errors. Again, this is done by
estimating the parameters of mixtures of distributions with the maximum degree
of parametric flexibility and by therefore exploiting all the statistical information
that is present in the available height data.
7See www.wider.unu.edu/wiid/wiid.htm and www.worldbank.org/LSMS/tools/povcal/
8See for instance Bandourian, McDonald, and Turvey (2002) for a review of some of the more
restrictive functional forms that have been proposed in the literature.
In short, this paper develops a methodology that allows us to estimate the en-
tire population distributions from the bin-aggregated sample data. We go on to
illustrate how this methodology can be applied to a rich data set from France of
20-year-old army conscripts, and thereafter employ the generated distributions to
address one of many possible questions on the evolution and the distribution of
health and welfare in 19th-century France. These data are introduced in Section
2. The methodology for deriving from bin data the entire distribution as well as
the average height of each department-year is described in Section 3.1 and extracts
the maximum possible amount of information from the data. Statistical tests of
differences in mean heights and in the entire distributions are performed using the
test statistics and sampling distributions presented in Section 3.2. Further details
in carrying out stochastic dominance tests appear in Section 3.3. The empirical
results are presented in Section 4, where two main questions are more particu-
larly considered to illustrate how our statistical procedures can be used. The first
question considers the evolution of body-heights in France from the beginning to
the end of the century; the second deals with the regional correlates of the distri-
bution of body-heights. Section 5 concludes the paper. The Appendix in Section
6 provides more technical details on the data, the estimation procedures, and the
adjustment for measurement errors.
The data we use are from the registries of potential conscripts into the French
army for the period 1819 to 1900, covering the 90-some departments of France9.
More precisely, they are from the "Comptes rendus statistiques et sommaires."
Depending on the year, they constitute either a "nearly" random sample, or a com-
plete census of young French men aged 20 years old for each year of the century
(this point is addressed in greater detail in the Appendix on page 22). They consist
of 6369 different datasets, each representing the distribution of body-heights for a
given department and a given year. In total, we have measurements on close to 15
million young French men over the course of the century.
9We are very grateful to Gilles Postel-Vinay for his generous assistance in making the data
available to us and in helping us better understand them. More details on the data set are also
found in Sahn and Postel-Vinay (forthcoming).
A particularity of these data is that they are not available in a totally disag-
gregated form. Rather, they are grouped into "classes," each of which contains
individuals within a specific body-height range. The available data thus report the
number of individuals in each of these classes. Furthermore, neither the number
of classes, nor the boundaries between bins, were constant throughout the century.
The data from 1819 to 1829 are divided into 15 classes, those from 1830 to 1871
into 16 classes, and those from 1872 to 1900 into 9 classes. The class boundaries
for the various years are reproduced in Table 1. Note that the classes were based
on the old imperial measurements until 1867 (1 French inch = 27.07 millimeters)
so that 1.570 meters corresponds to 4 feet 10 inches, 1.597 meters to 4 feet 11
inches, etc. The metric measurements we employ are the public equivalent to the
imperial measurements used during these years.
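The correspondence between the old class boundaries and their metric equivalents can be checked with a short calculation. The sketch below uses the conversion factor quoted above (1 French inch = 27.07 millimeters) and assumes 12 inches to the French foot; the function name `french_to_metres` is ours and purely illustrative.

```python
# Conversion factor quoted in the text: 1 French inch = 27.07 mm.
MM_PER_FRENCH_INCH = 27.07
# Assumption: 12 French inches per French foot.
INCHES_PER_FOOT = 12


def french_to_metres(feet: int, inches: float) -> float:
    """Convert a height in French feet and inches to metres."""
    total_inches = feet * INCHES_PER_FOOT + inches
    return total_inches * MM_PER_FRENCH_INCH / 1000.0


if __name__ == "__main__":
    # The two boundaries mentioned in the text.
    for feet, inches in [(4, 10), (4, 11)]:
        metres = french_to_metres(feet, inches)
        print(f"{feet} ft {inches} in = {metres:.3f} m")
```

Rounded to the millimeter, the two boundaries come out at 1.570 and 1.597 meters, as stated in the text.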
These registries contain a considerable amount of information. Indeed, the
number of conscripts measured each year varies between one-quarter, and all, of
the male population aged 20. Between 1819 and 1830, approximately 80,000 men
were measured each year; from 1836 to 1885, approximately 150,000; and as of
1886, approximately 300,000. Only during this final period was the entire male
population aged 20 measured each year.
3 Estimation and inference
3.1 Estimation of complete distributions
That the data available for this study are grouped into classes raises difficulties
if the objective is to compare complete distributions of heights. To address this
challenge, we first need to estimate the continuous population height distributions
using the discontinuous sample histograms that are available. To do this, we solve
a system of (C − 1) equations in (C − 1) unknowns, where C is the number of
classes into which the heights are regrouped in the aggregated sample data. Each
of these equations captures the probability of belonging to one of the C classes of
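height. Equation (1) defines such a probability for the class c of heights x, a class
whose lower bound is $\underline{x}_c$ and upper bound is $\bar{x}_c$:
$$F(\bar{x}_c;\hat{\Theta}) - F(\underline{x}_c;\hat{\Theta}) = \hat{H}_c, \tag{1}$$
where $F$ is the population distribution function of heights, $\hat{\Theta}$ is the set of estimated
parameters for this function, and $\hat{H}_c$ is the proportion of heights between $\underline{x}_c$ and
$\bar{x}_c$ that is observed in our data.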
F is specified as a mixture of normal distributions, namely, as a weighted
sum of several normal distributions. We let the mixture use as many parameters
as is statistically possible given the grouped form of our data. This mixture of
normal distributions thus allows for the maximal possible amount of estimation
flexibility. Note that since the normal distribution is smooth, this property will
also be imposed on our estimated population height distributions.
Equation (2) provides an example of a “mixture” of three normal distributions
with a set of 9 parameters:
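$$F(x;\alpha_1,\alpha_2,\alpha_3,\mu_1,\mu_2,\mu_3,\sigma_1,\sigma_2,\sigma_3) = \alpha_1\,\Phi\!\left(\frac{x-\mu_1}{\sigma_1}\right) + \alpha_2\,\Phi\!\left(\frac{x-\mu_2}{\sigma_2}\right) + (1-\alpha_1-\alpha_2)\,\Phi\!\left(\frac{x-\mu_3}{\sigma_3}\right), \tag{2}$$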
where Φ is the distribution function of the standard normal distribution, and where
$\alpha_d$, $\mu_d$ and $\sigma_d$ correspond respectively to the weight, the mean, and the standard
deviation of normal distribution d. Note that since $\alpha_1+\alpha_2+\alpha_3 = 1$, we can set
$\alpha_3 = 1-\alpha_1-\alpha_2$. There are therefore 8 “free” parameters in (2). More generally, a
mixture of D ≥ 1 normal distributions contains 3D − 1 free parameters. Similarly,
there are only (C − 1) “degrees of freedom” in data aggregated into C classes of
heights, since the probability of belonging to one class is one minus the sum
of the probabilities of belonging to the others. Hence, the problem is to solve a
system of equations such as (1), with c = 1,...,C − 1, using the observed
proportions $\hat{H}_c$ of belonging to the C classes and choosing D = C/3, so that the
3D − 1 free parameters match the C − 1 equations.
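To make the system concrete, the following minimal sketch evaluates the mixture distribution function in (2) and the class probabilities on the left-hand side of (1). It is an illustration under stated assumptions, not the estimation code used for the paper: the function names, the parameter values, and the convention that the lowest and highest classes are open-ended are all hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Distribution function of a normal with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, weights, means, sds):
    """F(x; Theta) as in (2): a weighted sum of normal distribution functions."""
    return sum(a * normal_cdf(x, m, s) for a, m, s in zip(weights, means, sds))


def class_probabilities(boundaries, weights, means, sds):
    """Left-hand side of (1) for every class.

    Assumption: `boundaries` lists the C - 1 interior class limits in
    increasing order, so the lowest and highest classes are open-ended.
    """
    cdf = [0.0] + [mixture_cdf(b, weights, means, sds) for b in boundaries] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


def residuals(observed_shares, boundaries, weights, means, sds):
    """Model-minus-data gaps for the first C - 1 classes.

    A numerical solver would search for the parameters that set these to zero.
    """
    model = class_probabilities(boundaries, weights, means, sds)
    return [p - h for p, h in zip(model[:-1], observed_shares[:-1])]


if __name__ == "__main__":
    # Hypothetical parameter values (meters), for illustration only.
    weights, means, sds = (0.3, 0.5, 0.2), (1.60, 1.65, 1.70), (0.05, 0.06, 0.05)
    boundaries = [1.570, 1.597, 1.624, 1.652, 1.679]
    shares = class_probabilities(boundaries, weights, means, sds)
    print([round(p, 4) for p in shares])
```

Only C − 1 residuals are returned because the last class share is determined by the others, which mirrors the degrees-of-freedom argument above.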
For some years, however, C/3 is not an integer. For those years we use instead
mixtures of (C + 1)/3 or (C + 2)/3 distributions, setting the last one ($\sigma_{(C+1)/3}$)
or two ($\mu_{(C+2)/3}$ and $\sigma_{(C+2)/3}$) parameters in the mixture to some pre-specified values.
These values are chosen as those that are estimated in the 1880 distribution of
individual heights, which is the only year for which we have access to the entire
set of individual-level data.
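The parameter counting behind this rule can be sketched in a few lines. The helper below is hypothetical: for a given number of classes C it returns the number of mixture components, the number of parameters that can be estimated from the C − 1 equations, and the number that must be set to pre-specified values.

```python
def mixture_size(n_classes: int):
    """Return (D, estimated, fixed) for data grouped into n_classes classes.

    D is the smallest integer with 3D - 1 >= C - 1, that is, ceil(C / 3).
    `estimated` is the number of equations available (C - 1), and `fixed`
    is the number of mixture parameters left over to pre-specify.
    """
    components = -(-n_classes // 3)          # integer ceiling of C / 3
    total_parameters = 3 * components - 1    # free parameters of the mixture
    estimated = n_classes - 1                # degrees of freedom in the data
    return components, estimated, total_parameters - estimated


if __name__ == "__main__":
    # Class counts reported in the data section: 15, 16 and 9 classes.
    for c in (15, 16, 9):
        print(c, mixture_size(c))
```

With 16 classes this gives six components, 15 estimated parameters and two pre-specified ones; with 15 or 9 classes no parameter needs to be fixed.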
More technical details on the above estimation procedure can be found in the
Appendix, Section 6.2.
3.2 Test statistics
Once the parameters $\hat{\Theta}$ in (1) are estimated for each year and each department,
we can proceed to assess the evolution of the distributions of health in nineteenth-
century France. We do this in two ways: first by comparing mean body-heights,
which is one of the most common procedures in the literature, and second by
comparing “health poverty rates”. Note that these poverty rates will be compared
across ranges of possible “health poverty lines”, which will amount to testing for
stochastic dominance of height distributions.
Mean height can be estimated as
$$\hat{\mu} = \int x \, dF(x;\hat{\Theta}). \tag{3}$$
The height poverty rate (“the poverty headcount”) is the proportion of individuals
below a height poverty line. Computing the poverty rate consists of evaluating the
distribution function at the poverty line z:
$$F(z;\hat{\Theta}). \tag{4}$$
Once estimated, $\hat{\mu}$ and $F(z;\hat{\Theta})$ can be compared across departments and years.
For the comparisons of means, the null hypothesis is that the mean of distribution
B does not exceed the mean of distribution A, and the alternative hypothesis is
that it does. The test statistic that we use is then
$$t_{\hat{\mu}} = \frac{\hat{\mu}_B - \hat{\mu}_A}{\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_B) + \widehat{\mathrm{var}}(\hat{\mu}_A)}}, \tag{5}$$
where
$$\widehat{\mathrm{var}}(\hat{\mu}) = \frac{1}{n}\int (x - \hat{\mu})^2 \, dF(x;\hat{\Theta}) \tag{6}$$
and n (which is always well above 500 in our data) stands for the number of soldiers
over whom the aggregated bin data have been computed. Under the assumption
that population heights follow the flexible form given by (2) and that the two
means $\mu_B$ and $\mu_A$ are equal, the statistic (5) can be shown to follow asymptotically
a normal distribution with mean zero and unit variance. At the conventional 5%
level, the above null hypothesis will then be rejected if (5) is greater than 1.645.
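For a mixture of normal distributions, the mean and the variance of heights have closed forms, so the comparison of means can be sketched without numerical integration. The code below is a minimal illustration under that assumption; the function names and the numerical values are hypothetical and are not taken from our data.

```python
import math


def mixture_mean_var(weights, means, sds):
    """Mean and variance of a mixture of normal distributions."""
    mu = sum(a * m for a, m in zip(weights, means))
    second_moment = sum(a * (s * s + m * m) for a, m, s in zip(weights, means, sds))
    return mu, second_moment - mu * mu


def mean_test_statistic(params_a, n_a, params_b, n_b):
    """Difference in means (B minus A) divided by its estimated standard error."""
    mu_a, var_a = mixture_mean_var(*params_a)
    mu_b, var_b = mixture_mean_var(*params_b)
    return (mu_b - mu_a) / math.sqrt(var_b / n_b + var_a / n_a)


if __name__ == "__main__":
    # Hypothetical one-component "mixtures" (meters), 1000 men in each sample.
    a = ([1.0], [1.640], [0.06])
    b = ([1.0], [1.650], [0.06])
    t = mean_test_statistic(a, 1000, b, 1000)
    # Reject "mean of B does not exceed mean of A" at 5% if t > 1.645.
    print(round(t, 3), t > 1.645)
```

In this illustrative case the statistic is about 3.73, so the null hypothesis would be rejected at the 5% level.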
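For ordering poverty headcounts10, we use the test statistic
$$\frac{F(z;\hat{\Theta}_A) - F(z;\hat{\Theta}_B)}{\sqrt{\widehat{\mathrm{var}}(F(z;\hat{\Theta}_B)) + \widehat{\mathrm{var}}(F(z;\hat{\Theta}_A))}}, \tag{7}$$
where
$$\widehat{\mathrm{var}}(F(z;\hat{\Theta}_i)) = \frac{F(z;\hat{\Theta}_i)\,\bigl(1 - F(z;\hat{\Theta}_i)\bigr)}{n_i}, \quad i = A, B, \tag{8}$$
with $n_i$ the number of soldiers in sample i.
Under the null hypothesis of equality of the two distribution functions at z, and
that population heights follow the flexible form given by (2), the distribution
of $F(z;\hat{\Theta}_A) - F(z;\hat{\Theta}_B)$ is asymptotically normal with mean 0 and variance
$\mathrm{var}(F(z;\hat{\Theta}_B)) + \mathrm{var}(F(z;\hat{\Theta}_A))$ (since the samples from A and B are independent).
Note that the expressions $\hat{\mu}$, $\widehat{\mathrm{var}}(\hat{\mu})$, $F(z;\hat{\Theta}_A)$ and $\widehat{\mathrm{var}}(F(z;\hat{\Theta}_i))$ are readily
computed once the parameters $\hat{\Theta}$ in system (1) are estimated. Since $\hat{\Theta}$ contain the
maximum number of parameters that the grouped data allow, these expressions
are also as distribution-free as they can be. Note furthermore that the asymptotic
result for (7) is valid for the z located at the frontiers of the bins of the aggregated
data even when population heights do not follow exactly the flexible form given
by (2), since at such z we can estimate (7) and (8) directly from the $\hat{H}_c$ in (1).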
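The headcount comparison in (7) and (8) can be sketched as follows. The first helper computes the headcount at a class boundary directly from bin counts, and the second computes the test statistic. Function names and numbers are hypothetical, and the variance uses the binomial form of (8) with the sample size in the denominator.

```python
import math


def headcount_at_boundary(counts, k):
    """Share of individuals in the first k classes, i.e. below the k-th class boundary."""
    return sum(counts[:k]) / sum(counts)


def headcount_test_statistic(f_a, n_a, f_b, n_b):
    """Statistic (7): F_A(z) - F_B(z) over the square root of the summed variances (8)."""
    var_a = f_a * (1.0 - f_a) / n_a
    var_b = f_b * (1.0 - f_b) / n_b
    return (f_a - f_b) / math.sqrt(var_b + var_a)


if __name__ == "__main__":
    # Hypothetical bin counts for two department-years, 1000 men each.
    counts_a = [100, 200, 300, 250, 150]
    counts_b = [80, 190, 280, 270, 180]
    f_a = headcount_at_boundary(counts_a, 3)   # 0.60
    f_b = headcount_at_boundary(counts_b, 3)   # 0.55
    t = headcount_test_statistic(f_a, sum(counts_a), f_b, sum(counts_b))
    print(round(t, 3), t > 1.645)
```

Here the statistic is about 2.26, which exceeds 1.645, so headcount poverty at this line would be judged lower in B than in A at the 5% level.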
A poverty comparison that uses (7) depends on the choice of the line z. It is
also evidently dependent on the choice of the distribution function as a “poverty
index”. To make the paper’s poverty comparisons more robust to such choices,
stochastic dominance tests can be performed by comparing poverty rates over
ranges of poverty lines. Pushing this approach farther, one can also compare cu-
mulative height distributions over the entire range of possible heights.
To see what this implies in terms of poverty rankings, note that the poverty
headcount F belongs to a general class of poverty indices, denoted as Π1(z+),
that can be defined with the help of two simple axioms and of a condition (see for
instance Duclos and Araar 2006, Part III). The first axiom, a monotonicity axiom,
says that an increase in the body-height of any one individual (provided that no
one else’s body-height decreases) should (weakly) reduce the value of a poverty
10See for instance Davidson and Duclos (2000, Theorem 1).
index. The second axiom, a symmetry axiom, says that interchanging the body-
height of any two individuals should not affect the poverty index. The condition
is that the poverty index should use a poverty line that is below z+.
We then say that distribution B poverty dominates11 distribution A if and only
if the distribution function for B lies below that for A for all poverty lines in the
interval [0,z+]. Analytically, for generally-denoted poverty indices P(z) and a
distribution function F(x), we have
$$P_A(z) \ge P_B(z) \;\; \forall\, P(z) \in \Pi^1(z^+) \quad \Leftrightarrow \quad F_A(x) \ge F_B(x) \;\; \forall\, x \in [0,z^+]. \tag{9}$$
This result says that ordering poverty headcounts over all lines in [0,z+] also
ranks all poverty indices that meet the monotonicity and symmetry axioms, and
for whatever choice of poverty lines below z+.
For statistical and normative reasons (see Davidson and Duclos 2006), these
dominance tests are better implemented over ranges of poverty lines from
z− to z+, rather than from 0 to z+ (these tests are then denoted in the literature as
restricted stochastic dominance tests). Empirically, this interval will correspond
to [1.53,1.78], or from approximately the 3rd to the 97th percentile of the distributions
of heights observed in nineteenth-century France. The null and alternative
hypotheses for the dominance tests that we conduct can then be written as:
$$H_0: F(z_i;\hat{\Theta}_A) \le F(z_i;\hat{\Theta}_B) \text{ for some } i \in \{1,\dots,m\}, \qquad H_1: F(z_i;\hat{\Theta}_A) > F(z_i;\hat{\Theta}_B) \text{ for all } i \in \{1,\dots,m\}, \tag{10}$$
where the $z_i$’s represent m points in the interval [z−,z+]. The null hypothesis is
a hypothesis of non-dominance of A by B. The alternative hypothesis is that B
dominates A.
The decision rule differs from that for simpler test hypotheses, since H0 and
H1 are sets of multiple hypotheses. The decision rule is to reject H0 and conclude
that B dominates A if and only if each of the inequalities in H0 can be rejected at
the 5% level. Since it involves testing separately over m hypothesis tests, this test
procedure is generally conservative: the 5% nominal level for each inequality test
in H0 leads to a less than 5% probability of committing a Type I error of wrongly
rejecting the joint hypothesis of non-dominance of A by B when non-dominance
holds.
11In the first order, since we could also test for higher-order dominance comparisons; see
again Duclos and Araar (2006), Part III.
An illustration of a test procedure for stochastic dominance is provided by
Figure 1, which uses data from the Ain department. We test here whether distri-
bution B (1886) dominates distribution A (1819) (H0: 1886 does not dominate
1819). We see in the lower panel that the t test statistics (of equation (7)) exceed
the critical value (the dotted line) across the entire interval [1.53,1.78], so we can
reject H0 and conclude that year 1886 dominates year 1819, that is, that year 1886
has less height poverty for whatever choice of poverty measures in Π1(z+).
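The decision rule illustrated in Figure 1 amounts to checking that the smallest of the t statistics over the grid of poverty lines exceeds the critical value. A minimal sketch follows; the function names are ours, the distribution-function values are hypothetical, and the grid is assumed to lie in the interior of the distributions so that the variances in (8) are strictly positive.

```python
import math

CRITICAL_VALUE_5PCT = 1.645  # one-sided 5% critical value of the standard normal


def t_statistic(f_a, n_a, f_b, n_b):
    """Statistic (7) at a single poverty line."""
    return (f_a - f_b) / math.sqrt(f_b * (1 - f_b) / n_b + f_a * (1 - f_a) / n_a)


def b_dominates_a(cdf_a, n_a, cdf_b, n_b, critical=CRITICAL_VALUE_5PCT):
    """Reject non-dominance only if every pointwise statistic exceeds the critical value.

    cdf_a and cdf_b hold the two distribution functions evaluated at the same
    grid of poverty lines z_1, ..., z_m in [z-, z+].
    """
    stats = [t_statistic(fa, n_a, fb, n_b) for fa, fb in zip(cdf_a, cdf_b)]
    return min(stats) > critical, stats


if __name__ == "__main__":
    # Hypothetical values of F at three poverty lines, 2000 men per sample.
    cdf_a = [0.10, 0.50, 0.90]
    cdf_b = [0.05, 0.40, 0.85]
    dominates, stats = b_dominates_a(cdf_a, 2000, cdf_b, 2000)
    print(dominates, [round(s, 2) for s in stats])
```

Requiring the minimum statistic to exceed the critical value is what makes the procedure conservative: a single poverty line at which the two distributions are too close is enough to prevent rejection of non-dominance.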
We now turn to the empirical results. Recall that the first step is to estimate an
entire population distribution function for each of our 6369 datasets. This is done
using the estimation techniques presented in Section 3. Once estimated, these
distributions generally fit the empirical distributions very well, as is illustrated
for instance in Figure 2. The histogram shows rectangles whose area $\hat{H}_c$ (see
(1)) comes from the aggregated data. The line is the estimated density function,
a mixture of normal density functions (whose cumulative distribution functions
appear in (2)).
Some estimated distributions are very close to the normal distribution, such
as the one of Figure 3. This is more often the case of the distributions with eight
parameters (a mixture of three normal distributions), e.g., for many of the dis-
tributions from 1872 to 1900. Other distributions are less smooth, as Figure 4
shows. It proved impossible to find a satisfactory solution to the system (1) for 3
out of the 6369 distributions of our database. This may be due to numerical limi-
tations in attempting to solve for systems of up to 15 equations in 15 unknowns,
in which each equation is a sum of as many as 6 normal distribution functions —
also computed numerically. It can also be because of difficulties inherent in fitting
empirical distributions that diverge (because, e.g., of sampling variability) widely
from the normal distribution. Representing less than 0.05% of our distributions,
these three distributions are dropped from our analysis; further details on them
can be found in Leblanc (2007).
4.1 National distributions of heights
We first consider the overall distribution of heights of the French during the
19th century. Figure 5 illustrates mean body-heights in each department for every
year. Observe that there appears to be an upward trend in average body-heights.
This phenomenon is most clearly seen in Figure 6, which reproduces the national
mean of body-heights (i.e., all departments aggregated — more information can
be found in the Appendix) from 1819 to 1900. Note that the national mean seems
to have increased from around 163.5 to 165.5 centimeters between 1819 and 1900.
Figure 6 also suggests that this evolution of the mean was not constant. Finally,
observe in Figure 5 that there is considerable variation across departments.
4.2 Year-to-year comparisons
Let us now turn to the year-to-year evolution of heights for each department.
For each year, Tables 2 and 3 show the percentage of departments within which
this year is better than (or dominates) the preceding year. Table 2 uses means and
Table 3 compares distribution functions at a fixed z.
Notice that there appears to be a great deal of variation in the means from
year to year. In Table 2, the percentages of departmental means dominating or
dominated by the preceding year hover around 15% to 35%. These percentages
therefore far exceed 5%, the level of the test (i.e. the proportion of times when
sampling error would have erroneously led us to conclude that there was domi-
nance when there was in fact none). The percentages of decline are also nearly as
high as those of increase.
Similar results in Table 3 are obtained for comparisons of the proportion of
individuals whose body-height was below 1.652 meters for the years 1819–1866
and 1.640 meters for 1867–1900.12 The presumption that the variability in the
means actually springs from the raw data, and not from peculiarities in the estimation
methods, is thus supported by the fact that a similar dominance rate exists
in Table 3 as for means in Table 2.
12These values were selected because they correspond to class boundaries (the proportion used
thus corresponds to the sum of the number of individuals in each class below this boundary) and
they differ because the class boundaries were changed between these two periods. They also
There are two possible explanations for the relatively high rate of “acceptance”
of dominance. The first is that the distributions within a single department truly
do vary from one year to the next, and that the null hypothesis of non-dominance
of the distributions must naturally be rejected more often than the nominal 5%
level of the tests if we want our tests to have some power. The second, more cau-
tious, explanation starts with the presumption that there should be little difference
from one year to the next between the population distributions within a single de-
partment, and that we need to locate the source of the high rates of acceptance
of dominance in measurement errors. This would then admonish caution in our
interpretations of the dominance results.
The Appendix (Section 6.4) analyzes the effect of various possible sources of
measurement errors on the validity of our inference results. A relatively
conservative approach that emerges from that analysis is to attribute the
year-to-year acceptance of dominance rankings to department-year-specific
measurement errors that are independently, identically and normally distributed
across the century. This would suggest a standard deviation of those measurement
errors of the order of 1.7 times the size of the sampling error on the estimator
of the mean, or roughly 0.23 cm. Note that a standard deviation of 0.23 cm in the
distribution of (true) population average heights would also suffice to generate
the high rates of dominance acceptance that we observe in Table 2.
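The orders of magnitude quoted above can be checked with a short calculation. The
following is a minimal sketch, not the procedure of Section 6.4. It assumes a
one-sided test of equal means at the 5% level, a test statistic scaled by the
sampling error alone, the same sampling error in both years, and independent
normal measurement errors in each year. The function names dominance_rate and
simulated_rate are hypothetical. Under these assumptions, a measurement error with
a standard deviation of 1.7 times the sampling error raises the rejection rate
from 5% to roughly 20%.

```python
import random
from statistics import NormalDist


def dominance_rate(error_ratio, level=0.05):
    """Analytic rejection rate of a one-sided test of equal means.

    Assumption: the true means are equal, but each year's estimate carries an
    independent normal measurement error whose standard deviation is
    `error_ratio` times the sampling standard error.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - level)
    # The statistic is divided by the sampling error only, so its true
    # standard deviation is sqrt(1 + error_ratio**2) instead of 1.
    inflation = (1.0 + error_ratio ** 2) ** 0.5
    return 1.0 - std_normal.cdf(critical / inflation)


def simulated_rate(error_ratio, sampling_se=1.0, level=0.05,
                   draws=200_000, seed=0):
    """Monte Carlo check of `dominance_rate` under the same assumptions."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - level)
    # Total noise in each yearly mean: sampling error plus measurement error.
    total_sd = sampling_se * (1.0 + error_ratio ** 2) ** 0.5
    # Standard error of the difference as the analyst would compute it,
    # i.e. ignoring measurement error.
    se_diff = sampling_se * 2.0 ** 0.5
    hits = 0
    for _ in range(draws):
        year_a = rng.gauss(0.0, total_sd)
        year_b = rng.gauss(0.0, total_sd)
        if (year_b - year_a) / se_diff > critical:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    print(round(dominance_rate(0.0), 3))   # nominal level, no measurement error
    print(round(dominance_rate(1.7), 3))   # with the 1.7 ratio quoted in the text
    print(round(simulated_rate(1.7), 3))   # simulation counterpart
```

Taken together with the 1.7 ratio, the 0.23 cm figure would correspond to a
sampling standard error of about 0.23/1.7, or roughly 0.135 cm, on the estimator
of the mean.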
Thus, rather than conclude that one year dominates another on the usual basis of
rejecting non-dominance at the nominal level of 5%, we will only draw this
conclusion if we can reject non-dominance of means at a nominal level of 20%,
which is approximately the average rate of dominance rankings across departments
from one year to the next over the century. In the case of stochastic (or
distribution) dominance (Table 4), the corresponding average rate of dominance
rankings is about 1.5%. Even though this is smaller than the 5% nominal level
used in testing each of the nulls in the composite null H0 in (10), recall from
the discussion on page 11 that the decision rule is to reject that composite null
and conclude that B dominates A only if all of the inequalities in H0 can be
rejected at the nominal level. The test procedure is therefore inherently
conservative, leading to a probability of committing a Type I error that can be
much lower than the nominal level. Allowing for the presence of measurement errors
in our context is not enough to offset this. This explains why the 1.5% quoted
above is below the nominal 5% level.
correspond to points outside of the tails of these distributions. In fact, approximately 60% of
individuals’ body-heights were less than 1.652 meters in 1819, and for 25% body-height was less
than 1.640 meters in 1886. Because of this, it is worth noting that only the information present in
the data (and not in the estimated distributions) was used for these tests.
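Why requiring every inequality to be rejected makes the procedure conservative
can be illustrated by simulation. The sketch below is an illustration under
stated assumptions, not the test of (10): it draws two raw samples from the same
standard normal distribution, compares the shares below each point of a
hypothetical grid of thresholds with one-sided tests at the 5% level, and declares
dominance only when all of these tests reject. The function names, sample sizes
and thresholds are all hypothetical.

```python
import random
from statistics import NormalDist


def share_below(sample, threshold):
    """Proportion of observations at or below `threshold`."""
    return sum(1 for x in sample if x <= threshold) / len(sample)


def rejects_everywhere(sample_a, sample_b, thresholds, critical):
    """True if B's share below EVERY threshold is significantly smaller than A's."""
    n_a, n_b = len(sample_a), len(sample_b)
    for z in thresholds:
        p_a = share_below(sample_a, z)
        p_b = share_below(sample_b, z)
        variance = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
        if variance == 0.0:
            return False  # no information at this threshold: do not reject
        if (p_a - p_b) / variance ** 0.5 <= critical:
            return False  # one inequality not rejected: no dominance declared
    return True


def rejection_rate(thresholds, n=200, reps=2000, level=0.05, seed=0):
    """Share of replications declaring that B dominates A when A and B are
    in fact drawn from the same distribution (so any rejection is an error)."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - level)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        hits += rejects_everywhere(a, b, thresholds, critical)
    return hits / reps


if __name__ == "__main__":
    print(rejection_rate([0.0]))                          # single inequality
    print(rejection_rate([-1.0, -0.5, 0.0, 0.5, 1.0]))    # all five must reject
```

With a single threshold the error rate stays close to the nominal level. With
several thresholds it can only be equal or lower, because every additional
inequality is one more hurdle that sampling noise has to clear.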
4.3 Did the body-height of the French increase during the 19th century?
We then turn to the following question: “Did the body-height of the French
increase during the 19th century?” To answer this, we compare the distributions
of the first ten years of available data (1819–1828) with those of the last ten years
(1891–1900) on the basis of the elements discussed above — the means and the
entire distributions. Comparisons of department-years were performed depart-
ment by department, in order to account for “fixed effects” unique to each depart-
ment (slight variations in genetic heritage, different geophysical conditions, etc.).
When data are available for all twenty years, this yields 100 comparisons per
department (each of the ten early years against each of the ten late years).
Overall, 7430 individual comparisons were performed. Aggregate analyses were
also performed: for these same years, the means and distributions for all of
France were compared.
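The count of at most 100 comparisons per department is simply the number of
(early year, late year) pairs. A minimal sketch, using a hypothetical helper
comparison_pairs and made-up sets of available years, shows how missing years
reduce that count:

```python
from itertools import product


def comparison_pairs(available_years,
                     early=range(1819, 1829),
                     late=range(1891, 1901)):
    """List every (early year, late year) pair for which both years have data."""
    early_years = [year for year in early if year in available_years]
    late_years = [year for year in late if year in available_years]
    return list(product(early_years, late_years))


if __name__ == "__main__":
    complete = set(range(1819, 1901))       # hypothetical department, no gaps
    with_gaps = complete - {1820, 1895}     # hypothetical department, two gaps
    print(len(comparison_pairs(complete)))  # 10 x 10 pairs
    print(len(comparison_pairs(with_gaps))) # 9 x 9 pairs
```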
4.3.1 Comparisons of means
Overall, 93.8% of the distributions at the end of the century had a statistically
significantly greater mean than those at the beginning. Thus, there was clearly a
progression in height over the course of the century, even in light of the
aforementioned conservativeness of our inference procedures.
One of the reasons why this proportion is not 100% can be found in developments
that are specific to certain departments, as Figure 7 illustrates for the
Calvados department. In this department, the change between the beginning and
the end of the century is not very pronounced; furthermore, the mean was already
high at the start of the century. Figure 8, for the Ain department, illustrates
by contrast the situation of the vast majority of departments, namely a steady
progression in body-height throughout the century. In addition, for purposes of
verification, we identified the percentage of distributions at the beginning of
the century that dominated the distributions at the end of the century. This
proportion was only 2%, which adds to the strong statistical evidence of an
increase in mean height over the course of the century.