
Comparing population distributions from

bin-aggregated sample data: An application to

historical height data from France

Jean-Yves Duclos∗, Josée Leblanc†, David Sahn‡

20th April 2009

Abstract

This paper develops a methodology to estimate the entire population dis-

tributions from bin-aggregated sample data. We do this through the estima-

tion of the parameters of mixtures of distributions that allow for maximal

parametric flexibility. The statistical approach we develop enables compar-

isons of the full distributions of height data from potential army conscripts

across France’s 88 departments for most of the nineteenth century. These

comparisons are made by testing for differences-of-means stochastic dom-

inance. Corrections for possible measurement errors are also devised by

taking advantage of the richness of the data sets. Our methodology is of

interest to researchers working on historical as well as contemporary

bin-aggregated or histogram-type data, which remain widely used since much of

the information that is publicly available is in that form, often because of

restrictions arising from political sensitivity and/or confidentiality concerns.

Key words: Health, health inequality, aggregate data, 19th-century France,

welfare

JEL Classification: C14, C81, D3, D63, I1, I3, N3.

∗Institut d’Anàlisi Econòmica (CSIC), Barcelona, Spain, and Département d’économique and

CIRPÉE, Université Laval, Canada; email: jyves@ecn.ulaval.ca

†Department of Finance, Ottawa, Canada; email: Leblanc.Josee@fin.gc.ca

‡Cornell University, Cornell, USA; email: david.sahn@cornell.edu


1 Introduction

There are many reasons to consider dimensions of well-being other than in-

come or expenditure, both normative and practical. Following Sen (1985) and

others, for example, one may wish to consider well-being as multidimensional,

comprising characteristics such as good health, nutrition, literacy, and freedom

of association. Income may be instrumentally important to achieve these ends, but

it is the capabilities themselves that are intrinsically important and merit recogni-

tion and measurement in their own right. Poverty can thus be defined as depriva-

tion of basic capabilities or the failure of certain basic functionings, not just low

levels of income. Deprivation of capabilities can also in turn contribute to low

material standards of living.

This paper focuses on health, which is certainly important in a multidimen-

sional understanding of well-being. In fact, even in a purely unidimensional wel-

farist framework, it can be argued that health contributes to welfare at least as

much as income. Income may not even be a sound directional indicator of overall

welfare in some environments. As Floud (1984) writes, “[t]here is little point in

an improvement in real wages which is bought at the expense of a miserable life

and an early death”. The example of the United States, for instance, shows that

although health and economic growth have generally converged during the twenti-

eth century, they diverged during the nineteenth century (“the antebellum period”)

(Costa and Steckel 1997); strong economic growth during the nineteenth century

coincided inter alia with a decrease in body-heights. Whether income is more

indicative of welfare than health in such circumstances is then open to debate.1

Beyond the conceptual arguments, there are many practical reasons to mea-

sure well-being in non-income dimensions. First, measurement difficulties may

be less of a problem for some non-income variables. Collecting income (or expen-

diture) data is a complex procedure that contrasts, for instance, with the relatively

straightforward procedure of collecting anthropometric data, which also typically

suffer less from misreporting. Measurement errors can of course still affect

such data, but anthropometric data (unlike other self-reported and subjective mea-

sures of health) are more likely to be uncorrelated with important variables of

interest — such as the welfare variable itself. Note also that health can be more

1More generally, measures of health are often not highly correlated with incomes, either within

a given country or across countries (Haddad and Ahmed 2003, Behrman and Deolalikar 1988,

Behrman and Deolalikar 1990, Appleton and Song 1999), suggesting among other things that

health variables may provide significant information on welfare that is not captured by income

alone.


easily measured at the individual rather than at the household level, thus largely

avoiding the need to make difficult assumptions on the well-being of individuals

which requires an assessment of both the needs of individuals and how resources

are allocated among household members relative to needs.

Perhaps most importantly, the choice of a welfare indicator obviously depends

on the availability of data. Research on the distribution of well-being routinely

takes advantage of the availability of large-scale micro-data sets that provide de-

tailed information on money-metric and non-monetary indicators2. The choice of

welfare indicators is much more constrained for studies covering earlier historical

periods, where the available data are more limited and often of variable quality. It

may for instance be the case that only part of the initially gathered welfare infor-

mation was preserved, or that it comes from sources whose primary goal was not

to capture data representative of the population3. Furthermore, historical house-

hold level data on incomes and expenditures are particularly difficult to collect

since economies were less monetized, transactions were often in-kind, consumption

was largely from home production rather than market purchases, and tax

authorities did not have good records of gathering and verifying information on

incomes.

Fortunately, quality historical data are widely available from many countries

on one of the clearest manifestations of health and nutritional status, stature. It

is also now well established that one of the best global indicators of living condi-

tions is height, standardized for age and gender (de Onis, Frongillo, and Blossner

2000). More specifically, stature is the outcome of a combination of inputs that

affect nutrition and disease, such as the local health environment, access to clean

2Some of the non-monetary indicators found in the literature include body-height (Fogel, En-

german, Floud, Steckel, Trussell, Wachter, Sokoloff, Villaflor, Margo, and Friedman 1982, Steckel

and Floud 1997, Wagstaff 2002, Pradhan, Sahn, and Younger 2003, Sahn and Younger 2005),

body mass index (Costa and Steckel 1997), amount of abdominal fat (Costa and Steckel 1997),

birth weight (Costa 1999), life expectancy (Whitwell, Souza, and Nicholas 1997, Goesling and

Firebaugh 2004), the overall mortality rate or mortality before a certain age (Costa and Steckel

1997, Floud and Harris 1997, Weir 1997, Whitwell, Souza, and Nicholas 1997, Wagstaff 2002),

the prevalence of chronic or severe illness (Costa and Steckel 1997, Wagstaff 2002, Anson and

Sun 2004), the individual’s own assessment of his or her health (Deaton and Paxson 1998, Nolte

and McKee 2004), disabilities (Anson and Sun 2004), difficulties accomplishing tasks (Anson and

Sun 2004), and mental illness (Anson and Sun 2004).

3For example, Costa (1999)’s data on the birth weight of babies were obtained from hospital

archives. These archives are not necessarily complete, since hospitals will not have kept every

patient file since 1848; moreover, the hospital registries may represent a biased sample of the pop-

ulation, since wealthier women were over-represented among those who gave birth in hospitals.


water, nutrient intake, maternal health status, health technology, the organization

of work, and so forth. In short, stature captures “multiple dimensions of the indi-

vidual health and development and their socio-economic and environmental deter-

minants” (Beaton, Kelly, Kevany, Martorell, and Mason 1990)4. And in particular,

the height of young men entering adulthood is a cumulative indicator of their overall

health and nutritional status during their formative years, particularly the period

prior to the beginning of puberty.

Economic historians have thus expended considerable effort to examine

changes in anthropometric outcomes of various populations over time (Fogel, En-

german, Floud, Steckel, Trussell, Wachter, Sokoloff, Villaflor, Margo, and Fried-

man 1982, Steckel and Floud 1997, Weir 1997, Deaton and Paxson 1998 and

Goesling and Firebaugh 2004).5 An important concern that arises in how historians

use anthropometric indicators is whether the summary health statistics they

employ, particularly measures of central tendencies, can adequately capture the

distribution of health. Indeed, it is increasingly recognized that looking at en-

tire distributions of health, just as economists have long done with incomes, can

provide valuable information that would otherwise be hidden by summary health

statistics, such as means and the share of the population that falls below a

normative standard, or cut-off point, that may define poor health6.

This paper gives prominence to the distributional analysis of health by exam-

ining both the evolution and the distribution of heights throughout France in the

nineteenth century. More specifically, the paper uses a particularly rich data set

collected on men who were called up for possible conscription into the French

army during this period. The screening of all men at the age of 20 for manda-

4This is also supported by the existence of a significant correlation between body-heights and

various indicators of health — see for instance Wagstaff (2002) — suggesting that the choice of

an indicator other than body-height would yield similar results. Moreover, adult stature is not only

a good indicator of prior episodes of infection and chronic disease, but it has also been shown to

be an important determinant of risk of morbidity and mortality.

5A number of other studies have also looked at health inequalities, using statistics such as

covariances (such as Deaton and Paxson 1998 and Anson and Sun 2004), differences between

various percentiles (Costa 1999), distances from the mean for different social classes (Anson and

Sun 2004), or inequality indices (e.g., Wagstaff 2002, Pradhan, Sahn, and Younger 2003, Goesling

and Firebaugh 2004, Sahn and Younger 2005). Studies of health inequality have also attempted to

explain whether inequality in some factors, such as income, can transmit itself into health inequality

(see for instance Weir (1997), Anson and Sun (2004), and Nolte and McKee (2004)). Other

studies have decomposed the evolution of health inequality across factors, such as Pradhan, Sahn,

and Younger (2003), Goesling and Firebaugh (2004), Sahn and Younger (2005), and Wagstaff

(2002).

6For instance, see Pradhan, Sahn, and Younger (2003) and Deaton (2003).


tory military service involved a physical examination, including measuring their

height. Thus, we have a virtually complete census of heights for each year, dis-

aggregated by administrative department. Data in such abundance for a whole

century are a rare find. Consequently, socio-economic improvements as well as

periods of adverse conditions in 19th-century France can be expected to have an

observable effect measured by the stature achieved at 20 years of age - which is

what we measure in our data.

While individual heights were recorded at the time of conscription, the data

we have available are limited to the number of men that fall into classes, or bins

of height intervals, for each year and department. This raises obvious challenges

given our objective to compare the entire distributions of heights. These difficul-

ties are not unique to our data, or the paper’s historical period of interest. Even

today, much of the information that is used and publicly available on distributions

of income comes from bin-aggregated, or histogram-type, data; important examples

of this are the popular World Income Inequality and POVCAL databases7. In

several countries, the unit data from early household surveys have not survived; in

other cases, access to distant and/or more recent microdata is restricted by politi-

cal sensitivity and confidentiality concerns. We therefore propose and implement

a method that estimates the entire population distributions from the bin-aggregated

sample data through the estimation of the parameters of mixtures of distributions

that allow for maximal parametric flexibility8. While we only apply our method

to the historical height data from France, this approach will be of general interest

to both historians and other researchers working on contemporary bin level data

on household incomes, heights, and other indicators of well-being.

Our data are abundant, consisting of more than 6000 different distributions

of heights for the period 1819 to 1900 over 90-some departments. Using the

sampling distribution of the estimators of the means and of the cumulative dis-

tributions of heights, we test for differences in means and also implement tests

for robust comparisons across time and regions. We are also able to correct our

inference procedures for the possible presence of measurement errors. This is

rarely possible to do with the usual data used for comparing welfare; it is made

feasible here by the richness of the data. We correct for measurement errors by

measuring and taking into account the noise that is (possibly) introduced by mea-

surement errors in the many year-to-year comparisons of the distributions that are

7See www.wider.unu.edu/wiid/wiid.htm and www.worldbank.org/LSMS/tools/povcal/

8See for instance Bandourian, McDonald, and Turvey (2002) for a review of some of the more

restrictive functional forms that have been proposed in the literature.


made possible by our data. This renders the broader comparisons in which we are

interested robust to both sampling and measurement errors. Again, this is done by

estimating the parameters of mixtures of distributions with the maximum degree

of parametric flexibility and by therefore exploiting all the statistical information

that is present in the available height data.

In short, this paper develops a methodology that allows us to estimate the en-

tire population distributions from the bin-aggregated sample data. We go on to

illustrate how this methodology can be applied to a rich data set from France of

20-year-old army conscripts, and thereafter employ the generated distributions to

address one of many possible questions on the evolution and the distribution of

health and welfare in 19th-century France. These data are introduced in Section

2. The methodology for deriving from bin data the entire distribution as well as

the average height of each department-year is described in Section 3.1 and extracts

the maximum possible amount of information from the data. Statistical tests of

differences in mean heights and in the entire distributions are performed using the

test statistics and sampling distributions presented in Section 3.2. Further details

in carrying out stochastic dominance tests appear in Section 3.3. The empirical

results are presented in Section 4, where two main questions are more particu-

larly considered to illustrate how our statistical procedures can be used. The first

question considers the evolution of body-heights in France from the beginning to

the end of the century; the second deals with the regional correlates of the distri-

bution of body-heights. Section 5 concludes the paper. The Appendix in Section

6 provides more technical details on the data, the estimation procedures, and the

adjustment for measurement errors.

2 Data

The data we use are from the registries of potential conscripts into the French

army for the period 1819 to 1900, covering the 90-some departments of France9.

More precisely, they are from the "Comptes rendus statistiques et sommaires."

Depending on the year, they constitute either a "nearly" random sample, or a com-

plete census of young French men aged 20 years old for each year of the century

(this point is addressed in greater detail in the Appendix on page 22). They consist

of 6369 different datasets, each representing the distribution of body-heights for a

9We are very grateful to Gilles Postel-Vinay for his generous assistance in making available to

us, and helping us better understand the data. More details on the data set are also found in Sahn

and Postel-Vinay (forthcoming).


given department and a given year. In total, we have measurements on close to 15

million young French men over the course of the century.

A particularity of these data is that they are not available in a totally disag-

gregated form. Rather, they are grouped into "classes," each of which contains

individuals within a specific body-height range. The available data thus report the

number of individuals in each of these classes. Furthermore, neither the number

of classes, nor the boundaries between bins, were constant throughout the century.

The data from 1819 to 1829 are divided into 15 classes, those from 1830 to 1871

into 16 classes, and those from 1872 to 1900 into 9 classes. The class boundaries

for the various years are reproduced in Table 1. Note that the classes were based

on the old imperial measurements until 1867 (1 French inch = 27.07 millimeters)

so that 1.570 meters corresponds to 4 feet 10 inches, 1.597 meters to 4 feet 11

inches, etc. The metric measurements we employ are the public equivalent to the

imperial measurements used during these years.
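The conversion quoted above can be checked with a short sketch. The 27.07 mm figure comes from the text; the ratio of 12 French inches to the French foot is the conventional one and is an assumption here, as are the function and constant names.

```python
# Conversion from pre-metric French units to meters.
# POUCE_MM is the value stated in the text (1 French inch = 27.07 mm).
# POUCES_PER_PIED = 12 is the conventional ratio, assumed here (not stated in the text).
POUCE_MM = 27.07
POUCES_PER_PIED = 12


def french_to_meters(pieds: int, pouces: float) -> float:
    """Convert a height in French feet and inches to meters."""
    total_pouces = pieds * POUCES_PER_PIED + pouces
    return total_pouces * POUCE_MM / 1000.0


# The two class boundaries mentioned in the text.
print(round(french_to_meters(4, 10), 3))
print(round(french_to_meters(4, 11), 3))
```

Under these assumptions the two boundaries round to 1.570 and 1.597 meters, matching the figures given in the text.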

These registries contain a considerable amount of information. Indeed, the

number of conscripts measured each year varies between one-quarter, and all, of

the male population aged 20. Between 1819 and 1830, approximately 80,000 men

were measured each year; from 1836 to 1885, approximately 150,000; and as of

1886, approximately 300,000. Only during this final period was the entire male

population aged 20 measured each year.

3 Estimation and inference

3.1 Estimation of complete distributions

That the data available for this study are grouped into classes raises difficulties

if the objective is to compare complete distributions of heights. To address this

challenge, we first need to estimate the continuous population height distributions

using the discontinuous sample histograms that are available. To do this, we solve

a system of (C − 1) equations in (C − 1) unknowns, where C is the number of

classes into which the heights are regrouped in the aggregated sample data. Each

of these equations captures the probability of belonging to one of the C classes of

height. Equation (1) defines such a probability for the class c of heights x, a class

whose lower bound is $\underline{x}_c$ and upper bound is $\bar{x}_c$:

$$F(\bar{x}_c; \hat{\Theta}) - F(\underline{x}_c; \hat{\Theta}) = \hat{H}_c, \qquad (1)$$


where F stands for the height distribution function, $\hat{\Theta}$ is the vector of the estimated

parameters for this function, and $\hat{H}_c$ is the proportion of heights between $\underline{x}_c$ and

$\bar{x}_c$ that is observed in our data.

F is specified as a mixture of normal distributions, namely, as a weighted

sum of several normal distributions. We let the mixture use as many parameters

as is statistically possible given the grouped form of our data. This mixture of

normal distributions thus allows for the maximal possible amount of estimation

flexibility. Note that since the normal distribution is smooth, this property will

also be imposed on our estimated population height distributions.

Equation (2) provides an example of a “mixture” of three normal distributions

with a set of 9 parameters:

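$$F(x; \alpha_1, \alpha_2, \alpha_3, \mu_1, \mu_2, \mu_3, \sigma_1, \sigma_2, \sigma_3) = \alpha_1 \Phi\left(\frac{x - \mu_1}{\sigma_1}\right) + \alpha_2 \Phi\left(\frac{x - \mu_2}{\sigma_2}\right) + (1 - \alpha_1 - \alpha_2)\, \Phi\left(\frac{x - \mu_3}{\sigma_3}\right), \qquad (2)$$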

where Φ is the distribution function of the standard normal distribution, and where

$\alpha_d$, $\mu_d$ and $\sigma_d$ correspond respectively to the weight, the mean, and the standard

deviation of normal distribution d. Note that since $\alpha_1 + \alpha_2 + \alpha_3 = 1$, we can set

$\alpha_3 = 1 - \alpha_1 - \alpha_2$. There are therefore 8 “free” parameters in (2). More generally, a

mixture of D ≥ 1 normal distributions contains 3D − 1 free parameters. Similarly,

there are only (C − 1) “degrees of freedom” in data aggregated into C classes of

heights, since the probability of belonging to one class is one minus the sum

of the probabilities of belonging to the others. Hence, the problem is to solve a

system of equations such as (1), with c = 1,...,C − 1, using the observed

proportions $\hat{H}_c$ of heights in the C classes and choosing D = C/3, so that the

number of free parameters, 3D − 1, equals the number of equations, C − 1.
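As an illustration of the building blocks above, the following sketch evaluates a mixture of normal distribution functions as in (2) and the class probabilities on the left-hand side of (1). It is not the authors' code: the function names and the parameter values in the usage lines are hypothetical, and the lowest and highest classes are assumed to be open-ended.

```python
import math


def phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def mixture_cdf(x, alphas, mus, sigmas):
    """Weighted sum of normal distribution functions, as in equation (2)."""
    return sum(a * phi((x - m) / s) for a, m, s in zip(alphas, mus, sigmas))


def class_probabilities(bounds, alphas, mus, sigmas):
    """Probability of each height class, as on the left-hand side of equation (1).

    `bounds` holds the interior class boundaries in ascending order.
    Assumption: the first class is open below and the last class is open above.
    """
    cdf = [0.0] + [mixture_cdf(b, alphas, mus, sigmas) for b in bounds] + [1.0]
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]


def free_parameters(n_components):
    """Number of free parameters in a mixture of D normals: 3D - 1."""
    return 3 * n_components - 1


# Hypothetical three-component mixture (heights in meters) and 8 boundaries (9 classes).
alphas = [0.2, 0.5, 0.3]
mus = [1.60, 1.65, 1.70]
sigmas = [0.05, 0.06, 0.05]
bounds = [1.54, 1.57, 1.60, 1.63, 1.66, 1.69, 1.72, 1.75]
probs = class_probabilities(bounds, alphas, mus, sigmas)
print(len(probs), free_parameters(3))
```

With 9 classes there are 8 independent equations, which matches the 8 free parameters of a three-component mixture.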

For some years, however, C/3 is not an integer. For those years we use instead

mixtures of (C + 1)/3 or (C + 2)/3 distributions, setting the last one ($\sigma_{C+1}$)

or two ($\mu_{C+2}$ and $\sigma_{C+2}$) parameters in the mixture to some pre-specified values.

These values are chosen as those that are estimated in the 1880 distribution of

individual heights, which is the only year for which we have access to the entire

set of individual-level data.

More technical details on the above estimation procedure can be found in the

Appendix, Section 6.2.
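The simplest instance of the system (1) has C = 3 classes and D = 1 normal distribution, giving two equations in two unknowns that can be solved in closed form. The sketch below illustrates that special case only; it is not the procedure of the Appendix, the function name and the numerical values are hypothetical, and the lowest class is assumed to be open below. Larger mixtures would require a numerical nonlinear solver.

```python
from statistics import NormalDist


def fit_single_normal(b1, b2, h1, h2):
    """Solve F(b1) = h1 and F(b2) - F(b1) = h2 for (mu, sigma) of one normal.

    b1 < b2 are the two interior class boundaries; h1 and h2 are the observed
    proportions in the first two of three classes (the third is 1 - h1 - h2).
    """
    std = NormalDist()
    q1 = std.inv_cdf(h1)        # (b1 - mu) / sigma
    q2 = std.inv_cdf(h1 + h2)   # (b2 - mu) / sigma
    sigma = (b2 - b1) / (q2 - q1)
    mu = b1 - sigma * q1
    return mu, sigma


# Hypothetical data: 20% below 1.60 m, 55% between 1.60 m and 1.70 m.
mu, sigma = fit_single_normal(1.60, 1.70, 0.20, 0.55)
fitted = NormalDist(mu, sigma)
# The fitted distribution reproduces the observed class proportions.
print(fitted.cdf(1.60), fitted.cdf(1.70) - fitted.cdf(1.60))
```

Because the number of free parameters equals the number of independent equations, the fitted distribution reproduces the observed class proportions exactly rather than approximately.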


3.2 Test statistics

Once the parametersˆΘ in (1) are estimated for each year and each department,

we can proceed to assess the evolution of the distributions of health in nineteenth-

century France. We do this in two ways: first by comparing mean body-heights,

which is one of the most common procedures in the literature, and second by

comparing “health poverty rates”. Note that these poverty rates will be compared

across ranges of possible “health poverty lines”, which will amount to testing for

stochastic dominance of height distributions.

Mean height can be estimated as

$$\hat{\mu} = \int_{-\infty}^{\infty} x \, dF(x; \hat{\Theta}). \qquad (3)$$

The height poverty rate (“the poverty headcount”) is the proportion of individuals

below a height poverty line. Computing the poverty rate consists of evaluating the

distribution function at the poverty line z:

$$F(z; \hat{\Theta}) = \int_{-\infty}^{z} dF(x; \hat{\Theta}). \qquad (4)$$

Once estimated, $\hat{\mu}$ and $F(z; \hat{\Theta})$ can be compared across departments and years.

For the comparisons of means, the null hypothesis is that the mean of distribution

B does not exceed the mean of distribution A, and the alternative hypothesis is

that it does. The test statistic that we use is then

$$\frac{\hat{\mu}_B - \hat{\mu}_A}{\sqrt{\widehat{\mathrm{var}}(\hat{\mu}_B) + \widehat{\mathrm{var}}(\hat{\mu}_A)}} \qquad (5)$$

where

$$\widehat{\mathrm{var}}(\hat{\mu}) = \frac{1}{n} \int_{-\infty}^{\infty} (x - \hat{\mu})^2 \, dF(x; \hat{\Theta}) \qquad (6)$$

and n (which is always well above 500 in our data) stands for the number of soldiers

over whom the aggregated bin data have been computed. Under the assumption

that population heights follow the flexible form given by (2) and that the two

means $\mu_B$ and $\mu_A$ are equal, the statistic (5) can be shown to follow asymptotically

a normal distribution with mean zero and unit variance. At the conventional 5%

level, the above null hypothesis will then be rejected if (5) is greater than 1.645.
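For a mixture of normal distributions, the integrals in (3) and (6) have closed forms: the mean is the weighted sum of the component means, and the variance is the weighted second moment minus the squared mean. The sketch below uses these closed forms to compute the statistic (5). The function names and all numerical values are hypothetical.

```python
import math


def mixture_mean(alphas, mus):
    """Closed form of equation (3) for a mixture of normals."""
    return sum(a * m for a, m in zip(alphas, mus))


def mixture_variance(alphas, mus, sigmas):
    """Population variance of a mixture of normals (the integral in equation (6))."""
    mean = mixture_mean(alphas, mus)
    second_moment = sum(a * (s * s + m * m) for a, m, s in zip(alphas, mus, sigmas))
    return second_moment - mean * mean


def mean_test_statistic(params_a, n_a, params_b, n_b):
    """Statistic (5); each params tuple is (alphas, mus, sigmas)."""
    mu_a = mixture_mean(params_a[0], params_a[1])
    mu_b = mixture_mean(params_b[0], params_b[1])
    var_a = mixture_variance(*params_a) / n_a
    var_b = mixture_variance(*params_b) / n_b
    return (mu_b - mu_a) / math.sqrt(var_b + var_a)


# Hypothetical department-years, heights in centimeters.
params_a = ([0.4, 0.6], [162.0, 165.0], [5.5, 6.0])
params_b = ([0.4, 0.6], [163.0, 166.0], [5.5, 6.0])
t = mean_test_statistic(params_a, 1000, params_b, 1000)
# Reject "mean of B does not exceed mean of A" at the 5% level if t > 1.645.
print(t > 1.645)
```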


For ordering poverty headcounts10, we use the test statistic

$$\frac{F(z; \hat{\Theta}_A) - F(z; \hat{\Theta}_B)}{\sqrt{\widehat{\mathrm{var}}(F(z; \hat{\Theta}_B)) + \widehat{\mathrm{var}}(F(z; \hat{\Theta}_A))}} \qquad (7)$$

where

$$\widehat{\mathrm{var}}(F(z; \hat{\Theta}_i)) = \frac{F(z; \hat{\Theta}_i)\,(1 - F(z; \hat{\Theta}_i))}{n}. \qquad (8)$$

Under the null hypothesis of equality of the two distribution functions at z, and

that population heights follow the flexible form given by (2), the distribution

of $F(z; \hat{\Theta}_A) - F(z; \hat{\Theta}_B)$ is asymptotically normal with mean 0 and variance

$\widehat{\mathrm{var}}(F(z; \hat{\Theta}_B)) + \widehat{\mathrm{var}}(F(z; \hat{\Theta}_A))$ (since the samples from A and B are

independent).

Note that the expressions $\hat{\mu}$, $\widehat{\mathrm{var}}(\hat{\mu})$, $F(z; \hat{\Theta}_A)$ and $\widehat{\mathrm{var}}(F(z; \hat{\Theta}_i))$ are readily

computed once the parameters $\hat{\Theta}$ in system (1) are estimated. Since $\hat{\Theta}$ contain the

maximal number of parameters that can be estimated from our data, these statistics

are also as distribution-free as they can be. Note furthermore that the asymptotic

result for (7) is valid for the z located at the frontiers of the bins of the aggregated

data even when population heights do not follow exactly the flexible form given

by (2), since at such z we can estimate (7) and (8) directly from the $\hat{H}_c$ in (1).
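A minimal sketch of the headcount comparison in (7) and (8) follows. It also shows how the headcount at a class boundary can be read directly from bin counts, as noted above. The function names and the counts are hypothetical, and each sample is given its own size n as an assumption.

```python
import math


def headcount_from_bins(counts, k):
    """Share of individuals in the first k classes, i.e. below the k-th boundary."""
    return sum(counts[:k]) / sum(counts)


def headcount_variance(f, n):
    """Equation (8): binomial variance of an estimated proportion."""
    return f * (1.0 - f) / n


def headcount_test_statistic(f_a, n_a, f_b, n_b):
    """Equation (7): positive values favor less height poverty in B than in A."""
    var_sum = headcount_variance(f_b, n_b) + headcount_variance(f_a, n_a)
    return (f_a - f_b) / math.sqrt(var_sum)


# Hypothetical bin counts for two department-years sharing the same class boundaries.
counts_a = [120, 380, 350, 150]
counts_b = [90, 360, 380, 170]
f_a = headcount_from_bins(counts_a, 2)
f_b = headcount_from_bins(counts_b, 2)
t = headcount_test_statistic(f_a, sum(counts_a), f_b, sum(counts_b))
print(f_a, f_b, t > 1.645)
```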

3.3 Dominance tests

A poverty comparison that uses (7) depends on the choice of the line z. It is

also evidently dependent on the choice of the distribution function as a “poverty

index”. To make the paper’s poverty comparisons more robust to such choices,

stochastic dominance tests can be performed by comparing poverty rates over

ranges of poverty lines. Pushing this approach farther, one can also compare cu-

mulative height distributions over the entire range of possible heights.

To see what this implies in terms of poverty rankings, note that the poverty

headcount F belongs to a general class of poverty indices, denoted as Π1(z+),

that can be defined with the help of two simple axioms and of a condition (see for

instance Duclos and Araar 2006, Part III). The first axiom, a monotonicity axiom,

says that an increase in the body-height of any one individual (provided that no

one else’s body-height decreases) should (weakly) reduce the value of a poverty

10See for instance Davidson and Duclos (2000, Theorem 1).


index. The second axiom, a symmetry axiom, says that interchanging the body-

height of any two individuals should not affect the poverty index. The condition

is that the poverty index should use a poverty line that is below z+.

We then say that distribution B poverty dominates11 distribution A if and only

if the distribution function for B lies below that for A for all poverty lines in the

interval [0,z+]. Analytically, for generally-denoted poverty indices P(z) and a

distribution function F(x), we have

$$P_A(z) \ge P_B(z) \ \ \forall\, P(z) \in \Pi^1(z^+) \iff F_A(x) \ge F_B(x) \ \ \forall\, x \in [0, z^+]. \qquad (9)$$

This result says that ordering poverty headcounts over all lines in [0,z+] also

ranks all poverty indices that meet the monotonicity and symmetry axioms, and

for whatever choice of poverty lines below z+.

For statistical and normative reasons (see Davidson and Duclos 2006), these

dominance tests are better implemented over ranges of poverty lines from

z− to z+, rather than from 0 to z+ (these tests are then denoted in the literature as

restricted stochastic dominance tests). Empirically, this interval will correspond

to [1.53,1.78], or from approximately the 3rd to the 97th percentile of the distri-

butions of heights observed in nineteenth century France. The null and alternative

hypotheses for the dominance tests that we conduct can then be written as:

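$$H_0: F(z_1; \hat{\Theta}_A) \le F(z_1; \hat{\Theta}_B) \ \text{or} \ F(z_2; \hat{\Theta}_A) \le F(z_2; \hat{\Theta}_B) \ \text{or} \ \ldots \ \text{or} \ F(z_m; \hat{\Theta}_A) \le F(z_m; \hat{\Theta}_B) \qquad (10)$$

versus

$$H_1: F(z_1; \hat{\Theta}_A) > F(z_1; \hat{\Theta}_B) \ \text{and} \ F(z_2; \hat{\Theta}_A) > F(z_2; \hat{\Theta}_B) \ \text{and} \ \ldots \ \text{and} \ F(z_m; \hat{\Theta}_A) > F(z_m; \hat{\Theta}_B), \qquad (11)$$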

11In the first order, since we could also test for higher-order dominance comparisons — see

again Duclos and Araar (2006), Part III.


where the $z_i$'s represent m points in the interval [z−,z+]. The null hypothesis is

a hypothesis of non-dominance of A by B. The alternative hypothesis is that B

dominates A.

The decision rule differs from that for simpler test hypotheses, since H0 and

H1 are sets of multiple hypotheses. The decision rule is to reject H0 and conclude

that B dominates A if and only if each of the inequalities in H0 can be rejected at

the 5% level. Since it involves testing separately over m hypothesis tests, this test

procedure is generally conservative: the 5% nominal level for each inequality test

in H0 leads to a less than 5% probability of committing a Type I error of wrongly

rejecting the joint hypothesis of non-dominance of A by B when non-dominance

is true.
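The decision rule above amounts to requiring that the smallest of the m pointwise statistics exceed the critical value. The following is a minimal sketch of that rule, not the authors' implementation; the function names and the values of the distribution functions at the grid points are hypothetical.

```python
import math


def t_stat(f_a, n_a, f_b, n_b):
    """Pointwise statistic (7) at one poverty line."""
    var_sum = f_a * (1.0 - f_a) / n_a + f_b * (1.0 - f_b) / n_b
    return (f_a - f_b) / math.sqrt(var_sum)


def b_dominates_a(cdf_a, cdf_b, n_a, n_b, critical_value=1.645):
    """Reject non-dominance of A by B only if every pointwise test rejects.

    cdf_a and cdf_b hold the estimated distribution functions of A and B
    evaluated at the same m poverty lines z_1, ..., z_m in [z-, z+].
    """
    stats = [t_stat(fa, n_a, fb, n_b) for fa, fb in zip(cdf_a, cdf_b)]
    return min(stats) > critical_value, stats


# Hypothetical values at m = 3 poverty lines.
cdf_a = [0.10, 0.40, 0.80]
cdf_b = [0.07, 0.33, 0.74]
decision, stats = b_dominates_a(cdf_a, cdf_b, 2000, 2000)
print(decision, [round(s, 2) for s in stats])
```

A single poverty line at which the statistic falls below the critical value is enough to prevent a conclusion of dominance, which is why the procedure is conservative.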

An illustration of a test procedure for stochastic dominance is provided by

Figure 1, which uses data from the Ain department. We test here whether distri-

bution B (1886) dominates distribution A (1819) (H0: 1886 does not dominate

1819). We see in the lower panel that the t test statistics (of equation (7)) exceed

the critical value (the dotted line) across the entire interval [1.53,1.78], so we can

reject H0 and conclude that year 1886 dominates year 1819, that is, that year 1886

has less height poverty for whatever choice of poverty measures in Π1(z+).

4 Results

We now turn to the empirical results. Recall that the first step is to estimate an

entire population distribution function for each of our 6369 datasets. This is done

using the estimation techniques presented in Section 3. Once estimated, these

distributions generally fit the empirical distributions very well, as is illustrated

for instance in Figure 2. The histogram shows rectangles whose area $\hat{H}_c$ (see

(1)) comes from the aggregated data. The line is the estimated density function,

a mixture of normal density functions (whose cumulative distribution functions

appear in (2)).

Some estimated distributions are very close to the normal distribution, such

as the one of Figure 3. This is more often the case of the distributions with eight

parameters (a mixture of three normal distributions), e.g., for many of the dis-

tributions from 1872 to 1900. Other distributions are less smooth, as Figure 4

shows. It proved impossible to find a satisfactory solution to the system (1) for 3

out of the 6369 distributions of our database. This may be due to numerical limi-

tations in attempting to solve for systems of up to 15 equations in 15 unknowns,

in which each equation is a sum of as many as 6 normal distribution functions —


also computed numerically. It can also be because of difficulties inherent in fitting

empirical distributions that diverge (because, e.g., of sampling variability) widely

from the normal distribution. Representing less than 0.05% of our distributions,

these three distributions are dropped from our analysis; further details on them

can be found in Leblanc (2007).

4.1 National distributions of heights

We first consider the overall distribution of heights of the French during the

19th century. Figure 5 illustrates mean body-heights in each department for every

year. Observe that there appears to be an upward trend in average body-heights.

This phenomenon is most clearly seen in Figure 6, which reproduces the national

mean of body-heights (i.e., all departments aggregated — more information can

be found in the Appendix) from 1819 to 1900. Note that the national mean seems

to have increased from around 163.5 to 165.5 centimeters between 1819 and 1900.

Figure 6 also suggests that this evolution of the mean was not constant. Finally,

observe in Figure 5 that there is considerable variation across departments.

4.2 Year-to-year comparisons

Let us now turn to the year-to-year evolution of heights for each department.

For each year, Tables 2 and 3 show the percentage of departments within which

this year is better than (or dominates) the preceding year. Table 2 uses means and

Table 3 compares distribution functions at a fixed z.

Notice that there appears to be a great deal of variation in the means from

year to year. In Table 2, the percentages of departmental means dominating or

dominated by the preceding year hover around 15% to 35%. These percentages

therefore far exceed 5%, the level of the test (i.e. the proportion of times when

sampling error would have erroneously led us to conclude that there was domi-

nance when there was in fact none). The percentages of decline are also nearly as

high as those of increase.

Similar results in Table 3 are obtained for comparisons of the proportion of

individuals whose body-height was below 1.652 meters for the years 1819–1866

and 1.640 meters for 1867–1900.12 The presumption that the variability in the

12These values were selected because they correspond to class boundaries (the proportion used

thus corresponds to the sum of the number of individuals in each class below this boundary) and

they differ because the class boundaries were changed between these two periods. They also

13

Page 14

means actually springs from the raw data, and not from peculiarities in the esti-

mation methods, is thus supported by the fact that a similar dominance rate exists

in Table 3 as for means in Table 4.

There are two possible explanations for the relatively high rate of “acceptance”

of dominance. The first is that the distributions within a single department truly

do vary from one year to the next, and that the null hypothesis of non-dominance

of the distributions must naturally be rejected more often than the nominal 5%

level of the tests if we want our tests to have some power. The second, more cau-

tious, explanation starts with the presumption that there should be little difference

from one year to the next between the population distributions within a single de-

partment, and that we need to locate the source of the high rates of acceptance

of dominance in measurement errors. This would then call for caution in our

interpretations of the dominance results.

The Appendix (Section 6.4) analyzes the effect of various possible sources of

measurement errors on the validity of our inference results. A relatively conservative

approach that emerges from that analysis is to consider the year-to-year acceptance

of dominance rankings to stem from department-year-specific measurement

errors that are identically, independently and normally distributed across the cen-

tury. This would suggest a standard deviation of those measurement errors of the

order of 1.7 times the size of the sampling error on the estimator of the mean —

or roughly 0.23 cm. Note that a standard deviation of 0.23 cm in the distribution

of (true) population average heights would also suffice to generate the high rates

of dominance acceptance that we observe in Table 2.
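The link between the factor of 1.7 and the observed rejection rates can be checked with a short computation. The sketch below assumes a one-sided normal test at the 5% level, equal sampling standard errors in the two years being compared, and independent measurement errors in each year. These are simplifying assumptions made here for illustration, and the function name is hypothetical; none of this is taken from the Appendix.

```python
from math import sqrt
from statistics import NormalDist


def rejection_rate(ratio, level=0.05):
    """Probability that a one-sided test at `level` rejects when true means are equal.

    ratio is the standard deviation of the department-year measurement error
    divided by the sampling standard error of the estimated mean. The test
    ignores the measurement error, so its statistic has standard deviation
    sqrt(1 + ratio**2) instead of 1.
    """
    nd = NormalDist()
    critical = nd.inv_cdf(1 - level)
    return 1 - nd.cdf(critical / sqrt(1 + ratio ** 2))


rate = rejection_rate(1.7)          # roughly 0.20 in each direction
implied_sampling_se = 0.23 / 1.7    # roughly 0.135 cm
```

Under these assumptions, a measurement error 1.7 times as large as the sampling error raises the rejection rate of a nominal 5% test to about one in five, in each direction.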

Thus, rather than conclude that one year dominates another on the usual ba-

sis of rejecting non-dominance at the nominal level of 5%, we will only draw

this conclusion if we can reject non-dominance of means at a nominal level of

20%, which is approximately the average rate of dominance rankings across de-

partments from one year to the next over the century. In the case of stochastic (or

distribution) dominance (Table 4), the corresponding average rate of dominance

rankings is about 1.5%. Even though this is smaller than the 5% nominal level

used in testing each of the nulls in the composite null H0 in (10), recall from the

discussion on page 11 that the decision rule is to reject that composite null and

conclude that B dominates A only if all of the inequalities in H0 can be rejected

at the nominal level. The test procedure is therefore inherently conservative,

leading to a probability of committing a Type I error that can be much lower than the

nominal level. Allowing for the presence of measurement errors in our context is

not enough to offset this. This explains why the 1.5% quoted above is below the

nominal 5% level.

¹² (continued) correspond to points outside of the tails of these distributions. In fact, approximately 60% of

individuals’ body-heights were less than 1.652 meters in 1819, and for 25% body-height was less

than 1.640 meters in 1886. Because of this, it is worth noting that only the information present in

the data (and not in the estimated distributions) was used for these tests.
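The decision rule for the composite null can be sketched as follows. The grid of thresholds, the values of the statistics and the function name are hypothetical. The sketch only illustrates that dominance is declared when every individual inequality is rejected, which is the same as requiring the smallest statistic to exceed the critical value.

```python
from statistics import NormalDist


def b_dominates_a(t_stats, level=0.05):
    """Reject the composite null only if every component null is rejected.

    t_stats holds one statistic per threshold z, each positive when the
    estimated distribution function of B lies below that of A at z.
    """
    critical = NormalDist().inv_cdf(1 - level)
    return min(t_stats) > critical


# Hypothetical statistics over a grid of five thresholds.
clear_case = [2.4, 3.1, 2.9, 2.2, 1.9]
mixed_case = [2.4, 3.1, 2.9, 2.2, 1.2]  # one inequality is not rejected at 5%
```

Because a single non-rejected inequality is enough to block the conclusion, the overall rejection probability under the null cannot exceed the nominal level and is often well below it.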

4.3 Did the body-height of the French increase during the 19th

century?

We then turn to the following question: “Did the body-height of the French

increase during the 19th century?” To answer this, we compare the distributions

of the first ten years of available data (1819–1828) with those of the last ten years

(1891–1900) on the basis of the elements discussed above — the means and the

entire distributions. Comparisons of department-years were performed depart-

ment by department, in order to account for “fixed effects” unique to each depart-

ment (slight variations in genetic heritage, different geophysical conditions, etc.).

If data are available for all years, this corresponds to a maximum of 100 com-

parisons per department. Overall, 7430 individual comparisons were performed.

In addition, aggregate analyses were also performed: for these same years, the

means and distribution for all of France were compared.
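The maximum of 100 comparisons per department follows from pairing each of the ten early years with each of the ten late years. A small sketch, in which the function name and the example of missing years are hypothetical:

```python
from itertools import product

EARLY = range(1819, 1829)  # 1819-1828
LATE = range(1891, 1901)   # 1891-1900


def year_pairs(available_years):
    """All (early, late) pairs of years for which a department has data."""
    early = [y for y in EARLY if y in available_years]
    late = [y for y in LATE if y in available_years]
    return list(product(early, late))


# A department with complete data, and one with three missing years.
full = year_pairs(set(range(1819, 1901)))
gappy = year_pairs(set(range(1819, 1901)) - {1820, 1895, 1896})
```

A department with missing years contributes fewer pairs, which is why the overall number of comparisons is below 100 times the number of departments.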

4.3.1 Comparisons of means

Overall, 93.8% of the distributions at the end of the century had a statistically

significantly greater mean than at the beginning. Thus, there was clearly a height progression

over the course of the century, even in light of the aforementioned conservative-

ness of our inference procedures.

One of the reasons why this proportion is not 100% can be found in devel-

opments that are specific to certain departments, as Figure 7 illustrates for the

Calvados department. We observe that, in this department, the evolution between

the beginning and the end of the century is not very pronounced, and that, furthermore,

the mean was already high at the start of the century. By contrast, Figure 8 shows

for the Ain department the situation of the vast majority of departments, namely a

steady progression in body-height throughout

the century. In addition, for purposes of verification, we identified the percentage

of distributions at the beginning of the century that dominated the distributions at

the end of the century. This proportion was only 2%, which adds to the strong

statistical evidence of an increase in mean height over the course of the century.
