Genetic Genealogy Comes of Age: Perspectives on the Use of Deep-Rooted Pedigrees in Human Population Genetics.
ABSTRACT In this article, we promote the implementation of extensive genealogical data in population genetic studies. Genealogical records can provide valuable information on the origin of DNA donors in a population genetic study, going beyond the commonly collected data such as residence, birthplace, language, and self-reported ethnicity. Recent studies demonstrated that extended genealogical data added to surname analysis can be crucial to detect signals of (past) population stratification and to interpret the population structure in a more objective manner. Moreover, when in-depth pedigree data are combined with haploid markers, it is even possible to disentangle signals of temporal differentiation within a population genetic structure during the last centuries. Obtaining genealogical data for all DNA donors in a population genetic study is a labor-intensive task but the vastly growing (genetic) genealogical databases, due to the broad interest of the public, are making this job more time-efficient if there is a guarantee for sufficient data quality. At the end, we discuss the advantages and pitfalls of using genealogy within sampling campaigns and we provide guidelines for future population genetic studies. Am J Phys Anthropol, 2013. © 2013 Wiley Periodicals, Inc.
M.H.D. Larmuseau,1,2,3* A. Van Geystelen,1 M. van Oven,4 and R. Decorte1,2
1 UZ Leuven, Laboratory of Forensic Genetics and Molecular Archaeology, Leuven, Belgium
2 Department of Imaging and Pathology, KU Leuven, Forensic Medicine, Leuven, Belgium
3 KU Leuven, Department of Biology, Laboratory of Biodiversity and Evolutionary Genomics, Leuven, Belgium
4 Department of Forensic Molecular Biology, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
KEY WORDS genomic diversity; genetic genealogy; ancestry; population history; sampling strategy; haploid markers; surname
Genomic diversity and population stratification within
a given species are the subjects of two disciplines,
namely population genetics and phylogeography (Avise,
2000). On the basis of the insights from both fields, it is
possible to decipher the population structure as well as
the evolution and history of a particular species (Cuveliers
et al., 2012). However, in humans, as in all species, it is
difficult to disentangle the effects of recent, historical, and
prehistorical events on demography from the present-day
genetic variation and population differentiation. The subtle
population stratification and demographic events, such as
population expansions and colonizations, are therefore
often unknown for human populations (Jobling, 2012).
Nevertheless, this knowledge is of fundamental interest
to medical, forensic and
(Novembre et al., 2008). To optimize the quality of population
genetic studies, the selection of DNA donors as
representative for the population under study is crucial.
The first human population genetic studies classified
DNA donors into populations based on their (former)
residence or birthplace, their (self-reported) ethnicity
and language (Jobling et al., 2004). Such a sampling strategy
is not suited to surveying population genetic structure
on a small geographical scale, because it cannot account
for migrations in the recent past, which occur at a high
frequency in a globalized world (Rogaev et al., 2009).
Therefore, additional criteria are required in regional
population studies to select DNA donors who are more closely
associated with the particular region of interest. Nowadays,
most population genetic studies apply the two- or
three-generations-of-residence criterion, whereby only those
individuals are selected whose (great-)grandparents already
lived in the studied area (e.g., Novembre et al., 2008;
Zalloua et al., 2008; Simms et al., 2011; Morozova et al.,
2012). Another
criterion is to select individuals preferably from rural
regions, because it is expected that less recent migration
occurred in those regions in comparison with more
industrialized regions (Weale et al., 2002; Balanovsky
et al., 2008). This prevents migrations that happened in the
last generations (after circa 1920) from influencing the
analysis. Nevertheless, large migration events also took
place before the 20th century, even in rural regions, for
example during the Industrial Revolution in Western Europe
at the beginning of the 19th century.
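The generations-of-residence criterion described above can be sketched as a simple filter. This is only a minimal illustration and not a procedure taken from any of the cited studies: the `Donor` record, its field names, the function `meets_residence_criterion`, and the example regions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Donor:
    """Hypothetical donor record; all field names are illustrative."""
    donor_id: str
    residence: str
    # ancestors[0] = places of residence of the parents,
    # ancestors[1] = grandparents, ancestors[2] = great-grandparents.
    ancestors: list = field(default_factory=list)


def meets_residence_criterion(donor, region, generations=2):
    """Check the n-generations-of-residence criterion.

    generations=2 requires parents and grandparents in `region`;
    generations=3 additionally requires the great-grandparents.
    A generation without recorded places fails the check.
    """
    if donor.residence != region:
        return False
    if len(donor.ancestors) < generations:
        return False
    for places in donor.ancestors[:generations]:
        if not places or any(place != region for place in places):
            return False
    return True


# Made-up example data.
donors = [
    Donor("D1", "Brabant", [["Brabant"] * 2, ["Brabant"] * 4]),
    Donor("D2", "Brabant", [["Brabant", "Hainaut"], ["Brabant"] * 4]),
]
selected = [d.donor_id for d in donors if meets_residence_criterion(d, "Brabant")]
print(selected)
```

Treating a missing generation as a failure is a conservative assumption of this sketch; a real study would have to decide how to handle incomplete pedigrees.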
Grant sponsor: KULeuven BOF-Centre of Excellence Financing;
Grant number: PF/2010/07.
*Correspondence to: Dr. Maarten Larmuseau; Katholieke
Universiteit Leuven, Forensic Medicine, Kapucijnenvoer 33, B–3000
Leuven, Belgium. E-mail: email@example.com
Received 24 August 2012; revised 21 December 2012; accepted 3
Published online 00 Month 2013 in Wiley Online Library
© 2013 WILEY PERIODICALS, INC.
AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 00:N/A–N/A (2013)
In addition to the standard two- or three-generations-of-residence
criterion, several population genetic studies in the last decade
have also used the surname of the donors in regions where people
have a paternally heritable surname. This approach is derived
from isonymy studies: only DNA donors are selected whose surname
has a local geographical distribution and/or has occurred in a
particular region for a long time, based on archival records or
on the surname origin (Boattini et al., 2012). One recent study,
which aimed to define a reference database of DNA donors for
whole-genome studies in England, demonstrated that a group of
individuals with a surname that already occurred in a particular
region in 1881 reveals a stronger signal of population genetic
structure than a group of donors selected only on the
two-generations-of-residence criterion, even in rural locations
(Tyler-Smith and Xue, 2012; Winney et al., 2012).
The surname of an individual indeed provides extra information
about the genealogy of a person, but this information is limited
to the single paternal lineage. This may be insufficient for a
genomic study, especially in patrilocal populations, patrilocality
being the most common social system for humans (Underhill and
Kivisild, 2007). Nevertheless, a recent paper gives a good
overview of how surnames can be used, through interdisciplinary
research, to gain more insight into the genetic structure of a
population (Darlu et al., 2012). The criterion of selecting only
‘local’ surnames for a regional study also has several drawbacks.
First, a surname with a limited geographical distribution is
often the result of a past migration event, whereby a migrant
came to the local region and received a new specific surname in
the new community two or more centuries ago (Marynissen, 2011).
Second, there is not always a paternal relationship between
namesakes living some centuries ago and current families, even if
they lived in the same region. With more genealogical
information, it is often observed that individuals received a
‘local’ surname by adoption or directly from their mother.
Finally, even today, many recent immigrants in Europe receive or
choose a new ‘local’ surname as part of their integration into
their new community. Therefore, the use of genealogical data of
DNA donors, available in archives, may offer a solution to these
problems and create more possibilities in population genetic
studies, as several recent studies have already shown and as
discussed in the next section.
The use of in-depth pedigrees in human population genetics
Without written records, personal genealogical knowl-
edge rarely goes back more than one century. However,
by using archival documents such as estate inventories, parish
registers, official lists of soldiers, or wills, which are
available in many parts of the world, it is possible to extend
the legal genealogy 400–500 years into the past (Rasmuson, 2008).
The number of retraceable generations depends on the region of
the world, the availability and accessibility of archives, and
the personal and familial history (van Uytven et al., 2011).
Before large genetic/genomic datasets became available, analyses
of large genealogical databases had already given many
indications about population genetic structure, as in Canada
(Scriver, 2001) and Iceland (Helgason et al., 2003). This
Perspective emphasizes that genealogical information is still
valuable when combined with genetic data, as several recent
studies have demonstrated. Genealogical data help in three
research aims: first, to verify the observed genomic population
structure and to interpret it correctly, also by enhancing the
representativeness of the sample; second, to estimate more
accurately the time scale of the detected gene flow events; and
third, to determine temporal genetic differentiation within a
population.
First, the concordance of genetic and genealogical data provides
an independent and complementary validation of population
structure studies based on genomic data, indicating that the
results show a real (biologically significant) signal. One study
revealed strong genetic inbreeding in two isolated villages in
Italy, 6 km apart. Extensive genealogical data of all
participants from those villages confirmed the inbreeding
patterns observed in full genome data and gave more insight into
the effect of inbreeding on the genome (Colonna et al., 2009).
Another genomic study in
Quebec (Canada) by Roy-Gagnon et al. (2011) showed
significant population structure that corroborated the
genealogical analysis that was performed previously
(Gagnon and Heyer, 2001; Bherer et al., 2011). The
observed structure was consistent with the settlement
patterns involving several founder events after the ini-
tial French immigration wave.
The use of extended genealogical data has proven crucial for
interpreting population structure objectively. One excellent
example is the genetic pattern of admixture of European migrants
and Native Americans in the Canadian Gaspé Peninsula (Moreau et
al., 2011). The genetic pattern was initially observed in
mitochondrial DNA (mtDNA) and Y-chromosomal markers and
interpreted according to the parsimony principle (Moreau et al.,
2009); this interpretation appeared to be incorrect when the
genealogies of the participants were included in the genetic
analysis (Moreau et al., 2011).
Another study with a known extended genealogy as
criterion for the selection of DNA donors was a popula-
tion genetic analysis in the West European region of
Brabant (in total 11,300 km²) based on high-resolution
Y-chromosomal data (Larmuseau et al., 2011). When
each participant was classified according to the birth-
place of his oldest reported paternal ancestor (ORPA), a
significant population stratification on a micro-geograph-
ical scale in Brabant was observed. This signal was not
detected during earlier attempts based on the standard
two-generations-of-residence criterion and a surname
criterion. The frequency gradient of two Y-chromosomal
haplogroups (R-U106 and
appeared to be a small part of a larger gradient of the
frequencies of those two haplogroups on a European
scale (Cruciani et al., 2011). As such, the relevance of
genealogical data of the DNA donors for detecting population
stratification on a micro-geographic scale was demonstrated.
Second, time estimations of past migration events can
be obtained by using genealogical data. For example, in
the case of a British family belonging to the African Y-
chromosomal haplogroup A1a-M31, it was possible to
provide an upper limit for the time of arrival of this line-
age in Britain. Based on the DNA of several members of
the paternal lineage and their known genealogical
relationships, the gene flow from Africa to Western
Europe could be dated to before 1700 (de Knijff, 2007;
King et al., 2007b), indicating the complexity of gene
flow events even before the Industrial Revolution in
Western Europe at the beginning of the 19th century.
Another recent Y-chromosomal study found evidence for
a past gene flow event from Northern France to Flanders
(Larmuseau et al., 2012b). Based on extensive genealogical
data and the surnames of the DNA donors, this gene flow event
could be dated to between 400 years ago, the period back to
which reliable genealogical data exist for many families in
Western Europe, and 600 years ago, when paternally heritable
surnames came into use in these regions.
Third, in addition to serving as a selection criterion for
recruiting suitable participants or as background information on
the donors, genealogical data may also be used to analyze the
temporal differentiation within a certain region during the last
centuries. When the genealogical data are linked to haploid
markers that do not mutate within a genealogical time period, it
is possible to study indirectly the population differentiation of
each time period during the last centuries. As such, this
approach enables one to study how dynamic human population
structures were in recent times and what the effects of
demographic changes are, which are still largely unknown,
especially for historical and recent events (Kayser et al.,
2005). In other words, this approach will increase the
‘archaeogenetic’ power of genetic studies through the analysis of
historical records and pedigrees, which was already partly
possible using only surnames (Bowden et al., 2008; Boattini et
al., 2011; Darlu et al., 2012). Despite its usefulness, this
approach is only applicable to haploid markers, thus
indicating the added value of haploid data compared to
A recent study on the West-European region of Bra-
bant introduced this genetic-genealogical approach (Lar-
museau et al., 2012a). “Autochthonous” West European
males were selected for this study based on the criterion
that each donor had at least one ancestor in his patriline
that had his residence within the West European region
of Brabant. The Y chromosomes of all these unrelated
donors were genotyped at high phylogenetic resolution
and linked to particular locations within Brabant at spe-
cific time periods based on the genealogical records. A
clear north-south gradient in the frequencies of Y-chromosomal
haplogroups detected in Brabant faded over time and completely
disappeared after the Industrial Revolution at the beginning of
the 19th century. Since the second half of the 19th century, a
new gradient within Brabant was observed. This gradient most
likely originates from the divide between the northern and
southern parts of Brabant due to the separation of The
Netherlands and Belgium in 1830, as expected from the demographic
changes of the populations in both countries after the separation
(NIDI, 2003). These results indicate how quickly historical
events can produce population differentiation between countries
and regions, as seen in the genomic analysis for Europe by
Novembre et al. (2008).
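The virtual temporal sampling described above can be illustrated with a short sketch. It assumes, as the approach itself does, that a donor's Y-chromosomal haplogroup is passed unchanged down the patriline, so that the haplogroup can be assigned to the place and period of each dated paternal ancestor. The records, the region labels, the 100-year bins, and the function name are hypothetical; "other" simply stands for any haplogroup other than R-U106.

```python
from collections import Counter, defaultdict

# Hypothetical records, one per dated patrilineal ancestor of a donor:
# (donor's Y haplogroup, region where the ancestor lived, year).
records = [
    ("R-U106", "north", 1650), ("R-U106", "north", 1660),
    ("R-U106", "north", 1680), ("other", "north", 1690),
    ("R-U106", "south", 1640), ("other", "south", 1655),
    ("other", "south", 1670), ("other", "south", 1695),
    ("R-U106", "north", 1910), ("other", "north", 1920),
    ("R-U106", "south", 1905), ("other", "south", 1930),
]


def haplogroup_frequencies(records, bin_size=100):
    """Return {(period_start, region): {haplogroup: relative frequency}}."""
    counts = defaultdict(Counter)
    for haplogroup, region, year in records:
        period = (year // bin_size) * bin_size  # e.g. 1650 -> 1600
        counts[(period, region)][haplogroup] += 1
    return {
        key: {hg: n / sum(c.values()) for hg, n in c.items()}
        for key, c in counts.items()
    }


freqs = haplogroup_frequencies(records)
for (period, region), f in sorted(freqs.items()):
    print(period, region, round(f.get("R-U106", 0.0), 2))
```

In this made-up data set the regional difference in R-U106 frequency is present in the 1600s and absent in the 1900s, which mimics the kind of fading gradient discussed in the text. A real analysis would also need to ensure that related donors do not contribute the same ancestor more than once.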
Recreational genetic genealogy as a scientific
Collecting genealogical data for each DNA donor in a
population genetic study is very time-consuming.
However, it is our opinion that the strong interest in
(genetic) genealogy by the public and the rapidly
expanding available genealogical databases will eventu-
ally provide enough data to perform this job more time-
efficiently in the future. Today, the research work to find
genealogical data is a hobby for many people from differ-
ent social classes and in many parts of the world.
Recently, genetics added a new dimension to genealogi-
cal research: a discipline called genetic genealogy.
Genetic genealogy is useful for recreational genealogists to
verify their paternal and maternal lineages and to extend these
lineages using haploid markers from the Y chromosome and mtDNA,
respectively. Especially in the case of the Y chromosome, it is
of interest that the biological patriline is not always the same
as the legal patriline, due to non-paternity events (Kayser et
al., 2007), which are much more frequent than non-maternity
events (e.g., due to adoption or a baby switch in the hospital).
Moreover,
genetic genealogy is also very useful to find a connection
between families of a similar village or region with the
same surname and between families with different spell-
ing variants of a similar surname (e.g., a family with
surname ‘Lernout’ versus one with surname ‘Larnout’)
without any archival evidence of a relationship between
the families. Therefore, genetic genealogy now opens up
possibilities to answer some relevant research questions
which were impossible to answer without DNA evidence.
The usefulness of genetic genealogy and the many examples of
famous family histories successfully retraced thanks to genetics,
such as those of the third US president Thomas Jefferson (Foster
et al., 1998; King et al., 2007a), Louis XVII and
Marie-Antoinette of France (Jehaes et al., 1998; Jehaes et al.,
2001), and the Romanov family (Gill et al., 1994; Rogaev et al.,
2009), have resulted in an enormous success of genetic
genealogical research, which is still growing exponentially.
Moreover,
(commercial) DNA tests have become commonly accessi-
ble and many genealogical associations organize genetic
projects to accommodate the needs of their members.
The Internet and its social network sites make it very
easy for genetic genealogists to interpret their results
and to contact other genealogists with similar genetic
results. A list of the main available genetic genealogical
databases is given in King and Jobling (2009). Today,
most of these databases are dealing only with haploid
markers and thus paternal and maternal genealogical
sequencing possibilities genome-wide data is now also
starting to become of interest for a broad public includ-
ing genealogists (King and Jobling, 2009).
Genetic genealogical initiatives are sometimes unfairly
treated as commercially available tests of genetic ances-
try which have scientific limitations (Bolnick et al.,
2007). Moreover, genealogists often interpret their own results
emotionally rather than rationally, which is understandable when
one is dealing with one's own family history. Therefore, many
scientists are hesitant to work
with (genetic) genealogical data. Of course, one also
expects a lower quality control of data available from
amateur genealogists. Yet this may be avoided when the scientists
work together with research groups on historical demography,
genealogical societies, or large genealogical databases with
micro-data, as exist for, e.g., Iceland (Tulinius, 2011), Canada
(www.uqac.ca/balsac), or The Netherlands (Alter et al., 2009), to
verify specific family trees. The rapidly growing genetic
genealogical databases have in recent years become increasingly
useful to scientists, for example to select specific DNA donors
suitable for finding novel phylogenetic markers within rare
(Sims et al., 2009). Today, genetic genealogy is becoming
so familiar that scientists may use high-quality data-
bases to study the population genetic structure and evo-
lutionary patterns within a specific region or population
and to survey the signatures of past migrations and
other demographic events.
Pitfalls for including genealogy in population genetics
Including genealogical data in population genetics is not without
pitfalls, as no sampling procedure is optimal. Although genealogy
is of widespread interest across all layers of society,
especially in the Western world (Simons, 2007), this framework
will only select research participants who are more or less
interested in their genealogy and for whom archival information
about their genealogy is traceable. Therefore, it is always
essential to know whether the sample is representative of a
specific population. One way to test the representativeness of a
sample for the full population is to require that the frequencies
of the most frequent surnames in the sample are the same as in
the population (Larmuseau et al., 2012a). Another potential bias
in the sampling procedure is that most genealogists will only
search in their regional archives. As such, scientists may
overlook essential past migrations, because this may exclude a
specific part of the population from the sampling. To avoid this
bias, current research projects using genetic-genealogical data
additionally apply the approach of Bowden et al. (2008),
selecting DNA donors based on surnames which are found in
archival records before specific migration events. This is now
being done in the ongoing Roman DNA project in Flanders.
Nevertheless, in many countries
there are ongoing efforts to digitize archival documents to
make them (freely) available on the internet (e.g., in Bel-
gium, the Demogen initiative, http://search.arch.be/en/
zoeken-naar-personen; and in The Netherlands, the Wie-
WasWie initiative, http://www.wiewaswie.nl). As such, the
genealogist will not only be able to search in archives in
his/her neighborhood but will also have the possibility to
search in archives around the world.
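The surname-based representativeness check mentioned in this paragraph can be sketched as follows. This is a hedged illustration rather than the procedure of Larmuseau et al. (2012a): the function name, the choice of `top_k`, the placeholder surnames, and the counts are all assumptions made for the example.

```python
from collections import Counter


def surname_frequency_gaps(sample_surnames, population_counts, top_k=3):
    """Compare sample and population frequencies of frequent surnames.

    Returns {surname: sample frequency - population frequency} for the
    `top_k` most frequent surnames in the population.
    """
    if not sample_surnames:
        raise ValueError("sample_surnames must not be empty")
    population_total = sum(population_counts.values())
    sample_counts = Counter(sample_surnames)
    sample_total = len(sample_surnames)
    top = sorted(population_counts, key=population_counts.get, reverse=True)[:top_k]
    return {
        name: sample_counts[name] / sample_total
        - population_counts[name] / population_total
        for name in top
    }


# Made-up population register counts and a made-up sample of donors.
population_counts = {"SurnameA": 500, "SurnameB": 300, "SurnameC": 150, "SurnameD": 50}
sample = ["SurnameA"] * 7 + ["SurnameB"] * 2 + ["SurnameC"] * 1

gaps = surname_frequency_gaps(sample, population_counts)
for name, gap in gaps.items():
    print(name, round(gap, 3))
```

Gaps close to zero suggest that the sample mirrors the population for these surnames. How large a gap is acceptable is not specified in the text; in practice a formal goodness-of-fit test on the counts would be a natural next step.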
Another pitfall in including genealogical data in popu-
lation genetics is the possibility of mistakes in the link
between genetics and genealogy for participants, espe-
cially because of the occurrence of unknown nonpater-
nity events (unknown nonmaternity events are rare). On
average, 1.9% of the contemporary fathers who are confi-
dent that they are the biological father of their child
turned out not to be, after genetic analysis (Anderson,
2006; Strassmann et al., 2012). Because the whole population is a
combination of fathers with high and low confidence in their
biological paternity, the overall non-paternity rate is expected
to be in the range of 1–5% per generation for any human
population, cross-culturally (Anderson, 2006; King and Jobling,
2009). Nevertheless, if the sample is large enough, the effect of
non-paternity on the results is expected to be low, as shown by
previous studies that analyzed patterns of three or four
centuries ago (Bowden et al., 2008; Larmuseau et al., 2012a,b).
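To give a feeling for how a per-generation non-paternity rate accumulates along a pedigree, the following sketch computes the probability that a patriline of a given depth contains at least one non-paternity event. It assumes a constant rate and independence between generations, which is a simplification introduced here and not a claim made in the cited studies; the function name and the chosen depths are illustrative.

```python
def prob_patriline_break(rate_per_generation, generations):
    """Probability of at least one non-paternity event in a patriline.

    Assumes a constant, independent rate per generation:
    P = 1 - (1 - rate) ** generations.
    """
    if not 0.0 <= rate_per_generation <= 1.0:
        raise ValueError("rate_per_generation must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 1.0 - (1.0 - rate_per_generation) ** generations


# The 1% and 5% rates bracket the range quoted in the text;
# the pedigree depths are arbitrary examples.
for rate in (0.01, 0.05):
    for generations in (4, 8, 16):
        p = prob_patriline_break(rate, generations)
        print(f"rate={rate:.2f} generations={generations} P(break)={p:.3f}")
```

Under these assumptions, even a low per-generation rate leads to a non-negligible chance that a deep legal patriline differs from the biological one, which is why the criterion of checking haplotypes against patrilineal relatives matters for individual pedigrees even when the effect on population-level results is small.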
Other pitfalls discussed here are more specific to the
genetic-genealogical approach, whereby in-depth pedigree data are
combined with haploid markers to disentangle signals of temporal
differentiation within a population genetic structure during the
last centuries. This approach may generate a virtual temporal
sampling, so that it is possible to look back indirectly at the
situation in historical periods. This approach is comparable
to paintings of Pieter Brueghel the Elder, which show
indirectly how the society was organized in the 16th cen-
tury (Silver, 2011). Nevertheless, like those paintings
and all other historical records, this approach shows a
selective and therefore not unbiased picture of the past
(Jobling et al., 2004). The genetic-genealogical frame-
work will only give indications about the temporal differ-
entiation between populations based on present-day
DNA donors. This is therefore an indirect temporal sample,
limited to individuals who have living descendants today, and
such a sample will not necessarily represent the whole population
at a certain point in the past. One study showed that 90% of all
the males and females who lived in Iceland in the period
1698–1742 had no direct patrilineal (male-line) or matrilineal
(female-line) descendant in the period 1972–2002 (Helgason et
al., 2003). Although the strength of this process differs between
populations, the effect of genetic drift as an evolutionary force
always has to be taken into account when interpreting the results
of the genetic-genealogical approach.
It is expected that the frequencies of haploid variants
will change strongly over time when mtDNA and/or the
Y chromosome are under selection. When selection is
involved, the genetic-genealogical approach will not be
able to provide good indications of the frequencies of the
haploid variants in a past period. Natural selection is
therefore a potential problem in the analysis of non-
recombinant loci on mtDNA and the Y chromosome.
However, so far only inconclusive indications of selection on
those markers have been found. For example, mtDNA haplogroup H
may have been selected during a past disease episode, as
hypothesized after comparing old samples with current
observations of sepsis patients (Jobling, 2012). Males with
Y-chromosomal haplogroup I may progress faster to HIV/AIDS after
infection (Jobling, 2012) and may also have a 50% higher chance
of developing coronary artery disease than males with
Y-chromosomal haplogroup R (Charchar et al., 2012). Even when the
selection on haploid markers is negligible, it is important to be
aware that this approach is not an alternative to ancient DNA for
population genetic studies but only gives indications of temporal
differentiation in a population structure.
Finally, the Y chromosome and mtDNA are good
markers to study population structure and to trace his-
torical migrations (Underhill and Kivisild, 2007). Never-
theless, both markers are single loci and therefore
provide also single windows into the past (Colonna
et al., 2011). Nowadays there is more and more interest
in studying the full genome sequences of ancient and
modern human samples to understand human popula-
tion history and to survey past demographical events
(Colonna et al., 2011; Stoneking and Krause, 2011).
Future research studies should focus therefore on the
combination of pedigrees and full genome approaches to
enlarge the scope of the genetic-genealogical approach
which is currently only applicable to haploid markers.
CONCLUSION AND FUTURE RESEARCH
This Perspective argues that the implementation of
extended genealogical data will improve the quality of
population genetic research and has the ability to pro-
vide new insights in human population genetic struc-
tures and past demographic events. Therefore we want
to promote the use of genealogical information in future
population genetic studies for two main reasons.
First, the implementation of genealogy in population
genetics is important to select research participants to
provide a good reference sample for a certain population.
This increases the chance of observing a significant population
structure which would otherwise be invisible (Jobling, 2012;
Winney et al., 2012). Moreover, it also helps to interpret the
results correctly (Darlu et al., 2012). Therefore, it is possible
to formulate several
criteria for the selection of DNA donors to provide
researchers an ideal authentic sample for a certain
- The full genealogy of the DNA donor for a considerable number
of generations has to provide evidence that the donor is
ancestrally associated with the selected region, at least for the
last couple of centuries.
- No (recent) relatives of any donor (that are
identical-by-descent for the portion of DNA under study) are
included in the sample, based on the full genealogies of all
donors.
- The surname of the research participant and those of his or her
known maternal ancestors are present in archival documents, which
provide evidence that the surnames were present in the region
before historical periods with major migration events.
- The origin, the toponym, and the language or dialect of the
surname of the DNA donor and of those of his or her known
maternal ancestors do not refer to an area outside the selected
region.
- The distribution of the surname of the participant and of those
of his or her known maternal ancestors is mainly restricted to
the selected region.
- The occurrence of non-paternity events in the genealogy of the
donor is tested by comparison with the haplotypes of patrilineal
relatives.
If the population sample meets these criteria, then it is
the optimal reference for a particular region based on
presently living persons. Especially for a Y-chromosomal
study, these criteria for the patrilineal genealogy and
the surname of the donor himself are essential for the
quality of the research.
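The selection criteria above can be read as a filter over candidate donors. The following is a minimal sketch of that reading, not a procedure given in this article: the `Donor` fields, the `lineage_id` used to flag donors who are identical-by-descent, and the `min_generations` threshold are all hypothetical names and values introduced here for illustration. In practice each flag would be the outcome of archival and genetic work.

```python
from dataclasses import dataclass


@dataclass
class Donor:
    donor_id: str
    generations_in_region: int       # documented generations tied to the region
    surname_in_early_archives: bool  # surname attested before major migrations
    surname_origin_local: bool       # origin/toponym/dialect not from outside
    surname_mostly_regional: bool    # surname distribution mainly regional
    paternity_verified: bool         # haplotype agrees with patrilineal relatives
    lineage_id: str                  # hypothetical id shared by IBD relatives


def select_reference_sample(donors, min_generations=8):
    """Keep donors meeting every criterion, one donor per lineage.

    min_generations is an assumed threshold; the text only asks for
    'a considerable number of generations'.
    """
    selected = []
    seen_lineages = set()
    for d in donors:
        meets_all = (
            d.generations_in_region >= min_generations
            and d.surname_in_early_archives
            and d.surname_origin_local
            and d.surname_mostly_regional
            and d.paternity_verified
        )
        if not meets_all:
            continue
        # Exclude relatives of a donor who has already been selected.
        if d.lineage_id in seen_lineages:
            continue
        seen_lineages.add(d.lineage_id)
        selected.append(d)
    return selected


# Toy candidates (invented values, for illustration only).
candidates = [
    Donor("d1", 10, True, True, True, True, "L1"),
    Donor("d2", 10, True, True, True, True, "L1"),   # relative of d1
    Donor("d3", 3, True, True, True, True, "L2"),    # genealogy too shallow
    Donor("d4", 9, True, True, True, False, "L3"),   # non-paternity detected
    Donor("d5", 8, True, True, True, True, "L4"),
]
reference_sample = select_reference_sample(candidates)
print([d.donor_id for d in reference_sample])
```

Treating every criterion as a hard requirement is a design choice of this sketch. A real sampling campaign might instead record which criteria each donor fails, so that the effect of relaxing a criterion on the observed structure can be examined.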
Second, in addition to the selection of research partici-
pants based on their genealogical data, it is also relevant
to link genealogical information to haploid markers to
obtain a virtual sampling in time and to estimate the
relevance and time periods of past demographic events.
At present the genetic-genealogical approach is possible
only with the Y chromosome or mtDNA, which keeps these
markers relevant in the whole-genome era. Although this
approach gives only indications of temporal changes in
haplogroup frequencies in a population, it offers a more
objective way to explain an observed population genetic
pattern. The current problem is that geneticists tend to
link a genetic pattern, by ‘historical cherry-picking’, to
a well-known historical event that is able to explain the
observed pattern (Jobling, 2012). It is tempting to link a
drastic change in the population genetic pattern
immediately to well-known events such as the fall of the
Western Roman Empire in 476 or the European colonization
of America after 1492. However, the pattern may equally
reflect a similar or even more drastic demographic event
that occurred recently and for which less information is
available (Altmann et al., 2012). Therefore,
the genetic-genealogical approach may provide more
objective indications to explain a genetic pattern in asso-
ciation with other academic disciplines, such as histori-
cal sciences, demography, archeology, anthropology and
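As a rough illustration of what a virtual sampling in time could look like, the sketch below computes haplogroup frequencies per region twice: once with each donor assigned to his present region, and once with the same donor assigned to the region of his oldest known patrilineal ancestor. This is a simplified sketch under stated assumptions, not the analysis used in the studies cited here. The record fields (`current_region`, `ancestral_region`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def haplogroup_frequencies(records, region_key):
    """Return {region: {haplogroup: relative frequency}}.

    region_key selects which assignment of donors to regions is used,
    e.g. present residence or residence of the oldest known ancestor.
    """
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[region_key]][rec["haplogroup"]] += 1
    return {
        region: {hg: n / sum(c.values()) for hg, n in c.items()}
        for region, c in counts.items()
    }


# Toy records (invented values, for illustration only).
records = [
    {"haplogroup": "R1b", "current_region": "A", "ancestral_region": "A"},
    {"haplogroup": "R1b", "current_region": "A", "ancestral_region": "B"},
    {"haplogroup": "I1", "current_region": "A", "ancestral_region": "A"},
    {"haplogroup": "I1", "current_region": "B", "ancestral_region": "A"},
]

present = haplogroup_frequencies(records, "current_region")
past = haplogroup_frequencies(records, "ancestral_region")
print(present)
print(past)
```

Comparing `present` with `past` for the same region gives an indication of how haplogroup frequencies shifted over the period spanned by the genealogies. Any formal test of differentiation would be a separate step and is not shown.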
To collect all the required genealogical data in a
time-efficient manner and with high quality control, it may
be worthwhile to use (genetic) genealogical databases in
collaboration with genealogical societies. Genealogists
are interested in this research field and they are often
willing to give a DNA sample if their genetic data is not
yet available on the internet. Moreover, genealogists
usually have a good basic knowledge of what to expect
from their genetic results and what the risks are, which
is of course important for ethical reasons. Therefore,
there are many opportunities for fruitful cooperation
between the academic community and the genealogical
associations in the future.
ACKNOWLEDGMENTS
Maarten H.D. Larmuseau is a postdoctoral fellow of the
FWO-Vlaanderen (Research Foundation-Flanders). MvO
was supported in part by the Netherlands Forensic Insti-
tute (NFI) and by a grant from the Netherlands Genomics
Initiative (NGI)/Netherlands Organization for Scientific
Research (NWO) within the framework of the Forensic
Genomics Consortium Netherlands (FGCN). The authors
want to thank Francesc Calafell, Franz Manni, Hendrik
Larmuseau, Tom Havenith, Carla Verissimo and one anon-
ymous referee for their comments on an earlier version of
the manuscript. Thanks also to Emma Bourne for improv-
ing the manuscript linguistically.
LITERATURE CITED
Alter G, Mandemakers K, Gutmann M. 2009. Defining and dis-
tributing longitudinal historical data in a general way
through an intermediate structure. Hist Soc Res 34:78–114.
Altmann DM, Balloux F, Boyton RJ. 2012. Diverse approaches
to analysing the history of human and pathogen evolution:
how to tell the story of the past 70,000 years. Introduction.
Philos Trans R Soc B-Biol Sci 367:765–769.
Anderson KG. 2006. How well does paternity confidence match
actual paternity? Evidence from worldwide nonpaternity
rates. Curr Anthropol 47:513–520.
Avise JC. 2000. Phylogeography: the history and formation of
species. Cambridge, MA: Harvard University Press.
Balanovsky O, Rootsi S, Pshenichnov A, Kivisild T, Churnosov
M, Evseeva I, Pocheshkhova E, Boldyreva M, Yankovsky N,
Balanovska E, et al. 2008. Two sources of the Russian patri-
lineal heritage in their Eurasian context. Am J Hum Genet
Bherer C, Labuda D, Roy-Gagnon MH, Houde L, Tremblay M,
Vézina H. 2011. Admixed ancestry and stratification of Quebec
regional populations. Am J Phys Anthropol 144:432–441.
Boattini A, Luiselli D, Sazzini M, Useli A, Tagarelli G, Pettener
D. 2011. Linking Italy and the Balkans. A Y-chromosome per-
spective from the Arbereshe of Calabria. Ann Hum Biol
Boattini A, Useli A, Pettener D. 2012. Reconstructing past
genetic structures in recently transformed populations: sur-
names and Y chromosomes in the Upper Savio Valley (Cen-
tral Apennines, Italy).
Bolnick DA, Fullwiley D, Duster T, Cooper RS, Fujimura JH,
Kahn J, Kaufman JS, Marks J, Morning A, Nelson A, et al.
2007. The science and business of genetic ancestry testing.
Bowden GR, Balaresque P, King TE, Hansen Z, Lee AC, Pergl-
Wilson G, Hurley E, Roberts SJ, Waite P, Jesch J, et al. 2008.
Excavating past population structures by surname-based
sampling: the genetic legacy of the Vikings in northwest Eng-
land. Mol Biol Evol 25:301–309.
Charchar FJ, Bloomer LDS, Barnes TA, Cowley MJ, Nelson CP,
Wang YZ, Denniff M, Debiec R, Christofidou P, Nankervis S,
et al. 2012. Inheritance of coronary artery disease in men: an
analysis of the role of the Y chromosome. Lancet 379:915–
Colonna V, Nutile T, Ferrucci RR, Fardella G, Aversano M, Bar-
bujani G, Ciullo M. 2009. Comparing population structure as
inferred from genealogical versus genetic information. Eur J
Hum Genet 17:1635–1641.
Colonna V, Pagani L, Xue Y, Tyler-Smith C. 2011. A world in a
grain of sand: human history from genetic data. Genome Biol
Cruciani F, Trombetta B, Antonelli C, Pascone R, Valesini G,
Scalzi V, Vona G, Melegh B, Zagradisnik B, Assum G, et al.
revealed by Y chromosome SNPs M269, U106 and U152. For-
ensic Sci Int-Genet 5:E49–E52.
Cuveliers EL, Larmuseau MHD, Hellemans B, Verherstraeten
SLNA, Volckaert FAM, Maes GE. 2012. Multi-marker esti-
mate of genetic connectivity of sole (Solea solea) in the North-
East Atlantic Ocean. Marine Biol 159:1239–1253.
Darlu P, Bloothooft G, Boattini A, Brouwer L, Brouwer M,
Brunet G, Chareille P, Cheshire J, Coates R, Dräger K, et al.
2012. The family name as socio-cultural feature and genetic
metaphor: from concepts to methods. Hum Biol 84:169–214.
de Knijff P. 2007. Hidden African ancestors: hidden secrets of
your ancestors. Eur J Hum Genet 15:509–510.
Foster EA, Jobling MA, Taylor PG, Donnelly P, de Knijff P,
Mieremet R, Zerjal T, Tyler-Smith C. 1998. Jefferson fathered
slave’s last child. Nature 396:27–28.
Gagnon A, Heyer E. 2001. Fragmentation of the Quebec popula-
tion genetic pool (Canada): evidence from the genetic contri-
bution of founders per region in the 17th and 18th centuries.
Am J Phys Anthropol 114:30–41.
Gill P, Ivanov PL, Kimpton C, Piercy R, Benson N, Tully G,
Evett I, Hagelberg E, Sullivan K. 1994. Identification of the
remains of the Romanov family by DNA analysis. Nat Genet
Helgason A, Hrafnkelsson B, Gulcher JR, Ward R, Stefansson
K. 2003. A populationwide coalescent analysis of Icelandic
matrilineal and patrilineal genealogies: evidence for a faster
evolutionary rate of mtDNA lineages than Y chromosomes.
Am J Hum Genet 72:1370–1388.
Jehaes E, Decorte R, Peneau A, Petrie JH, Boiry PA, Gilissen
A, Moisan JP, Van den Berghe H, Pascal O, Cassiman JJ.
1998. Mitochondrial DNA analysis on remains of a putative
son of Louis XVI, King of France and Marie-Antoinette. Eur J
Hum Genet 6:383–395.
Jehaes E, Pfeiffer H, Toprak K, Decorte R, Brinkmann B, Cassi-
man JJ. 2001. Mitochondrial DNA analysis of the putative
heart of Louis XVII, son of Louis XVI and Marie-Antoinette.
Eur J Hum Genet 9:185–190.
Jobling MA. 2012. The impact of recent events on human
genetic diversity. Philos Trans R Soc B-Biol Sci 367:793–799.
Jobling MA, Hurles ME, Tyler-Smith C. 2004. Human evolu-
tionary genetics: origins, peoples and disease. London/New
York: Garland Science Publishing.
Kayser M, Lao O, Anslinger K, Augustin C, Bargel G, Edel-
mann J, Elias S, Heinrich M, Henke J, Henke L, et al. 2005.
Significant genetic differentiation between Poland and Ger-
many follows present-day political borders, as revealed by Y-
chromosome analysis. Hum Genet 117:428–443.
Kayser M, Vermeulen M, Knoblauch H, Schuster H, Krawczak
M, Roewer L. 2007. Relating two deep-rooted pedigrees from
Central Germany by high-resolution Y-STR haplotyping. For-
ensic Sci Int-Genet 1:125–128.
King TE, Bowden GR, Balaresque PL, Adams SM, Shanks ME,
belongs to a rare European lineage. Am J Phys Anthropol
King TE, Jobling MA. 2009. What’s in a name? Y chromosomes,
surnames and the genetic genealogy revolution. Trends Genet
King TE, Parkin EJ, Swinfield G, Cruciani F, Scozzari R, Rosa
A, Lim SK, Xue YL, Tyler-Smith C, Jobling MA. 2007b. Afri-
cans in Yorkshire? The deepest-rooting clade of the Y phylog-
eny within an English genealogy. Eur J Hum Genet 15:288–
Larmuseau MHD, Ottoni C, Raeymaekers JAM, Vanderheyden
N, Larmuseau HFM, Decorte R. 2012a. Temporal differentia-
tion across a West-European Y-chromosomal cline: genealogy
as a tool in human population genetics. Eur J Hum Genet
Larmuseau MHD, Vanderheyden N, Jacobs M, Coomans M,
Larno L, Decorte R. 2011. Micro-geographic distribution of
Y-chromosomal variation in the
region Brabant. Forensic Sci Int-Genet 5:95–99.
Larmuseau MHD, Vanoverbeke J, Gielis G, Vanderheyden N,
Larmuseau HFM, Decorte R. 2012b. In the name of the mi-
grant father: analysis of surname origin identifies historic
admixture events undetectable from genealogical records. He-
Marynissen A. 2011. Namen. In: van der Sijs N, editor. Dialec-
tatlas van het Nederlands. Amsterdam: Uitgeverij Bert Bak-
ker. p 300–353.
Moreau C, Vézina H, Jomphe M, Lavoie EM, Roy-Gagnon MH,
Labuda D. 2011. When genetics and genealogies tell different
stories: maternal lineages in Gaspesia. Ann Hum
Moreau C, Vézina H, Yotova V, Hamon R, De Knijff P, Sinnett
D, Labuda D. 2009. Genetic heterogeneity in regional
populations of Quebec-parental lineages in the Gaspé Peninsula.
Am J Phys Anthropol 139:512–522.
Morozova I, Evsyukov A, Kon’kov A, Grosheva A, Zhukova O,
Rychkov S. 2012. Russian ethnic history inferred from mito-
chondrial DNA diversity. Am J Phys Anthropol 147:341–351.
NIDI. 2003. Bevolkingsatlas van Nederland: demografische ont-
wikkeling van 1850 tot heden. Rijswijk: Elmar B.V.
Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A,
Indap A, King KS, Bergmann S, Nelson MR, et al. 2008.
Genes mirror geography within Europe. Nature 456:274–274.
Rasmuson M. 2008. Genealogy and gene trees. Hereditas
Rogaev EI, Grigorenko AP, Moliaka YK, Faskhutdinova G, Golt-
sov A, Lahti A, Hildebrandt C, Kittler ELW, Morozova I.
2009. Genomic identification in the historical case of the
Nicholas II royal family. Proc Natl Acad Sci USA 106:5258–
Roy-Gagnon MH, Moreau C, Bherer C, St-Onge P, Sinnett D,
Laprise C, Vézina H, Labuda D. 2011. Genomic and genealogical
investigation of the French Canadian founder population
structure. Hum Genet 129:521–531.
Scriver CR. 2001. Human genetics: lessons from Quebec popula-
tions. Annu Rev Genomics Hum Genet 2:69–101.
Silver L. 2011. Pieter Bruegel. New York: Abbeville Press.
Simms TM, Martinez E, Herrera KJ, Wright MR, Perez OA,
Hernandez M, Ramirez EC, McCartney Q, Herrera RJ. 2011.
Paternal lineages signal distinct genetic contributions from
British loyalists and continental Africans among different
Bahamian islands. Am J Phys Anthropol 146:594–608.
Simons J. 2007. Out of Africa. Fortune 155:37–40.
Sims LM, Garvey D, Ballantyne J. 2009. Improved resolution
haplogroup G phylogeny in the Y chromosome, revealed by a
set of newly characterized SNPs. PLoS One 4:e5792.
Stoneking M, Krause J. 2011. Learning about human popula-
tion history from ancient and modern genomes. Nat Rev
Strassmann BI, Kurapati NT, Hug BF, Burke EE, Gillespie BW,
Karafet TM, Hammer MF. 2012. Religion as a means to
assure paternity. Proc Natl Acad Sci USA 109:9781–9785.
Tulinius H. 2011. Multigenerational information: the example of
the Icelandic genealogy database. In: Dillner J, editor. Meth-
ods in biobanking, methods in molecular biology. New York:
Springer Science+Business Media. p 221–229.
Tyler-Smith C, Xue YL. 2012. Local, rural and British: a British
approach to sampling. Eur J Hum Genet 20:129–130.
Underhill PA, Kivisild T. 2007. Use of Y chromosome and mito-
chondrial DNA population structure in tracing human migra-
tions. Annu Rev Genet 41:539–564.
van Uytven R, Bruneel C, Koldeweij AM, van de Sande AWFM,
van Oudheusden JAFM. 2011. Geschiedenis van Brabant van
het hertogdom tot heden. Zwolle: Uitgeverij Waanders.
Weale ME, Weiss DA, Jager RF, Bradman N, Thomas MG.
2002. Y chromosome evidence for Anglo-Saxon mass migra-
tion. Mol Biol Evol 19:1008–1021.
Winney B, Boumertit A, Day T, Davison D, Echeta C, Evseeva
I, Hutnik K, Leslie S, Nicodemus K, Royrvik EC, et al. 2012.
People of the British Isles: preliminary analysis of genotypes
and surnames in a UK-control population. Eur J Hum Genet
Zalloua PA, Platt DE, El Sibai M, Khalife J, Makhoul N, Haber
M, Xue Y, Izaabel H, Bosch E, Adams SM, et al. 2008. Identi-
fying genetic traces of historical expansions: Phoenician foot-
prints in the Mediterranean. Am J Hum Genet 83:633–642.