The Y-chromosome has been widely used in ancestry inference based on its region-specific haplogroup distributions. However, there is ongoing debate about how informative such a single marker is for inferring an individual's genetic ancestry. Here, we compared genetic ancestry inferences at the continental level made from Y-chromosomal haplogroups with those made from autosomal single-nucleotide polymorphisms in 1230 samples of the Affymetrix Human Origins dataset. The highest ancestry proportions of a majority of individuals match the highest average continental-ancestry proportions in haplogroups A, B, D, H, I, K, L, T, O, and M. Such high consistency was not observed in haplogroups E, C, G, J, N, Q, and R as a whole, but it was observed in some of their sublineages, such as E1a, E1b1a1, E1b1b1b1a, E2b1a, J1a2b, Q1a1a1, Q1a2a1a1, R1b1a2a1a, and R2. Although the consistency of Y-chromosomal and autosomal continental ancestry varies among haplogroups, the Y-chromosome can provide valuable clues to an individual's continental ancestry.
Volume 2 | Issue 4 | October - December 2016
ISSN 2349-5014
E-ISSN 2455-0094
© 2016 Journal of Forensic Science and Medicine | Published by Wolters Kluwer - Medknow 229
Brief Communication
Because it lacks recombination, is strictly paternally inherited, and has a
small effective population size, a low mutation rate, sufficient markers,
and population-specific haplotype distributions, the Y-chromosome has been
widely used in anthropology, population genetics, and forensic genetics to
understand population genetic structure, population history,
and forensic identifications.[1] The Y-chromosome has also inspired
widespread public interest in tracing paternal ancestors and
has been used commercially by many companies. A famous
example is the Y-chromosomal type of Genghis Khan,
which was proposed to belong to the “star cluster” under
haplogroup C3*-M217 (xM48)[2] and has gained extensive
attention and attracted numerous consumers to get tested.
However, because the Y-chromosome is only a single marker and is subject
to severe genetic drift, such simple ancestry analyses tend to
overlook the contribution of the vast majority of an individual’s
ancestors to his/her genome.[3,4]
There are also many alternative ancestry inference
methods, such as testing mitochondrial DNA (mtDNA),[4]
genome-wide short tandem repeats (STRs)[5] or single-nucleotide
polymorphisms (SNPs),[6] and ancestry informative
markers (AIMs).[7] mtDNA is maternally inherited and
has been widely used to trace maternal history. Genome-wide
STRs, SNPs, and AIMs are usually used to infer the
detailed composition of an individual’s ancestry. However,
some recent genome-wide studies have revealed frequent
discrepancies between ancestry inferences using mtDNA
and those using autosomal SNPs.[4,8] The mtDNA case prompts us to
reconsider how much ancestry information the Y-chromosome
can provide and how accurate Y-chromosomal ancestry
inference is compared with genome-wide ancestry
The Consistencies of Y‑Chromosomal and Autosomal
Continental Ancestry Varying among Haplogroups
Chuan‑Chao Wang1,2, Lei Shang3, Hui‑Yuan Yeh4, Lan‑Hai Wei5
1Department of Genetics, Harvard Medical School, Boston, MA, USA, 2Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena,
Germany, 3Key Laboratory of Forensic Genetics, Institute of Forensic Science, Ministry of Public Security, Beijing, China, 4School of Humanities and Social Sciences,
Nanyang Technological University, Singapore, 5State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology,
Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
Key words: Ancestry inference, autosomal single-nucleotide polymorphism, Y-chromosome
Address for correspondence: Dr. Chuan‑Chao Wang,
Department of Genetics, Harvard Medical School,
Boston, MA, USA.
Department of Archaeogenetics, Max Planck Institute for the Science of
Human History, Jena, Germany.
This is an open access article distributed under the terms of the Creative Commons
Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix,
tweak, and build upon the work non-commercially, as long as the author is credited and
the new creations are licensed under the identical terms.
How to cite this article: Wang CC, Shang L, Yeh HY, Wei LH. The
Consistencies of Y-Chromosomal and Autosomal Continental Ancestry
Varying among Haplogroups. J Forensic Sci Med 2016;2:229-32.
Received: 08-05-2016
Revised: 10-08-2016
Accepted: 11-09-2016
Wang, et al.: Y Chromosomal and Autosomal Ancestry
Journal of Forensic Science and Medicine ¦ October 2016 ¦ Volume 2 ¦ Issue 4
estimation. Here, we present a comprehensive analysis using
Y-chromosomal and genome-wide autosomal SNP data from more
than 1200 male individuals in the Affymetrix Human Origins
dataset[6] to directly and quantitatively assess the consistency
of Y-chromosomal and autosomal continental ancestry.
Materials and Methods
The Y-chromosomal and autosomal genotype data for
1230 male individuals were extracted from the Affymetrix
Human Origins dataset[6] using EIGENSOFT[9] and PLINK.[10]
Y-chromosomal haplogroups were classified based on the
International Society of Genetic Genealogy phylogenetic
tree as of January 28, 2015. We used
ADMIXTURE v. 1.23[11] to estimate ancestry proportions
for the 1230 males with 594,924 autosomal SNPs. Each run
involved 100 replicates with different random starting seeds,
the default 5-fold cross-validation, and a number
of ancestral populations K varying from 2 to 12. At K = 8, the
samples were well assigned to eight continental regions:
Africa, Middle East, Europe, South Asia, East Asia, Siberia,
Oceania, and Americas. The average continental-ancestry
proportions within each Y-chromosomal haplogroup, standard
deviation (SD) of individual continental-ancestry percentages
for each continental region in each haplogroup, mean pairwise
Euclidean distance (d) within each haplogroup, and consistency
scores were all calculated according to Emery et al.[4]
Ancestry plots were drawn in
R statistical software v3.0.2.[12]
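As an illustration of the per-haplogroup summaries described above, the following Python sketch computes the mean and SD of continental-ancestry proportions within each haplogroup. It is not the authors' code: the toy data, the three-region layout, and the function name `summarize_by_haplogroup` are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption, since the convention followed in Emery et al.[4] is not stated here.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy data: (haplogroup, ancestry proportions per region).
# The study used eight regions; three are shown to keep the example short.
REGIONS = ["Africa", "Europe", "East Asia"]
SAMPLES = [
    ("A", [0.98, 0.01, 0.01]),
    ("A", [0.90, 0.08, 0.02]),
    ("O", [0.00, 0.05, 0.95]),
    ("O", [0.02, 0.08, 0.90]),
    ("O", [0.01, 0.14, 0.85]),
]


def summarize_by_haplogroup(samples):
    """Return {haplogroup: (means, sds)}, one value per region.

    Assumption: sample SD is used; a haplogroup with a single
    individual gets an SD of 0.0 for every region.
    """
    grouped = defaultdict(list)
    for haplogroup, proportions in samples:
        grouped[haplogroup].append(proportions)

    summary = {}
    for haplogroup, rows in grouped.items():
        columns = list(zip(*rows))  # one tuple of values per region
        means = [mean(col) for col in columns]
        sds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
        summary[haplogroup] = (means, sds)
    return summary


if __name__ == "__main__":
    for haplogroup, (means, sds) in summarize_by_haplogroup(SAMPLES).items():
        for region, m, s in zip(REGIONS, means, sds):
            print(f"{haplogroup}\t{region}\tmean={m:.3f}\tSD={s:.3f}")
```

In practice the proportions would come from the ADMIXTURE output at the chosen K rather than from a hard-coded list.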
The Human Origins dataset contained male samples of
worldwide lineages from Y-chromosomal haplogroups A, B,
C, D, E, F, G, H, I, J, K, L, M, N, O, Q, R, S, and T [Table S1].
All haplogroups except A, B, K, M, and S were found
on more than one continent. Haplogroups A and B were found
only in Africa, whereas K, M, and S were present only
in Oceania. Likewise, haplogroups A and B had predominantly
African ancestry, whereas K, M, and S had predominantly
Oceanian ancestry [Table S2 and Figure 1]. Haplogroups C,
D, and O were frequent in populations from East Asia. The
East Asian ancestry proportions in haplogroups D and O were
much higher than those of other continents. East Asian and
Siberian ancestry seemed to contribute equally to individuals of
haplogroup C [Table S2 and Figure 1]. Haplogroup E reached
high frequencies in Africa and the Middle East, and African and
Middle Eastern ancestries were also the two main components
for individuals of haplogroup E. Haplogroups L, H, and R
were frequent in South Asia, and R was also found at very high
frequency in Europe. Haplogroups L and H had predominantly
South Asian ancestry. The largest ancestry proportion
of haplogroup R was from Europe, and the second- and
third-highest ancestry proportions were from South Asia and
the Middle East, respectively. Haplogroups I,
Q, and T were enriched in Europe, the Americas, and the Middle East,
respectively. Similarly, the ancestry proportions of these
three regions were the highest in haplogroups I, Q, and T,
respectively. Haplogroups G, J, and N were found in various
regions [Table S1], and their genetic ancestries also varied.
Collectively, we found a good correlation between haplogroup
frequencies and continental-ancestry proportions.
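The haplogroup frequencies discussed in this paragraph can be tabulated per continent with a few lines of Python. This is a minimal sketch on invented records: the function name `haplogroup_frequencies` and the sample tuples are hypothetical and are not taken from Table S1.

```python
from collections import Counter, defaultdict

# Hypothetical records: (sampling continent, major Y-chromosomal haplogroup).
RECORDS = [
    ("Africa", "A"),
    ("Africa", "E"),
    ("Africa", "E"),
    ("Africa", "B"),
    ("Europe", "R"),
    ("Europe", "I"),
]


def haplogroup_frequencies(records):
    """Return {continent: {haplogroup: frequency}}.

    Frequencies are computed within each continent, so the values
    for one continent sum to 1.
    """
    counts = defaultdict(Counter)
    for continent, haplogroup in records:
        counts[continent][haplogroup] += 1

    frequencies = {}
    for continent, counter in counts.items():
        total = sum(counter.values())
        frequencies[continent] = {hg: n / total for hg, n in counter.items()}
    return frequencies


if __name__ == "__main__":
    for continent, table in haplogroup_frequencies(RECORDS).items():
        print(continent, table)
```

Such a frequency table can then be set beside the haplogroup-averaged ancestry proportions to check whether the most frequent region and the largest ancestry component coincide.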
We then estimated the SD of individual continental-ancestry
percentages within each haplogroup. The continental-ancestry
proportions varied considerably among individuals in the
majority of haplogroups, especially in haplogroups E, N, K,
Q, and R (SD > 0.3) [Table S3]. We also calculated the mean
pairwise Euclidean distance between continental-ancestry
proportions among individuals within each haplogroup,
which is a quantitative measure of inter-individual
variability.[4] Consistent with the SD results, the mean pairwise
Euclidean distances in haplogroups E, C, K, N, Q, and R were
high (>0.5) [Table S4], suggesting that these haplogroups are
not very informative for inferring an individual’s continental
ancestry. In contrast, the distances in haplogroups A, B, D,
K, O, and M were relatively low [Table S4], indicating a
strong association between geographic-ancestry composition
and a given haplogroup. To directly and quantitatively
assess how informative the Y-chromosome is for inferring an
individual’s genetic ancestry, we calculated the consistency
score within each haplogroup. The score is the proportion
of individuals whose >50% continental ancestry matches the
highest continental-ancestry component of the haplogroup
in our dataset. The consistency ranged from 0.333 to 1.000
with a mean of 0.697 in major haplogroups from A to
R, and about 65% of Y-chromosomal haplogroups had a
consistency score >50% [Table S4]. High consistency
was observed in haplogroups A, B, D, H, I, K, L, T, O,
and M, meaning that these haplogroups can be regarded as
having substantial genetic ancestry from their corresponding
continents. The consistency values in haplogroups C, G, J, N,
each of these haplogroups to a certain continent.
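The two within-haplogroup measures used in this paragraph can be sketched in Python as follows. This is one reading of the definitions given in the text, not the authors' implementation: the function names and toy proportions are hypothetical, and the consistency score is taken to be the fraction of individuals who have more than 50% ancestry from the region with the highest haplogroup-averaged proportion.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def mean_pairwise_distance(rows):
    """Mean Euclidean distance over all pairs of individuals.

    Each row is one individual's vector of ancestry proportions.
    Returns 0.0 when fewer than two individuals are given.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def consistency_score(rows, threshold=0.5):
    """Fraction of individuals whose ancestry from the haplogroup's
    top region (highest mean proportion) exceeds the threshold."""
    n_regions = len(rows[0])
    means = [sum(row[i] for row in rows) / len(rows) for i in range(n_regions)]
    top_region = max(range(n_regions), key=means.__getitem__)
    matches = sum(1 for row in rows if row[top_region] > threshold)
    return matches / len(rows)


if __name__ == "__main__":
    # Hypothetical haplogroup with three individuals and two regions.
    toy = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
    print(f"d = {mean_pairwise_distance(toy):.3f}")
    print(f"consistency = {consistency_score(toy):.3f}")
```

A haplogroup whose members all share nearly the same ancestry profile gives a distance near 0 and a consistency near 1; the toy example has one discordant individual, which raises the distance and lowers the consistency.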
The haplogroups with high SD, high Euclidean distance,
and low consistency are all lineages distributed across
multiple continents. However, some of their sublineages are
region-specific. For instance, Q1a2a1a1-M3 is almost exclusively
distributed in the Americas. It is very likely that these sublineages
also have exclusive continental ancestry. The frequency of
Q1a2a1a1 in our dataset reached 36.2% of all Q individuals.
This lineage, with more than 90% American ancestry
and a consistency value of 0.88, could be confidently
classified into the Americas. Haplogroups E1a, E1b1a1, and
E2b1a, comprising more than half of all haplogroup E
individuals, had exclusive African ancestry with extremely
high consistency scores (nearly 1) and low Euclidean
distance and SD. In addition, haplogroup E1b1b1b1a,
accounting for 12% of all haplogroup E individuals, could
be reasonably assigned to the Middle East with a consistency
value of 0.87. Haplogroup R1b1a2a1a, making up 32.6% of
all R individuals, had a strong association with Europe, while
R2 samples had substantial South Asian genetic ancestry.
Figure 1: (a) Haplogroup‑averaged continental‑ancestry proportions; (b) Individual continental‑ancestry proportions in the male individuals of Affymetrix
Human Origins Dataset
Similarly, haplogroup J1a2b was associated with the Middle East,
and Q1a1a1 could be assigned to East Asia.
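The share of a sublineage within its major haplogroup, such as the percentages quoted above, can be computed by prefix matching on haplogroup labels. The sketch below is hypothetical: the function name `sublineage_share` and the label list are invented, and it assumes long-form labels in which a sublineage name extends its parent's name (for example, Q1a2a1a1 begins with Q1a), so it would not apply to mutation-based short names.

```python
def sublineage_share(labels, major, sublineage):
    """Fraction of individuals in `major` that fall under `sublineage`.

    Assumption: labels are long-form names, so membership in a clade
    can be tested with a simple string prefix check.
    """
    in_major = [label for label in labels if label.startswith(major)]
    if not in_major:
        return 0.0
    in_sub = [label for label in in_major if label.startswith(sublineage)]
    return len(in_sub) / len(in_major)


if __name__ == "__main__":
    # Hypothetical labels for five individuals.
    labels = ["Q1a2a1a1", "Q1a1a1", "Q1a2a1a1", "Q1b", "R2"]
    share = sublineage_share(labels, "Q", "Q1a2a1a1")
    print(f"Q1a2a1a1 share of Q: {share:.1%}")
```

Once individuals are grouped this way, the same mean, SD, distance, and consistency calculations can be rerun on each sublineage separately.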
We directly compared the genetic ancestry revealed by
Y-chromosomal haplogroups with that inferred from
genome-wide autosomal SNPs in a worldwide dataset. The
continental-ancestry compositions varied among individuals
of the same Y-chromosomal haplogroup, judging from the high
SDs. About 70% of the Y-chromosomal haplogroups could
be associated with certain continents, because the highest
ancestry proportions of a majority of individuals match
the highest average continental-ancestry proportions, as in
haplogroups A, B, D, H, I, K, L, T, O, and M. Although
high consistency was not observed in haplogroups E,
C, G, J, N, Q, and R, some of their sublineages, such as E1a,
E1b1a1, E1b1b1b1a, E2b1a, J1a2b, Q1a1a1, Q1a2a1a1,
R1b1a2a1a, and R2, corresponded well with certain continents.
The Y-chromosome appears to give higher prediction
accuracy for individual ancestries than mtDNA.[4] This
phenomenon might be caused by sex-biased migration,
specifically a higher female migration rate in human
populations.[13] A series of studies have revealed that the
among-population components of genetic variation are higher
for the Y-chromosome than for mtDNA, indicating that
Y-chromosomes tend to be more localized geographically.[13,14]
The Y-chromosome can thus provide valuable clues to an
individual’s continental ancestry, but it probably neglects
much other detailed ancestry information. One, or at most
two, top ancestry components could be well represented
by the majority of Y-chromosomal haplogroups, whereas other
ancestry information is lost. For instance, the highest South
Asian ancestry proportions have been detected in individuals
of haplogroup H. Meanwhile, East Asia, Europe, and the Middle
East have each contributed more than 10% of genetic
ancestry to many individuals of haplogroup H, which could
addition, the Y-chromosomal haplogroup classifications in
this study were not very informative. The rough assignment
might lose some information about a given lineage and may
have biased the conclusions. For example, sublineages
of haplogroup C have distinct geographic distributions.[15]
However, we do not have enough markers in this dataset to
identify the detailed phylogeny of haplogroup C individuals,
resulting in the inconclusive ancestry inference of this
CCW is supported by the Max Planck Society and Harvard
Medical School.
Financial support and sponsorship
Conflicts of interest
1. Jobling MA, Tyler-Smith C. The human Y chromosome: An evolutionary
marker comes of age. Nat Rev Genet 2003;4:598-612.
2. Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, et al. The genetic
legacy of the Mongols. Am J Hum Genet 2003;72:717-21.
3. Shriver MD, Kittles RA. Genetic ancestry and the search for personalized
genetic histories. Nat Rev Genet 2004;5:611-8.
4. Emery LS, Magnaye KM, Bigham AW, Akey JM, Bamshad MJ.
Estimates of continental ancestry vary widely among individuals with
the same mtDNA haplogroup. Am J Hum Genet 2015;96:183-93.
5. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK,
Zhivotovsky LA, et al. Genetic structure of human populations. Science
6. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K,
et al. Ancient human genomes suggest three ancestral populations for
present-day Europeans. Nature 2014;513:409-13.
7. Shriver MD, Smith MW, Jin L, Marcini A, Akey JM, Deka R, et al.
Ethnic-affiliation estimation by use of population-specific DNA
markers. Am J Hum Genet 1997;60:957-64.
8. Poetsch M, Wiegand A, Harder M, Blöhm R, Rakotomavo N,
Freitag-Wolf S, et al. Determination of population origin: A comparison
of autosomal SNPs, Y-chromosomal and mtDNA haplogroups using a
Malagasy population as example. Eur J Hum Genet 2013;21:1423-8.
9. Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to
population stratification in genome-wide association studies. Nat Rev
Genet 2010;11:459-63.
10. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA,
Bender D, et al. PLINK: A tool set for whole-genome association and
population-based linkage analyses. Am J Hum Genet 2007;81:559-75.
11. Alexander DH, Novembre J, Lange K. Fast model-based estimation of
ancestry in unrelated individuals. Genome Res 2009;19:1655-64.
12. R Core Team. R: A Language and Environment for Statistical
Computing. Vienna: R Foundation for Statistical Computing; 2013.
13. Seielstad MT, Minch E, Cavalli-Sforza LL. Genetic evidence for a
higher female migration rate in humans. Nat Genet 1998;20:278-80.
14. Lippold S, Xu H, Ko A, Li M, Renaud G, Butthof A, et al. Human paternal
and maternal demographic histories: Insights from high-resolution Y
chromosome and mtDNA sequences. Investig Genet 2014;5:13.
15. Zhong H, Shi H, Qi XB, Xiao CJ, Jin L, Ma RZ, et al. Global distribution
of Y-chromosome haplogroup C reveals the prehistoric migration routes
of African exodus and early settlement in East Asia. J Hum Genet